date:20210217

Re: [PATCH 09/16] media: i2c: rdacm21: Re-work OV10640 initialization

2021-02-17 Thread Kieran Bingham

Hi Jacopo,

On 16/02/2021 17:41, Jacopo Mondi wrote:
> The OV10640 image sensor reset and powerdown on signals are controlled
> by the embedded OV490 ISP. The current reset procedure does not respect
> the 1 millisecond power-up delay and releases the reset signal before
> the powerdown one.
> 
> Fix the OV10640 power up sequence by releasing the powerdown signal,
> waiting the mandatory 1 millisecond power up delay and then releasing
> the reset signal. The reset delay is not characterized in the chip
> manual if not as "255 XVCLK + initialization". Wait for at least 3
> milliseconds to guarantee the SCCB bus is available.
> 
> This commit fixes a sporadic start-up error triggered by a failure to
> read the OV10640 chip ID:
> rdacm21 8-0054: OV10640 ID mismatch: (0x01)

Have you done a similar initialisation rework for the RDACM21 as you do
in [03/16] of this series for the RDACM20 (or was it already better
perhaps?)

I'm only asking because I noticed that the 'mismatch' check now seems to be
gone from the RDACM20.


> 
> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm21.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/i2c/rdacm21.c b/drivers/media/i2c/rdacm21.c
> index b22a2ca5340b..c420a6b96879 100644
> --- a/drivers/media/i2c/rdacm21.c
> +++ b/drivers/media/i2c/rdacm21.c
> @@ -333,13 +333,15 @@ static int ov10640_initialize(struct rdacm21_device *dev)
>  {
>   u8 val;
>  
> - /* Power-up OV10640 by setting RESETB and PWDNB pins high. */
> + /* Power-up OV10640 by setting PWDNB and RESETB pins high. */
>   ov490_write_reg(dev, OV490_GPIO_SEL0, OV490_GPIO0);
>   ov490_write_reg(dev, OV490_GPIO_SEL1, OV490_SPWDN0);
>   ov490_write_reg(dev, OV490_GPIO_DIRECTION0, OV490_GPIO0);
>   ov490_write_reg(dev, OV490_GPIO_DIRECTION1, OV490_SPWDN0);
> - ov490_write_reg(dev, OV490_GPIO_OUTPUT_VALUE0, OV490_GPIO0);
> +
>   ov490_write_reg(dev, OV490_GPIO_OUTPUT_VALUE0, OV490_SPWDN0);
> + usleep_range(1500, 3000);
> + ov490_write_reg(dev, OV490_GPIO_OUTPUT_VALUE0, OV490_GPIO0);
>   usleep_range(3000, 5000);
>  
>   /* Read OV10640 ID to test communications. */
>
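
The ordering this patch establishes can be sketched as a small userspace model. This is only an illustration: `release_signal()`, the `SIG_*` names and `struct powerup_log` are hypothetical stand-ins for the `ov490_write_reg()` calls and the OV490 GPIO registers, the delays are the lower bounds of the `usleep_range()` calls in the diff, and the model records the requested delays instead of sleeping.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two sensor signals driven by the OV490. */
enum sensor_signal { SIG_PWDNB, SIG_RESETB };

struct powerup_log {
	enum sensor_signal order[2];	/* signals in the order they were released */
	long delay_us[2];		/* delay requested after each release */
	size_t count;
};

/* Record the release of one signal and the delay that must follow it. */
static void release_signal(struct powerup_log *log, enum sensor_signal sig,
			   long delay_us)
{
	assert(log != NULL);
	if (log->count < 2) {
		log->order[log->count] = sig;
		log->delay_us[log->count] = delay_us;
		log->count++;
	}
}

/*
 * Model of the reworked sequence: release PWDNB, wait at least 1 ms
 * (1500 us requested), release RESETB, then wait at least 3 ms before
 * any SCCB access such as the chip ID read.
 */
static void ov10640_powerup_model(struct powerup_log *log)
{
	assert(log != NULL);
	log->count = 0;
	release_signal(log, SIG_PWDNB, 1500);
	release_signal(log, SIG_RESETB, 3000);
}
```

The point of the model is the order of the two releases and the minimum delay after each one; the old code released RESETB first and had no delay between the two writes.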

Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-17 Thread David Hildenbrand


On 17.02.21 14:50, Michal Hocko wrote:
> On Wed 17-02-21 14:36:47, David Hildenbrand wrote:
> > On 17.02.21 14:30, Michal Hocko wrote:
> > > On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
> > > > Free hugetlb pages are tricky to handle so as to no userspace application
> > > > notices disruption, we need to replace the current free hugepage with
> > > > a new one.
> > > >
> > > > In order to do that, a new function called alloc_and_dissolve_huge_page
> > > > is introduced.
> > > > This function will first try to get a new fresh hugetlb page, and if it
> > > > succeeds, it will dissolve the old one.
> > > >
> > > > With regard to the allocation, since we do not know whether the old page
> > > > was allocated on a specific node on request, the node the old page belongs
> > > > to will be tried first, and then we will fallback to all nodes containing
> > > > memory (N_MEMORY).
> > >
> > > I do not think fallback to a different zone is ok. If yes then this
> > > really requires a very good reasoning. alloc_contig_range is an
> > > optimistic allocation interface at best and it shouldn't break carefully
> > > node aware preallocation done by administrator.
> >
> > What does memory offlining do when migrating in-use hugetlbfs pages? Does it
> > always keep the node?
>
> No it will break the node pool. The reasoning behind that is that
> offlining is an explicit request from the userspace and it is expected

userspace? in 99,9996% it's the hardware that triggers the unplug of a DIMM.

> > I think keeping the node is the easiest/simplest approach for now.
> >
> > > > Note that gigantic hugetlb pages are fenced off since there is a cyclic
> > > > dependency between them and alloc_contig_range.
> > >
> > > Why do we need/want to do all this in the first place?
> >
> > cma and virtio-mem (especially on ZONE_MOVABLE) really want to handle
> > hugetlbfs pages.
>
> Do we have any real life examples? Or does this fall more into, let's
> optimize an existing implementation category.

It's a big TODO item I have on my list and I am happy that Oscar is
looking into it. So yes, I noticed it while working on virtio-mem. It's
real.


--
Thanks,

David / dhildenb

Re: [PATCH 2/2] mm: Make alloc_contig_range handle in-use hugetlb pages

2021-02-17 Thread Michal Hocko

On Wed 17-02-21 14:46:49, Oscar Salvador wrote:
> On Wed, Feb 17, 2021 at 02:36:31PM +0100, Michal Hocko wrote:
> > On Wed 17-02-21 11:08:16, Oscar Salvador wrote:
> > > In-use hugetlb pages can be migrated as any other page (LRU
> > > and Movable), so let alloc_contig_range handle them.
> > > 
> > > All we need is to succesfully isolate such page.
> > 
> > Again, this is missing a problem statement and a justification why we
> > want/need this.
> 
> Heh, I was poor in words.
> 
> "alloc_contig_range() will fail miserably if it finds a HugeTLB page within
>  the range without a chance to handle them. Since HugeTLB pages can be migrated
>  as any other page (LRU and Movable), it does not make sense to bail out.
>  Enable the interface to recognize in-use HugeTLB pages and have a chance
>  to migrate them"
> 
> What about something along those lines?

Is this a real life problem? I know we _can_ but I do not see any
reasoning _why_ should we care all that much.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v6 0/3] pinctrl: pinmux: Add pinmux-select debugfs file

2021-02-17 Thread Tony Lindgren

* Drew Fustini  [210216 22:46]:
> This series first converts the debugfs files in the pinctrl subsystem to
> octal permissions and then adds a new debugfs file "pinmux-select".
> 
> Function name and group name can be written to "pinmux-select" which
> will cause the function and group to be activated on the pin controller.

Nice to see this happening!

Reviewed-by: Tony Lindgren

Re: [PATCH RESEND V12 4/8] fuse: Passthrough initialization and release

2021-02-17 Thread Miklos Szeredi

On Mon, Jan 25, 2021 at 4:31 PM Alessio Balsini  wrote:
>
> Implement the FUSE passthrough ioctl that associates the lower
> (passthrough) file system file with the fuse_file.
>
> The file descriptor passed to the ioctl by the FUSE daemon is used to
> access the relative file pointer, that will be copied to the fuse_file
> data structure to consolidate the link between the FUSE and lower file
> system.
>
> To enable the passthrough mode, user space triggers the
> FUSE_DEV_IOC_PASSTHROUGH_OPEN ioctl and, if the call succeeds, receives
> back an identifier that will be used at open/create response time in the
> fuse_open_out field to associate the FUSE file to the lower file system
> file.
> The value returned by the ioctl to user space can be:
> - > 0: success, the identifier can be used as part of an open/create
> reply.
> - <= 0: an error occurred.
> The value 0 represents an error to preserve backward compatibility: the
> fuse_open_out field that is used to pass the passthrough_fh back to the
> kernel uses the same bits that were previously as struct padding, and is
> commonly zero-initialized (e.g., in the libfuse implementation).
> Removing 0 from the correct values fixes the ambiguity between the case
> in which 0 corresponds to a real passthrough_fh, a missing
> implementation of FUSE passthrough or a request for a normal FUSE file,
> simplifying the user space implementation.
>
> For the passthrough mode to be successfully activated, the lower file
> system file must implement both read_iter and write_iter file
> operations. This extra check avoids special pseudo files to be targeted
> for this feature.
> Passthrough comes with another limitation: no further file system
> stacking is allowed for those FUSE file systems using passthrough.
>
> Signed-off-by: Alessio Balsini 
> ---
>  fs/fuse/inode.c   |  5 +++
>  fs/fuse/passthrough.c | 87 ++-
>  2 files changed, 90 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index a1104d5abb70..7ebc398fbacb 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -1133,6 +1133,11 @@ EXPORT_SYMBOL_GPL(fuse_send_init);
>
>  static int free_fuse_passthrough(int id, void *p, void *data)
>  {
> +       struct fuse_passthrough *passthrough = (struct fuse_passthrough *)p;
> +
> +       fuse_passthrough_release(passthrough);
> +       kfree(p);
> +
>         return 0;
>  }
>
> diff --git a/fs/fuse/passthrough.c b/fs/fuse/passthrough.c
> index 594060c654f8..cf993e83803e 100644
> --- a/fs/fuse/passthrough.c
> +++ b/fs/fuse/passthrough.c
> @@ -3,19 +3,102 @@
>  #include "fuse_i.h"
>
>  #include 
> +#include 
>
>  int fuse_passthrough_open(struct fuse_dev *fud,
>                            struct fuse_passthrough_out *pto)
>  {
> -       return -EINVAL;
> +       int res;
> +       struct file *passthrough_filp;
> +       struct fuse_conn *fc = fud->fc;
> +       struct inode *passthrough_inode;
> +       struct super_block *passthrough_sb;
> +       struct fuse_passthrough *passthrough;
> +
> +       if (!fc->passthrough)
> +               return -EPERM;
> +
> +       /* This field is reserved for future implementation */
> +       if (pto->len != 0)
> +               return -EINVAL;
> +
> +       passthrough_filp = fget(pto->fd);
> +       if (!passthrough_filp) {
> +               pr_err("FUSE: invalid file descriptor for passthrough.\n");
> +               return -EBADF;
> +       }
> +
> +       if (!passthrough_filp->f_op->read_iter ||
> +           !passthrough_filp->f_op->write_iter) {
> +               pr_err("FUSE: passthrough file misses file operations.\n");
> +               res = -EBADF;
> +               goto err_free_file;
> +       }
> +
> +       passthrough_inode = file_inode(passthrough_filp);
> +       passthrough_sb = passthrough_inode->i_sb;
> +       if (passthrough_sb->s_stack_depth >= FILESYSTEM_MAX_STACK_DEPTH) {
> +               pr_err("FUSE: fs stacking depth exceeded for passthrough\n");

No need to print an error to the logs, this can be a perfectly normal
occurrence.  However I'd try to find a more unique error value than
EINVAL so that the fuse server can interpret this as "not your fault,
but can't support passthrough on this file".  E.g. EOPNOTSUPP.


> +               res = -EINVAL;
> +               goto err_free_file;
> +       }
> +
> +       passthrough = kmalloc(sizeof(struct fuse_passthrough), GFP_KERNEL);
> +       if (!passthrough) {
> +               res = -ENOMEM;
> +               goto err_free_file;
> +       }
> +
> +       passthrough->filp = passthrough_filp;
> +
> +       idr_preload(GFP_KERNEL);
> +       spin_lock(&fc->passthrough_req_lock);

Should be okay to use fc->lock, since neither adding nor removing the
passthrough ID should be a heavily used operation, and querying the
mapping is lockless.

Thanks,
Miklos

Re: [PATCH] lib: vsprintf: check for NULL device_node name in device_node_string()

2021-02-17 Thread Andy Shevchenko

On Wed, Feb 17, 2021 at 01:15:43PM +0100, Enrico Weigelt, metux IT consult wrote:
> Under rare circumstances it may happen that a device node's name is NULL
> (most likely kernel bug in some other place).

What circumstances? How can I reproduce this? More information, please!

> In such situations anything
> but helpful, if the debug printout crashes, and nobody knows what actually
> happened here.
> 
> Therefore protect it by an explicit NULL check and print out an extra
> warning.

...

> + pr_warn("device_node without name. Kernel bug ?\n");

If it's not a once-only print, then the log can get spammed with this, right?

...

> + p = "";

We already have several de facto standards for how NULL pointers are printed.
If you wish, you could gather them under one definition (maybe somewhere under
printk) and export it for everybody to use.

-- 
With Best Regards,
Andy Shevchenko
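
Both review points (warn only once, and use one shared placeholder for NULL) can be sketched in userspace C. Everything here is hypothetical: `format_node_name()`, `NULL_NAME_PLACEHOLDER` and the warning counter are illustrative names, and the "(null)" spelling is an assumption rather than an agreed definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed single definition of the placeholder printed for NULL names. */
#define NULL_NAME_PLACEHOLDER "(null)"

static int null_name_warnings;	/* number of warnings actually emitted */

/*
 * Copy a printable node name into buf.
 * Returns 1 if the placeholder was substituted, 0 otherwise.
 * The warning is emitted only the first time a NULL name is seen,
 * so repeated hits cannot flood the log.
 */
static int format_node_name(char *buf, size_t len, const char *name)
{
	static bool warned;

	assert(buf != NULL && len > 0);

	if (name) {
		snprintf(buf, len, "%s", name);
		return 0;
	}

	if (!warned) {
		warned = true;
		null_name_warnings++;
		fprintf(stderr, "device_node without name\n");
	}
	snprintf(buf, len, "%s", NULL_NAME_PLACEHOLDER);
	return 1;
}
```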

Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-17 Thread Michal Hocko

On Wed 17-02-21 14:36:47, David Hildenbrand wrote:
> On 17.02.21 14:30, Michal Hocko wrote:
> > On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
> > > Free hugetlb pages are tricky to handle so as to no userspace application
> > > notices disruption, we need to replace the current free hugepage with
> > > a new one.
> > > 
> > > In order to do that, a new function called alloc_and_dissolve_huge_page
> > > is introduced.
> > > This function will first try to get a new fresh hugetlb page, and if it
> > > succeeds, it will dissolve the old one.
> > > 
> > > With regard to the allocation, since we do not know whether the old page
> > > was allocated on a specific node on request, the node the old page belongs
> > > to will be tried first, and then we will fallback to all nodes containing
> > > memory (N_MEMORY).
> > 
> > I do not think fallback to a different zone is ok. If yes then this
> > really requires a very good reasoning. alloc_contig_range is an
> > optimistic allocation interface at best and it shouldn't break carefully
> > node aware preallocation done by administrator.
> 
> What does memory offlining do when migrating in-use hugetlbfs pages? Does it
> always keep the node?

No it will break the node pool. The reasoning behind that is that
offlining is an explicit request from the userspace and it is expected
to break affinities because it is a destructive action from the memory
capacity point of view. It is impossible to have former affinity while
you are cutting the memory off under its user.

> I think keeping the node is the easiest/simplest approach for now.
> 
> > 
> > > Note that gigantic hugetlb pages are fenced off since there is a cyclic
> > > dependency between them and alloc_contig_range.
> > 
> > Why do we need/want to do all this in the first place?
> 
> cma and virtio-mem (especially on ZONE_MOVABLE) really want to handle
> hugetlbfs pages.

Do we have any real life examples? Or does this fall more into, let's
optimize an existing implementation category.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH v3 3/4] i2c: designware: Use the correct name of device-managed function

2021-02-17 Thread Andy Shevchenko

On Wed, Feb 17, 2021 at 07:40:14PM +0800, Dejin Zheng wrote:
> On Tue, Feb 16, 2021 at 06:46:01PM +0100, Krzysztof Wilczyński wrote:

...

> > The change simplifies the error handling path, how?  A line of two which
> > explains how it has been achieved might help should someone reads the
> > commit message in the future.
> > 
> To put it simply, if the driver probe fail, the device-managed function
> mechanism will automatically call pcim_release(), then the
> pci_free_irq_vectors() will be executed. For details, please see the
> relevant code.

Perhaps as a compromise you may add a short sentence like this to your commit
messages: "the resources will be freed automatically when the device is
gone".

-- 
With Best Regards,
Andy Shevchenko
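
For readers unfamiliar with the device-managed idea discussed above, the following is a minimal userspace model of it, not the kernel implementation. All names (`struct managed_dev_model`, `model_add_action()`, `model_release_all()`, `model_probe()`) are hypothetical; the one property taken from devres is that release actions run automatically, newest first, when probe fails or the device goes away.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_ACTIONS 8

typedef void (*release_fn)(void *data);

/* Hypothetical per-device list of release actions. */
struct managed_dev_model {
	release_fn release[MODEL_MAX_ACTIONS];
	void *data[MODEL_MAX_ACTIONS];
	size_t count;
};

/* Register a release action; returns 0 on success, -1 if the list is full. */
static int model_add_action(struct managed_dev_model *dev, release_fn fn,
			    void *data)
{
	assert(dev != NULL && fn != NULL);
	if (dev->count == MODEL_MAX_ACTIONS)
		return -1;
	dev->release[dev->count] = fn;
	dev->data[dev->count] = data;
	dev->count++;
	return 0;
}

/* Run all release actions in reverse order of registration. */
static void model_release_all(struct managed_dev_model *dev)
{
	assert(dev != NULL);
	while (dev->count > 0) {
		dev->count--;
		dev->release[dev->count](dev->data[dev->count]);
	}
}

/* Example release action: mark the resource flag as freed. */
static void model_free_irq_vectors(void *data)
{
	*(int *)data = 0;
}

/* Probe that takes a managed resource and may fail, with no unwind code. */
static int model_probe(struct managed_dev_model *dev, int *vectors_allocated,
		       int fail)
{
	*vectors_allocated = 1;
	if (model_add_action(dev, model_free_irq_vectors, vectors_allocated)) {
		*vectors_allocated = 0;
		return -1;
	}
	if (fail)
		return -1;	/* the core would now call model_release_all() */
	return 0;
}
```

The simplification in the error path is visible in `model_probe()`: the failing branch just returns, and the cleanup happens in `model_release_all()`.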

Re: [PATCH] drm/msm/a6xx: fix for kernels without CONFIG_NVMEM

2021-02-17 Thread Akhil P Oommen


On 2/17/2021 8:36 AM, Rob Clark wrote:
> On Tue, Feb 16, 2021 at 12:10 PM Jonathan Marek  wrote:
> >
> > Ignore nvmem_cell_get() EOPNOTSUPP error in the same way as a ENOENT error,
> > to fix the case where the kernel was compiled without CONFIG_NVMEM.
> >
> > Fixes: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> > Signed-off-by: Jonathan Marek
> > ---
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index ba8e9d3cf0fe..7fe5d97606aa 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1356,10 +1356,10 @@ static int a6xx_set_supported_hw(struct device *dev, struct a6xx_gpu *a6xx_gpu,
> >
> >         cell = nvmem_cell_get(dev, "speed_bin");
> >         /*
> > -        * -ENOENT means that the platform doesn't support speedbin which is
> > -        * fine
> > +        * -ENOENT means no speed bin in device tree,
> > +        * -EOPNOTSUPP means kernel was built without CONFIG_NVMEM
>
> very minor nit, it would be nice to at least preserve the gist of the
> "which is fine" (ie. some variation of "this is an optional thing and
> things won't catch fire without it" ;-))
>
> (which is, I believe, is true, hopefully Akhil could confirm.. if not
> we should have a harder dependency on CONFIG_NVMEM..)

IIRC, if the gpu opp table in the DT uses the 'opp-supported-hw'
property, we will see some error during boot up if we don't call
dev_pm_opp_set_supported_hw(). So calling "nvmem_cell_get(dev,
"speed_bin")" is a way to test this.

If there is no other harm, we can put a hard dependency on CONFIG_NVMEM.

-Akhil.

> BR,
> -R
>
> >          */
> > -       if (PTR_ERR(cell) == -ENOENT)
> > +       if (PTR_ERR(cell) == -ENOENT || PTR_ERR(cell) == -EOPNOTSUPP)
> >                 return 0;
> >         else if (IS_ERR(cell)) {
> >                 DRM_DEV_ERROR(dev,
> > --
> > 2.26.1
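
The "optional resource" handling under discussion can be sketched in isolation. This is a userspace illustration under the assumption that both error codes mean "not available, carry on"; `optional_lookup_result()` is a hypothetical helper and does not exist in the driver.

```c
#include <assert.h>
#include <errno.h>

/*
 * Classify the result of looking up an optional resource.
 * -ENOENT:     the resource is not described (no speed bin in device tree).
 * -EOPNOTSUPP: the lookup facility itself is compiled out.
 * Both are treated as "fine, continue without it"; any other negative
 * errno is passed through as a real failure.
 */
static int optional_lookup_result(int err)
{
	assert(err <= 0);

	if (err == -ENOENT || err == -EOPNOTSUPP)
		return 0;
	return err;
}
```

Whether ignoring -EOPNOTSUPP is really harmless is exactly the open question in this thread; the sketch only shows the shape of the check, not an answer to it.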

Re: [PATCH 2/2] mm: Make alloc_contig_range handle in-use hugetlb pages

2021-02-17 Thread Oscar Salvador

On Wed, Feb 17, 2021 at 02:36:31PM +0100, Michal Hocko wrote:
> On Wed 17-02-21 11:08:16, Oscar Salvador wrote:
> > In-use hugetlb pages can be migrated as any other page (LRU
> > and Movable), so let alloc_contig_range handle them.
> > 
> > All we need is to succesfully isolate such page.
> 
> Again, this is missing a problem statement and a justification why we
> want/need this.

Heh, I was poor in words.

"alloc_contig_range() will fail miserably if it finds a HugeTLB page within
 the range without a chance to handle them. Since HugeTLB pages can be migrated
 as any other page (LRU and Movable), it does not make sense to bail out.
 Enable the interface to recognize in-use HugeTLB pages and have a chance
 to migrate them"

What about something along those lines?

-- 
Oscar Salvador
SUSE L3

Re: [PATCH v3 2/4] Documentation: devres: Add pcim_alloc_irq_vectors()

2021-02-17 Thread Andy Shevchenko

On Wed, Feb 17, 2021 at 06:50:04PM +0800, Dejin Zheng wrote:
> On Tue, Feb 16, 2021 at 06:10:52PM +0100, Krzysztof Wilczyński wrote:

...

> > Having said that, people might ask - how does it simplify the error
> > handling path?
> > 
> > You might have to back this with a line of two to explain how does the
> > change achieved that, so that when someone looks at the commit message
> > it would be clear what the benefits of the change were.

> The device-managed function is a conventional concept that every developer
> knows. So don't worry about this. And I really can't explain its operation
> mechanism to you in a sentence or two. If you are really interested, you
> can read the relevant code.

I tend to agree with the above. It would be enough to spell out clearly that
it's part of the devres API (Managed Device Resource) and we are fine.

-- 
With Best Regards,
Andy Shevchenko

[GIT PULL] scheduler updates for v5.12

2021-02-17 Thread Ingo Molnar

Linus,

Please pull the latest sched/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-core-2021-02-17

   # HEAD: c5e6fc08feb2b88dc5dac2f3c817e1c2a4cafda4 sched,x86: Allow !PREEMPT_DYNAMIC

Scheduler updates for v5.12:

[ NOTE: unfortunately this tree had to be freshly rebased today,
it's a same-content tree of 82891be90f3c (-next published)
merged with v5.11.

The main reason for the rebase was an authorship misattribution
problem with a new commit, which we noticed in the last minute,
and which we didn't want to be merged upstream. The offending
commit was deep in the tree, and dependent commits had to be
rebased as well. ]

- Core scheduler updates:

  - Add CONFIG_PREEMPT_DYNAMIC: this in its current form adds the
    preempt=none/voluntary/full boot options (default: full),
    to allow distros to build a PREEMPT kernel but fall back to
    close to PREEMPT_VOLUNTARY (or PREEMPT_NONE) runtime scheduling
    behavior via a boot time selection.

    There's also the /debug/sched_debug switch to do this runtime.

    This feature is implemented via runtime patching (a new variant of
    static calls).

    The scope of the runtime patching can be best reviewed by looking
    at the sched_dynamic_update() function in kernel/sched/core.c.

    ( Note that the dynamic none/voluntary mode isn't 100% identical,
      for example preempt-RCU is available in all cases, plus the
      preempt count is maintained in all models, which has runtime
      overhead even with the code patching. )

    The PREEMPT_VOLUNTARY/PREEMPT_NONE models, used by the vast majority
    of distributions, are supposed to be unaffected.

  - Fix ignored rescheduling after rcu_eqs_enter(). This is a bug that
    was found via rcutorture triggering a hang. The bug is that
    rcu_idle_enter() may wake up a NOCB kthread, but this happens after
    the last generic need_resched() check. Some cpuidle drivers fix it
    by chance but many others don't.

    In true 2020 fashion the original bug fix has grown into a 5-patch
    scheduler/RCU fix series plus another 16 RCU patches to address
    the underlying issue of missed preemption events. These are the
    initial fixes that should fix current incarnations of the bug.

  - Clean up rbtree usage in the scheduler, by providing & using the following
    consistent set of rbtree APIs:

     partial-order; less() based:
       - rb_add(): add a new entry to the rbtree
       - rb_add_cached(): like rb_add(), but for a rb_root_cached

     total-order; cmp() based:
       - rb_find(): find an entry in an rbtree
       - rb_find_add(): find an entry, and add if not found

       - rb_find_first(): find the first (leftmost) matching entry
       - rb_next_match(): continue from rb_find_first()
       - rb_for_each(): iterate a sub-tree using the previous two

  - Improve the SMP/NUMA load-balancer: scan for an idle sibling in a single pass.
    This is a 4-commit series where each commit improves one aspect of the idle
    sibling scan logic.

  - Improve the cpufreq cooling driver by getting the effective CPU utilization
    metrics from the scheduler

  - Improve the fair scheduler's active load-balancing logic by reducing the number
    of active LB attempts & lengthen the load-balancing interval. This improves
    stress-ng mmapfork performance.

  - Fix CFS's estimated utilization (util_est) calculation bug that can result in
    too high utilization values

- Misc updates & fixes:

   - Fix the HRTICK reprogramming & optimization feature
   - Fix SCHED_SOFTIRQ raising race & warning in the CPU offlining code
   - Reduce dl_add_task_root_domain() overhead
   - Fix uprobes refcount bug
   - Process pending softirqs in flush_smp_call_function_from_idle()
   - Clean up task priority related defines, remove *USER_*PRIO and
 USER_PRIO()
   - Simplify the sched_init_numa() deduplication sort
   - Documentation updates
   - Fix EAS bug in update_misfit_status(), which degraded the quality
 of energy-balancing
   - Smaller cleanups

 Thanks,

Ingo

-->
Anna-Maria Behnsen (1):
  sched: Prevent raising SCHED_SOFTIRQ when CPU is !active

Dietmar Eggemann (5):
  sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
  sched: Remove MAX_USER_RT_PRIO
  sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO
  sched/core: Update task_prio() function header
  sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()

Frederic Weisbecker (6):
  static_call: Provide DEFINE_STATIC_CALL_RET0()
  rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
  rcu/nocb: Perform deferred wake up before last idle's need_resched() check
  rcu/nocb: Trigger self-IPI on late deferred wake up before user resume
  entry: Explicitly flush pending rcuog wakeup before last rescheduling point
  entry/kvm:

Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-17 Thread Oscar Salvador

On Wed, Feb 17, 2021 at 02:30:43PM +0100, Michal Hocko wrote:
> On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
> I do not think fallback to a different zone is ok. If yes then this
> really requires a very good reasoning. alloc_contig_range is an
> optimistic allocation interface at best and it shouldn't break carefully
> node aware preallocation done by administrator.

Yeah, the previous version (RFC) was more careful with that.
I somehow thought that it might be ok to fall back to other nodes in case
we failed to allocate on the preferred nid.

I will bring back the RFC's handling of the allocation once I gather more
feedback.

> 
> > Note that gigantic hugetlb pages are fenced off since there is a cyclic
> > dependency between them and alloc_contig_range.
> 
> Why do we need/want to do all this in the first place?

When trying to allocate a memory chunk with alloc_contig_range, it will fail
if it ever sees a HugeTLB page because isolate_migratepages_range() does not
"recognize" those for migration (or, to put it differently, does not know
about them).

Given that HugeTLB pages can be migrated or dissolved if free, it makes sense
to enable isolate_migratepages_range() to recognize HugeTLB pages in order
to handle them, as it currently does with LRU and Movable pages.

-- 
Oscar Salvador
SUSE L3

Re: [PATCH RESEND V12 3/8] fuse: Definitions and ioctl for passthrough

2021-02-17 Thread Miklos Szeredi

On Mon, Jan 25, 2021 at 4:31 PM Alessio Balsini  wrote:
>
> Expose the FUSE_PASSTHROUGH interface to user space and declare all the
> basic data structures and functions as the skeleton on top of which the
> FUSE passthrough functionality will be built.
>
> As part of this, introduce the new FUSE passthrough ioctl, which allows
> the FUSE daemon to specify a direct connection between a FUSE file and a
> lower file system file. Such ioctl requires user space to pass the file
> descriptor of one of its opened files through the fuse_passthrough_out
> data structure introduced in this patch. This structure includes extra
> fields for possible future extensions.
> Also, add the passthrough functions for the set-up and tear-down of the
> data structures and locks that will be used both when fuse_conns and
> fuse_files are created/deleted.
>
> Signed-off-by: Alessio Balsini 

[...]

> diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> index 54442612c48b..9d7685ce0acd 100644
> --- a/include/uapi/linux/fuse.h
> +++ b/include/uapi/linux/fuse.h
> @@ -360,6 +360,7 @@ struct fuse_file_lock {
>  #define FUSE_MAP_ALIGNMENT (1 << 26)
>  #define FUSE_SUBMOUNTS (1 << 27)
>  #define FUSE_HANDLE_KILLPRIV_V2 (1 << 28)
> +#define FUSE_PASSTHROUGH   (1 << 29)

This header has a version and a changelog.  Please update those as well.

>
>  /**
>   * CUSE INIT request/reply flags
> @@ -625,7 +626,7 @@ struct fuse_create_in {
>  struct fuse_open_out {
>         uint64_t        fh;
>         uint32_t        open_flags;
> -       uint32_t        padding;
> +       uint32_t        passthrough_fh;

I think it would be cleaner to add a FOPEN_PASSTHROUGH flag to
explicitly request passthrough instead of just passing a non-null
value to passthrough_fh.

>  };
>
>  struct fuse_release_in {
> @@ -828,6 +829,13 @@ struct fuse_in_header {
>         uint32_t        padding;
>  };
>
> +struct fuse_passthrough_out {
> +       uint32_t        fd;
> +       /* For future implementation */
> +       uint32_t        len;
> +       void            *vec;
> +};

I don't see why we'd need these extensions.  The ioctl just needs to
establish an ID to open file mapping that can be referenced on the
regular protocol, i.e. it just needs to be passed an open file
descriptor and return a unique ID.

Mapping the fuse file's data to the underlying file's data is a
different matter.  That can be an identity mapping established at open
time (this is what this series does) or it can be an arbitrary extent
mapping to one or more underlying open files, established at open time
or on demand.  All of these can be done in band using the fuse
protocol, no need to involve the ioctl mechanism.

So I think we can just get rid of "struct fuse_passthrough_out"
completely and use "uint32_t *" as the ioctl argument.

What I think would be useful is to have an explicit
FUSE_DEV_IOC_PASSTHROUGH_CLOSE ioctl, that would need to be called
once the fuse server no longer needs this ID.   If this turns out to
be a performance problem, we could still add the auto-close behavior
with an explicit FOPEN_PASSTHROUGH_AUTOCLOSE flag later.

Thanks,
Miklos

Re: [PATCH v7 3/6] clk: ralink: add clock driver for mt7621 SoC

2021-02-17 Thread kernel test robot

Hi Sergio,

I love your patch! Perhaps something to improve:

[auto build test WARNING on staging/staging-testing]
[also build test WARNING on clk/clk-next robh/for-next linus/master v5.11 next-20210216]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sergio-Paracuellos/MIPS-ralink-add-CPU-clock-detection-and-clock-driver-for-MT7621/20210217-194316
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git 4eb839aef182fccf8995ee439fc2b48d43e45918
config: riscv-randconfig-r036-20210217 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project c9439ca36342fb6013187d0a69aef92736951476)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/9b83f7b7032e26686ddc5d89e82ee2df4dc260d3
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sergio-Paracuellos/MIPS-ralink-add-CPU-clock-detection-and-clock-driver-for-MT7621/20210217-194316
        git checkout 9b83f7b7032e26686ddc5d89e82ee2df4dc260d3
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=riscv
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/clk/ralink/clk-mt7621.c:459:2: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (!clk_data)
           ^~
   include/linux/compiler.h:56:28: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ^~~
   include/linux/compiler.h:58:30: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                ^~
   drivers/clk/ralink/clk-mt7621.c:517:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   drivers/clk/ralink/clk-mt7621.c:459:2: note: remove the 'if' if its condition is always false
           if (!clk_data)
           ^~
   include/linux/compiler.h:56:23: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                         ^
   drivers/clk/ralink/clk-mt7621.c:451:2: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (IS_ERR(priv->memc)) {
           ^~~
   include/linux/compiler.h:56:28: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ^~~
   include/linux/compiler.h:58:30: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                ^~
   drivers/clk/ralink/clk-mt7621.c:517:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   drivers/clk/ralink/clk-mt7621.c:451:2: note: remove the 'if' if its condition is always false
           if (IS_ERR(priv->memc)) {
           ^
   include/linux/compiler.h:56:23: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                         ^
   drivers/clk/ralink/clk-mt7621.c:445:2: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
           if (IS_ERR(priv->sysc)) {
           ^~~
   include/linux/compiler.h:56:28: note: expanded from macro 'if'
   #define if(cond, ...) if ( __trace_if_var( !!(cond , ## __VA_ARGS__) ) )
                              ^~~
   include/linux/compiler.h:58:30: note: expanded from macro '__trace_if_var'
   #define __trace_if_var(cond) (__builtin_constant_p(cond) ? (cond) : __trace_if_value(cond))
                                ^~
   drivers/clk/ralink/clk-mt7621.c:517:9: note: uninitialized use occurs here
           return ret;
                  ^~~
   drivers/clk/ralink/clk-mt7621.c:445:2: note: remove the 'if' if its
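
The pattern behind this class of warning, and the usual fix, can be shown with a small userspace sketch. The function below is hypothetical and is not the code from clk-mt7621.c; it only assumes the common shape of an init routine with a single exit label, where an early `goto` skipped the assignment to `ret`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical init routine with a single exit label. Every branch that
 * jumps to 'out' assigns 'ret' first, which is what
 * -Wsometimes-uninitialized asks for.
 */
static int clk_init_model(size_t nclks, int fail_alloc)
{
	int *clk_data;
	int ret;

	assert(nclks > 0);

	clk_data = fail_alloc ? NULL : calloc(nclks, sizeof(*clk_data));
	if (!clk_data) {
		ret = -ENOMEM;	/* without this line 'ret' would be returned unset */
		goto out;
	}

	/* ... clock registration would happen here ... */
	ret = 0;
	free(clk_data);
out:
	return ret;
}
```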

Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-17 Thread David Hildenbrand


On 17.02.21 14:30, Michal Hocko wrote:

> On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
>> Free hugetlb pages are tricky to handle so as to no userspace application
>> notices disruption, we need to replace the current free hugepage with
>> a new one.
>>
>> In order to do that, a new function called alloc_and_dissolve_huge_page
>> is introduced.
>> This function will first try to get a new fresh hugetlb page, and if it
>> succeeds, it will dissolve the old one.
>>
>> With regard to the allocation, since we do not know whether the old page
>> was allocated on a specific node on request, the node the old page belongs
>> to will be tried first, and then we will fallback to all nodes containing
>> memory (N_MEMORY).
>
> I do not think fallback to a different zone is ok. If yes then this
> really requires a very good reasoning. alloc_contig_range is an
> optimistic allocation interface at best and it shouldn't break carefully
> node aware preallocation done by administrator.

What does memory offlining do when migrating in-use hugetlbfs pages?
Does it always keep the node?

I think keeping the node is the easiest/simplest approach for now.

>> Note that gigantic hugetlb pages are fenced off since there is a cyclic
>> dependency between them and alloc_contig_range.
>
> Why do we need/want to do all this in the first place?

cma and virtio-mem (especially on ZONE_MOVABLE) really want to handle
hugetlbfs pages.


--
Thanks,

David / dhildenb

Re: [PATCH 08/16] media: i2c: max9286: Adjust parameters indent

2021-02-17 Thread Kieran Bingham

On 16/02/2021 17:41, Jacopo Mondi wrote:
> The parameters to max9286_i2c_mux_configure() fit on the previous
> line. Adjust it.
> 
> Cosmetic change only.

Cosmetic tag ;-)

Reviewed-by: Kieran Bingham 

> 
> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/max9286.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/media/i2c/max9286.c b/drivers/media/i2c/max9286.c
> index 6fd4d59fcc72..1d9951215868 100644
> --- a/drivers/media/i2c/max9286.c
> +++ b/drivers/media/i2c/max9286.c
> @@ -287,9 +287,8 @@ static int max9286_i2c_mux_select(struct i2c_mux_core *muxc, u32 chan)
>  
>   priv->mux_channel = chan;
>  
> - max9286_i2c_mux_configure(priv,
> -   MAX9286_FWDCCEN(chan) |
> -   MAX9286_REVCCEN(chan));
> + max9286_i2c_mux_configure(priv, MAX9286_FWDCCEN(chan) |
> + MAX9286_REVCCEN(chan));
>  
>   return 0;
>  }
>

Re: [PATCH RFC 1/4] firmware: Add the support for ZSTD-compressed firmware files

2021-02-17 Thread Takashi Iwai

On Wed, 17 Feb 2021 14:24:19 +0100,
Luis Chamberlain wrote:
> 
> On Wed, Jan 27, 2021 at 04:49:36PM +0100, Takashi Iwai wrote:
> > Due to the popular demands on ZSTD, here is a patch to add a support
> > of ZSTD-compressed firmware files via the direct firmware loader.
> > It's just like XZ-compressed file support, providing a decompressor
> > with ZSTD.  Since ZSTD API can give the decompression size beforehand,
> > the code is even simpler than XZ.
> > 
> > Signed-off-by: Takashi Iwai 
> 
> It also occurs to me that having a simple define like
> #define HAVE_FIRMWARE_COMPRESS_ZSTD
> in include/linux/firmware.h would enable userspace (if they have kernel
> sources) to determine if the kernel supports this format.

Extending that idea, we might want to have a sysfs entry showing the
supported formats instead?  This would allow userspace to check dynamically.


thanks,

Takashi

Re: [PATCH 2/2] mm: Make alloc_contig_range handle in-use hugetlb pages

2021-02-17 Thread Michal Hocko

On Wed 17-02-21 11:08:16, Oscar Salvador wrote:
> In-use hugetlb pages can be migrated as any other page (LRU
> and Movable), so let alloc_contig_range handle them.
> 
> All we need is to successfully isolate such page.

Again, this is missing a problem statement and a justification why we
want/need this.
-- 
Michal Hocko
SUSE Labs

Re: [PATCH 07/16] media: i2c: rdacm2x: Fix wake up delay

2021-02-17 Thread Kieran Bingham

Hi Jacopo,


On 16/02/2021 17:41, Jacopo Mondi wrote:
> The MAX9271 chip manual prescribes a delay of 5 milliseconds
> after the chip exits from low power state.
> 
> Adjust the required delay in the rdacm21 camera module and add it
> to the rdacm20 that currently doesn't implement one.
> 

This sounds to me like it should be a common function in the max9271 module:

> /* Verify communication with the MAX9271: ping to wakeup. */
> dev->serializer.client->addr = MAX9271_DEFAULT_ADDR;
> i2c_smbus_read_byte(dev->serializer.client);
> usleep_range(5000, 8000);


Especially as that MAX9271_DEFAULT_ADDR should probably be handled
directly in the max9271.c file too, and the RDACM's shouldn't care about it.


If we end up moving the max9271 'library' into more of a module/device
then this would have to be done in its 'probe' anyway, so it's likely
better handled down there...?

But ... it's not essential at this point in the series, so if you want
to keep this patch as is,

Reviewed-by: Kieran Bingham 
> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm20.c | 1 +
>  drivers/media/i2c/rdacm21.c | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/media/i2c/rdacm20.c b/drivers/media/i2c/rdacm20.c
> index ea30cc936531..39e4b4241870 100644
> --- a/drivers/media/i2c/rdacm20.c
> +++ b/drivers/media/i2c/rdacm20.c
> @@ -460,6 +460,7 @@ static int rdacm20_initialize(struct rdacm20_device *dev)
>   /* Verify communication with the MAX9271: ping to wakeup. */
>   dev->serializer.client->addr = MAX9271_DEFAULT_ADDR;
>   i2c_smbus_read_byte(dev->serializer.client);
> + usleep_range(5000, 8000);
>  
>   /* Serial link disabled during config as it needs a valid pixel clock. */
>   ret = max9271_set_serial_link(&dev->serializer, false);
> diff --git a/drivers/media/i2c/rdacm21.c b/drivers/media/i2c/rdacm21.c
> index 179d107f494c..b22a2ca5340b 100644
> --- a/drivers/media/i2c/rdacm21.c
> +++ b/drivers/media/i2c/rdacm21.c
> @@ -453,7 +453,7 @@ static int rdacm21_initialize(struct rdacm21_device *dev)
>   /* Verify communication with the MAX9271: ping to wakeup. */
>   dev->serializer.client->addr = MAX9271_DEFAULT_ADDR;
>   i2c_smbus_read_byte(dev->serializer.client);
> - usleep_range(3000, 5000);
> + usleep_range(5000, 8000);
>  
>   /* Enable reverse channel and disable the serial link. */
>   ret = max9271_set_serial_link(&dev->serializer, false);
>

[PATCH v4 15/16] rpmsg: char: no dynamic endpoint management for the default one

2021-02-17 Thread Arnaud Pouliquen

Do not dynamically manage the default endpoint. The ept address must
not change.
This update is needed to manage the RPMSG_CREATE_DEV_IOCTL. In this
case a default endpoint is used and its address must not change or
be reused by another service.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index d5aa874865f7..0b0a6b7c0c9a 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -114,14 +114,23 @@ static int rpmsg_eptdev_open(struct inode *inode, struct 
file *filp)
struct rpmsg_endpoint *ept;
struct rpmsg_device *rpdev = eptdev->rpdev;
struct device *dev = &eptdev->dev;
+   u32 addr = eptdev->chinfo.src;
 
get_device(dev);
 
-   ept = rpmsg_create_ept(rpdev, rpmsg_ept_cb, eptdev, eptdev->chinfo);
-   if (!ept) {
-   dev_err(dev, "failed to open %s\n", eptdev->chinfo.name);
-   put_device(dev);
-   return -EINVAL;
+   /*
+* The ept device can has been created by a ns announcement. In this
+* case a default endpoint has been created. Reuse it to avoid to manage
+* a new address on each open close.
+*/
+   ept = rpdev->ept;
+   if (!ept || addr != ept->addr) {
+   ept = rpmsg_create_ept(rpdev, rpmsg_ept_cb, eptdev, 
eptdev->chinfo);
+   if (!ept) {
+   dev_err(dev, "failed to open %s\n", 
eptdev->chinfo.name);
+   put_device(dev);
+   return -EINVAL;
+   }
}
 
eptdev->ept = ept;
@@ -133,12 +142,17 @@ static int rpmsg_eptdev_open(struct inode *inode, struct 
file *filp)
 static int rpmsg_eptdev_release(struct inode *inode, struct file *filp)
 {
struct rpmsg_eptdev *eptdev = cdev_to_eptdev(inode->i_cdev);
+   struct rpmsg_device *rpdev = eptdev->rpdev;
struct device *dev = &eptdev->dev;
 
-   /* Close the endpoint, if it's not already destroyed by the parent */
+   /*
+* Close the endpoint, if it's not already destroyed by the parent and 
it is not the
+* default one.
+*/
mutex_lock(&eptdev->ept_lock);
if (eptdev->ept) {
-   rpmsg_destroy_ept(eptdev->ept);
+   if (eptdev->ept != rpdev->ept)
+   rpmsg_destroy_ept(eptdev->ept);
eptdev->ept = NULL;
}
mutex_unlock(&eptdev->ept_lock);
-- 
2.17.1

[PATCH v4 16/16] rpmsg: char: return an error if device already open

2021-02-17 Thread Arnaud Pouliquen

The rpmsg_create_ept function is invoked when the device is opened.
As only one endpoint must be created per device, it is not possible to
open the same device twice, but nothing currently prevents multiple opens.
Return -EBUSY when the device is already open, to have a generic error
instead of relying on the back-end to potentially detect the error.

Without this patch, for instance, the GLINK driver returns -EBUSY while
the virtio bus returns -ENOSPC.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 0b0a6b7c0c9a..2eacddb83e29 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -116,6 +116,9 @@ static int rpmsg_eptdev_open(struct inode *inode, struct 
file *filp)
struct device *dev = &eptdev->dev;
u32 addr = eptdev->chinfo.src;
 
+   if (eptdev->ept)
+   return -EBUSY;
+
get_device(dev);
 
/*
-- 
2.17.1
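
For illustration only (not part of the patch): a minimal user-space sketch of the single-open guard described in the commit message above. The `sketch_*` names and structures are hypothetical stand-ins, not the kernel definitions, and locking is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct sketch_ept {
	int addr;
};

struct sketch_eptdev {
	struct sketch_ept *ept;     /* non-NULL while the device is open */
	struct sketch_ept backing;  /* storage standing in for a created endpoint */
};

/* Mirrors the check added by the patch: a second open is refused with -EBUSY. */
static int sketch_eptdev_open(struct sketch_eptdev *eptdev)
{
	assert(eptdev != NULL);

	if (eptdev->ept)
		return -EBUSY;

	eptdev->ept = &eptdev->backing;
	return 0;
}

/* Releasing the device clears the endpoint so that it can be opened again. */
static void sketch_eptdev_release(struct sketch_eptdev *eptdev)
{
	assert(eptdev != NULL);
	eptdev->ept = NULL;
}
```

With this guard the caller sees the same error code whatever the transport, which is the behaviour the commit message asks for.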

[PATCH v4 13/16] rpmsg: char: introduce __rpmsg_chrdev_create_eptdev function

2021-02-17 Thread Arnaud Pouliquen

Introduce the __rpmsg_chrdev_create_eptdev internal function that returns
the rpmsg_eptdev context structure.
This patch prepares the introduction of a RPMsg device for the
char device. The RPMsg device will need a reference to the context.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 09ae1304837c..66dcb8845d6c 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -328,8 +328,9 @@ int rpmsg_chrdev_eptdev_destroy(struct device *dev, void 
*data)
 }
 EXPORT_SYMBOL(rpmsg_chrdev_eptdev_destroy);
 
-int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, struct device 
*parent,
-  struct rpmsg_channel_info chinfo)
+static struct rpmsg_eptdev *__rpmsg_chrdev_create_eptdev(struct rpmsg_device 
*rpdev,
+struct device *parent,
+struct 
rpmsg_channel_info chinfo)
 {
struct rpmsg_eptdev *eptdev;
struct device *dev;
@@ -337,7 +338,7 @@ int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, 
struct device *parent
 
eptdev = kzalloc(sizeof(*eptdev), GFP_KERNEL);
if (!eptdev)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
dev = &eptdev->dev;
eptdev->rpdev = rpdev;
@@ -381,7 +382,7 @@ int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, 
struct device *parent
put_device(dev);
}
 
-   return ret;
+   return eptdev;
 
 free_ept_ida:
ida_simple_remove(&rpmsg_ept_ida, dev->id);
@@ -391,7 +392,19 @@ int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, 
struct device *parent
put_device(dev);
kfree(eptdev);
 
-   return ret;
+   return ERR_PTR(ret);
+}
+
+int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, struct device 
*parent,
+  struct rpmsg_channel_info chinfo)
+{
+   struct rpmsg_eptdev *eptdev;
+
+   eptdev = __rpmsg_chrdev_create_eptdev(rpdev, >dev, chinfo);
+   if (IS_ERR(eptdev))
+   return PTR_ERR(eptdev);
+
+   return 0;
 }
 EXPORT_SYMBOL(rpmsg_chrdev_create_eptdev);
 
-- 
2.17.1
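
For illustration only (not part of the patch): the patch above splits the function into an internal helper that returns a pointer-or-error and a public wrapper that returns an int. The sketch below reimplements the kernel's ERR_PTR/PTR_ERR/IS_ERR idea in user space under hypothetical `sketch_*` names, so it can be compiled outside the kernel; it is an approximation of the pattern, not the kernel's err.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: like the kernel, reserve the top 4095 addresses for error codes. */
#define SKETCH_MAX_ERRNO 4095

static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical context structure standing in for struct rpmsg_eptdev. */
struct sketch_eptdev {
	int id;
};

/* Internal helper: returns the context, or an encoded negative errno. */
static struct sketch_eptdev *sketch_create_internal(int fail_with)
{
	struct sketch_eptdev *eptdev;

	if (fail_with)
		return sketch_err_ptr(-fail_with);

	eptdev = calloc(1, sizeof(*eptdev));
	if (!eptdev)
		return sketch_err_ptr(-ENOMEM);

	return eptdev;
}

/* Public wrapper: keeps the historical int return value. */
static int sketch_create(int fail_with)
{
	struct sketch_eptdev *eptdev = sketch_create_internal(fail_with);

	if (sketch_is_err(eptdev))
		return (int)sketch_ptr_err(eptdev);

	/* Sketch only: in the kernel the object stays alive, owned by its device. */
	free(eptdev);
	return 0;
}
```

The internal variant lets a later caller keep the returned context, while existing callers of the exported function see no change in its signature.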

[PATCH v4 10/16] rpmsg: char: use sendto to specify the message destination address

2021-02-17 Thread Arnaud Pouliquen

When the endpoint device is created by the application, a destination
address has been specified in the rpmsg_channel_info structure.
Send the message to this address instead of the default one.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 83c10b39b139..09ae1304837c 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -225,9 +225,9 @@ static ssize_t rpmsg_eptdev_write_iter(struct kiocb *iocb,
}
 
if (filp->f_flags & O_NONBLOCK)
-   ret = rpmsg_trysend(eptdev->ept, kbuf, len);
+   ret = rpmsg_trysendto(eptdev->ept, kbuf, len, 
eptdev->chinfo.dst);
else
-   ret = rpmsg_send(eptdev->ept, kbuf, len);
+   ret = rpmsg_sendto(eptdev->ept, kbuf, len, eptdev->chinfo.dst);
 
 unlock_eptdev:
mutex_unlock(&eptdev->ept_lock);
-- 
2.17.1

[PATCH v4 14/16] rpmsg: char: introduce a RPMsg driver for the RPMsg char device

2021-02-17 Thread Arnaud Pouliquen

A RPMsg char device allows to probe the endpoint device on a remote name
service announcement. With this patch the /dev/rpmsgX interface is created
either by a user application or by the remote firmware.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 63 +-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 66dcb8845d6c..d5aa874865f7 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -28,6 +28,8 @@
 
 #define RPMSG_DEV_MAX  (MINORMASK + 1)
 
+#define RPMSG_CHAR_DEVNAME "rpmsg-raw"
+
 static dev_t rpmsg_major;
 static struct class *rpmsg_class;
 
@@ -408,6 +410,51 @@ int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, 
struct device *parent
 }
 EXPORT_SYMBOL(rpmsg_chrdev_create_eptdev);
 
+static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
+{
+   struct rpmsg_channel_info chinfo;
+   struct rpmsg_eptdev *eptdev;
+
+   memcpy(chinfo.name, RPMSG_CHAR_DEVNAME, sizeof(RPMSG_CHAR_DEVNAME));
+   chinfo.src = rpdev->src;
+   chinfo.dst = rpdev->dst;
+
+   eptdev = __rpmsg_chrdev_create_eptdev(rpdev, >dev, chinfo);
+   if (IS_ERR(eptdev) && rpdev->ept) {
+   rpmsg_destroy_ept(rpdev->ept);
+   return PTR_ERR(eptdev);
+   }
+
+   /* Set the private field of the default endpoint to retrieve context on 
callback. */
+   rpdev->ept->priv = eptdev;
+
+   return 0;
+}
+
+static void rpmsg_chrdev_remove(struct rpmsg_device *rpdev)
+{
+   int ret;
+
+   ret = device_for_each_child(&rpdev->dev, NULL, rpmsg_chrdev_eptdev_destroy);
+   if (ret)
+   dev_warn(&rpdev->dev, "failed to destroy endpoints: %d\n", ret);
+}
+
+static struct rpmsg_device_id rpmsg_chrdev_id_table[] = {
+   { .name = RPMSG_CHAR_DEVNAME },
+   { },
+};
+
+static struct rpmsg_driver rpmsg_chrdev_driver = {
+   .probe = rpmsg_chrdev_probe,
+   .remove = rpmsg_chrdev_remove,
+   .id_table = rpmsg_chrdev_id_table,
+   .callback = rpmsg_ept_cb,
+   .drv = {
+   .name = "rpmsg_chrdev",
+   },
+};
+
 static int rpmsg_chrdev_init(void)
 {
int ret;
@@ -422,9 +469,23 @@ static int rpmsg_chrdev_init(void)
if (IS_ERR(rpmsg_class)) {
pr_err("failed to create rpmsg class\n");
unregister_chrdev_region(rpmsg_major, RPMSG_DEV_MAX);
-   return PTR_ERR(rpmsg_class);
+   ret = PTR_ERR(rpmsg_class);
+   goto free_region;
}
 
+   ret = register_rpmsg_driver(&rpmsg_chrdev_driver);
+   if (ret < 0) {
+   pr_err("rpmsg raw: failed to register rpmsg driver\n");
+   goto free_class;
+   }
+
+   return 0;
+
+free_class:
+   class_destroy(rpmsg_class);
+free_region:
+   unregister_chrdev_region(rpmsg_major, RPMSG_DEV_MAX);
+
return ret;
 }
 postcore_initcall(rpmsg_chrdev_init);
-- 
2.17.1

[PATCH v4 12/16] rpmsg: ctrl: introduce RPMSG_CREATE_DEV_IOCTL

2021-02-17 Thread Arnaud Pouliquen

Implement the RPMSG_CREATE_DEV_IOCTL to allow the user application to
initiate a communication through a new RPMsg channel.
This Ioctl can be used to instantiate a local RPMsg device.
Depending on the back-end implementation, a NS announcement can be sent
to the remote processor.

Suggested-by: Mathieu Poirier 
Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_ctrl.c | 21 +
 include/uapi/linux/rpmsg.h |  5 +
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_ctrl.c b/drivers/rpmsg/rpmsg_ctrl.c
index 2e43b4096aa8..78c13816bfc6 100644
--- a/drivers/rpmsg/rpmsg_ctrl.c
+++ b/drivers/rpmsg/rpmsg_ctrl.c
@@ -70,9 +70,7 @@ static long rpmsg_ctrl_ioctl(struct file *fp, unsigned int 
cmd, unsigned long ar
void __user *argp = (void __user *)arg;
struct rpmsg_endpoint_info eptinfo;
struct rpmsg_channel_info chinfo;
-
-   if (cmd != RPMSG_CREATE_EPT_IOCTL)
-   return -EINVAL;
+   struct rpmsg_device *newch;
 
if (copy_from_user(&eptinfo, argp, sizeof(eptinfo)))
return -EFAULT;
@@ -82,7 +80,22 @@ static long rpmsg_ctrl_ioctl(struct file *fp, unsigned int 
cmd, unsigned long ar
chinfo.src = eptinfo.src;
chinfo.dst = eptinfo.dst;
 
-   return rpmsg_chrdev_create_eptdev(ctrldev->rpdev, &ctrldev->dev, chinfo);
+   switch (cmd) {
+   case RPMSG_CREATE_EPT_IOCTL:
+   return rpmsg_chrdev_create_eptdev(ctrldev->rpdev, &ctrldev->dev, chinfo);
+
+   case RPMSG_CREATE_DEV_IOCTL:
+   newch = rpmsg_create_channel(ctrldev->rpdev, &chinfo);
+   if (!newch) {
+   dev_err(&ctrldev->dev, "rpmsg_create_channel failed\n");
+   return -ENXIO;
+   }
+   return 0;
+
+   default:
+   return -EINVAL;
+   }
+
 };
 
 static const struct file_operations rpmsg_ctrl_fops = {
diff --git a/include/uapi/linux/rpmsg.h b/include/uapi/linux/rpmsg.h
index f5ca8740f3fb..f9d5a74e7801 100644
--- a/include/uapi/linux/rpmsg.h
+++ b/include/uapi/linux/rpmsg.h
@@ -33,4 +33,9 @@ struct rpmsg_endpoint_info {
  */
 #define RPMSG_DESTROY_EPT_IOCTL _IO(0xb5, 0x2)
 
+/**
+ * Instantiate a rpmsg service device.
+ */
+#define RPMSG_CREATE_DEV_IOCTL _IOW(0xb5, 0x3, struct rpmsg_endpoint_info)
+
 #endif
-- 
2.17.1

[PATCH v4 11/16] rpmsg: virtio: register the rpmsg_ctrl device

2021-02-17 Thread Arnaud Pouliquen

Instantiate the rpmsg_ioctl device on virtio RPMsg bus creation.
This provides the possibility to expose the RPMSG_CREATE_EPT_IOCTL
to create RPMsg chdev endpoints.

Signed-off-by: Arnaud Pouliquen 

---
V3:
Fix compilation issue
Reported-by: kernel test robot 
Reported-by: Dan Carpenter 

warnings:
drivers/rpmsg/virtio_rpmsg_bus.c:978 rpmsg_probe() error: uninitialized symbol 
'vch'.
drivers/rpmsg/virtio_rpmsg_bus.c:979 rpmsg_probe() error: uninitialized symbol 
'rpdev_ctrl'.
---
 drivers/rpmsg/virtio_rpmsg_bus.c | 37 +++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index e87d4cf926eb..5143fdeca306 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -813,6 +813,35 @@ static void rpmsg_xmit_done(struct virtqueue *svq)
wake_up_interruptible(&vrp->sendq);
 }
 
+static struct rpmsg_device *rpmsg_virtio_add_char_dev(struct virtio_device 
*vdev)
+{
+   struct virtproc_info *vrp = vdev->priv;
+   struct virtio_rpmsg_channel *vch;
+   struct rpmsg_device *rpdev_ctrl;
+   int err = 0;
+
+   vch = kzalloc(sizeof(*vch), GFP_KERNEL);
+   if (!vch)
+   return ERR_PTR(-ENOMEM);
+
+   /* Link the channel to the vrp */
+   vch->vrp = vrp;
+
+   /* Assign public information to the rpmsg_device */
+   rpdev_ctrl = &vch->rpdev;
+   rpdev_ctrl->ops = &virtio_rpmsg_ops;
+
+   rpdev_ctrl->dev.parent = &vrp->vdev->dev;
+   rpdev_ctrl->dev.release = virtio_rpmsg_release_device;
+   rpdev_ctrl->little_endian = virtio_is_little_endian(vrp->vdev);
+
+   err = rpmsg_ctrl_register_device(rpdev_ctrl);
+   if (err)
+   return ERR_PTR(err);
+
+   return rpdev_ctrl;
+}
+
 static int rpmsg_probe(struct virtio_device *vdev)
 {
vq_callback_t *vq_cbs[] = { rpmsg_recv_done, rpmsg_xmit_done };
@@ -820,7 +849,7 @@ static int rpmsg_probe(struct virtio_device *vdev)
struct virtqueue *vqs[2];
struct virtproc_info *vrp;
struct virtio_rpmsg_channel *vch;
-   struct rpmsg_device *rpdev_ns;
+   struct rpmsg_device *rpdev_ns = NULL, *rpdev_ctrl = NULL;
void *bufs_va;
int err = 0, i;
size_t total_buf_space;
@@ -918,6 +947,11 @@ static int rpmsg_probe(struct virtio_device *vdev)
goto free_coherent;
}
 
+   rpdev_ctrl = rpmsg_virtio_add_char_dev(vdev);
+   if (IS_ERR(rpdev_ctrl)) {
+   err = PTR_ERR(rpdev_ctrl);
+   goto free_coherent;
+   }
/*
 * Prepare to kick but don't notify yet - we can't do this before
 * device is ready.
@@ -941,6 +975,7 @@ static int rpmsg_probe(struct virtio_device *vdev)
 
 free_coherent:
kfree(vch);
+   kfree(to_virtio_rpmsg_channel(rpdev_ctrl));
dma_free_coherent(vdev->dev.parent, total_buf_space,
  bufs_va, vrp->bufs_dma);
 vqs_del:
-- 
2.17.1

[PATCH v4 06/16] rpmsg: move the rpmsg control device from rpmsg_char to rpmsg_ctrl

2021-02-17 Thread Arnaud Pouliquen

Move the code related to the rpmsg_ctrl char device to the new
rpmsg_ctrl.c module.
Manage the dependency in the kconfig.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/Kconfig  |   9 ++
 drivers/rpmsg/Makefile |   1 +
 drivers/rpmsg/rpmsg_char.c | 163 
 drivers/rpmsg/rpmsg_ctrl.c | 216 +
 4 files changed, 226 insertions(+), 163 deletions(-)
 create mode 100644 drivers/rpmsg/rpmsg_ctrl.c

diff --git a/drivers/rpmsg/Kconfig b/drivers/rpmsg/Kconfig
index 0b4407abdf13..2d0cd7fdd710 100644
--- a/drivers/rpmsg/Kconfig
+++ b/drivers/rpmsg/Kconfig
@@ -10,11 +10,20 @@ config RPMSG_CHAR
tristate "RPMSG device interface"
depends on RPMSG
depends on NET
+   select RPMSG_CTRL
help
  Say Y here to export rpmsg endpoints as device files, usually found
  in /dev. They make it possible for user-space programs to send and
  receive rpmsg packets.
 
+config RPMSG_CTRL
+   tristate "RPMSG control interface"
+   depends on RPMSG
+   help
+ Say Y here to enable the support of the /dev/rpmsg_ctlX API. This API
+ allows user-space programs to create endpoints with specific service 
name,
+ source and destination addresses.
+
 config RPMSG_NS
tristate "RPMSG name service announcement"
depends on RPMSG
diff --git a/drivers/rpmsg/Makefile b/drivers/rpmsg/Makefile
index 8d452656f0ee..58e3b382e316 100644
--- a/drivers/rpmsg/Makefile
+++ b/drivers/rpmsg/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_RPMSG)+= rpmsg_core.o
 obj-$(CONFIG_RPMSG_CHAR)   += rpmsg_char.o
+obj-$(CONFIG_RPMSG_CTRL)   += rpmsg_ctrl.o
 obj-$(CONFIG_RPMSG_NS) += rpmsg_ns.o
 obj-$(CONFIG_RPMSG_MTK_SCP)+= mtk_rpmsg.o
 qcom_glink-objs:= qcom_glink_native.o qcom_glink_ssr.o
diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 23e369a00531..83c10b39b139 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -31,28 +31,12 @@
 static dev_t rpmsg_major;
 static struct class *rpmsg_class;
 
-static DEFINE_IDA(rpmsg_ctrl_ida);
 static DEFINE_IDA(rpmsg_ept_ida);
 static DEFINE_IDA(rpmsg_minor_ida);
 
 #define dev_to_eptdev(dev) container_of(dev, struct rpmsg_eptdev, dev)
 #define cdev_to_eptdev(i_cdev) container_of(i_cdev, struct rpmsg_eptdev, cdev)
 
-#define dev_to_ctrldev(dev) container_of(dev, struct rpmsg_ctrldev, dev)
-#define cdev_to_ctrldev(i_cdev) container_of(i_cdev, struct rpmsg_ctrldev, 
cdev)
-
-/**
- * struct rpmsg_ctrldev - control device for instantiating endpoint devices
- * @rpdev: underlaying rpmsg device
- * @cdev:  cdev for the ctrl device
- * @dev:   device for the ctrl device
- */
-struct rpmsg_ctrldev {
-   struct rpmsg_device *rpdev;
-   struct cdev cdev;
-   struct device dev;
-};
-
 /**
  * struct rpmsg_eptdev - endpoint device context
  * @dev:   endpoint device
@@ -411,145 +395,6 @@ int rpmsg_chrdev_create_eptdev(struct rpmsg_device 
*rpdev, struct device *parent
 }
 EXPORT_SYMBOL(rpmsg_chrdev_create_eptdev);
 
-static int rpmsg_ctrldev_open(struct inode *inode, struct file *filp)
-{
-   struct rpmsg_ctrldev *ctrldev = cdev_to_ctrldev(inode->i_cdev);
-
-   get_device(&ctrldev->dev);
-   filp->private_data = ctrldev;
-
-   return 0;
-}
-
-static int rpmsg_ctrldev_release(struct inode *inode, struct file *filp)
-{
-   struct rpmsg_ctrldev *ctrldev = cdev_to_ctrldev(inode->i_cdev);
-
-   put_device(&ctrldev->dev);
-
-   return 0;
-}
-
-static long rpmsg_ctrldev_ioctl(struct file *fp, unsigned int cmd,
-   unsigned long arg)
-{
-   struct rpmsg_ctrldev *ctrldev = fp->private_data;
-   void __user *argp = (void __user *)arg;
-   struct rpmsg_endpoint_info eptinfo;
-   struct rpmsg_channel_info chinfo;
-
-   if (cmd != RPMSG_CREATE_EPT_IOCTL)
-   return -EINVAL;
-
-   if (copy_from_user(&eptinfo, argp, sizeof(eptinfo)))
-   return -EFAULT;
-
-   memcpy(chinfo.name, eptinfo.name, RPMSG_NAME_SIZE);
-   chinfo.name[RPMSG_NAME_SIZE-1] = '\0';
-   chinfo.src = eptinfo.src;
-   chinfo.dst = eptinfo.dst;
-
-   return rpmsg_chrdev_create_eptdev(ctrldev->rpdev, &ctrldev->dev, chinfo);
-};
-
-static const struct file_operations rpmsg_ctrldev_fops = {
-   .owner = THIS_MODULE,
-   .open = rpmsg_ctrldev_open,
-   .release = rpmsg_ctrldev_release,
-   .unlocked_ioctl = rpmsg_ctrldev_ioctl,
-   .compat_ioctl = compat_ptr_ioctl,
-};
-
-static void rpmsg_ctrldev_release_device(struct device *dev)
-{
-   struct rpmsg_ctrldev *ctrldev = dev_to_ctrldev(dev);
-
-   ida_simple_remove(&rpmsg_ctrl_ida, dev->id);
-   ida_simple_remove(&rpmsg_minor_ida, MINOR(dev->devt));
-   cdev_del(&ctrldev->cdev);
-   kfree(ctrldev);
-}
-
-static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
-{
-   struct rpmsg_ctrldev *ctrldev;
-

[PATCH v4 09/16] rpmsg: smd: add sendto and trysendto ops

2021-02-17 Thread Arnaud Pouliquen

Implement the sendto ops to support the future rpmsg_char update for the
virtio backend support.
The use of sendto in rpmsg_char is needed as a destination address is
requested at least by the virtio backend.
The SMD implementation does not need a destination address so ignores it.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/qcom_smd.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/rpmsg/qcom_smd.c b/drivers/rpmsg/qcom_smd.c
index 40a1c415c775..2d279c03a090 100644
--- a/drivers/rpmsg/qcom_smd.c
+++ b/drivers/rpmsg/qcom_smd.c
@@ -974,6 +974,20 @@ static int qcom_smd_trysend(struct rpmsg_endpoint *ept, 
void *data, int len)
return __qcom_smd_send(qsept->qsch, data, len, false);
 }
 
+static int qcom_smd_sendto(struct rpmsg_endpoint *ept, void *data, int len, 
u32 dst)
+{
+   struct qcom_smd_endpoint *qsept = to_smd_endpoint(ept);
+
+   return __qcom_smd_send(qsept->qsch, data, len, true);
+}
+
+static int qcom_smd_trysendto(struct rpmsg_endpoint *ept, void *data, int len, 
u32 dst)
+{
+   struct qcom_smd_endpoint *qsept = to_smd_endpoint(ept);
+
+   return __qcom_smd_send(qsept->qsch, data, len, false);
+}
+
 static __poll_t qcom_smd_poll(struct rpmsg_endpoint *ept,
  struct file *filp, poll_table *wait)
 {
@@ -1038,7 +1052,9 @@ static const struct rpmsg_device_ops qcom_smd_device_ops 
= {
 static const struct rpmsg_endpoint_ops qcom_smd_endpoint_ops = {
.destroy_ept = qcom_smd_destroy_ept,
.send = qcom_smd_send,
+   .sendto = qcom_smd_sendto,
.trysend = qcom_smd_trysend,
+   .trysendto = qcom_smd_trysendto,
.poll = qcom_smd_poll,
 };
 
-- 
2.17.1

[PATCH v4 07/16] rpmsg: update rpmsg_chrdev_register_device function

2021-02-17 Thread Arnaud Pouliquen

As the driver is now the rpmsg_ioctl, rename the function.
In addition, initialize the rpdev addresses to RPMSG_ADDR_ANY, as they
are not defined.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/qcom_glink_native.c |  2 +-
 drivers/rpmsg/qcom_smd.c  |  2 +-
 drivers/rpmsg/rpmsg_ctrl.c|  2 +-
 drivers/rpmsg/rpmsg_internal.h| 10 ++
 4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/rpmsg/qcom_glink_native.c 
b/drivers/rpmsg/qcom_glink_native.c
index 27a05167c18c..d4e4dd482614 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -1625,7 +1625,7 @@ static int qcom_glink_create_chrdev(struct qcom_glink 
*glink)
rpdev->dev.parent = glink->dev;
rpdev->dev.release = qcom_glink_device_release;
 
-   return rpmsg_chrdev_register_device(rpdev);
+   return rpmsg_ctrl_register_device(rpdev);
 }
 
 struct qcom_glink *qcom_glink_native_probe(struct device *dev,
diff --git a/drivers/rpmsg/qcom_smd.c b/drivers/rpmsg/qcom_smd.c
index 19903de6268d..40a1c415c775 100644
--- a/drivers/rpmsg/qcom_smd.c
+++ b/drivers/rpmsg/qcom_smd.c
@@ -1097,7 +1097,7 @@ static int qcom_smd_create_chrdev(struct qcom_smd_edge 
*edge)
qsdev->rpdev.dev.parent = &edge->dev;
qsdev->rpdev.dev.release = qcom_smd_release_device;
 
-   return rpmsg_chrdev_register_device(&qsdev->rpdev);
+   return rpmsg_ctrl_register_device(&qsdev->rpdev);
 }
 
 /*
diff --git a/drivers/rpmsg/rpmsg_ctrl.c b/drivers/rpmsg/rpmsg_ctrl.c
index fa05b67d24da..2e43b4096aa8 100644
--- a/drivers/rpmsg/rpmsg_ctrl.c
+++ b/drivers/rpmsg/rpmsg_ctrl.c
@@ -180,7 +180,7 @@ static struct rpmsg_driver rpmsg_ctrl_driver = {
.probe = rpmsg_ctrl_probe,
.remove = rpmsg_ctrl_remove,
.drv = {
-   .name = "rpmsg_chrdev",
+   .name = KBUILD_MODNAME,
},
 };
 
diff --git a/drivers/rpmsg/rpmsg_internal.h b/drivers/rpmsg/rpmsg_internal.h
index a76c344253bf..7428f4465d17 100644
--- a/drivers/rpmsg/rpmsg_internal.h
+++ b/drivers/rpmsg/rpmsg_internal.h
@@ -82,16 +82,18 @@ struct rpmsg_device *rpmsg_create_channel(struct 
rpmsg_device *rpdev,
 int rpmsg_release_channel(struct rpmsg_device *rpdev,
  struct rpmsg_channel_info *chinfo);
 /**
- * rpmsg_chrdev_register_device() - register chrdev device based on rpdev
+ * rpmsg_ctrl_register_device() - register a char device for control based on 
rpdev
  * @rpdev: prepared rpdev to be used for creating endpoints
  *
  * This function wraps rpmsg_register_device() preparing the rpdev for use as
  * basis for the rpmsg chrdev.
  */
-static inline int rpmsg_chrdev_register_device(struct rpmsg_device *rpdev)
+static inline int rpmsg_ctrl_register_device(struct rpmsg_device *rpdev)
 {
-   strcpy(rpdev->id.name, "rpmsg_chrdev");
-   rpdev->driver_override = "rpmsg_chrdev";
+   strcpy(rpdev->id.name, "rpmsg_ctrl");
+   rpdev->driver_override = "rpmsg_ctrl";
+   rpdev->src = RPMSG_ADDR_ANY;
+   rpdev->dst = RPMSG_ADDR_ANY;
 
return rpmsg_register_device(rpdev);
 }
-- 
2.17.1

[PATCH v4 08/16] rpmsg: glink: add sendto and trysendto ops

2021-02-17 Thread Arnaud Pouliquen

Implement the sendto ops to support the future rpmsg_char update for the
virtio backend support.
The use of sendto in rpmsg_char is needed as a destination address is
requested at least by the virtio backend.
The glink implementation does not need a destination address so ignores it.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/qcom_glink_native.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/rpmsg/qcom_glink_native.c 
b/drivers/rpmsg/qcom_glink_native.c
index d4e4dd482614..ae2c03b59c55 100644
--- a/drivers/rpmsg/qcom_glink_native.c
+++ b/drivers/rpmsg/qcom_glink_native.c
@@ -1332,6 +1332,20 @@ static int qcom_glink_trysend(struct rpmsg_endpoint 
*ept, void *data, int len)
return __qcom_glink_send(channel, data, len, false);
 }
 
+static int qcom_glink_sendto(struct rpmsg_endpoint *ept, void *data, int len, 
u32 dst)
+{
+   struct glink_channel *channel = to_glink_channel(ept);
+
+   return __qcom_glink_send(channel, data, len, true);
+}
+
+static int qcom_glink_trysendto(struct rpmsg_endpoint *ept, void *data, int 
len, u32 dst)
+{
+   struct glink_channel *channel = to_glink_channel(ept);
+
+   return __qcom_glink_send(channel, data, len, false);
+}
+
 /*
  * Finds the device_node for the glink child interested in this channel.
  */
@@ -1364,7 +1378,9 @@ static const struct rpmsg_device_ops glink_device_ops = {
 static const struct rpmsg_endpoint_ops glink_endpoint_ops = {
.destroy_ept = qcom_glink_destroy_ept,
.send = qcom_glink_send,
+   .sendto = qcom_glink_sendto,
.trysend = qcom_glink_trysend,
+   .trysendto = qcom_glink_trysendto,
 };
 
 static void qcom_glink_rpdev_release(struct device *dev)
-- 
2.17.1

[PATCH v4 00/16] introduce a generic IOCTL interface for RPMsg channels management

2021-02-17 Thread Arnaud Pouliquen

This series restructures the RPMsg char driver to decouple the control part
and to create a generic RPMsg ioctl interface compatible with other RPMsg
services.

The V4 fixes a compilation issue reported by the kernel test robot.


The V3 is based on the guideline proposed by Mathieu Poirier to keep as much
as possible the legacy implementation of the rpmsg_char used by the GLINK and
SMD platforms.

Objectives of the series:
- Allow creating a service from a Linux user application:
  - with a specific name
  - with or without name service announcement.
- Allow probing the same service by receiving either a NS announcement from
  the remote firmware or a Linux user application request.
- Use these services independently of the RPMsg transport implementation
  (e.g. be able to use RPMsg char with the RPMsg virtio bus).

Steps in the series:
  - Extract the control part of the char dev and create the rpmsg_ctrl.c file
    (patches 1 to 5)
  - Enable the use of the chardev with the virtio backend (patches 6 to 10)
  - Introduce the RPMSG_CREATE_DEV_IOCTL IOCTL to instantiate RPMsg devices
    (patch 11)
    The application can then create or release a channel by specifying:
    - the name service of the device to instantiate.
    - the source address.
    - the destination address.
  - Instantiate the /dev/rpmsg interface on remote NS announcement
    (patches 12 to 15)

In this revision, I do not divide the series into several parts in order to
show a complete picture of the proposed evolution. To simplify the review, as
a next step, I can send it in the several steps listed above.

Known current Limitations:
- Tested only with the virtio RPMsg bus. The glink and smd driver adaptations
  have not been tested (I am not able to test them).
- For the virtio backend: no NS announcement is sent to the remote processor
  if the source address is set to RPMSG_ADDR_ANY.
- For the virtio backend: the existing RPMSG_CREATE_EPT_IOCTL is working but
  the endpoints are not attached to an existing channel.
- To limit the number of patches, the pending RPMSG_DESTROY_DEV_IOCTL has not
  been implemented. This will be proposed in a second step.

This series can be applied on git/andersson/remoteproc.git for-next branch
(d9ff3a5789cb).

This series can be tested using rpmsgexport, rpmsgcreatedev and ping tools
available here:
https://github.com/arnopo/rpmsgexport.git

Reference to the V3 discussion thread: https://lkml.org/lkml/2021/2/4/194

Arnaud Pouliquen (16):
  rpmsg: char: rename rpmsg_char_init to rpmsg_chrdev_init
  rpmsg: move RPMSG_ADDR_ANY in user API
  rpmsg: add short description of the IOCTL defined in UAPI.
  rpmsg: char: export eptdev create an destroy functions
  rpmsg: char: dissociate the control device from the rpmsg class
  rpmsg: move the rpmsg control device from rpmsg_char to rpmsg_ctrl
  rpmsg: update rpmsg_chrdev_register_device function
  rpmsg: glink: add sendto and trysendto ops
  rpmsg: smd: add sendto and trysendto ops
  rpmsg: char: use sendto to specify the message destination address
  rpmsg: virtio: register the rpmsg_ctrl device
  rpmsg: ctrl: introduce RPMSG_CREATE_DEV_IOCTL
  rpmsg: char: introduce __rpmsg_chrdev_create_eptdev function
  rpmsg: char: introduce a RPMsg driver for the RPMsg char device
  rpmsg: char: no dynamic endpoint management for the default one
  rpmsg: char: return an error if device already open

 drivers/rpmsg/Kconfig |   9 ++
 drivers/rpmsg/Makefile|   1 +
 drivers/rpmsg/qcom_glink_native.c |  18 ++-
 drivers/rpmsg/qcom_smd.c  |  18 ++-
 drivers/rpmsg/rpmsg_char.c| 237 +++---
 drivers/rpmsg/rpmsg_char.h|  51 +++
 drivers/rpmsg/rpmsg_ctrl.c| 229 +
 drivers/rpmsg/rpmsg_internal.h|  10 +-
 drivers/rpmsg/virtio_rpmsg_bus.c  |  37 -
 include/linux/rpmsg.h |   3 +-
 include/uapi/linux/rpmsg.h|  18 ++-
 11 files changed, 469 insertions(+), 162 deletions(-)
 create mode 100644 drivers/rpmsg/rpmsg_char.h
 create mode 100644 drivers/rpmsg/rpmsg_ctrl.c

-- 
2.17.1

[PATCH v4 02/16] rpmsg: move RPMSG_ADDR_ANY in user API

2021-02-17 Thread Arnaud Pouliquen

As RPMSG_ADDR_ANY is a valid src or dst address that can be set by
user applications, move its definition to the user API.

Signed-off-by: Arnaud Pouliquen 
---
 include/linux/rpmsg.h  | 3 +--
 include/uapi/linux/rpmsg.h | 2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h
index a5db828b2420..d97dcd049f18 100644
--- a/include/linux/rpmsg.h
+++ b/include/linux/rpmsg.h
@@ -18,8 +18,7 @@
 #include 
 #include 
 #include 
-
-#define RPMSG_ADDR_ANY 0xFFFFFFFF
+#include 
 
 struct rpmsg_device;
 struct rpmsg_endpoint;
diff --git a/include/uapi/linux/rpmsg.h b/include/uapi/linux/rpmsg.h
index e14c6dab4223..5e00748da319 100644
--- a/include/uapi/linux/rpmsg.h
+++ b/include/uapi/linux/rpmsg.h
@@ -9,6 +9,8 @@
 #include 
 #include 
 
+#define RPMSG_ADDR_ANY 0xFFFFFFFF
+
 /**
  * struct rpmsg_endpoint_info - endpoint info representation
  * @name: name of service
-- 
2.17.1

[PATCH v4 05/16] rpmsg: char: dissociate the control device from the rpmsg class

2021-02-17 Thread Arnaud Pouliquen

The RPMsg control device is an RPMsg device, so it is already
referenced on the RPMsg bus. Only the ept char devices need to be
referenced in the rpmsg class.
This patch prepares the split of the control and endpoint device
code into two separate files.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 78a6d19fdf82..23e369a00531 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -485,7 +485,6 @@ static int rpmsg_chrdev_probe(struct rpmsg_device *rpdev)
dev = >dev;
device_initialize(dev);
dev->parent = >dev;
-   dev->class = rpmsg_class;
 
cdev_init(>cdev, _ctrldev_fops);
ctrldev->cdev.owner = THIS_MODULE;
-- 
2.17.1

[PATCH v4 04/16] rpmsg: char: export eptdev create an destroy functions

2021-02-17 Thread Arnaud Pouliquen

To prepare for splitting the code related to the control and the endpoint
devices into separate files:
- suppress the dependency with the rpmsg_ctrldev struct,
- rename and export the functions in rpmsg_char.h.

Suggested-by: Mathieu Poirier 
Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 22 ++--
 drivers/rpmsg/rpmsg_char.h | 51 ++
 2 files changed, 66 insertions(+), 7 deletions(-)
 create mode 100644 drivers/rpmsg/rpmsg_char.h

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 9e33b53bbf56..78a6d19fdf82 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
+ * Copyright (C) 2021, STMicroelectronics
  * Copyright (c) 2016, Linaro Ltd.
  * Copyright (c) 2012, Michal Simek 
  * Copyright (c) 2012, PetaLogix
@@ -22,6 +23,7 @@
 #include 
 #include 
 
+#include "rpmsg_char.h"
 #include "rpmsg_internal.h"
 
 #define RPMSG_DEV_MAX  (MINORMASK + 1)
@@ -78,7 +80,7 @@ struct rpmsg_eptdev {
wait_queue_head_t readq;
 };
 
-static int rpmsg_eptdev_destroy(struct device *dev, void *data)
+static int rpmsg_eptdev_destroy(struct device *dev)
 {
struct rpmsg_eptdev *eptdev = dev_to_eptdev(dev);
 
@@ -277,7 +279,7 @@ static long rpmsg_eptdev_ioctl(struct file *fp, unsigned 
int cmd,
if (cmd != RPMSG_DESTROY_EPT_IOCTL)
return -EINVAL;
 
-   return rpmsg_eptdev_destroy(>dev, NULL);
+   return rpmsg_eptdev_destroy(>dev);
 }
 
 static const struct file_operations rpmsg_eptdev_fops = {
@@ -336,10 +338,15 @@ static void rpmsg_eptdev_release_device(struct device 
*dev)
kfree(eptdev);
 }
 
-static int rpmsg_eptdev_create(struct rpmsg_ctrldev *ctrldev,
+int rpmsg_chrdev_eptdev_destroy(struct device *dev, void *data)
+{
+   return rpmsg_eptdev_destroy(dev);
+}
+EXPORT_SYMBOL(rpmsg_chrdev_eptdev_destroy);
+
+int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, struct device 
*parent,
   struct rpmsg_channel_info chinfo)
 {
-   struct rpmsg_device *rpdev = ctrldev->rpdev;
struct rpmsg_eptdev *eptdev;
struct device *dev;
int ret;
@@ -359,7 +366,7 @@ static int rpmsg_eptdev_create(struct rpmsg_ctrldev 
*ctrldev,
 
device_initialize(dev);
dev->class = rpmsg_class;
-   dev->parent = >dev;
+   dev->parent = parent;
dev->groups = rpmsg_eptdev_groups;
dev_set_drvdata(dev, eptdev);
 
@@ -402,6 +409,7 @@ static int rpmsg_eptdev_create(struct rpmsg_ctrldev 
*ctrldev,
 
return ret;
 }
+EXPORT_SYMBOL(rpmsg_chrdev_create_eptdev);
 
 static int rpmsg_ctrldev_open(struct inode *inode, struct file *filp)
 {
@@ -441,7 +449,7 @@ static long rpmsg_ctrldev_ioctl(struct file *fp, unsigned 
int cmd,
chinfo.src = eptinfo.src;
chinfo.dst = eptinfo.dst;
 
-   return rpmsg_eptdev_create(ctrldev, chinfo);
+   return rpmsg_chrdev_create_eptdev(ctrldev->rpdev, >dev, 
chinfo);
 };
 
 static const struct file_operations rpmsg_ctrldev_fops = {
@@ -527,7 +535,7 @@ static void rpmsg_chrdev_remove(struct rpmsg_device *rpdev)
int ret;
 
/* Destroy all endpoints */
-   ret = device_for_each_child(>dev, NULL, rpmsg_eptdev_destroy);
+   ret = device_for_each_child(>dev, NULL, 
rpmsg_chrdev_eptdev_destroy);
if (ret)
dev_warn(>dev, "failed to nuke endpoints: %d\n", ret);
 
diff --git a/drivers/rpmsg/rpmsg_char.h b/drivers/rpmsg/rpmsg_char.h
new file mode 100644
index ..0feb3ea9445c
--- /dev/null
+++ b/drivers/rpmsg/rpmsg_char.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) STMicroelectronics 2021.
+ */
+
+#ifndef __RPMSG_CHRDEV_H__
+#define __RPMSG_CHRDEV_H__
+
+#if IS_ENABLED(CONFIG_RPMSG_CHAR)
+/**
+ * rpmsg_chrdev_create_eptdev() - register char device based on an endpoint
+ * @rpdev:  prepared rpdev to be used for creating endpoints
+ * @parent: parent device
+ * @chinfo: assiated endpoint channel information.
+ *
+ * This function create a new rpmsg char endpoint device to instantiate a new
+ * endpoint based on chinfo information.
+ */
+int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev, struct device 
*parent,
+  struct rpmsg_channel_info chinfo);
+
+/**
+ * rpmsg_chrdev_eptdev_destroy() - destroy created char device
+ * @data: parent device
+ * @chinfo: assiated endpoint channel information.
+ *
+ * This function create a new rpmsg char endpoint device to instantiate a new
+ * endpoint based on chinfo information.
+ */
+int rpmsg_chrdev_eptdev_destroy(struct device *dev, void *data);
+
+#else  /*IS_ENABLED(CONFIG_RPMSG_CHAR) */
+
+static inline int rpmsg_chrdev_create_eptdev(struct rpmsg_device *rpdev,
+struct device *parent,
+struct rpmsg_channel_info chinfo)
+{
+   return

[PATCH v4 03/16] rpmsg: add short description of the IOCTL defined in UAPI.

2021-02-17 Thread Arnaud Pouliquen

Add a description of the IOCTL and provide information on the default
value of the source and destination addresses.

Signed-off-by: Arnaud Pouliquen 
---
 include/uapi/linux/rpmsg.h | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/rpmsg.h b/include/uapi/linux/rpmsg.h
index 5e00748da319..f5ca8740f3fb 100644
--- a/include/uapi/linux/rpmsg.h
+++ b/include/uapi/linux/rpmsg.h
@@ -14,8 +14,8 @@
 /**
  * struct rpmsg_endpoint_info - endpoint info representation
  * @name: name of service
- * @src: local address
- * @dst: destination address
+ * @src: local address. To set to RPMSG_ADDR_ANY if not used.
+ * @dst: destination address. To set to RPMSG_ADDR_ANY if not used.
  */
 struct rpmsg_endpoint_info {
char name[32];
@@ -23,7 +23,14 @@ struct rpmsg_endpoint_info {
__u32 dst;
 };
 
+/**
+ * Instantiate a new rmpsg char device endpoint.
+ */
 #define RPMSG_CREATE_EPT_IOCTL _IOW(0xb5, 0x1, struct rpmsg_endpoint_info)
+
+/**
+ * Destroy a rpmsg char device endpoint created by the RPMSG_CREATE_EPT_IOCTL.
+ */
 #define RPMSG_DESTROY_EPT_IOCTL_IO(0xb5, 0x2)
 
 #endif
-- 
2.17.1
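
As an illustration of the kerneldoc added above, here is a minimal user-space
sketch of how an application might fill struct rpmsg_endpoint_info before
issuing RPMSG_CREATE_EPT_IOCTL. The helper name fill_ept_info() is
hypothetical, and the structure and RPMSG_ADDR_ANY are mirrored locally only
so that the sketch builds without kernel headers; real code would include
<linux/rpmsg.h> instead.

```c
#include <stdint.h>
#include <string.h>

/*
 * Local mirror of the UAPI definitions (assumption: used here only so the
 * sketch builds standalone; real code should include <linux/rpmsg.h>).
 */
#define RPMSG_ADDR_ANY 0xFFFFFFFF

struct rpmsg_endpoint_info {
	char name[32];
	uint32_t src;
	uint32_t dst;
};

/* Hypothetical helper: prepare an endpoint request with no fixed address. */
static void fill_ept_info(struct rpmsg_endpoint_info *info, const char *name)
{
	memset(info, 0, sizeof(*info));
	/* Leave room for the terminating NUL already written by memset(). */
	strncpy(info->name, name, sizeof(info->name) - 1);
	/* Both addresses are unused, so set them to RPMSG_ADDR_ANY. */
	info->src = RPMSG_ADDR_ANY;
	info->dst = RPMSG_ADDR_ANY;
}
```

The filled structure would then be passed as the third argument of
ioctl(fd, RPMSG_CREATE_EPT_IOCTL, &info) on an open control device.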

[PATCH v4 01/16] rpmsg: char: rename rpmsg_char_init to rpmsg_chrdev_init

2021-02-17 Thread Arnaud Pouliquen

To be consistent with the other functions, which are prefixed by
rpmsg_chrdev, rename the rpmsg_char_init function.

Signed-off-by: Arnaud Pouliquen 
---
 drivers/rpmsg/rpmsg_char.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/rpmsg/rpmsg_char.c b/drivers/rpmsg/rpmsg_char.c
index 4bbbacdbf3bb..9e33b53bbf56 100644
--- a/drivers/rpmsg/rpmsg_char.c
+++ b/drivers/rpmsg/rpmsg_char.c
@@ -543,7 +543,7 @@ static struct rpmsg_driver rpmsg_chrdev_driver = {
},
 };
 
-static int rpmsg_char_init(void)
+static int rpmsg_chrdev_init(void)
 {
int ret;
 
@@ -569,7 +569,7 @@ static int rpmsg_char_init(void)
 
return ret;
 }
-postcore_initcall(rpmsg_char_init);
+postcore_initcall(rpmsg_chrdev_init);
 
 static void rpmsg_chrdev_exit(void)
 {
-- 
2.17.1

Re: [PATCH 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-17 Thread Michal Hocko

On Wed 17-02-21 11:08:15, Oscar Salvador wrote:
> Free hugetlb pages are tricky to handle so as to no userspace application
> notices disruption, we need to replace the current free hugepage with
> a new one.
> 
> In order to do that, a new function called alloc_and_dissolve_huge_page
> is introduced.
> This function will first try to get a new fresh hugetlb page, and if it
> succeeds, it will dissolve the old one.
> 
> With regard to the allocation, since we do not know whether the old page
> was allocated on a specific node on request, the node the old page belongs
> to will be tried first, and then we will fallback to all nodes containing
> memory (N_MEMORY).

I do not think fallback to a different zone is ok. If yes then this
really requires a very good reasoning. alloc_contig_range is an
optimistic allocation interface at best and it shouldn't break carefully
node aware preallocation done by administrator.

> Note that gigantic hugetlb pages are fenced off since there is a cyclic
> dependency between them and alloc_contig_range.

Why do we need/want to do all this in the first place?
-- 
Michal Hocko
SUSE Labs

Re: [PATCH] opp: fix dev_pm_opp_set_rate for different frequency at the same opp level

2021-02-17 Thread Jonathan Marek


On 2/16/21 11:53 PM, Viresh Kumar wrote:

On 16-02-21, 15:10, Jonathan Marek wrote:

There is not "nothing to do" when the opp is the same. The frequency can
be different from opp->rate.


I am sorry but I am not sure what are you trying to fix here and what exactly is
broken here. Can you provide a usecase for your platform where this doesn't work
like it used to ?



The specific case is this opp table:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/qcom/sm8250.dtsi#n439

It does not define every possible clock frequency; it only defines the
rates at which a higher rpmhpd level must be used, which is the intended
use of opp.


Your change broke this completely: the clock rate change can be silently 
ignored because the opp level is the same. In particular it breaks 
bluetooth for this platform.
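
To make the failure mode concrete, here is a minimal sketch of the condition
being argued for: the "nothing to do" shortcut is only safe when both the OPP
and the requested frequency are unchanged. The structure and function names
are hypothetical and only model the idea; this is not the code in
drivers/opp/core.c.

```c
#include <stdbool.h>

/* Hypothetical model of the state tracked for the last programmed OPP. */
struct opp_state_model {
	const void *current_opp;	/* OPP entry last selected */
	unsigned long current_freq;	/* clock rate last programmed, in Hz */
};

/*
 * Return true only if the request matches what is already programmed.
 * Comparing the OPP alone is not enough: several clock rates can map to
 * the same OPP when the table only lists the rates at which a higher
 * performance level is needed.
 */
static bool can_skip_rate_change(const struct opp_state_model *s,
				 const void *new_opp, unsigned long new_freq)
{
	return s->current_opp == new_opp && s->current_freq == new_freq;
}
```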



Fixes: 81c4d8a3c414 ("opp: Keep track of currently programmed OPP")
Signed-off-by: Jonathan Marek 
---
  drivers/opp/core.c | 7 +--
  drivers/opp/opp.h  | 1 +
  2 files changed, 6 insertions(+), 2 deletions(-)

[PATCH] staging: gasket Fix comparision with Null

2021-02-17 Thread mayanksuman

From: Mayank Suman 

The change was suggested by checkpatch.pl.

Signed-off-by: Mayank Suman 
---
 drivers/staging/gasket/gasket_sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/gasket/gasket_sysfs.c 
b/drivers/staging/gasket/gasket_sysfs.c
index af26bc9f1..c5658fdf4 100644
--- a/drivers/staging/gasket/gasket_sysfs.c
+++ b/drivers/staging/gasket/gasket_sysfs.c
@@ -228,7 +228,7 @@ int gasket_sysfs_create_entries(struct device *device,
}
 
mutex_lock(>mutex);
-   for (i = 0; attrs[i].attr.attr.name != NULL; i++) {
+   for (i = 0; attrs[i].attr.attr.name; i++) {
if (mapping->attribute_count == GASKET_SYSFS_MAX_NODES) {
dev_err(device,
"Maximum number of sysfs nodes reached for 
device\n");
-- 
2.30.0

Re: [PATCH v2 8/8] xen/evtchn: use READ/WRITE_ONCE() for accessing ring indices

2021-02-17 Thread Ross Lagerwall

On 2021-02-11 10:16, Juergen Gross wrote:
> For avoiding read- and write-tearing by the compiler use READ_ONCE()
> and WRITE_ONCE() for accessing the ring indices in evtchn.c.
> 
> Signed-off-by: Juergen Gross 
> ---
> V2:
> - modify all accesses (Julien Grall)
> ---
>  drivers/xen/evtchn.c | 25 -
>  1 file changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> index 421382c73d88..620008f89dbe 100644
> --- a/drivers/xen/evtchn.c
> +++ b/drivers/xen/evtchn.c
> @@ -162,6 +162,7 @@ static irqreturn_t evtchn_interrupt(int irq, void *data)
>  {
>   struct user_evtchn *evtchn = data;
>   struct per_user_data *u = evtchn->user;
> + unsigned int prod, cons;
>  
>   WARN(!evtchn->enabled,
>"Interrupt for port %u, but apparently not enabled; per-user %p\n",
> @@ -171,10 +172,14 @@ static irqreturn_t evtchn_interrupt(int irq, void *data)
>  
>   spin_lock(>ring_prod_lock);
>  
> - if ((u->ring_prod - u->ring_cons) < u->ring_size) {
> - *evtchn_ring_entry(u, u->ring_prod) = evtchn->port;
> + prod = READ_ONCE(u->ring_prod);
> + cons = READ_ONCE(u->ring_cons);
> +
> + if ((prod - cons) < u->ring_size) {
> + *evtchn_ring_entry(u, prod) = evtchn->port;
>   smp_wmb(); /* Ensure ring contents visible */
> - if (u->ring_cons == u->ring_prod++) {
> + if (cons == prod++) {
> + WRITE_ONCE(u->ring_prod, prod);
>   wake_up_interruptible(>evtchn_wait);
>   kill_fasync(>evtchn_async_queue,
>   SIGIO, POLL_IN);

This doesn't work correctly since now u->ring_prod is only updated if
cons == prod++.

Ross
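
The point can be sketched with a simplified, single-threaded model of the
producer side: the new producer index has to be stored on every successful
enqueue, and only the wake-up depends on the ring having been empty. The
names ring_model and ring_push are hypothetical, and the locking, barriers
and READ_ONCE()/WRITE_ONCE() accessors of the real driver are deliberately
left out.

```c
#include <stdbool.h>

#define RING_SIZE 4u	/* must be a power of two */

/* Hypothetical, simplified stand-in for the per-user ring state. */
struct ring_model {
	unsigned int prod;		/* free-running producer index */
	unsigned int cons;		/* free-running consumer index */
	unsigned int slots[RING_SIZE];
	unsigned int wakeups;		/* counts reader notifications */
};

static bool ring_push(struct ring_model *r, unsigned int port)
{
	unsigned int prod = r->prod;
	unsigned int cons = r->cons;

	if (prod - cons >= RING_SIZE)
		return false;		/* ring is full, drop the event */

	r->slots[prod & (RING_SIZE - 1)] = port;

	/* Publish the new producer index unconditionally ... */
	r->prod = prod + 1;

	/* ... and notify the reader only if the ring was empty before. */
	if (cons == prod)
		r->wakeups++;

	return true;
}
```

If the store to prod were moved inside the final if, a second push without an
intervening read would overwrite the first slot, since the index would not
have advanced.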

Re: [RFT][PATCH v1] cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known

2021-02-17 Thread Michael Larabel


On 2/15/21 7:49 PM, Michael Larabel wrote:


On 2/15/21 1:24 PM, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Commit 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover
boost frequencies") attempted to address a performance issue involving
acpi-cpufreq, the schedutil governor and scale-invariance on x86 by
extending the frequency tables created by acpi-cpufreq to cover the
entire range of "turbo" (or "boost") frequencies, but that caused
frequencies reported via /proc/cpuinfo and the scaling_cur_freq
attribute in sysfs to change which may confuse users and monitoring
tools.

For this reason, revert the part of commit 3c55e94c0ade adding the
extra entry to the frequency table and use the observation that
in principle cpuinfo.max_freq need not be equal to the maximum
frequency listed in the frequency table for the given policy.

Namely, modify cpufreq_frequency_table_cpuinfo() to allow cpufreq
drivers to set their own cpuinfo.max_freq above that frequency and
change  acpi-cpufreq to set cpuinfo.max_freq to the maximum boost
frequency found via CPPC.

This should be sufficient to let all of the cpufreq subsystem know
the real maximum frequency of the CPU without changing frequency
reporting.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=211305
Fixes: 3c55e94c0ade ("cpufreq: ACPI: Extend frequency tables to cover 
boost frequencies")

Reported-by: Matt McDonald 
Signed-off-by: Rafael J. Wysocki 
---

Michael, Giovanni,

The fix for the EPYC performance regression that was merged into 5.11 
introduced

an undesirable side-effect by distorting the CPU frequency reporting via
/proc/cpuinfo and scaling_cur_freq (see the BZ link above for details).

The patch below is reported to address this problem and it should 
still allow

schedutil to achieve desirable performance, because it simply sets
cpuinfo.max_freq without extending the frequency table of the CPU.

Please test this one and let me know if it adversely affects 
performance.


Thanks!



When carrying out tests so far today on an EPYC 7F72 2P and Ryzen 9 
5900X with workloads seeing impact from the prior patches, everything 
is looking good when comparing v5.11 to v5.11 + this patch. Not seeing 
any measurable difference on either of those systems as a result of 
this patch.


Running some additional tests and on a few more boxes that should wrap 
up tomorrow but at least so far the patch isn't showing any measurable 
changes to performance.


Michael



Linux 5.11 + this patch is still looking fine on the mix of EPYC (Zen 2) 
and Ryzen (Zen 2/3) systems tried with the previous workloads. Not 
seeing any measurable change in performance from this new patch, so it's 
looking fine on my end.


Michael

Tested-by: Michael Larabel 
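
The idea described in the patch text, that cpuinfo.max_freq need not equal
the highest entry of the frequency table, can be sketched as follows. This is
a hypothetical standalone model (freq_entry_model and resolve_max_freq are
made-up names), not the actual cpufreq_frequency_table_cpuinfo() code.

```c
/* Hypothetical table entry; a frequency of 0 terminates the table. */
struct freq_entry_model {
	unsigned int frequency;	/* in kHz */
};

/*
 * Compute the maximum frequency to report: the highest table entry,
 * unless the driver already supplied a larger value (for example a
 * boost frequency that is not listed in the table).
 */
static unsigned int resolve_max_freq(const struct freq_entry_model *table,
				     unsigned int driver_max)
{
	unsigned int max = 0;

	for (; table->frequency; table++)
		if (table->frequency > max)
			max = table->frequency;

	return driver_max > max ? driver_max : max;
}
```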






---
  drivers/cpufreq/acpi-cpufreq.c |   62 ++---
  drivers/cpufreq/freq_table.c   |    8 -
  2 files changed, 23 insertions(+), 47 deletions(-)

Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c
===
--- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c
+++ linux-pm/drivers/cpufreq/acpi-cpufreq.c
@@ -54,7 +54,6 @@ struct acpi_cpufreq_data {
  unsigned int resume;
  unsigned int cpu_feature;
  unsigned int acpi_perf_cpu;
-    unsigned int first_perf_state;
  cpumask_var_t freqdomain_cpus;
  void (*cpu_freq_write)(struct acpi_pct_register *reg, u32 val);
  u32 (*cpu_freq_read)(struct acpi_pct_register *reg);
@@ -223,10 +222,10 @@ static unsigned extract_msr(struct cpufr
    perf = to_perf_data(data);
  -    cpufreq_for_each_entry(pos, policy->freq_table + 
data->first_perf_state)

+    cpufreq_for_each_entry(pos, policy->freq_table)
  if (msr == perf->states[pos->driver_data].status)
  return pos->frequency;
-    return policy->freq_table[data->first_perf_state].frequency;
+    return policy->freq_table[0].frequency;
  }
    static unsigned extract_freq(struct cpufreq_policy *policy, u32 val)
@@ -365,7 +364,6 @@ static unsigned int get_cur_freq_on_cpu(
  struct cpufreq_policy *policy;
  unsigned int freq;
  unsigned int cached_freq;
-    unsigned int state;
    pr_debug("%s (%d)\n", __func__, cpu);
  @@ -377,11 +375,7 @@ static unsigned int get_cur_freq_on_cpu(
  if (unlikely(!data || !policy->freq_table))
  return 0;
  -    state = to_perf_data(data)->state;
-    if (state < data->first_perf_state)
-    state = data->first_perf_state;
-
-    cached_freq = policy->freq_table[state].frequency;
+    cached_freq = 
policy->freq_table[to_perf_data(data)->state].frequency;

  freq = extract_freq(policy, get_cur_val(cpumask_of(cpu), data));
  if (freq != cached_freq) {
  /*
@@ -680,7 +674,6 @@ static int acpi_cpufreq_cpu_init(struct
  struct cpuinfo_x86 *c = _data(cpu);
  unsigned int valid_states = 0;
  unsigned int result = 0;
-    unsigned int state_count;
  u64 max_boost_ratio;
  unsigned int i;

Re: [PATCH RFC 1/4] firmware: Add the support for ZSTD-compressed firmware files

2021-02-17 Thread Luis Chamberlain

On Wed, Jan 27, 2021 at 04:49:36PM +0100, Takashi Iwai wrote:
> Due to the popular demands on ZSTD, here is a patch to add a support
> of ZSTD-compressed firmware files via the direct firmware loader.
> It's just like XZ-compressed file support, providing a decompressor
> with ZSTD.  Since ZSTD API can give the decompression size beforehand,
> the code is even simpler than XZ.
> 
> Signed-off-by: Takashi Iwai 

It also occurs to me that having a simple define like
#define HAVE_FIRMWARE_COMPRESS_ZSTD
in include/linux/firmware.h would let userspace (if they have kernel
sources) determine whether the kernel supports this format.

  Luis

Re: [PATCH RFC 1/4] firmware: Add the support for ZSTD-compressed firmware files

2021-02-17 Thread Takashi Iwai

On Wed, 17 Feb 2021 14:16:44 +0100,
Luis Chamberlain wrote:
> 
> On Wed, Jan 27, 2021 at 04:49:36PM +0100, Takashi Iwai wrote:
> > Due to the popular demands on ZSTD, here is a patch to add a support
> > of ZSTD-compressed firmware files via the direct firmware loader.
> > It's just like XZ-compressed file support, providing a decompressor
> > with ZSTD.  Since ZSTD API can give the decompression size beforehand,
> > the code is even simpler than XZ.
> > 
> > Signed-off-by: Takashi Iwai 
> > ---
> >  drivers/base/firmware_loader/Kconfig | 21 ++--
> >  drivers/base/firmware_loader/main.c  | 74 ++--
> >  2 files changed, 87 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/base/firmware_loader/Kconfig 
> > b/drivers/base/firmware_loader/Kconfig
> > index 5b24f3959255..f5307978927c 100644
> > --- a/drivers/base/firmware_loader/Kconfig
> > +++ b/drivers/base/firmware_loader/Kconfig
> > @@ -157,17 +157,28 @@ config FW_LOADER_USER_HELPER_FALLBACK
> >  
> >  config FW_LOADER_COMPRESS
> > bool "Enable compressed firmware support"
> > -   select FW_LOADER_PAGED_BUF
> > -   select XZ_DEC
> > help
> >   This option enables the support for loading compressed firmware
> >   files. The caller of firmware API receives the decompressed file
> >   content. The compressed file is loaded as a fallback, only after
> >   loading the raw file failed at first.
> >  
> > - Currently only XZ-compressed files are supported, and they have to
> > - be compressed with either none or crc32 integrity check type (pass
> > - "-C crc32" option to xz command).
> > +if FW_LOADER_COMPRESS
> > +config FW_LOADER_COMPRESS_XZ
> > +   bool "Enable XZ-compressed firmware support"
> > +   select FW_LOADER_PAGED_BUF
> > +   select XZ_DEC
> > +   help
> > + This option adds the support for XZ-compressed files.
> > + The files have to be compressed with either none or crc32
> > + integrity check type (pass "-C crc32" option to xz command).
> > +
> > +config FW_LOADER_COMPRESS_ZSTD
> > +   bool "Enable ZSTD-compressed firmware support"
> > +   select ZSTD_DECOMPRESS
> > +   help
> > + This option adds the support for ZSTD-compressed files.
> > +endif # FW_LOADER_COMPRESS
> >  
> >  config FW_CACHE
> > bool "Enable firmware caching during suspend"
> > diff --git a/drivers/base/firmware_loader/main.c 
> > b/drivers/base/firmware_loader/main.c
> > index 78355095e00d..71332ed4959d 100644
> > --- a/drivers/base/firmware_loader/main.c
> > +++ b/drivers/base/firmware_loader/main.c
> > @@ -34,6 +34,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  
> >  #include 
> > @@ -362,10 +363,72 @@ int fw_map_paged_buf(struct fw_priv *fw_priv)
> >  }
> >  #endif
> >  
> > +/*
> > + * ZSTD-compressed firmware support
> > + */
> > +#ifdef CONFIG_FW_LOADER_COMPRESS_ZSTD
> > +static int fw_decompress_zstd(struct device *dev, struct fw_priv *fw_priv,
> > + size_t in_size, const void *in_buffer)
> > +{
> > +   size_t len, out_size, workspace_size;
> > +   void *workspace, *out_buf;
> > +   ZSTD_DCtx *ctx;
> > +   int err;
> > +
> > +   if (fw_priv->data) {
> > +   out_size = fw_priv->allocated_size;
> > +   out_buf = fw_priv->data;
> > +   } else {
> > +   out_size = ZSTD_findDecompressedSize(in_buffer, in_size);
> > +   if (out_size == ZSTD_CONTENTSIZE_UNKNOWN ||
> > +   out_size == ZSTD_CONTENTSIZE_ERROR) {
> > +   dev_dbg(dev, "%s: invalid decompression size\n", 
> > __func__);
> > +   return -EINVAL;
> > +   }
> > +   out_buf = vzalloc(out_size);
> > +   if (!out_buf)
> > +   return -ENOMEM;
> > +   }
> > +
> > +   workspace_size = ZSTD_DCtxWorkspaceBound();
> > +   workspace = kvzalloc(workspace_size, GFP_KERNEL);
> > +   if (!workspace) {
> > +   err = -ENOMEM;
> > +   goto error;
> > +   }
> > +
> > +   ctx = ZSTD_initDCtx(workspace, workspace_size);
> > +   if (!ctx) {
> > +   dev_dbg(dev, "%s: failed to initialize context\n", __func__);
> > +   err = -EINVAL;
> > +   goto error;
> > +   }
> > +
> > +   len = ZSTD_decompressDCtx(ctx, out_buf, out_size, in_buffer, in_size);
> > +   if (ZSTD_isError(len)) {
> > +   dev_dbg(dev, "%s: failed to decompress: %d\n", __func__,
> > +   ZSTD_getErrorCode(len));
> > +   err = -EINVAL;
> > +   goto error;
> > +   }
> > +
> > +   fw_priv->size = len;
> > +   if (!fw_priv->data)
> > +   fw_priv->data = out_buf;
> > +   err = 0;
> > +
> > + error:
> > +   kvfree(workspace);
> > +   if (!fw_priv->data)
> > +   vfree(out_buf);
> > +   return err;
> > +}
> > +#endif /* CONFIG_FW_LOADER_COMPRESS_ZSTD */
> > +
> >  /*
> >   * XZ-compressed firmware support
> >   */
> > -#ifdef CONFIG_FW_LOADER_COMPRESS
> > +#ifdef CONFIG_FW_LOADER_COMPRESS_XZ
> >  /* show an error and return the

Re: [PATCH 06/16] media: i2c: rdacm20: Re-work ov10635 reset

2021-02-17 Thread Kieran Bingham

On 16/02/2021 17:41, Jacopo Mondi wrote:
> The OV10635 image sensor embedded in the camera module is currently
> reset after the MAX9271 initialization with two long delays that were
> most probably not correctly characterized.
> 
> Re-work the image sensor reset procedure by holding the chip in reset
> during the MAX9271 configuration, removing the long sleep delays and
> only wait after the chip exits from reset for 350-500 microseconds
> interval, which is larger than the minimum (2048 * (1 / XVCLK)) timeout
> characterized in the chip manual.

Holding the OV10635 in reset earlier sounds good to me, but I don't know
beyond that what implications there would be. If it's working better
that's good though.

Reviewed-by: Kieran Bingham 


> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm20.c | 25 +++--
>  1 file changed, 11 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/media/i2c/rdacm20.c b/drivers/media/i2c/rdacm20.c
> index e982373908f2..ea30cc936531 100644
> --- a/drivers/media/i2c/rdacm20.c
> +++ b/drivers/media/i2c/rdacm20.c
> @@ -477,6 +477,15 @@ static int rdacm20_initialize(struct rdacm20_device *dev)
>   if (ret)
>   return ret;
>  
> + /* Hold OV10635 in reset during max9271 configuration. */
> + ret = max9271_enable_gpios(>serializer, MAX9271_GPIO1OUT);
> + if (ret)
> + return ret;
> +
> + ret = max9271_clear_gpios(>serializer, MAX9271_GPIO1OUT);
> + if (ret)
> + return ret;
> +
>   ret = max9271_configure_gmsl_link(>serializer);
>   if (ret)
>   return ret;
> @@ -490,23 +499,11 @@ static int rdacm20_initialize(struct rdacm20_device 
> *dev)
>   return ret;
>   dev->serializer.client->addr = dev->addrs[0];
>  
> - /*
> -  * Reset the sensor by cycling the OV10635 reset signal connected to the
> -  * MAX9271 GPIO1 and verify communication with the OV10635.
> -  */
> - ret = max9271_enable_gpios(>serializer, MAX9271_GPIO1OUT);
> - if (ret)
> - return ret;
> -
> - ret = max9271_clear_gpios(>serializer, MAX9271_GPIO1OUT);
> - if (ret)
> - return ret;
> - usleep_range(1, 15000);
> -
> + /* Release ov10635 from reset and initialize it. */
>   ret = max9271_set_gpios(>serializer, MAX9271_GPIO1OUT);
>   if (ret)
>   return ret;
> - usleep_range(1, 15000);
> + usleep_range(350, 500);
>  
>   for (i = 0; i < OV10635_PID_TIMEOUT; ++i) {
>   ret = ov10635_read16(dev, OV10635_PID);
>

Re: [PATCH V0 1/6] dt-bindings: Added the yaml bindings for DCC

2021-02-17 Thread schowdhu


On 2021-02-17 16:32, Vinod Koul wrote:

On 17-02-21, 12:18, Souradeep Chowdhury wrote:

Documentation for Data Capture and Compare(DCC) device tree bindings
in yaml format.

Signed-off-by: Souradeep Chowdhury 
---
 .../devicetree/bindings/arm/msm/qcom,dcc.yaml  | 49 ++
 1 file changed, 49 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/msm/qcom,dcc.yaml


diff --git a/Documentation/devicetree/bindings/arm/msm/qcom,dcc.yaml 
b/Documentation/devicetree/bindings/arm/msm/qcom,dcc.yaml

new file mode 100644
index 000..8f09578
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/msm/qcom,dcc.yaml
@@ -0,0 +1,49 @@
+# SPDX-License-Identifier: (GPL-2.0-or-later OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/arm/msm/qcom,dcc.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Data Capture and Compare
+
+maintainers:
+  - Souradeep Chowdhury 
+
+description: |
+DCC (Data Capture and Compare) is a DMA engine which is used to save
+configuration data or system memory contents during catastrophic failure
+or SW trigger.DCC is used to capture and store data for debugging purpose


space after .


Ack




+
+
+properties:
+  compatible:
+items:
+- enum:
+  - qcom,sm8150-dcc
+- const: qcom,dcc
+
+  reg:
+items:
+  - description: DCC base register region
+  - description: DCC RAM base register region
+
+  reg-names:
+items:
+  - const: dcc-base
+  - const: dcc-ram-base


drop dcc from names


Since DCC has a dedicated SRAM, this has been named like this as
only base and ram-base are generic names. Let me know if this is
still required to be changed.




+
+required:
+  - compatible
+  - reg
+  - reg-names
+
+additionalProperties: false
+
+examples:
+  - |
+dcc@010a2000{
+compatible = "qcom,sm8150-dcc";


should this not be:
compatible = "qcom,sm8150-dcc", "qcom,dcc";


Ack




+reg = <0 0x010a2000 0  0x1000>,
+  <0 0x010ae000 0  0x2000>;
+reg-names = "dcc-base", "dcc-ram-base";
+};
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

[tip: sched/core] sched/fair: Merge select_idle_core/cpu()

2021-02-17 Thread tip-bot2 for Mel Gorman

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 9fe1f127b913318c631d0041ecf71486e38c2c2d
Gitweb:
https://git.kernel.org/tip/9fe1f127b913318c631d0041ecf71486e38c2c2d
Author:Mel Gorman 
AuthorDate:Wed, 27 Jan 2021 13:52:03 
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:07:25 +01:00

sched/fair: Merge select_idle_core/cpu()

Both select_idle_core() and select_idle_cpu() do a loop over the same
cpumask. Observe that by clearing the already visited CPUs, we can
fold the iteration and iterate a core at a time.

All we need to do is remember any non-idle CPU we encountered while
scanning for an idle core. This way we'll only iterate every CPU once.

Signed-off-by: Mel Gorman 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Reviewed-by: Vincent Guittot 
Link: 
https://lkml.kernel.org/r/20210127135203.19633-5-mgor...@techsingularity.net
---
 kernel/sched/fair.c |  99 +--
 1 file changed, 59 insertions(+), 40 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a0fc8a..c73d588 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6019,6 +6019,14 @@ static inline int find_idlest_cpu(struct sched_domain 
*sd, struct task_struct *p
return new_cpu;
 }
 
+static inline int __select_idle_cpu(int cpu)
+{
+   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+   return cpu;
+
+   return -1;
+}
+
 #ifdef CONFIG_SCHED_SMT
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
@@ -6077,48 +6085,51 @@ unlock:
  * there are no idle cores left in the system; tracked through
  * sd_llc->shared->has_idle_cores and enabled through update_idle_core() above.
  */
-static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
-   struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
-   int core, cpu;
+   bool idle = true;
+   int cpu;
 
if (!static_branch_likely(&sched_smt_present))
-   return -1;
-
-   if (!test_idle_cores(target, false))
-   return -1;
-
-   cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+   return __select_idle_cpu(core);
 
-   for_each_cpu_wrap(core, cpus, target) {
-   bool idle = true;
-
-   for_each_cpu(cpu, cpu_smt_mask(core)) {
-   if (!available_idle_cpu(cpu)) {
-   idle = false;
-   break;
+   for_each_cpu(cpu, cpu_smt_mask(core)) {
+   if (!available_idle_cpu(cpu)) {
+   idle = false;
+   if (*idle_cpu == -1) {
+   if (sched_idle_cpu(cpu) && cpumask_test_cpu(cpu, p->cpus_ptr)) {
+   *idle_cpu = cpu;
+   break;
+   }
+   continue;
}
+   break;
}
-
-   if (idle)
-   return core;
-
-   cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
+   if (*idle_cpu == -1 && cpumask_test_cpu(cpu, p->cpus_ptr))
+   *idle_cpu = cpu;
}
 
-   /*
-* Failed to find an idle core; stop looking for one.
-*/
-   set_idle_cores(target, 0);
+   if (idle)
+   return core;
 
+   cpumask_andnot(cpus, cpus, cpu_smt_mask(core));
return -1;
 }
 
 #else /* CONFIG_SCHED_SMT */
 
-static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
+static inline void set_idle_cores(int cpu, int val)
 {
-   return -1;
+}
+
+static inline bool test_idle_cores(int cpu, bool def)
+{
+   return def;
+}
+
+static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
+{
+   return __select_idle_cpu(core);
 }
 
 #endif /* CONFIG_SCHED_SMT */
@@ -6131,10 +6142,11 @@ static inline int select_idle_core(struct task_struct 
*p, struct sched_domain *s
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
+   int i, cpu, idle_cpu = -1, nr = INT_MAX;
+   bool smt = test_idle_cores(target, false);
+   int this = smp_processor_id();
struct sched_domain *this_sd;
u64 time;
-   int this = smp_processor_id();
-   int cpu, nr = INT_MAX;
 
this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
if (!this_sd)
@@ -6142,7 +6154,7 @@ static int select_idle_cpu(struct task_struct *p, struct 
sched_domain *sd, int t
 
cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
 
-   if
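
A minimal userspace sketch of the folded scan described in the commit message above, assuming a hypothetical 8-CPU machine where CPUs 2n and 2n+1 are SMT siblings of one core. The names smt_mask(), scan_core() and pick_cpu() are illustrative and are not kernel functions. The real code also starts the walk at the target CPU, limits the scan depth and treats SCHED_IDLE CPUs as idle, all of which is left out here.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical topology: CPUs 2n and 2n+1 are SMT siblings of one core. */
static unsigned int smt_mask(int cpu)
{
	return 3u << (cpu & ~1);
}

/*
 * Check one core. Returns the core if every sibling is idle, else -1.
 * Remembers the first allowed idle sibling in *idle_cpu and clears the
 * core's siblings from *cpus so the caller never visits them again.
 */
static int scan_core(const bool *idle, int core, unsigned int *cpus, int *idle_cpu)
{
	bool all_idle = true;
	int cpu;

	assert(core >= 0 && core < NR_CPUS);
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(smt_mask(core) & (1u << cpu)))
			continue;
		if (!idle[cpu])
			all_idle = false;
		else if (*idle_cpu == -1 && (*cpus & (1u << cpu)))
			*idle_cpu = cpu;
	}
	if (all_idle)
		return core;

	*cpus &= ~smt_mask(core);
	return -1;
}

/* One pass over the allowed mask: prefer a fully idle core, else any idle CPU. */
static int pick_cpu(const bool *idle, unsigned int allowed)
{
	unsigned int cpus = allowed;
	int idle_cpu = -1;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(cpus & (1u << cpu)))
			continue;
		if (scan_core(idle, cpu, &cpus, &idle_cpu) >= 0)
			return cpu;
	}
	return idle_cpu;
}
```

Because each visited core is removed from the working mask, every CPU is inspected at most once, and the fallback idle CPU comes for free from the same walk.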

[tip: sched/core] rbtree, perf: Use new rbtree helpers

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: a3b89864554bbce1594b7abdb5739fc708c1ca95
Gitweb:
https://git.kernel.org/tip/a3b89864554bbce1594b7abdb5739fc708c1ca95
Author:Peter Zijlstra 
AuthorDate:Wed, 29 Apr 2020 17:05:15 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:07:48 +01:00

rbtree, perf: Use new rbtree helpers

Reduce rbtree boilerplate by using the new helpers.

One noteworthy change is unification of the various (partial) compare
functions. We construct a subtree match by forcing the sub-order to
always match, see __group_cmp().

Due to 'const' we had to touch cgroup_id().

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Acked-by: Tejun Heo 
Acked-by: Davidlohr Bueso 
---
 include/linux/cgroup.h |   4 +-
 kernel/events/core.c   | 195 ++--
 2 files changed, 92 insertions(+), 107 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 451c2d2..4f2f79d 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -307,7 +307,7 @@ void css_task_iter_end(struct css_task_iter *it);
  * Inline functions.
  */
 
-static inline u64 cgroup_id(struct cgroup *cgrp)
+static inline u64 cgroup_id(const struct cgroup *cgrp)
 {
return cgrp->kn->id;
 }
@@ -701,7 +701,7 @@ void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t 
buflen);
 struct cgroup_subsys_state;
 struct cgroup;
 
-static inline u64 cgroup_id(struct cgroup *cgrp) { return 1; }
+static inline u64 cgroup_id(const struct cgroup *cgrp) { return 1; }
 static inline void css_get(struct cgroup_subsys_state *css) {}
 static inline void css_put(struct cgroup_subsys_state *css) {}
 static inline int cgroup_attach_task_all(struct task_struct *from,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 55d1879..3d89096 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1595,50 +1595,91 @@ static void perf_event_groups_init(struct 
perf_event_groups *groups)
groups->index = 0;
 }
 
+static inline struct cgroup *event_cgroup(const struct perf_event *event)
+{
+   struct cgroup *cgroup = NULL;
+
+#ifdef CONFIG_CGROUP_PERF
+   if (event->cgrp)
+   cgroup = event->cgrp->css.cgroup;
+#endif
+
+   return cgroup;
+}
+
 /*
  * Compare function for event groups;
  *
  * Implements complex key that first sorts by CPU and then by virtual index
  * which provides ordering when rotating groups for the same CPU.
  */
-static bool
-perf_event_groups_less(struct perf_event *left, struct perf_event *right)
+static __always_inline int
+perf_event_groups_cmp(const int left_cpu, const struct cgroup *left_cgroup,
+ const u64 left_group_index, const struct perf_event *right)
 {
-   if (left->cpu < right->cpu)
-   return true;
-   if (left->cpu > right->cpu)
-   return false;
+   if (left_cpu < right->cpu)
+   return -1;
+   if (left_cpu > right->cpu)
+   return 1;
 
 #ifdef CONFIG_CGROUP_PERF
-   if (left->cgrp != right->cgrp) {
-   if (!left->cgrp || !left->cgrp->css.cgroup) {
-   /*
-* Left has no cgroup but right does, no cgroups come
-* first.
-*/
-   return true;
-   }
-   if (!right->cgrp || !right->cgrp->css.cgroup) {
-   /*
-* Right has no cgroup but left does, no cgroups come
-* first.
-*/
-   return false;
-   }
-   /* Two dissimilar cgroups, order by id. */
-   if (left->cgrp->css.cgroup->kn->id < right->cgrp->css.cgroup->kn->id)
-   return true;
+   {
+   const struct cgroup *right_cgroup = event_cgroup(right);
 
-   return false;
+   if (left_cgroup != right_cgroup) {
+   if (!left_cgroup) {
+   /*
+* Left has no cgroup but right does, no
+* cgroups come first.
+*/
+   return -1;
+   }
+   if (!right_cgroup) {
+   /*
+* Right has no cgroup but left does, no
+* cgroups come first.
+*/
+   return 1;
+   }
+   /* Two dissimilar cgroups, order by id. */
+   if (cgroup_id(left_cgroup) < cgroup_id(right_cgroup))
+   return -1;
+
+   return 1;
+   }
}
 #endif
 
-   if (left->group_index < right->group_index)
-
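
A small sketch of the comparator unification the commit message describes, assuming a simplified key of (cpu, cgroup id, group index) in which a cgroup id of 0 stands for "no cgroup". The struct event_key type and the function names are hypothetical. The partial compare copies the right-hand group index so that the last key component always matches, which is how I read the "forcing the sub-order to always match" remark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a perf event's sort key. */
struct event_key {
	int cpu;
	uint64_t cgroup_id;	/* 0 means "no cgroup", which sorts first */
	uint64_t group_index;
};

/* Three-way compare on (cpu, cgroup_id, group_index): <0, 0 or >0. */
static int event_cmp(const struct event_key *l, const struct event_key *r)
{
	if (l->cpu != r->cpu)
		return l->cpu < r->cpu ? -1 : 1;
	if (l->cgroup_id != r->cgroup_id)
		return l->cgroup_id < r->cgroup_id ? -1 : 1;
	if (l->group_index != r->group_index)
		return l->group_index < r->group_index ? -1 : 1;
	return 0;
}

/* "Less" predicate for insertion, derived from the three-way compare. */
static int event_less(const struct event_key *l, const struct event_key *r)
{
	return event_cmp(l, r) < 0;
}

/*
 * Partial compare for a subtree match: reuse the right-hand group_index so
 * the last component always compares equal and only (cpu, cgroup) decide.
 */
static int group_cmp(int cpu, uint64_t cgroup_id, const struct event_key *r)
{
	struct event_key l;

	assert(r != NULL);
	l.cpu = cpu;
	l.cgroup_id = cgroup_id;
	l.group_index = r->group_index;
	return event_cmp(&l, r);
}
```

With one three-way function as the single source of truth, the insertion order and the lookup order cannot drift apart.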

[tip: sched/core] rbtree, timerqueue: Use rb_add_cached()

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 798172b1374e28ecf687d6662fc5fdaec5c65385
Gitweb:
https://git.kernel.org/tip/798172b1374e28ecf687d6662fc5fdaec5c65385
Author:Peter Zijlstra 
AuthorDate:Wed, 29 Apr 2020 17:07:53 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:01 +01:00

rbtree, timerqueue: Use rb_add_cached()

Reduce rbtree boilerplate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Acked-by: Davidlohr Bueso 
---
 lib/timerqueue.c | 28 +---
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/lib/timerqueue.c b/lib/timerqueue.c
index c527109..cdb9c76 100644
--- a/lib/timerqueue.c
+++ b/lib/timerqueue.c
@@ -14,6 +14,14 @@
 #include 
 #include 
 
+#define __node_2_tq(_n) \
+   rb_entry((_n), struct timerqueue_node, node)
+
+static inline bool __timerqueue_less(struct rb_node *a, const struct rb_node *b)
+{
+   return __node_2_tq(a)->expires < __node_2_tq(b)->expires;
+}
+
 /**
  * timerqueue_add - Adds timer to timerqueue.
  *
@@ -26,28 +34,10 @@
  */
 bool timerqueue_add(struct timerqueue_head *head, struct timerqueue_node *node)
 {
-   struct rb_node **p = &head->rb_root.rb_root.rb_node;
-   struct rb_node *parent = NULL;
-   struct timerqueue_node *ptr;
-   bool leftmost = true;
-
/* Make sure we don't add nodes that are already added */
WARN_ON_ONCE(!RB_EMPTY_NODE(&node->node));
 
-   while (*p) {
-   parent = *p;
-   ptr = rb_entry(parent, struct timerqueue_node, node);
-   if (node->expires < ptr->expires) {
-   p = &(*p)->rb_left;
-   } else {
-   p = &(*p)->rb_right;
-   leftmost = false;
-   }
-   }
-   rb_link_node(&node->node, parent, p);
-   rb_insert_color_cached(&node->node, &head->rb_root, leftmost);
-
-   return leftmost;
+   return rb_add_cached(&node->node, &head->rb_root, __timerqueue_less);
 }
 EXPORT_SYMBOL_GPL(timerqueue_add);
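
To illustrate what the removed open-coded loop did and what a "less"-callback helper has to do in its place, here is a minimal sketch using a plain unbalanced binary search tree. It is not the kernel's rbtree: struct node, struct tree and tree_add_cached() are hypothetical names, and the rebalancing step of a real red-black tree is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *left, *right;
	long expires;
};

struct tree {
	struct node *root;
	struct node *leftmost;	/* cached minimum */
};

typedef bool (*less_fn)(const struct node *a, const struct node *b);

static bool expires_less(const struct node *a, const struct node *b)
{
	return a->expires < b->expires;
}

/*
 * Insert @n using @less and return true if it became the new leftmost node.
 * Going right even once means some existing node sorts before @n.
 */
static bool tree_add_cached(struct tree *t, struct node *n, less_fn less)
{
	struct node **link = &t->root;
	bool leftmost = true;

	assert(n->left == NULL && n->right == NULL);
	while (*link) {
		if (less(n, *link)) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			leftmost = false;
		}
	}
	*link = n;
	if (leftmost)
		t->leftmost = n;
	return leftmost;
}
```

The caller only supplies the ordering predicate. The descent, the linking and the leftmost bookkeeping live in one place, which is the boilerplate the patch removes from timerqueue_add().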

[tip: sched/core] rbtree, uprobes: Use rbtree helpers

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: a905e84e64083a0ee701f61810badee234050825
Gitweb:
https://git.kernel.org/tip/a905e84e64083a0ee701f61810badee234050825
Author:Peter Zijlstra 
AuthorDate:Wed, 29 Apr 2020 17:06:27 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:07:52 +01:00

rbtree, uprobes: Use rbtree helpers

Reduce rbtree boilerplate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Acked-by: Davidlohr Bueso 
---
 kernel/events/uprobes.c | 80 +++-
 1 file changed, 39 insertions(+), 41 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index bf9edd8..fd5160d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -613,41 +613,56 @@ static void put_uprobe(struct uprobe *uprobe)
}
 }
 
-static int match_uprobe(struct uprobe *l, struct uprobe *r)
+static __always_inline
+int uprobe_cmp(const struct inode *l_inode, const loff_t l_offset,
+  const struct uprobe *r)
 {
-   if (l->inode < r->inode)
+   if (l_inode < r->inode)
return -1;
 
-   if (l->inode > r->inode)
+   if (l_inode > r->inode)
return 1;
 
-   if (l->offset < r->offset)
+   if (l_offset < r->offset)
return -1;
 
-   if (l->offset > r->offset)
+   if (l_offset > r->offset)
return 1;
 
return 0;
 }
 
+#define __node_2_uprobe(node) \
+   rb_entry((node), struct uprobe, rb_node)
+
+struct __uprobe_key {
+   struct inode *inode;
+   loff_t offset;
+};
+
+static inline int __uprobe_cmp_key(const void *key, const struct rb_node *b)
+{
+   const struct __uprobe_key *a = key;
+   return uprobe_cmp(a->inode, a->offset, __node_2_uprobe(b));
+}
+
+static inline int __uprobe_cmp(struct rb_node *a, const struct rb_node *b)
+{
+   struct uprobe *u = __node_2_uprobe(a);
+   return uprobe_cmp(u->inode, u->offset, __node_2_uprobe(b));
+}
+
 static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset)
 {
-   struct uprobe u = { .inode = inode, .offset = offset };
-   struct rb_node *n = uprobes_tree.rb_node;
-   struct uprobe *uprobe;
-   int match;
+   struct __uprobe_key key = {
+   .inode = inode,
+   .offset = offset,
+   };
+   struct rb_node *node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
 
-   while (n) {
-   uprobe = rb_entry(n, struct uprobe, rb_node);
-   match = match_uprobe(&u, uprobe);
-   if (!match)
-   return get_uprobe(uprobe);
+   if (node)
+   return __node_2_uprobe(node);
 
-   if (match < 0)
-   n = n->rb_left;
-   else
-   n = n->rb_right;
-   }
return NULL;
 }
 
@@ -668,32 +683,15 @@ static struct uprobe *find_uprobe(struct inode *inode, 
loff_t offset)
 
 static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 {
-   struct rb_node **p = &uprobes_tree.rb_node;
-   struct rb_node *parent = NULL;
-   struct uprobe *u;
-   int match;
+   struct rb_node *node;
 
-   while (*p) {
-   parent = *p;
-   u = rb_entry(parent, struct uprobe, rb_node);
-   match = match_uprobe(uprobe, u);
-   if (!match)
-   return get_uprobe(u);
+   node = rb_find_add(&uprobe->rb_node, &uprobes_tree, __uprobe_cmp);
+   if (node)
+   return get_uprobe(__node_2_uprobe(node));
 
-   if (match < 0)
-   p = &parent->rb_left;
-   else
-   p = &parent->rb_right;
-
-   }
-
-   u = NULL;
-   rb_link_node(&uprobe->rb_node, parent, p);
-   rb_insert_color(&uprobe->rb_node, &uprobes_tree);
/* get access + creation ref */
refcount_set(&uprobe->ref, 2);
-
-   return u;
+   return NULL;
 }
 
 /*
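
The key-versus-node comparator split used in this patch has a close analogue in the C library's bsearch(), whose callback also receives the search key first and a stored element second. The sketch below assumes a sorted array instead of an rbtree and reduces the inode to an integer id. struct probe, struct probe_key and find_probe() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins: the inode is reduced to an integer id. */
struct probe {
	unsigned long inode;
	long long offset;
};

struct probe_key {
	unsigned long inode;
	long long offset;
};

/* Three-way compare of a bare (inode, offset) pair against a stored probe. */
static int probe_cmp(unsigned long l_inode, long long l_offset,
		     const struct probe *r)
{
	if (l_inode < r->inode)
		return -1;
	if (l_inode > r->inode)
		return 1;
	if (l_offset < r->offset)
		return -1;
	if (l_offset > r->offset)
		return 1;
	return 0;
}

/* bsearch() callback: the first argument is the key, the second an element. */
static int probe_cmp_key(const void *key, const void *elem)
{
	const struct probe_key *k = key;

	return probe_cmp(k->inode, k->offset, elem);
}

/* Look up (inode, offset) in an array sorted by probe_cmp() order. */
static struct probe *find_probe(struct probe *sorted, size_t n,
				unsigned long inode, long long offset)
{
	struct probe_key key = { inode, offset };

	assert(sorted != NULL || n == 0);
	return bsearch(&key, sorted, n, sizeof(*sorted), probe_cmp_key);
}
```

Passing the key fields by value to one shared compare function avoids building a throwaway full-size object just to search, which is what the old `struct uprobe u = { ... }` on the stack did.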

[tip: sched/core] static_call: Provide DEFINE_STATIC_CALL_RET0()

2021-02-17 Thread tip-bot2 for Frederic Weisbecker

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 29fd01944b7273bb630c649a2104b7f9e4ef3fa6
Gitweb:
https://git.kernel.org/tip/29fd01944b7273bb630c649a2104b7f9e4ef3fa6
Author:Frederic Weisbecker 
AuthorDate:Mon, 18 Jan 2021 15:12:17 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:51 +01:00

static_call: Provide DEFINE_STATIC_CALL_RET0()

DECLARE_STATIC_CALL() must pass the original function targeted for a
given static call. But DEFINE_STATIC_CALL() may want to initialize it as
off. In this case we can't pass NULL (for functions without return value)
or __static_call_return0 (for functions returning a value) directly
to DEFINE_STATIC_CALL() as that may trigger a static call redeclaration
with a different function prototype. Type casts can't work around that
either, as they don't get along with typeof().

The proper way to do that for functions that don't return a value is
to use DEFINE_STATIC_CALL_NULL(). But functions returning an actual value
don't have an equivalent yet.

Provide DEFINE_STATIC_CALL_RET0() to solve this situation.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-3-frede...@kernel.org
---
 include/linux/static_call.h | 22 ++
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index bd6735d..d69dd8b 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -144,13 +144,13 @@ extern int static_call_text_reserved(void *start, void 
*end);
 
 extern long __static_call_return0(void);
 
-#define DEFINE_STATIC_CALL(name, _func)   \
+#define __DEFINE_STATIC_CALL(name, _func, _func_init)  \
DECLARE_STATIC_CALL(name, _func);   \
struct static_call_key STATIC_CALL_KEY(name) = {\
-   .func = _func,  \
+   .func = _func_init, \
.type = 1,  \
};  \
-   ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func)
+   ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init)
 
 #define DEFINE_STATIC_CALL_NULL(name, _func)   \
DECLARE_STATIC_CALL(name, _func);   \
@@ -178,12 +178,12 @@ struct static_call_key {
void *func;
 };
 
-#define DEFINE_STATIC_CALL(name, _func)   \
+#define __DEFINE_STATIC_CALL(name, _func, _func_init)  \
DECLARE_STATIC_CALL(name, _func);   \
struct static_call_key STATIC_CALL_KEY(name) = {\
-   .func = _func,  \
+   .func = _func_init, \
};  \
-   ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func)
+   ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init)
 
 #define DEFINE_STATIC_CALL_NULL(name, _func)   \
DECLARE_STATIC_CALL(name, _func);   \
@@ -234,10 +234,10 @@ static inline long __static_call_return0(void)
return 0;
 }
 
-#define DEFINE_STATIC_CALL(name, _func)   \
+#define __DEFINE_STATIC_CALL(name, _func, _func_init)  \
DECLARE_STATIC_CALL(name, _func);   \
struct static_call_key STATIC_CALL_KEY(name) = {\
-   .func = _func,  \
+   .func = _func_init, \
}
 
 #define DEFINE_STATIC_CALL_NULL(name, _func)   \
@@ -286,4 +286,10 @@ static inline int static_call_text_reserved(void *start, 
void *end)
 
 #endif /* CONFIG_HAVE_STATIC_CALL */
 
+#define DEFINE_STATIC_CALL(name, _func)   \
+   __DEFINE_STATIC_CALL(name, _func, _func)
+
+#define DEFINE_STATIC_CALL_RET0(name, _func)   \
+   __DEFINE_STATIC_CALL(name, _func, __static_call_return0)
+
 #endif /* _LINUX_STATIC_CALL_H */

[tip: sched/core] sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO

2021-02-17 Thread tip-bot2 for Dietmar Eggemann

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 9d061ba6bc170045857f3efe0bba5def30188d4d
Gitweb:
https://git.kernel.org/tip/9d061ba6bc170045857f3efe0bba5def30188d4d
Author:Dietmar Eggemann 
AuthorDate:Thu, 28 Jan 2021 14:10:39 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:17 +01:00

sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO

The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.

Commit fe443ef2ac42 ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.

Commit a4ec24b48dde ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.

Commit 3ee237dddcd8 ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.

With:

  MAX_USER_PRIO = USER_PRIO(MAX_PRIO)

= MAX_PRIO - MAX_RT_PRIO

   MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH

  MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO

  MAX_USER_PRIO = NICE_WIDTH

MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.

Signed-off-by: Dietmar Eggemann 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggem...@arm.com
---
 arch/powerpc/platforms/cell/spufs/sched.c |  2 +-
 include/linux/sched/prio.h|  9 -
 kernel/sched/sched.h  |  2 +-
 3 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/platforms/cell/spufs/sched.c 
b/arch/powerpc/platforms/cell/spufs/sched.c
index f18d506..aeb7f39 100644
--- a/arch/powerpc/platforms/cell/spufs/sched.c
+++ b/arch/powerpc/platforms/cell/spufs/sched.c
@@ -72,7 +72,7 @@ static struct timer_list spuloadavg_timer;
 #define DEF_SPU_TIMESLICE  (100 * HZ / (1000 * SPUSCHED_TICK))
 
 #define SCALE_PRIO(x, prio) \
-   max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_SPU_TIMESLICE)
+   max(x * (MAX_PRIO - prio) / (NICE_WIDTH / 2), MIN_SPU_TIMESLICE)
 
 /*
  * scale user-nice values [ -20 ... 0 ... 19 ] to time slice values:
diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index d111f2f..ab83d85 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -27,15 +27,6 @@
 #define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO)
 
 /*
- * 'User priority' is the nice value converted to something we
- * can work with better when scaling various scheduler parameters,
- * it's a [ 0 ... 39 ] range.
- */
-#define USER_PRIO(p)   ((p)-MAX_RT_PRIO)
-#define TASK_USER_PRIO(p)  USER_PRIO((p)->static_prio)
-#define MAX_USER_PRIO  (USER_PRIO(MAX_PRIO))
-
-/*
  * Convert nice value [19,-20] to rlimit style value [1,40].
  */
 static inline long nice_to_rlimit(long nice)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index f519aba..2185b3b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -140,7 +140,7 @@ extern void call_trace_sched_update_nr_running(struct rq 
*rq, int count);
  * scale_load() and scale_load_down(w) to convert between them. The
  * following must be true:
  *
- *  scale_load(sched_prio_to_weight[USER_PRIO(NICE_TO_PRIO(0))]) == NICE_0_LOAD
+ *  scale_load(sched_prio_to_weight[NICE_TO_PRIO(0)-MAX_RT_PRIO]) == 
NICE_0_LOAD
  *
  */
 #define NICE_0_LOAD(1L << NICE_0_LOAD_SHIFT)
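
The substitution in the commit message can be checked mechanically. The sketch below assumes the usual values from include/linux/sched/prio.h (MAX_NICE 19, MIN_NICE -20, MAX_RT_PRIO 100) and reproduces the removed macros only for the comparison. weight_index() is a hypothetical helper mirroring the index expression in the updated sched.h comment.

```c
#include <assert.h>

/* Assumed values, believed to match include/linux/sched/prio.h. */
#define MAX_NICE	19
#define MIN_NICE	(-20)
#define NICE_WIDTH	(MAX_NICE - MIN_NICE + 1)
#define MAX_RT_PRIO	100
#define MAX_PRIO	(MAX_RT_PRIO + NICE_WIDTH)
#define DEFAULT_PRIO	(MAX_RT_PRIO + NICE_WIDTH / 2)
#define NICE_TO_PRIO(nice)	((nice) + DEFAULT_PRIO)

/* The macros removed by the patch, kept here only to compare against. */
#define USER_PRIO(p)	((p) - MAX_RT_PRIO)
#define MAX_USER_PRIO	(USER_PRIO(MAX_PRIO))

/* Compile-time check of the derivation: MAX_USER_PRIO == NICE_WIDTH. */
_Static_assert(MAX_USER_PRIO == NICE_WIDTH,
	       "MAX_USER_PRIO must equal NICE_WIDTH");

/* Index into a 40-entry weight table, as in the updated sched.h comment. */
static int weight_index(int nice)
{
	assert(nice >= MIN_NICE && nice <= MAX_NICE);
	return NICE_TO_PRIO(nice) - MAX_RT_PRIO;
}
```

Since both expressions evaluate to 40, SCALE_PRIO() divides by the same value before and after the change.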

[tip: sched/core] static_call/x86: Add __static_call_return0()

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 3f2a8fc4b15de18644e8a80a09edda168676e22c
Gitweb:
https://git.kernel.org/tip/3f2a8fc4b15de18644e8a80a09edda168676e22c
Author:Peter Zijlstra 
AuthorDate:Mon, 18 Jan 2021 15:12:16 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:43 +01:00

static_call/x86: Add __static_call_return0()

Provide a stub function that returns 0 and wire up the static call site
patching to replace the CALL with a single 5 byte instruction that
clears %RAX, the return value register.

The function can be cast to any function pointer type that has a
single %RAX return (including pointers). Also provide a version that
returns an int for convenience. We are clearing the entire %RAX register
in any case, whether the return value is 32 or 64 bits, since %RAX is
always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-2-frede...@kernel.org
---
 arch/x86/kernel/static_call.c | 17 +++--
 include/linux/static_call.h   | 12 
 kernel/static_call.c  |  5 +
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index ca9a380..9442c41 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -11,14 +11,26 @@ enum insn_type {
RET = 3,  /* tramp / site cond-tail-call */
 };
 
+/*
+ * data16 data16 xorq %rax, %rax - a single 5 byte instruction that clears %rax
+ * The REX.W cancels the effect of any data16.
+ */
+static const u8 xor5rax[] = { 0x66, 0x66, 0x48, 0x31, 0xc0 };
+
 static void __ref __static_call_transform(void *insn, enum insn_type type, void *func)
 {
+   const void *emulate = NULL;
int size = CALL_INSN_SIZE;
const void *code;
 
switch (type) {
case CALL:
code = text_gen_insn(CALL_INSN_OPCODE, insn, func);
+   if (func == &__static_call_return0) {
+   emulate = code;
+   code = &xor5rax;
+   }
+
break;
 
case NOP:
@@ -41,7 +53,7 @@ static void __ref __static_call_transform(void *insn, enum 
insn_type type, void 
if (unlikely(system_state == SYSTEM_BOOTING))
return text_poke_early(insn, code, size);
 
-   text_poke_bp(insn, code, size, NULL);
+   text_poke_bp(insn, code, size, emulate);
 }
 
 static void __static_call_validate(void *insn, bool tail)
@@ -54,7 +66,8 @@ static void __static_call_validate(void *insn, bool tail)
return;
} else {
if (opcode == CALL_INSN_OPCODE ||
-   !memcmp(insn, ideal_nops[NOP_ATOMIC5], 5))
+   !memcmp(insn, ideal_nops[NOP_ATOMIC5], 5) ||
+   !memcmp(insn, xor5rax, 5))
return;
}
 
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index a2c0645..bd6735d 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -142,6 +142,8 @@ extern void __static_call_update(struct static_call_key 
*key, void *tramp, void 
 extern int static_call_mod_init(struct module *mod);
 extern int static_call_text_reserved(void *start, void *end);
 
+extern long __static_call_return0(void);
+
 #define DEFINE_STATIC_CALL(name, _func)   \
DECLARE_STATIC_CALL(name, _func);   \
struct static_call_key STATIC_CALL_KEY(name) = {\
@@ -206,6 +208,11 @@ static inline int static_call_text_reserved(void *start, 
void *end)
return 0;
 }
 
+static inline long __static_call_return0(void)
+{
+   return 0;
+}
+
 #define EXPORT_STATIC_CALL(name)   \
EXPORT_SYMBOL(STATIC_CALL_KEY(name));   \
EXPORT_SYMBOL(STATIC_CALL_TRAMP(name))
@@ -222,6 +229,11 @@ struct static_call_key {
void *func;
 };
 
+static inline long __static_call_return0(void)
+{
+   return 0;
+}
+
 #define DEFINE_STATIC_CALL(name, _func)   \
DECLARE_STATIC_CALL(name, _func);   \
struct static_call_key STATIC_CALL_KEY(name) = {\
diff --git a/kernel/static_call.c b/kernel/static_call.c
index 84565c2..0bc11b5 100644
--- a/kernel/static_call.c
+++ b/kernel/static_call.c
@@ -438,6 +438,11 @@ int __init static_call_init(void)
 }
 early_initcall(static_call_init);
 
+long __static_call_return0(void)
+{
+   return 0;
+}
+
 #ifdef CONFIG_STATIC_CALL_SELFTEST
 
 static int func_a(int x)

[tip: sched/core] preempt/dynamic: Provide cond_resched() and might_resched() static calls

2021-02-17 Thread tip-bot2 for Peter Zijlstra (Intel)

The following commit has been merged into the sched/core branch of tip:

Commit-ID: b965f1ddb47daa5b8b2e2bc9c921431236830367
Gitweb:
https://git.kernel.org/tip/b965f1ddb47daa5b8b2e2bc9c921431236830367
Author:Peter Zijlstra (Intel) 
AuthorDate:Mon, 18 Jan 2021 15:12:20 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

preempt/dynamic: Provide cond_resched() and might_resched() static calls

Provide static calls to control cond_resched() (called in !CONFIG_PREEMPT)
and might_resched() (called in CONFIG_PREEMPT_VOLUNTARY) so that we
can override their behaviour when preempt= is overridden.

Since the default behaviour is full preemption, both their calls are
ignored when preempt= isn't passed.

  [fweisbec: branch might_resched() directly to __cond_resched(), only
 define static calls when PREEMPT_DYNAMIC]

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-6-frede...@kernel.org
---
 include/linux/kernel.h | 23 +++
 include/linux/sched.h  | 27 ---
 kernel/sched/core.c| 16 +---
 3 files changed, 56 insertions(+), 10 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index f7902d8..cfd3d34 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -15,7 +15,7 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 #include 
@@ -81,11 +81,26 @@ struct pt_regs;
 struct user;
 
 #ifdef CONFIG_PREEMPT_VOLUNTARY
-extern int _cond_resched(void);
-# define might_resched() _cond_resched()
+
+extern int __cond_resched(void);
+# define might_resched() __cond_resched()
+
+#elif defined(CONFIG_PREEMPT_DYNAMIC)
+
+extern int __cond_resched(void);
+
+DECLARE_STATIC_CALL(might_resched, __cond_resched);
+
+static __always_inline void might_resched(void)
+{
+   static_call(might_resched)();
+}
+
 #else
+
 # define might_resched() do { } while (0)
-#endif
+
+#endif /* CONFIG_PREEMPT_* */
 
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 extern void ___might_sleep(const char *file, int line, int preempt_offset);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e115222..2f35594 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1871,11 +1871,32 @@ static inline int test_tsk_need_resched(struct 
task_struct *tsk)
  * value indicates whether a reschedule was done in fact.
  * cond_resched_lock() will drop the spinlock before scheduling,
  */
-#ifndef CONFIG_PREEMPTION
-extern int _cond_resched(void);
+#if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
+extern int __cond_resched(void);
+
+#ifdef CONFIG_PREEMPT_DYNAMIC
+
+DECLARE_STATIC_CALL(cond_resched, __cond_resched);
+
+static __always_inline int _cond_resched(void)
+{
+   return static_call(cond_resched)();
+}
+
 #else
+
+static inline int _cond_resched(void)
+{
+   return __cond_resched();
+}
+
+#endif /* CONFIG_PREEMPT_DYNAMIC */
+
+#else
+
 static inline int _cond_resched(void) { return 0; }
-#endif
+
+#endif /* !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) */
 
 #define cond_resched() ({  \
___might_sleep(__FILE__, __LINE__, 0);  \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4afbdd2..f7c8fd8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6785,17 +6785,27 @@ SYSCALL_DEFINE0(sched_yield)
return 0;
 }
 
-#ifndef CONFIG_PREEMPTION
-int __sched _cond_resched(void)
+#if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
+int __sched __cond_resched(void)
 {
if (should_resched(0)) {
preempt_schedule_common();
return 1;
}
+#ifndef CONFIG_PREEMPT_RCU
rcu_all_qs();
+#endif
return 0;
 }
-EXPORT_SYMBOL(_cond_resched);
+EXPORT_SYMBOL(__cond_resched);
+#endif
+
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_CALL_RET0(cond_resched, __cond_resched);
+EXPORT_STATIC_CALL(cond_resched);
+
+DEFINE_STATIC_CALL_RET0(might_resched, __cond_resched);
+EXPORT_STATIC_CALL(might_resched);
 #endif
 
 /*

[tip: sched/core] preempt/dynamic: Support dynamic preempt with preempt= boot option

2021-02-17 Thread tip-bot2 for Peter Zijlstra (Intel)

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 826bfeb37bb4302ee6042f330c4c0c757152bdb8
Gitweb:
https://git.kernel.org/tip/826bfeb37bb4302ee6042f330c4c0c757152bdb8
Author:Peter Zijlstra (Intel) 
AuthorDate:Mon, 18 Jan 2021 15:12:23 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

preempt/dynamic: Support dynamic preempt with preempt= boot option

Support the preempt= boot option and patch the static call sites
accordingly.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-9-frede...@kernel.org
---
 kernel/sched/core.c | 68 +++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 880611c..0c06717 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5328,9 +5328,75 @@ DEFINE_STATIC_CALL(preempt_schedule_notrace, 
__preempt_schedule_notrace_func);
 EXPORT_STATIC_CALL(preempt_schedule_notrace);
 #endif
 
-
 #endif /* CONFIG_PREEMPTION */
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+
+#include 
+
+/*
+ * SC:cond_resched
+ * SC:might_resched
+ * SC:preempt_schedule
+ * SC:preempt_schedule_notrace
+ * SC:irqentry_exit_cond_resched
+ *
+ *
+ * NONE:
+ *   cond_resched   <- __cond_resched
+ *   might_resched  <- RET0
+ *   preempt_schedule   <- NOP
+ *   preempt_schedule_notrace   <- NOP
+ *   irqentry_exit_cond_resched <- NOP
+ *
+ * VOLUNTARY:
+ *   cond_resched   <- __cond_resched
+ *   might_resched  <- __cond_resched
+ *   preempt_schedule   <- NOP
+ *   preempt_schedule_notrace   <- NOP
+ *   irqentry_exit_cond_resched <- NOP
+ *
+ * FULL:
+ *   cond_resched   <- RET0
+ *   might_resched  <- RET0
+ *   preempt_schedule   <- preempt_schedule
+ *   preempt_schedule_notrace   <- preempt_schedule_notrace
+ *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
+ */
+static int __init setup_preempt_mode(char *str)
+{
+   if (!strcmp(str, "none")) {
+   static_call_update(cond_resched, __cond_resched);
+   static_call_update(might_resched, (typeof(&__cond_resched)) __static_call_return0);
+   static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL);
+   static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL);
+   static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL);
+   pr_info("Dynamic Preempt: %s\n", str);
+   } else if (!strcmp(str, "voluntary")) {
+   static_call_update(cond_resched, __cond_resched);
+   static_call_update(might_resched, __cond_resched);
+   static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL);
+   static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL);
+   static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL);
+   pr_info("Dynamic Preempt: %s\n", str);
+   } else if (!strcmp(str, "full")) {
+   static_call_update(cond_resched, (typeof(&__cond_resched)) __static_call_return0);
+   static_call_update(might_resched, (typeof(&__cond_resched)) __static_call_return0);
+   static_call_update(preempt_schedule, __preempt_schedule_func);
+   static_call_update(preempt_schedule_notrace, __preempt_schedule_notrace_func);
+   static_call_update(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+   pr_info("Dynamic Preempt: %s\n", str);
+   } else {
+   pr_warn("Dynamic Preempt: Unsupported preempt mode %s, default to full\n", str);
+   return 1;
+   }
+   return 0;
+}
+__setup("preempt=", setup_preempt_mode);
+
+#endif /* CONFIG_PREEMPT_DYNAMIC */
+
+
 /*
  * This is the entry point to schedule() from kernel preemption
  * off of irq context.
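
For readers following the NONE/VOLUNTARY/FULL table in the comment of this patch, here is a minimal userspace sketch of the same selection logic. It uses plain function pointers instead of static calls, and every name in it (struct preempt_ops, set_preempt_mode(), do_cond_resched(), ret0()) is hypothetical and not part of the kernel. Real static calls patch the call sites in place rather than loading a pointer, so this only illustrates how a mode string maps to a set of targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for __cond_resched and __static_call_return0. */
static int do_cond_resched(void) { return 1; }
static int ret0(void) { return 0; }

/* One slot per call target; plain function pointers in this sketch. */
struct preempt_ops {
	int (*cond_resched)(void);
	int (*might_resched)(void);
	int full_preempt;	/* stands in for the three preempt_schedule* sites */
};

/* Start in the "full" configuration. */
static struct preempt_ops ops = { ret0, ret0, 1 };

/* Returns 0 on success, 1 for an unknown mode (ops is left untouched). */
static int set_preempt_mode(const char *str)
{
	assert(str != NULL);

	if (!strcmp(str, "none"))
		ops = (struct preempt_ops){ do_cond_resched, ret0, 0 };
	else if (!strcmp(str, "voluntary"))
		ops = (struct preempt_ops){ do_cond_resched, do_cond_resched, 0 };
	else if (!strcmp(str, "full"))
		ops = (struct preempt_ops){ ret0, ret0, 1 };
	else
		return 1;

	return 0;
}
```

Calling set_preempt_mode("voluntary") makes both ops.cond_resched() and ops.might_resched() reach the rescheduling stub, while "full" turns both into the return-0 stub, which mirrors the table.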

[tip: sched/core] static_call: Allow module use without exposing static_call_key

2021-02-17 Thread tip-bot2 for Josh Poimboeuf

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 73f44fe19d359635a607e8e8daa0da4001c1cfc2
Gitweb:
https://git.kernel.org/tip/73f44fe19d359635a607e8e8daa0da4001c1cfc2
Author:Josh Poimboeuf 
AuthorDate:Wed, 27 Jan 2021 17:18:37 -06:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

static_call: Allow module use without exposing static_call_key

When exporting static_call_key with EXPORT_STATIC_CALL*(), the module
can use static_call_update() to change the function called.  This is
not desirable in general.

Not exporting static_call_key however also disallows usage of
static_call(), since objtool needs the key to construct the
static_call_site.

Solve this by allowing objtool to create the static_call_site using
the trampoline address when it builds a module and cannot find the
static_call_key symbol. The module loader will then try to map the
trampoline back to a key before it constructs the normal sites list.

Doing this requires a trampoline -> key association, so add another
magic section that keeps those.

Originally-by: Peter Zijlstra (Intel) 
Signed-off-by: Josh Poimboeuf 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble
---
 arch/x86/include/asm/static_call.h  |  7 +++-
 include/asm-generic/vmlinux.lds.h   |  5 +-
 include/linux/static_call.h | 22 +-
 include/linux/static_call_types.h   | 27 +++-
 kernel/static_call.c| 55 +++-
 tools/include/linux/static_call_types.h | 27 +++-
 tools/objtool/check.c   | 17 ++-
 7 files changed, 149 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index c37f119..cbb67b6 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -37,4 +37,11 @@
 #define ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)   \
__ARCH_DEFINE_STATIC_CALL_TRAMP(name, "ret; nop; nop; nop; nop")
 
+
+#define ARCH_ADD_TRAMP_KEY(name)   \
+   asm(".pushsection .static_call_tramp_key, \"a\" \n" \
+   ".long " STATIC_CALL_TRAMP_STR(name) " - .  \n" \
+   ".long " STATIC_CALL_KEY_STR(name) " - .\n" \
+   ".popsection\n")
+
 #endif /* _ASM_STATIC_CALL_H */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index b97c628..3f747de 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -393,7 +393,10 @@
. = ALIGN(8);   \
__start_static_call_sites = .;  \
KEEP(*(.static_call_sites)) \
-   __stop_static_call_sites = .;
+   __stop_static_call_sites = .;   \
+   __start_static_call_tramp_key = .;  \
+   KEEP(*(.static_call_tramp_key)) \
+   __stop_static_call_tramp_key = .;
 
 /*
  * Allow architectures to handle ro_after_init data on their
diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index d69dd8b..85ecc78 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -138,6 +138,12 @@ struct static_call_key {
};
 };
 
+/* For finding the key associated with a trampoline */
+struct static_call_tramp_key {
+   s32 tramp;
+   s32 key;
+};
+
 extern void __static_call_update(struct static_call_key *key, void *tramp, void *func);
 extern int static_call_mod_init(struct module *mod);
 extern int static_call_text_reserved(void *start, void *end);
@@ -165,11 +171,18 @@ extern long __static_call_return0(void);
 #define EXPORT_STATIC_CALL(name)   \
EXPORT_SYMBOL(STATIC_CALL_KEY(name));   \
EXPORT_SYMBOL(STATIC_CALL_TRAMP(name))
-
 #define EXPORT_STATIC_CALL_GPL(name)   \
EXPORT_SYMBOL_GPL(STATIC_CALL_KEY(name));   \
EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name))
 
+/* Leave the key unexported, so modules can't change static call targets: */
+#define EXPORT_STATIC_CALL_TRAMP(name) \
+   EXPORT_SYMBOL(STATIC_CALL_TRAMP(name)); \
+   ARCH_ADD_TRAMP_KEY(name)
+#define EXPORT_STATIC_CALL_TRAMP_GPL(name) \
+   EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name)); \
+   ARCH_ADD_TRAMP_KEY(name)
+
 #elif defined(CONFIG_HAVE_STATIC_CALL)
 
 static inline int static_call_init(void) { return 0; }
@@ -216,11 +229,16 @@ static inline long __static_call_return0(void)
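
The ARCH_ADD_TRAMP_KEY() asm in this patch emits two ".long sym - ." entries, that is, offsets relative to the location of each field. A minimal userspace sketch of how such a table could be walked to map a trampoline address back to its key follows. The lookup function, the demo objects and the rel32() helper are hypothetical, and the sketch assumes the objects sit within 2 GiB of each other so that the offsets fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the patch's struct static_call_tramp_key. */
struct tramp_key {
	int32_t tramp;	/* offset from &tramp to the trampoline */
	int32_t key;	/* offset from &key to the key */
};

/* Hypothetical objects standing in for a trampoline and its key. */
static char demo_tramp[8];
static long demo_key;
static struct tramp_key demo_table[1];

/* Compute "target - location", like ".long sym - ." does at link time. */
static int32_t rel32(const void *target, const void *location)
{
	return (int32_t)((intptr_t)target - (intptr_t)location);
}

static void demo_table_init(void)
{
	demo_table[0].tramp = rel32(demo_tramp, &demo_table[0].tramp);
	demo_table[0].key = rel32(&demo_key, &demo_table[0].key);
}

/* Map a trampoline address back to its key, or NULL if it is not listed. */
static void *tramp_to_key(const struct tramp_key *start,
			  const struct tramp_key *stop, const void *tramp)
{
	const struct tramp_key *e;

	for (e = start; e != stop; e++) {
		intptr_t t = (intptr_t)&e->tramp + e->tramp;

		if (t == (intptr_t)tramp)
			return (void *)((intptr_t)&e->key + e->key);
	}
	return NULL;
}
```

Storing self-relative 32-bit offsets instead of absolute pointers keeps the entries at 8 bytes each and independent of the load address.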

[tip: sched/core] uprobes: (Re)add missing get_uprobe() in __find_uprobe()

2021-02-17 Thread tip-bot2 for Sven Schnelle

The following commit has been merged into the sched/core branch of tip:

Commit-ID: b0d6d4789677d128b1933af023083054f0973574
Gitweb:
https://git.kernel.org/tip/b0d6d4789677d128b1933af023083054f0973574
Author:Sven Schnelle 
AuthorDate:Tue, 09 Feb 2021 16:07:11 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

uprobes: (Re)add missing get_uprobe() in __find_uprobe()

commit c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
accidentally removed the refcount increase. Add it again.

Fixes: c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
Signed-off-by: Sven Schnelle 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210209150711.36778-1-sv...@linux.ibm.com
---
 kernel/events/uprobes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index fd5160d..3ea7f8f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -661,7 +661,7 @@ static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset)
struct rb_node *node = rb_find(&key, &uprobes_tree, __uprobe_cmp_key);
 
if (node)
-   return __node_2_uprobe(node);
+   return get_uprobe(__node_2_uprobe(node));
 
return NULL;
 }
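
As an illustration of why the lookup has to take the reference itself, here is a minimal userspace sketch of a "find and get" helper. The struct obj type and the obj_get()/obj_find() names are hypothetical; only the pattern (take a reference on the object before returning it from the lookup) is taken from the fix above.

```c
#include <assert.h>
#include <stddef.h>

struct obj {
	int key;
	int refs;	/* number of holders; hypothetical field */
};

/* Take a reference and hand the object back. */
static struct obj *obj_get(struct obj *o)
{
	o->refs++;
	return o;
}

/* Lookup that returns a referenced object, or NULL when key is absent. */
static struct obj *obj_find(struct obj *tab, size_t n, int key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].key == key)
			return obj_get(&tab[i]);
	}
	return NULL;
}
```

A caller that later drops its reference would otherwise underflow the count, which is the class of bug the one-line change addresses.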

[tip: sched/core] rbtree, sched/fair: Use rb_add_cached()

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: bf9be9a163b464aa90f60af13b336da2db8b2ea1
Gitweb:
https://git.kernel.org/tip/bf9be9a163b464aa90f60af13b336da2db8b2ea1
Author:Peter Zijlstra 
AuthorDate:Wed, 29 Apr 2020 17:04:12 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:07:39 +01:00

rbtree, sched/fair: Use rb_add_cached()

Reduce rbtree boiler plate by using the new helper function.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Acked-by: Davidlohr Bueso 
---
 kernel/sched/fair.c | 46 +---
 1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c73d588..59b645e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -531,12 +531,15 @@ static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
return min_vruntime;
 }
 
-static inline int entity_before(struct sched_entity *a,
+static inline bool entity_before(struct sched_entity *a,
struct sched_entity *b)
 {
return (s64)(a->vruntime - b->vruntime) < 0;
 }
 
+#define __node_2_se(node) \
+   rb_entry((node), struct sched_entity, run_node)
+
 static void update_min_vruntime(struct cfs_rq *cfs_rq)
 {
struct sched_entity *curr = cfs_rq->curr;
@@ -552,8 +555,7 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
}
 
if (leftmost) { /* non-empty tree */
-   struct sched_entity *se;
-   se = rb_entry(leftmost, struct sched_entity, run_node);
+   struct sched_entity *se = __node_2_se(leftmost);
 
if (!curr)
vruntime = se->vruntime;
@@ -569,37 +571,17 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 #endif
 }
 
+static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
+{
+   return entity_before(__node_2_se(a), __node_2_se(b));
+}
+
 /*
  * Enqueue an entity into the rb-tree:
  */
 static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-   struct rb_node **link = &cfs_rq->tasks_timeline.rb_root.rb_node;
-   struct rb_node *parent = NULL;
-   struct sched_entity *entry;
-   bool leftmost = true;
-
-   /*
-* Find the right place in the rbtree:
-*/
-   while (*link) {
-   parent = *link;
-   entry = rb_entry(parent, struct sched_entity, run_node);
-   /*
-* We dont care about collisions. Nodes with
-* the same key stay together.
-*/
-   if (entity_before(se, entry)) {
-   link = &parent->rb_left;
-   } else {
-   link = &parent->rb_right;
-   leftmost = false;
-   }
-   }
-
-   rb_link_node(&se->run_node, parent, link);
-   rb_insert_color_cached(&se->run_node,
-  &cfs_rq->tasks_timeline, leftmost);
+   rb_add_cached(&se->run_node, &cfs_rq->tasks_timeline, __entity_less);
 }
 
 static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -614,7 +596,7 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
if (!left)
return NULL;
 
-   return rb_entry(left, struct sched_entity, run_node);
+   return __node_2_se(left);
 }
 
 static struct sched_entity *__pick_next_entity(struct sched_entity *se)
@@ -624,7 +606,7 @@ static struct sched_entity *__pick_next_entity(struct sched_entity *se)
if (!next)
return NULL;
 
-   return rb_entry(next, struct sched_entity, run_node);
+   return __node_2_se(next);
 }
 
 #ifdef CONFIG_SCHED_DEBUG
@@ -635,7 +617,7 @@ struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
if (!last)
return NULL;
 
-   return rb_entry(last, struct sched_entity, run_node);
+   return __node_2_se(last);
 }
 
 /**
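
The entity_before() helper touched by this patch orders two u64 vruntime values through a signed difference rather than a direct comparison. A minimal userspace sketch of that idiom, with a hypothetical before_u64() name, shows why it keeps working when the counter wraps around. It assumes two's complement conversion to int64_t, which holds on the platforms Linux supports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when a is "before" b, even if the u64 counter has wrapped. */
static bool before_u64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}
```

With a just below the wrap point and b just above it, a direct "a < b" test says no, while the signed difference still reports a as earlier, provided the two values are less than 2^63 apart.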

[tip: sched/core] sched/features: Distinguish between NORMAL and DEADLINE hrtick

2021-02-17 Thread tip-bot2 for Juri Lelli

The following commit has been merged into the sched/core branch of tip:

Commit-ID: e0ee463c93c43b1657ad69cf2678ff5bf1b754fe
Gitweb:
https://git.kernel.org/tip/e0ee463c93c43b1657ad69cf2678ff5bf1b754fe
Author:Juri Lelli 
AuthorDate:Mon, 08 Feb 2021 08:35:54 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

sched/features: Distinguish between NORMAL and DEADLINE hrtick

The HRTICK feature has traditionally been servicing configurations that
need precise preemption points for NORMAL tasks. More recently, the
feature has been extended to also service DEADLINE tasks with stringent
runtime enforcement needs (e.g., runtime < 1ms with HZ=1000).

Enabling HRTICK sched feature currently enables the additional timer and
task tick for both classes, which might introduce undesired overhead
for no additional benefit if one needed it only for one of the cases.

Separate HRTICK sched feature in two (and leave the traditional case
name unmodified) so that it can be selectively enabled when needed.

With:

  $ echo HRTICK > /sys/kernel/debug/sched_features

the NORMAL/fair hrtick gets enabled.

With:

  $ echo HRTICK_DL > /sys/kernel/debug/sched_features

the DEADLINE hrtick gets enabled.

Signed-off-by: Juri Lelli 
Signed-off-by: Luis Claudio R. Goncalves 
Signed-off-by: Daniel Bristot de Oliveira 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210208073554.14629-3-juri.le...@redhat.com
---
 kernel/sched/core.c |  2 +-
 kernel/sched/deadline.c |  4 ++--
 kernel/sched/fair.c |  4 ++--
 kernel/sched/features.h |  1 +
 kernel/sched/sched.h| 26 --
 5 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 18d51ab..88a2e2b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4969,7 +4969,7 @@ static void __sched notrace __schedule(bool preempt)
 
schedule_debug(prev, preempt);
 
-   if (sched_feat(HRTICK))
+   if (sched_feat(HRTICK) || sched_feat(HRTICK_DL))
hrtick_clear(rq);
 
local_irq_disable();
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 6f37796..aac3539 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1832,7 +1832,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
if (!first)
return;
 
-   if (hrtick_enabled(rq))
+   if (hrtick_enabled_dl(rq))
start_hrtick_dl(rq, p);
 
if (rq->curr->sched_class != &dl_sched_class)
@@ -1895,7 +1895,7 @@ static void task_tick_dl(struct rq *rq, struct task_struct *p, int queued)
 * not being the leftmost task anymore. In that case NEED_RESCHED will
 * be set and schedule() will start a new hrtick for the next task.
 */
-   if (hrtick_enabled(rq) && queued && p->dl.runtime > 0 &&
+   if (hrtick_enabled_dl(rq) && queued && p->dl.runtime > 0 &&
is_leftmost(p, &rq->dl))
start_hrtick_dl(rq, p);
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 59b645e..8a8bd7b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5429,7 +5429,7 @@ static void hrtick_update(struct rq *rq)
 {
struct task_struct *curr = rq->curr;
 
-   if (!hrtick_enabled(rq) || curr->sched_class != &fair_sched_class)
+   if (!hrtick_enabled_fair(rq) || curr->sched_class != &fair_sched_class)
return;
 
if (cfs_rq_of(&curr->se)->nr_running < sched_nr_latency)
@@ -7116,7 +7116,7 @@ done: __maybe_unused;
list_move(&p->se.group_node, &rq->cfs_tasks);
 #endif
 
-   if (hrtick_enabled(rq))
+   if (hrtick_enabled_fair(rq))
hrtick_start_fair(rq, p);
 
update_misfit_status(p, rq);
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index e875eab..1bc2b15 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -38,6 +38,7 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true)
 SCHED_FEAT(WAKEUP_PREEMPTION, true)
 
 SCHED_FEAT(HRTICK, false)
+SCHED_FEAT(HRTICK_DL, false)
 SCHED_FEAT(DOUBLE_TICK, false)
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0dfdd52..10a1522 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2105,17 +2105,39 @@ extern const_debug unsigned int sysctl_sched_migration_cost;
  */
 static inline int hrtick_enabled(struct rq *rq)
 {
-   if (!sched_feat(HRTICK))
-   return 0;
if (!cpu_active(cpu_of(rq)))
return 0;
return hrtimer_is_hres_active(&rq->hrtick_timer);
 }
 
+static inline int hrtick_enabled_fair(struct rq *rq)
+{
+   if (!sched_feat(HRTICK))
+   return 0;
+   return hrtick_enabled(rq);
+}
+
+static inline int hrtick_enabled_dl(struct rq *rq)
+{
+   if (!sched_feat(HRTICK_DL))
+   return 0;
+   return hrtick_enabled(rq);
+}
+
 void hrtick_start(struct rq *rq, u64 delay);
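
The split in sched.h boils down to one shared precondition plus a per-class feature bit. A minimal userspace sketch of that layering is below. The FEAT_* bit names, the features variable and the hres_active flag are hypothetical stand-ins for sched_feat() and the cpu_active()/hrtimer checks, and the function names drop the rq argument for brevity.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	FEAT_HRTICK    = 1 << 0,	/* NORMAL/fair hrtick */
	FEAT_HRTICK_DL = 1 << 1,	/* DEADLINE hrtick */
};

static unsigned int features;		/* both bits default to off */
static bool hres_active = true;		/* stand-in for the common checks */

/* Common precondition shared by both classes. */
static bool hrtick_enabled(void)
{
	return hres_active;
}

static bool hrtick_enabled_fair(void)
{
	if (!(features & FEAT_HRTICK))
		return false;
	return hrtick_enabled();
}

static bool hrtick_enabled_dl(void)
{
	if (!(features & FEAT_HRTICK_DL))
		return false;
	return hrtick_enabled();
}
```

Setting only FEAT_HRTICK_DL enables the DEADLINE path while leaving the fair path off, which is the selective behaviour the commit message describes.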

[tip: sched/core] rbtree: Add generic add and find helpers

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 2d24dd5798d0474d9bf705bfca8725e7d20f9d54
Gitweb:
https://git.kernel.org/tip/2d24dd5798d0474d9bf705bfca8725e7d20f9d54
Author:Peter Zijlstra 
AuthorDate:Wed, 29 Apr 2020 17:03:22 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:07:31 +01:00

rbtree: Add generic add and find helpers

I've always been bothered by the endless (fragile) boilerplate for
rbtree, and I recently wrote some rbtree helpers for objtool and
figured I should lift them into the kernel and use them more widely.

Provide:

partial-order; less() based:
 - rb_add(): add a new entry to the rbtree
 - rb_add_cached(): like rb_add(), but for a rb_root_cached

total-order; cmp() based:
 - rb_find(): find an entry in an rbtree
 - rb_find_add(): find an entry, and add if not found

 - rb_find_first(): find the first (leftmost) matching entry
 - rb_next_match(): continue from rb_find_first()
 - rb_for_each(): iterate a sub-tree using the previous two

Inlining and constant propagation should see the compiler inline the
whole thing, including the various compare functions.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Reviewed-by: Michel Lespinasse 
Acked-by: Davidlohr Bueso 
---
 include/linux/rbtree.h   | 190 ++-
 tools/include/linux/rbtree.h | 192 +-
 tools/objtool/elf.c  |  73 +
 3 files changed, 392 insertions(+), 63 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index d7db179..e0b300d 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -158,4 +158,194 @@ static inline void rb_replace_node_cached(struct rb_node *victim,
rb_replace_node(victim, new, &root->rb_root);
 }
 
+/*
+ * The below helper functions use 2 operators with 3 different
+ * calling conventions. The operators are related like:
+ *
+ * comp(a->key,b) < 0  := less(a,b)
+ * comp(a->key,b) > 0  := less(b,a)
+ * comp(a->key,b) == 0 := !less(a,b) && !less(b,a)
+ *
+ * If these operators define a partial order on the elements we make no
+ * guarantee on which of the elements matching the key is found. See
+ * rb_find().
+ *
+ * The reason for this is to allow the find() interface without requiring an
+ * on-stack dummy object, which might not be feasible due to object size.
+ */
+
+/**
+ * rb_add_cached() - insert @node into the leftmost cached tree @tree
+ * @node: node to insert
+ * @tree: leftmost cached tree to insert @node into
+ * @less: operator defining the (partial) node order
+ */
+static __always_inline void
+rb_add_cached(struct rb_node *node, struct rb_root_cached *tree,
+ bool (*less)(struct rb_node *, const struct rb_node *))
+{
+   struct rb_node **link = &tree->rb_root.rb_node;
+   struct rb_node *parent = NULL;
+   bool leftmost = true;
+
+   while (*link) {
+   parent = *link;
+   if (less(node, parent)) {
+   link = &parent->rb_left;
+   } else {
+   link = &parent->rb_right;
+   leftmost = false;
+   }
+   }
+
+   rb_link_node(node, parent, link);
+   rb_insert_color_cached(node, tree, leftmost);
+}
+
+/**
+ * rb_add() - insert @node into @tree
+ * @node: node to insert
+ * @tree: tree to insert @node into
+ * @less: operator defining the (partial) node order
+ */
+static __always_inline void
+rb_add(struct rb_node *node, struct rb_root *tree,
+   bool (*less)(struct rb_node *, const struct rb_node *))
+{
+   struct rb_node **link = &tree->rb_node;
+   struct rb_node *parent = NULL;
+
+   while (*link) {
+   parent = *link;
+   if (less(node, parent))
+   link = &parent->rb_left;
+   else
+   link = &parent->rb_right;
+   }
+
+   rb_link_node(node, parent, link);
+   rb_insert_color(node, tree);
+}
+
+/**
+ * rb_find_add() - find equivalent @node in @tree, or add @node
+ * @node: node to look-for / insert
+ * @tree: tree to search / modify
+ * @cmp: operator defining the node order
+ *
+ * Returns the rb_node matching @node, or NULL when no match is found and @node
+ * is inserted.
+ */
+static __always_inline struct rb_node *
+rb_find_add(struct rb_node *node, struct rb_root *tree,
+   int (*cmp)(struct rb_node *, const struct rb_node *))
+{
+   struct rb_node **link = &tree->rb_node;
+   struct rb_node *parent = NULL;
+   int c;
+
+   while (*link) {
+   parent = *link;
+   c = cmp(node, parent);
+
+   if (c < 0)
+   link = &parent->rb_left;
+   else if (c > 0)
+   link = &parent->rb_right;
+   else
+   return parent;
+   }
+
+   rb_link_node(node, parent, link);
+   rb_insert_color(node, tree);
+   return NULL;
+}
+
+/**
+ *
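
To make the difference between the less()-based and cmp()-based helpers concrete, here is a minimal userspace sketch on a plain, unbalanced binary tree. It is not the kernel rbtree (there is no recoloring or rotation), and struct node, tree_add(), tree_find_add(), key_less() and key_cmp() are hypothetical names chosen to mirror rb_add() and rb_find_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Plain binary tree node; the kernel's rb_node also rebalances. */
struct node {
	struct node *left, *right;
	int key;
};

static bool key_less(struct node *a, const struct node *b)
{
	return a->key < b->key;
}

static int key_cmp(struct node *a, const struct node *b)
{
	return (a->key > b->key) - (a->key < b->key);
}

/* less()-based insert: equal keys go right, so duplicates are allowed. */
static void tree_add(struct node *n, struct node **root,
		     bool (*less)(struct node *, const struct node *))
{
	struct node **link = root;

	while (*link)
		link = less(n, *link) ? &(*link)->left : &(*link)->right;

	n->left = n->right = NULL;
	*link = n;
}

/* cmp()-based insert: returns the existing match, or NULL after inserting. */
static struct node *tree_find_add(struct node *n, struct node **root,
				  int (*cmp)(struct node *, const struct node *))
{
	struct node **link = root;

	while (*link) {
		int c = cmp(n, *link);

		if (c < 0)
			link = &(*link)->left;
		else if (c > 0)
			link = &(*link)->right;
		else
			return *link;
	}

	n->left = n->right = NULL;
	*link = n;
	return NULL;
}
```

The less() variant never reports a match, so it suits queues where equal keys may coexist; the cmp() variant needs a three-way result to detect equality and therefore suits lookup-or-insert use.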

[tip: sched/core] rbtree, sched/deadline: Use rb_add_cached()

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 8ecca39483ed4e4e97096d0d6f8e25fdd323b189
Gitweb:
https://git.kernel.org/tip/8ecca39483ed4e4e97096d0d6f8e25fdd323b189
Author:Peter Zijlstra 
AuthorDate:Wed, 29 Apr 2020 17:04:41 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:07:44 +01:00

rbtree, sched/deadline: Use rb_add_cached()

Reduce rbtree boiler plate by using the new helpers.

Make rb_add_cached() / rb_erase_cached() return a pointer to the
leftmost node to aid in updating additional state.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Acked-by: Davidlohr Bueso 
---
 include/linux/rbtree.h  | 18 +++--
 kernel/sched/deadline.c | 77 ++--
 2 files changed, 42 insertions(+), 53 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index e0b300d..d31ecaf 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -141,12 +141,18 @@ static inline void rb_insert_color_cached(struct rb_node *node,
rb_insert_color(node, &root->rb_root);
 }
 
-static inline void rb_erase_cached(struct rb_node *node,
-  struct rb_root_cached *root)
+
+static inline struct rb_node *
+rb_erase_cached(struct rb_node *node, struct rb_root_cached *root)
 {
+   struct rb_node *leftmost = NULL;
+
if (root->rb_leftmost == node)
-   root->rb_leftmost = rb_next(node);
+   leftmost = root->rb_leftmost = rb_next(node);
+
rb_erase(node, &root->rb_root);
+
+   return leftmost;
 }
 
 static inline void rb_replace_node_cached(struct rb_node *victim,
@@ -179,8 +185,10 @@ static inline void rb_replace_node_cached(struct rb_node *victim,
  * @node: node to insert
  * @tree: leftmost cached tree to insert @node into
  * @less: operator defining the (partial) node order
+ *
+ * Returns @node when it is the new leftmost, or NULL.
  */
-static __always_inline void
+static __always_inline struct rb_node *
 rb_add_cached(struct rb_node *node, struct rb_root_cached *tree,
  bool (*less)(struct rb_node *, const struct rb_node *))
 {
@@ -200,6 +208,8 @@ rb_add_cached(struct rb_node *node, struct rb_root_cached *tree,
 
rb_link_node(node, parent, link);
rb_insert_color_cached(node, tree, leftmost);
+
+   return leftmost ? node : NULL;
 }
 
 /**
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5421782..1508d12 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -517,58 +517,44 @@ static void dec_dl_migration(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
update_dl_migration(dl_rq);
 }
 
+#define __node_2_pdl(node) \
+   rb_entry((node), struct task_struct, pushable_dl_tasks)
+
+static inline bool __pushable_less(struct rb_node *a, const struct rb_node *b)
+{
+   return dl_entity_preempt(&__node_2_pdl(a)->dl, &__node_2_pdl(b)->dl);
+}
+
 /*
  * The list of pushable -deadline task is not a plist, like in
  * sched_rt.c, it is an rb-tree with tasks ordered by deadline.
  */
 static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 {
-   struct dl_rq *dl_rq = &rq->dl;
-   struct rb_node **link = &dl_rq->pushable_dl_tasks_root.rb_root.rb_node;
-   struct rb_node *parent = NULL;
-   struct task_struct *entry;
-   bool leftmost = true;
+   struct rb_node *leftmost;
 
BUG_ON(!RB_EMPTY_NODE(&p->pushable_dl_tasks));
 
-   while (*link) {
-   parent = *link;
-   entry = rb_entry(parent, struct task_struct,
-pushable_dl_tasks);
-   if (dl_entity_preempt(&p->dl, &entry->dl))
-   link = &parent->rb_left;
-   else {
-   link = &parent->rb_right;
-   leftmost = false;
-   }
-   }
-
+   leftmost = rb_add_cached(&p->pushable_dl_tasks,
+&rq->dl.pushable_dl_tasks_root,
+__pushable_less);
if (leftmost)
-   dl_rq->earliest_dl.next = p->dl.deadline;
-
-   rb_link_node(&p->pushable_dl_tasks, parent, link);
-   rb_insert_color_cached(&p->pushable_dl_tasks,
-  &dl_rq->pushable_dl_tasks_root, leftmost);
+   rq->dl.earliest_dl.next = p->dl.deadline;
 }
 
 static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 {
struct dl_rq *dl_rq = &rq->dl;
+   struct rb_root_cached *root = &dl_rq->pushable_dl_tasks_root;
+   struct rb_node *leftmost;
 
if (RB_EMPTY_NODE(&p->pushable_dl_tasks))
return;
 
-   if (dl_rq->pushable_dl_tasks_root.rb_leftmost == &p->pushable_dl_tasks) {
-   struct rb_node *next_node;
-
-   next_node = rb_next(&p->pushable_dl_tasks);
-   if (next_node) {
-   dl_rq->earliest_dl.next = rb_entry(next_node,
-   struct
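
The commit message says the add helper now returns the leftmost node "to aid in updating additional state". A minimal userspace sketch of that contract, on a plain unbalanced tree rather than the kernel rbtree, could look as follows. The dl_node/dl_tree types and the dl_tree_add()/dl_enqueue() names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dl_node {
	struct dl_node *left, *right;
	uint64_t deadline;
};

struct dl_tree {
	struct dl_node *root;
	struct dl_node *leftmost;	/* cached earliest-deadline node */
	uint64_t earliest;		/* extra state derived from leftmost */
};

/* Insert n; return n when it became the new leftmost, else NULL. */
static struct dl_node *dl_tree_add(struct dl_node *n, struct dl_tree *tree)
{
	struct dl_node **link = &tree->root;
	bool leftmost = true;

	while (*link) {
		if (n->deadline < (*link)->deadline) {
			link = &(*link)->left;
		} else {
			link = &(*link)->right;
			leftmost = false;
		}
	}

	n->left = n->right = NULL;
	*link = n;
	if (leftmost)
		tree->leftmost = n;

	return leftmost ? n : NULL;
}

/* Caller-side pattern: refresh the derived state only when needed. */
static void dl_enqueue(struct dl_node *n, struct dl_tree *tree)
{
	if (dl_tree_add(n, tree))
		tree->earliest = n->deadline;
}
```

Returning the node instead of void lets the caller update its cached value without re-reading the tree or duplicating the descent loop, which is what the rewritten enqueue_pushable_dl_task() relies on.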

[tip: sched/core] sched/fair: Remove select_idle_smt()

2021-02-17 Thread tip-bot2 for Mel Gorman

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 6cd56ef1df399a004f90ecb682427f9964969fc9
Gitweb:
https://git.kernel.org/tip/6cd56ef1df399a004f90ecb682427f9964969fc9
Author:Mel Gorman 
AuthorDate:Mon, 25 Jan 2021 08:59:08 
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:06:59 +01:00

sched/fair: Remove select_idle_smt()

In order to make the next patch more readable, and to quantify the
actual effectiveness of this pass, start by removing it.

Signed-off-by: Mel Gorman 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Reviewed-by: Vincent Guittot 
Link: https://lkml.kernel.org/r/20210125085909.4600-4-mgor...@techsingularity.net
---
 kernel/sched/fair.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4c18ef6..6a0fc8a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6114,27 +6114,6 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
return -1;
 }
 
-/*
- * Scan the local SMT mask for idle CPUs.
- */
-static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-   int cpu;
-
-   if (!static_branch_likely(&sched_smt_present))
-   return -1;
-
-   for_each_cpu(cpu, cpu_smt_mask(target)) {
-   if (!cpumask_test_cpu(cpu, p->cpus_ptr) ||
-   !cpumask_test_cpu(cpu, sched_domain_span(sd)))
-   continue;
-   if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
-   return cpu;
-   }
-
-   return -1;
-}
-
 #else /* CONFIG_SCHED_SMT */
 
 static inline int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
@@ -6142,11 +6121,6 @@ static inline int select_idle_core(struct task_struct *p, struct sched_domain *s
return -1;
 }
 
-static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int target)
-{
-   return -1;
-}
-
 #endif /* CONFIG_SCHED_SMT */
 
 /*
@@ -6336,10 +6310,6 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
if ((unsigned)i < nr_cpumask_bits)
return i;
 
-   i = select_idle_smt(p, sd, target);
-   if ((unsigned)i < nr_cpumask_bits)
-   return i;
-
return target;
 }

[tip: sched/core] rbtree, rtmutex: Use rb_add_cached()

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 5a7987253ef0909d94e176cd97e511013de0fe19
Gitweb:
https://git.kernel.org/tip/5a7987253ef0909d94e176cd97e511013de0fe19
Author:Peter Zijlstra 
AuthorDate:Wed, 29 Apr 2020 17:29:58 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:07:57 +01:00

rbtree, rtmutex: Use rb_add_cached()

Reduce rbtree boiler plate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Acked-by: Davidlohr Bueso 
---
 kernel/locking/rtmutex.c | 54 +--
 1 file changed, 18 insertions(+), 36 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 2f8cd61..57e3804 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -267,27 +267,18 @@ rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
return 1;
 }
 
+#define __node_2_waiter(node) \
+   rb_entry((node), struct rt_mutex_waiter, tree_entry)
+
+static inline bool __waiter_less(struct rb_node *a, const struct rb_node *b)
+{
+   return rt_mutex_waiter_less(__node_2_waiter(a), __node_2_waiter(b));
+}
+
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
-   struct rb_node **link = &lock->waiters.rb_root.rb_node;
-   struct rb_node *parent = NULL;
-   struct rt_mutex_waiter *entry;
-   bool leftmost = true;
-
-   while (*link) {
-   parent = *link;
-   entry = rb_entry(parent, struct rt_mutex_waiter, tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry)) {
-   link = &parent->rb_left;
-   } else {
-   link = &parent->rb_right;
-   leftmost = false;
-   }
-   }
-
-   rb_link_node(&waiter->tree_entry, parent, link);
-   rb_insert_color_cached(&waiter->tree_entry, &lock->waiters, leftmost);
+   rb_add_cached(&waiter->tree_entry, &lock->waiters, __waiter_less);
 }
 
 static void
@@ -300,27 +291,18 @@ rt_mutex_dequeue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
RB_CLEAR_NODE(&waiter->tree_entry);
 }
 
+#define __node_2_pi_waiter(node) \
+   rb_entry((node), struct rt_mutex_waiter, pi_tree_entry)
+
+static inline bool __pi_waiter_less(struct rb_node *a, const struct rb_node *b)
+{
+   return rt_mutex_waiter_less(__node_2_pi_waiter(a), __node_2_pi_waiter(b));
+}
+
 static void
 rt_mutex_enqueue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
 {
-   struct rb_node **link = &task->pi_waiters.rb_root.rb_node;
-   struct rb_node *parent = NULL;
-   struct rt_mutex_waiter *entry;
-   bool leftmost = true;
-
-   while (*link) {
-   parent = *link;
-   entry = rb_entry(parent, struct rt_mutex_waiter, pi_tree_entry);
-   if (rt_mutex_waiter_less(waiter, entry)) {
-   link = &parent->rb_left;
-   } else {
-   link = &parent->rb_right;
-   leftmost = false;
-   }
-   }
-
-   rb_link_node(&waiter->pi_tree_entry, parent, link);
-   rb_insert_color_cached(&waiter->pi_tree_entry, &task->pi_waiters, leftmost);
+   rb_add_cached(&waiter->pi_tree_entry, &task->pi_waiters, __pi_waiter_less);
 }
 
 static void

[tip: sched/core] sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()

2021-02-17 Thread tip-bot2 for Dietmar Eggemann

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 71e5f6644fb2f3304fcb310145ded234a37e7cc1
Gitweb:
https://git.kernel.org/tip/71e5f6644fb2f3304fcb310145ded234a37e7cc1
Author:Dietmar Eggemann 
AuthorDate:Mon, 01 Feb 2021 10:53:53 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:05 +01:00

sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()

Commit "sched/topology: Make sched_init_numa() use a set for the
deduplicating sort" allocates 'i + nr_levels (level)' instead of
'i + nr_levels + 1' sched_domain_topology_level.

This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG):

sched_init_domains
  build_sched_domains()
__free_domain_allocs()
  __sdt_free() {
...
for_each_sd_topology(tl)
  ...
  sd = *per_cpu_ptr(sdd->sd, j); <--
  ...
  }

Signed-off-by: Dietmar Eggemann 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Tested-by: Vincent Guittot 
Tested-by: Barry Song 
Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a...@arm.com
---
 kernel/sched/topology.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index bf5c9bd..09d3504 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1702,7 +1702,7 @@ void sched_init_numa(void)
/* Compute default topology size */
for (i = 0; sched_domain_topology[i].mask; i++);
 
-   tl = kzalloc((i + nr_levels) *
+   tl = kzalloc((i + nr_levels + 1) *
sizeof(struct sched_domain_topology_level), GFP_KERNEL);
if (!tl)
return;
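
The "+ 1" matters because the topology table is walked until an entry with an empty mask is found, so the allocation needs one extra zeroed slot as terminator. A minimal userspace sketch of a sentinel-terminated table is below. The struct level type, its name field and both helpers are hypothetical, and calloc() stands in for kzalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A table entry; name == NULL marks the end of the table. */
struct level {
	const char *name;
};

/* Allocate room for n entries plus the zeroed terminator. */
static struct level *alloc_levels(size_t n)
{
	return calloc(n + 1, sizeof(struct level));
}

/* Walk the table until the terminator, like a for_each style macro. */
static size_t count_levels(const struct level *tl)
{
	size_t i;

	for (i = 0; tl[i].name; i++)
		;
	return i;
}
```

Allocating only n entries would leave the walk reading past the end of the buffer once all n slots are filled, which matches the Oops reported in the commit message.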

[tip: sched/core] sched: Remove MAX_USER_RT_PRIO

2021-02-17 Thread tip-bot2 for Dietmar Eggemann

The following commit has been merged into the sched/core branch of tip:

Commit-ID: ae18ad281e825993d190073d0ae2ea35dee27ee1
Gitweb:
https://git.kernel.org/tip/ae18ad281e825993d190073d0ae2ea35dee27ee1
Author:Dietmar Eggemann 
AuthorDate:Thu, 28 Jan 2021 14:10:38 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:11 +01:00

sched: Remove MAX_USER_RT_PRIO

Commit d46523ea32a7 ("[PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO")
was introduced due to a small time period in which the realtime patch
set was using different values for MAX_USER_RT_PRIO and MAX_RT_PRIO.

This is no longer true, i.e. now MAX_RT_PRIO == MAX_USER_RT_PRIO.

Get rid of MAX_USER_RT_PRIO and make everything use MAX_RT_PRIO
instead.

Signed-off-by: Dietmar Eggemann 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210128131040.296856-2-dietmar.eggem...@arm.com
---
 include/linux/sched/prio.h |  9 +--------
 kernel/sched/core.c        |  7 +++----
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched/prio.h b/include/linux/sched/prio.h
index 7d64fea..d111f2f 100644
--- a/include/linux/sched/prio.h
+++ b/include/linux/sched/prio.h
@@ -11,16 +11,9 @@
  * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
  * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1. Priority
  * values are inverted: lower p->prio value means higher priority.
- *
- * The MAX_USER_RT_PRIO value allows the actual maximum
- * RT priority to be separate from the value exported to
- * user-space.  This allows kernel threads to set their
- * priority to a value higher than any user task. Note:
- * MAX_RT_PRIO must not be smaller than MAX_USER_RT_PRIO.
  */
 
-#define MAX_USER_RT_PRIO   100
-#define MAX_RT_PRIO   MAX_USER_RT_PRIO
+#define MAX_RT_PRIO   100
 
 #define MAX_PRIO   (MAX_RT_PRIO + NICE_WIDTH)
 #define DEFAULT_PRIO   (MAX_RT_PRIO + NICE_WIDTH / 2)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6c789dc..f0b0b67 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5911,11 +5911,10 @@ recheck:
 
/*
 * Valid priorities for SCHED_FIFO and SCHED_RR are
-* 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL,
+* 1..MAX_RT_PRIO-1, valid priority for SCHED_NORMAL,
 * SCHED_BATCH and SCHED_IDLE is 0.
 */
-   if ((p->mm && attr->sched_priority > MAX_USER_RT_PRIO-1) ||
-   (!p->mm && attr->sched_priority > MAX_RT_PRIO-1))
+   if (attr->sched_priority > MAX_RT_PRIO-1)
return -EINVAL;
if ((dl_policy(policy) && !__checkparam_dl(attr)) ||
(rt_policy(policy) != (attr->sched_priority != 0)))
@@ -6983,7 +6982,7 @@ SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
switch (policy) {
case SCHED_FIFO:
case SCHED_RR:
-   ret = MAX_USER_RT_PRIO-1;
+   ret = MAX_RT_PRIO-1;
break;
case SCHED_DEADLINE:
case SCHED_NORMAL:
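
As a rough illustration of the simplified check, here is a standalone sketch. `check_sched_priority()` is a hypothetical helper, not a kernel function; it mirrors only the range test and the RT/non-RT test visible in the hunk above and deliberately ignores the SCHED_DEADLINE parameter check.

```c
#include <assert.h>
#include <errno.h>

#define MAX_RT_PRIO 100	/* value from include/linux/sched/prio.h */

/*
 * Hypothetical helper mirroring two checks from the hunk above.
 * is_rt: non-zero for SCHED_FIFO/SCHED_RR, zero for the other policies.
 */
static int check_sched_priority(int is_rt, unsigned int sched_priority)
{
	/* One upper bound for user tasks and kernel threads alike. */
	if (sched_priority > MAX_RT_PRIO - 1)
		return -EINVAL;

	/* RT policies need a non-zero priority, all others need 0. */
	if (!!is_rt != (sched_priority != 0))
		return -EINVAL;

	return 0;
}
```

With this sketch, priorities 1 to 99 pass for an RT policy, while 0 and 100 are rejected; for a non-RT policy only 0 passes.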

[tip: sched/core] static_call: Pull some static_call declarations to the type headers

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 880cfed3a012d7863f42251791cea7fe78c39390
Gitweb:
https://git.kernel.org/tip/880cfed3a012d7863f42251791cea7fe78c39390
Author:Peter Zijlstra 
AuthorDate:Mon, 18 Jan 2021 15:12:18 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:35 +01:00

static_call: Pull some static_call declarations to the type headers

Some static call declarations are going to be needed on low level header
files. Move the necessary material to the dedicated static call types
header to avoid inclusion dependency hell.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-4-frede...@kernel.org
---
 include/linux/static_call.h | 21 +---
 include/linux/static_call_types.h   | 27 -
 tools/include/linux/static_call_types.h | 27 -
 3 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 695da4c..a2c0645 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -107,26 +107,10 @@ extern void arch_static_call_transform(void *site, void 
*tramp, void *func, bool
 
 #define STATIC_CALL_TRAMP_ADDR(name) &STATIC_CALL_TRAMP(name)
 
-/*
- * __ADDRESSABLE() is used to ensure the key symbol doesn't get stripped from
- * the symbol table so that objtool can reference it when it generates the
- * .static_call_sites section.
- */
-#define __static_call(name)\
-({ \
-   __ADDRESSABLE(STATIC_CALL_KEY(name));   \
-   &STATIC_CALL_TRAMP(name);   \
-})
-
 #else
 #define STATIC_CALL_TRAMP_ADDR(name) NULL
 #endif
 
-
-#define DECLARE_STATIC_CALL(name, func)
\
-   extern struct static_call_key STATIC_CALL_KEY(name);\
-   extern typeof(func) STATIC_CALL_TRAMP(name);
-
 #define static_call_update(name, func) \
 ({ \
BUILD_BUG_ON(!__same_type(*(func), STATIC_CALL_TRAMP(name)));   \
@@ -174,7 +158,6 @@ extern int static_call_text_reserved(void *start, void 
*end);
};  \
ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)
 
-#define static_call(name)  __static_call(name)
 #define static_call_cond(name) (void)__static_call(name)
 
 #define EXPORT_STATIC_CALL(name)   \
@@ -207,7 +190,6 @@ struct static_call_key {
};  \
ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name)
 
-#define static_call(name)  __static_call(name)
 #define static_call_cond(name) (void)__static_call(name)
 
 static inline
@@ -252,9 +234,6 @@ struct static_call_key {
.func = NULL,   \
}
 
-#define static_call(name)  \
-   ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func))
-
 static inline void __static_call_nop(void) { }
 
 /*
diff --git a/include/linux/static_call_types.h 
b/include/linux/static_call_types.h
index 89135bb..08f78b1 100644
--- a/include/linux/static_call_types.h
+++ b/include/linux/static_call_types.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #define STATIC_CALL_KEY_PREFIX __SCK__
 #define STATIC_CALL_KEY_PREFIX_STR __stringify(STATIC_CALL_KEY_PREFIX)
@@ -32,4 +33,30 @@ struct static_call_site {
s32 key;
 };
 
+#define DECLARE_STATIC_CALL(name, func)
\
+   extern struct static_call_key STATIC_CALL_KEY(name);\
+   extern typeof(func) STATIC_CALL_TRAMP(name);
+
+#ifdef CONFIG_HAVE_STATIC_CALL
+
+/*
+ * __ADDRESSABLE() is used to ensure the key symbol doesn't get stripped from
+ * the symbol table so that objtool can reference it when it generates the
+ * .static_call_sites section.
+ */
+#define __static_call(name)\
+({ \
+   __ADDRESSABLE(STATIC_CALL_KEY(name));   \
+   &STATIC_CALL_TRAMP(name);   \
+})
+
+#define static_call(name)  __static_call(name)
+
+#else
+
+#define static_call(name)  \
+   ((typeof(STATIC_CALL_TRAMP(name))*)(STATIC_CALL_KEY(name).func))
+
+#endif /* CONFIG_HAVE_STATIC_CALL */
+
 #endif /* _STATIC_CALL_TYPES_H */
diff --git a/tools/include/linux/static_call_types.h
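
The fallback static_call() variant moved by this patch (the one used without CONFIG_HAVE_STATIC_CALL) boils down to calling through a function pointer stored in the key, cast back to the declared type. A minimal userspace sketch of that idea follows. All names here (`struct call_key`, `CALL`, `CALL_UPDATE`, `add_one`, `add_two`) are made up for illustration, and `__typeof__` is a GCC/Clang extension; this is not the kernel implementation.

```c
#include <assert.h>

/* Generic key: holds the current target as an untyped function pointer. */
struct call_key {
	void (*func)(void);
};

/* Cast the stored pointer back to the prototype's type before calling. */
#define CALL(key, proto)	((__typeof__(proto) *)((key).func))

/* Retarget the key; the caller is responsible for matching signatures. */
#define CALL_UPDATE(key, fn)	((key).func = (void (*)(void))(fn))

static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }

/* The key starts out pointing at add_one(). */
static struct call_key step_key = { (void (*)(void))add_one };
```

`CALL(step_key, add_one)(3)` yields 4 initially and 5 after `CALL_UPDATE(step_key, add_two)`. Unlike this sketch, the kernel's static_call_update() checks the type of the new target at build time.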

[tip: sched/core] preempt: Introduce CONFIG_PREEMPT_DYNAMIC

2021-02-17 Thread tip-bot2 for Michal Hocko

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 6ef869e0647439af0fc28dde162d33320d4e1dd7
Gitweb:
https://git.kernel.org/tip/6ef869e0647439af0fc28dde162d33320d4e1dd7
Author:Michal Hocko 
AuthorDate:Mon, 18 Jan 2021 15:12:19 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:24 +01:00

preempt: Introduce CONFIG_PREEMPT_DYNAMIC

Preemption mode selection is currently hardcoded on Kconfig choices.
Introduce a dedicated option to tune preemption flavour at boot time.

This will only be available on architectures that support static calls
efficiently, so that the feature is not offered where its additional
overhead might be prohibitive or undesirable.

CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if
the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE,
CONFIG_GENERIC_ENTRY, and an implementation of
__preempt_schedule_function() / __preempt_schedule_notrace_function()).

Suggested-by: Peter Zijlstra 
Signed-off-by: Michal Hocko 
Signed-off-by: Frederic Weisbecker 
[peterz: relax requirement to HAVE_STATIC_CALL]
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-5-frede...@kernel.org
---
 Documentation/admin-guide/kernel-parameters.txt |  7 ++-
 arch/Kconfig|  9 -
 arch/x86/Kconfig|  1 +-
 kernel/Kconfig.preempt  | 19 -
 4 files changed, 36 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a10b545..78ab294 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3916,6 +3916,13 @@
Format: {"off"}
Disable Hardware Transactional Memory
 
+   preempt=[KNL]
+   Select preemption mode if you have 
CONFIG_PREEMPT_DYNAMIC
+   none - Limited to cond_resched() calls
+   voluntary - Limited to cond_resched() and might_sleep() 
calls
+   full - Any section that isn't explicitly preempt 
disabled
+  can be preempted anytime.
+
print-fatal-signals=
[KNL] debug: print fatal signals
 
diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d1..1245079 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1090,6 +1090,15 @@ config HAVE_STATIC_CALL_INLINE
bool
depends on HAVE_STATIC_CALL
 
+config HAVE_PREEMPT_DYNAMIC
+   bool
+   depends on HAVE_STATIC_CALL
+   depends on GENERIC_ENTRY
+   help
+  Select this if the architecture support boot time preempt setting
+  on top of static calls. It is strongly advised to support inline
+  static call to avoid any overhead.
+
 config ARCH_WANT_LD_ORPHAN_WARN
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 21f8511..d3338a8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -224,6 +224,7 @@ config X86
select HAVE_STACK_VALIDATIONif X86_64
select HAVE_STATIC_CALL
select HAVE_STATIC_CALL_INLINE  if HAVE_STACK_VALIDATION
+   select HAVE_PREEMPT_DYNAMIC
select HAVE_RSEQ
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_UNSTABLE_SCHED_CLOCK
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index bf82259..4160173 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -40,6 +40,7 @@ config PREEMPT
depends on !ARCH_NO_PREEMPT
select PREEMPTION
select UNINLINE_SPIN_UNLOCK if !ARCH_INLINE_SPIN_UNLOCK
+   select PREEMPT_DYNAMIC if HAVE_PREEMPT_DYNAMIC
help
  This option reduces the latency of the kernel by making
  all kernel code (that is not executing in a critical section)
@@ -80,3 +81,21 @@ config PREEMPT_COUNT
 config PREEMPTION
bool
select PREEMPT_COUNT
+
+config PREEMPT_DYNAMIC
+   bool
+   help
+ This option allows to define the preemption model on the kernel
+ command line parameter and thus override the default preemption
+ model defined during compile time.
+
+ The feature is primarily interesting for Linux distributions which
+ provide a pre-built kernel binary to reduce the number of kernel
+ flavors they offer while still offering different usecases.
+
+ The runtime overhead is negligible with HAVE_STATIC_CALL_INLINE 
enabled
+ but if runtime patching is not available for the specific architecture
+ then the potential overhead should be considered.
+
+ Interesting if you want the same pre-built kernel should be used for
+ both Server and Desktop workloads.

[tip: sched/core] sched/core: Update task_prio() function header

2021-02-17 Thread tip-bot2 for Dietmar Eggemann

The following commit has been merged into the sched/core branch of tip:

Commit-ID: c541bb7835a306cdbbe8abbdf4e4df507e0ca27a
Gitweb:
https://git.kernel.org/tip/c541bb7835a306cdbbe8abbdf4e4df507e0ca27a
Author:Dietmar Eggemann 
AuthorDate:Thu, 28 Jan 2021 14:10:40 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:08:30 +01:00

sched/core: Update task_prio() function header

The description of the RT offset and the values for 'normal' tasks need
updating. Moreover, there are DL tasks now.
task_prio() has to stay as it is to guarantee compatibility with the
/proc/<pid>/stat priority field:

  # cat /proc/<pid>/stat | awk '{ print $18; }'

Signed-off-by: Dietmar Eggemann 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210128131040.296856-4-dietmar.eggem...@arm.com
---
 kernel/sched/core.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0b0b67..4afbdd2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5616,8 +5616,12 @@ SYSCALL_DEFINE1(nice, int, increment)
  * @p: the task in question.
  *
  * Return: The priority value as seen by users in /proc.
- * RT tasks are offset by -200. Normal tasks are centered
- * around 0, value goes from -16 to +15.
+ *
+ * sched policy         return value   kernel prio    user prio/nice
+ *
+ * normal, batch, idle     [0 ... 39]  [100 ... 139]          0/[-20 ... 19]
+ * fifo, rr             [-2 ... -100]     [98 ... 0]  [1 ... 99]
+ * deadline                     -101             -1           0
  */
 int task_prio(const struct task_struct *p)
 {
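
The table in the new comment can be cross-checked with a few lines of arithmetic. The sketch below assumes the return value is the kernel priority minus MAX_RT_PRIO, which is what the table implies. The helper names are hypothetical, and the two mapping helpers are inferred from the table's columns rather than copied from the kernel.

```c
#include <assert.h>

#define MAX_RT_PRIO	100
#define NICE_WIDTH	40
#define DEFAULT_PRIO	(MAX_RT_PRIO + NICE_WIDTH / 2)	/* 120 */

/* nice -20..19 maps to kernel prio 100..139 (assumed from the table). */
static int kernel_prio_from_nice(int nice)
{
	return DEFAULT_PRIO + nice;
}

/* user RT prio 1..99 maps to kernel prio 98..0 (assumed from the table). */
static int kernel_prio_from_rt(int rt_priority)
{
	return MAX_RT_PRIO - 1 - rt_priority;
}

/* The "return value" column: the kernel prio shifted down by MAX_RT_PRIO. */
static int task_prio_value(int kernel_prio)
{
	return kernel_prio - MAX_RT_PRIO;
}
```

A nice value of 0 therefore shows up as 20, the highest FIFO/RR priority (99) as -100, and a deadline task (kernel prio -1) as -101.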

[tip: sched/core] sched: Add /debug/sched_preempt

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: e59e10f8ef63d42fbb99776a5a112841e798b3b5
Gitweb:
https://git.kernel.org/tip/e59e10f8ef63d42fbb99776a5a112841e798b3b5
Author:Peter Zijlstra 
AuthorDate:Fri, 22 Jan 2021 13:01:58 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

sched: Add /debug/sched_preempt

Add a debugfs file to muck about with the preempt mode at runtime.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/yasgiuyf6nyat...@hirez.programming.kicks-ass.net
---
 kernel/sched/core.c | 135 ---
 1 file changed, 126 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c06717..4a17bb5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5363,37 +5363,154 @@ EXPORT_STATIC_CALL(preempt_schedule_notrace);
  *   preempt_schedule_notrace   <- preempt_schedule_notrace
  *   irqentry_exit_cond_resched <- irqentry_exit_cond_resched
  */
-static int __init setup_preempt_mode(char *str)
+
+enum {
+   preempt_dynamic_none = 0,
+   preempt_dynamic_voluntary,
+   preempt_dynamic_full,
+};
+
+static int preempt_dynamic_mode = preempt_dynamic_full;
+
+static int sched_dynamic_mode(const char *str)
 {
-   if (!strcmp(str, "none")) {
+   if (!strcmp(str, "none"))
+   return 0;
+
+   if (!strcmp(str, "voluntary"))
+   return 1;
+
+   if (!strcmp(str, "full"))
+   return 2;
+
+   return -1;
+}
+
+static void sched_dynamic_update(int mode)
+{
+   /*
+* Avoid {NONE,VOLUNTARY} -> FULL transitions from ever ending up in
+* the ZERO state, which is invalid.
+*/
+   static_call_update(cond_resched, __cond_resched);
+   static_call_update(might_resched, __cond_resched);
+   static_call_update(preempt_schedule, __preempt_schedule_func);
+   static_call_update(preempt_schedule_notrace, 
__preempt_schedule_notrace_func);
+   static_call_update(irqentry_exit_cond_resched, 
irqentry_exit_cond_resched);
+
+   switch (mode) {
+   case preempt_dynamic_none:
static_call_update(cond_resched, __cond_resched);
static_call_update(might_resched, (typeof(&__cond_resched)) 
__static_call_return0);
static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL);
static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL);
static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL);
-   pr_info("Dynamic Preempt: %s\n", str);
-   } else if (!strcmp(str, "voluntary")) {
+   pr_info("Dynamic Preempt: none\n");
+   break;
+
+   case preempt_dynamic_voluntary:
static_call_update(cond_resched, __cond_resched);
static_call_update(might_resched, __cond_resched);
static_call_update(preempt_schedule, (typeof(&preempt_schedule)) NULL);
static_call_update(preempt_schedule_notrace, (typeof(&preempt_schedule_notrace)) NULL);
static_call_update(irqentry_exit_cond_resched, (typeof(&irqentry_exit_cond_resched)) NULL);
-   pr_info("Dynamic Preempt: %s\n", str);
-   } else if (!strcmp(str, "full")) {
+   pr_info("Dynamic Preempt: voluntary\n");
+   break;
+
+   case preempt_dynamic_full:
static_call_update(cond_resched, (typeof(&__cond_resched)) 
__static_call_return0);
static_call_update(might_resched, (typeof(&__cond_resched)) 
__static_call_return0);
static_call_update(preempt_schedule, __preempt_schedule_func);
static_call_update(preempt_schedule_notrace, 
__preempt_schedule_notrace_func);
static_call_update(irqentry_exit_cond_resched, 
irqentry_exit_cond_resched);
-   pr_info("Dynamic Preempt: %s\n", str);
-   } else {
-   pr_warn("Dynamic Preempt: Unsupported preempt mode %s, default 
to full\n", str);
+   pr_info("Dynamic Preempt: full\n");
+   break;
+   }
+
+   preempt_dynamic_mode = mode;
+}
+
+static int __init setup_preempt_mode(char *str)
+{
+   int mode = sched_dynamic_mode(str);
+   if (mode < 0) {
+   pr_warn("Dynamic Preempt: unsupported mode: %s\n", str);
return 1;
}
+
+   sched_dynamic_update(mode);
return 0;
 }
 __setup("preempt=", setup_preempt_mode);
 
+#ifdef CONFIG_SCHED_DEBUG
+
+static ssize_t sched_dynamic_write(struct file *filp, const char __user *ubuf,
+  size_t cnt, loff_t *ppos)
+{
+   char buf[16];
+   int mode;
+
+   if (cnt > 15)
+   cnt = 15;
+
+   if (copy_from_user(&buf, ubuf, cnt))
+   return -EFAULT;
+
+   buf[cnt] = 0;
+   mode =
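
The patch is cut off above, so the rest of the write handler is not shown here. As a rough userspace sketch of the same idea (bounded copy, NUL termination, then string-to-mode lookup), something like the following could be used. `parse_mode()` and `parse_mode_buf()` are hypothetical names, `memcpy()` stands in for `copy_from_user()`, and the trailing-newline handling is an assumption of this sketch, not taken from the patch.

```c
#include <assert.h>
#include <string.h>

enum preempt_mode {
	MODE_NONE = 0,
	MODE_VOLUNTARY,
	MODE_FULL,
};

/* Map a mode name to its enum value, or -1 if it is not recognised. */
static int parse_mode(const char *str)
{
	if (!strcmp(str, "none"))
		return MODE_NONE;
	if (!strcmp(str, "voluntary"))
		return MODE_VOLUNTARY;
	if (!strcmp(str, "full"))
		return MODE_FULL;
	return -1;
}

/* Parse a possibly unterminated buffer of cnt bytes. */
static int parse_mode_buf(const char *ubuf, size_t cnt)
{
	char buf[16];
	size_t len;

	assert(ubuf != NULL);

	/* Clamp so that the terminating NUL always fits. */
	if (cnt > sizeof(buf) - 1)
		cnt = sizeof(buf) - 1;
	memcpy(buf, ubuf, cnt);
	buf[cnt] = '\0';

	/* Assumption: drop one trailing newline, as "echo full" appends one. */
	len = strlen(buf);
	if (len && buf[len - 1] == '\n')
		buf[len - 1] = '\0';

	return parse_mode(buf);
}
```

Input longer than 15 bytes is truncated before parsing, so it can never match one of the three mode names and is reported as unsupported.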

[tip: sched/core] preempt/dynamic: Provide preempt_schedule[_notrace]() static calls

2021-02-17 Thread tip-bot2 for Peter Zijlstra (Intel)

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 2c9a98d3bc808717ab63ad928a2b568967775388
Gitweb:
https://git.kernel.org/tip/2c9a98d3bc808717ab63ad928a2b568967775388
Author:Peter Zijlstra (Intel) 
AuthorDate:Mon, 18 Jan 2021 15:12:21 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

preempt/dynamic: Provide preempt_schedule[_notrace]() static calls

Provide static calls to control preempt_schedule[_notrace]()
(called in CONFIG_PREEMPT) so that we can override their behaviour when
preempt= is overridden.

Since the default behaviour is full preemption, both their calls are
initialized to the arch provided wrapper, if any.

[fweisbec: only define static calls when PREEMPT_DYNAMIC, make it less
   dependent on x86 with __preempt_schedule_func]
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-7-frede...@kernel.org
---
 arch/x86/include/asm/preempt.h | 34 +
 kernel/sched/core.c| 12 -
 2 files changed, 38 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 69485ca..9b12dce 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 DECLARE_PER_CPU(int, __preempt_count);
 
@@ -103,16 +104,33 @@ static __always_inline bool should_resched(int 
preempt_offset)
 }
 
 #ifdef CONFIG_PREEMPTION
-  extern asmlinkage void preempt_schedule_thunk(void);
-# define __preempt_schedule() \
-   asm volatile ("call preempt_schedule_thunk" : ASM_CALL_CONSTRAINT)
 
-  extern asmlinkage void preempt_schedule(void);
-  extern asmlinkage void preempt_schedule_notrace_thunk(void);
-# define __preempt_schedule_notrace() \
-   asm volatile ("call preempt_schedule_notrace_thunk" : 
ASM_CALL_CONSTRAINT)
+extern asmlinkage void preempt_schedule(void);
+extern asmlinkage void preempt_schedule_thunk(void);
+
+#define __preempt_schedule_func preempt_schedule_thunk
+
+DECLARE_STATIC_CALL(preempt_schedule, __preempt_schedule_func);
+
+#define __preempt_schedule() \
+do { \
+   __ADDRESSABLE(STATIC_CALL_KEY(preempt_schedule)); \
+   asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule) : 
ASM_CALL_CONSTRAINT); \
+} while (0)
+
+extern asmlinkage void preempt_schedule_notrace(void);
+extern asmlinkage void preempt_schedule_notrace_thunk(void);
+
+#define __preempt_schedule_notrace_func preempt_schedule_notrace_thunk
+
+DECLARE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func);
+
+#define __preempt_schedule_notrace() \
+do { \
+   __ADDRESSABLE(STATIC_CALL_KEY(preempt_schedule_notrace)); \
+   asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule_notrace) : 
ASM_CALL_CONSTRAINT); \
+} while (0)
 
-  extern asmlinkage void preempt_schedule_notrace(void);
 #endif
 
 #endif /* __ASM_PREEMPT_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f7c8fd8..880611c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5265,6 +5265,12 @@ asmlinkage __visible void __sched notrace 
preempt_schedule(void)
 NOKPROBE_SYMBOL(preempt_schedule);
 EXPORT_SYMBOL(preempt_schedule);
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_CALL(preempt_schedule, __preempt_schedule_func);
+EXPORT_STATIC_CALL(preempt_schedule);
+#endif
+
+
 /**
  * preempt_schedule_notrace - preempt_schedule called by tracing
  *
@@ -5317,6 +5323,12 @@ asmlinkage __visible void __sched notrace 
preempt_schedule_notrace(void)
 }
 EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
 
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func);
+EXPORT_STATIC_CALL(preempt_schedule_notrace);
+#endif
+
+
 #endif /* CONFIG_PREEMPTION */
 
 /*

[tip: sched/core] sched: Harden PREEMPT_DYNAMIC

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: ef72661e28c64ad610f89acc2832ec67b27ba438
Gitweb:
https://git.kernel.org/tip/ef72661e28c64ad610f89acc2832ec67b27ba438
Author:Peter Zijlstra 
AuthorDate:Mon, 25 Jan 2021 16:26:50 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

sched: Harden PREEMPT_DYNAMIC

Use the new EXPORT_STATIC_CALL_TRAMP() / static_call_mod() to unexport
the static_call_key for the PREEMPT_DYNAMIC calls such that modules
can no longer update these calls.

Having modules change/hi-jack the preemption calls would be horrible.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/preempt.h | 4 ++--
 include/linux/kernel.h | 2 +-
 include/linux/sched.h  | 2 +-
 kernel/sched/core.c| 8 
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 9b12dce..0aa96f8 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -114,7 +114,7 @@ DECLARE_STATIC_CALL(preempt_schedule, 
__preempt_schedule_func);
 
 #define __preempt_schedule() \
 do { \
-   __ADDRESSABLE(STATIC_CALL_KEY(preempt_schedule)); \
+   __STATIC_CALL_MOD_ADDRESSABLE(preempt_schedule); \
asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule) : 
ASM_CALL_CONSTRAINT); \
 } while (0)
 
@@ -127,7 +127,7 @@ DECLARE_STATIC_CALL(preempt_schedule_notrace, 
__preempt_schedule_notrace_func);
 
 #define __preempt_schedule_notrace() \
 do { \
-   __ADDRESSABLE(STATIC_CALL_KEY(preempt_schedule_notrace)); \
+   __STATIC_CALL_MOD_ADDRESSABLE(preempt_schedule_notrace); \
asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule_notrace) : 
ASM_CALL_CONSTRAINT); \
 } while (0)
 
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index cfd3d34..5b7ed6d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -93,7 +93,7 @@ DECLARE_STATIC_CALL(might_resched, __cond_resched);
 
 static __always_inline void might_resched(void)
 {
-   static_call(might_resched)();
+   static_call_mod(might_resched)();
 }
 
 #else
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2f35594..4d56828 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1880,7 +1880,7 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched);
 
 static __always_inline int _cond_resched(void)
 {
-   return static_call(cond_resched)();
+   return static_call_mod(cond_resched)();
 }
 
 #else
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4a17bb5..cec507b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5267,7 +5267,7 @@ EXPORT_SYMBOL(preempt_schedule);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 DEFINE_STATIC_CALL(preempt_schedule, __preempt_schedule_func);
-EXPORT_STATIC_CALL(preempt_schedule);
+EXPORT_STATIC_CALL_TRAMP(preempt_schedule);
 #endif
 
 
@@ -5325,7 +5325,7 @@ EXPORT_SYMBOL_GPL(preempt_schedule_notrace);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 DEFINE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func);
-EXPORT_STATIC_CALL(preempt_schedule_notrace);
+EXPORT_STATIC_CALL_TRAMP(preempt_schedule_notrace);
 #endif
 
 #endif /* CONFIG_PREEMPTION */
@@ -6997,10 +6997,10 @@ EXPORT_SYMBOL(__cond_resched);
 
 #ifdef CONFIG_PREEMPT_DYNAMIC
 DEFINE_STATIC_CALL_RET0(cond_resched, __cond_resched);
-EXPORT_STATIC_CALL(cond_resched);
+EXPORT_STATIC_CALL_TRAMP(cond_resched);
 
 DEFINE_STATIC_CALL_RET0(might_resched, __cond_resched);
-EXPORT_STATIC_CALL(might_resched);
+EXPORT_STATIC_CALL_TRAMP(might_resched);
 #endif
 
 /*

[tip: sched/core] preempt/dynamic: Provide irqentry_exit_cond_resched() static call

2021-02-17 Thread tip-bot2 for Peter Zijlstra (Intel)

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 40607ee97e4eec5655cc0f76a720bdc4c63a6434
Gitweb:
https://git.kernel.org/tip/40607ee97e4eec5655cc0f76a720bdc4c63a6434
Author:Peter Zijlstra (Intel) 
AuthorDate:Mon, 18 Jan 2021 15:12:22 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

preempt/dynamic: Provide irqentry_exit_cond_resched() static call

Provide static call to control IRQ preemption (called in CONFIG_PREEMPT)
so that we can override its behaviour when preempt= is overridden.

Since the default behaviour is full preemption, its call is
initialized to provide IRQ preemption when preempt= isn't passed.

Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210118141223.123667-8-frede...@kernel.org
---
 include/linux/entry-common.h |  4 
 kernel/entry/common.c| 10 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
index a104b29..883acef 100644
--- a/include/linux/entry-common.h
+++ b/include/linux/entry-common.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_ENTRYCOMMON_H
 #define __LINUX_ENTRYCOMMON_H
 
+#include 
 #include 
 #include 
 #include 
@@ -454,6 +455,9 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs 
*regs);
  * Conditional reschedule with additional sanity checks.
  */
 void irqentry_exit_cond_resched(void);
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DECLARE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+#endif
 
 /**
  * irqentry_exit - Handle return from exception that used irqentry_enter()
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index f9d491b..f09cae3 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -385,6 +385,9 @@ void irqentry_exit_cond_resched(void)
preempt_schedule_irq();
}
 }
+#ifdef CONFIG_PREEMPT_DYNAMIC
+DEFINE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched);
+#endif
 
 noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 {
@@ -411,8 +414,13 @@ noinstr void irqentry_exit(struct pt_regs *regs, 
irqentry_state_t state)
}
 
instrumentation_begin();
-   if (IS_ENABLED(CONFIG_PREEMPTION))
+   if (IS_ENABLED(CONFIG_PREEMPTION)) {
+#ifdef CONFIG_PREEMPT_DYNAMIC
+   static_call(irqentry_exit_cond_resched)();
+#else
irqentry_exit_cond_resched();
+#endif
+   }
/* Covers both tracing and lockdep */
trace_hardirqs_on();
instrumentation_end();

[tip: sched/core] sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()

2021-02-17 Thread tip-bot2 for Dietmar Eggemann

The following commit has been merged into the sched/core branch of tip:

Commit-ID: de40f33e788b0c016bfde512ace2f76339ef7ddb
Gitweb:
https://git.kernel.org/tip/de40f33e788b0c016bfde512ace2f76339ef7ddb
Author:Dietmar Eggemann 
AuthorDate:Tue, 19 Jan 2021 09:35:42 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()

dl_add_task_root_domain() is called during sched domain rebuild:

  rebuild_sched_domains_locked()
    partition_and_rebuild_sched_domains()
      rebuild_root_domains()
        for all top_cpuset descendants:
          update_tasks_root_domain()
            for all tasks of cpuset:
              dl_add_task_root_domain()

Change it so that only the task pi lock is taken to check if the task
has a SCHED_DEADLINE (DL) policy. If p is a DL task, take the rq lock
as well so that the root domain's DL bandwidth structure can be
dereferenced safely.

Most of the tasks will have another policy (namely SCHED_NORMAL) and
can now bail without taking the rq lock.

One thing to note here: even if there aren't any DL user tasks, a slow
frequency switching system with the schedutil cpufreq governor has a DL
task (sugov) running per frequency domain, which participates in DL
bandwidth management.

Signed-off-by: Dietmar Eggemann 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Reviewed-by: Quentin Perret 
Reviewed-by: Valentin Schneider 
Reviewed-by: Daniel Bristot de Oliveira 
Acked-by: Juri Lelli 
Link: https://lkml.kernel.org/r/20210119083542.19856-1-dietmar.eggem...@arm.com
---
 kernel/sched/deadline.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 1508d12..6f37796 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2388,9 +2388,13 @@ void dl_add_task_root_domain(struct task_struct *p)
struct rq *rq;
struct dl_bw *dl_b;
 
-   rq = task_rq_lock(p, &rf);
-   if (!dl_task(p))
-   goto unlock;
+   raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
+   if (!dl_task(p)) {
+   raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
+   return;
+   }
+
+   rq = __task_rq_lock(p, &rf);
 
dl_b = &rq->rd->dl_bw;
raw_spin_lock(&dl_b->lock);
@@ -2399,7 +2403,6 @@ void dl_add_task_root_domain(struct task_struct *p)
 
raw_spin_unlock(&dl_b->lock);
 
-unlock:
task_rq_unlock(rq, p, &rf);
 }

[tip: sched/core] sched/features: Fix hrtick reprogramming

2021-02-17 Thread tip-bot2 for Juri Lelli

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 156ec6f42b8d300dbbf382738ff35c8bad8f4c3a
Gitweb:
https://git.kernel.org/tip/156ec6f42b8d300dbbf382738ff35c8bad8f4c3a
Author:Juri Lelli 
AuthorDate:Mon, 08 Feb 2021 08:35:53 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

sched/features: Fix hrtick reprogramming

Hung tasks and RCU stall cases were reported on systems which were not
100% busy. Investigation of such unexpected cases (no sign of potential
starvation caused by tasks hogging the system) pointed out that the
periodic sched tick timer wasn't serviced anymore after a certain point
and that caused all machinery that depends on it (timers, RCU, etc.) to
stop working as well. This issue was, however, only reproducible if
HRTICK was enabled.

Looking at core dumps it was found that the rbtree of the hrtimer base
used also for the hrtick was corrupted (i.e. next as seen from the base
root and actual leftmost obtained by traversing the tree are different).
The same base is also used for the periodic tick hrtimer, which might
get "lost" if the rbtree gets corrupted.

Much like what is described in commit 1f71addd34f4c ("tick/sched: Do not
mess with an enqueued hrtimer"), there is a race window between
hrtimer_set_expires() in hrtick_start() and hrtimer_start_expires() in
__hrtick_restart() in which the former might be operating on an already
queued hrtick hrtimer, which might lead to corruption of the base.

Use hrtimer_start() (which removes the timer before enqueuing it back) to
ensure hrtick hrtimer reprogramming is entirely guarded by the base
lock, so that no race conditions can occur.

Signed-off-by: Juri Lelli 
Signed-off-by: Luis Claudio R. Goncalves 
Signed-off-by: Daniel Bristot de Oliveira 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210208073554.14629-2-juri.le...@redhat.com
---
 kernel/sched/core.c  | 8 +++-
 kernel/sched/sched.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cec507b..18d51ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -355,8 +355,9 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
 static void __hrtick_restart(struct rq *rq)
 {
struct hrtimer *timer = &rq->hrtick_timer;
+   ktime_t time = rq->hrtick_time;
 
-   hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED_HARD);
+   hrtimer_start(timer, time, HRTIMER_MODE_ABS_PINNED_HARD);
 }
 
 /*
@@ -380,7 +381,6 @@ static void __hrtick_start(void *arg)
 void hrtick_start(struct rq *rq, u64 delay)
 {
struct hrtimer *timer = &rq->hrtick_timer;
-   ktime_t time;
s64 delta;
 
/*
@@ -388,9 +388,7 @@ void hrtick_start(struct rq *rq, u64 delay)
 * doesn't make sense and can cause timer DoS.
 */
delta = max_t(s64, delay, 1LL);
-   time = ktime_add_ns(timer->base->get_time(), delta);
-
-   hrtimer_set_expires(timer, time);
+   rq->hrtick_time = ktime_add_ns(timer->base->get_time(), delta);
 
if (rq == this_rq())
__hrtick_restart(rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2185b3b..0dfdd52 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1031,6 +1031,7 @@ struct rq {
call_single_data_t  hrtick_csd;
 #endif
struct hrtimer  hrtick_timer;
+   ktime_t hrtick_time;
 #endif
 
 #ifdef CONFIG_SCHEDSTATS
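
The race described in the commit message above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: a sorted singly linked list stands in for the hrtimer base rbtree, every model_* name is hypothetical, and the base lock is omitted (in the kernel, the remove and re-enqueue steps happen under it). It shows why changing the expiry of an already queued timer breaks the ordering invariant, while removing the timer before re-enqueuing it does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a timer; not the kernel's struct hrtimer. */
struct model_timer {
	uint64_t expires;
	int queued;
	struct model_timer *next;
};

/* Hypothetical model of a timer base, kept sorted by expiry. */
struct model_base {
	struct model_timer *first;	/* earliest expiry, like the rbtree leftmost */
};

/* Unlink a timer from the base if it is present. */
static void model_remove(struct model_base *base, struct model_timer *t)
{
	struct model_timer **pp = &base->first;

	while (*pp && *pp != t)
		pp = &(*pp)->next;
	if (*pp)
		*pp = t->next;
	t->next = NULL;
	t->queued = 0;
}

/* Insert a timer at the position matching its expiry. */
static void model_enqueue(struct model_base *base, struct model_timer *t)
{
	struct model_timer **pp = &base->first;

	while (*pp && (*pp)->expires <= t->expires)
		pp = &(*pp)->next;
	t->next = *pp;
	*pp = t;
	t->queued = 1;
}

/* Safe reprogramming: remove first, then update the expiry and re-enqueue. */
static void model_timer_start(struct model_base *base, struct model_timer *t,
			      uint64_t expires)
{
	if (t->queued)
		model_remove(base, t);
	t->expires = expires;
	model_enqueue(base, t);
}

/* Unsafe reprogramming: touches the expiry of a possibly queued timer. */
static void model_set_expires_unsafe(struct model_timer *t, uint64_t expires)
{
	t->expires = expires;
}

/* Returns 1 if the base is still ordered by expiry, 0 otherwise. */
static int model_base_is_sorted(const struct model_base *base)
{
	const struct model_timer *t;

	for (t = base->first; t && t->next; t = t->next)
		if (t->expires > t->next->expires)
			return 0;
	return 1;
}
```

In this model, calling model_timer_start() repeatedly keeps the list ordered, whereas model_set_expires_unsafe() on a queued timer leaves a stale position behind, which is the analogue of the "next differs from actual leftmost" corruption seen in the core dumps.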

[tip: sched/core] rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers

2021-02-17 Thread tip-bot2 for Frederic Weisbecker

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 54b7429efffc99e845ba9381bee3244f012a06c2
Gitweb:
https://git.kernel.org/tip/54b7429efffc99e845ba9381bee3244f012a06c2
Author:Frederic Weisbecker 
AuthorDate:Mon, 01 Feb 2021 00:05:44 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers

Deferred wakeup of rcuog kthreads upon RCU idle mode entry is going to
be handled differently depending on whether it is initiated by idle,
user or guest. Prepare by pulling that control up to rcu_eqs_enter()
callers.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-2-frede...@kernel.org
---
 kernel/rcu/tree.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 40e5e3d..63032e5 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -644,7 +644,6 @@ static noinstr void rcu_eqs_enter(bool user)
trace_rcu_dyntick(TPS("Start"), rdp->dynticks_nesting, 0, 
atomic_read(&rdp->dynticks));
WARN_ON_ONCE(IS_ENABLED(CONFIG_RCU_EQS_DEBUG) && !user && 
!is_idle_task(current));
rdp = this_cpu_ptr(&rcu_data);
-   do_nocb_deferred_wakeup(rdp);
rcu_prepare_for_idle();
rcu_preempt_deferred_qs(current);
 
@@ -672,7 +671,10 @@ static noinstr void rcu_eqs_enter(bool user)
  */
 void rcu_idle_enter(void)
 {
+   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
+
lockdep_assert_irqs_disabled();
+   do_nocb_deferred_wakeup(rdp);
rcu_eqs_enter(false);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
@@ -691,7 +693,14 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
  */
 noinstr void rcu_user_enter(void)
 {
+   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
+
lockdep_assert_irqs_disabled();
+
+   instrumentation_begin();
+   do_nocb_deferred_wakeup(rdp);
+   instrumentation_end();
+
rcu_eqs_enter(true);
 }
 #endif /* CONFIG_NO_HZ_FULL */

[tip: sched/core] smp: Process pending softirqs in flush_smp_call_function_from_idle()

2021-02-17 Thread tip-bot2 for Sebastian Andrzej Siewior

The following commit has been merged into the sched/core branch of tip:

Commit-ID: f9d34595ae4feed38856b88769e2ba5af22d2548
Gitweb:
https://git.kernel.org/tip/f9d34595ae4feed38856b88769e2ba5af22d2548
Author:Sebastian Andrzej Siewior 
AuthorDate:Sat, 23 Jan 2021 21:10:25 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:42 +01:00

smp: Process pending softirqs in flush_smp_call_function_from_idle()

send_call_function_single_ipi() may wake an idle CPU without sending an
IPI. The woken up CPU will process the SMP-functions in
flush_smp_call_function_from_idle(). Any raised softirq from within the
SMP-function call will not be processed.
Should the CPU have no tasks assigned, then it will go back to idle with
pending softirqs and the NOHZ will rightfully complain.

Process pending softirqs on return from flush_smp_call_function_queue().

Fixes: b2a02fc43a1f4 ("smp: Optimize send_call_function_single_ipi()")
Reported-by: Jens Axboe 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Link: https://lkml.kernel.org/r/20210123201027.3262800-2-bige...@linutronix.de
---
 kernel/smp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 1b6070b..aeb0adf 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -449,6 +450,9 @@ void flush_smp_call_function_from_idle(void)
 
local_irq_save(flags);
flush_smp_call_function_queue(true);
+   if (local_softirq_pending())
+   do_softirq();
+
local_irq_restore(flags);
 }
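
The behaviour this patch adds can be sketched in a user-space model. This is an illustrative sketch only: the model_* names, the pending bitmask and the fixed-size queue are assumptions, not kernel interfaces. It shows the flush path running queued functions and then processing any softirq those functions raised, so the CPU does not return to idle with work pending.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_CALLS 8

typedef void (*model_call_fn)(void);

static unsigned int model_softirq_pending;	/* bit n set: softirq n raised */
static unsigned int model_softirq_handled;	/* bit n set: softirq n processed */
static model_call_fn model_queue[MODEL_MAX_CALLS];
static size_t model_queue_len;

/* Mark a softirq as raised, as a queued function might do. */
static void model_raise_softirq(unsigned int nr)
{
	model_softirq_pending |= 1u << nr;
}

/* Queue a function for the "remote" CPU; returns 0 on success, -1 if full. */
static int model_queue_call(model_call_fn fn)
{
	if (model_queue_len == MODEL_MAX_CALLS)
		return -1;
	model_queue[model_queue_len++] = fn;
	return 0;
}

/* Run every queued function once and empty the queue. */
static void model_flush_call_queue(void)
{
	size_t i;

	for (i = 0; i < model_queue_len; i++)
		model_queue[i]();
	model_queue_len = 0;
}

/* Process everything that is pending. */
static void model_do_softirq(void)
{
	model_softirq_handled |= model_softirq_pending;
	model_softirq_pending = 0;
}

/* Flush from idle: run the queue, then handle softirqs it raised. */
static void model_flush_from_idle(void)
{
	model_flush_call_queue();
	if (model_softirq_pending)
		model_do_softirq();
}

/* Example queued function that raises (hypothetical) softirq number 3. */
static void model_example_call(void)
{
	model_raise_softirq(3);
}
```

Without the pending check in model_flush_from_idle(), bit 3 would still be set in model_softirq_pending when the flush returns, which corresponds to the state NOHZ complains about.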

[tip: sched/core] rcu/nocb: Perform deferred wake up before last idle's need_resched() check

2021-02-17 Thread tip-bot2 for Frederic Weisbecker

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 43789ef3f7d61aa7bed0cb2764e588fc990c30ef
Gitweb:
https://git.kernel.org/tip/43789ef3f7d61aa7bed0cb2764e588fc990c30ef
Author:Frederic Weisbecker 
AuthorDate:Mon, 01 Feb 2021 00:05:45 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:43 +01:00

rcu/nocb: Perform deferred wake up before last idle's need_resched() check

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a local wake up happening while running the idle task is handled
in one of the need_resched() checks carefully placed within the idle
loop that can break to the scheduler.

Unfortunately the call to rcu_idle_enter() is already beyond the last
generic need_resched() check and we may halt the CPU with a resched
request unhandled, leaving the task hanging.

Fix this by splitting the rcuog wakeup handling from rcu_idle_enter()
and placing it before the last generic need_resched() check in the idle
loop. It is then assumed that no call to call_rcu() will be performed
after that in the idle loop until the CPU is put in low power mode.

Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-3-frede...@kernel.org
---
 include/linux/rcupdate.h | 2 ++
 kernel/rcu/tree.c| 3 ---
 kernel/rcu/tree_plugin.h | 5 +
 kernel/sched/idle.c  | 1 +
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index fd02c5f..36c2119 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -110,8 +110,10 @@ static inline void rcu_user_exit(void) { }
 
 #ifdef CONFIG_RCU_NOCB_CPU
 void rcu_init_nohz(void);
+void rcu_nocb_flush_deferred_wakeup(void);
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 static inline void rcu_init_nohz(void) { }
+static inline void rcu_nocb_flush_deferred_wakeup(void) { }
 #endif /* #else #ifdef CONFIG_RCU_NOCB_CPU */
 
 /**
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 63032e5..82838e9 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -671,10 +671,7 @@ static noinstr void rcu_eqs_enter(bool user)
  */
 void rcu_idle_enter(void)
 {
-   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
-
lockdep_assert_irqs_disabled();
-   do_nocb_deferred_wakeup(rdp);
rcu_eqs_enter(false);
 }
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 7e291ce..d5b38c2 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -2187,6 +2187,11 @@ static void do_nocb_deferred_wakeup(struct rcu_data *rdp)
do_nocb_deferred_wakeup_common(rdp);
 }
 
+void rcu_nocb_flush_deferred_wakeup(void)
+{
+   do_nocb_deferred_wakeup(this_cpu_ptr(&rcu_data));
+}
+
 void __init rcu_init_nohz(void)
 {
int cpu;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 305727e..7199e6f 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -285,6 +285,7 @@ static void do_idle(void)
}
 
arch_cpu_idle_enter();
+   rcu_nocb_flush_deferred_wakeup();
 
/*
 * In poll mode we reenable interrupts and spin. Also if we
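
The ordering problem fixed by this patch can be shown with a minimal user-space model. It is a sketch under assumptions: struct model_cpu and the model_* functions are hypothetical, and a deferred wakeup is reduced to a flag that sets need_resched when flushed. The only point is that the flush has to come before the last need_resched() check, otherwise the CPU halts with a resched request unhandled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the model. */
struct model_cpu {
	bool deferred_wakeup;	/* a wakeup was postponed earlier */
	bool need_resched;	/* a woken task is waiting to run */
};

/* Perform the postponed wakeup; waking a task requests a reschedule. */
static void model_flush_deferred_wakeup(struct model_cpu *cpu)
{
	if (cpu->deferred_wakeup) {
		cpu->deferred_wakeup = false;
		cpu->need_resched = true;
	}
}

/*
 * Old ordering: the last need_resched check runs first, the flush
 * happens afterwards. Returns true if the CPU decides to halt.
 */
static bool model_idle_flush_late(struct model_cpu *cpu)
{
	bool halt = !cpu->need_resched;

	model_flush_deferred_wakeup(cpu);
	return halt;
}

/*
 * New ordering: flush first, then do the last need_resched check.
 * Returns true if the CPU decides to halt.
 */
static bool model_idle_flush_early(struct model_cpu *cpu)
{
	model_flush_deferred_wakeup(cpu);
	return !cpu->need_resched;
}
```

With a deferred wakeup pending, model_idle_flush_late() halts even though need_resched ends up set, while model_idle_flush_early() returns to the scheduler.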

[tip: sched/core] sched,x86: Allow !PREEMPT_DYNAMIC

2021-02-17 Thread tip-bot2 for Peter Zijlstra

The following commit has been merged into the sched/core branch of tip:

Commit-ID: c5e6fc08feb2b88dc5dac2f3c817e1c2a4cafda4
Gitweb:
https://git.kernel.org/tip/c5e6fc08feb2b88dc5dac2f3c817e1c2a4cafda4
Author:Peter Zijlstra 
AuthorDate:Tue, 09 Feb 2021 22:02:33 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:43 +01:00

sched,x86: Allow !PREEMPT_DYNAMIC

Allow building x86 with PREEMPT_DYNAMIC=n; this is needed for
PREEMPT_RT as it makes no sense to not have full preemption on
PREEMPT_RT.

Fixes: 8c98e8cf723c ("preempt/dynamic: Provide preempt_schedule[_notrace]() static calls")
Reported-by: Mike Galbraith 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Tested-by: Mike Galbraith 
Link: https://lkml.kernel.org/r/yck1+jyfnxqnw...@hirez.programming.kicks-ass.net
---
 arch/x86/include/asm/preempt.h | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
index 0aa96f8..f8cb8af 100644
--- a/arch/x86/include/asm/preempt.h
+++ b/arch/x86/include/asm/preempt.h
@@ -110,6 +110,13 @@ extern asmlinkage void preempt_schedule_thunk(void);
 
 #define __preempt_schedule_func preempt_schedule_thunk
 
+extern asmlinkage void preempt_schedule_notrace(void);
+extern asmlinkage void preempt_schedule_notrace_thunk(void);
+
+#define __preempt_schedule_notrace_func preempt_schedule_notrace_thunk
+
+#ifdef CONFIG_PREEMPT_DYNAMIC
+
 DECLARE_STATIC_CALL(preempt_schedule, __preempt_schedule_func);
 
 #define __preempt_schedule() \
@@ -118,11 +125,6 @@ do { \
asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule) : 
ASM_CALL_CONSTRAINT); \
 } while (0)
 
-extern asmlinkage void preempt_schedule_notrace(void);
-extern asmlinkage void preempt_schedule_notrace_thunk(void);
-
-#define __preempt_schedule_notrace_func preempt_schedule_notrace_thunk
-
 DECLARE_STATIC_CALL(preempt_schedule_notrace, __preempt_schedule_notrace_func);
 
 #define __preempt_schedule_notrace() \
@@ -131,6 +133,16 @@ do { \
asm volatile ("call " STATIC_CALL_TRAMP_STR(preempt_schedule_notrace) : 
ASM_CALL_CONSTRAINT); \
 } while (0)
 
-#endif
+#else /* PREEMPT_DYNAMIC */
+
+#define __preempt_schedule() \
+   asm volatile ("call preempt_schedule_thunk" : ASM_CALL_CONSTRAINT);
+
+#define __preempt_schedule_notrace() \
+   asm volatile ("call preempt_schedule_notrace_thunk" : 
ASM_CALL_CONSTRAINT);
+
+#endif /* PREEMPT_DYNAMIC */
+
+#endif /* PREEMPTION */
 
 #endif /* __ASM_PREEMPT_H */

[tip: sched/core] entry: Explicitly flush pending rcuog wakeup before last rescheduling point

2021-02-17 Thread tip-bot2 for Frederic Weisbecker

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 47b8ff194c1fd73d58dc339b597d466fe48c8958
Gitweb:
https://git.kernel.org/tip/47b8ff194c1fd73d58dc339b597d466fe48c8958
Author:Frederic Weisbecker 
AuthorDate:Mon, 01 Feb 2021 00:05:47 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:43 +01:00

entry: Explicitly flush pending rcuog wakeup before last rescheduling point

Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point on resuming to user mode. This
way we can avoid doing it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-5-frede...@kernel.org
---
 kernel/entry/common.c |  7 +++
 kernel/rcu/tree.c | 12 +++-
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index f09cae3..8442e5c 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -184,6 +184,10 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs 
*regs,
 * enabled above.
 */
local_irq_disable_exit_to_user();
+
+   /* Check if any of the above work has queued a deferred wakeup 
*/
+   rcu_nocb_flush_deferred_wakeup();
+
ti_work = READ_ONCE(current_thread_info()->flags);
}
 
@@ -197,6 +201,9 @@ static void exit_to_user_mode_prepare(struct pt_regs *regs)
 
lockdep_assert_irqs_disabled();
 
+   /* Flush pending rcuog wakeup before the last need_resched() check */
+   rcu_nocb_flush_deferred_wakeup();
+
if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
ti_work = exit_to_user_mode_loop(regs, ti_work);
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4b1e5bd..2ebc211 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -707,13 +707,15 @@ noinstr void rcu_user_enter(void)
lockdep_assert_irqs_disabled();
 
/*
-* We may be past the last rescheduling opportunity in the entry code.
-* Trigger a self IPI that will fire and reschedule once we resume to
-* user/guest mode.
+* Other than generic entry implementation, we may be past the last
+* rescheduling opportunity in the entry code. Trigger a self IPI
+* that will fire and reschedule once we resume in user/guest mode.
 */
instrumentation_begin();
-   if (do_nocb_deferred_wakeup(rdp) && need_resched())
-   irq_work_queue(this_cpu_ptr(&late_wakeup_work));
+   if (!IS_ENABLED(CONFIG_GENERIC_ENTRY) || (current->flags & PF_VCPU)) {
+   if (do_nocb_deferred_wakeup(rdp) && need_resched())
+   irq_work_queue(this_cpu_ptr(&late_wakeup_work));
+   }
instrumentation_end();
 
rcu_eqs_enter(true);

[tip: sched/core] rcu/nocb: Trigger self-IPI on late deferred wake up before user resume

2021-02-17 Thread tip-bot2 for Frederic Weisbecker

The following commit has been merged into the sched/core branch of tip:

Commit-ID: f8bb5cae9616224a39cbb399de382d36ac41df10
Gitweb:
https://git.kernel.org/tip/f8bb5cae9616224a39cbb399de382d36ac41df10
Author:Frederic Weisbecker 
AuthorDate:Mon, 01 Feb 2021 00:05:46 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:43 +01:00

rcu/nocb: Trigger self-IPI on late deferred wake up before user resume

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Unfortunately the call to rcu_user_enter() is already past the last
rescheduling opportunity before we resume to userspace or to guest mode.
We may escape there with the woken task ignored.

The ultimate resort to fix every callsite is to trigger a self-IPI
(nohz_full depends on arch to implement arch_irq_work_raise()) that will
trigger a reschedule on IRQ tail or guest exit.

Eventually every site that wants a saner treatment will need to carefully
place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit
need_resched() check upon resume.

Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-4-frede...@kernel.org
---
 kernel/rcu/tree.c| 21 -
 kernel/rcu/tree.h|  2 +-
 kernel/rcu/tree_plugin.h | 25 -
 3 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 82838e9..4b1e5bd 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -677,6 +677,18 @@ void rcu_idle_enter(void)
 EXPORT_SYMBOL_GPL(rcu_idle_enter);
 
 #ifdef CONFIG_NO_HZ_FULL
+
+/*
+ * An empty function that will trigger a reschedule on
+ * IRQ tail once IRQs get re-enabled on userspace resume.
+ */
+static void late_wakeup_func(struct irq_work *work)
+{
+}
+
+static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
+   IRQ_WORK_INIT(late_wakeup_func);
+
 /**
  * rcu_user_enter - inform RCU that we are resuming userspace.
  *
@@ -694,12 +706,19 @@ noinstr void rcu_user_enter(void)
 
lockdep_assert_irqs_disabled();
 
+   /*
+* We may be past the last rescheduling opportunity in the entry code.
+* Trigger a self IPI that will fire and reschedule once we resume to
+* user/guest mode.
+*/
instrumentation_begin();
-   do_nocb_deferred_wakeup(rdp);
+   if (do_nocb_deferred_wakeup(rdp) && need_resched())
+   irq_work_queue(this_cpu_ptr(&late_wakeup_work));
instrumentation_end();
 
rcu_eqs_enter(true);
 }
+
 #endif /* CONFIG_NO_HZ_FULL */
 
 /**
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 7708ed1..9226f40 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -433,7 +433,7 @@ static bool rcu_nocb_try_bypass(struct rcu_data *rdp, 
struct rcu_head *rhp,
 static void __call_rcu_nocb_wake(struct rcu_data *rdp, bool was_empty,
 unsigned long flags);
 static int rcu_nocb_need_deferred_wakeup(struct rcu_data *rdp);
-static void do_nocb_deferred_wakeup(struct rcu_data *rdp);
+static bool do_nocb_deferred_wakeup(struct rcu_data *rdp);
 static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
 static void rcu_spawn_cpu_nocb_kthread(int cpu);
 static void __init rcu_spawn_nocb_kthreads(void);
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index d5b38c2..384856e 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1631,8 +1631,8 @@ bool rcu_is_nocb_cpu(int cpu)
  * Kick the GP kthread for this NOCB group.  Caller holds ->nocb_lock
  * and this function releases it.
  */
-static void wake_nocb_gp(struct rcu_data *rdp, bool force,
-  unsigned long flags)
+static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
+unsigned long flags)
__releases(rdp->nocb_lock)
 {
bool needwake = false;
@@ -1643,7 +1643,7 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
trace_rcu_nocb_wake(rcu_state.name, rdp->cpu,
TPS("AlreadyAwake"));
rcu_nocb_unlock_irqrestore(rdp, flags);
-   return;
+   return false;
}
del_timer(&rdp->nocb_timer);
rcu_nocb_unlock_irqrestore(rdp, flags);
@@ -1656,6 +1656,8 @@ static void wake_nocb_gp(struct rcu_data *rdp, bool force,
raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
if (needwake)
wake_up_process(rdp_gp->nocb_gp_kthread);
+
+   return needwake;
 }
 
 /*
@@ -2152,20 +2154,23 @@ static int rcu_nocb_need_deferred_wakeup(struct 
rcu_data *rdp)
 }
 
 /* Do a deferred wakeup of rcu_nocb_kthread(). */
-static void

[tip: sched/core] entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point

2021-02-17 Thread tip-bot2 for Frederic Weisbecker

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 4ae7dc97f726ea95c58ac58af71cc034ad22d7de
Gitweb:
https://git.kernel.org/tip/4ae7dc97f726ea95c58ac58af71cc034ad22d7de
Author:Frederic Weisbecker 
AuthorDate:Mon, 01 Feb 2021 00:05:48 +01:00
Committer: Ingo Molnar 
CommitterDate: Wed, 17 Feb 2021 14:12:43 +01:00

entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point

Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point upon resuming to guest mode. This
way we can avoid doing it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Suggested-by: Peter Zijlstra 
Signed-off-by: Frederic Weisbecker 
Signed-off-by: Peter Zijlstra (Intel) 
Signed-off-by: Ingo Molnar 
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-6-frede...@kernel.org
---
 arch/x86/kvm/x86.c|  1 +-
 include/linux/entry-kvm.h | 14 -
 kernel/rcu/tree.c | 44 +-
 kernel/rcu/tree_plugin.h  |  1 +-
 4 files changed, 50 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1b404e4..b967c1c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1782,6 +1782,7 @@ EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
 
 bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
 {
+   xfer_to_guest_mode_prepare();
return vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu) ||
xfer_to_guest_mode_work_pending();
 }
diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h
index 9b93f85..8b2b1d6 100644
--- a/include/linux/entry-kvm.h
+++ b/include/linux/entry-kvm.h
@@ -47,6 +47,20 @@ static inline int arch_xfer_to_guest_mode_handle_work(struct 
kvm_vcpu *vcpu,
 int xfer_to_guest_mode_handle_work(struct kvm_vcpu *vcpu);
 
 /**
+ * xfer_to_guest_mode_prepare - Perform last minute preparation work that
+ * need to be handled while IRQs are disabled
+ * upon entering to guest.
+ *
+ * Has to be invoked with interrupts disabled before the last call
+ * to xfer_to_guest_mode_work_pending().
+ */
+static inline void xfer_to_guest_mode_prepare(void)
+{
+   lockdep_assert_irqs_disabled();
+   rcu_nocb_flush_deferred_wakeup();
+}
+
+/**
  * __xfer_to_guest_mode_work_pending - Check if work is pending
  *
  * Returns: True if work pending, False otherwise.
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2ebc211..ce17b84 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -678,9 +678,10 @@ EXPORT_SYMBOL_GPL(rcu_idle_enter);
 
 #ifdef CONFIG_NO_HZ_FULL
 
+#if !defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK)
 /*
  * An empty function that will trigger a reschedule on
- * IRQ tail once IRQs get re-enabled on userspace resume.
+ * IRQ tail once IRQs get re-enabled on userspace/guest resume.
  */
 static void late_wakeup_func(struct irq_work *work)
 {
@@ -689,6 +690,37 @@ static void late_wakeup_func(struct irq_work *work)
 static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
IRQ_WORK_INIT(late_wakeup_func);
 
+/*
+ * If either:
+ *
+ * 1) the task is about to enter in guest mode and $ARCH doesn't support KVM 
generic work
+ * 2) the task is about to enter in user mode and $ARCH doesn't support 
generic entry.
+ *
+ * In these cases the late RCU wake ups aren't supported in the resched loops 
and our
+ * last resort is to fire a local irq_work that will trigger a reschedule once 
IRQs
+ * get re-enabled again.
+ */
+noinstr static void rcu_irq_work_resched(void)
+{
+   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
+
+   if (IS_ENABLED(CONFIG_GENERIC_ENTRY) && !(current->flags & PF_VCPU))
+   return;
+
+   if (IS_ENABLED(CONFIG_KVM_XFER_TO_GUEST_WORK) && (current->flags & 
PF_VCPU))
+   return;
+
+   instrumentation_begin();
+   if (do_nocb_deferred_wakeup(rdp) && need_resched()) {
+   irq_work_queue(this_cpu_ptr(&late_wakeup_work));
+   }
+   instrumentation_end();
+}
+
+#else
+static inline void rcu_irq_work_resched(void) { }
+#endif
+
 /**
  * rcu_user_enter - inform RCU that we are resuming userspace.
  *
@@ -702,8 +734,6 @@ static DEFINE_PER_CPU(struct irq_work, late_wakeup_work) =
  */
 noinstr void rcu_user_enter(void)
 {
-   struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
-
lockdep_assert_irqs_disabled();
 
/*
@@ -711,13 +741,7 @@ noinstr void rcu_user_enter(void)
 * rescheduling opportunity in the entry code. Trigger a self IPI
 * that will fire and reschedule once we resume in user/guest mode.
 */
-   instrumentation_begin();
-   if (!IS_ENABLED(CONFIG_GENERIC_ENTRY) || (current->flags & PF_VCPU)) {
-   if (do_nocb_deferred_wakeup(rdp) && need_resched())
-
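
The two early returns in rcu_irq_work_resched() above amount to a small decision table. The sketch below restates it as a pure function in user-space C; the function name and its boolean parameters are hypothetical stand-ins for IS_ENABLED(CONFIG_GENERIC_ENTRY), IS_ENABLED(CONFIG_KVM_XFER_TO_GUEST_WORK) and the PF_VCPU task flag, and it only models when the irq_work fallback is still considered, not the wakeup itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the last-resort irq_work path is still needed,
 * i.e. when no explicit flush before the final resched check covers
 * the current transition.
 */
static bool model_needs_irq_work_fallback(bool generic_entry,
					  bool kvm_xfer_work,
					  bool is_vcpu)
{
	/* Resuming to user mode with generic entry: flushed in the entry code. */
	if (generic_entry && !is_vcpu)
		return false;

	/* Entering guest mode with KVM generic work: flushed before entry. */
	if (kvm_xfer_work && is_vcpu)
		return false;

	/* Every other combination relies on the self-IPI. */
	return true;
}
```

Read this way, the fallback remains for user resume on architectures without generic entry and for guest entry on architectures without KVM generic work.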

Re: [PATCH RFC 1/4] firmware: Add the support for ZSTD-compressed firmware files

2021-02-17 Thread Luis Chamberlain

On Wed, Jan 27, 2021 at 04:49:36PM +0100, Takashi Iwai wrote:
> Due to the popular demands on ZSTD, here is a patch to add a support
> of ZSTD-compressed firmware files via the direct firmware loader.
> It's just like XZ-compressed file support, providing a decompressor
> with ZSTD.  Since ZSTD API can give the decompression size beforehand,
> the code is even simpler than XZ.
> 
> Signed-off-by: Takashi Iwai 
> ---
>  drivers/base/firmware_loader/Kconfig | 21 ++--
>  drivers/base/firmware_loader/main.c  | 74 ++--
>  2 files changed, 87 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/base/firmware_loader/Kconfig 
> b/drivers/base/firmware_loader/Kconfig
> index 5b24f3959255..f5307978927c 100644
> --- a/drivers/base/firmware_loader/Kconfig
> +++ b/drivers/base/firmware_loader/Kconfig
> @@ -157,17 +157,28 @@ config FW_LOADER_USER_HELPER_FALLBACK
>  
>  config FW_LOADER_COMPRESS
>   bool "Enable compressed firmware support"
> - select FW_LOADER_PAGED_BUF
> - select XZ_DEC
>   help
> This option enables the support for loading compressed firmware
> files. The caller of firmware API receives the decompressed file
> content. The compressed file is loaded as a fallback, only after
> loading the raw file failed at first.
>  
> -   Currently only XZ-compressed files are supported, and they have to
> -   be compressed with either none or crc32 integrity check type (pass
> -   "-C crc32" option to xz command).
> +if FW_LOADER_COMPRESS
> +config FW_LOADER_COMPRESS_XZ
> + bool "Enable XZ-compressed firmware support"
> + select FW_LOADER_PAGED_BUF
> + select XZ_DEC
> + help
> +   This option adds the support for XZ-compressed files.
> +   The files have to be compressed with either none or crc32
> +   integrity check type (pass "-C crc32" option to xz command).
> +
> +config FW_LOADER_COMPRESS_ZSTD
> + bool "Enable ZSTD-compressed firmware support"
> + select ZSTD_DECOMPRESS
> + help
> +   This option adds the support for ZSTD-compressed files.
> +endif # FW_LOADER_COMPRESS
>  
>  config FW_CACHE
>   bool "Enable firmware caching during suspend"
> diff --git a/drivers/base/firmware_loader/main.c 
> b/drivers/base/firmware_loader/main.c
> index 78355095e00d..71332ed4959d 100644
> --- a/drivers/base/firmware_loader/main.c
> +++ b/drivers/base/firmware_loader/main.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -362,10 +363,72 @@ int fw_map_paged_buf(struct fw_priv *fw_priv)
>  }
>  #endif
>  
> +/*
> + * ZSTD-compressed firmware support
> + */
> +#ifdef CONFIG_FW_LOADER_COMPRESS_ZSTD
> +static int fw_decompress_zstd(struct device *dev, struct fw_priv *fw_priv,
> +   size_t in_size, const void *in_buffer)
> +{
> + size_t len, out_size, workspace_size;
> + void *workspace, *out_buf;
> + ZSTD_DCtx *ctx;
> + int err;
> +
> + if (fw_priv->data) {
> + out_size = fw_priv->allocated_size;
> + out_buf = fw_priv->data;
> + } else {
> + out_size = ZSTD_findDecompressedSize(in_buffer, in_size);
> + if (out_size == ZSTD_CONTENTSIZE_UNKNOWN ||
> + out_size == ZSTD_CONTENTSIZE_ERROR) {
> + dev_dbg(dev, "%s: invalid decompression size\n", 
> __func__);
> + return -EINVAL;
> + }
> + out_buf = vzalloc(out_size);
> + if (!out_buf)
> + return -ENOMEM;
> + }
> +
> + workspace_size = ZSTD_DCtxWorkspaceBound();
> + workspace = kvzalloc(workspace_size, GFP_KERNEL);
> + if (!workspace) {
> + err = -ENOMEM;
> + goto error;
> + }
> +
> + ctx = ZSTD_initDCtx(workspace, workspace_size);
> + if (!ctx) {
> + dev_dbg(dev, "%s: failed to initialize context\n", __func__);
> + err = -EINVAL;
> + goto error;
> + }
> +
> + len = ZSTD_decompressDCtx(ctx, out_buf, out_size, in_buffer, in_size);
> + if (ZSTD_isError(len)) {
> + dev_dbg(dev, "%s: failed to decompress: %d\n", __func__,
> + ZSTD_getErrorCode(len));
> + err = -EINVAL;
> + goto error;
> + }
> +
> + fw_priv->size = len;
> + if (!fw_priv->data)
> + fw_priv->data = out_buf;
> + err = 0;
> +
> + error:
> + kvfree(workspace);
> + if (!fw_priv->data)
> + vfree(out_buf);
> + return err;
> +}
> +#endif /* CONFIG_FW_LOADER_COMPRESS_ZSTD */
> +
>  /*
>   * XZ-compressed firmware support
>   */
> -#ifdef CONFIG_FW_LOADER_COMPRESS
> +#ifdef CONFIG_FW_LOADER_COMPRESS_XZ
>  /* show an error and return the standard error code */
>  static int fw_decompress_xz_error(struct device *dev, enum xz_ret xz_ret)
>  {
> @@ -459,7 +522,7 @@ static int fw_decompress_xz(struct device *dev, struct 
> fw_priv

Re: [PATCH v2] gpio: pca953x: add support for open drain pins on PCAL6524

2021-02-17 Thread Bedel, Alban

On Tue, 2021-02-16 at 19:50 +0200, Andy Shevchenko wrote:
> On Tue, Feb 16, 2021 at 6:37 PM Bedel, Alban wrote:
> > On Mon, 2021-02-15 at 14:53 +0200, Andy Shevchenko wrote:
> > > Hint: don't forget to include reviewers from previous version
> > 
> > I added you to the CC list for the new patch, did I miss someone
> > else?
> 
> Then we are fine, thanks!
> 
> > > On Thu, Feb 11, 2021 at 7:52 PM Alban Bedel wrote:
> > > > From a quick glance at various datasheets the PCAL6524 and the
> > > > PCAL6534 seem to be the only chips in this family that support
> > > > setting the drive mode of single pins. Other chips either don't
> > > > support it at all, or can only set the drive mode of whole banks,
> > > > which doesn't map to the GPIO API.
> > > > 
> > > > Add a new flag, PCAL65xx_REGS, to mark chips that have the extra
> > > > registers needed for this feature. Then mark the needed register
> > > > banks as readable and writable; here we don't set OUT_CONF as
> > > > writable, although it is, as we only need to read it. Finally add
> > > > a function that configures the OUT_INDCONF register when the GPIO
> > > > API sets the drive mode of the pins.
> 
> Before continuing on this, have you considered to split this
> particular chip to a real pin controller and use the existing driver
> only for GPIO part of the functionality?

No, this driver already uses the ->set_config() mechanism, so adding
another property is trivial. On the other hand, having a pinctrl driver
would be a massive undertaking as the pinctrl API is quite complex
iirc. Furthermore, unless I'm missing something, that would not allow a
consumer to request an open drain GPIO, which is what I'm after.

> > > > +#define PCAL65xx_REGS  BIT(10)
> > > 
> > > Can we have it as a _TYPE, please?
> > 
> > Let's please take a closer look at these macros and what they mean.
> > Currently we have 3 possible sets of functions that are indicated by
> > setting bits in driver_data using the PCA_xxx macros:
> > 
> > - Basic register only: 0
> > - With interrupt support: PCA_INT
> > - With latching interrupt regs: PCA_INT | PCA_PCAL = PCA_LATCH_INT
> > 
> > This patch then adds a fourth case:
> > 
> > - With pin config regs: PCA_INT | PCA_PCAL |
> > $MACRO_WE_ARE_DISCUSSING
> > 
> > Then there are the PCA953X_TYPE and PCA957X_TYPE macros, which indicate
> > the need for a different regmap config and register layout.
> 
> Exactly, and you have a different register layout (coincidentally
> overlaps with the original PCA953x).

We have 2 layouts for the base registers: the "mixed up registers" of
the PCA957x and the "standard" layout of the PCA953x. Then we have the
PCAL chips, which extend the base PCA953x registers with further
registers for better interrupt handling. These are not treated as a new
type in the current code, but as an extra feature on top of the
PCA953x. The PCAL65xx we are talking about adds a further register
block, so following the existing concept it's not a new layout.

> > These are accessed using the PCA_CHIP_TYPE() macro and are used as an
> > integer value, not as a bit-field like the above ones. If we had a
> > struct instead of a packed integer that would be a different field.
> 
> How come? It's a bitmask.

The definitions use BIT() but all evaluations of PCA_CHIP_TYPE() use
the equality operator.
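
To make the distinction concrete, here is a minimal user-space sketch of the two ways driver_data is read. The macro names and BIT(10) for PCAL65xx_REGS come from this thread; the other bit positions, the type mask and the chip_* helpers are assumptions made for illustration and may not match the driver. The base layout is a field compared with the equality operator, while the optional register blocks are independent flags tested with a bitwise AND.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)			(1u << (n))

/* Feature flags: any combination may be set (bit positions assumed). */
#define PCA_INT			BIT(8)
#define PCA_PCAL		BIT(9)
#define PCA_LATCH_INT		(PCA_INT | PCA_PCAL)
#define PCAL65xx_REGS		BIT(10)

/* Base register layout: exactly one value per chip (positions assumed). */
#define PCA953X_TYPE		BIT(12)
#define PCA957X_TYPE		BIT(13)
#define PCA_TYPE_MASK		(0xFu << 12)
#define PCA_CHIP_TYPE(x)	((x) & PCA_TYPE_MASK)

/* The layout is selected by comparing the whole type field. */
static bool chip_is_953x_layout(uint32_t driver_data)
{
	return PCA_CHIP_TYPE(driver_data) == PCA953X_TYPE;
}

/* Extra register blocks are optional features layered on top. */
static bool chip_has_pcal_regs(uint32_t driver_data)
{
	return (driver_data & PCA_PCAL) != 0;
}

static bool chip_has_pin_config_regs(uint32_t driver_data)
{
	return (driver_data & PCAL65xx_REGS) != 0;
}
```

Under these assumptions a PCAL6524 entry would be described as 24 | PCA953X_TYPE | PCA_LATCH_INT | PCAL65xx_REGS: the same base layout as other PCA953x chips, plus two optional register blocks.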

> > I'll change it to PCAL65xx_TYPE if you insist, but that seems very
> > wrong to me to use the same naming convention for macros meant for
> > different fields.
> 
> To me it's the opposite. The 6524 defines a new type (which has some
> registers others don't have). We even have already definitions of
> those special registers. I think TYPE is a better approach here.

Let's look at pca953x_writeable_register(), which I think clearly shows
how chip variants are currently handled. This is just one example, but
the rest of the code follows the same concept.

static bool pca953x_writeable_register(struct device *dev, unsigned int reg)
{
	struct pca953x_chip *chip = dev_get_drvdata(dev);
	u32 bank;

	if (PCA_CHIP_TYPE(chip->driver_data) == PCA953X_TYPE) {
		bank = PCA953x_BANK_OUTPUT | PCA953x_BANK_POLARITY |
		       PCA953x_BANK_CONFIG;
	} else {
		bank = PCA957x_BANK_OUTPUT | PCA957x_BANK_POLARITY |
		       PCA957x_BANK_CONFIG | PCA957x_BANK_BUSHOLD;
	}

	if (chip->driver_data & PCA_PCAL)
		bank |= PCAL9xxx_BANK_IN_LATCH | PCAL9xxx_BANK_PULL_EN |
			PCAL9xxx_BANK_PULL_SEL | PCAL9xxx_BANK_IRQ_MASK;

+	if (chip->driver_data & PCAL65xx_REGS)
+		bank |= PCAL65xx_BANK_INDOUT_CONF;
+
	return pca953x_check_register(chip, reg, bank);
}

The chip we are talking about further extends the PCAL registers, so it
is currently covered by PCA953X_TYPE as base type and has the PCA_PCAL
bit set. I really fail to see how this

Re: [PATCH V0 2/6] arm64: dts: qcom: sm8150: Add Data Capture and Compare(DCC) support node

2021-02-17 Thread schowdhu


Hi,

Please find the replies inline.


On 2021-02-17 16:33, Vinod Koul wrote:
> On 17-02-21, 12:18, Souradeep Chowdhury wrote:
> > Add the DCC(Data Capture and Compare) device tree node entry along with
> > the addresses for register regions.
> 
> This should be last patch..

Ack

> > Signed-off-by: Souradeep Chowdhury 
> > ---
> >  arch/arm64/boot/dts/qcom/sm8150.dtsi | 7 +++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/arch/arm64/boot/dts/qcom/sm8150.dtsi b/arch/arm64/boot/dts/qcom/sm8150.dtsi
> > index e5bb17b..3198bd3 100644
> > --- a/arch/arm64/boot/dts/qcom/sm8150.dtsi
> > +++ b/arch/arm64/boot/dts/qcom/sm8150.dtsi
> > @@ -654,6 +654,13 @@
> > interrupts = ;
> > };
> > 
> > +   dcc@010a2000{
> 
> no leading zero here and space before {

Ack

> > +   compatible = "qcom,sm8150-dcc", "qcom,dcc";
> > +   reg = <0x0 0x010a2000 0x0 0x1000>,
> > +   <0x0 0x010ad000 0x0 0x3000>;
> 
> pls align this to preceding line

Ack

> > +   reg-names = "dcc-base", "dcc-ram-base";
> > +   };
> > +
> > ufs_mem_hc: ufshc@1d84000 {
> > compatible = "qcom,sm8150-ufshc", "qcom,ufshc",
> >  "jedec,ufs-2.0";
> > --
> > QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> > of Code Aurora Forum, hosted by The Linux Foundation

Re: [PATCH 05/16] media: i2c: rdacm20: Check return values

2021-02-17 Thread Kieran Bingham

Hi Jacopo,

On 16/02/2021 17:41, Jacopo Mondi wrote:
> The camera module initialization routine does not check the return
> value of a few functions. Fix that.
> 

Sounds quite valid to me.

Reviewed-by: Kieran Bingham 

> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm20.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/media/i2c/rdacm20.c b/drivers/media/i2c/rdacm20.c
> index 56406d82b5ac..e982373908f2 100644
> --- a/drivers/media/i2c/rdacm20.c
> +++ b/drivers/media/i2c/rdacm20.c
> @@ -470,11 +470,16 @@ static int rdacm20_initialize(struct rdacm20_device 
> *dev)
>*  Ensure that we have a good link configuration before attempting to
>*  identify the device.
>*/
> - max9271_configure_i2c(&dev->serializer, MAX9271_I2CSLVSH_469NS_234NS |
> - MAX9271_I2CSLVTO_1024US |
> - MAX9271_I2CMSTBT_105KBPS);
> + ret = max9271_configure_i2c(&dev->serializer,
> + MAX9271_I2CSLVSH_469NS_234NS |
> + MAX9271_I2CSLVTO_1024US |
> + MAX9271_I2CMSTBT_105KBPS);
> + if (ret)
> + return ret;
>  
> - max9271_configure_gmsl_link(&dev->serializer);
> + ret = max9271_configure_gmsl_link(&dev->serializer);
> + if (ret)
> + return ret;
>  
>   ret = max9271_verify_id(&dev->serializer);
>   if (ret < 0)
>

Re: [PATCH 03/16] media: i2c: rdacm20: Replace goto with a loop

2021-02-17 Thread Kieran Bingham

Hi Jacopo,

On 16/02/2021 17:41, Jacopo Mondi wrote:
> During the camera module initialization the image sensor PID is read to
> verify it can correctly be identified. The current implementation is
> rather confused and uses a loop implemented with a label and a goto.
> 
> Replace it with a more compact for() loop.
> 
> No functional changes intended.

I think there is a functional change in here, but I almost like it.

Before, if the read was successful, it would check to see if the
OV10635_PID == OV10635_VERSION, and if not it would print that the read
was successful but a mismatch.

Now - it will retry again instead, and if at the end of the retries it
still fails then it's a failure.

This means we perhaps don't get told if the device id is not correct in
the same way, but it also means that if the VERSION was not correct
because of a read error (which I believe I've seen occur), it will retry.

Because there is a functional change you might want to update the
commit, but I still think this is a good change overall.
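
One way to keep the retry behaviour while still telling a read error apart from a wrong ID is sketched below. This is a standalone illustration, not the driver code: the DEMO_ names, the expected ID value and the scripted fake reader are all assumptions for this example.

```c
#include <stdio.h>

#define DEMO_EXPECTED_ID 0xa635  /* assumed ID value, for illustration only */
#define DEMO_ID_RETRIES  3

/* Hypothetical read callback: returns the ID, or a negative error code. */
typedef int (*demo_read_id_fn)(void *ctx);

/* Scripted fake sensor used to exercise the loop without hardware. */
struct demo_script {
	const int *values;
	unsigned int count;
	unsigned int pos;
};

static int demo_scripted_read(void *ctx)
{
	struct demo_script *s = ctx;

	if (s->pos >= s->count)
		return -5;
	return s->values[s->pos++];
}

/* Retry on both read errors and mismatches; report the last cause seen. */
static int demo_identify(demo_read_id_fn read_id, void *ctx)
{
	int ret = -1;
	unsigned int i;

	for (i = 0; i < DEMO_ID_RETRIES; ++i) {
		ret = read_id(ctx);
		if (ret == DEMO_EXPECTED_ID)
			return 0;
	}

	/* Keep the two failure causes distinguishable in the log. */
	if (ret < 0)
		fprintf(stderr, "ID read failed (%d)\n", ret);
	else
		fprintf(stderr, "ID mismatch (0x%04x)\n", (unsigned int)ret);

	return -1;
}
```

A transient bad read is retried exactly as in the for() loop, and only the final message differs depending on whether the last attempt failed or returned an unexpected value.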

Reviewed-by: Kieran Bingham 

> 
> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm20.c | 27 ++-
>  1 file changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/media/i2c/rdacm20.c b/drivers/media/i2c/rdacm20.c
> index 4d9bac87cba8..6504ed0bd3bc 100644
> --- a/drivers/media/i2c/rdacm20.c
> +++ b/drivers/media/i2c/rdacm20.c
> @@ -59,6 +59,8 @@
>   */
>  #define OV10635_PIXEL_RATE   (4400)
>  
> +#define OV10635_PID_TIMEOUT  3
> +
>  static const struct ov10635_reg {
>   u16 reg;
>   u8  val;
> @@ -452,7 +454,7 @@ static const struct v4l2_subdev_ops rdacm20_subdev_ops = {
>  
>  static int rdacm20_initialize(struct rdacm20_device *dev)
>  {
> - unsigned int retry = 3;
> + unsigned int i;
>   int ret;
>  
>   /* Verify communication with the MAX9271: ping to wakeup. */
> @@ -501,23 +503,14 @@ static int rdacm20_initialize(struct rdacm20_device 
> *dev)
>   return ret;
>   usleep_range(1, 15000);
>  
> -again:
> - ret = ov10635_read16(dev, OV10635_PID);
> - if (ret < 0) {
> - if (retry--)
> - goto again;
> -
> - dev_err(dev->dev, "OV10635 ID read failed (%d)\n",
> - ret);
> - return -ENXIO;
> + for (i = 0; i < OV10635_PID_TIMEOUT; ++i) {
> + ret = ov10635_read16(dev, OV10635_PID);
> + if (ret == OV10635_VERSION)
> + break;
> + usleep_range(1000, 2000);
>   }
> -
> - if (ret != OV10635_VERSION) {
> - if (retry--)
> - goto again;
> -
> - dev_err(dev->dev, "OV10635 ID mismatch (0x%04x)\n",
> - ret);
> + if (i == OV10635_PID_TIMEOUT) {
> + dev_err(dev->dev, "OV10635 ID read failed (%d)\n", ret);
>   return -ENXIO;
>   }
>  
>

Re: [PATCH 04/16] media: i2c: rdacm20: Report camera module name

2021-02-17 Thread Kieran Bingham

Hi Jacopo,

On 16/02/2021 17:41, Jacopo Mondi wrote:
> When the device is identified the driver currently reports the
> names of the chips embedded in the camera module.
> 
> Report the name of the camera module itself instead.
> Cosmetic change only.

Aha, there go all my scripts that rely on this string to know if the
camera was found ... but I can fix that ;-)

Reviewed-by: Kieran Bingham 

> 
> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm20.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/media/i2c/rdacm20.c b/drivers/media/i2c/rdacm20.c
> index 6504ed0bd3bc..56406d82b5ac 100644
> --- a/drivers/media/i2c/rdacm20.c
> +++ b/drivers/media/i2c/rdacm20.c
> @@ -532,7 +532,7 @@ static int rdacm20_initialize(struct rdacm20_device *dev)
>   if (ret)
>   return ret;
>  
> - dev_info(dev->dev, "Identified MAX9271 + OV10635 device\n");
> + dev_info(dev->dev, "Identified RDACM20 camera module\n");
>  
>   /*
>* Set reverse channel high threshold to increase noise immunity.
>

Re: [PATCH 3/3] tools/lib/fs: Cache cgroupfs mount point

2021-02-17 Thread Arnaldo Carvalho de Melo

Em Fri, Jan 08, 2021 at 02:51:44PM +0900, Namhyung Kim escreveu:
> On Wed, Jan 6, 2021 at 10:33 AM Namhyung Kim  wrote:
> >
> > Hi Arnaldo,
> >
> > On Tue, Dec 29, 2020 at 8:51 PM Arnaldo Carvalho de Melo
> >  wrote:
> > >
> > > Em Wed, Dec 16, 2020 at 06:05:56PM +0900, Namhyung Kim escreveu:
> > > > Currently it parses the /proc file everytime it opens a file in the
> > > > cgroupfs.  Save the last result to avoid it (assuming it won't be
> > > > changed between the accesses).
> > >
> > > Which is the most likely case, but can't we use something like inotify
> > > to detect that and bail out or warn the user?
> >
> > Hmm.. looks doable.  Will check.
> 
> So I've played with inotify a little bit, and it seems it needs to monitor
> changes on the file or the directory.  I didn't get any notification from
> the /proc/mounts file even if I did some mount/umount.
> 
> Instead, I could get IN_UNMOUNT when the cgroup filesystem was
> unmounted.  But for the monitoring, we need to do one of a) select-like
> syscall to wait for the events, b) signal-driven IO notification or c) read
> the inotify file with non-block mode everytime.
> 
> In a library code, I don't think we can do a) or b) since it can affect
> user program behaviors.  Then we should go with c) but I think
> it's opposite to the purpose of this patch. :)
> 
> As you said, I think mostly we don't care as the accesses will happen
> in a short period of time.  But if you really care, maybe for the upcoming
> perf daemon changes, I think we can add an API to invalidate the cache
> or internal time-based invalidation logic (like remove it after 10 sec.).

Ok, we can have something in 'perf daemon' to periodically invalidate
this, maybe do a poor man's inotify: when asking for the cgroup
mountpoint, check some characteristic of that file that changes when it
is modified, or simply use a timestamp and have some threshold.
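
The timestamp-plus-threshold variant could look roughly like the sketch below. It is a standalone illustration under stated assumptions: the struct, the function names and the 10 second threshold are hypothetical and not part of tools/lib/fs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

#define DEMO_CACHE_MAX_AGE 10  /* seconds; threshold chosen for illustration */

/* Hypothetical cache entry for the cgroupfs mount point. */
struct demo_mnt_cache {
	char   path[4096];
	time_t stamp;
	bool   valid;
};

/* Return the cached path, or NULL if empty or older than the threshold. */
static const char *demo_cache_get(struct demo_mnt_cache *c, time_t now)
{
	if (!c->valid)
		return NULL;

	if (now - c->stamp > DEMO_CACHE_MAX_AGE) {
		c->valid = false;  /* expired: force a fresh /proc parse */
		return NULL;
	}

	return c->path;
}

/* Store a freshly parsed mount point together with the current time. */
static void demo_cache_put(struct demo_mnt_cache *c, const char *path,
			   time_t now)
{
	snprintf(c->path, sizeof(c->path), "%s", path);
	c->stamp = now;
	c->valid = true;
}
```

A caller would try demo_cache_get() first and only reparse the /proc file when it returns NULL, so the common case of many accesses in a short period still avoids the parse.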

- Arnaldo
 
> Thoughts?
> 
> Thanks,
> Namhyung

-- 

- Arnaldo

Re: [PATCH 02/16] media: i2c: rdacm20: Embedded 'serializer' field

2021-02-17 Thread Kieran Bingham

Hi Jacopo,

On 16/02/2021 17:41, Jacopo Mondi wrote:
> There's no reason to allocate dynamically the 'serializer' field in
> the driver structure.
> 
> Embed the field and adjust all its users in the driver.
> 
> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm20.c | 38 -
>  1 file changed, 16 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/media/i2c/rdacm20.c b/drivers/media/i2c/rdacm20.c
> index f7fd5ae955d0..4d9bac87cba8 100644
> --- a/drivers/media/i2c/rdacm20.c
> +++ b/drivers/media/i2c/rdacm20.c
> @@ -312,7 +312,7 @@ static const struct ov10635_reg {
>  
>  struct rdacm20_device {
>   struct device   *dev;
> - struct max9271_device   *serializer;
> + struct max9271_device   serializer;
>   struct i2c_client   *sensor;
>   struct v4l2_subdev  sd;
>   struct media_pad    pad;
> @@ -399,7 +399,7 @@ static int rdacm20_s_stream(struct v4l2_subdev *sd, int 
> enable)
>  {
>   struct rdacm20_device *dev = sd_to_rdacm20(sd);
>  
> - return max9271_set_serial_link(dev->serializer, enable);
> + return max9271_set_serial_link(&dev->serializer, enable);
>  }
>  
>  static int rdacm20_enum_mbus_code(struct v4l2_subdev *sd,
> @@ -456,11 +456,11 @@ static int rdacm20_initialize(struct rdacm20_device 
> *dev)
>   int ret;
>  
>   /* Verify communication with the MAX9271: ping to wakeup. */
> - dev->serializer->client->addr = MAX9271_DEFAULT_ADDR;
> - i2c_smbus_read_byte(dev->serializer->client);
> + dev->serializer.client->addr = MAX9271_DEFAULT_ADDR;
> + i2c_smbus_read_byte(dev->serializer.client);
>  
>   /* Serial link disabled during config as it needs a valid pixel clock. 
> */
> - ret = max9271_set_serial_link(dev->serializer, false);
> + ret = max9271_set_serial_link(&dev->serializer, false);
>   if (ret)
>   return ret;
>  
> @@ -468,35 +468,35 @@ static int rdacm20_initialize(struct rdacm20_device 
> *dev)
>*  Ensure that we have a good link configuration before attempting to
>*  identify the device.
>*/
> - max9271_configure_i2c(dev->serializer, MAX9271_I2CSLVSH_469NS_234NS |
> -MAX9271_I2CSLVTO_1024US |
> -MAX9271_I2CMSTBT_105KBPS);
> + max9271_configure_i2c(&dev->serializer, MAX9271_I2CSLVSH_469NS_234NS |
> + MAX9271_I2CSLVTO_1024US |
> + MAX9271_I2CMSTBT_105KBPS);
>  
> - max9271_configure_gmsl_link(dev->serializer);
> + max9271_configure_gmsl_link(&dev->serializer);
>  
> - ret = max9271_verify_id(dev->serializer);
> + ret = max9271_verify_id(&dev->serializer);
>   if (ret < 0)
>   return ret;
>  
> - ret = max9271_set_address(dev->serializer, dev->addrs[0]);
> + ret = max9271_set_address(&dev->serializer, dev->addrs[0]);
>   if (ret < 0)
>   return ret;
> - dev->serializer->client->addr = dev->addrs[0];
> + dev->serializer.client->addr = dev->addrs[0];
>  
>   /*
>* Reset the sensor by cycling the OV10635 reset signal connected to the
>* MAX9271 GPIO1 and verify communication with the OV10635.
>*/
> - ret = max9271_enable_gpios(dev->serializer, MAX9271_GPIO1OUT);
> + ret = max9271_enable_gpios(&dev->serializer, MAX9271_GPIO1OUT);
>   if (ret)
>   return ret;
>  
> - ret = max9271_clear_gpios(dev->serializer, MAX9271_GPIO1OUT);
> + ret = max9271_clear_gpios(&dev->serializer, MAX9271_GPIO1OUT);
>   if (ret)
>   return ret;
>   usleep_range(1, 15000);
>  
> - ret = max9271_set_gpios(dev->serializer, MAX9271_GPIO1OUT);
> + ret = max9271_set_gpios(&dev->serializer, MAX9271_GPIO1OUT);
>   if (ret)
>   return ret;
>   usleep_range(1, 15000);
> @@ -560,13 +560,7 @@ static int rdacm20_probe(struct i2c_client *client)
>   if (!dev)
>   return -ENOMEM;
>   dev->dev = &client->dev;
> -
> - dev->serializer = devm_kzalloc(&client->dev, sizeof(*dev->serializer),
> -GFP_KERNEL);
> - if (!dev->serializer)
> - return -ENOMEM;

Every allocation removed is a win.

Reviewed-by: Kieran Bingham 


> -
> - dev->serializer->client = client;
> + dev->serializer.client = client;
>  
>   ret = of_property_read_u32_array(client->dev.of_node, "reg",
>dev->addrs, 2);
>

Re: [PATCH v4] perf tools: Fix arm64 build error with gcc-11

2021-02-17 Thread Arnaldo Carvalho de Melo

Em Wed, Feb 17, 2021 at 07:58:30PM +0800, Jianlin Lv escreveu:
> gcc version: 11.0.0 20210208 (experimental) (GCC)
> 
> Following build error on arm64:
> 
> ...
> In function ‘printf’,
> inlined from ‘regs_dump__printf’ at util/session.c:1141:3,
> inlined from ‘regs__printf’ at util/session.c:1169:2:
> /usr/include/aarch64-linux-gnu/bits/stdio2.h:107:10: \
>   error: ‘%-5s’ directive argument is null [-Werror=format-overflow=]
> 
> 107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, \
> __va_arg_pack ());
> 
> ..
> In function ‘fprintf’,
>   inlined from ‘perf_sample__fprintf_regs.isra’ at \
> builtin-script.c:622:14:
> /usr/include/aarch64-linux-gnu/bits/stdio2.h:100:10: \
> error: ‘%5s’ directive argument is null [-Werror=format-overflow=]
>   100 |   return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
>   101 | __va_arg_pack ());
> 
> cc1: all warnings being treated as errors
> ...
> 
> This patch fixes Wformat-overflow warnings. Add helper function to
> convert NULL to "unknown".

I think this is the right approach, but since both return a string, it
is strange that only one of them has its _str() at the end; what we
usually do in such cases is to have:

  const char *__perf_reg_name(int id)
  {
return NULL if id unknown;
  }

  And:

  static inline const char *perf_reg_name(int id)
  {
const char *name = __perf_reg_name(id);
return name ?: "unknown";
  }

Ok?
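
Spelled out as a compilable standalone sketch, the pattern looks like this. The register table and the demo_ names are made up for illustration, and the standard ternary is used instead of the GNU `?:` shorthand so that it builds with any C compiler.

```c
#include <stddef.h>

/*
 * Stand-in for the arch-specific lookup: returns NULL when the id is
 * unknown. The three-entry table is hypothetical.
 */
static const char *demo_reg_name_raw(int id)
{
	static const char * const names[] = { "r0", "r1", "r2" };

	if (id < 0 || (size_t)id >= sizeof(names) / sizeof(names[0]))
		return NULL;

	return names[id];
}

/* Wrapper that is always safe to pass to a "%s" format. */
static inline const char *demo_reg_name(int id)
{
	const char *name = demo_reg_name_raw(id);

	return name ? name : "unknown";
}
```

Callers that print the name use the wrapper, while code that needs to detect an unknown register can still call the raw lookup and test for NULL.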

- Arnaldo
 
> Signed-off-by: Jianlin Lv 
> ---
> v2: Add ternary operator to avoid similar errors in other arch.
> v3: Declared reg_name in inner block.
> v4: Add helper function: perf_reg_name_str, update changelog.
> ---
>  tools/perf/builtin-script.c   |  2 +-
>  tools/perf/util/perf_regs.h   | 11 ++-
>  .../perf/util/scripting-engines/trace-event-python.c  |  2 +-
>  tools/perf/util/session.c |  2 +-
>  4 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index 42dad4a0f8cf..35cddca2c7a7 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -643,7 +643,7 @@ static int perf_sample__fprintf_regs(struct regs_dump 
> *regs, uint64_t mask,
>  
>   for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
>   u64 val = regs->regs[i++];
> - printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r), 
> val);
> + printed += fprintf(fp, "%5s:0x%"PRIx64" ", 
> perf_reg_name_str(r), val);
>   }
>  
>   return printed;
> diff --git a/tools/perf/util/perf_regs.h b/tools/perf/util/perf_regs.h
> index a45499126184..e4a0a6f5408e 100644
> --- a/tools/perf/util/perf_regs.h
> +++ b/tools/perf/util/perf_regs.h
> @@ -33,13 +33,22 @@ extern const struct sample_reg sample_reg_masks[];
>  
>  int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);
>  
> +static inline const char *perf_reg_name_str(int id)
> +{
> + const char *str = perf_reg_name(id);
> +
> + if (!str)
> + return "unknown";
> + return str;
> +}
> +
>  #else
>  #define PERF_REGS_MASK   0
>  #define PERF_REGS_MAX0
>  
>  #define DWARF_MINIMAL_REGS PERF_REGS_MASK
>  
> -static inline const char *perf_reg_name(int id __maybe_unused)
> +static inline const char *perf_reg_name_str(int id __maybe_unused)
>  {
>   return "unknown";
>  }
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c 
> b/tools/perf/util/scripting-engines/trace-event-python.c
> index c83c2c6564e0..361307026485 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -702,7 +702,7 @@ static int regs_map(struct regs_dump *regs, uint64_t 
> mask, char *bf, int size)
>  
>   printed += scnprintf(bf + printed, size - printed,
>"%5s:0x%" PRIx64 " ",
> -  perf_reg_name(r), val);
> +  perf_reg_name_str(r), val);
>   }
>  
>   return printed;
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 25adbcce0281..0737d3e7e698 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1140,7 +1140,7 @@ static void regs_dump__printf(u64 mask, u64 *regs)
>   u64 val = regs[i++];
>  
>   printf(" %-5s 0x%016" PRIx64 "\n",
> -perf_reg_name(rid), val);
> +perf_reg_name_str(rid), val);
>   }
>  }
>  
> -- 
> 2.25.1
> 

-- 

- Arnaldo

Re: [PATCH 01/16] media: i2c: rdacm20: Enable noise immunity

2021-02-17 Thread Kieran Bingham

Hi Jacopo,

On 16/02/2021 17:41, Jacopo Mondi wrote:
> Enable the noise immunity threshold at the end of the rdacm20
> initialization routine.
> 
> The rdcam20 camera module has been so far tested with a startup
> delay that allowed the embedded MCU to program the serializer. If
> the initialization routine is run before the MCU programs the
> serializer and the image sensor and their addresses gets changed
> by the rdacm20 driver it is required to manually enable the noise
> immunity threshold to make the communication on the control channel
> more reliable.
> 

Oh, this is interesting, ... booting up without the delays would be ...
much nicer.

> Signed-off-by: Jacopo Mondi 
> ---
>  drivers/media/i2c/rdacm20.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/media/i2c/rdacm20.c b/drivers/media/i2c/rdacm20.c
> index 90eb73f0e6e9..f7fd5ae955d0 100644
> --- a/drivers/media/i2c/rdacm20.c
> +++ b/drivers/media/i2c/rdacm20.c
> @@ -541,7 +541,13 @@ static int rdacm20_initialize(struct rdacm20_device *dev)
>  
>   dev_info(dev->dev, "Identified MAX9271 + OV10635 device\n");
>  
> - return 0;
> + /*
> +  * Set reverse channel high threshold to increase noise immunity.
> +  *
> +  * This should be compensated by increasing the reverse channel
> +  * amplitude on the remote deserializer side.
> +  */
> + return max9271_set_high_threshold(&dev->serializer, true);

Does this work 'out of the box' ? I.e. if this patch is applied, I
assume it is required to remove the regulator delays that I/we have in DT?

Likewise, does that note mean this patch must also be accompanied by the
update in max9286 somehow?

I guess we can't keep 'test bisectability' with this very easily so it
probably doesn't matter too much, the end result will be the interesting
part.

Reviewed-by: Kieran Bingham 



>  }
>  
>  static int rdacm20_probe(struct i2c_client *client)
>

Re: [PATCH v2] perf tools: Resolve symbols against debug file first

2021-02-17 Thread Arnaldo Carvalho de Melo

Em Wed, Feb 17, 2021 at 01:21:25PM +0100, Jiri Slaby escreveu:
> With LTO, there are symbols like these:
> /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug
>  10305: 00955fa4 0 NOTYPE  LOCAL  DEFAULT   29 
> Predicate.cpp.2bc410e7
> 
> This comes from a runtime/debug split done by the standard way:
> objcopy --only-keep-debug $runtime $debug
> objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line 
> --strip-all $runtime
> 
> perf currently cannot resolve such symbols (relicts of LTO), as section
> 29 exists only in the debug file (29 is .debug_info). And perf resolves
> symbols only against runtime file. This results in all symbols from such
> a library being unresolved:
>  0.38%  main2libantlr4-runtime.so.4.8  [.] 0x000671e0
> 
> So try resolving against the debug file first. And only if it fails (the
> section has NOBITS set), try runtime file. We can do this, as "objcopy
> --only-keep-debug" per documentation preserves all sections, but clears
> data of some of them (the runtime ones) and marks them as NOBITS.
> 
> The correct result is now:
>  0.38%  main2libantlr4-runtime.so.4.8  [.] 
> antlr4::IntStream::~IntStream
> 
> Note that these LTO symbols are properly skipped anyway as they belong
> neither to *text* nor to *data* (is_label && !elf_sec__filter(&shdr,
> secstrs) is true).
> 
> Signed-off-by: Jiri Slaby 
> Acked-by: Namhyung Kim 

Thanks, applied.
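
For readers following along, the selection rule from the changelog can be modelled without libelf as below. This is a simplified sketch: the demo_ struct and function are hypothetical, and only the value of DEMO_SHT_NOBITS mirrors the SHT_NOBITS constant from <elf.h>.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SHT_NOBITS 8u  /* same numeric value as SHT_NOBITS in <elf.h> */

/* Reduced section header: just the fields the rule looks at. */
struct demo_shdr {
	uint32_t sh_type;
	uint64_t sh_offset;
};

/*
 * Prefer the section header from the debug file. If that header is marked
 * NOBITS (its data was stripped, so sh_offset is not usable), fall back to
 * the header with the same index in the runtime file.
 */
static const struct demo_shdr *
demo_pick_shdr(const struct demo_shdr *debug, size_t ndebug,
	       const struct demo_shdr *runtime, size_t nruntime, size_t idx)
{
	if (idx >= ndebug)
		return NULL;

	if (debug[idx].sh_type != DEMO_SHT_NOBITS)
		return &debug[idx];

	if (idx >= nruntime)
		return NULL;

	return &runtime[idx];
}
```

A section that exists only in the debug file is taken from there, and a runtime section such as .text, which objcopy --only-keep-debug leaves as NOBITS in the debug file, is taken from the runtime file.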

- Arnaldo

> Cc: Peter Zijlstra 
> Cc: Ingo Molnar 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Mark Rutland 
> Cc: Alexander Shishkin 
> Cc: Jiri Olsa 
> ---
> [v2] added a comment
> 
>  tools/perf/util/symbol-elf.c | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index f3577f7d72fe..ecc05aa8399d 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -1226,12 +1226,26 @@ int dso__load_sym(struct dso *dso, struct map *map, 
> struct symsrc *syms_ss,
>   if (sym.st_shndx == SHN_ABS)
>   continue;
>  
> - sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
> + sec = elf_getscn(syms_ss->elf, sym.st_shndx);
>   if (!sec)
>   goto out_elf_end;
>  
>   gelf_getshdr(sec, &shdr);
>  
> + /*
> +  * We have to fallback to runtime when syms' section header has
> +  * NOBITS set. NOBITS results in file offset (sh_offset) not
> +  * being incremented. So sh_offset used below has different
> +  * values for syms (invalid) and runtime (valid).
> +  */
> + if (shdr.sh_type == SHT_NOBITS) {
> + sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
> + if (!sec)
> + goto out_elf_end;
> +
> + gelf_getshdr(sec, &shdr);
> + }
> +
>   if (is_label && !elf_sec__filter(&shdr, secstrs))
>   continue;
>  
> -- 
> 2.30.1
> 

-- 

- Arnaldo

Re: ANNOUNCE: pahole v1.20 (gcc11 DWARF5's default, lots of ELF sections, BTF)

2021-02-17 Thread Arnaldo Carvalho de Melo

Em Wed, Feb 17, 2021 at 01:08:23PM +0100, Domenico Andreoli escreveu:
> On Mon, Feb 08, 2021 at 09:32:53AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Feb 08, 2021 at 03:44:54AM +0100, Sedat Dilek escreveu:
> > > On Thu, Feb 4, 2021 at 11:07 PM Arnaldo Carvalho de Melo
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > The v1.20 release of pahole and its friends is out, mostly
> > > > addressing problems related to gcc 11 defaulting to DWARF5 for -g,
> > > > available at the usual places:
> > > >
> > > > Main git repo:
> > > >
> > > >git://git.kernel.org/pub/scm/devel/pahole/pahole.git
> > > >
> > > > Mirror git repo:
> > > >
> > > >https://github.com/acmel/dwarves.git
> > > >
> > > > tarball + gpg signature:
> > > >
> > > >https://fedorapeople.org/~acme/dwarves/dwarves-1.20.tar.xz
> > > >https://fedorapeople.org/~acme/dwarves/dwarves-1.20.tar.bz2
> > > >https://fedorapeople.org/~acme/dwarves/dwarves-1.20.tar.sign
> > > >
> > > 
> > > FYI:
> > > Debian now ships dwarves package version 1.20-1 in unstable.
> > > 
> > > Just a small nit to this release and its tagging:
> > > 
> > > You did:
> > > commit 0d415f68c468b77c5bf8e71965cd08c6efd25fc4 ("pahole: Prep 1.20")
> > > 
> > > Is this new?
> > > 
> > > The release before:
> > > commit dd15aa4b0a6421295cbb7c3913429142fef8abe0 ("dwarves: Prep v1.19")
> > 
> > Its minor but intentional, pahole is by far the most well known tool in
> > dwarves, so using that name more frequently (the git repo is pahole.git
> > , for instance) may help more quickly associate with the tool needed for
> > BTF encoding, data analysis, etc. And since its not about only DWARF,
> > perhaps transitioning to using 'pahole' more widely is interesting.
 
> Any plan to switch also the release tarball name?
 
> We are planning to rename the Debian package once the Bullseye is
> released, currently it's dwarves-dfsg for legacy/unclear reasons.
 
> Would it be a good idea to switch directly to pahole then?

Yeah, I think it is, I'll check what are the bureaucratic steps to do
that rename on Fedora and RHEL, but then, no need for all distros to do
it at once if that is something that requires long term planning or
whatever.

- Arnaldo
 
> Dom
> 
> > 
> > - Arnaldo
> >  
> > > - Sedat -
> > > 
> > > > Best Regards,
> > > >
> > > >  - Arnaldo
> > > >
> > > > v1.20:
> > > >
> > > > BTF encoder:
> > > >
> > > >   - Improve ELF error reporting using elf_errmsg(elf_errno()).
> > > >
> > > >   - Improve objcopy error handling.
> > > >
> > > >   - Fix handling of 'restrict' qualifier, that was being treated as a 
> > > > 'const'.
> > > >
> > > >   - Support SHN_XINDEX in st_shndx symbol indexes, to handle ELF 
> > > > objects with
> > > > more than 65534 sections, for instance, which happens with kernels 
> > > > built
> > > > with 'KCFLAGS="-ffunction-sections -fdata-sections", Other cases may
> > > > include when using FG-ASLR, LTO.
> > > >
> > > >   - Cope with functions without a name, as seen sometimes when building 
> > > > kernel
> > > > images with some versions of clang, when a SEGFAULT was taking 
> > > > place.
> > > >
> > > >   - Fix BTF variable generation for kernel modules, not skipping 
> > > > variables at
> > > > offset zero.
> > > >
> > > >   - Fix address size to match what is in the ELF file being processed, 
> > > > to fix using
> > > > a 64-bit pahole binary to generate BTF for a 32-bit vmlinux image.
> > > >
> > > >   - Use kernel module ftrace addresses when finding which functions to 
> > > > encode,
> > > > which increases the number of functions encoded.
> > > >
> > > > libbpf:
> > > >
> > > >   - Allow use of packaged version, for distros wanting to dynamically 
> > > > link with
> > > > the system's libbpf package instead of using the libbpf git 
> > > > submodule shipped
> > > > in pahole's source code.
> > > >
> > > > DWARF loader:
> > > >
> > > >   - Support DW_AT_data_bit_offset
> > > >
> > > > This appeared in DWARF4 but is supported only in gcc's -gdwarf-5,
> > > > support it in a way that makes the output be the same for both 
> > > > cases.
> > > >
> > > >   $ gcc -gdwarf-5 -c examples/dwarf5/bf.c
> > > >   $ pahole bf.o
> > > >   struct pea {
> > > > long int   a:1;  /* 0: 
> > > > 0  8 */
> > > > long int   b:1;  /* 0: 
> > > > 1  8 */
> > > > long int   c:1;  /* 0: 
> > > > 2  8 */
> > > >
> > > > /* XXX 29 bits hole, try to pack */
> > > > /* Bitfield combined with next fields */
> > > >
> > > > intafter_bitfield;   /* 4   
> > > >   4 */
> > > >
> > > > /* size: 8, cachelines: 1, members: 4 */
> > > > /* sum members: 4 */
> > > > /* sum bitfield members: 3 bits, bit holes: 1, sum bit 
> > > > holes: 29 bits */
> > > >

Re: [PATCH v3 1/3] platform/x86: dell-privacy: Add support for Dell hardware privacy

2021-02-17 Thread Perry Yuan


Hi Pierre:
On 2021/2/16 22:56, Pierre-Louis Bossart wrote:



+static const struct acpi_device_id privacy_acpi_device_ids[] = {
+    {"PNP0C09", 0},
+    { },
+};
+MODULE_DEVICE_TABLE(acpi, privacy_acpi_device_ids);
+
+static struct platform_driver dell_privacy_platform_drv = {
+    .driver = {
+    .name = PRIVACY_PLATFORM_NAME,
+    .acpi_match_table = ACPI_PTR(privacy_acpi_device_ids),
+    },


no .probe?
Originally I added the probe here, but it caused the driver .probe to be
called twice. After I used platform_driver_probe to register the driver
loading process, the duplicated probe issue was resolved.


I



+    .remove = dell_privacy_acpi_remove,
+};
+
+int __init dell_privacy_acpi_init(void)
+{
+    int err;
+    struct platform_device *pdev;
+    int privacy_capable = wmi_has_guid(DELL_PRIVACY_GUID);
+
+    if (!wmi_has_guid(DELL_PRIVACY_GUID))
+    return -ENODEV;
+
+    privacy_acpi = kzalloc(sizeof(*privacy_acpi), GFP_KERNEL);
+    if (!privacy_acpi)
+    return -ENOMEM;
+
+    pdev = platform_device_register_simple(
+    PRIVACY_PLATFORM_NAME, PLATFORM_DEVID_NONE, NULL, 0);
+    if (IS_ERR(pdev)) {
+    err = PTR_ERR(pdev);
+    goto pdev_err;
+    }
+    err = platform_driver_probe(&dell_privacy_platform_drv,
+    dell_privacy_acpi_probe);
+    if (err)
+    goto pdrv_err;


why is the probe done here? Put differently, what prevents you from 
using a 'normal' platform driver, and do the leds_setup in the .probe()?
At first, I used the normal platform driver framework, however it
caused the driver .probe to be called twice. After I used
platform_driver_probe to register the driver loading process, the
duplicated probe issue was resolved.


This sounds very odd...

this looks like a conflict with the ACPI subsystem finding a device and 
probing the driver that's associated with the PNP0C09 HID, and then this 
module __init  creating a device manually which leads to a second probe


Neither the platform_device_register_simple() nor 
platform_driver_probe() seem necessary?

Because this privacy ACPI driver file depends on the EC handle,
I want to determine whether the driver can be loaded based on matching
the EC ID PNP0C09.


So far, it works well for me to register the privacy driver with this
register sequence.
Does it hurt to keep the current registration process with
platform_driver_probe used?


Perry

Re: [PATCH v1 1/3] string: Consolidate yesno() helpers under string.h hood

2021-02-17 Thread Petr Mladek

On Mon 2021-02-15 16:39:26, Andy Shevchenko wrote:
> +Cc: Sakari and printk people
> 
> On Mon, Feb 15, 2021 at 4:28 PM Christian König
>  wrote:
> > Am 15.02.21 um 15:21 schrieb Andy Shevchenko:
> > > We have already few similar implementation and a lot of code that can 
> > > benefit
> > > of the yesno() helper.  Consolidate yesno() helpers under string.h hood.
> > >
> > > Signed-off-by: Andy Shevchenko 
> >
> > Looks like a good idea to me, feel free to add an Acked-by: Christian
> > König  to the series.
> 
> Thanks.
> 
> > But looking at the use cases for this, wouldn't it make more sense to
> > teach kprintf some new format modifier for this?
> 
> As a next step? IIRC Sakari has at some point the series converted
> yesno and Co. to something which I don't remember the details of.
> 
> Guys, what do you think?

Honestly, I think that yesno() is much easier to understand than %py.
And %py[DOY] looks really scary. It has been suggested at
https://lore.kernel.org/lkml/ycqannr7ynryd...@smile.fi.intel.com/#t

Yes, enabledisable() is hard to parse but it is still self-explaining
and can be found easily by cscope. On the contrary, %pyD will likely
print some python code and it is not clear if it would be compatible
with v3. I am just kidding but you get the picture.
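
For reference, helpers of this kind are tiny. The sketch below is a standalone userspace illustration with demo_ names chosen for this example; it is not the code from the series.

```c
#include <stdbool.h>

/* Map a boolean to a human-readable word for use with "%s". */
static inline const char *demo_yesno(bool v)
{
	return v ? "yes" : "no";
}

static inline const char *demo_enabledisable(bool v)
{
	return v ? "enable" : "disable";
}
```

A call site then reads as printf("feature: %s\n", demo_yesno(on)), which is easy to find with cscope and needs no new format modifier.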

Best Regards,
Petr

linux-next: Tree for Feb 17

2021-02-17 Thread Stephen Rothwell

Hi all,

Changes since 20210216:

The pm tree gained a conflict against the i3c tree.

The net-next tree gained conflicts against the net tree.

The tip tree gained a conflict against the pm tree.

Non-merge commits (relative to Linus' tree): 10598
 10682 files changed, 464339 insertions(+), 296846 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig and htmldocs. And finally, a simple boot test
of the powerpc pseries_le_defconfig kernel in qemu (with and without
kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 333 trees (counting Linus' and 87 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (f40ddce88593 Linux 5.11)
Merging fixes/fixes (e71ba9452f0b Linux 5.11-rc2)
Merging kbuild-current/fixes (fe968c41ac4f scripts: set proper OpenSSL include 
dir also for sign-file)
Merging arc-current/for-curr (7c53f6b671f4 Linux 5.11-rc3)
Merging arm-current/fixes (4d62e81b60d4 ARM: kexec: fix oops after TLB are 
invalidated)
Merging arm64-fixes/for-next/fixes (68d54ceeec0e arm64: mte: Allow 
PTRACE_PEEKMTETAGS access to the zero page)
Merging arm-soc-fixes/arm/fixes (090e502e4e63 Merge tag 
'socfpga_dts_fix_for_v5.12' of 
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into arm/fixes)
Merging drivers-memory-fixes/fixes (5c8fe583cce5 Linux 5.11-rc1)
Merging m68k-current/for-linus (c396dd2ec5bb macintosh/adb-iop: Use big-endian 
autopoll mask)
Merging powerpc-fixes/fixes (8c511eff1827 powerpc/kuap: Allow kernel thread to 
access userspace after kthread_use_mm)
Merging s390-fixes/fixes (92bf22614b21 Linux 5.11-rc7)
Merging sparc/master (0a95a6d1a4cd sparc: use for_each_child_of_node() macro)
Merging fscrypt-current/for-stable (d19d8d345eec fscrypt: fix inline encryption 
not used on new files)
Merging net/master (25c5a7e89b1d net: ipa: initialize all resources)
Merging bpf/master (57baf8cc70ea net: axienet: Handle deferred probe on clock 
properly)
Merging ipsec/master (57baf8cc70ea net: axienet: Handle deferred probe on clock 
properly)
Merging netfilter/master (57baf8cc70ea net: axienet: Handle deferred probe on 
clock properly)
Merging ipvs/master (44a674d6f798 Merge tag 'mlx5-fixes-2021-01-26' of 
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux)
Merging wireless-drivers/master (93a1d4791c10 mt76: dma: fix a possible memory 
leak in mt76_add_fragment())
Merging mac80211/master (57baf8cc70ea net: axienet: Handle deferred probe on 
clock properly)
Merging rdma-fixes/for-rc (1048ba83fb1c Linux 5.11-rc6)
Merging sound-current/for-linus (c3bb2b521944 ALSA: hda/realtek: Quirk for HP 
Spectre x360 14 amp setup)
Merging sound-asoc-fixes/for-linus (919fb44b0840 Merge remote-tracking branch 
'asoc/for-5.12' into asoc-linus)
Merging regmap-fixes/for-linus (19c329f68089 Linux 5.11-rc4)
Merging regulator-fixes/for-linus (8571bdc21388 Merge remote-tracking branch 
'regulator/for-5.11' into regulator-linus)
Merging spi-fixes/for-linus (110bc220aaab Merge remote-tracking branch 
'spi/for-5.11' into spi-linus)
Merging pci-current/for-linus (7e69d07d7c3c Revert "PCI/ASPM: Save/restore L1SS 
Capability for suspend/resume")
Merging driver-core.current/driver-core-linus (6ee1d745b7c9 Linux 5.11-rc5)
Merging tty.current/tty-linus (6ee1d745b7c9 Linux 5.11-rc5)
Merging usb.current/usb-linus (92bf22614b21 Linux 5.11-rc7)
Merging usb-gadget-fixes/fixes (129aa9734559 usb: raw-gadget: fix memory leak 
in gadget_setup)
Merging usb-serial-fixes/usb-linus (92bf22614b21 Linux

Re: [PATCH v3] perf probe: fix kretprobe issue caused by GCC bug

2021-02-17 Thread Masami Hiramatsu

On Tue, 16 Feb 2021 18:30:26 +0800
Jianlin Lv  wrote:

> Perf failed to add kretprobe event with debuginfo of vmlinux which is
> compiled by gcc with -fpatchable-function-entry option enabled.
> The same issue with kernel module.
> 
> Issue:
> 
>   # perf probe  -v 'kernel_clone%return $retval'
>   ..
>   Writing event: r:probe/kernel_clone__return _text+599624 $retval
>   Failed to write event: Invalid argument
> Error: Failed to add events. Reason: Invalid argument (Code: -22)
> 
>   # cat /sys/kernel/debug/tracing/error_log
>   [156.75] trace_kprobe: error: Retprobe address must be an function entry
>   Command: r:probe/kernel_clone__return _text+599624 $retval
> ^
> 
>   # llvm-dwarfdump  vmlinux |grep  -A 10  -w 0x00df2c2b
>   0x00df2c2b:   DW_TAG_subprogram
> DW_AT_external    (true)
> DW_AT_name        ("kernel_clone")
> DW_AT_decl_file   ("/home/code/linux-next/kernel/fork.c")
> DW_AT_decl_line   (2423)
> DW_AT_decl_column (0x07)
> DW_AT_prototyped  (true)
> DW_AT_type        (0x00dcd492 "pid_t")
> DW_AT_low_pc      (0x800010092648)
> DW_AT_high_pc     (0x800010092b9c)
> DW_AT_frame_base  (DW_OP_call_frame_cfa)
> 
>   # cat /proc/kallsyms |grep kernel_clone
>   800010092640 T kernel_clone
>   # readelf -s vmlinux |grep -i kernel_clone
>   183173: 800010092640  1372 FUNC    GLOBAL DEFAULT    2 kernel_clone
> 
>   # objdump -d vmlinux |grep -A 10  -w \<kernel_clone\>:
>   800010092640 <kernel_clone>:
>   800010092640:   d503201f    nop
>   800010092644:   d503201f    nop
>   800010092648:   d503233f    paciasp
>   80001009264c:   a9b87bfd    stp x29, x30, [sp, #-128]!
>   800010092650:   910003fd    mov x29, sp
>   800010092654:   a90153f3    stp x19, x20, [sp, #16]
> 
> The entry address of kernel_clone converted by debuginfo is _text+599624
> (0x92648), which is consistent with the value of DW_AT_low_pc attribute.
> But the symbolic address of kernel_clone from /proc/kallsyms is
> 800010092640.
> 
> This issue was found on arm64, where -fpatchable-function-entry=2 is enabled
> when CONFIG_DYNAMIC_FTRACE_WITH_REGS=y.
> As the objdump output for kernel_clone shows, GCC generates 2 NOPs at the
> beginning of each function.
> 
> kprobe_on_func_entry detects that (_text+599624) is not the entry address
> of the function, which leads to the failure of adding kretprobe event.
> 
> kprobe_on_func_entry
> ->_kprobe_addr
> ->kallsyms_lookup_size_offset
> ->arch_kprobe_on_func_entry   // FALSE
> 
> The cause of the issue is that the first instruction in the compile unit
> indicated by DW_AT_low_pc does not include NOPs.
> This issue exists in all gcc versions that support
> -fpatchable-function-entry option.
> 
> I have reported it to the GCC community:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776
> 
> Currently arm64 and PA-RISC may enable the -fpatchable-function-entry option.
> The kernel compiled with clang does not have this issue.
> 
> FIX:
> 
> This GCC issue only causes the registration failure of the kretprobe event,
> which doesn't need debuginfo. So, stop using debuginfo for retprobes; the
> map will be used to query the probe function address.
> 
> Signed-off-by: Jianlin Lv 

Looks good to me.

Acked-by: Masami Hiramatsu 

Thank you!
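
The address arithmetic in the changelog can be checked with a small sketch. It assumes the 4-byte AArch64 instruction size and uses offsets relative to _text taken from the report; the helper names are made up for illustration and are not perf or kernel functions:

```c
#include <assert.h>
#include <stdint.h>

#define AARCH64_INSN_SIZE 4u	/* every AArch64 instruction is 4 bytes */

/* Bytes of NOP padding that -fpatchable-function-entry=N puts at function entry. */
static inline uint64_t patchable_entry_bytes(unsigned int nops)
{
	return (uint64_t)nops * AARCH64_INSN_SIZE;
}

/*
 * Offset of an address from the start of its symbol. A non-zero result is
 * what makes the kernel reject the address as a retprobe location.
 */
static inline uint64_t offset_in_symbol(uint64_t addr, uint64_t sym_start)
{
	assert(addr >= sym_start);
	return addr - sym_start;
}
```

With the values from the report, _text+599624 is _text+0x92648, while the kallsyms entry for kernel_clone ends in 0x92640. The 8-byte difference matches the two NOPs shown in the objdump output.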

> ---
> v2: stop using debuginfo for retprobe, and update changelog.
> v3: Update changelog, fixed misuse of --- marker.
> ---
>  tools/perf/util/probe-event.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index 8eae2afff71a..a59d3268adb0 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -894,6 +894,16 @@ static int try_to_find_probe_trace_events(struct 
> perf_probe_event *pev,
>   struct debuginfo *dinfo;
>   int ntevs, ret = 0;
>  
> + /* Workaround for gcc #98776 issue.
> +  * Perf failed to add kretprobe event with debuginfo of vmlinux which is
> +  * compiled by gcc with -fpatchable-function-entry option enabled. The
> +  * same issue with kernel module. The retprobe doesn`t need debuginfo.
> +  * This workaround solution use map to query the probe function address
> +  * for retprobe event.
> +  */
> + if (pev->point.retprobe)
> + return 0;
> +
>   dinfo = open_debuginfo(pev->target, pev->nsi, !need_dwarf);
>   if (!dinfo) {
>   if (need_dwarf)
> -- 
> 2.25.1
> 


-- 
Masami Hiramatsu

Re: [RFC PATCH] mm, oom: introduce vm.sacrifice_hugepage_on_oom

2021-02-17 Thread Michal Hocko

On Wed 17-02-21 13:31:07, Michal Hocko wrote:
[...]
> Thanks for your usecase description. It helped me to understand what you
> are doing and how this can be really useful for your particular setup.
> This is really a very specific situation from my POV. I am not yet sure
> this is generic enough to warrant yet another tunable. One thing
> you can do [1] is to
> hook into oom notifiers interface (register_oom_notifier) and release
> pages from the callback.

Forgot to mention that this would be done from a kernel module.
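
To illustrate the callback shape, here is a userspace mock of the pattern, not a buildable kernel module. In the kernel, register_oom_notifier() takes a struct notifier_block and the callback receives a pointer to an unsigned long count of freed pages. Everything prefixed mock_, the reserved_pages pool and the simplified struct below are assumptions made up for this sketch:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
};

#define NOTIFY_OK 0x0001

static struct notifier_block *mock_oom_chain;	/* hypothetical chain head */

/* Mock of register_oom_notifier(): push the block onto the chain. */
static int mock_register_oom_notifier(struct notifier_block *nb)
{
	assert(nb && nb->notifier_call);
	nb->next = mock_oom_chain;
	mock_oom_chain = nb;
	return 0;
}

/* Mock of the OOM path: run each callback, return the pages they freed. */
static unsigned long mock_oom_notify(void)
{
	unsigned long freed = 0;

	for (struct notifier_block *nb = mock_oom_chain; nb; nb = nb->next)
		nb->notifier_call(nb, 0, &freed);
	return freed;
}

static unsigned long reserved_pages = 512;	/* hypothetical pool held by the module */

/* Callback: give the reserve back and report how many pages were released. */
static int release_reserve_on_oom(struct notifier_block *nb,
				  unsigned long action, void *data)
{
	unsigned long *freed = data;

	(void)nb;
	(void)action;
	*freed += reserved_pages;
	reserved_pages = 0;
	return NOTIFY_OK;
}

static struct notifier_block reserve_oom_nb = {
	.notifier_call = release_reserve_on_oom,
};
```

In a real module the callback would release the pages it holds (hugepages in this use case) and add their count to *freed; as far as I recall, the global OOM path skips victim selection when the notifiers report a non-zero count.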

> Why is that better than a global tunable?
> For one thing you can make the implementation tailored to your specific
> usecase. As the review feedback has shown this would be more tricky to
> be done in a general case. Unlike a generic solution it would allow you
> to coordinate with your userspace if you need. Would something like that
> work for you?
> 
> ---
> [1] and I have to say I hate myself for suggesting that because I was
> really hoping this interface would go away. But the reality disagrees so
> I gave up on that goal...
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

Re: [PATCH for 5.4] Fix unsynchronized access to sev members through svm_register_enc_region

2021-02-17 Thread Paolo Bonzini


On 17/02/21 10:18, Dov Murik wrote:

Hi Peter,

On 08/02/2021 18:48, Peter Gonda wrote:

commit 19a23da53932bc8011220bd8c410cb76012de004 upstream.

Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.

Add a lockdep assertion to help prevent future regressions.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: Paolo Bonzini 
Cc: Joerg Roedel 
Cc: Tom Lendacky 
Cc: Brijesh Singh 
Cc: Sean Christopherson 
Cc: x...@kernel.org
Cc: k...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: 1e80fdc09d12 ("KVM: SVM: Pin guest memory when SEV is active")
Signed-off-by: Peter Gonda 

V2
  - Fix up patch description
  - Correct file paths svm.c -> sev.c
  - Add unlock of kvm->lock on sev_pin_memory error

V1
  - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgo...@google.com/

Message-Id: <20210127161524.2832400-1-pgo...@google.com>
Signed-off-by: Paolo Bonzini 
---
  arch/x86/kvm/svm.c | 18 +++---
  1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 2b506904be02..93c89f1ffc5d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1830,6 +1830,8 @@ static struct page **sev_pin_memory(struct kvm *kvm, 
unsigned long uaddr,
struct page **pages;
unsigned long first, last;

+   lockdep_assert_held(&kvm->lock);
+
if (ulen == 0 || uaddr + ulen < uaddr)
return NULL;

@@ -7086,12 +7088,21 @@ static int svm_register_enc_region(struct kvm *kvm,
if (!region)
return -ENOMEM;

+   mutex_lock(&kvm->lock);
region->pages = sev_pin_memory(kvm, range->addr, range->size, 
&region->npages, 1);
if (!region->pages) {
ret = -ENOMEM;
+   mutex_unlock(&kvm->lock);
goto e_free;
}

+   region->uaddr = range->addr;
+   region->size = range->size;
+
+   mutex_lock(&kvm->lock);


This extra mutex_lock call doesn't appear in the upstream patch (committed
as 19a23da5393), but does appear in the 5.4 and 4.19 backports.  Is it
needed here?


Ouch.  No it isn't and it's an insta-deadlock.  Let me send a fix.

Paolo
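
The failure mode can be reproduced in userspace with a small sketch. It uses an error-checking pthread mutex as a stand-in so that the second lock attempt reports the problem instead of hanging. The kernel's struct mutex has no such mode, and relock_same_thread() is a name invented for this illustration:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Take an error-checking mutex twice from the same thread and return the
 * result of the second attempt.
 */
static int relock_same_thread(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;
	int first, second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock, &attr);
	pthread_mutexattr_destroy(&attr);

	first = pthread_mutex_lock(&lock);
	assert(first == 0);

	/* A PTHREAD_MUTEX_NORMAL mutex would deadlock on this call. */
	second = pthread_mutex_lock(&lock);

	pthread_mutex_unlock(&lock);
	pthread_mutex_destroy(&lock);
	return second;
}
```

Here the second call returns EDEADLK. In the backport, the equivalent second mutex_lock() on kvm->lock never returns, because the first lock is still held by the same task.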


-Dov



+   list_add_tail(&region->list, &sev->regions_list);
+   mutex_unlock(&kvm->lock);
+
/*
 * The guest may change the memory encryption attribute from C=0 -> C=1
 * or vice versa for this memory range. Lets make sure caches are
@@ -7100,13 +7111,6 @@ static int svm_register_enc_region(struct kvm *kvm,
 */
sev_clflush_pages(region->pages, region->npages);

-   region->uaddr = range->addr;
-   region->size = range->size;
-
-   mutex_lock(&kvm->lock);
-   list_add_tail(&region->list, &sev->regions_list);
-   mutex_unlock(&kvm->lock);
-
return ret;

  e_free:

[PATCH v2 09/11] drm/qxl: move shadow handling to new qxl_prepare_shadow()

2021-02-17 Thread Gerd Hoffmann

Pure code motion, no functional change.

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_display.c | 61 +--
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_display.c 
b/drivers/gpu/drm/qxl/qxl_display.c
index f106da917863..b315d7484e21 100644
--- a/drivers/gpu/drm/qxl/qxl_display.c
+++ b/drivers/gpu/drm/qxl/qxl_display.c
@@ -771,13 +771,45 @@ static void qxl_calc_dumb_shadow(struct qxl_device *qdev,
DRM_DEBUG("%dx%d\n", surf->width, surf->height);
 }
 
+static void qxl_prepare_shadow(struct qxl_device *qdev, struct qxl_bo *user_bo,
+  int crtc_index)
+{
+   struct qxl_surface surf;
+
+   qxl_update_dumb_head(qdev, crtc_index,
+user_bo);
+   qxl_calc_dumb_shadow(qdev, &surf);
+   if (!qdev->dumb_shadow_bo ||
+   qdev->dumb_shadow_bo->surf.width  != surf.width ||
+   qdev->dumb_shadow_bo->surf.height != surf.height) {
+   if (qdev->dumb_shadow_bo) {
+   drm_gem_object_put
+   (&qdev->dumb_shadow_bo->tbo.base);
+   qdev->dumb_shadow_bo = NULL;
+   }
+   qxl_bo_create(qdev, surf.height * surf.stride,
+ true, true, QXL_GEM_DOMAIN_SURFACE, 0,
+ &surf, &qdev->dumb_shadow_bo);
+   }
+   if (user_bo->shadow != qdev->dumb_shadow_bo) {
+   if (user_bo->shadow) {
+   qxl_bo_unpin(user_bo->shadow);
+   drm_gem_object_put
+   (&user_bo->shadow->tbo.base);
+   user_bo->shadow = NULL;
+   }
+   drm_gem_object_get(&qdev->dumb_shadow_bo->tbo.base);
+   user_bo->shadow = qdev->dumb_shadow_bo;
+   qxl_bo_pin(user_bo->shadow);
+   }
+}
+
 static int qxl_plane_prepare_fb(struct drm_plane *plane,
struct drm_plane_state *new_state)
 {
struct qxl_device *qdev = to_qxl(plane->dev);
struct drm_gem_object *obj;
struct qxl_bo *user_bo;
-   struct qxl_surface surf;
 
if (!new_state->fb)
return 0;
@@ -787,32 +819,7 @@ static int qxl_plane_prepare_fb(struct drm_plane *plane,
 
if (plane->type == DRM_PLANE_TYPE_PRIMARY &&
user_bo->is_dumb) {
-   qxl_update_dumb_head(qdev, new_state->crtc->index,
-user_bo);
-   qxl_calc_dumb_shadow(qdev, &surf);
-   if (!qdev->dumb_shadow_bo ||
-   qdev->dumb_shadow_bo->surf.width  != surf.width ||
-   qdev->dumb_shadow_bo->surf.height != surf.height) {
-   if (qdev->dumb_shadow_bo) {
-   drm_gem_object_put
-   (&qdev->dumb_shadow_bo->tbo.base);
-   qdev->dumb_shadow_bo = NULL;
-   }
-   qxl_bo_create(qdev, surf.height * surf.stride,
- true, true, QXL_GEM_DOMAIN_SURFACE, 0,
- &surf, &qdev->dumb_shadow_bo);
-   }
-   if (user_bo->shadow != qdev->dumb_shadow_bo) {
-   if (user_bo->shadow) {
-   qxl_bo_unpin(user_bo->shadow);
-   drm_gem_object_put
-   (&user_bo->shadow->tbo.base);
-   user_bo->shadow = NULL;
-   }
-   drm_gem_object_get(&qdev->dumb_shadow_bo->tbo.base);
-   user_bo->shadow = qdev->dumb_shadow_bo;
-   qxl_bo_pin(user_bo->shadow);
-   }
+   qxl_prepare_shadow(qdev, user_bo, new_state->crtc->index);
}
 
return qxl_bo_pin(user_bo);
-- 
2.29.2
