Re: [PATCH AUTOSEL 4.14 17/19] selftests: ftrace: Add synthetic event syntax testcase

2018-10-30 Thread Sasha Levin

On Tue, Oct 30, 2018 at 10:58:15AM -0400, Steven Rostedt wrote:

On Tue, 30 Oct 2018 09:28:22 -0400
Sasha Levin  wrote:


From: Masami Hiramatsu 

[ Upstream commit ba0e41ca81b935b958006c7120466e2217357827 ]

Add a testcase to check the syntax and field types for
synthetic_events interface.

Link: 
http://lkml.kernel.org/r/153986838264.18251.16627517536956299922.stgit@devbox

Acked-by: Shuah Khan 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Sasha Levin 


Why are you adding selftests for old kernels to test a feature that
wasn't added until 4.17?


We backport all selftest patches to stable to keep selftest on stable
kernels as similar as possible to upstream.

--
Thanks,
Sasha


Re: [PATCH AUTOSEL 4.14 17/19] selftests: ftrace: Add synthetic event syntax testcase

2018-10-30 Thread Sasha Levin

On Tue, Oct 30, 2018 at 10:58:15AM -0400, Steven Rostedt wrote:

On Tue, 30 Oct 2018 09:28:22 -0400
Sasha Levin  wrote:


From: Masami Hiramatsu 

[ Upstream commit ba0e41ca81b935b958006c7120466e2217357827 ]

Add a testcase to check the syntax and field types for
synthetic_events interface.

Link: 
http://lkml.kernel.org/r/153986838264.18251.16627517536956299922.stgit@devbox

Acked-by: Shuah Khan 
Signed-off-by: Masami Hiramatsu 
Signed-off-by: Steven Rostedt (VMware) 
Signed-off-by: Sasha Levin 


Why are you adding selftests for old kernels to test a feature that
wasn't added until 4.17?


We backport all selftest patches to stable to keep selftest on stable
kernels as similar as possible to upstream.

--
Thanks,
Sasha


[PATCH] firmware: raspberrypi: Define timeout for transactions

2018-10-30 Thread Stefan Wahren
We should never assume to get a reply from the firmware otherwise
the call could block forever and the user don't get informed. So
define a timeout of 1 sec and print a stacktrace once in the unlikely
case the timeout expired.

Signed-off-by: Stefan Wahren 
---
 drivers/firmware/raspberrypi.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/raspberrypi.c b/drivers/firmware/raspberrypi.c
index a200a21..bf45ac4 100644
--- a/drivers/firmware/raspberrypi.c
+++ b/drivers/firmware/raspberrypi.c
@@ -56,8 +56,12 @@ rpi_firmware_transaction(struct rpi_firmware *fw, u32 chan, 
u32 data)
reinit_completion(>c);
ret = mbox_send_message(fw->chan, );
if (ret >= 0) {
-   wait_for_completion(>c);
-   ret = 0;
+   if (wait_for_completion_timeout(>c, HZ)) {
+   ret = 0;
+   } else {
+   ret = -ETIMEDOUT;
+   WARN_ONCE(1, "Firmware transaction timeout");
+   }
} else {
dev_err(fw->cl.dev, "mbox_send_message returned %d\n", ret);
}
-- 
2.7.4



[PATCH] firmware: raspberrypi: Define timeout for transactions

2018-10-30 Thread Stefan Wahren
We should never assume to get a reply from the firmware otherwise
the call could block forever and the user don't get informed. So
define a timeout of 1 sec and print a stacktrace once in the unlikely
case the timeout expired.

Signed-off-by: Stefan Wahren 
---
 drivers/firmware/raspberrypi.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/raspberrypi.c b/drivers/firmware/raspberrypi.c
index a200a21..bf45ac4 100644
--- a/drivers/firmware/raspberrypi.c
+++ b/drivers/firmware/raspberrypi.c
@@ -56,8 +56,12 @@ rpi_firmware_transaction(struct rpi_firmware *fw, u32 chan, 
u32 data)
reinit_completion(>c);
ret = mbox_send_message(fw->chan, );
if (ret >= 0) {
-   wait_for_completion(>c);
-   ret = 0;
+   if (wait_for_completion_timeout(>c, HZ)) {
+   ret = 0;
+   } else {
+   ret = -ETIMEDOUT;
+   WARN_ONCE(1, "Firmware transaction timeout");
+   }
} else {
dev_err(fw->cl.dev, "mbox_send_message returned %d\n", ret);
}
-- 
2.7.4



Re: [PATCH v2 1/4] platform/x86: intel_pmc_core: Show Latency Tolerance info

2018-10-30 Thread Srinivas Pandruvada
[...]
> > > For example, index printing. Please, remove it. I completely
> > > forgot
> > > about userspace powerful tools. At least two very old and nice
> > > can
> > > enumerate lines for you.
> > > Obviously the index printing is redundant.
> > 
> > Index printing is required here (for LTR Show and LTR Ignore)
> > because it
> > paves an obvious and easy way for the users of this driver to know
> > the
> > IP number to be used for LTR ignore. This was specifically
> > requested by
> > some customer and Srinivas asked me to implement this so adding him
> > for
> > his inputs.
> 
> So, why it should be in kernel? When user prints this, they usually
> call `cat /.../file`, right?
> Is it too hard to call `cat -n /.../file` instead? The benefit of
> such
> approach is that it's independent on the file we are printing.
> 
> (Note, `grep -n  /.../file` does the same`)
> 
> For more variants
> 
https://stackoverflow.com/questions/8206370/add-numbers-to-the-beginning-of-every-line-in-a-file
> 
We get copy/paste data from some serial terminals from systems which
don't have traditional linux shell or busy box. Not sure if they can do
cat "-n" option or have this command at all. So line number helps. They
can't even store output as as file as this is RO file system.

But I am not as sticky on this.

Thanks,
Srinivas





Re: [PATCH v2 1/4] platform/x86: intel_pmc_core: Show Latency Tolerance info

2018-10-30 Thread Srinivas Pandruvada
[...]
> > > For example, index printing. Please, remove it. I completely
> > > forgot
> > > about userspace powerful tools. At least two very old and nice
> > > can
> > > enumerate lines for you.
> > > Obviously the index printing is redundant.
> > 
> > Index printing is required here (for LTR Show and LTR Ignore)
> > because it
> > paves an obvious and easy way for the users of this driver to know
> > the
> > IP number to be used for LTR ignore. This was specifically
> > requested by
> > some customer and Srinivas asked me to implement this so adding him
> > for
> > his inputs.
> 
> So, why it should be in kernel? When user prints this, they usually
> call `cat /.../file`, right?
> Is it too hard to call `cat -n /.../file` instead? The benefit of
> such
> approach is that it's independent on the file we are printing.
> 
> (Note, `grep -n  /.../file` does the same`)
> 
> For more variants
> 
https://stackoverflow.com/questions/8206370/add-numbers-to-the-beginning-of-every-line-in-a-file
> 
We get copy/paste data from some serial terminals from systems which
don't have traditional linux shell or busy box. Not sure if they can do
cat "-n" option or have this command at all. So line number helps. They
can't even store output as as file as this is RO file system.

But I am not as sticky on this.

Thanks,
Srinivas





[PATCH v2] proc: use ns_capable instead of capable for timerslack_ns

2018-10-30 Thread Benjamin Gordon
Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
in its effective capability set, but the current check looks in the root
namespace instead of the process' user namespace.  Since a process is
allowed to do other activities controlled by CAP_SYS_NICE inside a
namespace, it should also be able to adjust timerslack_ns.

Signed-off-by: Benjamin Gordon 
Cc: John Stultz 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: "Serge E. Hallyn" 
Cc: Thomas Gleixner 
Cc: Arjan van de Ven 
Cc: Oren Laadan 
Cc: Ruchi Kandoi 
Cc: Rom Lemarchand 
Cc: Todd Kjos 
Cc: Colin Cross 
Cc: Nick Kralevich 
Cc: Dmitry Shmidt 
Cc: Elliott Hughes 
Cc: Android Kernel Team 
Cc: Andrew Morton 
---

Changes from v1:
  - Use the namespace of the target process instead of the file opener.
Didn't carry over John Stultz' Acked-by since the changes aren't
cosmetic.

 fs/proc/base.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index c78d8da09b52c..bdc093ba81dd3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2385,10 +2385,13 @@ static ssize_t timerslack_ns_write(struct file *file, 
const char __user *buf,
return -ESRCH;
 
if (p != current) {
-   if (!capable(CAP_SYS_NICE)) {
+   rcu_read_lock();
+   if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
+   rcu_read_unlock();
count = -EPERM;
goto out;
}
+   rcu_read_unlock();
 
err = security_task_setscheduler(p);
if (err) {
@@ -2421,11 +2424,14 @@ static int timerslack_ns_show(struct seq_file *m, void 
*v)
return -ESRCH;
 
if (p != current) {
-
-   if (!capable(CAP_SYS_NICE)) {
+   rcu_read_lock();
+   if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
+   rcu_read_unlock();
err = -EPERM;
goto out;
}
+   rcu_read_unlock();
+
err = security_task_getscheduler(p);
if (err)
goto out;
-- 
2.19.1.568.g152ad8e336-goog



[PATCH v2] proc: use ns_capable instead of capable for timerslack_ns

2018-10-30 Thread Benjamin Gordon
Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
in its effective capability set, but the current check looks in the root
namespace instead of the process' user namespace.  Since a process is
allowed to do other activities controlled by CAP_SYS_NICE inside a
namespace, it should also be able to adjust timerslack_ns.

Signed-off-by: Benjamin Gordon 
Cc: John Stultz 
Cc: "Eric W. Biederman" 
Cc: Kees Cook 
Cc: "Serge E. Hallyn" 
Cc: Thomas Gleixner 
Cc: Arjan van de Ven 
Cc: Oren Laadan 
Cc: Ruchi Kandoi 
Cc: Rom Lemarchand 
Cc: Todd Kjos 
Cc: Colin Cross 
Cc: Nick Kralevich 
Cc: Dmitry Shmidt 
Cc: Elliott Hughes 
Cc: Android Kernel Team 
Cc: Andrew Morton 
---

Changes from v1:
  - Use the namespace of the target process instead of the file opener.
Didn't carry over John Stultz' Acked-by since the changes aren't
cosmetic.

 fs/proc/base.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index c78d8da09b52c..bdc093ba81dd3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2385,10 +2385,13 @@ static ssize_t timerslack_ns_write(struct file *file, 
const char __user *buf,
return -ESRCH;
 
if (p != current) {
-   if (!capable(CAP_SYS_NICE)) {
+   rcu_read_lock();
+   if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
+   rcu_read_unlock();
count = -EPERM;
goto out;
}
+   rcu_read_unlock();
 
err = security_task_setscheduler(p);
if (err) {
@@ -2421,11 +2424,14 @@ static int timerslack_ns_show(struct seq_file *m, void 
*v)
return -ESRCH;
 
if (p != current) {
-
-   if (!capable(CAP_SYS_NICE)) {
+   rcu_read_lock();
+   if (!ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE)) {
+   rcu_read_unlock();
err = -EPERM;
goto out;
}
+   rcu_read_unlock();
+
err = security_task_getscheduler(p);
if (err)
goto out;
-- 
2.19.1.568.g152ad8e336-goog



Re: [PATCH 0/9 v3] locks: avoid thundering-herd wake-ups

2018-10-30 Thread Jeff Layton
On Tue, 2018-10-30 at 08:04 -0400, Jeff Layton wrote:
> On Mon, 2018-10-29 at 08:38 -0400, Jeff Layton wrote:
> > On Mon, 2018-10-29 at 12:56 +1100, NeilBrown wrote:
> > > On Fri, Oct 26 2018, Jeff Layton wrote:
> > > 
> > > > On Wed, 2018-10-24 at 09:43 +1100, NeilBrown wrote:
> > > > > This took longer that I had wanted, due to various reasons - sorry.
> > > > > And I'm now posting it in a merge window, which is not ideal.  I don't
> > > > > expect it to be included in this merge window and I won't be at all
> > > > > impatient for review, but I didn't want to delay it further.
> > > > > 
> > > > > Testing found some problems, particularly the need to use
> > > > > locks_copy_lock in NFS.  And there was a small thinko in there that
> > > > > effectively removed all the speed gains :-(
> > > > > 
> > > > > But this version:
> > > > >  - shows excellent scalability with lots of threads on lots of CPUs
> > > > >contending on a single file
> > > > >  - avoids the problems that Bruce found
> > > > >  - seems to work.
> > > > > 
> > > > > Thanks,
> > > > > NeilBrown
> > > > > 
> > > > > 
> > > > > ---
> > > > > 
> > > > > NeilBrown (9):
> > > > >   fs/locks: rename some lists and pointers.
> > > > >   fs/locks: split out __locks_wake_up_blocks().
> > > > >   NFS: use locks_copy_lock() to copy locks.
> > > > >   fs/locks: allow a lock request to block other requests.
> > > > >   fs/locks: always delete_block after waiting.
> > > > >   fs/locks: change all *_conflict() functions to return bool.
> > > > >   fs/locks: create a tree of dependent requests.
> > > > >   locks: merge posix_unblock_lock() and locks_delete_block()
> > > > >   VFS: locks: remove unnecessary white space.
> > > > > 
> > > > > 
> > > > >  fs/cifs/file.c  |4 -
> > > > >  fs/lockd/svclock.c  |2 
> > > > >  fs/locks.c  |  231 
> > > > > ++-
> > > > >  fs/nfs/nfs4proc.c   |6 +
> > > > >  fs/nfsd/nfs4state.c |6 +
> > > > >  include/linux/fs.h  |   11 +-
> > > > >  include/trace/events/filelock.h |   16 +--
> > > > >  7 files changed, 153 insertions(+), 123 deletions(-)
> > > > > 
> > > > > --
> > > > > Signature
> > > > > 
> > > > 
> > > > 
> > > > I built a kernel with these patches and ran the cthon04 lock tests and
> > > > got this on lock test 1 after a long hang (the test passed though):
> > > 
> > > I've been looking deeper into this, and I cannot see how this can
> > > happen.
> > > 
> > > This is an unlock request, happening when a file is closed.
> > > locks_delete_block() will only be called from locks_lock_inode_wait()
> > > after posix_lock_inode() (or possible flock_lock_inode()) returns
> > > FILE_LOCK_DEFERRED.
> > > 
> > > But these only return FILE_LOCK_DEFERRED when fl_type != F_UNLCK.
> > > (or possibly if FL_ACCESS and FL_SLEEP are both set - that would be
> > >  weird).
> > > 
> > > So this shouldn't happen - an unlock request should never result in
> > > locks_delete_block() being called.
> > > But if it can, I'll need to change do_flock() in gfs2/file.c to use a
> > > properly initialized 'struct file_lock', rather than a manifest
> > > constant.  Maybe I should do that anyway.
> > > 
> > > Any ideas? I'll try running connectathon myself later and see if I can
> > > reproduce it.
> > > 
> > > Thanks,
> > > NeilBrown
> > > 

Sorry, I missed your comment about gfs2.

Yes, we'll need to do something similar there. Looks like ocfs2_do_flock
likely has the same problem too. I think we can just call
locks_init_lock on those and then initialize the fields being set
afterward.

Given that that needs to be fixed, I'll hold off on merging this into
linux-next just yet. Would you mind respinning this series with those
fixed up? Feel free to squash my one-liner patch in with another too.

> > > > [ 1694.787367] BUG: unable to handle kernel NULL pointer dereference at 
> > > > 002c
> > > > [ 1694.789546] PGD 118ff0067 P4D 118ff0067 PUD 135915067 PMD 0 
> > > > [ 1694.790772] Oops:  [#1] SMP NOPTI
> > > > [ 1694.791749] CPU: 7 PID: 1514 Comm: tlocklfs Not tainted 4.19.0+ #56
> > > > [ 1694.792876] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 
> > > > 04/01/2014
> > > > [ 1694.795179] RIP: 0010:__locks_delete_block+0x14/0x90
> > > > [ 1694.796203] Code: 01 a1 e9 9f 4f d8 ff 0f 1f 44 00 00 66 2e 0f 1f 84 
> > > > 00 00 00 00 00 0f 1f 44 00 00 8b 05 29 9d 58 01 55 53 48 89 fb 85 c0 75 
> > > > 5a <48> 8b 43 20 48 85 c0 74 20 48 8b 53 18 48 89 10 48 85 d2 74 04 48
> > > > [ 1694.799277] RSP: 0018:9d21c1f63cb8 EFLAGS: 00010202
> > > > [ 1694.800374] RAX: 0001 RBX: 000c RCX: 
> > > > aaad
> > > > [ 1694.801682] RDX: 0001 RSI: 9f7b0c38 RDI: 
> > > > 0246
> > > > [ 1694.802996] RBP: 000c R08: 

Re: [PATCH 0/9 v3] locks: avoid thundering-herd wake-ups

2018-10-30 Thread Jeff Layton
On Tue, 2018-10-30 at 08:04 -0400, Jeff Layton wrote:
> On Mon, 2018-10-29 at 08:38 -0400, Jeff Layton wrote:
> > On Mon, 2018-10-29 at 12:56 +1100, NeilBrown wrote:
> > > On Fri, Oct 26 2018, Jeff Layton wrote:
> > > 
> > > > On Wed, 2018-10-24 at 09:43 +1100, NeilBrown wrote:
> > > > > This took longer that I had wanted, due to various reasons - sorry.
> > > > > And I'm now posting it in a merge window, which is not ideal.  I don't
> > > > > expect it to be included in this merge window and I won't be at all
> > > > > impatient for review, but I didn't want to delay it further.
> > > > > 
> > > > > Testing found some problems, particularly the need to use
> > > > > locks_copy_lock in NFS.  And there was a small thinko in there that
> > > > > effectively removed all the speed gains :-(
> > > > > 
> > > > > But this version:
> > > > >  - shows excellent scalability with lots of threads on lots of CPUs
> > > > >contending on a single file
> > > > >  - avoids the problems that Bruce found
> > > > >  - seems to work.
> > > > > 
> > > > > Thanks,
> > > > > NeilBrown
> > > > > 
> > > > > 
> > > > > ---
> > > > > 
> > > > > NeilBrown (9):
> > > > >   fs/locks: rename some lists and pointers.
> > > > >   fs/locks: split out __locks_wake_up_blocks().
> > > > >   NFS: use locks_copy_lock() to copy locks.
> > > > >   fs/locks: allow a lock request to block other requests.
> > > > >   fs/locks: always delete_block after waiting.
> > > > >   fs/locks: change all *_conflict() functions to return bool.
> > > > >   fs/locks: create a tree of dependent requests.
> > > > >   locks: merge posix_unblock_lock() and locks_delete_block()
> > > > >   VFS: locks: remove unnecessary white space.
> > > > > 
> > > > > 
> > > > >  fs/cifs/file.c  |4 -
> > > > >  fs/lockd/svclock.c  |2 
> > > > >  fs/locks.c  |  231 
> > > > > ++-
> > > > >  fs/nfs/nfs4proc.c   |6 +
> > > > >  fs/nfsd/nfs4state.c |6 +
> > > > >  include/linux/fs.h  |   11 +-
> > > > >  include/trace/events/filelock.h |   16 +--
> > > > >  7 files changed, 153 insertions(+), 123 deletions(-)
> > > > > 
> > > > > --
> > > > > Signature
> > > > > 
> > > > 
> > > > 
> > > > I built a kernel with these patches and ran the cthon04 lock tests and
> > > > got this on lock test 1 after a long hang (the test passed though):
> > > 
> > > I've been looking deeper into this, and I cannot see how this can
> > > happen.
> > > 
> > > This is an unlock request, happening when a file is closed.
> > > locks_delete_block() will only be called from locks_lock_inode_wait()
> > > after posix_lock_inode() (or possible flock_lock_inode()) returns
> > > FILE_LOCK_DEFERRED.
> > > 
> > > But these only return FILE_LOCK_DEFERRED when fl_type != F_UNLCK.
> > > (or possibly if FL_ACCESS and FL_SLEEP are both set - that would be
> > >  weird).
> > > 
> > > So this shouldn't happen - an unlock request should never result in
> > > locks_delete_block() being called.
> > > But if it can, I'll need to change do_flock() in gfs2/file.c to use a
> > > properly initialized 'struct file_lock', rather than a manifest
> > > constant.  Maybe I should do that anyway.
> > > 
> > > Any ideas? I'll try running connectathon myself later and see if I can
> > > reproduce it.
> > > 
> > > Thanks,
> > > NeilBrown
> > > 

Sorry, I missed your comment about gfs2.

Yes, we'll need to do something similar there. Looks like ocfs2_do_flock
likely has the same problem too. I think we can just call
locks_init_lock on those and then initialize the fields being set
afterward.

Given that that needs to be fixed, I'll hold off on merging this into
linux-next just yet. Would you mind respinning this series with those
fixed up? Feel free to squash my one-liner patch in with another too.

> > > > [ 1694.787367] BUG: unable to handle kernel NULL pointer dereference at 
> > > > 002c
> > > > [ 1694.789546] PGD 118ff0067 P4D 118ff0067 PUD 135915067 PMD 0 
> > > > [ 1694.790772] Oops:  [#1] SMP NOPTI
> > > > [ 1694.791749] CPU: 7 PID: 1514 Comm: tlocklfs Not tainted 4.19.0+ #56
> > > > [ 1694.792876] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 
> > > > 04/01/2014
> > > > [ 1694.795179] RIP: 0010:__locks_delete_block+0x14/0x90
> > > > [ 1694.796203] Code: 01 a1 e9 9f 4f d8 ff 0f 1f 44 00 00 66 2e 0f 1f 84 
> > > > 00 00 00 00 00 0f 1f 44 00 00 8b 05 29 9d 58 01 55 53 48 89 fb 85 c0 75 
> > > > 5a <48> 8b 43 20 48 85 c0 74 20 48 8b 53 18 48 89 10 48 85 d2 74 04 48
> > > > [ 1694.799277] RSP: 0018:9d21c1f63cb8 EFLAGS: 00010202
> > > > [ 1694.800374] RAX: 0001 RBX: 000c RCX: 
> > > > aaad
> > > > [ 1694.801682] RDX: 0001 RSI: 9f7b0c38 RDI: 
> > > > 0246
> > > > [ 1694.802996] RBP: 000c R08: 

Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-30 Thread Shakeel Butt
On Tue, Oct 30, 2018 at 8:56 AM Roman Gushchin  wrote:
>
> On Tue, Oct 30, 2018 at 07:12:49AM +0100, Michal Hocko wrote:
> > On Mon 29-10-18 21:51:55, Roman Gushchin wrote:
> > > Mike Galbraith reported a regression caused by the commit 9b6f7e163cd0
> > > ("mm: rework memcg kernel stack accounting") on a system with
> > > "cgroup_disable=memory" boot option: the system panics with the
> > > following stack trace:
> > >
> > >   [0.928542] BUG: unable to handle kernel NULL pointer dereference at 
> > > 00f8
> > >   [0.929317] PGD 0 P4D 0
> > >   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
> > >   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410
> > >   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > ?-20180531_142017-buildhw-08.phx2.fed4
> > >   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
> > >   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 
> > > a7 00 00 00 41 56 48 89 f8 49 89 fe 49
> > >   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
> > >   [0.934826] RAX: 00f8 RBX:  RCX: 
> > > 
> > >   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI: 
> > > 00f8
> > >   [0.936288] RBP: 0001 R08: 8063 R09: 
> > > 99ff7cd37a40
> > >   [0.937021] R10: acf68031fed0 R11: 0020 R12: 
> > > 0020
> > >   [0.937749] R13: acf68031fd08 R14: 00f8 R15: 
> > > 99ff7da1ec60
> > >   [0.938486] FS:  7fc2140bb280() GS:99ff7da0() 
> > > knlGS:
> > >   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
> > >   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4: 
> > > 00760ef0
> > >   [0.940638] DR0:  DR1:  DR2: 
> > > 
> > >   [0.941366] DR3:  DR6: fffe0ff0 DR7: 
> > > 0400
> > >   [0.942110] PKRU: 5554
> > >   [0.942412] Call Trace:
> > >   [0.942673]  try_charge+0xcb/0x780
> > >   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
> > >   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
> > >   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
> > >   [0.944396]  copy_process.part.41+0x1ca/0x2070
> > >   [0.944853]  ? get_acl+0x1a/0x120
> > >   [0.945200]  ? shmem_tmpfile+0x90/0x90
> > >   [0.945596]  _do_fork+0xd7/0x3d0
> > >   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > >   [0.946421]  do_syscall_64+0x5a/0x180
> > >   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > The problem occurs because get_mem_cgroup_from_current() returns
> > > the NULL pointer if memory controller is disabled. Let's check
> > > if this is a case at the beginning of memcg_kmem_charge() and
> > > just return 0 if mem_cgroup_disabled() returns true. This is how
> > > we handle this case in many other places in the memory controller
> > > code.
> > >
> > > Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> > > Reported-by: Mike Galbraith 
> > > Signed-off-by: Roman Gushchin 
> > > Cc: Michal Hocko 
> > > Cc: Johannes Weiner 
> > > Cc: Vladimir Davydov 
> > > Cc: Andrew Morton 
> >
> > I tend to agree with Shakeel that consistency with the other caller
> > would be less confusing.
>
> I totally agree that consistency is a thing here (and everywhere),
> however using memcg_kmem_enabled() here is not consistent at all.
> memcg_kmem_enabled() is tight to the slab allocation accounting,

Not really, see __alloc_pages_nodemask() where page allocations with
__GFP_ACCOUNT call memcg_kmem_charge() only if memcg_kmem_enabled().
Anyways it's a separate discussion and can be done in the followup
cleanup.

Shakeel


Re: [PATCH] mm: handle no memcg case in memcg_kmem_charge() properly

2018-10-30 Thread Shakeel Butt
On Tue, Oct 30, 2018 at 8:56 AM Roman Gushchin  wrote:
>
> On Tue, Oct 30, 2018 at 07:12:49AM +0100, Michal Hocko wrote:
> > On Mon 29-10-18 21:51:55, Roman Gushchin wrote:
> > > Mike Galbraith reported a regression caused by the commit 9b6f7e163cd0
> > > ("mm: rework memcg kernel stack accounting") on a system with
> > > "cgroup_disable=memory" boot option: the system panics with the
> > > following stack trace:
> > >
> > >   [0.928542] BUG: unable to handle kernel NULL pointer dereference at 
> > > 00f8
> > >   [0.929317] PGD 0 P4D 0
> > >   [0.929573] Oops: 0002 [#1] PREEMPT SMP PTI
> > >   [0.929984] CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410
> > >   [0.930637] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> > > ?-20180531_142017-buildhw-08.phx2.fed4
> > >   [0.931862] RIP: 0010:page_counter_try_charge+0x22/0xc0
> > >   [0.932376] Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 
> > > a7 00 00 00 41 56 48 89 f8 49 89 fe 49
> > >   [0.934283] RSP: 0018:acf68031fcb8 EFLAGS: 00010202
> > >   [0.934826] RAX: 00f8 RBX:  RCX: 
> > > 
> > >   [0.935558] RDX: acf68031fd08 RSI: 0020 RDI: 
> > > 00f8
> > >   [0.936288] RBP: 0001 R08: 8063 R09: 
> > > 99ff7cd37a40
> > >   [0.937021] R10: acf68031fed0 R11: 0020 R12: 
> > > 0020
> > >   [0.937749] R13: acf68031fd08 R14: 00f8 R15: 
> > > 99ff7da1ec60
> > >   [0.938486] FS:  7fc2140bb280() GS:99ff7da0() 
> > > knlGS:
> > >   [0.939311] CS:  0010 DS:  ES:  CR0: 80050033
> > >   [0.939905] CR2: 00f8 CR3: 12dc8002 CR4: 
> > > 00760ef0
> > >   [0.940638] DR0:  DR1:  DR2: 
> > > 
> > >   [0.941366] DR3:  DR6: fffe0ff0 DR7: 
> > > 0400
> > >   [0.942110] PKRU: 5554
> > >   [0.942412] Call Trace:
> > >   [0.942673]  try_charge+0xcb/0x780
> > >   [0.943031]  memcg_kmem_charge_memcg+0x28/0x80
> > >   [0.943486]  ? __vmalloc_node_range+0x1e4/0x280
> > >   [0.943971]  memcg_kmem_charge+0x8b/0x1d0
> > >   [0.944396]  copy_process.part.41+0x1ca/0x2070
> > >   [0.944853]  ? get_acl+0x1a/0x120
> > >   [0.945200]  ? shmem_tmpfile+0x90/0x90
> > >   [0.945596]  _do_fork+0xd7/0x3d0
> > >   [0.945934]  ? trace_hardirqs_off_thunk+0x1a/0x1c
> > >   [0.946421]  do_syscall_64+0x5a/0x180
> > >   [0.946798]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > The problem occurs because get_mem_cgroup_from_current() returns
> > > the NULL pointer if memory controller is disabled. Let's check
> > > if this is a case at the beginning of memcg_kmem_charge() and
> > > just return 0 if mem_cgroup_disabled() returns true. This is how
> > > we handle this case in many other places in the memory controller
> > > code.
> > >
> > > Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
> > > Reported-by: Mike Galbraith 
> > > Signed-off-by: Roman Gushchin 
> > > Cc: Michal Hocko 
> > > Cc: Johannes Weiner 
> > > Cc: Vladimir Davydov 
> > > Cc: Andrew Morton 
> >
> > I tend to agree with Shakeel that consistency with the other caller
> > would be less confusing.
>
> I totally agree that consistency is a thing here (and everywhere),
> however using memcg_kmem_enabled() here is not consistent at all.
> memcg_kmem_enabled() is tight to the slab allocation accounting,

Not really, see __alloc_pages_nodemask() where page allocations with
__GFP_ACCOUNT call memcg_kmem_charge() only if memcg_kmem_enabled().
Anyways it's a separate discussion and can be done in the followup
cleanup.

Shakeel


Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

2018-10-30 Thread Paul E. McKenney
On Mon, Oct 22, 2018 at 06:14:40PM +0200, Oleg Nesterov wrote:
> On 10/22, Paul E. McKenney wrote:
> >
> > > > @@ -125,12 +125,12 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> > > > rsp->gp_state = GP_PENDING;
> > > > spin_unlock_irq(>rss_lock);
> > > >
> > > > -   BUG_ON(need_wait && need_sync);
> > > > -
> > > > if (need_sync) {
> > > > gp_ops[rsp->gp_type].sync();
> > > > rsp->gp_state = GP_PASSED;
> > > > wake_up_all(>gp_wait);
> > > > +   if (WARN_ON_ONCE(need_wait))
> > > > +   wait_event(rsp->gp_wait, rsp->gp_state == 
> > > > GP_PASSED);
> > >
> > > This wait_event(gp_state == GP_PASSED) is pointless, note that this branch
> > > does gp_state = GP_PASSED 2 lines above.
> >
> > OK, I have removed this one.
> >
> > > And if we add WARN_ON_ONCE(need_wait), then we should probably also add
> > > WARN_ON_ONCE(need_sync) into the next "if (need_wait)" branch just for
> > > symmetry.
> >
> > But in that case, the earlier "if" prevents "need_sync" from ever getting
> > there, unless I lost the thread here.
> 
> Yes, you are right, we would also need to remove "else",
> 
> > Should I remove the others?
> 
> Up to you, I am fine either way.
> 
> IOW, feel free to remove this BUG_ON's altogether, or turn them all into
> WARN_ON_ONCE's, whatever you like more.
> 
> > > 
> > > Damn.
> > >
> > > This suddenly reminds me that I rewrote this code completely, and you even
> > > reviewed the new implementation and (iirc) acked it!
> > >
> > > However, I failed to force myself to rewrite the comments, and that is why
> > > I didn't send the "official" patch :/
> > >
> > > May be some time...
> >
> > Could you please point me at the last email thread?  Yes, I should be
> > able to find it, but I would probably get the wrong one.  :-/
> 
> probably this one,
> 
>   [PATCH] rcu_sync: simplify the state machine, introduce 
> __rcu_sync_enter()
>   https://lkml.org/lkml/2016/7/16/150
> 
> but I am not sure, will recheck tomorrow.

Just following up...  Here is what I currently have.

Thanx, Paul



commit 1c1d315dfb7049d0233b89948a3fbcb61ea15d26
Author: Dennis Krein 
Date:   Fri Oct 26 07:38:24 2018 -0700

srcu: Lock srcu_data structure in srcu_gp_start()

The srcu_gp_start() function is called with the srcu_struct structure's
->lock held, but not with the srcu_data structure's ->lock.  This is
problematic because this function accesses and updates the srcu_data
structure's ->srcu_cblist, which is protected by that lock.  Failing to
hold this lock can result in corruption of the SRCU callback lists,
which in turn can result in arbitrarily bad results.

This commit therefore makes srcu_gp_start() acquire the srcu_data
structure's ->lock across the calls to rcu_segcblist_advance() and
rcu_segcblist_accelerate(), thus preventing this corruption.

Reported-by: Bart Van Assche 
Reported-by: Christoph Hellwig 
Signed-off-by: Dennis Krein 
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 60f3236beaf7..697a2d7e8e8a 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -451,10 +451,12 @@ static void srcu_gp_start(struct srcu_struct *sp)
 
lockdep_assert_held(_PRIVATE(sp, lock));
WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
+   spin_lock_rcu_node(sdp);  /* Interrupts already disabled. */
rcu_segcblist_advance(>srcu_cblist,
  rcu_seq_current(>srcu_gp_seq));
(void)rcu_segcblist_accelerate(>srcu_cblist,
   rcu_seq_snap(>srcu_gp_seq));
+   spin_unlock_rcu_node(sdp);  /* Interrupts remain disabled. */
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
rcu_seq_start(>srcu_gp_seq);
state = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));



Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

2018-10-30 Thread Paul E. McKenney
On Mon, Oct 22, 2018 at 06:14:40PM +0200, Oleg Nesterov wrote:
> On 10/22, Paul E. McKenney wrote:
> >
> > > > @@ -125,12 +125,12 @@ void rcu_sync_enter(struct rcu_sync *rsp)
> > > > rsp->gp_state = GP_PENDING;
> > > > spin_unlock_irq(>rss_lock);
> > > >
> > > > -   BUG_ON(need_wait && need_sync);
> > > > -
> > > > if (need_sync) {
> > > > gp_ops[rsp->gp_type].sync();
> > > > rsp->gp_state = GP_PASSED;
> > > > wake_up_all(>gp_wait);
> > > > +   if (WARN_ON_ONCE(need_wait))
> > > > +   wait_event(rsp->gp_wait, rsp->gp_state == 
> > > > GP_PASSED);
> > >
> > > This wait_event(gp_state == GP_PASSED) is pointless, note that this branch
> > > does gp_state = GP_PASSED 2 lines above.
> >
> > OK, I have removed this one.
> >
> > > And if we add WARN_ON_ONCE(need_wait), then we should probably also add
> > > WARN_ON_ONCE(need_sync) into the next "if (need_wait)" branch just for
> > > symmetry.
> >
> > But in that case, the earlier "if" prevents "need_sync" from ever getting
> > there, unless I lost the thread here.
> 
> Yes, you are right, we would also need to remove "else",
> 
> > Should I remove the others?
> 
> Up to you, I am fine either way.
> 
> IOW, feel free to remove this BUG_ON's altogether, or turn them all into
> WARN_ON_ONCE's, whatever you like more.
> 
> > > 
> > > Damn.
> > >
> > > This suddenly reminds me that I rewrote this code completely, and you even
> > > reviewed the new implementation and (iirc) acked it!
> > >
> > > However, I failed to force myself to rewrite the comments, and that is why
> > > I didn't send the "official" patch :/
> > >
> > > May be some time...
> >
> > Could you please point me at the last email thread?  Yes, I should be
> > able to find it, but I would probably get the wrong one.  :-/
> 
> probably this one,
> 
>   [PATCH] rcu_sync: simplify the state machine, introduce 
> __rcu_sync_enter()
>   https://lkml.org/lkml/2016/7/16/150
> 
> but I am not sure, will recheck tomorrow.

Just following up...  Here is what I currently have.

Thanx, Paul



commit 1c1d315dfb7049d0233b89948a3fbcb61ea15d26
Author: Dennis Krein 
Date:   Fri Oct 26 07:38:24 2018 -0700

srcu: Lock srcu_data structure in srcu_gp_start()

The srcu_gp_start() function is called with the srcu_struct structure's
->lock held, but not with the srcu_data structure's ->lock.  This is
problematic because this function accesses and updates the srcu_data
structure's ->srcu_cblist, which is protected by that lock.  Failing to
hold this lock can result in corruption of the SRCU callback lists,
which in turn can result in arbitrarily bad results.

This commit therefore makes srcu_gp_start() acquire the srcu_data
structure's ->lock across the calls to rcu_segcblist_advance() and
rcu_segcblist_accelerate(), thus preventing this corruption.

Reported-by: Bart Van Assche 
Reported-by: Christoph Hellwig 
Signed-off-by: Dennis Krein 
Signed-off-by: Paul E. McKenney 

diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
index 60f3236beaf7..697a2d7e8e8a 100644
--- a/kernel/rcu/srcutree.c
+++ b/kernel/rcu/srcutree.c
@@ -451,10 +451,12 @@ static void srcu_gp_start(struct srcu_struct *sp)
 
lockdep_assert_held(_PRIVATE(sp, lock));
WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
+   spin_lock_rcu_node(sdp);  /* Interrupts already disabled. */
rcu_segcblist_advance(>srcu_cblist,
  rcu_seq_current(>srcu_gp_seq));
(void)rcu_segcblist_accelerate(>srcu_cblist,
   rcu_seq_snap(>srcu_gp_seq));
+   spin_unlock_rcu_node(sdp);  /* Interrupts remain disabled. */
smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
rcu_seq_start(>srcu_gp_seq);
state = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));



RE: [PATCH] x86/fsgsbase/64: Fix the base write helper functions

2018-10-30 Thread Bae, Chang Seok
> Factor out the code to change index from x86_fsbase_write_cpu() and
> x86_gsbase_write_cpu_inactive(). Now the code is located in
> do_arch_prctl_64().
> 
> The helper functions that purport to write the base register should just 
> write the
> base register only. It shouldn't have magic optimizations to change the index.
> 
> putreg() in ptrace does not write the current task, but a stopped task.
> 

Wm, I just realized that the changelog should also clearly include this:
"While at here, subsequently factor out the read for the current task's 
base 
from x86_fsbase_read_task() and x86_gsbase_read_task() to 
do_arch_prctl_64()."
Sorry for this. Let me know if you feel it should go as a separate patch.

Chang


RE: [PATCH] x86/fsgsbase/64: Fix the base write helper functions

2018-10-30 Thread Bae, Chang Seok
> Factor out the code to change index from x86_fsbase_write_cpu() and
> x86_gsbase_write_cpu_inactive(). Now the code is located in
> do_arch_prctl_64().
> 
> The helper functions that purport to write the base register should just 
> write the
> base register only. It shouldn't have magic optimizations to change the index.
> 
> putreg() in ptrace does not write the current task, but a stopped task.
> 

Wm, I just realized that the changelog should also clearly include this:
"While at here, subsequently factor out the read for the current task's 
base 
from x86_fsbase_read_task() and x86_gsbase_read_task() to 
do_arch_prctl_64()."
Sorry for this. Let me know if you feel it should go as a separate patch.

Chang


[PATCH] mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo

2018-10-30 Thread Roman Gushchin
Yongqin reported that /proc/zoneinfo format is broken in 4.14
due to commit 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable
in /proc/vmstat")

Node 0, zone  DMA
  per-node stats
  nr_inactive_anon 403
  nr_active_anon 89123
  nr_inactive_file 128887
  nr_active_file 47377
  nr_unevictable 2053
  nr_slab_reclaimable 7510
  nr_slab_unreclaimable 10775
  nr_isolated_anon 0
  nr_isolated_file 0
  <...>
  nr_vmscan_write 0
  nr_vmscan_immediate_reclaim 0
  nr_dirtied   6022
  nr_written   5985
   74240
  ^^
  pages free 131656

The problem is caused by the nr_indirectly_reclaimable counter,
which is hidden from the /proc/vmstat, but not from the
/proc/zoneinfo. Let's fix this inconsistency and hide the
counter from /proc/zoneinfo exactly as from /proc/vmstat.

BTW, in 4.19+ the counter has been renamed and exported by
the commit b29940c1abd7 ("mm: rename and change semantics of
nr_indirectly_reclaimable_bytes"), so there is no such a problem
anymore.

Cc:  # 4.14.x-4.18.x
Fixes: 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat")
Reported-by: Yongqin Liu 
Signed-off-by: Roman Gushchin 
Cc: Vlastimil Babka 
Cc: Andrew Morton 
---
 mm/vmstat.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 527ae727d547..6389e876c7a7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1500,6 +1500,10 @@ static void zoneinfo_show_print(struct seq_file *m, 
pg_data_t *pgdat,
if (is_zone_first_populated(pgdat, zone)) {
seq_printf(m, "\n  per-node stats");
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+   /* Skip hidden vmstat items. */
+   if (*vmstat_text[i + NR_VM_ZONE_STAT_ITEMS +
+NR_VM_NUMA_STAT_ITEMS] == '\0')
+   continue;
seq_printf(m, "\n  %-12s %lu",
vmstat_text[i + NR_VM_ZONE_STAT_ITEMS +
NR_VM_NUMA_STAT_ITEMS],
-- 
2.17.2



[PATCH] mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo

2018-10-30 Thread Roman Gushchin
Yongqin reported that /proc/zoneinfo format is broken in 4.14
due to commit 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable
in /proc/vmstat")

Node 0, zone  DMA
  per-node stats
  nr_inactive_anon 403
  nr_active_anon 89123
  nr_inactive_file 128887
  nr_active_file 47377
  nr_unevictable 2053
  nr_slab_reclaimable 7510
  nr_slab_unreclaimable 10775
  nr_isolated_anon 0
  nr_isolated_file 0
  <...>
  nr_vmscan_write 0
  nr_vmscan_immediate_reclaim 0
  nr_dirtied   6022
  nr_written   5985
   74240
  ^^
  pages free 131656

The problem is caused by the nr_indirectly_reclaimable counter,
which is hidden from the /proc/vmstat, but not from the
/proc/zoneinfo. Let's fix this inconsistency and hide the
counter from /proc/zoneinfo exactly as from /proc/vmstat.

BTW, in 4.19+ the counter has been renamed and exported by
the commit b29940c1abd7 ("mm: rename and change semantics of
nr_indirectly_reclaimable_bytes"), so there is no such a problem
anymore.

Cc:  # 4.14.x-4.18.x
Fixes: 7aaf77272358 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat")
Reported-by: Yongqin Liu 
Signed-off-by: Roman Gushchin 
Cc: Vlastimil Babka 
Cc: Andrew Morton 
---
 mm/vmstat.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 527ae727d547..6389e876c7a7 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1500,6 +1500,10 @@ static void zoneinfo_show_print(struct seq_file *m, 
pg_data_t *pgdat,
if (is_zone_first_populated(pgdat, zone)) {
seq_printf(m, "\n  per-node stats");
for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
+   /* Skip hidden vmstat items. */
+   if (*vmstat_text[i + NR_VM_ZONE_STAT_ITEMS +
+NR_VM_NUMA_STAT_ITEMS] == '\0')
+   continue;
seq_printf(m, "\n  %-12s %lu",
vmstat_text[i + NR_VM_ZONE_STAT_ITEMS +
NR_VM_NUMA_STAT_ITEMS],
-- 
2.17.2



Re: Logitech high-resolution scrolling..

2018-10-30 Thread Harry Cutts
Thanks for the analysis, Peter.

On Mon, 29 Oct 2018 at 23:27, Peter Hutterer  wrote:
> IMO this is a lost battle because you cannot know when the ratchet is
> enabled or not (at least not on all mice). Users switch between ratchet and
> freewheeling time and once you're out of one mode, you have no reference
> to the other mode's reset point anymore.

It would be a lost battle, if it weren't for the fact that on all the
mice I've tested, putting the wheel back into clicky mode causes the
wheel to jump to the nearest notch resting point, which should mean
that the remainder resets to 0 (or maybe ±1 if the mechanism is worn).

> So my suggestion is to combine Linus' reset with your approach and use the
> center-point for the trigger. This gives us a few events to slide and still
> do the right thing, and any direction change will reset anyway. Biggest
> drawback is that the first event after a direction change is triggered
> faster than the next event. Otherwise it feels correct to me, both in
> free-wheeling and in ratchet mode now.

This sounds like a reasonable approach if we find that we can't keep
the triggering point consistent.

> Also, WTF moment: I managed to get the mouse into a state where it would
> only give me 1 hi-res event per notch movement but failed to reproduce that
> again.

Interesting; let me know if you manage to reliably reproduce it. The
only time I've encountered this in the past was when connecting to the
mouse over BLE, where we don't seem to be able to detect if the mouse
is power cycled (meaning that the mouse resets to low-res mode but the
kernel is still expecting high-res reports). I held off on enabling
high-res scrolling over Bluetooth for this reason.

On Tue, 30 Oct 2018 at 09:29, Linus Torvalds
 wrote:
> I wonder if there's some docs on what Logitech does internally in the
> mouse. It might involve a timeout (ie "if not moving for a while, do
> the rounding _and_ reset), which would probably be too expensive to do
> on the host side.

I've been wondering this as well. Nestor (CCed), is there anything you
can tell us about this?

Harry Cutts
Chrome OS Touch/Input team


Re: Logitech high-resolution scrolling..

2018-10-30 Thread Harry Cutts
Thanks for the analysis, Peter.

On Mon, 29 Oct 2018 at 23:27, Peter Hutterer  wrote:
> IMO this is a lost battle because you cannot know when the ratchet is
> enabled or not (at least not on all mice). Users switch between ratchet and
> freewheeling time and once you're out of one mode, you have no reference
> to the other mode's reset point anymore.

It would be a lost battle, if it weren't for the fact that on all the
mice I've tested, putting the wheel back into clicky mode causes the
wheel to jump to the nearest notch resting point, which should mean
that the remainder resets to 0 (or maybe ±1 if the mechanism is worn).

> So my suggestion is to combine Linus' reset with your approach and use the
> center-point for the trigger. This gives us a few events to slide and still
> do the right thing, and any direction change will reset anyway. Biggest
> drawback is that the first event after a direction change is triggered
> faster than the next event. Otherwise it feels correct to me, both in
> free-wheeling and in ratchet mode now.

This sounds like a reasonable approach if we find that we can't keep
the triggering point consistent.

> Also, WTF moment: I managed to get the mouse into a state where it would
> only give me 1 hi-res event per notch movement but failed to reproduce that
> again.

Interesting; let me know if you manage to reliably reproduce it. The
only time I've encountered this in the past was when connecting to the
mouse over BLE, where we don't seem to be able to detect if the mouse
is power cycled (meaning that the mouse resets to low-res mode but the
kernel is still expecting high-res reports). I held off on enabling
high-res scrolling over Bluetooth for this reason.

On Tue, 30 Oct 2018 at 09:29, Linus Torvalds
 wrote:
> I wonder if there's some docs on what Logitech does internally in the
> mouse. It might involve a timeout (ie "if not moving for a while, do
> the rounding _and_ reset), which would probably be too expensive to do
> on the host side.

I've been wondering this as well. Nestor (CCed), is there anything you
can tell us about this?

Harry Cutts
Chrome OS Touch/Input team


RE: [PATCH] Choose CPU based on allocated IRQs

2018-10-30 Thread Long Li
> Subject: Re: [PATCH] Choose CPU based on allocated IRQs
> 
> Long,
> 
> On Tue, 23 Oct 2018, Long Li wrote:
> 
> thanks for this patch.
> 
> A trivial formal thing ahead. The subject line
> 
>[PATCH] Choose CPU based on allocated IRQs
> 
> is lacking a proper subsystem prefix. In most cases you can figure the prefix
> out by running 'git log path/to/file' which in this case will show you that 
> most
> commits touching this file use the prefix 'genirq/matrix:'.
> 
> So the proper subject would be:
> 
>[PATCH] genirq/matrix: Choose CPU based on allocated IRQs
> 
> Subsystem prefixes are important to see where a patch belongs to right from
> the subject. Without that it could belong to any random part of the kernel
> and needs further inspection of the patch itself. This applies to both email
> and to git shortlog listings.

Thank you. I will send v2 to address this.

> 
> > From: Long Li 
> >
> > In irq_matrix, "available" is set when IRQs are allocated earlier in
> > the IRQ assigning process.
> >
> > Later, when IRQs are activated those values are not good indicators of
> > what CPU to choose to assign to this IRQ.
> 
> Can you please explain why you think that available is the wrong indicator
> and which problem you are trying to solve?
> 
> The WHY is really the most important part of a changelog.

The problem I'm seeing is that on a very large system with multiple devices of 
the same class (e.g. NVMe disks, using managed IRQs), they tend to use 
interrupts on several CPUs on the system. Under heavy load, those several CPUs 
are busy while other CPU are most idling. The issue is that when NVMe call 
irq_matrix_alloc_managed(), the assigned the CPU is always the first CPU in the 
cpumask, because they check for cpumap->available that will not change after 
managed IRQs are reserved in irq_matrix_reserve_managed (which was called from 
the 1st stage of IRQ setup in irq_domain_ops->alloc).

> 
> > Change to choose CPU for an IRQ based on how many IRQs are already
> > allocated on this CPU.
> 
> Looking deeper. The initial values are:
> 
> available = alloc_size - (managed + systembits)
> allocated = 0
> 
> There are two distinct functionalities which modify 'available' and 
> 'allocated'
> (omitting the reverse operations for simplicity):
> 
> 1) managed interrupts
> 
>reserve_managed()
>   managed++;
>   available--;
> 
>alloc_managed()
> allocated++;
> 
> 2) regular interrupts
> 
>alloc()
>   allocated++;
>   available--;
> 
> So 'available' can be lower than 'allocated' depending on the number of
> reserved managed interrupts, which have not yet been activated.
> 
> So for all regular interrupts we really want to look at the number of 
> 'available'
> vectors because the reserved managed ones are already accounted there
> and they need to be taken into account.

I think "reserved managed" may not always be accurate. Reserved managed IRQs 
may not always get activated. For an irq_data, when irq_matrix_reserve_managed 
is called, all the CPUs in the cpumask are reserved. Later, only one of them is 
activated via the call to irq_matrix_alloc_managed(). So we end up with a 
number of "reserved managed" that never get used.

> 
> For the spreading of managed interrupts in alloc_managed() that's indeed a
> different story and 'allocated' is more correct. But even that is not 
> completely
> accurate and can lead to the wrong result. The accurate solution would be to
> account the managed _and_ allocated vectors separately and do the
> spreading for managed interrupts based on that.

I think checking for "allocated" is the best approach for picking which CPU to 
assign for a given irq_data, since we really can't rely on "managed" to decide 
how busy this CPU really is. Checking for "allocated" should work for both 
unmanaged and managed IRQs.

> 
> Thanks,
> 
>   tglx


RE: [PATCH] Choose CPU based on allocated IRQs

2018-10-30 Thread Long Li
> Subject: Re: [PATCH] Choose CPU based on allocated IRQs
> 
> Long,
> 
> On Tue, 23 Oct 2018, Long Li wrote:
> 
> thanks for this patch.
> 
> A trivial formal thing ahead. The subject line
> 
>[PATCH] Choose CPU based on allocated IRQs
> 
> is lacking a proper subsystem prefix. In most cases you can figure the prefix
> out by running 'git log path/to/file' which in this case will show you that 
> most
> commits touching this file use the prefix 'genirq/matrix:'.
> 
> So the proper subject would be:
> 
>[PATCH] genirq/matrix: Choose CPU based on allocated IRQs
> 
> Subsystem prefixes are important to see where a patch belongs to right from
> the subject. Without that it could belong to any random part of the kernel
> and needs further inspection of the patch itself. This applies to both email
> and to git shortlog listings.

Thank you. I will send v2 to address this.

> 
> > From: Long Li 
> >
> > In irq_matrix, "available" is set when IRQs are allocated earlier in
> > the IRQ assigning process.
> >
> > Later, when IRQs are activated those values are not good indicators of
> > what CPU to choose to assign to this IRQ.
> 
> Can you please explain why you think that available is the wrong indicator
> and which problem you are trying to solve?
> 
> The WHY is really the most important part of a changelog.

The problem I'm seeing is that on a very large system with multiple devices of 
the same class (e.g. NVMe disks, using managed IRQs), they tend to use 
interrupts on several CPUs on the system. Under heavy load, those several CPUs 
are busy while other CPU are most idling. The issue is that when NVMe call 
irq_matrix_alloc_managed(), the assigned the CPU is always the first CPU in the 
cpumask, because they check for cpumap->available that will not change after 
managed IRQs are reserved in irq_matrix_reserve_managed (which was called from 
the 1st stage of IRQ setup in irq_domain_ops->alloc).

> 
> > Change to choose CPU for an IRQ based on how many IRQs are already
> > allocated on this CPU.
> 
> Looking deeper. The initial values are:
> 
> available = alloc_size - (managed + systembits)
> allocated = 0
> 
> There are two distinct functionalities which modify 'available' and 
> 'allocated'
> (omitting the reverse operations for simplicity):
> 
> 1) managed interrupts
> 
>reserve_managed()
>   managed++;
>   available--;
> 
>alloc_managed()
> allocated++;
> 
> 2) regular interrupts
> 
>alloc()
>   allocated++;
>   available--;
> 
> So 'available' can be lower than 'allocated' depending on the number of
> reserved managed interrupts, which have not yet been activated.
> 
> So for all regular interrupts we really want to look at the number of 
> 'available'
> vectors because the reserved managed ones are already accounted there
> and they need to be taken into account.

I think "reserved managed" may not always be accurate. Reserved managed IRQs 
may not always get activated. For an irq_data, when irq_matrix_reserve_managed 
is called, all the CPUs in the cpumask are reserved. Later, only one of them is 
activated via the call to irq_matrix_alloc_managed(). So we end up with a 
number of "reserved managed" that never get used.

> 
> For the spreading of managed interrupts in alloc_managed() that's indeed a
> different story and 'allocated' is more correct. But even that is not 
> completely
> accurate and can lead to the wrong result. The accurate solution would be to
> account the managed _and_ allocated vectors separately and do the
> spreading for managed interrupts based on that.

I think checking for "allocated" is the best approach for picking which CPU to 
assign for a given irq_data, since we really can't rely on "managed" to decide 
how busy this CPU really is. Checking for "allocated" should work for both 
unmanaged and managed IRQs.

> 
> Thanks,
> 
>   tglx


Re: [PATCH] ARM: kprobes: Fix false positive with FORTIFY_SOURCE

2018-10-30 Thread William Cohen
On 10/22/18 5:30 AM, Kees Cook wrote:
> The arm compiler internally interprets an inline assembly label
> as an unsigned long value, not a pointer. As a result, under
> CONFIG_FORTIFY_SOURCE, the size of the array pointed to by an address
> of a label is 4 bytes, which was tripping the runtime checks. Instead,
> we can just cast the label (as done with the size calculations earlier)
> to avoid the problem.
> 
> Reported-by: William Cohen 
> Fixes: 6974f0c4555e ("include/linux/string.h: add the option of fortified 
> string.h functions")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
>  arch/arm/probes/kprobes/opt-arm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/probes/kprobes/opt-arm.c 
> b/arch/arm/probes/kprobes/opt-arm.c
> index b2aa9b32bff2..2c118a6ab358 100644
> --- a/arch/arm/probes/kprobes/opt-arm.c
> +++ b/arch/arm/probes/kprobes/opt-arm.c
> @@ -247,7 +247,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
> *op, struct kprobe *or
>   }
>  
>   /* Copy arch-dep-instance from template. */
> - memcpy(code, _template_entry,
> + memcpy(code, (unsigned char *)optprobe_template_entry,
>   TMPL_END_IDX * sizeof(kprobe_opcode_t));
>  
>   /* Adjust buffer according to instruction. */
> 

The patch fixes the issue for kretprobes.  It looks good to me.

Thanks,

-Will


Re: [PATCH] ARM: kprobes: Fix false positive with FORTIFY_SOURCE

2018-10-30 Thread William Cohen
On 10/22/18 5:30 AM, Kees Cook wrote:
> The arm compiler internally interprets an inline assembly label
> as an unsigned long value, not a pointer. As a result, under
> CONFIG_FORTIFY_SOURCE, the size of the array pointed to by an address
> of a label is 4 bytes, which was tripping the runtime checks. Instead,
> we can just cast the label (as done with the size calculations earlier)
> to avoid the problem.
> 
> Reported-by: William Cohen 
> Fixes: 6974f0c4555e ("include/linux/string.h: add the option of fortified 
> string.h functions")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Kees Cook 
> ---
>  arch/arm/probes/kprobes/opt-arm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/probes/kprobes/opt-arm.c 
> b/arch/arm/probes/kprobes/opt-arm.c
> index b2aa9b32bff2..2c118a6ab358 100644
> --- a/arch/arm/probes/kprobes/opt-arm.c
> +++ b/arch/arm/probes/kprobes/opt-arm.c
> @@ -247,7 +247,7 @@ int arch_prepare_optimized_kprobe(struct optimized_kprobe 
> *op, struct kprobe *or
>   }
>  
>   /* Copy arch-dep-instance from template. */
> - memcpy(code, _template_entry,
> + memcpy(code, (unsigned char *)optprobe_template_entry,
>   TMPL_END_IDX * sizeof(kprobe_opcode_t));
>  
>   /* Adjust buffer according to instruction. */
> 

The patch fixes the issue for kretprobes.  It looks good to me.

Thanks,

-Will


[PATCH] x86/fsgsbase/64: Fix the base write helper functions

2018-10-30 Thread Chang S. Bae
Factor out the code to change index from x86_fsbase_write_cpu() and
x86_gsbase_write_cpu_inactive(). Now the code is located in
do_arch_prctl_64().

The helper functions that purport to write the base register should just
write the base register only. It shouldn't have magic optimizations to
change the index.

putreg() in ptrace does not write the current task, but a stopped task.

Suggested-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: H. Peter Anvin 
Cc: Andi Kleen 
Cc: Dave Hansen 
---
 arch/x86/kernel/process_64.c | 67 +---
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 31b4755369f0..4fd865fb7097 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -339,19 +339,11 @@ static unsigned long x86_fsgsbase_read_task(struct 
task_struct *task,
 
 void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-   /*
-* Set the selector to 0 as a notion, that the segment base is
-* overwritten, which will be checked for skipping the segment load
-* during context switch.
-*/
-   loadseg(FS, 0);
wrmsrl(MSR_FS_BASE, fsbase);
 }
 
 void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
 {
-   /* Set the selector to 0 for the same reason as %fs above. */
-   loadseg(GS, 0);
wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
 }
 
@@ -359,9 +351,7 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
unsigned long fsbase;
 
-   if (task == current)
-   fsbase = x86_fsbase_read_cpu();
-   else if (task->thread.fsindex == 0)
+   if (task->thread.fsindex == 0)
fsbase = task->thread.fsbase;
else
fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -373,9 +363,7 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 {
unsigned long gsbase;
 
-   if (task == current)
-   gsbase = x86_gsbase_read_cpu_inactive();
-   else if (task->thread.gsindex == 0)
+   if (task->thread.gsindex == 0)
gsbase = task->thread.gsbase;
else
gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
@@ -392,12 +380,8 @@ int x86_fsbase_write_task(struct task_struct *task, 
unsigned long fsbase)
if (unlikely(fsbase >= TASK_SIZE_MAX))
return -EPERM;
 
-   preempt_disable();
task->thread.fsbase = fsbase;
-   if (task == current)
-   x86_fsbase_write_cpu(fsbase);
task->thread.fsindex = 0;
-   preempt_enable();
 
return 0;
 }
@@ -407,12 +391,8 @@ int x86_gsbase_write_task(struct task_struct *task, 
unsigned long gsbase)
if (unlikely(gsbase >= TASK_SIZE_MAX))
return -EPERM;
 
-   preempt_disable();
task->thread.gsbase = gsbase;
-   if (task == current)
-   x86_gsbase_write_cpu_inactive(gsbase);
task->thread.gsindex = 0;
-   preempt_enable();
 
return 0;
 }
@@ -757,22 +737,53 @@ long do_arch_prctl_64(struct task_struct *task, int 
option, unsigned long arg2)
int ret = 0;
 
switch (option) {
-   case ARCH_SET_GS: {
-   ret = x86_gsbase_write_task(task, arg2);
-   break;
-   }
case ARCH_SET_FS: {
+   preempt_disable();
ret = x86_fsbase_write_task(task, arg2);
+   if (task == current && ret == 0) {
+   /*
+* Set the selector to 0, implying that the base is
+* overwritten. The index value will be checked
+* during context switch for skipping segment reload.
+*/
+   loadseg(FS, 0);
+   x86_fsbase_write_cpu(arg2);
+   }
+   preempt_enable();
+   break;
+   }
+   case ARCH_SET_GS: {
+   preempt_disable();
+   ret = x86_gsbase_write_task(task, arg2);
+   if (task == current && ret == 0) {
+   /*
+* Set the selector to 0 for the same reason
+* as %fs above.
+*/
+   loadseg(GS, 0);
+   x86_gsbase_write_cpu_inactive(arg2);
+   }
+   preempt_enable();
break;
}
case ARCH_GET_FS: {
-   unsigned long base = x86_fsbase_read_task(task);
+   unsigned long base;
+
+   if (task == current)
+   base = x86_fsbase_read_cpu();
+   else
+   base = x86_fsbase_read_task(task);
 
ret = put_user(base, (unsigned long __user *)arg2);
break;
}
case ARCH_GET_GS: {
-   unsigned long base = 

[PATCH] x86/fsgsbase/64: Fix the base write helper functions

2018-10-30 Thread Chang S. Bae
Factor out the code to change index from x86_fsbase_write_cpu() and
x86_gsbase_write_cpu_inactive(). Now the code is located in
do_arch_prctl_64().

The helper functions that purport to write the base register should just
write the base register only. It shouldn't have magic optimizations to
change the index.

putreg() in ptrace does not write the current task, but a stopped task.

Suggested-by: Andy Lutomirski 
Signed-off-by: Chang S. Bae 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: H. Peter Anvin 
Cc: Andi Kleen 
Cc: Dave Hansen 
---
 arch/x86/kernel/process_64.c | 67 +---
 1 file changed, 39 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 31b4755369f0..4fd865fb7097 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -339,19 +339,11 @@ static unsigned long x86_fsgsbase_read_task(struct 
task_struct *task,
 
 void x86_fsbase_write_cpu(unsigned long fsbase)
 {
-   /*
-* Set the selector to 0 as a notion, that the segment base is
-* overwritten, which will be checked for skipping the segment load
-* during context switch.
-*/
-   loadseg(FS, 0);
wrmsrl(MSR_FS_BASE, fsbase);
 }
 
 void x86_gsbase_write_cpu_inactive(unsigned long gsbase)
 {
-   /* Set the selector to 0 for the same reason as %fs above. */
-   loadseg(GS, 0);
wrmsrl(MSR_KERNEL_GS_BASE, gsbase);
 }
 
@@ -359,9 +351,7 @@ unsigned long x86_fsbase_read_task(struct task_struct *task)
 {
unsigned long fsbase;
 
-   if (task == current)
-   fsbase = x86_fsbase_read_cpu();
-   else if (task->thread.fsindex == 0)
+   if (task->thread.fsindex == 0)
fsbase = task->thread.fsbase;
else
fsbase = x86_fsgsbase_read_task(task, task->thread.fsindex);
@@ -373,9 +363,7 @@ unsigned long x86_gsbase_read_task(struct task_struct *task)
 {
unsigned long gsbase;
 
-   if (task == current)
-   gsbase = x86_gsbase_read_cpu_inactive();
-   else if (task->thread.gsindex == 0)
+   if (task->thread.gsindex == 0)
gsbase = task->thread.gsbase;
else
gsbase = x86_fsgsbase_read_task(task, task->thread.gsindex);
@@ -392,12 +380,8 @@ int x86_fsbase_write_task(struct task_struct *task, 
unsigned long fsbase)
if (unlikely(fsbase >= TASK_SIZE_MAX))
return -EPERM;
 
-   preempt_disable();
task->thread.fsbase = fsbase;
-   if (task == current)
-   x86_fsbase_write_cpu(fsbase);
task->thread.fsindex = 0;
-   preempt_enable();
 
return 0;
 }
@@ -407,12 +391,8 @@ int x86_gsbase_write_task(struct task_struct *task, 
unsigned long gsbase)
if (unlikely(gsbase >= TASK_SIZE_MAX))
return -EPERM;
 
-   preempt_disable();
task->thread.gsbase = gsbase;
-   if (task == current)
-   x86_gsbase_write_cpu_inactive(gsbase);
task->thread.gsindex = 0;
-   preempt_enable();
 
return 0;
 }
@@ -757,22 +737,53 @@ long do_arch_prctl_64(struct task_struct *task, int 
option, unsigned long arg2)
int ret = 0;
 
switch (option) {
-   case ARCH_SET_GS: {
-   ret = x86_gsbase_write_task(task, arg2);
-   break;
-   }
case ARCH_SET_FS: {
+   preempt_disable();
ret = x86_fsbase_write_task(task, arg2);
+   if (task == current && ret == 0) {
+   /*
+* Set the selector to 0, implying that the base is
+* overwritten. The index value will be checked
+* during context switch for skipping segment reload.
+*/
+   loadseg(FS, 0);
+   x86_fsbase_write_cpu(arg2);
+   }
+   preempt_enable();
+   break;
+   }
+   case ARCH_SET_GS: {
+   preempt_disable();
+   ret = x86_gsbase_write_task(task, arg2);
+   if (task == current && ret == 0) {
+   /*
+* Set the selector to 0 for the same reason
+* as %fs above.
+*/
+   loadseg(GS, 0);
+   x86_gsbase_write_cpu_inactive(arg2);
+   }
+   preempt_enable();
break;
}
case ARCH_GET_FS: {
-   unsigned long base = x86_fsbase_read_task(task);
+   unsigned long base;
+
+   if (task == current)
+   base = x86_fsbase_read_cpu();
+   else
+   base = x86_fsbase_read_task(task);
 
ret = put_user(base, (unsigned long __user *)arg2);
break;
}
case ARCH_GET_GS: {
-   unsigned long base = 

[PATCH v2] arm64: dts: sdm845: Add PSCI cpuidle low power states

2018-10-30 Thread Raju P.L.S.S.S.N
Add device bindings for cpuidle states for cpu devices.

Cc: 
Signed-off-by: Raju P.L.S.S.S.N 
---
Changes in v2
 - Address comments from Doug
---
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 62 
 1 file changed, 62 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 0c9a2aa..3a8381e 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -96,6 +96,7 @@
reg = <0x0 0x0>;
enable-method = "psci";
next-level-cache = <_0>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_0: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -111,6 +112,7 @@
reg = <0x0 0x100>;
enable-method = "psci";
next-level-cache = <_100>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_100: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -123,6 +125,7 @@
reg = <0x0 0x200>;
enable-method = "psci";
next-level-cache = <_200>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_200: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -135,6 +138,7 @@
reg = <0x0 0x300>;
enable-method = "psci";
next-level-cache = <_300>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_300: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -147,6 +151,7 @@
reg = <0x0 0x400>;
enable-method = "psci";
next-level-cache = <_400>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_400: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -159,6 +164,7 @@
reg = <0x0 0x500>;
enable-method = "psci";
next-level-cache = <_500>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_500: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -171,6 +177,7 @@
reg = <0x0 0x600>;
enable-method = "psci";
next-level-cache = <_600>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_600: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -183,11 +190,66 @@
reg = <0x0 0x700>;
enable-method = "psci";
next-level-cache = <_700>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_700: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
};
};
+
+   idle-states {
+   entry-method = "psci";
+
+   C0_CPU_PD: c0-power-down {
+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4003>;
+   entry-latency-us = <350>;
+   exit-latency-us = <461>;
+   min-residency-us = <1890>;
+   local-timer-stop;
+   idle-state-name = "power-down";
+   };
+
+   C0_CPU_RPD: c0-rail-power-down {
+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4004>;
+   entry-latency-us = <360>;
+   exit-latency-us = <531>;
+   min-residency-us = <3934>;
+   local-timer-stop;
+   idle-state-name = "rail-power-down";
+   };
+
+   C4_CPU_PD: c4-power-down {
+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4003>;
+   entry-latency-us = <264>;
+   exit-latency-us = <621>;
+   

[PATCH v2] arm64: dts: sdm845: Add PSCI cpuidle low power states

2018-10-30 Thread Raju P.L.S.S.S.N
Add device bindings for cpuidle states for cpu devices.

Cc: 
Signed-off-by: Raju P.L.S.S.S.N 
---
Changes in v2
 - Address comments from Doug
---
---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 62 
 1 file changed, 62 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi 
b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 0c9a2aa..3a8381e 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -96,6 +96,7 @@
reg = <0x0 0x0>;
enable-method = "psci";
next-level-cache = <_0>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_0: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -111,6 +112,7 @@
reg = <0x0 0x100>;
enable-method = "psci";
next-level-cache = <_100>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_100: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -123,6 +125,7 @@
reg = <0x0 0x200>;
enable-method = "psci";
next-level-cache = <_200>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_200: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -135,6 +138,7 @@
reg = <0x0 0x300>;
enable-method = "psci";
next-level-cache = <_300>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_300: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -147,6 +151,7 @@
reg = <0x0 0x400>;
enable-method = "psci";
next-level-cache = <_400>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_400: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -159,6 +164,7 @@
reg = <0x0 0x500>;
enable-method = "psci";
next-level-cache = <_500>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_500: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -171,6 +177,7 @@
reg = <0x0 0x600>;
enable-method = "psci";
next-level-cache = <_600>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_600: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
@@ -183,11 +190,66 @@
reg = <0x0 0x700>;
enable-method = "psci";
next-level-cache = <_700>;
+   cpu-idle-states = <_CPU_PD _CPU_RPD _PD>;
L2_700: l2-cache {
compatible = "cache";
next-level-cache = <_0>;
};
};
+
+   idle-states {
+   entry-method = "psci";
+
+   C0_CPU_PD: c0-power-down {
+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4003>;
+   entry-latency-us = <350>;
+   exit-latency-us = <461>;
+   min-residency-us = <1890>;
+   local-timer-stop;
+   idle-state-name = "power-down";
+   };
+
+   C0_CPU_RPD: c0-rail-power-down {
+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4004>;
+   entry-latency-us = <360>;
+   exit-latency-us = <531>;
+   min-residency-us = <3934>;
+   local-timer-stop;
+   idle-state-name = "rail-power-down";
+   };
+
+   C4_CPU_PD: c4-power-down {
+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4003>;
+   entry-latency-us = <264>;
+   exit-latency-us = <621>;
+   

Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Tycho Andersen
On Tue, Oct 30, 2018 at 05:39:26PM +0100, Oleg Nesterov wrote:
> On 10/30, Oleg Nesterov wrote:
> >
> > On 10/30, Tycho Andersen wrote:
> > >
> > > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
> > > struct seccomp_data *sd,
> > >*/
> > >   rmb();
> > >
> > > + if (!sd) {
> > > + populate_seccomp_data(_local);
> > > + sd = _local;
> > > + }
> > > +
> >
> > To me it would be more clean to remove the "if (!sd)" check, 
> > case(SECCOMP_RET_TRACE)
> > in __seccomp_filter() can simply do populate_seccomp_data(_local) 
> > unconditionally
> > and pass _local to __seccomp_filter().
> 
> Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).
> 
> Btw. why __seccomp_filter() doesn't return a boolean?
> 
> Or at least, why can't case(SECCOMP_RET_TRACE) simply do
> 
>   return __seccomp_filter(this_syscall, NULL, true);
> 
> ?

Yeah, at least the second one definitely makes sense. I can add that
as a patch in the next version of this series unless Kees does it
before.

Thanks for your help, Oleg!

Tycho


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Tycho Andersen
On Tue, Oct 30, 2018 at 05:39:26PM +0100, Oleg Nesterov wrote:
> On 10/30, Oleg Nesterov wrote:
> >
> > On 10/30, Tycho Andersen wrote:
> > >
> > > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
> > > struct seccomp_data *sd,
> > >*/
> > >   rmb();
> > >
> > > + if (!sd) {
> > > + populate_seccomp_data(_local);
> > > + sd = _local;
> > > + }
> > > +
> >
> > To me it would be more clean to remove the "if (!sd)" check, 
> > case(SECCOMP_RET_TRACE)
> > in __seccomp_filter() can simply do populate_seccomp_data(_local) 
> > unconditionally
> > and pass _local to __seccomp_filter().
> 
> Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).
> 
> Btw. why __seccomp_filter() doesn't return a boolean?
> 
> Or at least, why can't case(SECCOMP_RET_TRACE) simply do
> 
>   return __seccomp_filter(this_syscall, NULL, true);
> 
> ?

Yeah, at least the second one definitely makes sense. I can add that
as a patch in the next version of this series unless Kees does it
before.

Thanks for your help, Oleg!

Tycho


Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Santosh Shilimkar

Hi Greg,

On 10/30/2018 4:11 AM, Marc Zyngier wrote:

The Keystone QMSS driver is pretty damaged, in the sense that it
does things like this:

irq_set_affinity_hint(irq, to_cpumask(_map));

where cpu_map is a local variable. As we leave the function, this
will point to nowhere-land, and things will end-up badly.

Instead, let's use a proper cpumask that gets allocated, giving
the driver a chance to actually work with things like irqbalance
as well as have a hypothetical 64bit future.

Signed-off-by: Marc Zyngier 
---
I found this one by inspection after finding a similar bug in an
unrelated driver. It is only compile-tested. It would probably
a Cc stable, but that's Santosh's decision.


Would be able to apply this fix from Marc for stable or it needs
to be re-posted with CC to stable ?



  drivers/soc/ti/knav_qmss.h   |  4 ++--
  drivers/soc/ti/knav_qmss_acc.c   | 10 +-
  drivers/soc/ti/knav_qmss_queue.c | 22 +++---
  3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/soc/ti/knav_qmss.h b/drivers/soc/ti/knav_qmss.h
index 3efc47e82973..bd040c29c4bf 100644
--- a/drivers/soc/ti/knav_qmss.h
+++ b/drivers/soc/ti/knav_qmss.h
@@ -329,8 +329,8 @@ struct knav_range_ops {
  };
  
  struct knav_irq_info {

-   int irq;
-   u32 cpu_map;
+   int irq;
+   struct cpumask  *cpu_mask;
  };
  
  struct knav_range_info {

diff --git a/drivers/soc/ti/knav_qmss_acc.c b/drivers/soc/ti/knav_qmss_acc.c
index 316e82e46f6c..2f7fb2dcc1d6 100644
--- a/drivers/soc/ti/knav_qmss_acc.c
+++ b/drivers/soc/ti/knav_qmss_acc.c
@@ -205,18 +205,18 @@ static int knav_range_setup_acc_irq(struct 
knav_range_info *range,
  {
struct knav_device *kdev = range->kdev;
struct knav_acc_channel *acc;
-   unsigned long cpu_map;
+   struct cpumask *cpu_mask;
int ret = 0, irq;
u32 old, new;
  
  	if (range->flags & RANGE_MULTI_QUEUE) {

acc = range->acc;
irq = range->irqs[0].irq;
-   cpu_map = range->irqs[0].cpu_map;
+   cpu_mask = range->irqs[0].cpu_mask;
} else {
acc = range->acc + queue;
irq = range->irqs[queue].irq;
-   cpu_map = range->irqs[queue].cpu_map;
+   cpu_mask = range->irqs[queue].cpu_mask;
}
  
  	old = acc->open_mask;

@@ -239,8 +239,8 @@ static int knav_range_setup_acc_irq(struct knav_range_info 
*range,
acc->name, acc->name);
ret = request_irq(irq, knav_acc_int_handler, 0, acc->name,
  range);
-   if (!ret && cpu_map) {
-   ret = irq_set_affinity_hint(irq, to_cpumask(_map));
+   if (!ret && cpu_mask) {
+   ret = irq_set_affinity_hint(irq, cpu_mask);
if (ret) {
dev_warn(range->kdev->dev,
 "Failed to set IRQ affinity\n");
diff --git a/drivers/soc/ti/knav_qmss_queue.c b/drivers/soc/ti/knav_qmss_queue.c
index b5d5673c255c..8b418379272d 100644
--- a/drivers/soc/ti/knav_qmss_queue.c
+++ b/drivers/soc/ti/knav_qmss_queue.c
@@ -118,19 +118,17 @@ static int knav_queue_setup_irq(struct knav_range_info 
*range,
  struct knav_queue_inst *inst)
  {
unsigned queue = inst->id - range->queue_base;
-   unsigned long cpu_map;
int ret = 0, irq;
  
  	if (range->flags & RANGE_HAS_IRQ) {

irq = range->irqs[queue].irq;
-   cpu_map = range->irqs[queue].cpu_map;
ret = request_irq(irq, knav_queue_int_handler, 0,
inst->irq_name, inst);
if (ret)
return ret;
disable_irq(irq);
-   if (cpu_map) {
-   ret = irq_set_affinity_hint(irq, to_cpumask(_map));
+   if (range->irqs[queue].cpu_mask) {
+   ret = irq_set_affinity_hint(irq, 
range->irqs[queue].cpu_mask);
if (ret) {
dev_warn(range->kdev->dev,
 "Failed to set IRQ affinity\n");
@@ -1262,9 +1260,19 @@ static int knav_setup_queue_range(struct knav_device 
*kdev,
  
  		range->num_irqs++;
  
-		if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3)

-   range->irqs[i].cpu_map =
-   (oirq.args[2] & 0xff00) >> 8;
+   if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3) {
+   unsigned long mask;
+   int bit;
+
+   range->irqs[i].cpu_mask = devm_kzalloc(dev,
+  cpumask_size(), 
GFP_KERNEL);
+   if (!range->irqs[i].cpu_mask)
+   return -ENOMEM;
+
+   mask = 

Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Santosh Shilimkar

Hi Greg,

On 10/30/2018 4:11 AM, Marc Zyngier wrote:

The Keystone QMSS driver is pretty damaged, in the sense that it
does things like this:

irq_set_affinity_hint(irq, to_cpumask(_map));

where cpu_map is a local variable. As we leave the function, this
will point to nowhere-land, and things will end-up badly.

Instead, let's use a proper cpumask that gets allocated, giving
the driver a chance to actually work with things like irqbalance
as well as have a hypothetical 64bit future.

Signed-off-by: Marc Zyngier 
---
I found this one by inspection after finding a similar bug in an
unrelated driver. It is only compile-tested. It would probably
a Cc stable, but that's Santosh's decision.


Would be able to apply this fix from Marc for stable or it needs
to be re-posted with CC to stable ?



  drivers/soc/ti/knav_qmss.h   |  4 ++--
  drivers/soc/ti/knav_qmss_acc.c   | 10 +-
  drivers/soc/ti/knav_qmss_queue.c | 22 +++---
  3 files changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/soc/ti/knav_qmss.h b/drivers/soc/ti/knav_qmss.h
index 3efc47e82973..bd040c29c4bf 100644
--- a/drivers/soc/ti/knav_qmss.h
+++ b/drivers/soc/ti/knav_qmss.h
@@ -329,8 +329,8 @@ struct knav_range_ops {
  };
  
  struct knav_irq_info {

-   int irq;
-   u32 cpu_map;
+   int irq;
+   struct cpumask  *cpu_mask;
  };
  
  struct knav_range_info {

diff --git a/drivers/soc/ti/knav_qmss_acc.c b/drivers/soc/ti/knav_qmss_acc.c
index 316e82e46f6c..2f7fb2dcc1d6 100644
--- a/drivers/soc/ti/knav_qmss_acc.c
+++ b/drivers/soc/ti/knav_qmss_acc.c
@@ -205,18 +205,18 @@ static int knav_range_setup_acc_irq(struct 
knav_range_info *range,
  {
struct knav_device *kdev = range->kdev;
struct knav_acc_channel *acc;
-   unsigned long cpu_map;
+   struct cpumask *cpu_mask;
int ret = 0, irq;
u32 old, new;
  
  	if (range->flags & RANGE_MULTI_QUEUE) {

acc = range->acc;
irq = range->irqs[0].irq;
-   cpu_map = range->irqs[0].cpu_map;
+   cpu_mask = range->irqs[0].cpu_mask;
} else {
acc = range->acc + queue;
irq = range->irqs[queue].irq;
-   cpu_map = range->irqs[queue].cpu_map;
+   cpu_mask = range->irqs[queue].cpu_mask;
}
  
  	old = acc->open_mask;

@@ -239,8 +239,8 @@ static int knav_range_setup_acc_irq(struct knav_range_info 
*range,
acc->name, acc->name);
ret = request_irq(irq, knav_acc_int_handler, 0, acc->name,
  range);
-   if (!ret && cpu_map) {
-   ret = irq_set_affinity_hint(irq, to_cpumask(_map));
+   if (!ret && cpu_mask) {
+   ret = irq_set_affinity_hint(irq, cpu_mask);
if (ret) {
dev_warn(range->kdev->dev,
 "Failed to set IRQ affinity\n");
diff --git a/drivers/soc/ti/knav_qmss_queue.c b/drivers/soc/ti/knav_qmss_queue.c
index b5d5673c255c..8b418379272d 100644
--- a/drivers/soc/ti/knav_qmss_queue.c
+++ b/drivers/soc/ti/knav_qmss_queue.c
@@ -118,19 +118,17 @@ static int knav_queue_setup_irq(struct knav_range_info 
*range,
  struct knav_queue_inst *inst)
  {
unsigned queue = inst->id - range->queue_base;
-   unsigned long cpu_map;
int ret = 0, irq;
  
  	if (range->flags & RANGE_HAS_IRQ) {

irq = range->irqs[queue].irq;
-   cpu_map = range->irqs[queue].cpu_map;
ret = request_irq(irq, knav_queue_int_handler, 0,
inst->irq_name, inst);
if (ret)
return ret;
disable_irq(irq);
-   if (cpu_map) {
-   ret = irq_set_affinity_hint(irq, to_cpumask(_map));
+   if (range->irqs[queue].cpu_mask) {
+   ret = irq_set_affinity_hint(irq, 
range->irqs[queue].cpu_mask);
if (ret) {
dev_warn(range->kdev->dev,
 "Failed to set IRQ affinity\n");
@@ -1262,9 +1260,19 @@ static int knav_setup_queue_range(struct knav_device 
*kdev,
  
  		range->num_irqs++;
  
-		if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3)

-   range->irqs[i].cpu_map =
-   (oirq.args[2] & 0xff00) >> 8;
+   if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3) {
+   unsigned long mask;
+   int bit;
+
+   range->irqs[i].cpu_mask = devm_kzalloc(dev,
+  cpumask_size(), 
GFP_KERNEL);
+   if (!range->irqs[i].cpu_mask)
+   return -ENOMEM;
+
+   mask = 

Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Santosh Shilimkar

Hi Arnd, Olof,

On 10/30/2018 4:11 AM, Marc Zyngier wrote:

The Keystone QMSS driver is pretty damaged, in the sense that it
does things like this:

irq_set_affinity_hint(irq, to_cpumask(_map));

where cpu_map is a local variable. As we leave the function, this
will point to nowhere-land, and things will end-up badly.

Instead, let's use a proper cpumask that gets allocated, giving
the driver a chance to actually work with things like irqbalance
as well as have a hypothetical 64bit future.

Signed-off-by: Marc Zyngier 
---

Could you please add this patch to your fixes branch ?

FWIW,
Acked-by: Santosh Shilimkar 



Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Santosh Shilimkar

Hi Arnd, Olof,

On 10/30/2018 4:11 AM, Marc Zyngier wrote:

The Keystone QMSS driver is pretty damaged, in the sense that it
does things like this:

irq_set_affinity_hint(irq, to_cpumask(_map));

where cpu_map is a local variable. As we leave the function, this
will point to nowhere-land, and things will end-up badly.

Instead, let's use a proper cpumask that gets allocated, giving
the driver a chance to actually work with things like irqbalance
as well as have a hypothetical 64bit future.

Signed-off-by: Marc Zyngier 
---

Could you please add this patch to your fixes branch ?

FWIW,
Acked-by: Santosh Shilimkar 



Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Santosh Shilimkar

10/30/2018 4:11 AM, Marc Zyngier wrote:

The Keystone QMSS driver is pretty damaged, in the sense that it
does things like this:

irq_set_affinity_hint(irq, to_cpumask(_map));

where cpu_map is a local variable. As we leave the function, this
will point to nowhere-land, and things will end-up badly.

Instead, let's use a proper cpumask that gets allocated, giving
the driver a chance to actually work with things like irqbalance
as well as have a hypothetical 64bit future.

Signed-off-by: Marc Zyngier 
---

Thanks Marc for the fix. Will send note to arm-soc folks
to pick this up for fixes and also stable.

Regards,
Santosh


Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Santosh Shilimkar

10/30/2018 4:11 AM, Marc Zyngier wrote:

The Keystone QMSS driver is pretty damaged, in the sense that it
does things like this:

irq_set_affinity_hint(irq, to_cpumask(_map));

where cpu_map is a local variable. As we leave the function, this
will point to nowhere-land, and things will end-up badly.

Instead, let's use a proper cpumask that gets allocated, giving
the driver a chance to actually work with things like irqbalance
as well as have a hypothetical 64bit future.

Signed-off-by: Marc Zyngier 
---

Thanks Marc for the fix. Will send note to arm-soc folks
to pick this up for fixes and also stable.

Regards,
Santosh


Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Florian Fainelli
On 10/30/18 4:11 AM, Marc Zyngier wrote:
> The Keystone QMSS driver is pretty damaged, in the sense that it
> does things like this:
> 
>   irq_set_affinity_hint(irq, to_cpumask(_map));
> 
> where cpu_map is a local variable. As we leave the function, this
> will point to nowhere-land, and things will end-up badly.
> 
> Instead, let's use a proper cpumask that gets allocated, giving
> the driver a chance to actually work with things like irqbalance
> as well as have a hypothetical 64bit future.

Since this is at least the second patch from you that I can see in this
area, would it make sense to sprinkle object_is_on_stack() checks
throughout irq_set_affinity_hint() to help catch offenders?

> 
> Signed-off-by: Marc Zyngier 
> ---
> I found this one by inspection after finding a similar bug in an
> unrelated driver. It is only compile-tested. It would probably
> a Cc stable, but that's Santosh's decision.
> 
>  drivers/soc/ti/knav_qmss.h   |  4 ++--
>  drivers/soc/ti/knav_qmss_acc.c   | 10 +-
>  drivers/soc/ti/knav_qmss_queue.c | 22 +++---
>  3 files changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/soc/ti/knav_qmss.h b/drivers/soc/ti/knav_qmss.h
> index 3efc47e82973..bd040c29c4bf 100644
> --- a/drivers/soc/ti/knav_qmss.h
> +++ b/drivers/soc/ti/knav_qmss.h
> @@ -329,8 +329,8 @@ struct knav_range_ops {
>  };
>  
>  struct knav_irq_info {
> - int irq;
> - u32 cpu_map;
> + int irq;
> + struct cpumask  *cpu_mask;
>  };
>  
>  struct knav_range_info {
> diff --git a/drivers/soc/ti/knav_qmss_acc.c b/drivers/soc/ti/knav_qmss_acc.c
> index 316e82e46f6c..2f7fb2dcc1d6 100644
> --- a/drivers/soc/ti/knav_qmss_acc.c
> +++ b/drivers/soc/ti/knav_qmss_acc.c
> @@ -205,18 +205,18 @@ static int knav_range_setup_acc_irq(struct 
> knav_range_info *range,
>  {
>   struct knav_device *kdev = range->kdev;
>   struct knav_acc_channel *acc;
> - unsigned long cpu_map;
> + struct cpumask *cpu_mask;
>   int ret = 0, irq;
>   u32 old, new;
>  
>   if (range->flags & RANGE_MULTI_QUEUE) {
>   acc = range->acc;
>   irq = range->irqs[0].irq;
> - cpu_map = range->irqs[0].cpu_map;
> + cpu_mask = range->irqs[0].cpu_mask;
>   } else {
>   acc = range->acc + queue;
>   irq = range->irqs[queue].irq;
> - cpu_map = range->irqs[queue].cpu_map;
> + cpu_mask = range->irqs[queue].cpu_mask;
>   }
>  
>   old = acc->open_mask;
> @@ -239,8 +239,8 @@ static int knav_range_setup_acc_irq(struct 
> knav_range_info *range,
>   acc->name, acc->name);
>   ret = request_irq(irq, knav_acc_int_handler, 0, acc->name,
> range);
> - if (!ret && cpu_map) {
> - ret = irq_set_affinity_hint(irq, to_cpumask(_map));
> + if (!ret && cpu_mask) {
> + ret = irq_set_affinity_hint(irq, cpu_mask);
>   if (ret) {
>   dev_warn(range->kdev->dev,
>"Failed to set IRQ affinity\n");
> diff --git a/drivers/soc/ti/knav_qmss_queue.c 
> b/drivers/soc/ti/knav_qmss_queue.c
> index b5d5673c255c..8b418379272d 100644
> --- a/drivers/soc/ti/knav_qmss_queue.c
> +++ b/drivers/soc/ti/knav_qmss_queue.c
> @@ -118,19 +118,17 @@ static int knav_queue_setup_irq(struct knav_range_info 
> *range,
> struct knav_queue_inst *inst)
>  {
>   unsigned queue = inst->id - range->queue_base;
> - unsigned long cpu_map;
>   int ret = 0, irq;
>  
>   if (range->flags & RANGE_HAS_IRQ) {
>   irq = range->irqs[queue].irq;
> - cpu_map = range->irqs[queue].cpu_map;
>   ret = request_irq(irq, knav_queue_int_handler, 0,
>   inst->irq_name, inst);
>   if (ret)
>   return ret;
>   disable_irq(irq);
> - if (cpu_map) {
> - ret = irq_set_affinity_hint(irq, to_cpumask(_map));
> + if (range->irqs[queue].cpu_mask) {
> + ret = irq_set_affinity_hint(irq, 
> range->irqs[queue].cpu_mask);
>   if (ret) {
>   dev_warn(range->kdev->dev,
>"Failed to set IRQ affinity\n");
> @@ -1262,9 +1260,19 @@ static int knav_setup_queue_range(struct knav_device 
> *kdev,
>  
>   range->num_irqs++;
>  
> - if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3)
> - range->irqs[i].cpu_map =
> - (oirq.args[2] & 0xff00) >> 8;
> + if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3) {
> + unsigned long mask;
> + int bit;
> +
> + range->irqs[i].cpu_mask = devm_kzalloc(dev,
> +

Re: [GIT PULL] tracing: Updates for 4.20 (or 5.0)

2018-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2018 at 6:37 AM Steven Rostedt  wrote:
>
> The biggest change here is the updates to kprobes

Pulled.

However, a little note:

>  - Added support for SDT in uprobes

As an explanation, the above kind of sucks. "SDT"? I had to look it
up. I added a note to the commit message, but please, don't use random
acronyms in kernel explanations unless they are industry-standard and
comprehensible to kernel people.

Good acronyms: "TLB", "WTF", "LOL". People know what those mean.
"SDT"? Not so much.

Linus


Re: [GIT PULL] tracing: Updates for 4.20 (or 5.0)

2018-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2018 at 6:37 AM Steven Rostedt  wrote:
>
> The biggest change here is the updates to kprobes

Pulled.

However, a little note:

>  - Added support for SDT in uprobes

As an explanation, the above kind of sucks. "SDT"? I had to look it
up. I added a note to the commit message, but please, don't use random
acronyms in kernel explanations unless they are industry-standard and
comprehensible to kernel people.

Good acronyms: "TLB", "WTF", "LOL". People know what those mean.
"SDT"? Not so much.

Linus


Re: [PATCH] soc: ti: QMSS: Fix usage of irq_set_affinity_hint

2018-10-30 Thread Florian Fainelli
On 10/30/18 4:11 AM, Marc Zyngier wrote:
> The Keystone QMSS driver is pretty damaged, in the sense that it
> does things like this:
> 
>   irq_set_affinity_hint(irq, to_cpumask(_map));
> 
> where cpu_map is a local variable. As we leave the function, this
> will point to nowhere-land, and things will end-up badly.
> 
> Instead, let's use a proper cpumask that gets allocated, giving
> the driver a chance to actually work with things like irqbalance
> as well as have a hypothetical 64bit future.

Since this is at least the second patch from you that I can see in this
area, would it make sense to sprinkle object_is_on_stack() checks
throughout irq_set_affinity_hint() to help catch offenders?

> 
> Signed-off-by: Marc Zyngier 
> ---
> I found this one by inspection after finding a similar bug in an
> unrelated driver. It is only compile-tested. It would probably
> a Cc stable, but that's Santosh's decision.
> 
>  drivers/soc/ti/knav_qmss.h   |  4 ++--
>  drivers/soc/ti/knav_qmss_acc.c   | 10 +-
>  drivers/soc/ti/knav_qmss_queue.c | 22 +++---
>  3 files changed, 22 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/soc/ti/knav_qmss.h b/drivers/soc/ti/knav_qmss.h
> index 3efc47e82973..bd040c29c4bf 100644
> --- a/drivers/soc/ti/knav_qmss.h
> +++ b/drivers/soc/ti/knav_qmss.h
> @@ -329,8 +329,8 @@ struct knav_range_ops {
>  };
>  
>  struct knav_irq_info {
> - int irq;
> - u32 cpu_map;
> + int irq;
> + struct cpumask  *cpu_mask;
>  };
>  
>  struct knav_range_info {
> diff --git a/drivers/soc/ti/knav_qmss_acc.c b/drivers/soc/ti/knav_qmss_acc.c
> index 316e82e46f6c..2f7fb2dcc1d6 100644
> --- a/drivers/soc/ti/knav_qmss_acc.c
> +++ b/drivers/soc/ti/knav_qmss_acc.c
> @@ -205,18 +205,18 @@ static int knav_range_setup_acc_irq(struct 
> knav_range_info *range,
>  {
>   struct knav_device *kdev = range->kdev;
>   struct knav_acc_channel *acc;
> - unsigned long cpu_map;
> + struct cpumask *cpu_mask;
>   int ret = 0, irq;
>   u32 old, new;
>  
>   if (range->flags & RANGE_MULTI_QUEUE) {
>   acc = range->acc;
>   irq = range->irqs[0].irq;
> - cpu_map = range->irqs[0].cpu_map;
> + cpu_mask = range->irqs[0].cpu_mask;
>   } else {
>   acc = range->acc + queue;
>   irq = range->irqs[queue].irq;
> - cpu_map = range->irqs[queue].cpu_map;
> + cpu_mask = range->irqs[queue].cpu_mask;
>   }
>  
>   old = acc->open_mask;
> @@ -239,8 +239,8 @@ static int knav_range_setup_acc_irq(struct 
> knav_range_info *range,
>   acc->name, acc->name);
>   ret = request_irq(irq, knav_acc_int_handler, 0, acc->name,
> range);
> - if (!ret && cpu_map) {
> - ret = irq_set_affinity_hint(irq, to_cpumask(_map));
> + if (!ret && cpu_mask) {
> + ret = irq_set_affinity_hint(irq, cpu_mask);
>   if (ret) {
>   dev_warn(range->kdev->dev,
>"Failed to set IRQ affinity\n");
> diff --git a/drivers/soc/ti/knav_qmss_queue.c 
> b/drivers/soc/ti/knav_qmss_queue.c
> index b5d5673c255c..8b418379272d 100644
> --- a/drivers/soc/ti/knav_qmss_queue.c
> +++ b/drivers/soc/ti/knav_qmss_queue.c
> @@ -118,19 +118,17 @@ static int knav_queue_setup_irq(struct knav_range_info 
> *range,
> struct knav_queue_inst *inst)
>  {
>   unsigned queue = inst->id - range->queue_base;
> - unsigned long cpu_map;
>   int ret = 0, irq;
>  
>   if (range->flags & RANGE_HAS_IRQ) {
>   irq = range->irqs[queue].irq;
> - cpu_map = range->irqs[queue].cpu_map;
>   ret = request_irq(irq, knav_queue_int_handler, 0,
>   inst->irq_name, inst);
>   if (ret)
>   return ret;
>   disable_irq(irq);
> - if (cpu_map) {
> - ret = irq_set_affinity_hint(irq, to_cpumask(_map));
> + if (range->irqs[queue].cpu_mask) {
> + ret = irq_set_affinity_hint(irq, 
> range->irqs[queue].cpu_mask);
>   if (ret) {
>   dev_warn(range->kdev->dev,
>"Failed to set IRQ affinity\n");
> @@ -1262,9 +1260,19 @@ static int knav_setup_queue_range(struct knav_device 
> *kdev,
>  
>   range->num_irqs++;
>  
> - if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3)
> - range->irqs[i].cpu_map =
> - (oirq.args[2] & 0xff00) >> 8;
> + if (IS_ENABLED(CONFIG_SMP) && oirq.args_count == 3) {
> + unsigned long mask;
> + int bit;
> +
> + range->irqs[i].cpu_mask = devm_kzalloc(dev,
> +

Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Joel Fernandes
On Tue, Oct 30, 2018 at 1:50 AM Daniel Colascione  wrote:
>
> On Tue, Oct 30, 2018 at 3:21 AM, Joel Fernandes  wrote:
> > On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione  wrote:
> >>
> >> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> >> write the signal number in base-10 ASCII to the kill file of the
> >> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> >>
> >> Semantically, /proc/pid/kill works like kill(2), except that the
> >> process ID comes from the proc filesystem context instead of from an
> >> explicit system call parameter. This way, it's possible to avoid races
> >> between inspecting some aspect of a process and that process's PID
> >> being reused for some other process.
> >>
> >> With /proc/pid/kill, it's possible to write a proper race-free and
> >> safe pkill(1). An approximation follows. A real program might use
> >> openat(2), having opened a process's /proc/pid directory explicitly,
> >> with the directory file descriptor serving as a sort of "process
> >> handle".
> >
> > How long does the 'inspection' procedure take? If its a short
> > duration, then is PID reuse really an issue, I mean the PIDs are not
> > reused until wrap around and the only reason this can be a problem is
> > if you have the wrap around while the 'inspecting some aspect'
> > procedure takes really long.
>
> It's a race. Would you make similar statements about a similar fix for
> a race condition involving a mutex and a double-free just because the
> race didn't crash most of the time? The issue I'm trying to fix here
> is the same problem, one level higher up in the abstraction hierarchy.

I was just curious that if this was a real issue you are hitting in a
production system, it wasn't clear from the commit message. When I
read your commit I thought "Does the inspection process take that long
that we wrap around an entire PID range?". So perhaps you should amend
your commit message to address that it is not really a problem you ARE
seeing, but rather something you anticipate and that this patch would
be a nice-to-have to avoid that. Typically there should be good
reasons/real-usecases to add a new interface to the kernel. Linus has
repeatedly rejected new interfaces on the grounds of non existent
use-cases or non real-world use cases. Again if I am missing something
here, then please improve the commit message so others don't have
similar questions :) Its completely upto you though..

> > IMO without a really good reason for this, it could really be a hard
> > sell but the RFC was worth it anyway to discuss it ;-)
>
> The traditional unix process API is down there at level -10 of Rusty
> Russel's old bad API scale: "It's impossible to get right". The races
> in the current API are unavoidable. That most programs don't hit these
> races most of the time doesn't mean that the race isn't present.
>
> We've moved to a model where we identify other system resources, like
> DRM fences, locks, sockets, and everything else via file descriptors.
> This change is a step toward using procfs file descriptors to work
> with processes, which makes the system more regular and easier to
> reason about. A clean API that's possible to use correctly is a
> worthwhile project.

Ok, agreed. thanks,

- Joel


Re: [RFC PATCH] Implement /proc/pid/kill

2018-10-30 Thread Joel Fernandes
On Tue, Oct 30, 2018 at 1:50 AM Daniel Colascione  wrote:
>
> On Tue, Oct 30, 2018 at 3:21 AM, Joel Fernandes  wrote:
> > On Mon, Oct 29, 2018 at 3:11 PM Daniel Colascione  wrote:
> >>
> >> Add a simple proc-based kill interface. To use /proc/pid/kill, just
> >> write the signal number in base-10 ASCII to the kill file of the
> >> process to be killed: for example, 'echo 9 > /proc/$$/kill'.
> >>
> >> Semantically, /proc/pid/kill works like kill(2), except that the
> >> process ID comes from the proc filesystem context instead of from an
> >> explicit system call parameter. This way, it's possible to avoid races
> >> between inspecting some aspect of a process and that process's PID
> >> being reused for some other process.
> >>
> >> With /proc/pid/kill, it's possible to write a proper race-free and
> >> safe pkill(1). An approximation follows. A real program might use
> >> openat(2), having opened a process's /proc/pid directory explicitly,
> >> with the directory file descriptor serving as a sort of "process
> >> handle".
> >
> > How long does the 'inspection' procedure take? If its a short
> > duration, then is PID reuse really an issue, I mean the PIDs are not
> > reused until wrap around and the only reason this can be a problem is
> > if you have the wrap around while the 'inspecting some aspect'
> > procedure takes really long.
>
> It's a race. Would you make similar statements about a similar fix for
> a race condition involving a mutex and a double-free just because the
> race didn't crash most of the time? The issue I'm trying to fix here
> is the same problem, one level higher up in the abstraction hierarchy.

I was just curious that if this was a real issue you are hitting in a
production system, it wasn't clear from the commit message. When I
read your commit I thought "Does the inspection process take that long
that we wrap around an entire PID range?". So perhaps you should amend
your commit message to address that it is not really a problem you ARE
seeing, but rather something you anticipate and that this patch would
be a nice-to-have to avoid that. Typically there should be good
reasons/real-usecases to add a new interface to the kernel. Linus has
repeatedly rejected new interfaces on the grounds of non existent
use-cases or non real-world use cases. Again if I am missing something
here, then please improve the commit message so others don't have
similar questions :) Its completely upto you though..

> > IMO without a really good reason for this, it could really be a hard
> > sell but the RFC was worth it anyway to discuss it ;-)
>
> The traditional unix process API is down there at level -10 of Rusty
> Russel's old bad API scale: "It's impossible to get right". The races
> in the current API are unavoidable. That most programs don't hit these
> races most of the time doesn't mean that the race isn't present.
>
> We've moved to a model where we identify other system resources, like
> DRM fences, locks, sockets, and everything else via file descriptors.
> This change is a step toward using procfs file descriptors to work
> with processes, which makes the system more regular and easier to
> reason about. A clean API that's possible to use correctly is a
> worthwhile project.

Ok, agreed. thanks,

- Joel


Re: [PATCH v2 0/6] staging:iio:ad2s90: Add scale info and improve error handling

2018-10-30 Thread Matheus Tavares Bernardino
On Sun, Oct 28, 2018 at 1:52 PM Jonathan Cameron  wrote:
>
> On Fri, 26 Oct 2018 22:59:59 -0300
> Matheus Tavares  wrote:
>
> > This patch set adds scale info to ad2s90's single channel, improve
> > error handling in it's functions and fix a possible race condition
> > issue.
> >
> > The goal with this patch set is to address the points discussed in the
> > mailing list in an effort to move ad2s90.c out of staging.
> Thanks,
>
> A good series in general.  A few suggested improvements.
> If I haven't commented on a patch, usually it means I'm happy with it
> and will pick it up with the rest of the series.
>
> Jonathan
>

Thanks for the review, Jonathan. We will address the necessary changes in v3!

Matheus

> >
> > Changes in v2:
> >  - Added my S-o-B in patch 5.
> >
> > Matheus Tavares (5):
> >   staging:iio:ad2s90: Make read_raw return spi_read's error code
> >   staging:iio:ad2s90: Make probe handle spi_setup failure
> >   staging:iio:ad2s90: Remove always overwritten assignment
> >   staging:iio:ad2s90: Move device registration to the end of probe
> >   staging:iio:ad2s90: Check channel type at read_raw
> >
> > Victor Colombo (1):
> >   staging:iio:ad2s90: Add IIO_CHAN_INFO_SCALE to channel spec and
> > read_raw
> >
> >  drivers/staging/iio/resolver/ad2s90.c | 55 ++-
> >  1 file changed, 37 insertions(+), 18 deletions(-)
> >
>


Re: [PATCH v2 0/6] staging:iio:ad2s90: Add scale info and improve error handling

2018-10-30 Thread Matheus Tavares Bernardino
On Sun, Oct 28, 2018 at 1:52 PM Jonathan Cameron  wrote:
>
> On Fri, 26 Oct 2018 22:59:59 -0300
> Matheus Tavares  wrote:
>
> > This patch set adds scale info to ad2s90's single channel, improve
> > error handling in it's functions and fix a possible race condition
> > issue.
> >
> > The goal with this patch set is to address the points discussed in the
> > mailing list in an effort to move ad2s90.c out of staging.
> Thanks,
>
> A good series in general.  A few suggested improvements.
> If I haven't commented on a patch, usually it means I'm happy with it
> and will pick it up with the rest of the series.
>
> Jonathan
>

Thanks for the review, Jonathan. We will address the necessary changes in v3!

Matheus

> >
> > Changes in v2:
> >  - Added my S-o-B in patch 5.
> >
> > Matheus Tavares (5):
> >   staging:iio:ad2s90: Make read_raw return spi_read's error code
> >   staging:iio:ad2s90: Make probe handle spi_setup failure
> >   staging:iio:ad2s90: Remove always overwritten assignment
> >   staging:iio:ad2s90: Move device registration to the end of probe
> >   staging:iio:ad2s90: Check channel type at read_raw
> >
> > Victor Colombo (1):
> >   staging:iio:ad2s90: Add IIO_CHAN_INFO_SCALE to channel spec and
> > read_raw
> >
> >  drivers/staging/iio/resolver/ad2s90.c | 55 ++-
> >  1 file changed, 37 insertions(+), 18 deletions(-)
> >
>


Re:Business proposition for you

2018-10-30 Thread Melvin Greg
Hello, 



I have a client from Syrian who will like to invest with your 
company. My client is willing to invest $4 Million. Can I have 
your company website to show to my client your company so that 
they will check and decide if they will invest there funds with 
you as joint partner. 

This information is needed urgently.

Please reply. 

Best Regards,
Agent Melvin Greg
Tel:+1 4045966532


Re:Business proposition for you

2018-10-30 Thread Melvin Greg
Hello, 



I have a client from Syrian who will like to invest with your 
company. My client is willing to invest $4 Million. Can I have 
your company website to show to my client your company so that 
they will check and decide if they will invest there funds with 
you as joint partner. 

This information is needed urgently.

Please reply. 

Best Regards,
Agent Melvin Greg
Tel:+1 4045966532


Re: [PATCH] thermal: broadcom: use devm_thermal_zone_of_sensor_register

2018-10-30 Thread Daniel Lezcano
On 30/10/2018 17:08, Julia Lawall wrote:
> Using devm_thermal_zone_of_sensor_register allows to simplify some
> error handling code, drop a label, and drop the remove function.
> 
> Signed-off-by: Julia Lawall 

Reviewed-by: Daniel Lezcano 

> ---
> 
> This patch is completely orthogonal to the recent constification
> patch.
> 
>  drivers/thermal/broadcom/brcmstb_thermal.c |   24 
>  1 file changed, 4 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/thermal/broadcom/brcmstb_thermal.c 
> b/drivers/thermal/broadcom/brcmstb_thermal.c
> index 1919f91fa756..956eef8717bb 100644
> --- a/drivers/thermal/broadcom/brcmstb_thermal.c
> +++ b/drivers/thermal/broadcom/brcmstb_thermal.c
> @@ -329,7 +329,8 @@ static int brcmstb_thermal_probe(struct platform_device 
> *pdev)
>   priv->dev = >dev;
>   platform_set_drvdata(pdev, priv);
>  
> - thermal = thermal_zone_of_sensor_register(>dev, 0, priv, _ops);
> + thermal = devm_thermal_zone_of_sensor_register(>dev, 0, priv,
> +_ops);
>   if (IS_ERR(thermal)) {
>   ret = PTR_ERR(thermal);
>   dev_err(>dev, "could not register sensor: %d\n", ret);
> @@ -341,40 +342,23 @@ static int brcmstb_thermal_probe(struct platform_device 
> *pdev)
>   irq = platform_get_irq(pdev, 0);
>   if (irq < 0) {
>   dev_err(>dev, "could not get IRQ\n");
> - ret = irq;
> - goto err;
> + return irq;
>   }
>   ret = devm_request_threaded_irq(>dev, irq, NULL,
>   brcmstb_tmon_irq_thread, IRQF_ONESHOT,
>   DRV_NAME, priv);
>   if (ret < 0) {
>   dev_err(>dev, "could not request IRQ: %d\n", ret);
> - goto err;
> + return ret;
>   }
>  
>   dev_info(>dev, "registered AVS TMON of-sensor driver\n");
>  
>   return 0;
> -
> -err:
> - thermal_zone_of_sensor_unregister(>dev, thermal);
> - return ret;
> -}
> -
> -static int brcmstb_thermal_exit(struct platform_device *pdev)
> -{
> - struct brcmstb_thermal_priv *priv = platform_get_drvdata(pdev);
> - struct thermal_zone_device *thermal = priv->thermal;
> -
> - if (thermal)
> - thermal_zone_of_sensor_unregister(>dev, priv->thermal);
> -
> - return 0;
>  }
>  
>  static struct platform_driver brcmstb_thermal_driver = {
>   .probe = brcmstb_thermal_probe,
> - .remove = brcmstb_thermal_exit,
>   .driver = {
>   .name = DRV_NAME,
>   .of_match_table = brcmstb_thermal_id_table,
> 


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH] thermal: broadcom: use devm_thermal_zone_of_sensor_register

2018-10-30 Thread Daniel Lezcano
On 30/10/2018 17:08, Julia Lawall wrote:
> Using devm_thermal_zone_of_sensor_register allows to simplify some
> error handling code, drop a label, and drop the remove function.
> 
> Signed-off-by: Julia Lawall 

Reviewed-by: Daniel Lezcano 

> ---
> 
> This patch is completely orthogonal to the recent constification
> patch.
> 
>  drivers/thermal/broadcom/brcmstb_thermal.c |   24 
>  1 file changed, 4 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/thermal/broadcom/brcmstb_thermal.c 
> b/drivers/thermal/broadcom/brcmstb_thermal.c
> index 1919f91fa756..956eef8717bb 100644
> --- a/drivers/thermal/broadcom/brcmstb_thermal.c
> +++ b/drivers/thermal/broadcom/brcmstb_thermal.c
> @@ -329,7 +329,8 @@ static int brcmstb_thermal_probe(struct platform_device 
> *pdev)
>   priv->dev = >dev;
>   platform_set_drvdata(pdev, priv);
>  
> - thermal = thermal_zone_of_sensor_register(>dev, 0, priv, _ops);
> + thermal = devm_thermal_zone_of_sensor_register(>dev, 0, priv,
> +_ops);
>   if (IS_ERR(thermal)) {
>   ret = PTR_ERR(thermal);
>   dev_err(>dev, "could not register sensor: %d\n", ret);
> @@ -341,40 +342,23 @@ static int brcmstb_thermal_probe(struct platform_device 
> *pdev)
>   irq = platform_get_irq(pdev, 0);
>   if (irq < 0) {
>   dev_err(>dev, "could not get IRQ\n");
> - ret = irq;
> - goto err;
> + return irq;
>   }
>   ret = devm_request_threaded_irq(>dev, irq, NULL,
>   brcmstb_tmon_irq_thread, IRQF_ONESHOT,
>   DRV_NAME, priv);
>   if (ret < 0) {
>   dev_err(>dev, "could not request IRQ: %d\n", ret);
> - goto err;
> + return ret;
>   }
>  
>   dev_info(>dev, "registered AVS TMON of-sensor driver\n");
>  
>   return 0;
> -
> -err:
> - thermal_zone_of_sensor_unregister(>dev, thermal);
> - return ret;
> -}
> -
> -static int brcmstb_thermal_exit(struct platform_device *pdev)
> -{
> - struct brcmstb_thermal_priv *priv = platform_get_drvdata(pdev);
> - struct thermal_zone_device *thermal = priv->thermal;
> -
> - if (thermal)
> - thermal_zone_of_sensor_unregister(>dev, priv->thermal);
> -
> - return 0;
>  }
>  
>  static struct platform_driver brcmstb_thermal_driver = {
>   .probe = brcmstb_thermal_probe,
> - .remove = brcmstb_thermal_exit,
>   .driver = {
>   .name = DRV_NAME,
>   .of_match_table = brcmstb_thermal_id_table,
> 


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH] arm64: dts: sdm845: Add PSCI cpuidle low power states

2018-10-30 Thread Doug Anderson
Resending since I accidentally clicked back to HTML and lists all bounced.  :(

On Tue, Oct 30, 2018 at 9:24 AM Raju P L S S S N  wrote:
>
> On 10/24/2018 12:06 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Oct 15, 2018 at 10:02 AM Raju P.L.S.S.S.N
> >  wrote:
> >> +   idle-states {
> >> +   entry-method = "psci";
> >> +
> >> +   C0_CPU_SPC: c0_spc {
> >
> > nit: all these nodes should have dashes instead of spaces in the node
> > names (labels can still have spaces).  AKA:
> >
> > C0_CPU_SPC: c0-spc {
>
> Sure. I will change this. (However, from device tree specification v0.2,
> I see that Table 2.1 mentions underscore as valid character for node.
> Please correct me if I'm missing something)

You're right that I don't see it there.  I've always heard that they
are allowed but discouraged.  One place that mentions this:

https://elinux.org/Device_Tree_Linux

> Linux conventions
> * node names
>   * use dash "-" instead of underscore "_"

That same site points that the majority of people use dashes in node
names instead of underscores.

-Doug


Re: [PATCH] arm64: dts: sdm845: Add PSCI cpuidle low power states

2018-10-30 Thread Doug Anderson
Resending since I accidentally clicked back to HTML and lists all bounced.  :(

On Tue, Oct 30, 2018 at 9:24 AM Raju P L S S S N  wrote:
>
> On 10/24/2018 12:06 AM, Doug Anderson wrote:
> > Hi,
> >
> > On Mon, Oct 15, 2018 at 10:02 AM Raju P.L.S.S.S.N
> >  wrote:
> >> +   idle-states {
> >> +   entry-method = "psci";
> >> +
> >> +   C0_CPU_SPC: c0_spc {
> >
> > nit: all these nodes should have dashes instead of spaces in the node
> > names (labels can still have spaces).  AKA:
> >
> > C0_CPU_SPC: c0-spc {
>
> Sure. I will change this. (However, from device tree specification v0.2,
> I see that Table 2.1 mentions underscore as valid character for node.
> Please correct me if I'm missing something)

You're right that I don't see it there.  I've always heard that they
are allowed but discouraged.  One place that mentions this:

https://elinux.org/Device_Tree_Linux

> Linux conventions
> * node names
>   * use dash "-" instead of underscore "_"

That same site points that the majority of people use dashes in node
names instead of underscores.

-Doug


Re: [PATCH 0/2] [GIT PULL] tracing: Fixes to synthetic events

2018-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2018 at 6:15 AM Steven Rostedt  wrote:
>
> This is not my 4.20 pull request (that will be coming shortly). This
> is just some last minute fixes that Masami sent me that I finally got
> around to test.

Pulled. Real pull request is next,

  Linus


Re: [PATCH 0/2] [GIT PULL] tracing: Fixes to synthetic events

2018-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2018 at 6:15 AM Steven Rostedt  wrote:
>
> This is not my 4.20 pull request (that will be coming shortly). This
> is just some last minute fixes that Masami sent me that I finally got
> around to test.

Pulled. Real pull request is next,

  Linus


Re: [PATCH v5] kernel/signal: Signal-based pre-coredump notification

2018-10-30 Thread Oleg Nesterov
On 10/29, Enke Chen wrote:
>
> Reviewed-by: Oleg Nesterov 

Hmm. I didn't say this ;)

But OK, feel free to keep this tag.

I do not like this feauture. But I see no technical problems in this version
and I never pretented I understand the user-space needs, so I won't argue.

Oleg.



Re: [PATCH v5] kernel/signal: Signal-based pre-coredump notification

2018-10-30 Thread Oleg Nesterov
On 10/29, Enke Chen wrote:
>
> Reviewed-by: Oleg Nesterov 

Hmm. I didn't say this ;)

But OK, feel free to keep this tag.

I do not like this feauture. But I see no technical problems in this version
and I never pretented I understand the user-space needs, so I won't argue.

Oleg.



[PATCH] thermal: broadcom: use devm_thermal_zone_of_sensor_register

2018-10-30 Thread Julia Lawall
Using devm_thermal_zone_of_sensor_register allows to simplify some
error handling code, drop a label, and drop the remove function.

Signed-off-by: Julia Lawall 

---

This patch is completely orthogonal to the recent constification
patch.

 drivers/thermal/broadcom/brcmstb_thermal.c |   24 
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/drivers/thermal/broadcom/brcmstb_thermal.c 
b/drivers/thermal/broadcom/brcmstb_thermal.c
index 1919f91fa756..956eef8717bb 100644
--- a/drivers/thermal/broadcom/brcmstb_thermal.c
+++ b/drivers/thermal/broadcom/brcmstb_thermal.c
@@ -329,7 +329,8 @@ static int brcmstb_thermal_probe(struct platform_device 
*pdev)
priv->dev = >dev;
platform_set_drvdata(pdev, priv);
 
-   thermal = thermal_zone_of_sensor_register(>dev, 0, priv, _ops);
+   thermal = devm_thermal_zone_of_sensor_register(>dev, 0, priv,
+  _ops);
if (IS_ERR(thermal)) {
ret = PTR_ERR(thermal);
dev_err(>dev, "could not register sensor: %d\n", ret);
@@ -341,40 +342,23 @@ static int brcmstb_thermal_probe(struct platform_device 
*pdev)
irq = platform_get_irq(pdev, 0);
if (irq < 0) {
dev_err(>dev, "could not get IRQ\n");
-   ret = irq;
-   goto err;
+   return irq;
}
ret = devm_request_threaded_irq(>dev, irq, NULL,
brcmstb_tmon_irq_thread, IRQF_ONESHOT,
DRV_NAME, priv);
if (ret < 0) {
dev_err(>dev, "could not request IRQ: %d\n", ret);
-   goto err;
+   return ret;
}
 
dev_info(>dev, "registered AVS TMON of-sensor driver\n");
 
return 0;
-
-err:
-   thermal_zone_of_sensor_unregister(>dev, thermal);
-   return ret;
-}
-
-static int brcmstb_thermal_exit(struct platform_device *pdev)
-{
-   struct brcmstb_thermal_priv *priv = platform_get_drvdata(pdev);
-   struct thermal_zone_device *thermal = priv->thermal;
-
-   if (thermal)
-   thermal_zone_of_sensor_unregister(>dev, priv->thermal);
-
-   return 0;
 }
 
 static struct platform_driver brcmstb_thermal_driver = {
.probe = brcmstb_thermal_probe,
-   .remove = brcmstb_thermal_exit,
.driver = {
.name = DRV_NAME,
.of_match_table = brcmstb_thermal_id_table,



[PATCH] thermal: broadcom: use devm_thermal_zone_of_sensor_register

2018-10-30 Thread Julia Lawall
Using devm_thermal_zone_of_sensor_register allows to simplify some
error handling code, drop a label, and drop the remove function.

Signed-off-by: Julia Lawall 

---

This patch is completely orthogonal to the recent constification
patch.

 drivers/thermal/broadcom/brcmstb_thermal.c |   24 
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/drivers/thermal/broadcom/brcmstb_thermal.c 
b/drivers/thermal/broadcom/brcmstb_thermal.c
index 1919f91fa756..956eef8717bb 100644
--- a/drivers/thermal/broadcom/brcmstb_thermal.c
+++ b/drivers/thermal/broadcom/brcmstb_thermal.c
@@ -329,7 +329,8 @@ static int brcmstb_thermal_probe(struct platform_device 
*pdev)
priv->dev = >dev;
platform_set_drvdata(pdev, priv);
 
-   thermal = thermal_zone_of_sensor_register(>dev, 0, priv, _ops);
+   thermal = devm_thermal_zone_of_sensor_register(>dev, 0, priv,
+  _ops);
if (IS_ERR(thermal)) {
ret = PTR_ERR(thermal);
dev_err(>dev, "could not register sensor: %d\n", ret);
@@ -341,40 +342,23 @@ static int brcmstb_thermal_probe(struct platform_device 
*pdev)
irq = platform_get_irq(pdev, 0);
if (irq < 0) {
dev_err(>dev, "could not get IRQ\n");
-   ret = irq;
-   goto err;
+   return irq;
}
ret = devm_request_threaded_irq(>dev, irq, NULL,
brcmstb_tmon_irq_thread, IRQF_ONESHOT,
DRV_NAME, priv);
if (ret < 0) {
dev_err(>dev, "could not request IRQ: %d\n", ret);
-   goto err;
+   return ret;
}
 
dev_info(>dev, "registered AVS TMON of-sensor driver\n");
 
return 0;
-
-err:
-   thermal_zone_of_sensor_unregister(>dev, thermal);
-   return ret;
-}
-
-static int brcmstb_thermal_exit(struct platform_device *pdev)
-{
-   struct brcmstb_thermal_priv *priv = platform_get_drvdata(pdev);
-   struct thermal_zone_device *thermal = priv->thermal;
-
-   if (thermal)
-   thermal_zone_of_sensor_unregister(>dev, priv->thermal);
-
-   return 0;
 }
 
 static struct platform_driver brcmstb_thermal_driver = {
.probe = brcmstb_thermal_probe,
-   .remove = brcmstb_thermal_exit,
.driver = {
.name = DRV_NAME,
.of_match_table = brcmstb_thermal_id_table,



Re: [PATCH 2/7] dt-bindings: mfd: ds90ux9xx: add description of TI DS90Ux9xx I2C bridge

2018-10-30 Thread Luca Ceresoli
Hi Vladimir,

On 08/10/18 23:12, Vladimir Zapolskiy wrote:
> From: Vladimir Zapolskiy 
> 
> TI DS90Ux9xx de-/serializers are capable to route I2C messages to
> I2C slave devices connected to a remote de-/serializer in a pair,
> the change adds description of device tree bindings of the subcontroller
> to configure and enable this functionality.
> 
> Signed-off-by: Vladimir Zapolskiy 
> ---
>  .../bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt  | 61 +++
>  1 file changed, 61 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt
> 
> diff --git 
> a/Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt 
> b/Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt
> new file mode 100644
> index ..4169e382073a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt
> @@ -0,0 +1,61 @@
> +TI DS90Ux9xx de-/serializer I2C bridge subcontroller
> +
> +Required properties:
> +- compatible: Must contain a generic "ti,ds90ux9xx-i2c-bridge" value and
> + may contain one more specific value from the list:
> + "ti,ds90ux925-i2c-bridge",
> + "ti,ds90ux926-i2c-bridge",
> + "ti,ds90ux927-i2c-bridge",
> + "ti,ds90ux928-i2c-bridge",
> + "ti,ds90ux940-i2c-bridge".
> +
> +Required properties of a de-/serializer device connected to a local I2C bus:
> +- ti,i2c-bridges: List of phandles to remote de-/serializer devices with
> + two arguments: id of a local de-/serializer FPD link and an assigned
> + I2C address of a remote de-/serializer to be accessed on a local
> + I2C bus.
> +
> +Optional properties of a de-/serializer device connected to a local I2C bus:
> +- ti,i2c-bridge-maps: List of 3-cell values:
> + - the first argument is id of a local de-/serializer FPD link,
> + - the second argument is an I2C address of a device connected to
> +   a remote de-/serializer IC,
> + - the third argument is an I2C address of the remote I2C device
> +   for access on a local I2C bus.

BTW I usually use names "remove slave" address and "alias" for bullets 2
and 3. These are the names from the datasheets, and are clearer IMO.

Now to the big stuff.

I find a static map in the "local" chip DT node is a limit. You might
have to support multiple models of remote device, where you'll know the
model only when after it gets connected. Think Beaglebone capes, but
over FPD-Link 3. This scenario opens several issues, but specifically
for I2C address mapping I addressed it by adding in the "local" chip's
DT node a pool of I2C aliases it can use. The DT author is responsible
to pick addresses that are not used on the same I2C bus, which cannot be
done at runtime reliably.

Here's my current draft on a dual/quad port deserializer:

 {
serializer@3d {
reg = <0x3d>;
...

/* Guaranteed not physically present on i2c0 */
i2c-alias-pool = /bits/ 8 <0x20 0x21 0x22 0x23 0x24 0x25>;

rxports {
#address-cells = <1>;
#size-cells = <0>;

rxport@0 {
reg = <0>;
remote-i2c-bus { /* The proxied I2C bus on rxport 0 */
#address-cells = <1>;
#size-cells = <0>;

eeprom@51 {
reg = <0x51>;
compatible = "at,24c02";
};
};

rxport@1 {
reg = <1>;
remote-i2c-bus { /* The proxied I2C bus on rxport 1 */
#address-cells = <1>;
#size-cells = <0>;

eeprom@51 {
reg = <0x51>;
compatible = "at,24c02";
};
};
};
};
};
};

At probe time the serializer driver instantiates one new i2c_adapter for
each rxport. Any remote device is added (removed) to that adapter, then
the driver finds an available alias and maps (unmaps) it. The
transactions are handled in a way similar to i2c-mux, i.e. the ds90*
i2c_adapter has a master_xfer callback that changes the remote slave
address to the corresponding alias, then calls parent->algo->master_xfer().

Note how both eeproms in the example have the same physical address.
They will be given two different aliases.

> +- ti,i2c-bridge-auto-ack: Enables AUTO ACK mode.

It this useful other than for debugging? And, as Laurent noted, this
should not be in DT: it doesn't describe the hardware.

> +- ti,i2c-bridge-pass-all: Enables PASS ALL mode, remote I2C slave devices
> + are accessible on a local (host) I2C bus without I2C address
> + remappings.

It should be clear from the DT docs that either ti,i2c-bridge-pass-all
is enabled or the alias map/pool is used, but not both.

Bye,
-- 
Luca


Re: [PATCH 2/7] dt-bindings: mfd: ds90ux9xx: add description of TI DS90Ux9xx I2C bridge

2018-10-30 Thread Luca Ceresoli
Hi Vladimir,

On 08/10/18 23:12, Vladimir Zapolskiy wrote:
> From: Vladimir Zapolskiy 
> 
> TI DS90Ux9xx de-/serializers are capable to route I2C messages to
> I2C slave devices connected to a remote de-/serializer in a pair,
> the change adds description of device tree bindings of the subcontroller
> to configure and enable this functionality.
> 
> Signed-off-by: Vladimir Zapolskiy 
> ---
>  .../bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt  | 61 +++
>  1 file changed, 61 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt
> 
> diff --git 
> a/Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt 
> b/Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt
> new file mode 100644
> index ..4169e382073a
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mfd/ti,ds90ux9xx-i2c-bridge.txt
> @@ -0,0 +1,61 @@
> +TI DS90Ux9xx de-/serializer I2C bridge subcontroller
> +
> +Required properties:
> +- compatible: Must contain a generic "ti,ds90ux9xx-i2c-bridge" value and
> + may contain one more specific value from the list:
> + "ti,ds90ux925-i2c-bridge",
> + "ti,ds90ux926-i2c-bridge",
> + "ti,ds90ux927-i2c-bridge",
> + "ti,ds90ux928-i2c-bridge",
> + "ti,ds90ux940-i2c-bridge".
> +
> +Required properties of a de-/serializer device connected to a local I2C bus:
> +- ti,i2c-bridges: List of phandles to remote de-/serializer devices with
> + two arguments: id of a local de-/serializer FPD link and an assigned
> + I2C address of a remote de-/serializer to be accessed on a local
> + I2C bus.
> +
> +Optional properties of a de-/serializer device connected to a local I2C bus:
> +- ti,i2c-bridge-maps: List of 3-cell values:
> + - the first argument is id of a local de-/serializer FPD link,
> + - the second argument is an I2C address of a device connected to
> +   a remote de-/serializer IC,
> + - the third argument is an I2C address of the remote I2C device
> +   for access on a local I2C bus.

BTW I usually use names "remove slave" address and "alias" for bullets 2
and 3. These are the names from the datasheets, and are clearer IMO.

Now to the big stuff.

I find a static map in the "local" chip DT node is a limit. You might
have to support multiple models of remote device, where you'll know the
model only when after it gets connected. Think Beaglebone capes, but
over FPD-Link 3. This scenario opens several issues, but specifically
for I2C address mapping I addressed it by adding in the "local" chip's
DT node a pool of I2C aliases it can use. The DT author is responsible
to pick addresses that are not used on the same I2C bus, which cannot be
done at runtime reliably.

Here's my current draft on a dual/quad port deserializer:

 {
serializer@3d {
reg = <0x3d>;
...

/* Guaranteed not physically present on i2c0 */
i2c-alias-pool = /bits/ 8 <0x20 0x21 0x22 0x23 0x24 0x25>;

rxports {
#address-cells = <1>;
#size-cells = <0>;

rxport@0 {
reg = <0>;
remote-i2c-bus { /* The proxied I2C bus on rxport 0 */
#address-cells = <1>;
#size-cells = <0>;

eeprom@51 {
reg = <0x51>;
compatible = "at,24c02";
};
};

rxport@1 {
reg = <1>;
remote-i2c-bus { /* The proxied I2C bus on rxport 1 */
#address-cells = <1>;
#size-cells = <0>;

eeprom@51 {
reg = <0x51>;
compatible = "at,24c02";
};
};
};
};
};
};

At probe time the serializer driver instantiates one new i2c_adapter for
each rxport. Any remote device is added (removed) to that adapter, then
the driver finds an available alias and maps (unmaps) it. The
transactions are handled in a way similar to i2c-mux, i.e. the ds90*
i2c_adapter has a master_xfer callback that changes the remote slave
address to the corresponding alias, then calls parent->algo->master_xfer().

Note how both eeproms in the example have the same physical address.
They will be given two different aliases.

> +- ti,i2c-bridge-auto-ack: Enables AUTO ACK mode.

It this useful other than for debugging? And, as Laurent noted, this
should not be in DT: it doesn't describe the hardware.

> +- ti,i2c-bridge-pass-all: Enables PASS ALL mode, remote I2C slave devices
> + are accessible on a local (host) I2C bus without I2C address
> + remappings.

It should be clear from the DT docs that either ti,i2c-bridge-pass-all
is enabled or the alias map/pool is used, but not both.

Bye,
-- 
Luca


Re: [PATCH] proc: use ns_capable instead of capable for timerslack_ns

2018-10-30 Thread Benjamin Gordon
On Thu, Oct 25, 2018 at 01:56:27PM -0500, Eric W. Biederman wrote:
> > Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
> > in its effective capability set, but the current check looks in the root
> > namespace instead of the process' user namespace.  Since a process is
> > allowed to do other activities controlled by CAP_SYS_NICE inside a
> > namespace, it should also be able to adjust timerslack_ns.
> 
> The goal seems legitimate.  However the permission checks look wrong.
> 
> In particular the choice of user namespace should be
> "p->cred->user_ns".  This will limit this to tasks that have
> CAP_SYS_NICE in the same namespace as the task that is being modified.
> 
> Testing file->f_cred->user_ns it is testing whoever opened the file and
> that could be anyone.

Thanks, that seems like the right answer here.  I'll send a v2 to fix
it.

Thanks,
Benjamin


Re: [PATCH] proc: use ns_capable instead of capable for timerslack_ns

2018-10-30 Thread Benjamin Gordon
On Thu, Oct 25, 2018 at 01:56:27PM -0500, Eric W. Biederman wrote:
> > Access to timerslack_ns is controlled by a process having CAP_SYS_NICE
> > in its effective capability set, but the current check looks in the root
> > namespace instead of the process' user namespace.  Since a process is
> > allowed to do other activities controlled by CAP_SYS_NICE inside a
> > namespace, it should also be able to adjust timerslack_ns.
> 
> The goal seems legitimate.  However the permission checks look wrong.
> 
> In particular the choice of user namespace should be
> "p->cred->user_ns".  This will limit this to tasks that have
> CAP_SYS_NICE in the same namespace as the task that is being modified.
> 
> Testing file->f_cred->user_ns it is testing whoever opened the file and
> that could be anyone.

Thanks, that seems like the right answer here.  I'll send a v2 to fix
it.

Thanks,
Benjamin


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/30, Oleg Nesterov wrote:
>
> On 10/30, Tycho Andersen wrote:
> >
> > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
> > struct seccomp_data *sd,
> >  */
> > rmb();
> >
> > +   if (!sd) {
> > +   populate_seccomp_data(_local);
> > +   sd = _local;
> > +   }
> > +
>
> To me it would be more clean to remove the "if (!sd)" check, 
> case(SECCOMP_RET_TRACE)
> in __seccomp_filter() can simply do populate_seccomp_data(_local) 
> unconditionally
> and pass _local to __seccomp_filter().

Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).

Btw. why __seccomp_filter() doesn't return a boolean?

Or at least, why can't case(SECCOMP_RET_TRACE) simply do

return __seccomp_filter(this_syscall, NULL, true);

?

Oleg.



Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/30, Oleg Nesterov wrote:
>
> On 10/30, Tycho Andersen wrote:
> >
> > @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
> > struct seccomp_data *sd,
> >  */
> > rmb();
> >
> > +   if (!sd) {
> > +   populate_seccomp_data(_local);
> > +   sd = _local;
> > +   }
> > +
>
> To me it would be more clean to remove the "if (!sd)" check, 
> case(SECCOMP_RET_TRACE)
> in __seccomp_filter() can simply do populate_seccomp_data(_local) 
> unconditionally
> and pass _local to __seccomp_filter().

Ah, please ignore, emulate_vsyscall() does secure_computing(NULL).

Btw. why __seccomp_filter() doesn't return a boolean?

Or at least, why can't case(SECCOMP_RET_TRACE) simply do

return __seccomp_filter(this_syscall, NULL, true);

?

Oleg.



Re: [alsa-devel] [PATCH] ASoC: stm32: sai: fix invalid use of sizeof in stm32_sai_add_mclk_provider()

2018-10-30 Thread Mark Brown
On Sat, Oct 27, 2018 at 02:19:59AM +, Wei Yongjun wrote:
> sizeof() when applied to a pointer typed expression gives the
> size of the pointer, not that of the pointed data.

Someone else already sent a fix for this.


signature.asc
Description: PGP signature


Re: [alsa-devel] [PATCH] ASoC: stm32: sai: fix invalid use of sizeof in stm32_sai_add_mclk_provider()

2018-10-30 Thread Mark Brown
On Sat, Oct 27, 2018 at 02:19:59AM +, Wei Yongjun wrote:
> sizeof() when applied to a pointer typed expression gives the
> size of the pointer, not that of the pointed data.

Someone else already sent a fix for this.


signature.asc
Description: PGP signature


Re: [PATCH v1 2/2] clk: qcom : dispcc: Add support for display port clocks

2018-10-30 Thread Stephen Boyd
Quoting Taniya Das (2018-10-29 23:01:44)
> On 10/30/2018 12:13 AM, Stephen Boyd wrote:
> > Quoting Taniya Das (2018-10-28 03:34:55)
> >> On 2018-10-19 16:04, Taniya Das wrote:
> >>> On 10/10/2018 2:04 AM, Stephen Boyd wrote:
>  Quoting Taniya Das (2018-10-09 06:57:47)
> > diff --git a/drivers/clk/qcom/dispcc-sdm845.c
> > b/drivers/clk/qcom/dispcc-sdm845.c
> > index 0cc4909..6d3136a 100644
> > --- a/drivers/clk/qcom/dispcc-sdm845.c
> > +++ b/drivers/clk/qcom/dispcc-sdm845.c
> > +   },
> > +};
> > +
> > +static const struct freq_tbl ftbl_disp_cc_mdss_dp_link_clk_src[] = {
> > +   F(162000, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> > +   F(27, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> > +   F(54, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> > +   F(81, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> 
>  Are these in kHz? They really look like it and that's bad. Why do we
>  need them at all? Just to make sure the display driver picks these
>  exact
>  frequencies? It seems like we could just pass whatever number comes in
>  up to the parent and see what it can do.
> 
> >>>
> >>> Let me check back the reason we had to make this change.
> >>
> >> We will need this flag since we reset/power-down the PLL every time we
> >> disconnect/connect the DP cable or during suspend/resume. Only with this
> >> flag, the calls to the PLL driver are properly called.
> > 
> > What does this mean? I wanted to know about the weird frequencies listed
> > above, and why it can't be done without a frequency table and direct
> > rates passed up to the parent.
> >
> 
> OOps, my bad :(.
> 
> We added these changes to handle higher clock rates. These rates when 
> greater than 4.3Ghz cannot be represented in 32bit variables. For DP, we 
> already have 5.4G and 8.1GHz freq for VCO clock. We will need these Khz 
> freq list in clock driver.
>   Let me check if they can do something like the byte/pixel clocks of 
> display.

Well then we really should just throw away the freq table here and have
rcg ops that pass the frequency up to the display PLL. Also, those
numbers look like gigabits per second (Gbit/s) for the DP spec which
isn't exactly the same as a clk frequency. What frequency does the PLL
run at for these various DP link speeds?



Re: [PATCH v1 2/2] clk: qcom : dispcc: Add support for display port clocks

2018-10-30 Thread Stephen Boyd
Quoting Taniya Das (2018-10-29 23:01:44)
> On 10/30/2018 12:13 AM, Stephen Boyd wrote:
> > Quoting Taniya Das (2018-10-28 03:34:55)
> >> On 2018-10-19 16:04, Taniya Das wrote:
> >>> On 10/10/2018 2:04 AM, Stephen Boyd wrote:
>  Quoting Taniya Das (2018-10-09 06:57:47)
> > diff --git a/drivers/clk/qcom/dispcc-sdm845.c
> > b/drivers/clk/qcom/dispcc-sdm845.c
> > index 0cc4909..6d3136a 100644
> > --- a/drivers/clk/qcom/dispcc-sdm845.c
> > +++ b/drivers/clk/qcom/dispcc-sdm845.c
> > +   },
> > +};
> > +
> > +static const struct freq_tbl ftbl_disp_cc_mdss_dp_link_clk_src[] = {
> > +   F(162000, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> > +   F(27, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> > +   F(54, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> > +   F(81, P_DP_PHY_PLL_LINK_CLK,   1,   0,   0),
> 
>  Are these in kHz? They really look like it and that's bad. Why do we
>  need them at all? Just to make sure the display driver picks these
>  exact
>  frequencies? It seems like we could just pass whatever number comes in
>  up to the parent and see what it can do.
> 
> >>>
> >>> Let me check back the reason we had to make this change.
> >>
> >> We will need this flag since we reset/power-down the PLL every time we
> >> disconnect/connect the DP cable or during suspend/resume. Only with this
> >> flag, the calls to the PLL driver are properly called.
> > 
> > What does this mean? I wanted to know about the weird frequencies listed
> > above, and why it can't be done without a frequency table and direct
> > rates passed up to the parent.
> >
> 
> OOps, my bad :(.
> 
> We added these changes to handle higher clock rates. These rates when 
> greater than 4.3Ghz cannot be represented in 32bit variables. For DP, we 
> already have 5.4G and 8.1GHz freq for VCO clock. We will need these Khz 
> freq list in clock driver.
>   Let me check if they can do something like the byte/pixel clocks of 
> display.

Well then we really should just throw away the freq table here and have
rcg ops that pass the frequency up to the display PLL. Also, those
numbers look like gigabits per second (Gbit/s) for the DP spec which
isn't exactly the same as a clk frequency. What frequency does the PLL
run at for these various DP link speeds?



Re: Logitech high-resolution scrolling..

2018-10-30 Thread Linus Torvalds
On Mon, Oct 29, 2018 at 11:27 PM Peter Hutterer
 wrote:
>
> Other issues I found with an MX Anywhere 2S is that on slow scroll and in
> ratchet mode we get some scroll jitter.  In ratchet mode  we can get this
> sequence if you scroll just past the notch and it snaps back:
> [1, 1, 1, 1, 1, 1, 1, 1, -1]
> That's quite easy to trigger. In free-wheel mode we may get the same for
> slow motion due to human finger jitter (the Anywhere 2S didn't have HW
> jitter, but other devices may). So a perceived-consistent scroll motion may
> really look like this:
> [1, 1, 1, 1, 1, -1, 1, 1]
> Hard to triggger but when it does, it feels like we're dropping events.
> The former isn't that much of an issue as long as the ratchet is enabled so
> you get the haptic feedback and we (usually) don't drop events.

Both of these actually argue that doing the reset on direction change
can be a real problem.

But equally clearly, _not_ doing the reset is unacceptable too.

I wonder if there's some docs on what Logitech does internally in the
mouse. It might involve a timeout (ie "if not moving for a while, do
the rounding _and_ reset), which would probably be too expensive to do
on the host side.

Linus


Re: Logitech high-resolution scrolling..

2018-10-30 Thread Linus Torvalds
On Mon, Oct 29, 2018 at 11:27 PM Peter Hutterer
 wrote:
>
> Other issues I found with an MX Anywhere 2S is that on slow scroll and in
> ratchet mode we get some scroll jitter.  In ratchet mode  we can get this
> sequence if you scroll just past the notch and it snaps back:
> [1, 1, 1, 1, 1, 1, 1, 1, -1]
> That's quite easy to trigger. In free-wheel mode we may get the same for
> slow motion due to human finger jitter (the Anywhere 2S didn't have HW
> jitter, but other devices may). So a perceived-consistent scroll motion may
> really look like this:
> [1, 1, 1, 1, 1, -1, 1, 1]
> Hard to triggger but when it does, it feels like we're dropping events.
> The former isn't that much of an issue as long as the ratchet is enabled so
> you get the haptic feedback and we (usually) don't drop events.

Both of these actually argue that doing the reset on direction change
can be a real problem.

But equally clearly, _not_ doing the reset is unacceptable too.

I wonder if there's some docs on what Logitech does internally in the
mouse. It might involve a timeout (ie "if not moving for a while, do
the rounding _and_ reset), which would probably be too expensive to do
on the host side.

Linus


Re: [PATCH 16/17] prmem: pratomic-long

2018-10-30 Thread Will Deacon
On Tue, Oct 30, 2018 at 04:58:41PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 29, 2018 at 11:17:14PM +0200, Igor Stoppa wrote:
> > 
> > 
> > On 25/10/2018 01:13, Peter Zijlstra wrote:
> > > On Wed, Oct 24, 2018 at 12:35:03AM +0300, Igor Stoppa wrote:
> > > > +static __always_inline
> > > > +bool __pratomic_long_op(bool inc, struct pratomic_long_t *l)
> > > > +{
> > > > +   struct page *page;
> > > > +   uintptr_t base;
> > > > +   uintptr_t offset;
> > > > +   unsigned long flags;
> > > > +   size_t size = sizeof(*l);
> > > > +   bool is_virt = __is_wr_after_init(l, size);
> > > > +
> > > > +   if (WARN(!(is_virt || likely(__is_wr_pool(l, size))),
> > > > +WR_ERR_RANGE_MSG))
> > > > +   return false;
> > > > +   local_irq_save(flags);
> > > > +   if (is_virt)
> > > > +   page = virt_to_page(l);
> > > > +   else
> > > > +   vmalloc_to_page(l);
> > > > +   offset = (~PAGE_MASK) & (uintptr_t)l;
> > > > +   base = (uintptr_t)vmap(, 1, VM_MAP, PAGE_KERNEL);
> > > > +   if (WARN(!base, WR_ERR_PAGE_MSG)) {
> > > > +   local_irq_restore(flags);
> > > > +   return false;
> > > > +   }
> > > > +   if (inc)
> > > > +   atomic_long_inc((atomic_long_t *)(base + offset));
> > > > +   else
> > > > +   atomic_long_dec((atomic_long_t *)(base + offset));
> > > > +   vunmap((void *)base);
> > > > +   local_irq_restore(flags);
> > > > +   return true;
> > > > +
> > > > +}
> > > 
> > > That's just hideously nasty.. and horribly broken.
> > > 
> > > We're not going to duplicate all these kernel interfaces wrapped in gunk
> > > like that.
> > 
> > one possibility would be to have macros which use typeof() on the parameter
> > being passed, to decide what implementation to use: regular or write-rare
> > 
> > This means that type punning would still be needed, to select the
> > implementation.
> > 
> > Would this be enough? Is there some better way?
> 
> Like mentioned elsewhere; if you do write_enable() + write_disable()
> thingies, it all becomes:
> 
>   write_enable();
>   atomic_foo();
>   write_disable();
> 
> No magic gunk infested duplication at all. Of course, ideally you'd then
> teach objtool about this (or a GCC plugin I suppose) to ensure any
> enable reached a disable.

Isn't the issue here that we don't want to change the page tables for the
mapping of , but instead want to create a temporary writable alias
at a random virtual address?

So you'd want:

wbar = write_enable();
atomic_foo(wbar);
write_disable(wbar);

which is probably better expressed as a map/unmap API. I suspect this
would also be the only way to do things for cmpxchg() loops, where you
want to create the mapping outside of the loop to minimise your time in
the critical section.

Will


Re: [Xen-devel] [PATCH] xen-blkfront: fix kernel panic with negotiate_mq error path

2018-10-30 Thread Manjunath Patil

Thank you Juergen for your comments.

I will soon send v2 patch.

-Thanks,
Manjunath
On 10/30/2018 12:04 AM, Juergen Gross wrote:

On 29/10/2018 19:31, Manjunath Patil wrote:

info->nr_rings isn't adjusted in case of ENOMEM error from
negotiate_mq(). This leads to kernel panic in error path.

Typical call stack involving panic -
  #8 page_fault at 8175936f
 [exception RIP: blkif_free_ring+33]
 RIP: a0149491  RSP: 8804f7673c08  RFLAGS: 00010292
  ...
  #9 blkif_free at a0149aaa [xen_blkfront]
  #10 talk_to_blkback at a014c8cd [xen_blkfront]
  #11 blkback_changed at a014ea8b [xen_blkfront]
  #12 xenbus_otherend_changed at 81424670
  #13 backend_changed at 81426dc3
  #14 xenwatch_thread at 81422f29
  #15 kthread at 810abe6a
  #16 ret_from_fork at 81754078

Though either of my changes avoid the panic, I included both the
changes. This issue got introduced with "7ed8ce1 xen-blkfront: move
negotiate_mq to cover all cases of new VBDs"

Please us the correct format for specifying another commit.


Signed-off-by: Manjunath Patil 

Can you please add a "Fixes:" tag and "Cc: sta...@vger.kernel.org" ?


---
  drivers/block/xen-blkfront.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 429d201..dc8fe25 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1350,8 +1350,10 @@ static void blkif_free(struct blkfront_info *info, int 
suspend)
if (info->rq)
blk_mq_stop_hw_queues(info->rq);
  
-	for (i = 0; i < info->nr_rings; i++)

-   blkif_free_ring(>rinfo[i]);
+   if (info->rinfo) {
+   for (i = 0; i < info->nr_rings; i++)
+   blkif_free_ring(>rinfo[i]);
+   }

I'd drop this change.

  
  	kfree(info->rinfo);

info->rinfo = NULL;
@@ -1919,6 +1921,7 @@ static int negotiate_mq(struct blkfront_info *info)
  GFP_KERNEL);
if (!info->rinfo) {
xenbus_dev_fatal(info->xbdev, -ENOMEM, "allocating ring_info 
structure");
+   info->nr_rings = 0;
return -ENOMEM;
}


Juergen





Re: [PATCH 16/17] prmem: pratomic-long

2018-10-30 Thread Will Deacon
On Tue, Oct 30, 2018 at 04:58:41PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 29, 2018 at 11:17:14PM +0200, Igor Stoppa wrote:
> > 
> > 
> > On 25/10/2018 01:13, Peter Zijlstra wrote:
> > > On Wed, Oct 24, 2018 at 12:35:03AM +0300, Igor Stoppa wrote:
> > > > +static __always_inline
> > > > +bool __pratomic_long_op(bool inc, struct pratomic_long_t *l)
> > > > +{
> > > > +   struct page *page;
> > > > +   uintptr_t base;
> > > > +   uintptr_t offset;
> > > > +   unsigned long flags;
> > > > +   size_t size = sizeof(*l);
> > > > +   bool is_virt = __is_wr_after_init(l, size);
> > > > +
> > > > +   if (WARN(!(is_virt || likely(__is_wr_pool(l, size))),
> > > > +WR_ERR_RANGE_MSG))
> > > > +   return false;
> > > > +   local_irq_save(flags);
> > > > +   if (is_virt)
> > > > +   page = virt_to_page(l);
> > > > +   else
> > > > +   vmalloc_to_page(l);
> > > > +   offset = (~PAGE_MASK) & (uintptr_t)l;
> > > > +   base = (uintptr_t)vmap(, 1, VM_MAP, PAGE_KERNEL);
> > > > +   if (WARN(!base, WR_ERR_PAGE_MSG)) {
> > > > +   local_irq_restore(flags);
> > > > +   return false;
> > > > +   }
> > > > +   if (inc)
> > > > +   atomic_long_inc((atomic_long_t *)(base + offset));
> > > > +   else
> > > > +   atomic_long_dec((atomic_long_t *)(base + offset));
> > > > +   vunmap((void *)base);
> > > > +   local_irq_restore(flags);
> > > > +   return true;
> > > > +
> > > > +}
> > > 
> > > That's just hideously nasty.. and horribly broken.
> > > 
> > > We're not going to duplicate all these kernel interfaces wrapped in gunk
> > > like that.
> > 
> > one possibility would be to have macros which use typeof() on the parameter
> > being passed, to decide what implementation to use: regular or write-rare
> > 
> > This means that type punning would still be needed, to select the
> > implementation.
> > 
> > Would this be enough? Is there some better way?
> 
> Like mentioned elsewhere; if you do write_enable() + write_disable()
> thingies, it all becomes:
> 
>   write_enable();
>   atomic_foo();
>   write_disable();
> 
> No magic gunk infested duplication at all. Of course, ideally you'd then
> teach objtool about this (or a GCC plugin I suppose) to ensure any
> enable reached a disable.

Isn't the issue here that we don't want to change the page tables for the
mapping of , but instead want to create a temporary writable alias
at a random virtual address?

So you'd want:

wbar = write_enable();
atomic_foo(wbar);
write_disable(wbar);

which is probably better expressed as a map/unmap API. I suspect this
would also be the only way to do things for cmpxchg() loops, where you
want to create the mapping outside of the loop to minimise your time in
the critical section.

Will


Re: [Xen-devel] [PATCH] xen-blkfront: fix kernel panic with negotiate_mq error path

2018-10-30 Thread Manjunath Patil

Thank you Juergen for your comments.

I will soon send v2 patch.

-Thanks,
Manjunath
On 10/30/2018 12:04 AM, Juergen Gross wrote:

On 29/10/2018 19:31, Manjunath Patil wrote:

info->nr_rings isn't adjusted in case of ENOMEM error from
negotiate_mq(). This leads to kernel panic in error path.

Typical call stack involving panic -
  #8 page_fault at 8175936f
 [exception RIP: blkif_free_ring+33]
 RIP: a0149491  RSP: 8804f7673c08  RFLAGS: 00010292
  ...
  #9 blkif_free at a0149aaa [xen_blkfront]
  #10 talk_to_blkback at a014c8cd [xen_blkfront]
  #11 blkback_changed at a014ea8b [xen_blkfront]
  #12 xenbus_otherend_changed at 81424670
  #13 backend_changed at 81426dc3
  #14 xenwatch_thread at 81422f29
  #15 kthread at 810abe6a
  #16 ret_from_fork at 81754078

Though either of my changes avoid the panic, I included both the
changes. This issue got introduced with "7ed8ce1 xen-blkfront: move
negotiate_mq to cover all cases of new VBDs"

Please us the correct format for specifying another commit.


Signed-off-by: Manjunath Patil 

Can you please add a "Fixes:" tag and "Cc: sta...@vger.kernel.org" ?


---
  drivers/block/xen-blkfront.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 429d201..dc8fe25 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1350,8 +1350,10 @@ static void blkif_free(struct blkfront_info *info, int 
suspend)
if (info->rq)
blk_mq_stop_hw_queues(info->rq);
  
-	for (i = 0; i < info->nr_rings; i++)

-   blkif_free_ring(>rinfo[i]);
+   if (info->rinfo) {
+   for (i = 0; i < info->nr_rings; i++)
+   blkif_free_ring(>rinfo[i]);
+   }

I'd drop this change.

  
  	kfree(info->rinfo);

info->rinfo = NULL;
@@ -1919,6 +1921,7 @@ static int negotiate_mq(struct blkfront_info *info)
  GFP_KERNEL);
if (!info->rinfo) {
xenbus_dev_fatal(info->xbdev, -ENOMEM, "allocating ring_info 
structure");
+   info->nr_rings = 0;
return -ENOMEM;
}


Juergen





Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/30, Tycho Andersen wrote:
>
> @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
> struct seccomp_data *sd,
>*/
>   rmb();
>  
> + if (!sd) {
> + populate_seccomp_data(_local);
> + sd = _local;
> + }
> +

To me it would be more clean to remove the "if (!sd)" check, 
case(SECCOMP_RET_TRACE)
in __seccomp_filter() can simply do populate_seccomp_data(_local) 
unconditionally
and pass _local to __seccomp_filter().

Oleg.



Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2018-10-30 Thread Pierre-Louis Bossart



On 10/30/18 11:02 AM, Hans de Goede wrote:

Hi,

On 30-10-18 16:46, Hans de Goede wrote:

Hi,

On 30-10-18 16:04, Pierre-Louis Bossart wrote:


On 10/30/18 9:38 AM, Dean Wallace wrote:

On 30-10-18, Hans de Goede wrote:

Hi Dean,

Attached are 2 different attempts at fixing this.

When trying these patches do not forget to remove the revert of the
"Stop-marking-clocks-as-CLK_IS_CRITICAL" commit.

Please first try the 
0001-ASoC-intel-cht_bsw_max98090_ti-Use-pmc_plt_clk_0-ins.patch
patch I expect that one to do the trick indicating that the Swanky 
model
uses a different pmc clk then which is normally used for the codec 
clock.


If that patch does not fix things, please give the other patch a try.


For Baytrail devices, the audio platform clocks are not managed by 
the firmware. They are for CHT-based devices - as can be seen by 
clock resources being described in the DSDT. We used to have a 
if(baytrail) in the code which was replaced by this CRITICAL label, 
but the point remains that there is a difference between the two SOC 
versions.


As I mentioned before the CRITICAL flag was only added a year ago to 
workaround
an issue with on board ethernet needing plt_clk_4 on some laptops, 
this never

had anything to do with sound.


Ah I see now that you later made some changes based on the patch to 
fix the ethernet:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=7735bce05a9c0bb0eb0f08c9002d65843a7c5798 



Note though that that does not touch the machine driver which we are 
discussing

now and the reporters machine is also BYT based.

As explained below (in my original reply) I think it is fine to always
manage the clk from within the kernel. But if you think this is a bad
idea, we could re-introduce the is_valleyview() checks in machine
drivers which are used on CHT devices.


I am having difficulties following the proposal

for audio, managing the platform clocks from the kernel is only useful 
for Baytrail, and the code handling the CRITICAL flags is really 
equivalent to having a is_valleyview() test in the machine drivers. I 
don't quite understand the S0ix-related changes and I also don't get how 
using plt_clk_0 makes things better or wonder if audio on Swanky worked 
before the commit 7735bce05a9c ('ASoC: Intel: boards: use devm_clk_get() 
unconditionally'.


In other words, I don't know if we are dealing with a platform that only 
started working with 7735bce05a9c and does need a quirk to make use of 
plt_clk_0 or a fundamental issue with clock/power management.





In addition I am not aware of any baytrail device using plt_clk_0, 
so moving a common machine driver such a cht_bsw_max98090_ti to use 
plt_clk0 only would break other devices (e.g. Rambi/Orco). Asking 
for both clocks to be on might work though, 


Ok, so we need to have a DMI based quirk for the Swanky and maybe also
the clapper to use plt_clk_0 there. Asking for 2 clks if we only need
one does not seem like a good plan.

however you still have the problem of trying to manage from the 
kernel what the firmware already manages.


The firmware only manages it when going to D3 state, with ASoC most of
the codecs gets turned off (and we no longer need the clock) but we do
not put the device in D3 / execute the _PS3 method. So from a pm pov
it is better if we manage the clk ourselves.

Once we do actually put the device in D3 (on suspend) the kernel will 
have
already turned off the clk and the _OFF method of the CLK3 power 
resource

which directly pokes mmio, will just set the enable bit to 0 when it
already is 0, so no problem.


Regards,

Hans


Re: [PATCH v8 1/2] seccomp: add a return code to trap to userspace

2018-10-30 Thread Oleg Nesterov
On 10/30, Tycho Andersen wrote:
>
> @@ -828,6 +823,11 @@ static int __seccomp_filter(int this_syscall, const 
> struct seccomp_data *sd,
>*/
>   rmb();
>  
> + if (!sd) {
> + populate_seccomp_data(_local);
> + sd = _local;
> + }
> +

To me it would be more clean to remove the "if (!sd)" check, 
case(SECCOMP_RET_TRACE)
in __seccomp_filter() can simply do populate_seccomp_data(_local) 
unconditionally
and pass _local to __seccomp_filter().

Oleg.



Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2018-10-30 Thread Pierre-Louis Bossart



On 10/30/18 11:02 AM, Hans de Goede wrote:

Hi,

On 30-10-18 16:46, Hans de Goede wrote:

Hi,

On 30-10-18 16:04, Pierre-Louis Bossart wrote:


On 10/30/18 9:38 AM, Dean Wallace wrote:

On 30-10-18, Hans de Goede wrote:

Hi Dean,

Attached are 2 different attempts at fixing this.

When trying these patches do not forget to remove the revert of the
"Stop-marking-clocks-as-CLK_IS_CRITICAL" commit.

Please first try the 
0001-ASoC-intel-cht_bsw_max98090_ti-Use-pmc_plt_clk_0-ins.patch
patch I expect that one to do the trick indicating that the Swanky 
model
uses a different pmc clk then which is normally used for the codec 
clock.


If that patch does not fix things, please give the other patch a try.


For Baytrail devices, the audio platform clocks are not managed by 
the firmware. They are for CHT-based devices - as can be seen by 
clock resources being described in the DSDT. We used to have a 
if(baytrail) in the code which was replaced by this CRITICAL label, 
but the point remains that there is a difference between the two SOC 
versions.


As I mentioned before the CRITICAL flag was only added a year ago to 
workaround
an issue with on board ethernet needing plt_clk_4 on some laptops, 
this never

had anything to do with sound.


Ah I see now that you later made some changes based on the patch to 
fix the ethernet:


https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=7735bce05a9c0bb0eb0f08c9002d65843a7c5798 



Note though that that does not touch the machine driver which we are 
discussing

now and the reporters machine is also BYT based.

As explained below (in my original reply) I think it is fine to always
manage the clk from within the kernel. But if you think this is a bad
idea, we could re-introduce the is_valleyview() checks in machine
drivers which are used on CHT devices.


I am having difficulties following the proposal

for audio, managing the platform clocks from the kernel is only useful 
for Baytrail, and the code handling the CRITICAL flags is really 
equivalent to having a is_valleyview() test in the machine drivers. I 
don't quite understand the S0ix-related changes and I also don't get how 
using plt_clk_0 makes things better or wonder if audio on Swanky worked 
before the commit 7735bce05a9c ('ASoC: Intel: boards: use devm_clk_get() 
unconditionally'.


In other words, I don't know if we are dealing with a platform that only 
started working with 7735bce05a9c and does need a quirk to make use of 
plt_clk_0 or a fundamental issue with clock/power management.





In addition I am not aware of any baytrail device using plt_clk_0, 
so moving a common machine driver such a cht_bsw_max98090_ti to use 
plt_clk0 only would break other devices (e.g. Rambi/Orco). Asking 
for both clocks to be on might work though, 


Ok, so we need to have a DMI based quirk for the Swanky and maybe also
the clapper to use plt_clk_0 there. Asking for 2 clks if we only need
one does not seem like a good plan.

however you still have the problem of trying to manage from the 
kernel what the firmware already manages.


The firmware only manages it when going to D3 state, with ASoC most of
the codecs gets turned off (and we no longer need the clock) but we do
not put the device in D3 / execute the _PS3 method. So from a pm pov
it is better if we manage the clk ourselves.

Once we do actually put the device in D3 (on suspend) the kernel will 
have
already turned off the clk and the _OFF method of the CLK3 power 
resource

which directly pokes mmio, will just set the enable bit to 0 when it
already is 0, so no problem.


Regards,

Hans


[PATCH v2] ASoC: AMD: Fix race condition between register access

2018-10-30 Thread Agrawal, Akshu
During simultaneous running of playback and capture, we
got hit by incorrect value write on common register. This was due
to race condition between 2 streams.
Fixing this by locking the common register access.

Signed-off-by: Akshu Agrawal 
---
v2: Added 2 helper functions, removed locking in ch enable/disable as they
are separate registers.
 sound/soc/amd/acp-pcm-dma.c | 63 ++---
 1 file changed, 48 insertions(+), 15 deletions(-)

diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
index 0ac4b5b..5be9c2d 100644
--- a/sound/soc/amd/acp-pcm-dma.c
+++ b/sound/soc/amd/acp-pcm-dma.c
@@ -121,6 +121,9 @@
.periods_max = CAPTURE_MAX_NUM_PERIODS,
 };
 
+/* Lock to protect access to registers */
+static DEFINE_SPINLOCK(lock);
+
 static u32 acp_reg_read(void __iomem *acp_mmio, u32 reg)
 {
return readl(acp_mmio + (reg * 4));
@@ -131,6 +134,33 @@ static void acp_reg_write(u32 val, void __iomem *acp_mmio, 
u32 reg)
writel(val, acp_mmio + (reg * 4));
 }
 
+static void acp_reg_write_srbm_targ(void __iomem *acp_mmio,
+   u32 addr, u32 data)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(, flags);
+   acp_reg_write(addr, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
+   acp_reg_write(data, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
+   spin_unlock_irqrestore(, flags);
+}
+
+static void acp_reg_read_mod_write(void __iomem *acp_mmio, u32 addr,
+  u32 mask, bool set)
+{
+   u32 val;
+   unsigned long flags;
+
+   spin_lock_irqsave(, flags);
+   val = acp_reg_read(acp_mmio, addr);
+   if (set)
+   val |= mask;
+   else
+   val &= ~mask;
+   acp_reg_write(val, acp_mmio, addr);
+   spin_unlock_irqrestore(, flags);
+}
+
 /*
  * Configure a given dma channel parameters - enable/disable,
  * number of descriptors, priority
@@ -172,15 +202,12 @@ static void config_dma_descriptor_in_sram(void __iomem 
*acp_mmio,
sram_offset = (descr_idx * sizeof(acp_dma_dscr_transfer_t));
 
/* program the source base address. */
-   acp_reg_write(sram_offset, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
-   acp_reg_write(descr_info->src,  acp_mmio, mmACP_SRBM_Targ_Idx_Data);
+   acp_reg_write_srbm_targ(acp_mmio, sram_offset, descr_info->src);
/* program the destination base address. */
-   acp_reg_write(sram_offset + 4,  acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
-   acp_reg_write(descr_info->dest, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
-
+   acp_reg_write_srbm_targ(acp_mmio, sram_offset + 4, descr_info->dest);
/* program the number of bytes to be transferred for this descriptor. */
-   acp_reg_write(sram_offset + 8,  acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
-   acp_reg_write(descr_info->xfer_val, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
+   acp_reg_write_srbm_targ(acp_mmio, sram_offset + 8,
+   descr_info->xfer_val);
 }
 
 static void pre_config_reset(void __iomem *acp_mmio, u16 ch_num)
@@ -309,8 +336,12 @@ static void acp_pte_config(void __iomem *acp_mmio, struct 
page *pg,
u32 low;
u32 high;
u32 offset;
+   unsigned long flags;
 
offset  = ACP_DAGB_GRP_SRBM_SRAM_BASE_OFFSET + (pte_offset * 8);
+
+   spin_lock_irqsave(, flags);
+
for (page_idx = 0; page_idx < (num_of_pages); page_idx++) {
/* Load the low address of page int ACP SRAM through SRBM */
acp_reg_write((offset + (page_idx * 8)),
@@ -333,6 +364,8 @@ static void acp_pte_config(void __iomem *acp_mmio, struct 
page *pg,
/* Move to next physically contiguos page */
pg++;
}
+
+   spin_unlock_irqrestore(, flags);
 }
 
 static void config_acp_dma(void __iomem *acp_mmio,
@@ -870,29 +903,29 @@ static int acp_dma_hw_params(struct snd_pcm_substream 
*substream,
}
}
if (adata->asic_type == CHIP_STONEY) {
-   val = acp_reg_read(adata->acp_mmio,
-  mmACP_I2S_16BIT_RESOLUTION_EN);
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
switch (rtd->i2s_instance) {
case I2S_BT_INSTANCE:
-   val |= ACP_I2S_BT_16BIT_RESOLUTION_EN;
+   val = ACP_I2S_BT_16BIT_RESOLUTION_EN;
break;
case I2S_SP_INSTANCE:
default:
-   val |= ACP_I2S_SP_16BIT_RESOLUTION_EN;
+   val = ACP_I2S_SP_16BIT_RESOLUTION_EN;
}
} else {
switch (rtd->i2s_instance) {
case I2S_BT_INSTANCE:
-   val |= ACP_I2S_BT_16BIT_RESOLUTION_EN;
+   val = ACP_I2S_BT_16BIT_RESOLUTION_EN;
 

[PATCH v2] ASoC: AMD: Fix race condition between register access

2018-10-30 Thread Agrawal, Akshu
During simultaneous running of playback and capture, we
got hit by incorrect value write on common register. This was due
to race condition between 2 streams.
Fixing this by locking the common register access.

Signed-off-by: Akshu Agrawal 
---
v2: Added 2 helper functions, removed locking in ch enable/disable as they
are separate registers.
 sound/soc/amd/acp-pcm-dma.c | 63 ++---
 1 file changed, 48 insertions(+), 15 deletions(-)

diff --git a/sound/soc/amd/acp-pcm-dma.c b/sound/soc/amd/acp-pcm-dma.c
index 0ac4b5b..5be9c2d 100644
--- a/sound/soc/amd/acp-pcm-dma.c
+++ b/sound/soc/amd/acp-pcm-dma.c
@@ -121,6 +121,9 @@
.periods_max = CAPTURE_MAX_NUM_PERIODS,
 };
 
+/* Lock to protect access to registers */
+static DEFINE_SPINLOCK(lock);
+
 static u32 acp_reg_read(void __iomem *acp_mmio, u32 reg)
 {
return readl(acp_mmio + (reg * 4));
@@ -131,6 +134,33 @@ static void acp_reg_write(u32 val, void __iomem *acp_mmio, 
u32 reg)
writel(val, acp_mmio + (reg * 4));
 }
 
+static void acp_reg_write_srbm_targ(void __iomem *acp_mmio,
+   u32 addr, u32 data)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(, flags);
+   acp_reg_write(addr, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
+   acp_reg_write(data, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
+   spin_unlock_irqrestore(, flags);
+}
+
+static void acp_reg_read_mod_write(void __iomem *acp_mmio, u32 addr,
+  u32 mask, bool set)
+{
+   u32 val;
+   unsigned long flags;
+
+   spin_lock_irqsave(, flags);
+   val = acp_reg_read(acp_mmio, addr);
+   if (set)
+   val |= mask;
+   else
+   val &= ~mask;
+   acp_reg_write(val, acp_mmio, addr);
+   spin_unlock_irqrestore(, flags);
+}
+
 /*
  * Configure a given dma channel parameters - enable/disable,
  * number of descriptors, priority
@@ -172,15 +202,12 @@ static void config_dma_descriptor_in_sram(void __iomem 
*acp_mmio,
sram_offset = (descr_idx * sizeof(acp_dma_dscr_transfer_t));
 
/* program the source base address. */
-   acp_reg_write(sram_offset, acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
-   acp_reg_write(descr_info->src,  acp_mmio, mmACP_SRBM_Targ_Idx_Data);
+   acp_reg_write_srbm_targ(acp_mmio, sram_offset, descr_info->src);
/* program the destination base address. */
-   acp_reg_write(sram_offset + 4,  acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
-   acp_reg_write(descr_info->dest, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
-
+   acp_reg_write_srbm_targ(acp_mmio, sram_offset + 4, descr_info->dest);
/* program the number of bytes to be transferred for this descriptor. */
-   acp_reg_write(sram_offset + 8,  acp_mmio, mmACP_SRBM_Targ_Idx_Addr);
-   acp_reg_write(descr_info->xfer_val, acp_mmio, mmACP_SRBM_Targ_Idx_Data);
+   acp_reg_write_srbm_targ(acp_mmio, sram_offset + 8,
+   descr_info->xfer_val);
 }
 
 static void pre_config_reset(void __iomem *acp_mmio, u16 ch_num)
@@ -309,8 +336,12 @@ static void acp_pte_config(void __iomem *acp_mmio, struct 
page *pg,
u32 low;
u32 high;
u32 offset;
+   unsigned long flags;
 
offset  = ACP_DAGB_GRP_SRBM_SRAM_BASE_OFFSET + (pte_offset * 8);
+
+   spin_lock_irqsave(, flags);
+
for (page_idx = 0; page_idx < (num_of_pages); page_idx++) {
/* Load the low address of page int ACP SRAM through SRBM */
acp_reg_write((offset + (page_idx * 8)),
@@ -333,6 +364,8 @@ static void acp_pte_config(void __iomem *acp_mmio, struct 
page *pg,
/* Move to next physically contiguos page */
pg++;
}
+
+   spin_unlock_irqrestore(, flags);
 }
 
 static void config_acp_dma(void __iomem *acp_mmio,
@@ -870,29 +903,29 @@ static int acp_dma_hw_params(struct snd_pcm_substream 
*substream,
}
}
if (adata->asic_type == CHIP_STONEY) {
-   val = acp_reg_read(adata->acp_mmio,
-  mmACP_I2S_16BIT_RESOLUTION_EN);
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) {
switch (rtd->i2s_instance) {
case I2S_BT_INSTANCE:
-   val |= ACP_I2S_BT_16BIT_RESOLUTION_EN;
+   val = ACP_I2S_BT_16BIT_RESOLUTION_EN;
break;
case I2S_SP_INSTANCE:
default:
-   val |= ACP_I2S_SP_16BIT_RESOLUTION_EN;
+   val = ACP_I2S_SP_16BIT_RESOLUTION_EN;
}
} else {
switch (rtd->i2s_instance) {
case I2S_BT_INSTANCE:
-   val |= ACP_I2S_BT_16BIT_RESOLUTION_EN;
+   val = ACP_I2S_BT_16BIT_RESOLUTION_EN;
 

Re: Re: [PATCH v2] clk: boston: fix possible memory leak in clk_boston_setup()

2018-10-30 Thread Stephen Boyd
Quoting wang.y...@zte.com.cn (2018-10-29 23:13:24)
> > Quoting Yi Wang (2018-10-29 01:31:47)
> > > 'onecell' is malloced in clk_boston_setup(), but is not freed
> > > before leaving from the error handling cases.
> >
> > How did you find this? Visual inspection? Some coccinelle script?
> 
> Smatch report this:
> drivers/clk/imgtec/clk-boston.c:76 clk_boston_setup() warn: possible memory 
> leak of 'onecell'
> drivers/clk/imgtec/clk-boston.c:83 clk_boston_setup() warn: possible memory 
> leak of 'onecell'
> drivers/clk/imgtec/clk-boston.c:90 clk_boston_setup() warn: possible memory 
> leak of 'onecell'

Ok. Please include those details in the commit text.

> 
> >
> > >
> > > Signed-off-by: Yi Wang 
> > > ---
> > > v2: fix syntax issue in comment, thanks to  Sergei.
> > >
> > >  drivers/clk/imgtec/clk-boston.c | 11 ---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/clk/imgtec/clk-boston.c 
> > > b/drivers/clk/imgtec/clk-boston.c
> > > index 15af423..f5d54a6 100644
> > > --- a/drivers/clk/imgtec/clk-boston.c
> > > +++ b/drivers/clk/imgtec/clk-boston.c
> > > @@ -73,27 +73,32 @@ static void __init clk_boston_setup(struct 
> > > device_node *np)
> > > hw = clk_hw_register_fixed_rate(NULL, "input", NULL, 0, in_freq);
> > > if (IS_ERR(hw)) {
> > > pr_err("failed to register input clock: %ld\n", 
> > > PTR_ERR(hw));
> > > -   return;
> > > +   goto error;
> > > }
> > > onecell->hws[BOSTON_CLK_INPUT] = hw;
> > >
> > > hw = clk_hw_register_fixed_rate(NULL, "sys", "input", 0, 
> > > sys_freq);
> > > if (IS_ERR(hw)) {
> > > pr_err("failed to register sys clock: %ld\n", 
> > > PTR_ERR(hw));
> > > -   return;
> > > +   goto error;
> > > }
> > > onecell->hws[BOSTON_CLK_SYS] = hw;
> > >
> > > hw = clk_hw_register_fixed_rate(NULL, "cpu", "input", 0, 
> > > cpu_freq);
> > > if (IS_ERR(hw)) {
> > > pr_err("failed to register cpu clock: %ld\n", 
> > > PTR_ERR(hw));
> > > -   return;
> > > +   goto error;
> > > }
> > > onecell->hws[BOSTON_CLK_CPU] = hw;
> > >
> > > err = of_clk_add_hw_provider(np, of_clk_hw_onecell_get, onecell);
> > > if (err)
> > > pr_err("failed to add DT provider: %d\n", err);
> > > +
> > > +   return;
> > > +
> > > +error:
> > > +   kfree(onecell);
> >
> > Ok, sure. But then clks are still left registered on failure?
> 
> Yeah, but this patch does not change the original flow of the function, so I 
> suppose
> if you deem this is not proper, it's better to improve that in another patch, 
> what do
> you think?
> 

I think we should attempt to fix all the theoretical problems with the
driver instead of just fixing some things to make static checkers happy.
It looks like this driver was written with the assumption that if things
go bad we give up all hope. It would be better to clean everything up
properly if things go bad and have better code.



Re: Re: [PATCH v2] clk: boston: fix possible memory leak in clk_boston_setup()

2018-10-30 Thread Stephen Boyd
Quoting wang.y...@zte.com.cn (2018-10-29 23:13:24)
> > Quoting Yi Wang (2018-10-29 01:31:47)
> > > 'onecell' is malloced in clk_boston_setup(), but is not freed
> > > before leaving from the error handling cases.
> >
> > How did you find this? Visual inspection? Some coccinelle script?
> 
> Smatch report this:
> drivers/clk/imgtec/clk-boston.c:76 clk_boston_setup() warn: possible memory 
> leak of 'onecell'
> drivers/clk/imgtec/clk-boston.c:83 clk_boston_setup() warn: possible memory 
> leak of 'onecell'
> drivers/clk/imgtec/clk-boston.c:90 clk_boston_setup() warn: possible memory 
> leak of 'onecell'

Ok. Please include those details in the commit text.

> 
> >
> > >
> > > Signed-off-by: Yi Wang 
> > > ---
> > > v2: fix syntax issue in comment, thanks to  Sergei.
> > >
> > >  drivers/clk/imgtec/clk-boston.c | 11 ---
> > >  1 file changed, 8 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/clk/imgtec/clk-boston.c 
> > > b/drivers/clk/imgtec/clk-boston.c
> > > index 15af423..f5d54a6 100644
> > > --- a/drivers/clk/imgtec/clk-boston.c
> > > +++ b/drivers/clk/imgtec/clk-boston.c
> > > @@ -73,27 +73,32 @@ static void __init clk_boston_setup(struct 
> > > device_node *np)
> > > hw = clk_hw_register_fixed_rate(NULL, "input", NULL, 0, in_freq);
> > > if (IS_ERR(hw)) {
> > > pr_err("failed to register input clock: %ld\n", 
> > > PTR_ERR(hw));
> > > -   return;
> > > +   goto error;
> > > }
> > > onecell->hws[BOSTON_CLK_INPUT] = hw;
> > >
> > > hw = clk_hw_register_fixed_rate(NULL, "sys", "input", 0, 
> > > sys_freq);
> > > if (IS_ERR(hw)) {
> > > pr_err("failed to register sys clock: %ld\n", 
> > > PTR_ERR(hw));
> > > -   return;
> > > +   goto error;
> > > }
> > > onecell->hws[BOSTON_CLK_SYS] = hw;
> > >
> > > hw = clk_hw_register_fixed_rate(NULL, "cpu", "input", 0, 
> > > cpu_freq);
> > > if (IS_ERR(hw)) {
> > > pr_err("failed to register cpu clock: %ld\n", 
> > > PTR_ERR(hw));
> > > -   return;
> > > +   goto error;
> > > }
> > > onecell->hws[BOSTON_CLK_CPU] = hw;
> > >
> > > err = of_clk_add_hw_provider(np, of_clk_hw_onecell_get, onecell);
> > > if (err)
> > > pr_err("failed to add DT provider: %d\n", err);
> > > +
> > > +   return;
> > > +
> > > +error:
> > > +   kfree(onecell);
> >
> > Ok, sure. But then clks are still left registered on failure?
> 
> Yeah, but this patch does not change the original flow of the function, so I 
> suppose
> if you deem this is not proper, it's better to improve that in another patch, 
> what do
> you think?
> 

I think we should attempt to fix all the theoretical problems with the
driver instead of just fixing some things to make static checkers happy.
It looks like this driver was written with the assumption that if things
go bad we give up all hope. It would be better to clean everything up
properly if things go bad and have better code.



Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2018-10-30 Thread Andy Shevchenko
On Tue, Oct 30, 2018 at 1:05 PM Hans de Goede  wrote:

> for i in /sys/kernel/debug/clk/pmc_plt_clk_?; do echo -n "$i: "; cat 
> $i/clk_flags; echo; done

Just a side note about nice tool, called `grep` :-)

grep -H . /sys/kernel/debug/clk/pmc_plt_clk_*/clk_flags

does above in one call.

-- 
With Best Regards,
Andy Shevchenko


Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2018-10-30 Thread Andy Shevchenko
On Tue, Oct 30, 2018 at 1:05 PM Hans de Goede  wrote:

> for i in /sys/kernel/debug/clk/pmc_plt_clk_?; do echo -n "$i: "; cat 
> $i/clk_flags; echo; done

Just a side note about nice tool, called `grep` :-)

grep -H . /sys/kernel/debug/clk/pmc_plt_clk_*/clk_flags

does above in one call.

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] arm64: dts: sdm845: Add PSCI cpuidle low power states

2018-10-30 Thread Raju P L S S S N




On 10/24/2018 12:06 AM, Doug Anderson wrote:

Hi,

On Mon, Oct 15, 2018 at 10:02 AM Raju P.L.S.S.S.N
 wrote:

+   idle-states {
+   entry-method = "psci";
+
+   C0_CPU_SPC: c0_spc {


nit: all these nodes should have dashes instead of spaces in the node
names (labels can still have spaces).  AKA:

C0_CPU_SPC: c0-spc {


Sure. I will change this. (However, from device tree specification v0.2, 
I see that Table 2.1 mentions underscore as valid character for node. 
Please correct me if I'm missing something)






+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4003>;
+   entry-latency-us = <350>;
+   exit-latency-us = <461>;
+   min-residency-us = <1890>;
+   local-timer-stop;
+   idle-state-name = "pc";


It seems weird that the idle state with the node name "spc" has the
name "pc" and the idle state with the node name "pc" has the name
"rail pc".  Can this be more consistent or is there a reason why they
need to be mismatched?


Agree. They are confusing. Will change this.



Also: AAHTUFSWDKWTM (acronyms are hard to understand for someone who
doesn't know what they mean).  If you really need to use an the
acronyms "PC" and "SPC" please document them somewhere.  In the very
least the commit message, but having a comment in the file is good
too.  ...or (even better) don't use the acronym and spell out what
you're talking about.  Please correct me if I'm wrong, but I don't
think it's obvious what the "PC" and "SPC" idle states mean.


Sure. Will change this.


Thanks for feedback Doug.




-Doug



Re: [PATCH] arm64: dts: sdm845: Add PSCI cpuidle low power states

2018-10-30 Thread Raju P L S S S N




On 10/24/2018 12:06 AM, Doug Anderson wrote:

Hi,

On Mon, Oct 15, 2018 at 10:02 AM Raju P.L.S.S.S.N
 wrote:

+   idle-states {
+   entry-method = "psci";
+
+   C0_CPU_SPC: c0_spc {


nit: all these nodes should have dashes instead of spaces in the node
names (labels can still have spaces).  AKA:

C0_CPU_SPC: c0-spc {


Sure. I will change this. (However, from device tree specification v0.2, 
I see that Table 2.1 mentions underscore as valid character for node. 
Please correct me if I'm missing something)






+   compatible = "arm,idle-state";
+   arm,psci-suspend-param = <0x4003>;
+   entry-latency-us = <350>;
+   exit-latency-us = <461>;
+   min-residency-us = <1890>;
+   local-timer-stop;
+   idle-state-name = "pc";


It seems weird that the idle state with the node name "spc" has the
name "pc" and the idle state with the node name "pc" has the name
"rail pc".  Can this be more consistent or is there a reason why they
need to be mismatched?


Agree. They are confusing. Will change this.



Also: AAHTUFSWDKWTM (acronyms are hard to understand for someone who
doesn't know what they mean).  If you really need to use an the
acronyms "PC" and "SPC" please document them somewhere.  In the very
least the commit message, but having a comment in the file is good
too.  ...or (even better) don't use the acronym and spell out what
you're talking about.  Please correct me if I'm wrong, but I don't
think it's obvious what the "PC" and "SPC" idle states mean.


Sure. Will change this.


Thanks for feedback Doug.




-Doug



Re: Build error in drivers/cpufreq/intel_pstate.c

2018-10-30 Thread Srinivas Pandruvada
Hi,
On Tue, 2018-10-30 at 18:00 +0200, Igor Stoppa wrote:
> Hi,
> I'm getting the following build error:
> 
> /home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c: In
> function 
> ‘show_base_frequency’:
> /home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c:726:10: 
> error: implicit declaration of function 
> ‘intel_pstate_get_cppc_guranteed’; did you mean
> ‘intel_pstate_get_epp’? 
> [-Werror=implicit-function-declaration]
>ratio = intel_pstate_get_cppc_guranteed(policy->cpu);
>^~~
>intel_pstate_get_epp
> 
> 

This should fix this:

https://patchwork.kernel.org/patch/10653593/

Thanks,
Srinivas
> 
> on top of:
> 
> commit 11743c56785c751c087eecdb98713eef796609e0
> Merge: 929e134c43c9 928002a5e9da
> 
> --
> igor



Re: Build error in drivers/cpufreq/intel_pstate.c

2018-10-30 Thread Srinivas Pandruvada
Hi,
On Tue, 2018-10-30 at 18:00 +0200, Igor Stoppa wrote:
> Hi,
> I'm getting the following build error:
> 
> /home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c: In
> function 
> ‘show_base_frequency’:
> /home/igor/dev/kernel/linux/drivers/cpufreq/intel_pstate.c:726:10: 
> error: implicit declaration of function 
> ‘intel_pstate_get_cppc_guranteed’; did you mean
> ‘intel_pstate_get_epp’? 
> [-Werror=implicit-function-declaration]
>ratio = intel_pstate_get_cppc_guranteed(policy->cpu);
>^~~
>intel_pstate_get_epp
> 
> 

This should fix this:

https://patchwork.kernel.org/patch/10653593/

Thanks,
Srinivas
> 
> on top of:
> 
> commit 11743c56785c751c087eecdb98713eef796609e0
> Merge: 929e134c43c9 928002a5e9da
> 
> --
> igor



Re: [PATCH 2/2] thermal: broadcom: constify thermal_zone_of_device_ops structure

2018-10-30 Thread Daniel Lezcano
On 30/10/2018 16:15, Julia Lawall wrote:
> The thermal_zone_of_device_ops structure can be const as it is only
> passed as the last argument of thermal_zone_of_sensor_register
> and the corresponding parameter is declared as const.
> 
> Done with the help of Coccinelle.
> 
> Signed-off-by: Julia Lawall 

Reviewed-by: Daniel Lezcano 

> ---
> 
> Unrelated to this change, is there a reason not to use
> devm_thermal_zone_of_sensor_register?

Agree, it would make sense to use the devm* version.

>  drivers/thermal/broadcom/brcmstb_thermal.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -u -p a/drivers/thermal/broadcom/brcmstb_thermal.c 
> b/drivers/thermal/broadcom/brcmstb_thermal.c
> --- a/drivers/thermal/broadcom/brcmstb_thermal.c
> +++ b/drivers/thermal/broadcom/brcmstb_thermal.c
> @@ -299,7 +299,7 @@ static int brcmstb_set_trips(void *data,
>   return 0;
>  }
>  
> -static struct thermal_zone_of_device_ops of_ops = {
> +static const struct thermal_zone_of_device_ops of_ops = {
>   .get_temp   = brcmstb_get_temp,
>   .set_trips  = brcmstb_set_trips,
>  };
> 


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH 2/2] thermal: broadcom: constify thermal_zone_of_device_ops structure

2018-10-30 Thread Daniel Lezcano
On 30/10/2018 16:15, Julia Lawall wrote:
> The thermal_zone_of_device_ops structure can be const as it is only
> passed as the last argument of thermal_zone_of_sensor_register
> and the corresponding parameter is declared as const.
> 
> Done with the help of Coccinelle.
> 
> Signed-off-by: Julia Lawall 

Reviewed-by: Daniel Lezcano 

> ---
> 
> Unrelated to this change, is there a reason not to use
> devm_thermal_zone_of_sensor_register?

Agree, it would make sense to use the devm* version.

>  drivers/thermal/broadcom/brcmstb_thermal.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -u -p a/drivers/thermal/broadcom/brcmstb_thermal.c 
> b/drivers/thermal/broadcom/brcmstb_thermal.c
> --- a/drivers/thermal/broadcom/brcmstb_thermal.c
> +++ b/drivers/thermal/broadcom/brcmstb_thermal.c
> @@ -299,7 +299,7 @@ static int brcmstb_set_trips(void *data,
>   return 0;
>  }
>  
> -static struct thermal_zone_of_device_ops of_ops = {
> +static const struct thermal_zone_of_device_ops of_ops = {
>   .get_temp   = brcmstb_get_temp,
>   .set_trips  = brcmstb_set_trips,
>  };
> 


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH 1/2] thermal: armada: constify thermal_zone_of_device_ops structure

2018-10-30 Thread Daniel Lezcano
On 30/10/2018 16:14, Julia Lawall wrote:
> The thermal_zone_of_device_ops structure can be const as it is only
> passed as the last argument of devm_thermal_zone_of_sensor_register
> and the corresponding parameter is declared as const.
> 
> Done with the help of Coccinelle.
> 
> Signed-off-by: Julia Lawall 
> 
> ---
>  drivers/thermal/armada_thermal.c |2 +-

Reviewed-by: Daniel Lezcano 


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH 1/2] thermal: armada: constify thermal_zone_of_device_ops structure

2018-10-30 Thread Daniel Lezcano
On 30/10/2018 16:14, Julia Lawall wrote:
> The thermal_zone_of_device_ops structure can be const as it is only
> passed as the last argument of devm_thermal_zone_of_sensor_register
> and the corresponding parameter is declared as const.
> 
> Done with the help of Coccinelle.
> 
> Signed-off-by: Julia Lawall 
> 
> ---
>  drivers/thermal/armada_thermal.c |2 +-

Reviewed-by: Daniel Lezcano 


-- 
  Linaro.org │ Open source software for ARM SoCs

Follow Linaro:   Facebook |
 Twitter |
 Blog



Re: [PATCH] arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP

2018-10-30 Thread Jeffrey Hugo

On 10/29/2018 11:45 PM, Bjorn Andersson wrote:

GPIOs 0 through 3 and 81 through 84 are configured to not be accessible
from the application CPUs. Mark them as reserved to allow the MSM8998
MTP to boot after the introduction of 3edfb7bd76bd ("gpiolib: Show
correct direction from the beginning").

Signed-off-by: Bjorn Andersson 
---
  arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi | 4 
  1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi 
b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
index b4276da1fb0d..11fd1fe8bdb5 100644
--- a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
@@ -241,3 +241,7 @@
};
};
  };
+
+ {
+   gpio-reserved-ranges = <0 4>, <81 4>;
+};



Does Documentation/devicetree/bindings/pinctrl/qcom,msm8998-pinctrl.txt 
need to be updated?  Seems like the majority of the other qcom pinctrl 
bindings docs mention that gpio-reserved-ranges is optional.


In any case,
Reviewed-by: Jeffrey Hugo 

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.

Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [PATCH] arm64: dts: qcom: msm8998: Reserve gpio ranges on MTP

2018-10-30 Thread Jeffrey Hugo

On 10/29/2018 11:45 PM, Bjorn Andersson wrote:

GPIOs 0 through 3 and 81 through 84 are configured to not be accessible
from the application CPUs. Mark them as reserved to allow the MSM8998
MTP to boot after the introduction of 3edfb7bd76bd ("gpiolib: Show
correct direction from the beginning").

Signed-off-by: Bjorn Andersson 
---
  arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi | 4 
  1 file changed, 4 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi 
b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
index b4276da1fb0d..11fd1fe8bdb5 100644
--- a/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8998-mtp.dtsi
@@ -241,3 +241,7 @@
};
};
  };
+
+ {
+   gpio-reserved-ranges = <0 4>, <81 4>;
+};



Does Documentation/devicetree/bindings/pinctrl/qcom,msm8998-pinctrl.txt 
need to be updated?  Seems like the majority of the other qcom pinctrl 
bindings docs mention that gpio-reserved-ranges is optional.


In any case,
Reviewed-by: Jeffrey Hugo 

--
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.

Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


Re: [GIT PULL] More ACPI updates for v4.20-rc1

2018-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2018 at 1:30 AM Rafael J. Wysocki  wrote:
>
> These rework the handling of the P-unit semaphore on Intel
> Baytrail and Cherrytrail systems to avoid race conditions and
> excessive overhead related to it (Hans de Goede).

Pulled,

Linus


Re: [GIT PULL] More ACPI updates for v4.20-rc1

2018-10-30 Thread Linus Torvalds
On Tue, Oct 30, 2018 at 1:30 AM Rafael J. Wysocki  wrote:
>
> These rework the handling of the P-unit semaphore on Intel
> Baytrail and Cherrytrail systems to avoid race conditions and
> excessive overhead related to it (Hans de Goede).

Pulled,

Linus


Re: [PATCH 4/9] mailbox: tegra-hsp: Add support for shared mailboxes

2018-10-30 Thread Pekka Pessi

I guess it being an example doesn't make it strictly a recommendation, but
I wonder if we should avoid giving examples that use mappings which we
discourage.


Well, yes, that is example for top0 – but I agree, giving examples that 
do not make much sense is not very productive.



So currently
we list all available interrupts in DT and then we could pick just the
first. But what if somebody else already picked the first and "owns" it


We have rather static hardware allocation in device trees, if an another 
VM instance or another aux cpu owns interrupt, we just remove it from 
the DTS or DTSI.  So the native Linux would see all the possible 
interrupts (that are not used by SPE or RCE, for instance), but a 
virtualized one would see only one.



For mailboxes that the CCPLEX writes to, we can reconfigure them as
producers by disabling the FULL interrupt and enabling the EMPTY
interrupt instead. We can do that the first time somebody calls the
mbox_send_message() on the mailbox.


The problem here is "disabling the FULL interrupt". The main problem are 
the mailbox-specific IE registers, they are not really shared, but the 
consumer owns the full_ie and producer empty_ie. We can not enable or 
disable interrupts without interfering with the interrupt settings of 
the remote end, when the interrupt is enabled, we either modify 
full_ie/empty_ie register or run with a risk that it has incorrect value.


Can we just leave the mbox channel without interrupts? How the consumer 
indicates that it is interested in receiving messages? mbox_peek_data()?


--Pekka



On 10/29/2018 03:16 PM, Thierry Reding wrote:

On Mon, Oct 29, 2018 at 02:25:42PM +0200, Pekka Pessi wrote:

Hi Thierry,

 From the 0/9:

Are you aware of any others that we need to take into account?

We would like to use upstream driver for RCE (and probably AON and SCE)
mailbox handling, too. Eventually.

This is a bit
of a problem because the mailbox driver doesn't really know anything
about the direction when it starts up, so how would it make the decision
about how to program the registers?

I'm afraid that information must be stored in the device tree.

What's somewhat surprising is that you're saying that both FULL and
EMPTY interrupts should be handled by the same shared interrupt. That's
the opposite of what the recommended programming sequence is that the
TRM specifies.

Which TRM you mean? Something here?

https://p4viewer.nvidia.com/get///hw/ar/doc/Tech_Pubs/Xavier/TRM/

I browsed through he Xavier_TRM_DP09253001v1.pdf HSP section and I could not
find any recommendations? But see below.

I'm referring to Xavier_TRM_Public.pdf in that location. If you look at
page 4422, "Shared Interrupts Configuration", the example has aggregated
EMPTY and FULL interrupts on shared interrupts 0 and 1, respectively. I
guess it being an example doesn't make it strictly a recommendation, but
I wonder if we should avoid giving examples that use mappings which we
discourage.


Why is it better to handle both FULL and EMPTY interrupts
with the same shared interrupt?

Only top0 has 8 interrupts, rest of the HSP blocks have only 4 (or 5 in case
of top1) interrupts per HSP available through LIC. Hogging two of them means
that only two VMs can access a HSP.

Virtualization isn't something that we're very concerned about upstream,
but I'll take that under consideration.


Would it be safe to clear all of the IE registers to 0 on driver probe?

Nope, the driver should clear only the IE register for the shared interrupt
that the driver uses. Other IEs are used by other entities.

Right, that makes sense. But within that IE register it can consider all
bits fair game, right? One thing I wonder, though, is whether there
should be some external mechanism to set the shared interrupt to use. If
we go purely by convention we'll eventually get this wrong. So currently
we list all available interrupts in DT and then we could pick just the
first. But what if somebody else already picked the first and "owns" it?

I'm not sure we have any practical mechanism to rewrite the DTB, but
perhaps something to keep in mind if ever we need to support other
entities down the road.


If they are indeed separate
for each processor, it should be fairly easy to keep track of the
mailboxes used by the kernel and process only those.

Yes, that is what we should do. Again the directionality should be specified
in DT.

Currently we encode the shared mailboxes in DT like this:

mboxes = <_top0 TEGRA_HSP_MBOX_TYPE_SM 0>,
 <_aon TEGRA_HSP_MBOX_TYPE_SM 1>;
mbox-names = "rx", "tx";

I suppose we could change that to something like:

mboxes = <_top0 TEGRA_HSP_MBOX_TYPE_SM (0 << 31 | 0)>,
 <_aon TEGRA_HSP_MBOX_TYPE_SM (1 << 31 | 0)>;

Where the MSB of the mailbox index would indicate the direction. We
could maybe add some eye-candy to make it easier to read:

mboxes = <_top0 TEGRA_HSP_MBOX_TYPE_SM TEGRA_HSP_MBOX_SM_RX(0)>,
 

Re: [PATCH 4/9] mailbox: tegra-hsp: Add support for shared mailboxes

2018-10-30 Thread Pekka Pessi

I guess it being an example doesn't make it strictly a recommendation, but
I wonder if we should avoid giving examples that use mappings which we
discourage.


Well, yes, that is example for top0 – but I agree, giving examples that 
do not make much sense is not very productive.



So currently
we list all available interrupts in DT and then we could pick just the
first. But what if somebody else already picked the first and "owns" it


We have rather static hardware allocation in device trees, if an another 
VM instance or another aux cpu owns interrupt, we just remove it from 
the DTS or DTSI.  So the native Linux would see all the possible 
interrupts (that are not used by SPE or RCE, for instance), but a 
virtualized one would see only one.



For mailboxes that the CCPLEX writes to, we can reconfigure them as
producers by disabling the FULL interrupt and enabling the EMPTY
interrupt instead. We can do that the first time somebody calls the
mbox_send_message() on the mailbox.


The problem here is "disabling the FULL interrupt". The main problem are 
the mailbox-specific IE registers, they are not really shared, but the 
consumer owns the full_ie and producer empty_ie. We can not enable or 
disable interrupts without interfering with the interrupt settings of 
the remote end, when the interrupt is enabled, we either modify 
full_ie/empty_ie register or run with a risk that it has incorrect value.


Can we just leave the mbox channel without interrupts? How the consumer 
indicates that it is interested in receiving messages? mbox_peek_data()?


--Pekka



On 10/29/2018 03:16 PM, Thierry Reding wrote:

On Mon, Oct 29, 2018 at 02:25:42PM +0200, Pekka Pessi wrote:

Hi Thierry,

 From the 0/9:

Are you aware of any others that we need to take into account?

We would like to use upstream driver for RCE (and probably AON and SCE)
mailbox handling, too. Eventually.

This is a bit
of a problem because the mailbox driver doesn't really know anything
about the direction when it starts up, so how would it make the decision
about how to program the registers?

I'm afraid that information must be stored in the device tree.

What's somewhat surprising is that you're saying that both FULL and
EMPTY interrupts should be handled by the same shared interrupt. That's
the opposite of what the recommended programming sequence is that the
TRM specifies.

Which TRM you mean? Something here?

https://p4viewer.nvidia.com/get///hw/ar/doc/Tech_Pubs/Xavier/TRM/

I browsed through he Xavier_TRM_DP09253001v1.pdf HSP section and I could not
find any recommendations? But see below.

I'm referring to Xavier_TRM_Public.pdf in that location. If you look at
page 4422, "Shared Interrupts Configuration", the example has aggregated
EMPTY and FULL interrupts on shared interrupts 0 and 1, respectively. I
guess it being an example doesn't make it strictly a recommendation, but
I wonder if we should avoid giving examples that use mappings which we
discourage.


Why is it better to handle both FULL and EMPTY interrupts
with the same shared interrupt?

Only top0 has 8 interrupts, rest of the HSP blocks have only 4 (or 5 in case
of top1) interrupts per HSP available through LIC. Hogging two of them means
that only two VMs can access a HSP.

Virtualization isn't something that we're very concerned about upstream,
but I'll take that under consideration.


Would it be safe to clear all of the IE registers to 0 on driver probe?

Nope, the driver should clear only the IE register for the shared interrupt
that the driver uses. Other IEs are used by other entities.

Right, that makes sense. But within that IE register it can consider all
bits fair game, right? One thing I wonder, though, is whether there
should be some external mechanism to set the shared interrupt to use. If
we go purely by convention we'll eventually get this wrong. So currently
we list all available interrupts in DT and then we could pick just the
first. But what if somebody else already picked the first and "owns" it?

I'm not sure we have any practical mechanism to rewrite the DTB, but
perhaps something to keep in mind if ever we need to support other
entities down the road.


If they are indeed separate
for each processor, it should be fairly easy to keep track of the
mailboxes used by the kernel and process only those.

Yes, that is what we should do. Again the directionality should be specified
in DT.

Currently we encode the shared mailboxes in DT like this:

mboxes = <_top0 TEGRA_HSP_MBOX_TYPE_SM 0>,
 <_aon TEGRA_HSP_MBOX_TYPE_SM 1>;
mbox-names = "rx", "tx";

I suppose we could change that to something like:

mboxes = <_top0 TEGRA_HSP_MBOX_TYPE_SM (0 << 31 | 0)>,
 <_aon TEGRA_HSP_MBOX_TYPE_SM (1 << 31 | 0)>;

Where the MSB of the mailbox index would indicate the direction. We
could maybe add some eye-candy to make it easier to read:

mboxes = <_top0 TEGRA_HSP_MBOX_TYPE_SM TEGRA_HSP_MBOX_SM_RX(0)>,
 

Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2018-10-30 Thread Dean Wallace
On 30-10-18, Hans de Goede wrote:
> Hi,
> 
> On 30-10-18 16:46, Hans de Goede wrote:
> > Hi,
> > 
> > On 30-10-18 16:04, Pierre-Louis Bossart wrote:
> > > In addition I am not aware of any baytrail device using plt_clk_0,
> > > so moving a common machine driver such a cht_bsw_max98090_ti to use
> > > plt_clk0 only would break other devices (e.g. Rambi/Orco). Asking
> > > for both clocks to be on might work though,
> > 
> > Ok, so we need to have a DMI based quirk for the Swanky and maybe also
> > the clapper to use plt_clk_0 there. Asking for 2 clks if we only need
> > one does not seem like a good plan.
> 
> Dean, Mogens,
> 
> To write a proper patch for this I'm going to need DMI strings
> from your devices.
> 
> Can you please run (as normal user):
> 
> grep . /sys/class/dmi/id/* 2> /dev/null
> 
> And reply with the output of this command?
Here's mine, for a coreboot uefi based swanky.

https://ptpb.pw/~swanky-dmi-log

--
Dean


Re: Regression found (Stop-marking-clocks-as-CLK_IS_CRITICAL)

2018-10-30 Thread Dean Wallace
On 30-10-18, Hans de Goede wrote:
> Hi,
> 
> On 30-10-18 16:46, Hans de Goede wrote:
> > Hi,
> > 
> > On 30-10-18 16:04, Pierre-Louis Bossart wrote:
> > > In addition I am not aware of any baytrail device using plt_clk_0,
> > > so moving a common machine driver such a cht_bsw_max98090_ti to use
> > > plt_clk0 only would break other devices (e.g. Rambi/Orco). Asking
> > > for both clocks to be on might work though,
> > 
> > Ok, so we need to have a DMI based quirk for the Swanky and maybe also
> > the clapper to use plt_clk_0 there. Asking for 2 clks if we only need
> > one does not seem like a good plan.
> 
> Dean, Mogens,
> 
> To write a proper patch for this I'm going to need DMI strings
> from your devices.
> 
> Can you please run (as normal user):
> 
> grep . /sys/class/dmi/id/* 2> /dev/null
> 
> And reply with the output of this command?
Here's mine, for a coreboot uefi based swanky.

https://ptpb.pw/~swanky-dmi-log

--
Dean


[PATCH] sched/fair: util_est: fix cpu_util_wake for execl

2018-10-30 Thread Patrick Bellasi
A ~10% regression has been reported for UnixBench's execl throughput
test:

   Message-ID: <20180402032000.GD3101@yexl-desktop>
   Message-ID: <20181024064100.ga27...@intel.com>

That test is pretty simple, it does a "recursive" execve syscall on the
same binary. Starting from the syscall, this sequence is possible:

   do_execve()
 do_execveat_common()
   __do_execve_file()
 sched_exec()
   select_task_rq_fair()  <==| Task already enqueued
 find_idlest_cpu()
   find_idlest_group()
 capacity_spare_wake()<==| Functions not called from
   cpu_util_wake()   | the wakeup path

which means we can end up calling cpu_util_wake() not only from the
"wakeup path", as its name would suggest. Indeed, the task doing an
execve syscall is already enqueued on the CPU we want to get the
cpu_util_wake for.

The estimated utilization for a CPU computed in cpu_util_wake() was
encoded under the assumption that function can be called only from the
wakeup path. If instead the task is already enqueued, we end up with a
utilization which does not remove the current task's contribution form
the estimated utilization of the CPU.
This will wrongly assume a reduced spare capacity on the current CPU and
increase the chances to migrate the task on execve.

The regression is tracked down to:

 commit d519329f72a6 ("sched/fair: Update util_est only on util_avg updates")

because in that patch we turn on by default the UTIL_EST sched feature.
However, the real issue is introduced by:

 commit f9be3e5961c5 (sched/fair: Use util_est in LB and WU paths)

Let's fix this by ensuring to always discount the task estimated
utilization from the CPU's estimated utilization when the task is also
the current one. The same benchmark of the bug report, executed on a
dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine,
reports these "Execl Throughput" figures (higher the better):

   mainline : 48136.5 lps
   mainline+fix : 55376.5 lps

which correspond to a 15% speedup.

Moreover, since {cpu_util,capacity_spare}_wake() are not really only
used from the wakeup path, let's remove this ambiguity by using a better
matching name: {cpu_util,capacity_spare}_without().
Since we are at that, let's also improve the existing documentation.

Signed-off-by: Patrick Bellasi 
Fixes: f9be3e5961c5 (sched/fair: Use util_est in LB and WU paths)
Reported-by: Aaron Lu 
Reported-by: Ye Xiaolong 
Tested-by: Aaron Lu 
Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/fair.c | 44 +++-
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 908c9cdae2f0..bdc0be267621 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5672,11 +5672,11 @@ static int wake_affine(struct sched_domain *sd, struct 
task_struct *p,
return target;
 }
 
-static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
+static unsigned long cpu_util_without(int cpu, struct task_struct *p);
 
-static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
+static unsigned long capacity_spare_without(int cpu, struct task_struct *p)
 {
-   return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
+   return max_t(long, capacity_of(cpu) - cpu_util_without(cpu, p), 0);
 }
 
 /*
@@ -5736,7 +5736,7 @@ find_idlest_group(struct sched_domain *sd, struct 
task_struct *p,
 
avg_load += cfs_rq_load_avg(_rq(i)->cfs);
 
-   spare_cap = capacity_spare_wake(i, p);
+   spare_cap = capacity_spare_without(i, p);
 
if (spare_cap > max_spare_cap)
max_spare_cap = spare_cap;
@@ -5887,8 +5887,8 @@ static inline int find_idlest_cpu(struct sched_domain 
*sd, struct task_struct *p
return prev_cpu;
 
/*
-* We need task's util for capacity_spare_wake, sync it up to prev_cpu's
-* last_update_time.
+* We need task's util for capacity_spare_without, sync it up to
+* prev_cpu's last_update_time.
 */
if (!(sd_flag & SD_BALANCE_FORK))
sync_entity_load_avg(>se);
@@ -6214,10 +6214,19 @@ static inline unsigned long cpu_util(int cpu)
 }
 
 /*
- * cpu_util_wake: Compute CPU utilization with any contributions from
- * the waking task p removed.
+ * cpu_util_without: compute cpu utilization without any contributions from *p
+ * @cpu: the CPU which utilization is requested
+ * @p: the task which utilization should be discounted
+ *
+ * The utilization of a CPU is defined by the utilization of tasks currently
+ * enqueued on that CPU as well as tasks which are currently sleeping after an
+ * execution on that CPU.
+ *
+ * This method returns the utilization of the specified 

[PATCH] sched/fair: util_est: fix cpu_util_wake for execl

2018-10-30 Thread Patrick Bellasi
A ~10% regression has been reported for UnixBench's execl throughput
test:

   Message-ID: <20180402032000.GD3101@yexl-desktop>
   Message-ID: <20181024064100.ga27...@intel.com>

That test is pretty simple, it does a "recursive" execve syscall on the
same binary. Starting from the syscall, this sequence is possible:

   do_execve()
 do_execveat_common()
   __do_execve_file()
 sched_exec()
   select_task_rq_fair()  <==| Task already enqueued
 find_idlest_cpu()
   find_idlest_group()
 capacity_spare_wake()<==| Functions not called from
   cpu_util_wake()   | the wakeup path

which means we can end up calling cpu_util_wake() not only from the
"wakeup path", as its name would suggest. Indeed, the task doing an
execve syscall is already enqueued on the CPU we want to get the
cpu_util_wake for.

The estimated utilization for a CPU computed in cpu_util_wake() was
encoded under the assumption that function can be called only from the
wakeup path. If instead the task is already enqueued, we end up with a
utilization which does not remove the current task's contribution form
the estimated utilization of the CPU.
This will wrongly assume a reduced spare capacity on the current CPU and
increase the chances to migrate the task on execve.

The regression is tracked down to:

 commit d519329f72a6 ("sched/fair: Update util_est only on util_avg updates")

because in that patch we turn on by default the UTIL_EST sched feature.
However, the real issue is introduced by:

 commit f9be3e5961c5 (sched/fair: Use util_est in LB and WU paths)

Let's fix this by ensuring to always discount the task estimated
utilization from the CPU's estimated utilization when the task is also
the current one. The same benchmark of the bug report, executed on a
dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine,
reports these "Execl Throughput" figures (higher the better):

   mainline : 48136.5 lps
   mainline+fix : 55376.5 lps

which correspond to a 15% speedup.

Moreover, since {cpu_util,capacity_spare}_wake() are not really only
used from the wakeup path, let's remove this ambiguity by using a better
matching name: {cpu_util,capacity_spare}_without().
Since we are at that, let's also improve the existing documentation.

Signed-off-by: Patrick Bellasi 
Fixes: f9be3e5961c5 (sched/fair: Use util_est in LB and WU paths)
Reported-by: Aaron Lu 
Reported-by: Ye Xiaolong 
Tested-by: Aaron Lu 
Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/fair.c | 44 +++-
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 908c9cdae2f0..bdc0be267621 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5672,11 +5672,11 @@ static int wake_affine(struct sched_domain *sd, struct 
task_struct *p,
return target;
 }
 
-static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
+static unsigned long cpu_util_without(int cpu, struct task_struct *p);
 
-static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
+static unsigned long capacity_spare_without(int cpu, struct task_struct *p)
 {
-   return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
+   return max_t(long, capacity_of(cpu) - cpu_util_without(cpu, p), 0);
 }
 
 /*
@@ -5736,7 +5736,7 @@ find_idlest_group(struct sched_domain *sd, struct 
task_struct *p,
 
avg_load += cfs_rq_load_avg(_rq(i)->cfs);
 
-   spare_cap = capacity_spare_wake(i, p);
+   spare_cap = capacity_spare_without(i, p);
 
if (spare_cap > max_spare_cap)
max_spare_cap = spare_cap;
@@ -5887,8 +5887,8 @@ static inline int find_idlest_cpu(struct sched_domain 
*sd, struct task_struct *p
return prev_cpu;
 
/*
-* We need task's util for capacity_spare_wake, sync it up to prev_cpu's
-* last_update_time.
+* We need task's util for capacity_spare_without, sync it up to
+* prev_cpu's last_update_time.
 */
if (!(sd_flag & SD_BALANCE_FORK))
sync_entity_load_avg(>se);
@@ -6214,10 +6214,19 @@ static inline unsigned long cpu_util(int cpu)
 }
 
 /*
- * cpu_util_wake: Compute CPU utilization with any contributions from
- * the waking task p removed.
+ * cpu_util_without: compute cpu utilization without any contributions from *p
+ * @cpu: the CPU which utilization is requested
+ * @p: the task which utilization should be discounted
+ *
+ * The utilization of a CPU is defined by the utilization of tasks currently
+ * enqueued on that CPU as well as tasks which are currently sleeping after an
+ * execution on that CPU.
+ *
+ * This method returns the utilization of the specified 

<    1   2   3   4   5   6   7   8   9   10   >