date:20160517

Re: Linux-next parallel cp workload hang

2016-05-17 Thread Dave Chinner

On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> Hi,
> 
> Parallel cp workload (xfstests generic/273) hangs like below.
> It's reproducible with a small chance, less than 1/100 I think.
> 
> Have hit this in linux-next 20160504 0506 0510 trees, testing on
> xfs with loop or block device. Ext4 survived several rounds
> of testing.
> 
> Linux next 20160510 tree hangs within 500 rounds testing several
> times. The same tree with vfs parallel lookup patchset reverted
> survived 900 rounds testing. Reverted commits are attached.

What hardware?

> Bisecting in this patchset identified this commit:
> 
> 3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit
> commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f
> Author: Al Viro 
> Date:   Wed Apr 20 23:42:46 2016 -0400
> 
> simple local filesystems: switch to ->iterate_shared()
> 
> no changes needed (XFS isn't simple, but it has the same parallelism
> in the interesting parts exercised from CXFS).
> 
> With this commit reverted on top of Linux next 0510 tree, 5000+ rounds
> of testing passed.
> 
> Although 2000 rounds testing had been conducted before good/bad
> verdict, i'm not 100 percent sure about all this, since it's
> so hard to hit, and i am not that lucky..
> 
> Bisect log and full blocked state process dump log are also attached.
> 
> Furthermore, this was first hit when testing fs dax on nvdimm,
> however it's reproducible without dax mount option, and also
> reproducible on loop device, just seems harder to hit.
> 
> Thanks,
> Xiong
> 
> [0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
> [0.794263]   Not tainted 4.6.0-rc6-next-20160504 #5
> [0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [0.841801] cp  D 880b4e977928 0 49033  49014 0x0080
> [0.868923]  880b4e977928 880ba275d380 880b8d712b80 880b4e978000
> [0.897504]  7fff 0002  880b8d712b80
> [0.925234]  880b4e977940 816cbc25 88035a1dabb0 880b4e9779e8
> [0.953237] Call Trace:
> [0.958314]  [] schedule+0x35/0x80
> [0.974854]  [] schedule_timeout+0x231/0x2d0
> [0.995728]  [] ? down_trylock+0x2d/0x40
> [1.015351]  [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
> [1.040182]  [] __down_common+0xaa/0x104
> [1.059021]  [] ? _xfs_buf_find+0x162/0x340 [xfs]
> [1.081357]  [] __down+0x1d/0x1f
> [1.097166]  [] down+0x41/0x50
> [1.112869]  [] xfs_buf_lock+0x3c/0xf0 [xfs]
> [1.134504]  [] _xfs_buf_find+0x162/0x340 [xfs]
> [1.156871]  [] xfs_buf_get_map+0x2a/0x270 [xfs]

So what's holding that directory data buffer lock? It should only be
held if there is either IO in progress, or a modification of the
buffer in progress that is blocked somewhere else.

> [1.180010]  [] xfs_buf_read_map+0x2d/0x180 [xfs]
> [1.203538]  [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
> [1.229194]  [] xfs_da_read_buf+0xd1/0x100 [xfs]
> [1.251948]  [] xfs_dir3_data_read+0x26/0x60 [xfs]
> [1.275736]  [] xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0 [xfs]
> [1.305094]  [] ? down_read+0x12/0x30
> [1.323787]  [] ? xfs_ilock+0xe4/0x110 [xfs]
> [1.345114]  [] xfs_dir2_leaf_getdents+0x13b/0x3d0 [xfs]
> [1.371818]  [] xfs_readdir+0x1a6/0x1c0 [xfs]

So we should be holding the ilock in shared mode here...

> [1.393471]  [] xfs_file_readdir+0x2b/0x30 [xfs]
> [1.416874]  [] iterate_dir+0x173/0x190
> [1.436709]  [] ? do_audit_syscall_entry+0x66/0x70
> [1.460951]  [] SyS_getdents+0x98/0x120
> [1.480566]  [] ? iterate_dir+0x190/0x190
> [1.500909]  [] do_syscall_64+0x62/0x110
> [1.520847]  [] entry_SYSCALL64_slow_path+0x25/0x25
> [1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
> [1.568933]   Not tainted 4.6.0-rc6-next-20160504 #5
> [1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [1.618544] cp  D 880b91463b00 0 49040  49016 0x0080
> [1.645502]  880b91463b00 880464d5c140 88029b475700 880b91464000
> [1.674145]  880411c42610  880411c42628 8802c10bc610
> [1.702834]  880b91463b18 816cbc25 88029b475700 880b91463b88
> [1.731501] Call Trace:
> [1.736866]  [] schedule+0x35/0x80
> [1.754119]  [] rwsem_down_read_failed+0xf2/0x140
> [1.777411]  [] ? xfs_ilock_data_map_shared+0x30/0x40 [xfs]
> [1.805090]  [] call_rwsem_down_read_failed+0x18/0x30
> [1.830482]  [] down_read+0x20/0x30
> [1.848505]  [] xfs_ilock+0xe4/0x110 [xfs]
> [1.869293]  [] xfs_ilock_data_map_shared+0x30/0x40

And this is an attempt to lock the inode shared, so if that is
failing while there's another shared holder, that means there's an
exclusive waiter queued up (i.e. read held -> write blocked -> read
blocked).
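
The read held -> write blocked -> read blocked chain can be illustrated with a
toy model of a fair reader/writer lock. This is only a sketch, assuming that
new readers queue behind any waiting writer; `toy_rwsem` and its helpers are
hypothetical names and not the kernel's rwsem implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a fair reader/writer lock (hypothetical, not kernel code). */
struct toy_rwsem {
	int  active_readers;	/* tasks currently holding the lock shared */
	bool writer_active;	/* a task holds the lock exclusive */
	int  queued_writers;	/* exclusive waiters queued behind the holders */
};

/* A new reader gets in only if no writer holds or is waiting for the lock. */
static bool toy_down_read_trylock(struct toy_rwsem *sem)
{
	if (sem->writer_active || sem->queued_writers > 0)
		return false;	/* the reader has to queue behind the writer */
	sem->active_readers++;
	return true;
}

/* A writer needs the lock to be completely idle; otherwise it queues. */
static bool toy_down_write_or_queue(struct toy_rwsem *sem)
{
	assert(sem->active_readers >= 0);
	if (sem->active_readers == 0 && !sem->writer_active) {
		sem->writer_active = true;
		return true;
	}
	sem->queued_writers++;
	return false;
}
```

With one shared holder in place, a writer queues, and from then on a second
reader fails too even though the lock is only held shared. If the first reader
is itself stuck (here, on a buffer lock), nothing behind it makes progress.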


So looking at dump-g273xfs0510:

[  845.727907] INFO: task cp:40126 blocked for more than 120 seconds.
[  845.751175]   Not tainted 4.6.0-rc7-next-20160510 #9
[  845.770011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"

Re: [PATCH 1/2] arm64: dts: NS2: Add all of the UARTs

2016-05-17 Thread Kefeng Wang



On 2016/5/18 3:56, Florian Fainelli wrote:
> On 05/15/2016 06:11 PM, Kefeng Wang wrote:
>>> I can confirm that with your change and the change to the bootargs you 
>>> describe above, it works as desired.  Was your change already accepted?
>>>
>>
>> Great, thanks a lot. It is still under review and waiting for a
>> response for now.
> 
> Then I would be inclined to take Jon's patch as-is, and follow up with
> an additional patch to the NS2 DTS once yours lands in.
> 

Sure, please.

Kefeng

[RFC][PATCH 8/7] sched/fair: Use utilization distance to filter affine sync wakeups

2016-05-17 Thread Mike Galbraith

On Mon, 2016-05-09 at 12:48 +0200, Peter Zijlstra wrote:
> Hai,

(got some of the frozen variety handy?:)

> here be a semi coherent patch series for the recent select_idle_siblings()
> tinkering. Happy benchmarking..

And tinkering on top of your rewrite series...

sched/fair: Use utilization distance to filter affine sync wakeups

Identifying truly synchronous tasks accurately is annoyingly fragile,
which led to the demise of the old avg_overlap heuristic, which meant
that we schedule high frequency localhost communicating buddies
to L3 vs L2, causing them to take painful cache misses needlessly.

To combat this, track average utilization distance, and when both
waker/wakee are short duration tasks cycling at the ~same frequency
(ie can't have any appreciable reclaimable overlap), and the sync
hint has been passed, take that as a cue that pulling the wakee
to hot L2 is very likely to be a win.  Changes in behavior, such
as taking a long nap, bursts of other activity, or sharing the rq
with tasks that are not cycling rapidly will quickly encourage the
pair to search for a new home, where they can again find each other.

This only helps really fast movers, but that's ok (if we can get
away with it at all), as these are the ones that need some help.

It's dirt simple, cheap, and seems to work pretty well.  It does
help fast movers, does not wreck lmbench AF_UNIX/TCP throughput
gains that select_idle_sibling() provided, and didn't change pgbench
numbers one bit on my desktop box, ie tight discrimination criteria
seems to work out ok in light testing, so _maybe_ not completely
useless...
 
4 x E7-8890 tbench
Throughput 598.158 MB/sec  1 clients  1 procs  max_latency=0.287 ms  1.000
Throughput 1166.26 MB/sec  2 clients  2 procs  max_latency=0.076 ms  1.000
Throughput 2214.55 MB/sec  4 clients  4 procs  max_latency=0.087 ms  1.000
Throughput 4264.44 MB/sec  8 clients  8 procs  max_latency=0.164 ms  1.000
Throughput 7780.58 MB/sec  16 clients  16 procs  max_latency=0.109 ms  1.000
Throughput 15199.3 MB/sec  32 clients  32 procs  max_latency=0.293 ms  1.000
Throughput 21714.8 MB/sec  64 clients  64 procs  max_latency=0.872 ms  1.000
Throughput 44916.1 MB/sec  128 clients  128 procs  max_latency=4.821 ms  1.000
Throughput 76294.5 MB/sec  256 clients  256 procs  max_latency=7.375 ms  1.000

+IDLE_SYNC
Throughput 737.781 MB/sec  1 clients  1 procs  max_latency=0.248 ms  1.233
Throughput 1478.49 MB/sec  2 clients  2 procs  max_latency=0.321 ms  1.267
Throughput 2506.98 MB/sec  4 clients  4 procs  max_latency=0.413 ms  1.132
Throughput 4359.15 MB/sec  8 clients  8 procs  max_latency=0.306 ms  1.022
Throughput 9025.05 MB/sec  16 clients  16 procs  max_latency=0.349 ms  1.159
Throughput 18703.1 MB/sec  32 clients  32 procs  max_latency=0.290 ms  1.230
Throughput 33600.8 MB/sec  64 clients  64 procs  max_latency=6.469 ms  1.547
Throughput 59084.3 MB/sec  128 clients  128 procs  max_latency=5.031 ms  1.315
Throughput 75705.8 MB/sec  256 clients  256 procs  max_latency=24.113 ms  0.992
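
The figure that follows each throughput entry appears to be throughput relative
to the baseline run at the same client count. A minimal sketch of that
arithmetic, assuming a plain ratio (the helper name is hypothetical and not
part of the patch):

```c
#include <assert.h>

/* Ratio of patched to baseline throughput; above 1.0 the patch helped. */
static double relative_throughput(double patched_mb_s, double baseline_mb_s)
{
	assert(baseline_mb_s > 0.0);
	return patched_mb_s / baseline_mb_s;
}
```

For the 1 client row, relative_throughput(737.781, 598.158) comes out at
roughly 1.233, and the 256 client row lands just under 1.0.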

1 x i4790 lmbench3
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
IDLE_CORE+IDLE_CPU+IDLE_SMT
homer     4.6.0-masterx 6027 14.K 9773 8905.2  15.2K  10.1K 6775.0 15.K 10.0K
homer     4.6.0-masterx 5962 14.K 9881 8900.7  15.0K  10.1K 6785.2 15.K 10.0K
homer     4.6.0-masterx 5935 14.K 9917 8946.2  15.0K  10.1K 6761.8 15.K 9826.
+IDLE_SYNC
homer     4.6.0-masterx 8865 14.K 9807 8880.6  14.7K  10.1K 6777.9 15.K 9966.
homer     4.6.0-masterx 8855 13.K 9856 8844.5  15.2K  10.1K 6752.1 15.K 10.0K
homer     4.6.0-masterx 8896 14.K 9836 8880.1  15.0K  10.2K 6771.6 15.K 9941.
                        ^++  ^+-  ^+-
select_idle_sibling() completely disabled
homer     4.6.0-masterx 8810 9807 7109 8982.8  15.4K  10.2K 6831.7 15.K 10.1K
homer     4.6.0-masterx 8877 9757 6864 8970.1  15.3K  10.2K 6826.6 15.K 10.1K
homer     4.6.0-masterx 8779 9736 10.K 8975.6  15.4K  10.1K 6830.2 15.K 10.1K
                        ^++  ^--  ^--

Signed-off-by: Mike Galbraith 
---
 include/linux/sched.h   |2 -
 kernel/sched/core.c |6 -
 kernel/sched/fair.c |   49 +---
 kernel/sched/features.h |1 
 kernel/sched/sched.h|7 ++
 5 files changed, 51 insertions(+), 14 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1302,7 +1302,7 @@ struct load_weight {
  * issues.
  */
 struct sched_avg {
-   u64 last_update_time, load_sum;
+   u64 last_update_time, load_sum, util_dist_us;
u32 util_sum, period_contrib;
unsigned

Re: UBSAN: Undefined behaviour in block/blk-mq.c:1459:27 with pata_amd

2016-05-17 Thread Meelis Roos

> Does the patch below help?
> 
> From: Bartlomiej Zolnierkiewicz 
> Subject: [PATCH] blk-mq: fix undefined behaviour in order_to_size()
> 
> When this_order variable in blk_mq_init_rq_map() becomes zero
> the code incorrectly decrements the variable and passes the result
> to order_to_size() helper causing undefined behaviour:
> 
>  UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
>  shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
>  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22
> 
> Fix the code by checking this_order variable for not having the zero
> value first.
> 
> Reported-by: Meelis Roos 
> Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
> Signed-off-by: Bartlomiej Zolnierkiewicz 

It fixes the warning independently of the pata driver: ata_piix,
pata_via, pata_serverworks, pata_macio and 3ware were fixed too.

-- 
Meelis Roos (mr...@linux.ee)
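
The wraparound behind the UBSAN report can be reproduced in userspace. The
sketch below is not the blk-mq code: `SKETCH_PAGE_SIZE` and `step_down_order()`
are hypothetical, and `order_to_size()` is only modelled as a page size shifted
by the order. It shows the zero check placed before the decrement, as the patch
description asks for.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Size in bytes of an allocation of 2^order pages. */
static size_t order_to_size(unsigned int order)
{
	/* Shifting by the type width or more is undefined behaviour in C. */
	assert(order < sizeof(size_t) * CHAR_BIT);
	return (size_t)SKETCH_PAGE_SIZE << order;
}

/*
 * Step down to the next smaller order. Returns true if the caller may retry
 * with the new *order, false if it should give up. The zero check comes
 * before the decrement, so *order never wraps around to UINT_MAX.
 */
static bool step_down_order(unsigned int *order, size_t min_size)
{
	if (*order == 0)
		return false;
	(*order)--;
	return order_to_size(*order) >= min_size;
}
```

Decrementing an unsigned zero yields UINT_MAX, which is the 4294967295 shift
exponent in the report when unsigned int is 32 bits wide.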

Re: [PATCH] arc: axs103_smp: Fix CPU frequency to 100MHz for dual-core

2016-05-17 Thread Vineet Gupta

On Monday 16 May 2016 03:27 PM, Alexey Brodkin wrote:
> The most recent release of AXS103 [v1.1] is proven to work
> at 100 MHz in dual-core mode so this change uses mentioned feature.
> For that we:
>  * Update axc003_idu.dtsi with mention of really-used CPU clock freq
>  * Remove clock override in AXS platform code for dual-core HW
> 
> Note we're still leaving a hack for clock "downgrade" on early boot
> for quad-core hardware.
> 
> Also note this change will break functionality of AXS103 v1.0 hardware.
> That means all users of AXS103 __must__ upgrade their boards with the
> most recent firmware.
> 
> Signed-off-by: Alexey Brodkin 

Applied for 4.7

Thx,
-Vineet

Re: [PATCH] ARC: Troubleshoot execution of UP Linux on SMP HW and vice versa

2016-05-17 Thread Vineet Gupta

On Wednesday 18 May 2016 02:36 AM, Alexey Brodkin wrote:
> ARC SMP hardware heavily relies on Interrupt Distribution Unit (IDU)
> for all interrupts serving. And UP ARC hardware lacks this block.
> 
> That leads to incompatibility between UP and SMP Linux builds.
> 
> Even though UP build of Linux will run on SMP hardware at some
> point strange behavior will appear. Very good example is serial port
> will stop functioning once it switches from earlycon driver (which
> doesn't use interrupts) to full-scale serial driver (that will rely
> on interrupts).
> 
> The same is applicable to reverse combination: SMP build won't
> work on UP hardware and symptoms will be pretty much the same.

This is not necessarily correct. I'm pretty sure that if you have the right DT
(despite it being embedded, you can have uboot provide a different one), an SMP
kernel will in fact boot on UP hardware, and if not, we should actively try to
achieve that. That is where the world is moving: FWIW the ARM64 kernel forces
CONFIG_SMP and doesn't even support UP anymore.

> And so to save [especially  newcomers] from spending hours in
> frustration we're doing a check very early on boot if the kernel was
> configured with CONFIG_ARC_MCIP (which is automatically selected as
> a dependency of CONFIG_SMP) and in run-time we're seeing SMP-specific
> register that holds a number of SMP cores.
> 
> Signed-off-by: Alexey Brodkin 
> ---
>  arch/arc/kernel/setup.c | 12 
...
> @@ -374,6 +375,8 @@ static inline int is_kernel(unsigned long addr)
>  
>  void __init setup_arch(char **cmdline_p)
>  {
> + unsigned int num_cores;
> +
>  #ifdef CONFIG_ARC_UBOOT_SUPPORT
>   /* make sure that uboot passed pointer to cmdline/dtb is valid */
>   if (uboot_tag && is_kernel((unsigned long)uboot_arg))
> @@ -413,6 +416,15 @@ void __init setup_arch(char **cmdline_p)
>   if (machine_desc->init_early)
>   machine_desc->init_early();
>  
> + num_cores = (read_aux_reg(ARC_REG_MCIP_BCR) >> 16) & 0x3F;
> +#ifdef CONFIG_ARC_MCIP
> + if (!num_cores)
> + panic("SMP kernel is run on a UP hardware!\n");
> +#else
> + if (num_cores)
> + panic("UP kernel is run on a SMP hardware!\n");
> +#endif

This is ugly: if the AXS platform has trouble booting with a UP/SMP hw/sw
mismatch, do that check in the platform early init code without littering
platform-agnostic code unless absolutely necessary.

> +
>   smp_init_cpus();
>  
>   setup_processor();
>
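
The core-count test in the quoted hunk boils down to a shift and a mask. A
userspace sketch of that logic follows, assuming the register value is passed
in as a plain integer instead of being read with read_aux_reg(); the enum and
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum boot_check { BOOT_OK, SMP_KERNEL_ON_UP_HW, UP_KERNEL_ON_SMP_HW };

/* Core count sits in bits [21:16] of the value, per the quoted patch. */
static unsigned int mcip_num_cores(uint32_t bcr)
{
	return (bcr >> 16) & 0x3F;
}

/* smp_kernel stands in for the compile-time CONFIG_ARC_MCIP test. */
static enum boot_check check_kernel_vs_hw(int smp_kernel, uint32_t bcr)
{
	unsigned int num_cores = mcip_num_cores(bcr);

	assert(num_cores <= 0x3F);	/* a 6-bit field cannot exceed 63 */
	if (smp_kernel && num_cores == 0)
		return SMP_KERNEL_ON_UP_HW;
	if (!smp_kernel && num_cores != 0)
		return UP_KERNEL_ON_SMP_HW;
	return BOOT_OK;
}
```

Returning a status instead of calling panic() keeps the sketch testable and
leaves the policy to the caller, which is where the objection above would put
it anyway.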

Re: [PATCH] net: au1000 eth: simplify logical expression

2016-05-17 Thread Florian Fainelli

Le 17/05/2016 16:58, Heinrich Schuchardt a écrit :
> (a && a > 0) is equivalent to (a > 0).
> 
> Signed-off-by: Heinrich Schuchardt 

Acked-by: Florian Fainelli 
-- 
Florian
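
The equivalence the patch relies on is easy to check by brute force. A small
sketch, with hypothetical function names, that compares both forms over a
range of values:

```c
#include <assert.h>
#include <stdbool.h>

static bool original_expr(int a)   { return a && a > 0; }
static bool simplified_expr(int a) { return a > 0; }

/* Compare both forms for every value in [lo, hi]. */
static bool forms_agree(int lo, int hi)
{
	assert(lo <= hi);
	for (int a = lo; ; a++) {
		if (original_expr(a) != simplified_expr(a))
			return false;
		if (a == hi)	/* stop before a++ could overflow */
			break;
	}
	return true;
}
```

The only value where the extra `a &&` could matter is zero, and `a > 0` is
already false there.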

Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-17 Thread Florian Fainelli

Le 17/05/2016 21:37, Guenter Roeck a écrit :
> Hi,
> 
> my xtensa qemu tests crash in -next as follows.
> 
> [ ... ]
> 
> [9.366256] libphy: ethoc-mdio: probed
> [9.367389]  (null): could not attach to PHY
> [9.368555]  (null): failed to probe MDIO bus
> [9.371540] Unable to handle kernel paging request at virtual address 001c
> [9.371540]  pc = d0320926, ra = 903209d1
> [9.375358] Oops: sig: 11 [#1]
> [9.376081] PREEMPT
> [9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
> [9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
> [9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001 
>  d7f45c00 d7c31bd0
> [9.382298] a08:     00060100
> d04b0c10 d7f45dfc d7c31bb0
> [9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr:
> 001c
> [9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar:
> 0011
> [9.388173]
> Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00
> 
>d0485dcc d0485dcc d7fb5810 d7c2c000  d7c31c30 d7f45c00
> d025befc
>d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00
> d7f45c34
> [9.396652] Call Trace:
> [9.397469]  [] __device_release_driver+0x7d/0x98
> [9.398869]  [] device_release_driver+0x15/0x20
> [9.400247]  [] bus_remove_device+0xc1/0xd4
> [9.401569]  [] device_del+0x109/0x15c
> [9.402794]  [] phy_mdio_device_remove+0xd/0x18
> [9.404124]  [] mdiobus_unregister+0x40/0x5c
> [9.405444]  [] ethoc_probe+0x534/0x5b8
> [9.406742]  [] platform_drv_probe+0x28/0x48
> [9.408122]  [] driver_probe_device+0x101/0x234
> [9.409499]  [] __driver_attach+0x7d/0x98
> [9.410809]  [] bus_for_each_dev+0x30/0x5c
> [9.412104]  [] driver_attach+0x14/0x18
> [9.413385]  [] bus_add_driver+0xc9/0x198
> [9.414686]  [] driver_register+0x70/0xa0
> [9.416001]  [] __platform_driver_register+0x24/0x28
> [9.417463]  [] ethoc_driver_init+0x10/0x14
> [9.418824]  [] do_one_initcall+0x80/0x1ac
> [9.420083]  [] kernel_init_freeable+0x131/0x198
> [9.421504]  [] kernel_init+0xc/0xb0
> [9.422693]  [] ret_from_kernel_thread+0x8/0xc
> 
> Bisect points to commit da47b4572056 ("phy: add support for a reset-gpio
> specification").
> Bisect log is attached. Reverting the patch fixes the problem.

Aside from what you pointed out, this patch was still in discussion when
it got merged, since we got a concurrent patch from Sergei which tries
to deal with the same kind of problem.

Do you mind sending a revert? Otherwise I can do that first thing in the
morning.

> 
> I think there may be a number of problems, all of them exposed by the patch
> but really separate.
> 
> GPIOLIB is not configured in my test case, meaning gpiod_get_optional()
> returns -ENOSYS, and phy_probe() thus returns an error. Question here is if
> it is really appropriate for the XXX_optional() gpiolib functions to return
> an error if GPIOLIB is not configured. Either case, result is that pretty
> much all phy registrations will now fail if GPIOLIB is not configured.
> 
> Also, I suspect that there may be a bug in the error handling path
> of ethoc_probe(). No idea what exactly is wrong, though. Other drivers
> use pretty much the same code sequence for mdio registration and associated
> error handling.
> 
> Last but not least, something seems to be wrong with the use of dev_err()
> with >dev if register_netdev() has not yet been called. Maybe
> someone
> has some insight ?

It all depends on whether SET_NETDEV_DEV() has had a chance to run, but in
general it is a bad idea to use netdev_* before the interface has been
registered, since it won't have a valid name yet.
-- 
Florian

[PATCH 1/1] Staging: comedi: fix CHECK: Prefer using the BIT macro issues in pcmmio.c

2016-05-17 Thread Ravishankar Karkala Mallikarjunayya

This patch replaces all occurrences of (1<
---
 drivers/staging/comedi/drivers/pcmmio.c | 40 -
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/comedi/drivers/pcmmio.c 
b/drivers/staging/comedi/drivers/pcmmio.c
index 10472e6..70ad497 100644
--- a/drivers/staging/comedi/drivers/pcmmio.c
+++ b/drivers/staging/comedi/drivers/pcmmio.c
@@ -84,25 +84,25 @@
 #define PCMMIO_AI_LSB_REG  0x00
 #define PCMMIO_AI_MSB_REG  0x01
 #define PCMMIO_AI_CMD_REG  0x02
-#define PCMMIO_AI_CMD_SE   (1 << 7)
-#define PCMMIO_AI_CMD_ODD_CHAN (1 << 6)
+#define PCMMIO_AI_CMD_SE   BIT(7)
+#define PCMMIO_AI_CMD_ODD_CHAN BIT(6)
 #define PCMMIO_AI_CMD_CHAN_SEL(x)  (((x) & 0x3) << 4)
 #define PCMMIO_AI_CMD_RANGE(x) (((x) & 0x3) << 2)
 #define PCMMIO_RESOURCE_REG 0x02
 #define PCMMIO_RESOURCE_IRQ(x) (((x) & 0xf) << 0)
 #define PCMMIO_AI_STATUS_REG   0x03
-#define PCMMIO_AI_STATUS_DATA_READY (1 << 7)
-#define PCMMIO_AI_STATUS_DATA_DMA_PEND (1 << 6)
-#define PCMMIO_AI_STATUS_CMD_DMA_PEND  (1 << 5)
-#define PCMMIO_AI_STATUS_IRQ_PEND  (1 << 4)
-#define PCMMIO_AI_STATUS_DATA_DRQ_ENA  (1 << 2)
-#define PCMMIO_AI_STATUS_REG_SEL   (1 << 3)
-#define PCMMIO_AI_STATUS_CMD_DRQ_ENA   (1 << 1)
-#define PCMMIO_AI_STATUS_IRQ_ENA   (1 << 0)
+#define PCMMIO_AI_STATUS_DATA_READY BIT(7)
+#define PCMMIO_AI_STATUS_DATA_DMA_PEND BIT(6)
+#define PCMMIO_AI_STATUS_CMD_DMA_PEND  BIT(5)
+#define PCMMIO_AI_STATUS_IRQ_PEND  BIT(4)
+#define PCMMIO_AI_STATUS_DATA_DRQ_ENA  BIT(2)
+#define PCMMIO_AI_STATUS_REG_SEL   BIT(3)
+#define PCMMIO_AI_STATUS_CMD_DRQ_ENA   BIT(1)
+#define PCMMIO_AI_STATUS_IRQ_ENA   BIT(0)
 #define PCMMIO_AI_RES_ENA_REG  0x03
 #define PCMMIO_AI_RES_ENA_CMD_REG_ACCESS   (0 << 3)
-#define PCMMIO_AI_RES_ENA_AI_RES_ACCESS (1 << 3)
-#define PCMMIO_AI_RES_ENA_DIO_RES_ACCESS   (1 << 4)
+#define PCMMIO_AI_RES_ENA_AI_RES_ACCESS BIT(3)
+#define PCMMIO_AI_RES_ENA_DIO_RES_ACCESS   BIT(4)
 #define PCMMIO_AI_2ND_ADC_OFFSET   0x04
 
 #define PCMMIO_AO_LSB_REG  0x08
@@ -125,14 +125,14 @@
 #define PCMMIO_AO_CMD_CHAN_SEL(x)  (((x) & 0x03) << 1)
 #define PCMMIO_AO_CMD_CHAN_SEL_ALL (0x0f << 0)
 #define PCMMIO_AO_STATUS_REG   0x0b
-#define PCMMIO_AO_STATUS_DATA_READY (1 << 7)
-#define PCMMIO_AO_STATUS_DATA_DMA_PEND (1 << 6)
-#define PCMMIO_AO_STATUS_CMD_DMA_PEND  (1 << 5)
-#define PCMMIO_AO_STATUS_IRQ_PEND  (1 << 4)
-#define PCMMIO_AO_STATUS_DATA_DRQ_ENA  (1 << 2)
-#define PCMMIO_AO_STATUS_REG_SEL   (1 << 3)
-#define PCMMIO_AO_STATUS_CMD_DRQ_ENA   (1 << 1)
-#define PCMMIO_AO_STATUS_IRQ_ENA   (1 << 0)
+#define PCMMIO_AO_STATUS_DATA_READY BIT(7)
+#define PCMMIO_AO_STATUS_DATA_DMA_PEND BIT(6)
+#define PCMMIO_AO_STATUS_CMD_DMA_PEND  BIT(5)
+#define PCMMIO_AO_STATUS_IRQ_PEND  BIT(4)
+#define PCMMIO_AO_STATUS_DATA_DRQ_ENA  BIT(2)
+#define PCMMIO_AO_STATUS_REG_SEL   BIT(3)
+#define PCMMIO_AO_STATUS_CMD_DRQ_ENA   BIT(1)
+#define PCMMIO_AO_STATUS_IRQ_ENA   BIT(0)
 #define PCMMIO_AO_RESOURCE_ENA_REG 0x0b
 #define PCMMIO_AO_2ND_DAC_OFFSET   0x04
 
-- 
1.9.1
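
For readers unfamiliar with the macro, here is a minimal userspace sketch of
what the conversion amounts to. It assumes BIT(nr) expands to (1UL << (nr)),
which is how the kernel header is understood to define it; the macro is
redefined locally so the sketch builds outside the kernel. The OLD_/NEW_
names and pcmmio_model_data_ready() are made up for illustration and are not
part of the driver.

```c
#include <stdint.h>

/* Assumption: mirrors the kernel's definition; redefined here so the
 * sketch compiles in userspace. */
#define BIT(nr) (1UL << (nr))

/* Old and new spellings of two of the status bits touched by the patch. */
#define OLD_AI_STATUS_DATA_READY (1 << 7)
#define NEW_AI_STATUS_DATA_READY BIT(7)
#define OLD_AI_STATUS_IRQ_ENA    (1 << 0)
#define NEW_AI_STATUS_IRQ_ENA    BIT(0)

/* Hypothetical helper: returns 1 when an 8-bit status register value
 * has the data-ready bit set, 0 otherwise. */
int pcmmio_model_data_ready(uint8_t status)
{
	return (status & NEW_AI_STATUS_DATA_READY) != 0;
}
```

The values are unchanged by the conversion. The one visible difference is
the type: (1 << n) is an int, while BIT(n) as assumed above is an unsigned
long, which only matters for high bit positions or where the type itself is
inspected.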

Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-17 Thread Guenter Roeck


Hi,

my xtensa qemu tests crash in -next as follows.

[ ... ]

[9.366256] libphy: ethoc-mdio: probed
[9.367389]  (null): could not attach to PHY
[9.368555]  (null): failed to probe MDIO bus
[9.371540] Unable to handle kernel paging request at virtual address 001c
[9.371540]  pc = d0320926, ra = 903209d1
[9.375358] Oops: sig: 11 [#1]
[9.376081] PREEMPT
[9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
[9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
[9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001   
d7f45c00 d7c31bd0
[9.382298] a08:     00060100 d04b0c10 
d7f45dfc d7c31bb0
[9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 001c
[9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 0011
[9.388173]
Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00 
   d0485dcc d0485dcc d7fb5810 d7c2c000  d7c31c30 d7f45c00 d025befc
   d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 d7f45c34
[9.396652] Call Trace:
[9.397469]  [] __device_release_driver+0x7d/0x98
[9.398869]  [] device_release_driver+0x15/0x20
[9.400247]  [] bus_remove_device+0xc1/0xd4
[9.401569]  [] device_del+0x109/0x15c
[9.402794]  [] phy_mdio_device_remove+0xd/0x18
[9.404124]  [] mdiobus_unregister+0x40/0x5c
[9.405444]  [] ethoc_probe+0x534/0x5b8
[9.406742]  [] platform_drv_probe+0x28/0x48
[9.408122]  [] driver_probe_device+0x101/0x234
[9.409499]  [] __driver_attach+0x7d/0x98
[9.410809]  [] bus_for_each_dev+0x30/0x5c
[9.412104]  [] driver_attach+0x14/0x18
[9.413385]  [] bus_add_driver+0xc9/0x198
[9.414686]  [] driver_register+0x70/0xa0
[9.416001]  [] __platform_driver_register+0x24/0x28
[9.417463]  [] ethoc_driver_init+0x10/0x14
[9.418824]  [] do_one_initcall+0x80/0x1ac
[9.420083]  [] kernel_init_freeable+0x131/0x198
[9.421504]  [] kernel_init+0xc/0xb0
[9.422693]  [] ret_from_kernel_thread+0x8/0xc

Bisect points to commit da47b4572056 ("phy: add support for a reset-gpio 
specification").
Bisect log is attached. Reverting the patch fixes the problem.

I think there may be a number of problems, all of them exposed by the patch
but really separate.

GPIOLIB is not configured in my test case, meaning gpiod_get_optional()
returns -ENOSYS, and phy_probe() thus returns an error. The question here is
whether it is really appropriate for the XXX_optional() gpiolib functions to
return an error if GPIOLIB is not configured. Either way, the result is that pretty
much all phy registrations will now fail if GPIOLIB is not configured.
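
To make the failure mode concrete, here is a rough userspace model (not
kernel code). It assumes, as described above, that the optional lookup
yields an error pointer carrying -ENOSYS when GPIO support is compiled out.
All names (err_ptr, model_*) are hypothetical stand-ins, and the "tolerant"
variant is just one possible policy, not what the patch or gpiolib does.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct gpio_desc { int line; };	/* stand-in for the real descriptor */

/* Userspace imitation of the kernel's error-pointer convention. */
void *err_ptr(long err) { return (void *)(intptr_t)err; }
long ptr_err(const void *p) { return (long)(intptr_t)p; }
int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }

/* Models the lookup when GPIO support is not configured. */
struct gpio_desc *model_gpiod_get_optional_nogpiolib(void)
{
	return err_ptr(-ENOSYS);
}

/* Strict probe: any error from the lookup fails the probe. */
int model_probe_strict(struct gpio_desc *gpio)
{
	if (is_err(gpio))
		return (int)ptr_err(gpio);
	return 0;	/* NULL means "no reset line", which is fine */
}

/* Tolerant probe: -ENOSYS is treated like "no reset line present". */
int model_probe_tolerant(struct gpio_desc *gpio)
{
	if (is_err(gpio) && ptr_err(gpio) != -ENOSYS)
		return (int)ptr_err(gpio);
	return 0;
}
```

With the strict variant every probe fails once the lookup returns -ENOSYS,
which matches the behaviour described above; the tolerant variant only
illustrates what "optional" could mean for the caller.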

Also, I suspect that there may be a bug in the error handling path
of ethoc_probe(). No idea what exactly is wrong, though. Other drivers
use pretty much the same code sequence for mdio registration and associated
error handling.

Last but not least, something seems to be wrong with the use of dev_err()
with >dev if register_netdev() has not yet been called. Maybe someone
has some insight ?

Test scripts and root file system used for the test are available at
https://github.com/groeck/linux-build-test/tree/master/rootfs/xtensa.

Guenter

---
# bad: [31b8ce4d1f8150fdc29d2f8a649dc4835e7f2961] arm: Use _rcuidle suffix to allow clk_core_enable() to used from idle
# good: [2dcd0af568b0cf583645c8a317dd12e344b1c72a] Linux 4.6
git bisect start 'HEAD' 'v4.6'
# bad: [dfd08ad591ff4f6d19896f21fb6c10dc4998dae4] Merge remote-tracking branch 'net-next/master'
git bisect bad dfd08ad591ff4f6d19896f21fb6c10dc4998dae4
# good: [eeb1cd39e9e27d89375b33c3a907807fb5adba7e] Merge remote-tracking branch 'xfs/for-next'
git bisect good eeb1cd39e9e27d89375b33c3a907807fb5adba7e
# good: [b75803d52a2ce1f6cbaf7ae0ae40a369210070cf] tcp: refactor struct tcp_skb_cb
git bisect good b75803d52a2ce1f6cbaf7ae0ae40a369210070cf
# good: [c2f40435ab0963284d348993b10ac66de6329b74] Merge remote-tracking branch 'v4l-dvb/master'
git bisect good c2f40435ab0963284d348993b10ac66de6329b74
# good: [678c657e09034d6f87d254b3183873d6e4a493e4] Merge remote-tracking branch 'slave-dma/next'
git bisect good 678c657e09034d6f87d254b3183873d6e4a493e4
# good: [6a47a570321fdcd2b6fc9e6537b2a3650d0fd04b] Merge branch 'mlx5-next'
git bisect good 6a47a570321fdcd2b6fc9e6537b2a3650d0fd04b
# good: [06566e5dd4e53f57fc3daa12fb8b5252772d70de] i40e: Refactor ethtool get_settings
git bisect good 06566e5dd4e53f57fc3daa12fb8b5252772d70de
# bad: [10cbc6843446165ee250e1ee80dc19ee325f1e6d] net/sched: cls_flower: Hardware offloaded filters statistics support
git bisect bad 10cbc6843446165ee250e1ee80dc19ee325f1e6d
# bad: [da47b4572056487fd7941c26f73b3e8815ff712a] phy: add support for a reset-gpio specification
git bisect bad da47b4572056487fd7941c26f73b3e8815ff712a
# good: [5049e33b559a44e9f216d86c58c7c7fce6f5df2f] bnxt_en: Add BCM57314 device ID.
git bisect good 5049e33b559a44e9f216d86c58c7c7fce6

Re: [PATCH v2 1/9] powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header

2016-05-17 Thread Gautham R Shenoy

On Tue, May 03, 2016 at 01:54:30PM +0530, Shreyas B. Prabhu wrote:
> CHECK_HMI_INTERRUPT is used to check for HMI's in reset vector. Move
> the macro to a common location (exception-64s.h)
> This patch does not change any functionality.
> 

I suppose this code movement is to facilitate the invocation of
CHECK_HMI_INTERRUPT in some later patch? If so, you could mention
that in the commit message.

Otherwise,
Reviewed-by: Gautham R. Shenoy 
> ---
>  arch/powerpc/include/asm/exception-64s.h | 18 ++
>  arch/powerpc/kernel/idle_power7.S| 20 +---
>  2 files changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/exception-64s.h 
> b/arch/powerpc/include/asm/exception-64s.h
> index 93ae809..6a625af 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -545,4 +545,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
>  #define FINISH_NAP
>  #endif
> 
> +#define CHECK_HMI_INTERRUPT  \
> + mfspr   r0,SPRN_SRR1;   \
> +BEGIN_FTR_SECTION_NESTED(66);
> \
> + rlwinm  r0,r0,45-31,0xf;  /* extract wake reason field (P8) */  \
> +FTR_SECTION_ELSE_NESTED(66); \
> + rlwinm  r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */  \
> +ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
> + cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
> + bne 20f;\
> + /* Invoke opal call to handle hmi */\
> + ld  r2,PACATOC(r13);\
> + ld  r1,PACAR1(r13); \
> + std r3,ORIG_GPR3(r1);   /* Save original r3 */  \
> + li  r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/   \
> + bl  opal_call_realmode; \
> + ld  r3,ORIG_GPR3(r1);   /* Restore original r3 */   \
> +20:  nop;
> +
>  #endif   /* _ASM_POWERPC_EXCEPTION_H */
> diff --git a/arch/powerpc/kernel/idle_power7.S 
> b/arch/powerpc/kernel/idle_power7.S
> index 470ceeb..6b3404b 100644
> --- a/arch/powerpc/kernel/idle_power7.S
> +++ b/arch/powerpc/kernel/idle_power7.S
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #undef DEBUG
> @@ -257,25 +258,6 @@ _GLOBAL(power7_winkle)
>   b   power7_powersave_common
>   /* No return */
> 
> -#define CHECK_HMI_INTERRUPT  \
> - mfspr   r0,SPRN_SRR1;   \
> -BEGIN_FTR_SECTION_NESTED(66);
> \
> - rlwinm  r0,r0,45-31,0xf;  /* extract wake reason field (P8) */  \
> -FTR_SECTION_ELSE_NESTED(66); \
> - rlwinm  r0,r0,45-31,0xe;  /* P7 wake reason field is 3 bits */  \
> -ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
> - cmpwi   r0,0xa; /* Hypervisor maintenance ? */  \
> - bne 20f;\
> - /* Invoke opal call to handle hmi */\
> - ld  r2,PACATOC(r13);\
> - ld  r1,PACAR1(r13); \
> - std r3,ORIG_GPR3(r1);   /* Save original r3 */  \
> - li  r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/   \
> - bl  opal_call_realmode; \
> - ld  r3,ORIG_GPR3(r1);   /* Restore original r3 */   \
> -20:  nop;
> -
> -
>  _GLOBAL(power7_wakeup_tb_loss)
>   ld  r2,PACATOC(r13);
>   ld  r1,PACAR1(r13)
> -- 
> 2.4.11
>

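As a reading aid for the quoted macro: the wake-reason test can be modelled
in plain C roughly as below. This is a sketch under the assumption that
"rlwinm r0,r0,45-31,mask" rotates the low 32 bits of SRR1 left by 14 and
ANDs the result with the mask; the function names and the is_arch_207s flag
(standing in for the CPU_FTR_ARCH_207S feature section) are made up for
illustration.

```c
#include <stdint.h>

#define WAKE_REASON_HMI 0xa	/* value the macro compares against */

/* Models "rlwinm r0,r0,45-31,mask": rotate the low word of SRR1 left
 * by 14 bits, then keep only the bits selected by the mask. */
uint32_t model_wake_reason(uint64_t srr1, int is_arch_207s)
{
	uint32_t low = (uint32_t)srr1;
	uint32_t rotated = (low << 14) | (low >> 18);
	/* 0xf when the 207S feature is set (P8), 0xe otherwise (P7). */
	uint32_t mask = is_arch_207s ? 0xf : 0xe;

	return rotated & mask;
}

/* Returns nonzero when the extracted field equals the HMI value, i.e.
 * when the macro would fall through to the OPAL call. */
int model_is_hmi_wakeup(uint64_t srr1, int is_arch_207s)
{
	return model_wake_reason(srr1, is_arch_207s) == WAKE_REASON_HMI;
}
```

In this model the field sits at bits 18..21 of the low word (counting from
the least significant bit), so a value of 0xa placed there is reported as an
HMI wakeup for either mask.
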
linux-next: Tree for May 18

2016-05-17 Thread Stephen Rothwell

Hi all,

Please do not add any v4.8 destined material to your linux-next included
branches until after v4.7-rc1 has been released.

Changes since 20160517:

New tree: dax-misc

The dax-misc tree gained a conflict against the nvdimm tree.

The akpm-current tree gained a conflict against the dax-misc tree.

Non-merge commits (relative to Linus' tree): 8785
 7390 files changed, 389214 insertions(+), 158994 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
(this fails its final link) and pseries_le_defconfig and i386, sparc
and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 236 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (7f427d3a6029 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Merging fixes/master (b507146bb6b9 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on module install)
Merging arc-current/for-curr (44549e8f5eea Linux 4.6-rc7)
Merging arm-current/fixes (ec953b70f368 ARM: 8573/1: domain: move {set,get}_domain under config guard)
Merging m68k-current/for-linus (9a6462763b17 m68k/mvme16x: Include generic )
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging powerpc-fixes/fixes (b4c112114aab powerpc: Fix bad inline asm constraint in create_zero_mask())
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (33656a1f2ee5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging net/master (2dcd0af568b0 Linux 4.6)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (f28f20da704d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging wireless-drivers/master (cbbba30f1ac9 Merge tag 'iwlwifi-for-kalle-2016-05-04' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (e6436be21e77 mac80211: fix statistics leak if dev_alloc_name() fails)
Merging sound-current/for-linus (c7c5856b6f6f sound: oss: Use setup_timer and mod_timer.)
Merging pci-current/for-linus (9a2a5a638f8e PCI: Do not treat EPROBE_DEFER as device attach failure)
Merging driver-core.current/driver-core-linus (c3b46c73264b Linux 4.6-rc4)
Merging tty.current/tty-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb.current/usb-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb-gadget-fixes/fixes (38740a5b87d5 usb: gadget: f_fs: Fix use-after-free)
Merging usb-serial-fixes/usb-linus (74d2a91aec97 USB: serial: option: add even more ZTE device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (44549e8f5eea Linux 4.6-rc7)
Merging char-misc.current/char-misc-linus (44549e8f5eea Linux 4.6-rc7)
Merging input-current/for-linus (23ea5967d6bd Merge branch 'next' into for-linus)
Merging crypto-current/master (4a6b27b79da5 crypto: sha1-mb - make sha1_x8_avx2() conform to C function ABI)
Merging ide/master (1993b176a822 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test for PPC_PSERIES)
Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms vs module insertion race.)
Merging vfio-fixes/for-linus (8160c4e45582 v

Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors

2016-05-17 Thread Doug Anderson

Hi,

On Tue, May 17, 2016 at 7:08 PM, Jaehoon Chung  wrote:
> On 05/18/2016 09:47 AM, Doug Anderson wrote:
>> Jaehoon,
>>
>> On Mon, Mar 30, 2015 at 8:47 AM, Doug Anderson  wrote:
>>> Jaehoon,
>>>
>>> On Sun, Mar 29, 2015 at 5:55 PM, Jaehoon Chung  
>>> wrote:
>>>> Dear Doug,
>>>>
>>>> I'm considering to control HLE error..So holding this patch.
>>>> If this is absolutely necessary patch, let me know, plz.
>>>>
>>>> Best Regards,
>>>> Jaehoon Chung
>>>
>>> Sounds OK.  I have certainly applied this locally and the driver isn't
>>> robust against insertions / removals without it, but once the card is
>>> inserted things are OK so it's probably not urgent that it be applied
>>> upstream.  Hopefully we can figure out a better solution...
>>
>> I'm now testing a nice new rebased kernel and I'm hitting this again.
>>
>> Of course I'll just pick my same patch to my new kernel tree, but
>> since it's been a year and nobody has done anything better, would you
>> consider landing my patch?  It is certainly better than nothing.
>
> Sure, it's right.
> I think that main reason of HLE is wait_prvdata_complete. (I'm guessing..)
> On other hands, dwmmc controller is handling something wrong. (I found that 
> HLE is occurred the similar case.)
> After find the main solution, it's not bad that your patch is applied on 
> dwmmc controller.
>
> Ulf have sent PR for next..So if we needs to apply this, i will apply on fix.

It's not new, so I'd say just queue it up for the next version
whenever it's convenient.

-Doug

Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors

2016-05-17 Thread Doug Anderson

Hi,

On Tue, May 17, 2016 at 6:59 PM, Shawn Lin
 wrote:
> Could you try this patch to see if you can still find HLE?
>
> @@ -2356,12 +2356,22 @@ static void dw_mci_cmd_interrupt(struct dw_mci
> *host, u32 status)
>  static void dw_mci_handle_cd(struct dw_mci *host)
>  {
> int i;
> +   int present;
>
> for (i = 0; i < host->num_slots; i++) {
> struct dw_mci_slot *slot = host->slot[i];
>
> if (!slot)
> continue;
>
> +   present = !(mci_readl(slot->host, CDETECT) & (1 << slot->id));
> +   if (present)
> +   set_bit(DW_MMC_CARD_PRESENT, &slot->flags);
> +   else
> +   clear_bit(DW_MMC_CARD_PRESENT, &slot->flags);

No, because we don't use the builtin card detect on veyron.  ;)

We use GPIO card detect because we didn't like the way JTAG and SD
interacted.  Also on rk3288 the builtin card detect line had the wrong
voltage domain (you couldn't detect a card when the IO lines were
powered off).  The builtin card detect line is always driven low on
veyron.

I'm nearly certain that the root cause of my HLE errors is actually
related to the same problem addressed by the commit 7c5209c315ea
("mmc: core: Increase delay for voltage to stabilize from 3.3V to
1.8V").  I think that on minnie we're still on the hairy edge and
sometimes the line doesn't transition fast enough.

It appears that increasing this to 30ms avoids the HLE errors.

I _think_ I can actually fully fix this properly by temporarily
engaging the internal pull-ups while the voltage switch is happening.
This will bleed away the voltage just a little bit faster (since lines
are driven low here).  I'll try to confirm that.

In any case, it seems like we should take this patch since (without
this patch) the failure case when you get HLE errors is that the
interrupt controller fires over and over again (with no printouts) and
your system stalls with no error messages.
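
The stall described in the paragraph above can be modeled in a few lines of user-space C. This is only a sketch, not driver code: the INT_* bit positions, the handled_mask parameter and the two helper functions are hypothetical names chosen for illustration. It shows why a status bit that no handler claims keeps the interrupt asserted, while adding that bit to a handled error mask lets it be acknowledged.

```c
#include <stdint.h>

/* Hypothetical status bits for a toy controller (not the real register layout). */
#define INT_CMD_DONE  (1u << 0)
#define INT_DATA_OVER (1u << 1)
#define INT_HLE       (1u << 2)   /* stands in for a hardware locked error */

/*
 * One pass of a toy interrupt handler: acknowledge (clear) only the
 * pending bits that the handler knows about. Returns the bits still
 * pending afterwards; a non-zero result means the IRQ fires again.
 */
static uint32_t service_irq(uint32_t pending, uint32_t handled_mask)
{
	return pending & ~handled_mask;
}

/*
 * Count how many handler passes it takes for the line to go quiet,
 * giving up after max_passes to model a stalled system.
 */
static int passes_until_quiet(uint32_t pending, uint32_t handled_mask,
			      int max_passes)
{
	int passes = 0;

	while (pending && passes < max_passes) {
		pending = service_irq(pending, handled_mask);
		passes++;
	}
	return pending ? -1 : passes;   /* -1: still asserted, i.e. a storm */
}
```

With INT_HLE left out of handled_mask the model never goes quiet; with it included, one pass is enough.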

-Doug

Re: QRTR merge conflict resolution

2016-05-17 Thread Bjorn Andersson

On Tue 17 May 17:43 PDT 2016, Stephen Rothwell wrote:

> Hi David,
> 
> On Tue, 17 May 2016 14:11:54 -0400 (EDT) David Miller  
> wrote:
> >
> > From: Bjorn Andersson 
> > Date: Fri, 13 May 2016 15:19:09 -0700
> > 
> > > I have prepared the merge of net-next and the conflicting tag from the
> > > Qualcomm SOC, please include this in your pull towards Linus to avoid
> > > the merge conflict.  
> > 
> > Pulled, thanks.
> 
> Except in the merge resolution, the 2 new functions added to
> include/linux/soc/qcom/smd.h (qcom_smd_get_drvdata and
> qcom_smd_set_drvdata) were not marked "static inline" :-(
> 

How silly of me to miss that, sorry about that.

I didn't spot this in my compile testing either, because this is the
only driver in the tree including that file that doesn't depend on
QCOM_SMD.

As there is no immediate problem with moving forward I suggest that I'll
fix this, through arm-soc, once the code has landed.
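
For readers wondering why the missing "static inline" matters: a plain function defined in a header gets an external definition in every .c file that includes it, so as soon as two such objects are linked together the link fails with a duplicate symbol. A minimal sketch of the usual header pattern follows, assuming a hypothetical struct demo_device and demo_* accessor names (these are not the real qcom_smd definitions).

```c
#include <stddef.h>

/* Hypothetical stand-in for a device structure; not the real qcom_smd type. */
struct demo_device {
	void *drvdata;
};

/*
 * Header-style accessors. "static inline" gives each translation unit
 * its own private copy, so including the header from several .c files
 * does not produce duplicate external definitions at link time.
 */
static inline void *demo_get_drvdata(const struct demo_device *dev)
{
	return dev->drvdata;
}

static inline void demo_set_drvdata(struct demo_device *dev, void *data)
{
	dev->drvdata = data;
}
```

With only one file including the header there is a single definition, which is consistent with the problem not showing up in compile testing.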

Regards,
Bjorn

Re: [PATCH] sched/cputime: add steal time support to full dynticks CPU time accounting

2016-05-17 Thread Rik van Riel

On Tue, 2016-05-10 at 13:34 +0800, Wanpeng Li wrote:
> From: Wanpeng Li 
> 
> This patch adds steal guest time support to full dynticks CPU time 
> accounting. After commit ff9a9b4c(sched, time: Switch
> VIRT_CPU_ACCOUNTING_GEN 
> to jiffy granularity), time is jiffy based sampling even if it's 
> still listened to ring boundaries, so steal_account_process_tick() 
> is reused to account how much 'ticks' are steal time after the 
> last accumulation. 
> 
> Suggested-by: Rik van Riel 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra (Intel) 
> Cc: Rik van Riel 
> Cc: Thomas Gleixner 
> Cc: Frederic Weisbecker 
> Cc: Paolo Bonzini 
> Cc: Radim 
> Signed-off-by: Wanpeng Li 
> 

Acked-by: Rik van Riel 

-- 
All Rights Reversed.




Re: [PATCH v12 05/10] arm64: Kprobes with single stepping support

2016-05-17 Thread Masami Hiramatsu

On Thu, 12 May 2016 16:01:54 +0100
James Morse  wrote:

> Hi David, Sandeepa,
> 
> On 27/04/16 19:53, David Long wrote:
> > From: Sandeepa Prabhu 
> 
> > diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
> > new file mode 100644
> > index 000..dfa1b1f
> > --- /dev/null
> > +++ b/arch/arm64/kernel/kprobes.c
> > @@ -0,0 +1,520 @@
> > +/*
> > + * arch/arm64/kernel/kprobes.c
> > + *
> > + * Kprobes support for ARM64
> > + *
> > + * Copyright (C) 2013 Linaro Limited.
> > + * Author: Sandeepa Prabhu 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + *
> > + */
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "kprobes-arm64.h"
> > +
> > +#define MIN_STACK_SIZE(addr)   min((unsigned long)MAX_STACK_SIZE,  \
> > +   (unsigned long)current_thread_info() + THREAD_START_SP - (addr))
> 
> What if we probe something called on the irq stack?
> This needs the on_irq_stack() checks too, the start/end can be found from the
> per-cpu irq_stack value.
> 
> [ ... ]
> 
> > +int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
> > +{
> > +   struct jprobe *jp = container_of(p, struct jprobe, kp);
> > +   struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
> > +   long stack_ptr = kernel_stack_pointer(regs);
> > +
> > +   kcb->jprobe_saved_regs = *regs;
> > +   memcpy(kcb->jprobes_stack, (void *)stack_ptr,
> > +  MIN_STACK_SIZE(stack_ptr));
> 
> I wonder if we need this stack save/restore?
> 
> The comment next to the equivalent code for x86 says:
> > gcc assumes that the callee owns the argument space and could overwrite it,
> > e.g. tailcall optimization. So, to be absolutely safe we also save and
> > restore enough stack bytes to cover the argument area.
> 
> On arm64 the first eight arguments are passed in registers, so we might not
> need this stack copy. (sparc and powerpc work like this too, their versions of
> this function don't copy chunks of the stack).

Hmm, maybe sparc and powerpc implementation should also be fixed...

> ... then I went looking for functions with >8 arguments...
> 
> Looking at the arm64 defconfig dwarf debug data, there are 71 of these that
> don't get inlined, picking at random:
> > rockchip_clk_register_pll() has 13
> > fib_dump_info() has 11
> > vma_merge() has 10
> > vring_create_virtqueue() has 10
> etc...
> 
> So we do need this stack copying, so that we can probe these function without
> risking the arguments being modified.
> 
> It may be worth including a comment to the effect that this stack save/restore
> is needed for functions that pass >8 arguments where the pre-handler may
> change these values on the stack.

Indeed, commenting on this code can help us to understand the reason why.

Thank you!
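
As a rough user-space illustration of the bounded save/restore discussed above (a sketch only: DEMO_MAX_STACK_SIZE, the demo_* helpers and the byte-array "stack" are hypothetical and do not come from the patch), the idea is to copy at most a fixed number of bytes starting at the stack pointer, never reading past the top of the stack, and to copy them back afterwards:

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_STACK_SIZE 64   /* hypothetical cap on bytes saved */

/*
 * Bytes to save starting at addr: everything up to the top of the
 * stack, but never more than DEMO_MAX_STACK_SIZE.
 */
static size_t demo_min_stack_size(const unsigned char *stack_top,
				  const unsigned char *addr)
{
	size_t avail = (size_t)(stack_top - addr);

	return avail < DEMO_MAX_STACK_SIZE ? avail : DEMO_MAX_STACK_SIZE;
}

/* Save the bounded region before a handler runs. Returns bytes saved. */
static size_t demo_save_stack(unsigned char *saved,
			      const unsigned char *stack_top,
			      const unsigned char *sp)
{
	size_t n = demo_min_stack_size(stack_top, sp);

	memcpy(saved, sp, n);
	return n;
}

/* Put the saved bytes back, undoing any changes the handler made. */
static void demo_restore_stack(unsigned char *sp,
			       const unsigned char *saved, size_t n)
{
	memcpy(sp, saved, n);
}
```

The clamp is what the stack-top calculation in the quoted macro is for: if the wrong stack top is used (the irq stack concern raised above), the computed length no longer matches the stack actually in use.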

> 
> 
> > +   preempt_enable_no_resched();
> > +   return 1;
> > +}
> > +
> 
> 
> Thanks,
> 
> James


-- 
Masami Hiramatsu

linux-next: manual merge of the akpm-current tree with the dax-misc tree

2016-05-17 Thread Stephen Rothwell

Hi Andrew,

Today's linux-next merge of the akpm-current tree got a conflict in:

  include/linux/dax.h

between commit:

  ecdb4bf9e327 ("dax: export a low-level __dax_zero_page_range helper")

from the dax-misc tree and commit:

  29d44f6759f6 ("dax: add dax_get_unmapped_area for pmd mappings")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/dax.h
index 7743e51f826c,184b1714900c..
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@@ -14,19 -17,15 +14,22 @@@ int __dax_fault(struct vm_area_struct *
  
  #ifdef CONFIG_FS_DAX
  struct page *read_dax_sector(struct block_device *bdev, sector_t n);
 +int __dax_zero_page_range(struct block_device *bdev, sector_t sector,
 +  unsigned int offset, unsigned int length);
+ unsigned long dax_get_unmapped_area(struct file *filp, unsigned long addr,
+   unsigned long len, unsigned long pgoff, unsigned long flags);
  #else
  static inline struct page *read_dax_sector(struct block_device *bdev,
sector_t n)
  {
return ERR_PTR(-ENXIO);
  }
 +static inline int __dax_zero_page_range(struct block_device *bdev,
 +  sector_t sector, unsigned int offset, unsigned int length)
 +{
 +  return -ENXIO;
 +}
+ #define dax_get_unmapped_area NULL
  #endif
  
  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
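
The resolution above keeps two different kinds of fallback for the disabled configuration. A small sketch of that pattern, using hypothetical DEMO_FEATURE and demo_* names rather than the real dax declarations: a helper that callers invoke directly gets a static inline stub returning an error, while a function that is presumably only stored as a pointer in an operations table can be defined away to NULL.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical feature switch; define DEMO_FEATURE to get the real version. */
#ifdef DEMO_FEATURE
int demo_zero_range(unsigned int offset, unsigned int length);
unsigned long demo_get_area(unsigned long addr, unsigned long len);
#else
/* Called directly, so the stub is a function that fails cleanly. */
static inline int demo_zero_range(unsigned int offset, unsigned int length)
{
	(void)offset;
	(void)length;
	return -ENXIO;
}
/* Assumed to be used only as a pointer value, so NULL means "not provided". */
#define demo_get_area NULL
#endif

/* Hypothetical ops table that stores the optional hook as a pointer. */
struct demo_ops {
	unsigned long (*get_area)(unsigned long addr, unsigned long len);
};

static const struct demo_ops demo_default_ops = {
	.get_area = demo_get_area,
};
```

Both fallbacks have to survive a merge like this one, since each side of the conflict added one declaration to each branch of the #ifdef.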

[PATCH] MM: increase safety margin provided by PF_LESS_THROTTLE

2016-05-17 Thread NeilBrown


When nfsd is exporting a filesystem over NFS which is then NFS-mounted
on the local machine there is a risk of deadlock.  This happens when
there are lots of dirty pages in the NFS filesystem and they cause
NFSD to be throttled, either in throttle_vm_writeout() or in
balance_dirty_pages().

To avoid this problem the PF_LESS_THROTTLE flag is set for NFSD
threads and it provides a 25% increase to the limits that affect NFSD.
Any process writing to an NFS filesystem will be throttled well
before the number of dirty NFS pages reaches the limit imposed on
NFSD, so NFSD will not deadlock on pages that it needs to write out.
At least it shouldn't.

All processes are allowed a small excess margin to avoid performing
too many calculations: ratelimit_pages.

ratelimit_pages is set so that if a thread on every CPU uses the
entire margin, the total will only go 3% over the limit, and this is
much less than the 25% bonus that PF_LESS_THROTTLE provides, so this
margin shouldn't be a problem.  But it is.

The "total memory" that these 3% and 25% are calculated against is not
really total memory but "global_dirtyable_memory()", which doesn't
include anonymous memory, just free memory and page-cache memory.

The "ratelimit_pages" number is based on whatever the
global_dirtyable_memory was on the last CPU hot-plug, which might not
be what you expect, but is probably close to the total freeable memory.

The throttle threshold uses the global_dirtyable_memory at the moment
when the throttling happens, which could be much less than at the last
CPU hotplug.  So if lots of anonymous memory has been allocated, thus
pushing out lots of page-cache pages, then NFSD might end up being
throttled due to dirty NFS pages because the "25%" bonus it gets is
calculated against a rather small amount of dirtyable memory, while
the "3%" margin that other processes are allowed to dirty without
penalty is calculated against a much larger number.

To remove this possibility of deadlock we need to make sure that the
margin granted to PF_LESS_THROTTLE exceeds that rate-limit margin.
Simply adding ratelimit_pages isn't enough as that should be
multiplied by the number of cpus.

So add "global_wb_domain.dirty_limit / 32" as that more accurately
reflects the current total over-shoot margin.  This ensures that the
number of dirty NFS pages never gets so high that nfsd will be
throttled waiting for them to be written.

Signed-off-by: NeilBrown 

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index bc5149d5ec38..bbdcd7ccef57 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -407,8 +407,8 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
bg_thresh = thresh / 2;
tsk = current;
if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
-   bg_thresh += bg_thresh / 4;
-   thresh += thresh / 4;
+   bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
+   thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
}
dtc->thresh = thresh;
dtc->bg_thresh = bg_thresh;



[PATCH] MM: increase safety margin provided by PF_LESS_THROTTLE

2016-05-17 Thread NeilBrown


When nfsd is exporting a filesystem over NFS which is then NFS-mounted
on the local machine there is a risk of deadlock.  This happens when
there are lots of dirty pages in the NFS filesystem and they cause
NFSD to be throttled, either in throttle_vm_writeout() or in
balance_dirty_pages().

To avoid this problem the PF_LESS_THROTTLE flag is set for NFSD
threads and it provides a 25% increase to the limits that affect NFSD.
Any process writing to an NFS filesystem will be throttled well
before the number of dirty NFS pages reaches the limit imposed on
NFSD, so NFSD will not deadlock on pages that it needs to write out.
At least it shouldn't.

All processes are allowed a small excess margin to avoid performing
too many calculations: ratelimit_pages.

ratelimit_pages is set so that if a thread on every CPU uses the
entire margin, the total will only go 3% over the limit, and this is
much less than the 25% bonus that PF_LESS_THROTTLE provides, so this
margin shouldn't be a problem.  But it is.

The "total memory" that these 3% and 25% are calculated against are not
really total memory but are "global_dirtyable_memory()" which doesn't
include anonymous memory, just free memory and page-cache memory.

The "ratelimit_pages" number is based on whatever the
global_dirtyable_memory was on the last CPU hot-plug, which might not
be what you expect, but is probably close to the total freeable memory.

The throttle threshold uses the global_dirtable_memory at the moment
when the throttling happens, which could be much less than at the last
CPU hotplug.  So if lots of anonymous memory has been allocated, thus
pushing out lots of page-cache pages, then NFSD might end up being
throttled due to dirty NFS pages because the "25%" bonus it gets is
calculated against a rather small amount of dirtyable memory, while
the "3%" margin that other processes are allowed to dirty without
penalty is calculated against a much larger number.

To remove this possibility of deadlock we need to make sure that the
margin granted to PF_LESS_THROTTLE exceeds that rate-limit margin.
Simply adding ratelimit_pages isn't enough as that should be
multiplied by the number of cpus.

So add "global_wb_domain.dirty_limit / 32" as that more accurately
reflects the current total over-shoot margin.  This ensures that the
number of dirty NFS pages never gets so high that nfsd will be
throttled waiting for them to be written.

Signed-off-by: NeilBrown 

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index bc5149d5ec38..bbdcd7ccef57 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -407,8 +407,8 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		bg_thresh = thresh / 2;
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
-		bg_thresh += bg_thresh / 4;
-		thresh += thresh / 4;
+		bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
+		thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
 	}
 	dtc->thresh = thresh;
 	dtc->bg_thresh = bg_thresh;
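
To make the arithmetic of the change concrete, here is a small user-space
sketch (not kernel code). The struct and function names are invented for
this illustration, and the thresholds are treated as plain page counts.
The "patched" flag selects between the old 25% bonus and the new rule,
which adds a further 1/32 (roughly the 3% overshoot margin described
above) of the domain dirty limit.

```c
#include <stdbool.h>

/* Hypothetical model of the two thresholds; not a kernel structure. */
struct dirty_limits {
	unsigned long thresh;
	unsigned long bg_thresh;
};

/*
 * Apply the bonus granted to PF_LESS_THROTTLE / real-time tasks.
 * patched == false: old rule, each threshold grows by 25%.
 * patched == true:  new rule, additionally add domain_dirty_limit / 32.
 */
static struct dirty_limits
less_throttle_bonus(struct dirty_limits in, unsigned long domain_dirty_limit,
		    bool patched)
{
	unsigned long extra = patched ? domain_dirty_limit / 32 : 0;
	struct dirty_limits out = in;

	out.bg_thresh += in.bg_thresh / 4 + extra;
	out.thresh += in.thresh / 4 + extra;
	return out;
}
```

With thresh = 1000, bg_thresh = 500 and a domain dirty limit of 3200 pages,
the old rule gives 1250 / 625 and the new rule gives 1350 / 725. The extra
100 pages do not shrink when the current dirtyable memory, and with it
thresh, drops.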



Re: [GIT] Networking

2016-05-17 Thread Emmanuel Grumbach

On Wed, May 18, 2016 at 4:00 AM, Linus Torvalds
 wrote:
> On Tue, May 17, 2016 at 12:11 PM, David Miller  wrote:
>>
>> Highlights:
>
> Lowlights:
>
>  1) the iwlwifi driver seems to be broken
>
> My laptop that uses the intel 7680 iwlwifi module no longer connects
> to the network. It fails with a "Microcode SW error detected." and
> spews out register state over and over again.

Can we have the register state and the ASSERT / NMI / whatever that
goes along with it?
This clearly means that the firmware is crashing, but I don't know why.
I copied here the lines that I need from another bug with another
device with another firmware,
but the log will still show what I need:

[  800.880402] iwlwifi :02:00.0: Start IWL Error Log Dump:
[  800.880406] iwlwifi :02:00.0: Status: 0x, count: 6
[  800.880409] iwlwifi :02:00.0: Loaded firmware version: 21.311951.0
[  800.880413] iwlwifi :02:00.0: 0x0394 | ADVANCED_SYSASSERT
[  800.880416] iwlwifi :02:00.0: 0x0220 | trm_hw_status0
[  800.880419] iwlwifi :02:00.0: 0x | trm_hw_status1
[  800.880422] iwlwifi :02:00.0: 0x0BD8 | branchlink2
[  800.880425] iwlwifi :02:00.0: 0x00026AC4 | interruptlink1
[  800.880428] iwlwifi :02:00.0: 0x | interruptlink2
[  800.880431] iwlwifi :02:00.0: 0x0001 | data1
[  800.880434] iwlwifi :02:00.0: 0x02039845 | data2
[  800.880437] iwlwifi :02:00.0: 0x0056 | data3
[  800.880440] iwlwifi :02:00.0: 0x8E4184A7 | beacon time
[  800.880443] iwlwifi :02:00.0: 0x30E2CB41 | tsf low
[  800.880446] iwlwifi :02:00.0: 0x0027 | tsf hi
[  800.880449] iwlwifi :02:00.0: 0x | time gp1
[  800.880451] iwlwifi :02:00.0: 0x2F842F8A | time gp2
[  800.880454] iwlwifi :02:00.0: 0x | uCode revision type
[  800.880457] iwlwifi :02:00.0: 0x0015 | uCode version major
[  800.880460] iwlwifi :02:00.0: 0x0004C28F | uCode version minor
[  800.880463] iwlwifi :02:00.0: 0x0201 | hw version
[  800.880466] iwlwifi :02:00.0: 0x00489008 | board version
[  800.880469] iwlwifi :02:00.0: 0x001C | hcmd
[  800.880472] iwlwifi :02:00.0: 0x24022000 | isr0
[  800.880475] iwlwifi :02:00.0: 0x0100 | isr1
[  800.880478] iwlwifi :02:00.0: 0x580A | isr2
[  800.880481] iwlwifi :02:00.0: 0x4041FCC1 | isr3
[  800.880483] iwlwifi :02:00.0: 0x | isr4
[  800.880486] iwlwifi :02:00.0: 0x00800110 | last cmd Id
[  800.880489] iwlwifi :02:00.0: 0x | wait_event
[  800.880492] iwlwifi :02:00.0: 0x02C8 | l2p_control
[  800.880495] iwlwifi :02:00.0: 0x00018030 | l2p_duration
[  800.880498] iwlwifi :02:00.0: 0x00BF | l2p_mhvalid
[  800.880501] iwlwifi :02:00.0: 0x00EF | l2p_addr_match
[  800.880503] iwlwifi :02:00.0: 0x000D | lmpm_pmg_sel
[  800.880506] iwlwifi :02:00.0: 0x30031805 | timestamp
[  800.880509] iwlwifi :02:00.0: 0xE0F0 | flow_handler




>
> The last thing it says before falling over is:
>
>   wlp1s0: authenticate with xx:xx:xx:xx:xx:xx
>   wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 1/3)
>   wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 2/3)
>
> and then it goes all titsup.
>
> I thought that it might be because I had downloaded one of the daily
> firmware versions (it calls itself iwlwifi-7260-17.ucode, but isn't a
> real release afaik - but it has worked fine for me before), but the
> problem persists with the ver-16 ucode too, so that wasn't it.
>
> I haven't bisected it, but there is absolutely nothing odd in my hardware.
>
> I do have a 802.11ac network, which apparently not everybody does,
> judging by previous bug-reports of mine..
>
> Intel iwlwifi people: please check this out.
>
>Linus

linux-next: manual merge of the dax-misc tree with the nvdimm tree

2016-05-17 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the dax-misc tree got a conflict in:

  fs/block_dev.c

between commit:

  8044aae6f374 ("Revert "block: enable dax for raw block devices"")

from the nvdimm tree and commit:

  02fbd139759f ("dax: Remove complete_unwritten argument")

from the dax-misc tree.

I fixed it up (the former removed the code modified by the latter) and
can carry the fix as necessary. This is now fixed as far as linux-next
is concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell

Re: 45aebeaf4f67 "ovl: Ensure upper filesystem supports d_type" breaking Docker

2016-05-17 Thread Daniel Axtens

Hi Vivek,

My sincere apologies - it turns out I *was* running on xfs with
ftype=0. Someone in the office had moved docker's storage without
me noticing.

Apologies to all whose time I wasted.

Regards,
Daniel

Vivek Goyal  writes:

> On Tue, May 17, 2016 at 10:15:21AM +0200, Miklos Szeredi wrote:
>> On Tue, May 17, 2016 at 8:28 AM, Al Viro  wrote:
>> > On Mon, May 16, 2016 at 09:07:27AM -0400, Vivek Goyal wrote:
>> >> So it became clear that we need a check at mount time to make sure
>> >> d_type is supported otherwise error out. This will require users to
>> >> do mkfs.xfs with ftype=1 to make progress.
>> >>
>> >> I think new defaults for mkfs.xfs are such that ftype=1 is set. I am
>> >> not sure which version that change was made in.
>> >
>> > Dumb question - can we end up with empty workdir at that point?  Because
>> > if we do, the check would appear to return a false negative, no matter
>> > what fs supports...
>> 
>> ovl_workdir_create() creates a subdirectory of workdir ("work") so
>> workdir itself won't be empty after that.  If somebody else messes
>> with workdir, then we are screwed anyway.
>
> Right. Initially I was creating a directory of my own and later realized
> that ovl_workdir_create() already creates one.
>
> Having said that, what happens when ovl_workdir_create() fails and we
> mount overlayfs read only. In that case I think we will conclude that
> underlying fs does not support d_type and mounting will fail.
>
> Any thoughts, on how to handle this failure path better?
>
> Daniel,
>
> Yesterday Eric Sandeen told me that I can run "xfs_info " to
> figure out if ftype is 0 or 1. You might want to run "xfs_info /" and 
> ensure ftype=0 in your case and overlay is not detecting it wrong.
>
> Thanks
> Vivek
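
For readers who want to see what "the filesystem supports d_type" means
from user space, the following is a minimal sketch, assuming a
readdir()-based probe is an acceptable stand-in for the in-kernel check
discussed above. The function name is made up for this example, and
skipping "." and ".." is an assumption of the sketch (those two entries
may report a directory type even when other entries do not).

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <string.h>

/*
 * Returns 1 if at least one entry other than "." and ".." is reported
 * with a known d_type, 0 if no such entry is found (this includes an
 * empty directory), and -1 if the directory cannot be opened.
 */
static int dir_reports_d_type(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	int supported = 0;

	if (!dir)
		return -1;

	while ((de = readdir(dir)) != NULL) {
		/* Skip "." and "..": assumed not to be reliable indicators. */
		if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
			continue;
		if (de->d_type != DT_UNKNOWN) {
			supported = 1;
			break;
		}
	}
	closedir(dir);
	return supported;
}
```

Note that an empty directory yields 0 here, which is the false negative
raised in the quoted discussion: the probe only says something useful
when the directory contains at least one real entry.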

Re: [PATCH 02/17] perf tools: Add evlist channel helpers

2016-05-17 Thread Wangnan (F)




On 2016/5/13 21:05, Arnaldo Carvalho de Melo wrote:
> On Fri, May 13, 2016 at 07:55:59AM +, Wang Nan wrote:
>> In this commit sereval helpers are introduced to support the principle
>
>   several
>
>> of channel. Channels hold different groups of evsels which configured
>> differently. It will be used for overwritable evsels, which allows perf
>
> why not use multiple evlists? An "evlist" is a "list of evsels", why do
> we need yet another way of grouping evlists?
>
> - Arnaldo

There's an assumption all over perf that there's only one evlist: in
'struct record' there's an 'evlist' pointer, and in 'struct session'
there's also an 'evlist' pointer. Trying to change them to an array
results in 181 errors, so I think fundamentally moving to multiple
evlists is nearly impossible.

Now I'm thinking of introducing auxiliary evlists to perf record. We
would still obey the one-evlist assumption and only create separate
evlists for mmap.

Thank you.

Re: [PATCH v12 05/10] arm64: Kprobes with single stepping support

2016-05-17 Thread Masami Hiramatsu

On Tue, 17 May 2016 16:58:09 +0800
Huang Shijie  wrote:

> On Wed, Apr 27, 2016 at 02:53:00PM -0400, David Long wrote:
> > +
> > +/*
> > + * Interrupts need to be disabled before single-step mode is set, and not
> > + * reenabled until after single-step mode ends.
> > + * Without disabling interrupt on local CPU, there is a chance of
> > + * interrupt occurrence in the period of exception return and  start of
> > + * out-of-line single-step, that result in wrongly single stepping
> > + * into the interrupt handler.
> > + */
> > +static void __kprobes kprobes_save_local_irqflag(struct pt_regs *regs)
> > +{
> > + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
> 
> Why not add a parameter for this function to save the @kcb?

Good catch, it should use the same kcb as the caller.
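
A minimal user-space sketch of what that suggestion could look like,
with the caller passing its kcb instead of each helper calling
get_kprobe_ctlblk() again. The struct definitions below are cut-down
stand-ins holding only the fields used here, and the PSR_I_BIT value is
an assumption of the sketch; this is an illustration, not the patch
author's code.

```c
#include <stdint.h>

#define PSR_I_BIT 0x00000080u	/* assumed value: arm64 IRQ mask bit */

/* Stand-ins for the kernel types, reduced to what the helpers touch. */
struct pt_regs {
	uint64_t pstate;
};

struct kprobe_ctlblk {
	uint64_t saved_irqflag;
};

/* Remember the interrupted context's IRQ state, then mask IRQs. */
static void kprobes_save_local_irqflag(struct kprobe_ctlblk *kcb,
				       struct pt_regs *regs)
{
	kcb->saved_irqflag = regs->pstate;
	regs->pstate |= PSR_I_BIT;
}

/* Put the IRQ mask bit back to what it was before single-stepping. */
static void kprobes_restore_local_irqflag(struct kprobe_ctlblk *kcb,
					  struct pt_regs *regs)
{
	if (kcb->saved_irqflag & PSR_I_BIT)
		regs->pstate |= PSR_I_BIT;
	else
		regs->pstate &= ~(uint64_t)PSR_I_BIT;
}
```

Callers such as setup_singlestep() already receive kcb as an argument,
so they could hand it straight through.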

> 
> > +
> > + kcb->saved_irqflag = regs->pstate;
> > + regs->pstate |= PSR_I_BIT;
> > +}
> > +
> > +static void __kprobes kprobes_restore_local_irqflag(struct pt_regs *regs)
> > +{
> > + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
> ditto.
> 
> > +
> > + if (kcb->saved_irqflag & PSR_I_BIT)
> > + regs->pstate |= PSR_I_BIT;
> > + else
> > + regs->pstate &= ~PSR_I_BIT;
> > +}
> > +
> > +static void __kprobes
> > +set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr)
> > +{
> > + kcb->ss_ctx.ss_pending = true;
> > + kcb->ss_ctx.match_addr = addr + sizeof(kprobe_opcode_t);
> > +}
> > +
> > +static void __kprobes clear_ss_context(struct kprobe_ctlblk *kcb)
> > +{
> > + kcb->ss_ctx.ss_pending = false;
> > + kcb->ss_ctx.match_addr = 0;
> > +}
> > +
> > +static void __kprobes setup_singlestep(struct kprobe *p,
> > +struct pt_regs *regs,
> > +struct kprobe_ctlblk *kcb, int reenter)
> > +{
> > + unsigned long slot;
> > +
> > + if (reenter) {
> > + save_previous_kprobe(kcb);
> > + set_current_kprobe(p);
> > + kcb->kprobe_status = KPROBE_REENTER;
> > + } else {
> > + kcb->kprobe_status = KPROBE_HIT_SS;
> > + }
> > +
> > + if (p->ainsn.insn) {
> > + /* prepare for single stepping */
> > + slot = (unsigned long)p->ainsn.insn;
> > +
> > + set_ss_context(kcb, slot);  /* mark pending ss */
> > +
> > + if (kcb->kprobe_status == KPROBE_REENTER)
> > + spsr_set_debug_flag(regs, 0);
> > +
> > + /* IRQs and single stepping do not mix well. */
> > + kprobes_save_local_irqflag(regs);
> > + kernel_enable_single_step(regs);
> > + instruction_pointer(regs) = slot;
> > + } else  {
> > + BUG();

You'd better use BUG_ON(!p->ainsn.insn);

> > + }
> > +}
> > +
> > +static int __kprobes reenter_kprobe(struct kprobe *p,
> > + struct pt_regs *regs,
> > + struct kprobe_ctlblk *kcb)
> > +{
> > + switch (kcb->kprobe_status) {
> > + case KPROBE_HIT_SSDONE:
> > + case KPROBE_HIT_ACTIVE:
> > + kprobes_inc_nmissed_count(p);
> > + setup_singlestep(p, regs, kcb, 1);
> > + break;
> > + case KPROBE_HIT_SS:
> > + case KPROBE_REENTER:
> > + pr_warn("Unrecoverable kprobe detected at %p.\n", p->addr);
> > + dump_kprobe(p);
> > + BUG();
> > + break;
> > + default:
> > + WARN_ON(1);
> > + return 0;
> > + }
> > +
> > + return 1;
> > +}
> > +
> > +static void __kprobes
> > +post_kprobe_handler(struct kprobe_ctlblk *kcb, struct pt_regs *regs)
> > +{
> > + struct kprobe *cur = kprobe_running();
> > +
> > + if (!cur)
> > + return;
> > +
> > + /* return addr restore if non-branching insn */
> > + if (cur->ainsn.restore.type == RESTORE_PC) {
> > + instruction_pointer(regs) = cur->ainsn.restore.addr;
> > + if (!instruction_pointer(regs))
> > + BUG();
> > + }
> > +
> > + /* restore back original saved kprobe variables and continue */
> > + if (kcb->kprobe_status == KPROBE_REENTER) {
> > + restore_previous_kprobe(kcb);
> > + return;
> > + }
> > + /* call post handler */
> > + kcb->kprobe_status = KPROBE_HIT_SSDONE;
> > + if (cur->post_handler)  {
> > + /* post_handler can hit breakpoint and single step
> > +  * again, so we enable D-flag for recursive exception.
> > +  */
> > + cur->post_handler(cur, regs, 0);
> > + }
> > +
> > + reset_current_kprobe();
> > +}
> > +
> > +int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr)
> > +{
> > + struct kprobe *cur = kprobe_running();
> > + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
> > +
> > + switch (kcb->kprobe_status) {
> > + case KPROBE_HIT_SS:
> > + case KPROBE_REENTER:
> > +

Re: [PATCH v8 13/14] usb: gadget: udc: adapt to OTG core

2016-05-17 Thread Peter Chen

On Mon, May 16, 2016 at 12:51:53PM +0300, Roger Quadros wrote:
> On 16/05/16 12:23, Peter Chen wrote:
> > On Mon, May 16, 2016 at 11:26:57AM +0300, Roger Quadros wrote:
> >> Hi,
> >>
> >> On 16/05/16 10:02, Peter Chen wrote:
> >>> On Fri, May 13, 2016 at 01:03:27PM +0300, Roger Quadros wrote:
>  +
>  +static int usb_gadget_connect_control(struct usb_gadget *gadget, bool connect)
>  +{
>  +struct usb_udc *udc;
>  +
>  +mutex_lock(_lock);
>  +udc = usb_gadget_to_udc(gadget);
>  +if (!udc) {
>  +dev_err(gadget->dev.parent, "%s: gadget not registered.\n",
>  +__func__);
>  +mutex_unlock(_lock);
>  +return -EINVAL;
>  +}
>  +
>  +if (connect) {
>  +if (!gadget->connected)
>  +usb_gadget_connect(udc->gadget);
>  +} else {
>  +if (gadget->connected) {
>  +usb_gadget_disconnect(udc->gadget);
>  +udc->driver->disconnect(udc->gadget);
>  +}
>  +}
>  +
>  +mutex_unlock(_lock);
>  +
>  +return 0;
>  +}
>  +
> >>>
> >>> Since this is called for vbus interrupt, why not using
> >>> usb_udc_vbus_handler directly, and call udc->driver->disconnect
> >>> at usb_gadget_stop.
> >>
> >> We can't assume that this is always called for vbus interrupt so
> >> I decided not to call usb_udc_vbus_handler.
> >>
> >> udc->vbus is really pointless for us. We keep vbus states in our
> >> state machine and leave udc->vbus as true always.
> >>
> >> Why do you want to move udc->driver->disconnect() to stop?
> >> If USB controller disconnected from bus then the gadget driver
> >> must be notified about the disconnect immediately. The controller
> >> may or may not be stopped by the core.
> >>
> > 
> > Then, would you give some comments when this API will be used?
> > I was assumed it is only used for drd state machine.
> 
> drd_state machine didn't even need this API in the first place :).
> You guys wanted me to separate out start/stop and connect/disconnect for full OTG case.
> Won't full OTG state machine want to use this API? If not what would it use?
> 

Oh, I meant that only the drd and full OTG state machines need it. I am
wondering whether we need a new API for this. Two questions:

- Apart from the vbus interrupt, is there any chance this API will be
used in the current logic?
- When would this API be called without a subsequent gadget->stop?

-- 

Best Regards,
Peter Chen

Re: Re: [PATCH v8 3/5] mfd: hi655x: Add MFD driver for hi655x

2016-05-17 Thread Guodong Xu

On 19 April 2016 at 14:53, Lee Jones <lee.jo...@linaro.org> wrote:
>
> On Tue, 19 Apr 2016, Guodong Xu wrote:
>
> > On 13 April 2016 at 08:51, Chen Feng <puck.c...@hisilicon.com> wrote:
> > >
> > >
> > >
> > >  Forwarded Message 
> > > Subject: Re: [PATCH v8 3/5] mfd: hi655x: Add MFD driver for hi655x
> > > Date: Mon, 11 Apr 2016 11:41:06 +0100
> > > From: Lee Jones <lee.jo...@linaro.org>
> > > To: Chen Feng <puck.c...@hisilicon.com>
> > > CC: lgirdw...@gmail.com, broo...@kernel.org, 
> > > linux-kernel@vger.kernel.org, w...@huawei.com, 
> > > kong.kongxin...@hisilicon.com, haojian.zhu...@linaro.org, 
> > > suzhuangl...@hisilicon.com, dan.z...@hisilicon.com
> > >
> > > On Sun, 14 Feb 2016, Chen Feng wrote:
> > >
> > > > Add PMIC MFD driver to support hisilicon hi665x.
> > > >
> > > > Signed-off-by: Chen Feng <puck.c...@hisilicon.com>
> > > > Signed-off-by: Fei Wang <w...@huawei.com>
> > > > Signed-off-by: Xinwei Kong <kong.kongxin...@hisilicon.com>
> > > > Reviewed-by: Haojian Zhuang <haojian.zhu...@linaro.org>
> > > > Acked-by: Lee Jones <lee.jo...@linaro.org>
> > > > ---
> > > >  drivers/mfd/Kconfig |  10 +++
> > > >  drivers/mfd/Makefile|   1 +
> > > >  drivers/mfd/hi655x-pmic.c   | 162 
> > > > 
> > > >  include/linux/mfd/hi655x-pmic.h |  55 ++
> > > >  4 files changed, 228 insertions(+)
> > > >  create mode 100644 drivers/mfd/hi655x-pmic.c
> > > >  create mode 100644 include/linux/mfd/hi655x-pmic.h
> > >
> > > Applied, thanks.
> >
> > Hi, Lee, Mark
> >
> > I still didn't see this patch in linux-next (next-20160418) since your
> > replied "Applied". Are you expecting anything else? Dependencies?
> >
> > I didn't see any unsolved review comments actually. But if there is,
> > please let us know, so I can send an updated version.
>
> When I applied your patch, I also added ~40 other patches.  I haven't
> yet got around to editing and pushing them all to -next.  I will put
> some time aside this morning in order to complete the push.


Hi, Lee

As of this morning, I still cannot see hi655x in your for-mfd-next
branch and in linux-next (next-20160517)

I saw this patch integrated on Apr/19:
mfd: hi655x: Add document for hi665x PMIC

But apparently missing this one:
[PATCH v8 3/5] mfd: hi655x: Add MFD driver for hi655x

Would you please check? Sorry if I'm asking something stupid.
I look forward to seeing it in the v4.7-rcs.

Thank you.

-Guodong

>
>
> > > > diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
> > > > index 9ca66de..5b1c091 100644
> > > > --- a/drivers/mfd/Kconfig
> > > > +++ b/drivers/mfd/Kconfig
> > > > @@ -284,6 +284,16 @@ config MFD_HI6421_PMIC
> > > > menus in order to enable them.
> > > > We communicate with the Hi6421 via memory-mapped I/O.
> > > >
> > > > +config MFD_HI655X_PMIC
> > > > + tristate "HiSilicon Hi655X series PMU/Codec IC"
> > > > + depends on ARCH_HISI || COMPILE_TEST
> > > > + depends on OF
> > > > + select MFD_CORE
> > > > + select REGMAP_MMIO
> > > > + select REGMAP_IRQ
> > > > + help
> > > > +   Select this option to enable Hisilicon hi655x series pmic 
> > > > driver.
> > > > +
> > > >  config HTC_EGPIO
> > > >   bool "HTC EGPIO support"
> > > >   depends on GPIOLIB && ARM
> > > > diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
> > > > index 0f230a6..1e166c1 100644
> > > > --- a/drivers/mfd/Makefile
> > > > +++ b/drivers/mfd/Makefile
> > > > @@ -190,6 +190,7 @@ obj-$(CONFIG_MFD_STW481X) += stw481x.o
> > > >  obj-$(CONFIG_MFD_IPAQ_MICRO) += ipaq-micro.o
> > > >  obj-$(CONFIG_MFD_MENF21BMC)  += menf21bmc.o
> > > >  obj-$(CONFIG_MFD_HI6421_PMIC)+= hi6421-pmic-core.o
> > > > +obj-$(CONFIG_MFD_HI655X_PMIC)   += hi655x-pmic.o
> > > >  obj-$(CONFIG_MFD_DLN2)   += dln2.o
> > > >  obj-$(CONFIG_MFD_RT5033) += rt5033.o
> > > >  obj-$(CONFIG_MFD_SKY81452)   += sky81452.o
> > > > diff --git a/drivers/mfd/hi655x-pmic.c b/drivers/mfd/hi655x-pmic.c
> > >

RE: [PATCH 1/3] ACPI: table upgrade: use cacheable map for tables

2016-05-17 Thread Zheng, Lv

Hi,

> From: Aleksey Makarov [mailto:aleksey.maka...@linaro.org]
> Subject: [PATCH 1/3] ACPI: table upgrade: use cacheable map for tables
> 
> The new memory allocated in acpi_table_initrd_init() is used to
> copy the upgraded tables to it.  So it should be mapped with
> early_memremap() instead of early_ioremap().
> 
> This is critical for ARM.
> 
> Signed-off-by: Aleksey Makarov 
[Lv Zheng] 
Acked-by: Lv Zheng 

Thanks
-Lv

> ---
>  drivers/acpi/tables.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index a372f9e..449a649 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -578,10 +578,10 @@ static void __init acpi_table_initrd_init(void *data, size_t size)
>   clen = size;
>   if (clen > MAP_CHUNK_SIZE - slop)
>   clen = MAP_CHUNK_SIZE - slop;
> - dest_p = early_ioremap(dest_addr & PAGE_MASK,
> + dest_p = early_memremap(dest_addr & PAGE_MASK,
>clen + slop);
>   memcpy(dest_p + slop, src_p, clen);
> - early_iounmap(dest_p, clen + slop);
> + early_memunmap(dest_p, clen + slop);
>   src_p += clen;
>   dest_addr += clen;
>   size -= clen;
> --
> 2.8.2
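
The loop touched by the hunk above copies the upgraded tables one mapped window at a time. Below is a minimal userspace sketch of that chunking pattern, not the kernel code itself. It assumes a plain byte array stands in for physical memory and pointer arithmetic stands in for early_memremap()/early_memunmap(). The names chunked_copy, MODEL_PAGE_SIZE and MODEL_CHUNK_SIZE are hypothetical, the sizes are shrunk for illustration, and the way slop is derived from the destination address is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, deliberately tiny sizes so the windows are easy to follow. */
#define MODEL_PAGE_SIZE  16u
#define MODEL_CHUNK_SIZE (2u * MODEL_PAGE_SIZE)

/*
 * Copy size bytes from src to "physical" address dest_addr inside phys[],
 * using windows of at most MODEL_CHUNK_SIZE bytes that start on a page
 * boundary. Returns the number of windows that were needed.
 */
static unsigned chunked_copy(uint8_t *phys, size_t dest_addr,
                             const uint8_t *src, size_t size)
{
    unsigned windows = 0;

    while (size) {
        /* Offset of dest_addr inside its page (assumed meaning of slop). */
        size_t slop = dest_addr & (MODEL_PAGE_SIZE - 1);
        size_t clen = size;
        uint8_t *dest_p;

        if (clen > MODEL_CHUNK_SIZE - slop)
            clen = MODEL_CHUNK_SIZE - slop;

        /* Stand-in for early_memremap(dest_addr & PAGE_MASK, clen + slop). */
        dest_p = phys + (dest_addr & ~(size_t)(MODEL_PAGE_SIZE - 1));
        memcpy(dest_p + slop, src, clen);
        /* Stand-in for early_memunmap(dest_p, clen + slop): nothing to do. */

        src += clen;
        dest_addr += clen;
        size -= clen;
        windows++;
    }
    return windows;
}
```

In this model the destination is ordinary RAM that is written with memcpy(), which is the situation the patch description gives as the reason for a cacheable memory mapping rather than an I/O mapping.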

Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread David Ahern


On 5/17/16 8:48 PM, Hekuang wrote:
>> I don't understand why dso-prefix option is needed? Why make me type
>> yet more options to the analysis command? Why can't the directory be
>> located under the symfs tree in a known location and populated the
>> same way it is without symfs?
>
> Because the default buildid folder path is $HOME/.debug/.buildid,
> and this $HOME is on the target machine, not the same as $HOME
> on the host. Without that option, we need to copy $HOME/.debug/.buildid
> to the 'known location in symfs', which is also extra work.



My argument for symfs is that $HOME is not relevant or if it is the path 
is symfs/$HOME. The use case is dealing with countless images -- some 
development, some production. I should be able to nuke the symfs when 
the analysis is done and everything related to it is gone. With the 
$HOME/.debug path it just grows on and on with no real means of pruning it.


If the vdsos are for a particular symfs then why aren't the vdso's under 
it in a known location?

Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread Hekuang




On 2016/5/18 9:51, David Ahern wrote:
> On 5/17/16 7:47 PM, Hekuang wrote:
>> On 2016/5/16 10:50, David Ahern wrote:
>>> On 5/15/16 7:30 PM, Hekuang wrote:
>>>> In previous patch, I use 'perf buildid-cache -a' to add vdso
>>>> binary into the HOST buildid dir.
>>>
>>> So 'perf buildid-cache' needs the symfs option?
>>
>> With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is
>> like this:
>>
>> ├── debug($(dso-prefix))
>> │   ├── .build-id
>> │   │   ├── 3a
>> │   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> ../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
>> │   │   └── 84
>> │   │       └── dbd75729adba57cc42f5544b25de571c0c8731 -> ../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
>> │   ├── [kernel.kallsyms]
>> │   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
>> │   ├── [vdso]
>> │   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
>> │   └── [vdso32]
>> │       └── 84dbd75729adba57cc42f5544b25de571c0c8731
>> ├── lib
>> │   ├── ld-2.22.so
>> │   └── libc-2.22.so
>> ├── tmp
>> │   └── hello
>> └── xxx
>>
>> So all binaries we need are included in the symfs dir. I think
>> this is consistent with your idea explained in previous mails.
>>
>> With this symfs, we do not need the buildid dir anymore, so what is
>> your idea about 'perf buildid-cache' needing a symfs option? After all,
>> that only affects the buildid dir.
>
> I don't understand why dso-prefix option is needed? Why make me type
> yet more options to the analysis command? Why can't the directory be
> located under the symfs tree in a known location and populated the
> same way it is without symfs?

Because the default buildid folder path is $HOME/.debug/.buildid,
and this $HOME is on the target machine, not the same as $HOME
on the host. Without that option, we need to copy $HOME/.debug/.buildid
to the 'known location in symfs', which is also extra work.
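
The .build-id entries in the tree quoted above split each build-id into a two-character directory and a link named after the remaining characters. Below is a minimal sketch of how such a link path could be derived, assuming that layout. The helper name build_id_link_path and the prefix argument are hypothetical and are not taken from the perf sources.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "<prefix>/.build-id/<first two hex chars>/<remaining chars>" to out.
 * Returns 0 on success, -1 if the build-id is too short or out is too small.
 */
static int build_id_link_path(const char *prefix, const char *build_id,
                              char *out, size_t out_len)
{
    int n;

    /* Need two characters for the directory and at least one for the link. */
    if (strlen(build_id) < 3)
        return -1;

    n = snprintf(out, out_len, "%s/.build-id/%.2s/%s",
                 prefix, build_id, build_id + 2);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return 0;
}
```

With the prefix "symfs/debug" and the kernel build-id from the tree, this yields symfs/debug/.build-id/3a/e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd, matching the link shown there.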

Re: CQ and RDMA READ/WRITE APIs

2016-05-17 Thread Parav Pandit

Hi Doug,

On Tue, May 17, 2016 at 11:02 PM, Doug Ledford  wrote:
> Nice catch there Bart.  That was well before my role as maintainer and
> so settles things well enough for me.  IOW, I don't feel I need to worry
> about trying to maintain the dual license nature of the RDMA stack as it
> was broken long before I took over.  Thanks for pointing that out.
>

Does it mean we can submit new code files under GPL only license?
I submitted RDMA cgroup related code in ib_core under GPLv2 only license.
Existing files which are calling those new APIs will continue to be
dual license (similar to CQ and RDMA APIs)?

Parav

Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()

2016-05-17 Thread Yuyang Du

On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> So, if the code looks like the following, either now or in the future,
> 
> static void __schedule(bool preempt)
> {
>   ...
>   /* Clear RQCF_ACT_SKIP */
>   rq->clock_update_flags = 0;
>   ...
>   delta = rq_clock();
> }

Sigh, you even said "Clear RQCF_ACT_SKIP", but you not only clear it,
you clear everything. And if you clear the RQCF_UPDATE also (maybe you
shouldn't, but actually it does not matter), of course you will get
a warning...

In addition, it looks like multiple skips are possible, so:

update_rq_clock() {
rq->clock_update_flags |= RQCF_UPDATE;

...
}

instead of clearing the skip flag there.
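
The difference between clearing one flag and clearing the whole word can be shown with a small userspace model. This is only a sketch: the bit values and the rq_model type are hypothetical, and only the flag names RQCF_ACT_SKIP and RQCF_UPDATE come from the discussion above.

```c
/* Hypothetical bit values; only the flag names appear in the thread. */
#define RQCF_ACT_SKIP 0x01u
#define RQCF_UPDATE   0x02u

/* Stand-in for the one runqueue field that matters here. */
struct rq_model {
    unsigned int clock_update_flags;
};

/* Record that the clock was updated without disturbing the other bits. */
static void update_rq_clock_model(struct rq_model *rq)
{
    rq->clock_update_flags |= RQCF_UPDATE;
}

/* Clear only the skip bit, so RQCF_UPDATE survives. */
static void clear_skip_only(struct rq_model *rq)
{
    rq->clock_update_flags &= ~RQCF_ACT_SKIP;
}

/* What "clock_update_flags = 0" does: the update bit is lost as well. */
static void clear_everything(struct rq_model *rq)
{
    rq->clock_update_flags = 0;
}

/* Debug-style check: a later clock read would warn when this returns 0. */
static int clock_is_fresh(const struct rq_model *rq)
{
    return (rq->clock_update_flags & RQCF_UPDATE) != 0;
}
```

In this model, calling clear_skip_only() after update_rq_clock_model() leaves clock_is_fresh() true, while clear_everything() makes it false, which is the case that would trigger the warning.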

[PATCH] mm: fix duplicate words and typos

2016-05-17 Thread Li Peng

Signed-off-by: Li Peng 
---
 mm/memcontrol.c | 2 +-
 mm/page_alloc.c | 6 +++---
 mm/vmscan.c | 7 +++
 mm/zswap.c  | 2 +-
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fe787f5..4b74255 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2293,7 +2293,7 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 
/*
 * If we are in a safe context (can wait, and not in interrupt
-* context), we could be be predictable and return right away.
+* context), we could be predictable and return right away.
 * This would guarantee that the allocation being performed
 * already belongs in the new cache.
 *
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c1069ef..93824cb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3030,7 +3030,7 @@ retry:
/*
 * If an allocation failed after direct reclaim, it could be because
 * pages are pinned on the per-cpu lists or in high alloc reserves.
-* Shrink them them and try again
+* Shrink them and try again.
 */
if (!page && !drained) {
unreserve_highatomic_pageblock(ac);
@@ -4812,7 +4812,7 @@ static int zone_batchsize(struct zone *zone)
  * locking.
  *
  * Any new users of pcp->batch and pcp->high should ensure they can cope with
- * those fields changing asynchronously (acording the the above rule).
+ * those fields changing asynchronously (according to the above rule).
  *
  * mutex_is_locked(_batch_high_lock) required when calling this function
  * outside of boot time (or some other assurance that no concurrent updaters
@@ -5024,7 +5024,7 @@ int __meminit __early_pfn_to_nid(unsigned long pfn,
  * @max_low_pfn: The highest PFN that will be passed to memblock_free_early_nid
  *
  * If an architecture guarantees that all ranges registered contain no holes
- * and may be freed, this this function may be used instead of calling
+ * and may be freed, this function may be used instead of calling
  * memblock_free_early_nid() manually.
  */
 void __init free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 142cb61..8ff5a79 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1683,8 +1683,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
set_bit(ZONE_DIRTY, >flags);
 
/*
-* If kswapd scans pages marked marked for immediate
-* reclaim and under writeback (nr_immediate), it implies
+* If kswapd scans pages marked for immediate reclaim
+* and under writeback (nr_immediate), it implies
 * that pages are cycling through the LRU faster than
 * they are written so also forcibly stall.
 */
@@ -3267,8 +3267,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
/*
 * There should be no need to raise the scanning
 * priority if enough pages are already being scanned
-* that that high watermark would be met at 100%
-* efficiency.
+* that high watermark would be met at 100% efficiency.
 */
if (kswapd_shrink_zone(zone, end_zone, ))
raise_priority = false;
diff --git a/mm/zswap.c b/mm/zswap.c
index de0f119b..6d829d7 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -928,7 +928,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
* a load may happening concurrently
* it is safe and okay to not free the entry
* if we free the entry in the following put
-   * it it either okay to return !0
+   * it either okay to return !0
*/
 fail:
spin_lock(>lock);
-- 
1.8.3.1

Re: [PATCH v12 00/10] arm64: Add kernel probes (kprobes) support

2016-05-17 Thread Huang Shijie

On Thu, May 12, 2016 at 10:26:40AM +0800, Li Bin wrote:
> 
> 
> on 2016/5/11 23:33, James Morse wrote:
> > Hi David,
> > 
> > On 27/04/16 19:52, David Long wrote:
> >> From: "David A. Long" 
> >>
> >> This patchset is heavily based on Sandeepa Prabhu's ARM v8 kprobes patches,
> >> first seen in October 2013. This version attempts to address concerns 
> >> raised by
> >> reviewers and also fixes problems discovered during testing.
> >>
> >> This patchset adds support for kernel probes(kprobes), jump probes(jprobes)
> >> and return probes(kretprobes) support for ARM64.
> >>
> >> The kprobes mechanism makes use of software breakpoint and single stepping
> >> support available in the ARM v8 kernel.
> > 
> > I applied this series on v4.6-rc7, and built the sample kprobes. They work 
> > fine,
> > unless I throw ftrace into the mix too.
> > 
> > I enabled the function_graph tracer, then tried to load the jprobe example 
> > module:
> > -%<-
> > root@ubuntu:/sys/kernel/debug/tracing# insmod /root/jprobe_example.ko
> > Planted jprobe at ff80080c8f20, handler addr ff8000bb3000
> > root@ubuntu:/sys/kernel/debug/tracing# jprobe: clone_flags = 0x1200011, 
> > stack_st
> > art = 0x0 stack_size = 0x0
> > Bad mode in Synchronous Abort handler detected, code 0x8605 -- IABT 
> > (current
> >  EL)
> > CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> > Hardware name: ARM Juno development board (r1) (DT)
> > task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> > PC is at 0x0
> > LR is at 0x0
> > 
> > pc : [<>] lr : [<>] pstate: 6145
> > sp : ffc974e4ff00
> > x29: 01200011 x28: ffc974e4c000
> > x27: ff80088d x26: 00dc
> > x25: 0120 x24: 0015
> > x23: 6000 x22: 007fa1b40e60
> > x21: 007fa1ce70d0 x20: 
> > x19:  x18: 0a03
> > x17: 007fa1b40d90 x16: ff80080c9708
> > x15: 003b9aca x14: 007fddb7e5c0
> > x13: 007fa1b40e2c x12: 00d00ff0
> > x11: ff8009c4d000 x10: ff800920c000
> > x9 : ff8008f5c000 x8 : ffc976c06800
> > x7 : 0006daf2 x6 : 0015
> > x5 : 0004 x4 : ffc96e8690a0
> > x3 : 001ed7cbab74 x2 : ffc96e869000
> > x1 :  x0 : 
> > 
> > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
> > Modules linked in: jprobe_example
> > CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> > Hardware name: ARM Juno development board (r1) (DT)
> > task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> > PC is at 0x0
> > LR is at 0x0
> > 
> > pc : [<>] lr : [<>] pstate: 6145
> > sp : ffc974e4ff00
> > x29: 01200011 x28: ffc974e4c000
> > x27: ff80088d x26: 00dc
> > x25: 0120 x24: 0015
> > x23: 6000 x22: 007fa1b40e60
> > x21: 007fa1ce70d0 x20: 
> > x19:  x18: 0a03
> > x17: 007fa1b40d90 x16: ff80080c9708
> > x15: 003b9aca x14: 007fddb7e5c0
> > x13: 007fa1b40e2c x12: 00d00ff0
> > x11: ff8009c4d000 x10: ff800920c000
> > x9 : ff8008f5c000 x8 : ffc976c06800
> > x7 : 0006daf2 x6 : 0015
> > x5 : 0004 x4 : ffc96e8690a0
> > x3 : 001ed7cbab74 x2 : ffc96e869000
> > x1 :  x0 : 
> > 
> > Process systemd-udevd (pid: 1047, stack limit = 0xffc974e4c020)
> > Stack: (0xffc974e4ff00 to 0xffc974e5)
> > ff00: 0417 007fa1ce76f0 00dc 0417
> > ff20:  007fddb7ecf8 0005 
> > ff40: ff01 003b9aca 00555b3868b0 007fa1b40d90
> > ff60: 0a03 007fddb7e5c0  007fddb7e5e0
> > ff80: 00555b358000 00558f56f0e0  00558f574f00
> > ffa0: 00558f574f00 04fa 00558f56f010 007fddb7e600
> > ffc0: 007fa1b40e2c 007fddb7e5c0 007fa1b40e60 6000
> > ffe0: 01200011 00dc 000484000200 08000200
> > Call trace:
> > [<  (null)>]   (null)
> > Code: bad PC value
> > ---[ end trace 35d24aad799c2941 ]---
> > -%<-
> > 
> 
> To solve this, it should pause function tracing before the jprobe handler is
> called and unpause it before it returns back to the function it probed.
> 
> diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
> index db2d95c..b21ed00 100644
> --- a/arch/arm64/kernel/kprobes.c
> +++ b/arch/arm64/kernel/kprobes.c
> @@ -714,6 +714,7 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct 
> pt_regs *regs)
> 
>

Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors

2016-05-17 Thread Jaehoon Chung

On 05/18/2016 09:47 AM, Doug Anderson wrote:
> Jaehoon,
> 
> On Mon, Mar 30, 2015 at 8:47 AM, Doug Anderson  wrote:
>> Jaehoon,
>>
>> On Sun, Mar 29, 2015 at 5:55 PM, Jaehoon Chung  
>> wrote:
>>> Dear Doug,
>>>
>>> I'm considering to control HLE error..So holding this patch.
>>> If this is absolutely necessary patch, let me know, plz.
>>>
>>> Best Regards,
>>> Jaehoon Chung
>>
>> Sounds OK.  I have certainly applied this locally and the driver isn't
>> robust against insertions / removals without it, but once the card is
>> inserted things are OK so it's probably not urgent that it be applied
>> upstream.  Hopefully we can figure out a better solution...
> 
> I'm now testing a nice new rebased kernel and I'm hitting this again.
> 
> Of course I'll just pick my same patch to my new kernel tree, but
> since it's been a year and nobody has done anything better, would you
> consider landing my patch?  It is certainly better than nothing.

Sure, you're right.
I think the main cause of HLE is wait_prvdata_complete. (I'm guessing..)
On the other hand, the dwmmc controller may be handling something incorrectly.
(I have seen HLE occur in a similar case.)
Until the real solution is found, it's fine to apply your patch to the dwmmc
controller.

Ulf has already sent the PR for next, so if we need to apply this, I will apply it as a fix.

Best Regards,
Jaehoon Chung

> 
> -Doug
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
>

[PATCH v2] Drivers: hv: vmbus: fix the race when querying & updating the percpu list

2016-05-17 Thread Dexuan Cui

There is a rare race when we remove an entry from the global list
hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
process_chn_event() -> pcpu_relid2channel() is trying to query the list,
we can get the general protection fault:

general protection fault:  [#1] SMP
...
RIP: 0010:[]  [] vmbus_on_event+0xc4/0x149

Similarly, we also have the issue in the code path: vmbus_process_offer() ->
percpu_channel_enq().

We can resolve the issue by disabling the tasklet when updating the list.

Reported-by: Rolf Neugebauer 
Cc: Vitaly Kuznetsov 
Signed-off-by: Dexuan Cui 
---

v2: added tasklet_schedule() after tasklet_enable(). Thanks, Vitaly!

 drivers/hv/channel.c  |  5 +
 drivers/hv/channel_mgmt.c | 24 +---
 include/linux/hyperv.h|  3 +++
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 56dd261..17c4711 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -546,8 +546,11 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
put_cpu();
smp_call_function_single(channel->target_cpu, reset_channel_cb,
 channel, true);
+   smp_call_function_single(channel->target_cpu,
+percpu_channel_deq, channel, true);
} else {
reset_channel_cb(channel);
+   percpu_channel_deq(channel);
put_cpu();
}
 
@@ -592,6 +595,8 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 
 out:
tasklet_enable(tasklet);
+   /* for possible pending event */
+   tasklet_schedule(tasklet);
 
return ret;
 }
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 38b682ba..8e251e3 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -21,6 +21,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -277,7 +278,7 @@ static void free_channel(struct vmbus_channel *channel)
kfree(channel);
 }
 
-static void percpu_channel_enq(void *arg)
+void percpu_channel_enq(void *arg)
 {
struct vmbus_channel *channel = arg;
int cpu = smp_processor_id();
@@ -285,7 +286,7 @@ static void percpu_channel_enq(void *arg)
list_add_tail(&channel->percpu_list, &hv_context.percpu_list[cpu]);
 }
 
-static void percpu_channel_deq(void *arg)
+void percpu_channel_deq(void *arg)
 {
struct vmbus_channel *channel = arg;
 
@@ -313,15 +314,6 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
BUG_ON(!channel->rescind);
BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
 
-   if (channel->target_cpu != get_cpu()) {
-   put_cpu();
-   smp_call_function_single(channel->target_cpu,
-percpu_channel_deq, channel, true);
-   } else {
-   percpu_channel_deq(channel);
-   put_cpu();
-   }
-
if (channel->primary_channel == NULL) {
list_del(&channel->listentry);
 
@@ -363,6 +355,7 @@ void vmbus_free_channels(void)
  */
 static void vmbus_process_offer(struct vmbus_channel *newchannel)
 {
+   struct tasklet_struct *tasklet;
struct vmbus_channel *channel;
bool fnew = true;
unsigned long flags;
@@ -409,6 +402,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 
init_vp_index(newchannel, dev_type);
 
+   tasklet = hv_context.event_dpc[newchannel->target_cpu];
+   tasklet_disable(tasklet);
if (newchannel->target_cpu != get_cpu()) {
put_cpu();
smp_call_function_single(newchannel->target_cpu,
@@ -418,6 +413,9 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
percpu_channel_enq(newchannel);
put_cpu();
}
+   tasklet_enable(tasklet);
+   /* for possible pending event */
+   tasklet_schedule(tasklet);
 
/*
 * This state is used to indicate a successful open
@@ -469,6 +467,7 @@ err_deq_chan:
list_del(&newchannel->listentry);
mutex_unlock(&vmbus_connection.channel_mutex);
 
+   tasklet_disable(tasklet);
if (newchannel->target_cpu != get_cpu()) {
put_cpu();
smp_call_function_single(newchannel->target_cpu,
@@ -477,6 +476,9 @@ err_deq_chan:
percpu_channel_deq(newchannel);
put_cpu();
}
+   tasklet_enable(tasklet);
+   /* for possible pending event */
+   tasklet_schedule(tasklet);
 
 err_free_chan:
free_channel(newchannel);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7be7237..95aea09 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1328,6 +1328,9 @@ extern bool
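
The pattern this patch applies around each percpu list update (disable the tasklet, modify the list, re-enable it, then schedule it once in case an event arrived in the meantime) can be modeled in plain C. The sketch below is a single-threaded toy, not kernel code: `toy_tasklet`, `toy_schedule()`, `toy_run_pending()` and the counters are hypothetical names, and the model simply assumes that a run requested while the tasklet is disabled stays pending. It does not reproduce real tasklet internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a tasklet: hypothetical, not the kernel's struct. */
struct toy_tasklet {
    int disable_count;   /* > 0 means the handler must not run */
    bool pending;        /* a run has been requested */
    int runs;            /* how many times the handler ran */
    int list_len;        /* stand-in for the percpu channel list */
    int seen_len;        /* list length observed by the last run */
};

static void toy_disable(struct toy_tasklet *t) { t->disable_count++; }

static void toy_enable(struct toy_tasklet *t)
{
    assert(t->disable_count > 0);
    t->disable_count--;
}

static void toy_schedule(struct toy_tasklet *t) { t->pending = true; }

/* One processing pass: run the handler only if pending and enabled. */
static void toy_run_pending(struct toy_tasklet *t)
{
    if (!t->pending || t->disable_count > 0)
        return;
    t->pending = false;
    t->runs++;
    t->seen_len = t->list_len;  /* the handler would walk the list here */
}

/* Update the list the way the patch does: the handler is kept out. */
static void toy_enqueue(struct toy_tasklet *t)
{
    toy_disable(t);
    t->list_len++;          /* list update, handler cannot observe it */
    toy_run_pending(t);     /* a pass during the update is a no-op */
    toy_enable(t);
    toy_schedule(t);        /* for a possible pending event */
}
```

In this model the handler never runs between `toy_disable()` and `toy_enable()`, so it can only ever see the list before or after the update, never halfway through it.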

Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors

2016-05-17 Thread Shawn Lin


Hi Doug,

On 2016-5-18 8:47, Doug Anderson wrote:
> Jaehoon,
>
> On Mon, Mar 30, 2015 at 8:47 AM, Doug Anderson  wrote:
>> Jaehoon,
>>
>> On Sun, Mar 29, 2015 at 5:55 PM, Jaehoon Chung  wrote:
>>> Dear Doug,
>>>
>>> I'm considering to control HLE error..So holding this patch.
>>> If this is absolutely necessary patch, let me know, plz.
>>>
>>> Best Regards,
>>> Jaehoon Chung
>>
>> Sounds OK.  I have certainly applied this locally and the driver isn't
>> robust against insertions / removals without it, but once the card is
>> inserted things are OK so it's probably not urgent that it be applied
>> upstream.  Hopefully we can figure out a better solution...
>
> I'm now testing a nice new rebased kernel and I'm hitting this again.
>
> Of course I'll just pick my same patch to my new kernel tree, but
> since it's been a year and nobody has done anything better, would you
> consider landing my patch?  It is certainly better than nothing.


Could you try this patch to see if you can still find HLE?

@@ -2356,12 +2356,22 @@ static void dw_mci_cmd_interrupt(struct dw_mci *host, u32 status)
 static void dw_mci_handle_cd(struct dw_mci *host)
 {
int i;
+   int present;

for (i = 0; i < host->num_slots; i++) {
struct dw_mci_slot *slot = host->slot[i];

if (!slot)
continue;

+   present = !(mci_readl(slot->host, CDETECT) & (1 << slot->id));
+   if (present)
+   set_bit(DW_MMC_CARD_PRESENT, &slot->flags);
+   else
+   clear_bit(DW_MMC_CARD_PRESENT, &slot->flags);

if (slot->mmc->ops->card_event)
slot->mmc->ops->card_event(slot->mmc);
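
The presence test in the hunk above treats a cleared CDETECT bit as "card present". A minimal standalone sketch of that decoding, outside the driver, could look like the following; the function name `card_present()` and its parameters are illustrative assumptions, and only the bit logic is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decode a card-detect register value the way the patch above does:
 * a cleared bit for the slot means a card is present.
 * The function name and parameters are illustrative assumptions.
 */
static bool card_present(uint32_t cdetect, unsigned int slot_id)
{
    assert(slot_id < 32);  /* one bit per slot in a 32-bit register */
    return !(cdetect & (UINT32_C(1) << slot_id));
}
```

For example, with this helper a register value of 0x2 reports a card in slot 0 and no card in slot 1.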




> -Doug

Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread Hekuang




On 2016/5/16 10:50, David Ahern wrote:
> On 5/15/16 7:30 PM, Hekuang wrote:
>> In previous patch, I use 'perf buildid-cache -a' to add vdso
>> binary into the HOST buildid dir.
>
> So 'perf buildid-cache' needs the symfs option?



With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is
like this:

├── debug($(dso-prefix))
│   ├── .build-id
│   │   ├── 3a
│   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> ../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   │   └── 84
│   │       └── dbd75729adba57cc42f5544b25de571c0c8731 -> ../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
│   ├── [kernel.kallsyms]
│   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   ├── [vdso]
│   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
│   └── [vdso32]
│       └── 84dbd75729adba57cc42f5544b25de571c0c8731
├── lib
│   ├── ld-2.22.so
│   └── libc-2.22.so
├── tmp
│   └── hello
└── xxx

So all the binaries we need are included in the symfs dir. I think
this is consistent with the idea you explained in previous mails.

With this symfs layout we no longer need the buildid dir, so why do
you think 'perf buildid-cache' needs a symfs option? After all,
it only affects the buildid dir.
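
In the tree above, each entry under .build-id is named by splitting the build-id into its first two hex characters (the directory) and the remaining characters (the link name). A small sketch of that naming rule follows; `buildid_link_path()` is a hypothetical helper written only to mirror the layout shown, and is not taken from the perf sources.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "<prefix>/.build-id/<first 2 hex chars>/<remaining chars>"
 * into buf. Returns 0 on success, -1 if the id is too short or the
 * result does not fit. Hypothetical helper, for illustration only.
 */
static int buildid_link_path(char *buf, size_t len,
                             const char *prefix, const char *build_id)
{
    int n;

    assert(buf && prefix && build_id);
    if (strlen(build_id) < 3)
        return -1;
    n = snprintf(buf, len, "%s/.build-id/%.2s/%s",
                 prefix, build_id, build_id + 2);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with the prefix "debug" and the kallsyms build-id from the tree, it yields debug/.build-id/3a/e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd, matching the first symlink shown.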

Thanks.

Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread David Ahern


On 5/17/16 7:47 PM, Hekuang wrote:
>
> On 2016/5/16 10:50, David Ahern wrote:
>> On 5/15/16 7:30 PM, Hekuang wrote:
>>> In previous patch, I use 'perf buildid-cache -a' to add vdso
>>> binary into the HOST buildid dir.
>>
>> So 'perf buildid-cache' needs the symfs option?
>
> With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is
> like this:
>
> ├── debug($(dso-prefix))
> │   ├── .build-id
> │   │   ├── 3a
> │   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> ../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
> │   │   └── 84
> │   │       └── dbd75729adba57cc42f5544b25de571c0c8731 -> ../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
> │   ├── [kernel.kallsyms]
> │   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
> │   ├── [vdso]
> │   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
> │   └── [vdso32]
> │       └── 84dbd75729adba57cc42f5544b25de571c0c8731
> ├── lib
> │   ├── ld-2.22.so
> │   └── libc-2.22.so
> ├── tmp
> │   └── hello
> └── xxx
>
> So all binaries we need are included in the symfs dir. I think
> this is consistent with your idea explained in previous mails.
>
> With this symfs, we do not need buildid dir anymore and what's
> your idea on 'perf buildid-cache' needs symfs option? after all,
> that only effects on buildid dir.


I don't understand why the dso-prefix option is needed. Why make me type yet
more options to the analysis command? Why can't the directory be located
under the symfs tree in a known location and populated the same way it
is without symfs?

Linux-next parallel cp workload hang

2016-05-17 Thread Xiong Zhou

Hi,

Parallel cp workload (xfstests generic/273) hangs as shown below.
It reproduces only rarely, less than 1 in 100 runs, I think.

I have hit this in the linux-next 20160504, 0506 and 0510 trees, testing on
xfs with a loop or block device. Ext4 survived several rounds
of testing.

The linux-next 20160510 tree hung within 500 rounds of testing, several
times. The same tree with the vfs parallel lookup patchset reverted
survived 900 rounds of testing. The reverted commits are attached.

Bisecting within this patchset identified this commit:

3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit
commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f
Author: Al Viro 
Date:   Wed Apr 20 23:42:46 2016 -0400

simple local filesystems: switch to ->iterate_shared()

no changes needed (XFS isn't simple, but it has the same parallelism
in the interesting parts exercised from CXFS).

With this commit reverted on top of Linux next 0510 tree, 5000+ rounds
of testing passed.

Although 2000 rounds of testing were run before each good/bad
verdict, I'm not 100 percent sure about all this, since it's
so hard to hit, and I am not that lucky..
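
To put a rough number on that uncertainty: if each round hits the hang independently with probability p, the chance that N rounds all pass on a kernel that still has the bug is (1 - p)^N. The sketch below computes this; the hit rates used with it are hypothetical, since the report only says the rate is below 1/100, and independence between rounds is an assumption.

```c
#include <assert.h>

/*
 * Probability that `rounds` independent test rounds all pass when each
 * round hits the bug with probability p_hit. Independence between
 * rounds is an assumption of this sketch.
 */
static double prob_all_pass(double p_hit, unsigned int rounds)
{
    double pass = 1.0;

    assert(p_hit >= 0.0 && p_hit <= 1.0);
    for (unsigned int i = 0; i < rounds; i++)
        pass *= 1.0 - p_hit;
    return pass;
}
```

With a hypothetical hit rate of 1/1000 per round, 2000 clean rounds would still occur roughly 13% of the time on a bad kernel, so a "good" verdict at that depth is suggestive rather than conclusive.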

Bisect log and full blocked state process dump log are also attached.

Furthermore, this was first hit when testing fs dax on nvdimm,
however it's reproducible without dax mount option, and also
reproducible on loop device, just seems harder to hit.

Thanks,
Xiong

[0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
[0.794263]   Not tainted 4.6.0-rc6-next-20160504 #5
[0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[0.841801] cp  D 880b4e977928 0 49033  49014 0x0080
[0.868923]  880b4e977928 880ba275d380 880b8d712b80 880b4e978000
[0.897504]  7fff 0002  880b8d712b80
[0.925234]  880b4e977940 816cbc25 88035a1dabb0 880b4e9779e8
[0.953237] Call Trace:
[0.958314]  [] schedule+0x35/0x80
[0.974854]  [] schedule_timeout+0x231/0x2d0
[0.995728]  [] ? down_trylock+0x2d/0x40
[1.015351]  [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
[1.040182]  [] __down_common+0xaa/0x104
[1.059021]  [] ? _xfs_buf_find+0x162/0x340 [xfs]
[1.081357]  [] __down+0x1d/0x1f
[1.097166]  [] down+0x41/0x50
[1.112869]  [] xfs_buf_lock+0x3c/0xf0 [xfs]
[1.134504]  [] _xfs_buf_find+0x162/0x340 [xfs]
[1.156871]  [] xfs_buf_get_map+0x2a/0x270 [xfs]
[1.180010]  [] xfs_buf_read_map+0x2d/0x180 [xfs]
[1.203538]  [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
[1.229194]  [] xfs_da_read_buf+0xd1/0x100 [xfs]
[1.251948]  [] xfs_dir3_data_read+0x26/0x60 [xfs]
[1.275736]  [] xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0 [xfs]
[1.305094]  [] ? down_read+0x12/0x30
[1.323787]  [] ? xfs_ilock+0xe4/0x110 [xfs]
[1.345114]  [] xfs_dir2_leaf_getdents+0x13b/0x3d0 [xfs]
[1.371818]  [] xfs_readdir+0x1a6/0x1c0 [xfs]
[1.393471]  [] xfs_file_readdir+0x2b/0x30 [xfs]
[1.416874]  [] iterate_dir+0x173/0x190
[1.436709]  [] ? do_audit_syscall_entry+0x66/0x70
[1.460951]  [] SyS_getdents+0x98/0x120
[1.480566]  [] ? iterate_dir+0x190/0x190
[1.500909]  [] do_syscall_64+0x62/0x110
[1.520847]  [] entry_SYSCALL64_slow_path+0x25/0x25
[1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
[1.568933]   Not tainted 4.6.0-rc6-next-20160504 #5
[1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1.618544] cp  D 880b91463b00 0 49040  49016 0x0080
[1.645502]  880b91463b00 880464d5c140 88029b475700 880b91464000
[1.674145]  880411c42610  880411c42628 8802c10bc610
[1.702834]  880b91463b18 816cbc25 88029b475700 880b91463b88
[1.731501] Call Trace:
[1.736866]  [] schedule+0x35/0x80
[1.754119]  [] rwsem_down_read_failed+0xf2/0x140
[1.777411]  [] ? xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[1.805090]  [] call_rwsem_down_read_failed+0x18/0x30
[1.830482]  [] down_read+0x20/0x30
[1.848505]  [] xfs_ilock+0xe4/0x110 [xfs]
[1.869293]  [] xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[1.896775]  [] xfs_dir_open+0x30/0x60 [xfs]
[1.917882]  [] do_dentry_open+0x20f/0x320
[1.938919]  [] ? xfs_file_mmap+0x50/0x50 [xfs]
[1.961532]  [] vfs_open+0x57/0x60
[1.978945]  [] path_openat+0x325/0x14e0
[1.999273]  [] ? putname+0x53/0x60
[2.017695]  [] do_filp_open+0x91/0x100
[2.036893]  [] ? __alloc_fd+0x46/0x180
[2.055479]  [] do_sys_open+0x124/0x210
[2.073783]  [] ? __audit_syscall_exit+0x1db/0x260
[2.096426]  [] SyS_openat+0x14/0x20
[2.113690]  [] do_syscall_64+0x62/0x110
[2.132417]  [] entry_SYSCALL64_slow_path+0x25/0x25



g273-block-dumps.tar.gz
Description: application/gzip

Re: [PATCH] doc: self-protection: provide initial details

2016-05-17 Thread Kees Cook

On Tue, May 17, 2016 at 6:26 PM, Jonathan Corbet  wrote:
> On Mon, 16 May 2016 19:27:28 -0700
> Kees Cook  wrote:
>
>> This document attempts to codify the intent around kernel self-protection
>> along with discussion of both existing and desired technologies, with
>> attention given to the rationale behind them, and the expectations of
>> their usage.
>
> I've applied this to the docs tree.  In the process, I took the liberty
> of applying the suggestions from Randy, hope you don't mind...

Ah, thanks! I'll send a follow-up. I had a suggestion for another
section and a typo fix.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

[PATCH] mm: disable fault around on emulated access bit architecture

2016-05-17 Thread Minchan Kim

On Tue, May 17, 2016 at 03:34:23PM +0300, Kirill A. Shutemov wrote:
> On Mon, May 16, 2016 at 11:56:32PM +0900, Minchan Kim wrote:
> > On Mon, May 16, 2016 at 05:29:00PM +0300, Kirill A. Shutemov wrote:
> > > > Kirill,
> > > > You wanted to test non-HW access bit system and I did.
> > > > What's your opinion?
> > > 
> > > Sorry, for late response.
> > > 
> > > My patch is incomplete: we need to find a way to not mark pte as old if we
> > > handle page fault for the address the pte represents.
> > 
> > I'm sure you can handle it but my point is there wouldn't be a big gain
> > even if you handle it on non-HW access bit systems. Okay, let's be
> > more clear, because I don't have every non-HW access bit architecture.
> > At least, the mobile workload on the ARM system I have wouldn't see a
> > huge benefit.
> > One more thing.
> > I tested the workload on a quad-core system whose core speed is not slow
> > compared to other recent mobile phone SoCs. Even when I ran the benchmark
> > without pte_mkold, the benefit was within noise because storage is really
> > slow, so major faults are the dominant factor. So I decided to switch the
> > test storage from eMMC to eSATA, and then I finally managed to see a
> > little benefit with fault_around without pte_mkold.
> > 
> > However, let's consider the side effects of fault_around:
> > 
> > 1. Increased slab shrinking compared to before
> > 2. Higher vmpressure level compared to before
> > 
> > Considering those regressions on my system, it's really not worth
> > trying at the moment.
> > That's why I wanted to disable fault_around by default on non-HW access
> > bit systems.
> 
> Feel free to post such patch. I guess it's reasonable.

From d926a2a19cd0921b34279c3f6a3bae8b7508646d Mon Sep 17 00:00:00 2001
From: Minchan Kim 
Date: Wed, 18 May 2016 08:36:59 +0900
Subject: [PATCH] mm: disable fault around on emulated access bit architecture

fault_around aims to reduce minor faults on file-backed pages by
speculatively mapping ptes in advance, relying on readahead logic.
However, on architectures without a HW access bit, the benefit is highly
limited because they have to emulate the young bit with minor faults
for the page aging algorithm of reclaim. IOW, we cannot reduce minor
faults on those architectures.

I did a quick test on my ARM machine.

512M file, mmap, sequential read of every word, on an eSATA drive, 4 runs.
stdev is stable.

= fault_around 4096 =
elapsed time(usec): 6747645

= fault_around 65536 =
elapsed time(usec): 6709263

0.5% gain.

Even when I tested it with eMMC, there was no gain; I guess that with
slow storage, major faults are the more dominant factor.

Also, fault_around has the side effect of shrinking slab more aggressively
and raising vmpressure, so if the speculation fails, it can evict more
slab, which can result in page I/O (e.g., inode cache). In the end, that
would void the benefit of fault_around.

So let's disable it by default on those architectures.

Cc: Kirill A. Shutemov 
Cc: linux-a...@vger.kernel.org
Signed-off-by: Minchan Kim 
---
 mm/memory.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index b762b17aa4c5..9f652fdc0295 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2897,8 +2897,16 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
update_mmu_cache(vma, address, pte);
 }
 
+/*
+ * If architecture emulates "accessed" or "young" bit without HW support,
+ * there is no much gain with fault_around.
+ */
 static unsigned long fault_around_bytes __read_mostly =
+#ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+   PAGE_SIZE;
+#else
rounddown_pow_of_two(65536);
+#endif
 
 #ifdef CONFIG_DEBUG_FS
 static int fault_around_bytes_get(void *data, u64 *val)
-- 
1.9.1
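
As a rough illustration of the default this patch selects, here is a minimal userspace sketch. It assumes a 4096-byte page and takes a plain hw_access_bit argument in place of the kernel's compile-time __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS test; rounddown_pow_of_two_model() and default_fault_around_bytes() are stand-ins written for this example, not kernel helpers.

```c
#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Largest power of two that is <= n (n must be non-zero). */
static unsigned long rounddown_pow_of_two_model(unsigned long n)
{
	unsigned long p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Pick the default fault-around window.
 * hw_access_bit != 0 models an architecture whose MMU sets the
 * accessed/young bit itself; 0 models one that emulates it in software.
 */
static unsigned long default_fault_around_bytes(int hw_access_bit)
{
	if (!hw_access_bit)
		return MODEL_PAGE_SIZE;	/* window of a single page */
	return rounddown_pow_of_two_model(65536);
}
```

With a window of one page there is nothing left to map around the faulting address, which is why PAGE_SIZE serves as the "disabled" default in this model.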

[PATCH 1/2] kprobes: add a new module parameter

2016-05-17 Thread Huang Shijie

This patch adds a new module parameter which can be used as the
symbol name. With this parameter, the module becomes more flexible.

Signed-off-by: Huang Shijie 
---
 samples/kprobes/kprobe_example.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 727eb21..2bb190d 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -14,9 +14,13 @@
 #include <linux/module.h>
 #include <linux/kprobes.h>
 
+#define MAX_SYMBOL_LEN 64
+static char symbol[MAX_SYMBOL_LEN] = "_do_fork";
+module_param_string(symbol, symbol, sizeof(symbol), 0644);
+
 /* For each probe you need to allocate a kprobe structure */
 static struct kprobe kp = {
-   .symbol_name= "_do_fork",
+   .symbol_name= symbol,
 };
 
 /* kprobe pre_handler: called just before the probed instruction is executed */
-- 
2.5.5
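
For readers unfamiliar with module_param_string(), the following userspace sketch models the bounded-copy behaviour the patch relies on: a value that does not fit in the fixed buffer (terminating NUL included) is rejected and the previous contents are kept. set_symbol_model() is a hypothetical helper for illustration only; it is not the kernel implementation, and the -ENOSPC return value is an assumption of this model.

```c
#include <errno.h>
#include <string.h>

#define MAX_SYMBOL_LEN 64

/* Default probe target, as in the patch. */
static char symbol[MAX_SYMBOL_LEN] = "_do_fork";

/*
 * Copy val into the fixed-size buffer if it fits, NUL included.
 * Returns 0 on success or -ENOSPC if val is too long; on failure the
 * buffer is left unchanged.
 */
static int set_symbol_model(const char *val)
{
	if (strlen(val) + 1 > sizeof(symbol))
		return -ENOSPC;
	strcpy(symbol, val);
	return 0;
}
```

Because kp.symbol_name points at the same buffer, whatever name ends up in symbol before the probe is registered is the one that gets probed.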

[PATCH 2/2] kprobes: print out the symbol name for the hooks

2016-05-17 Thread Huang Shijie

Print out the symbol name in the hooks; it makes the logs more readable.

Signed-off-by: Huang Shijie 
---
 samples/kprobes/kprobe_example.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 2bb190d..ed0ca0c 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -27,24 +27,24 @@ static struct kprobe kp = {
 static int handler_pre(struct kprobe *p, struct pt_regs *regs)
 {
 #ifdef CONFIG_X86
-   printk(KERN_INFO "pre_handler: p->addr = 0x%p, ip = %lx,"
+   printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, ip = %lx,"
" flags = 0x%lx\n",
-   p->addr, regs->ip, regs->flags);
+   p->symbol_name, p->addr, regs->ip, regs->flags);
 #endif
 #ifdef CONFIG_PPC
-   printk(KERN_INFO "pre_handler: p->addr = 0x%p, nip = 0x%lx,"
+   printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, nip = 0x%lx,"
" msr = 0x%lx\n",
-   p->addr, regs->nip, regs->msr);
+   p->symbol_name, p->addr, regs->nip, regs->msr);
 #endif
 #ifdef CONFIG_MIPS
-   printk(KERN_INFO "pre_handler: p->addr = 0x%p, epc = 0x%lx,"
+   printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, epc = 0x%lx,"
" status = 0x%lx\n",
-   p->addr, regs->cp0_epc, regs->cp0_status);
+   p->symbol_name, p->addr, regs->cp0_epc, regs->cp0_status);
 #endif
 #ifdef CONFIG_TILEGX
-   printk(KERN_INFO "pre_handler: p->addr = 0x%p, pc = 0x%lx,"
+   printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx,"
" ex1 = 0x%lx\n",
-   p->addr, regs->pc, regs->ex1);
+   p->symbol_name, p->addr, regs->pc, regs->ex1);
 #endif
 
/* A dump_stack() here will give a stack backtrace */
@@ -56,20 +56,20 @@ static void handler_post(struct kprobe *p, struct pt_regs *regs,
unsigned long flags)
 {
 #ifdef CONFIG_X86
-   printk(KERN_INFO "post_handler: p->addr = 0x%p, flags = 0x%lx\n",
-   p->addr, regs->flags);
+   printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, flags = 0x%lx\n",
+   p->symbol_name, p->addr, regs->flags);
 #endif
 #ifdef CONFIG_PPC
-   printk(KERN_INFO "post_handler: p->addr = 0x%p, msr = 0x%lx\n",
-   p->addr, regs->msr);
+   printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, msr = 0x%lx\n",
+   p->symbol_name, p->addr, regs->msr);
 #endif
 #ifdef CONFIG_MIPS
-   printk(KERN_INFO "post_handler: p->addr = 0x%p, status = 0x%lx\n",
-   p->addr, regs->cp0_status);
+   printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, status = 0x%lx\n",
+   p->symbol_name, p->addr, regs->cp0_status);
 #endif
 #ifdef CONFIG_TILEGX
-   printk(KERN_INFO "post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
-   p->addr, regs->ex1);
+   printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
+   p->symbol_name, p->addr, regs->ex1);
 #endif
 }
 
-- 
2.5.5

Re: [PATCH v12 10/10] kprobes: Add arm64 case in kprobe example module

2016-05-17 Thread Huang Shijie

On Tue, May 17, 2016 at 11:24:27AM +0100, Mark Brown wrote:
> On Tue, May 17, 2016 at 05:57:33PM +0800, Huang Shijie wrote:
> > On Wed, Apr 27, 2016 at 02:53:05PM -0400, David Long wrote:
> 
> > > +#ifdef CONFIG_ARM64
> > > + pr_info("pre_handler: p->addr = 0x%p, pc = 0x%lx\n",
> 
> > I think you miss the KERN_INFO here.
> 
> That's what pr_info() does over printk() - it adds the KERN_INFO more
> cleanly.
Sorry, I misread "pr_info" as "printk" when I first read this code.

thanks
Huang Shijie
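
Mark's point can be illustrated with a small userspace model: pr_info() behaves like a macro that puts the log-level prefix in front of the format string before handing it to printk(). The sketch below mimics that with string-literal concatenation. LEVEL_INFO_MODEL, log_model() and pr_info_model() are made-up names for this example, and the "<6>" prefix text is a stand-in rather than the kernel's real encoding.

```c
#include <stdarg.h>
#include <stdio.h>

#define LEVEL_INFO_MODEL "<6>"	/* stand-in for KERN_INFO */

static char last_msg[128];	/* captures the most recent message */

/* printk() stand-in: formats into last_msg instead of a kernel log. */
static int log_model(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Adjacent string literals are concatenated by the compiler, so the
 * level prefix is glued onto the format. The first argument must be
 * a string literal for this to work.
 */
#define pr_info_model(...) log_model(LEVEL_INFO_MODEL __VA_ARGS__)
```

In this model, pr_info_model("pc = 0x%lx\n", pc) and log_model(LEVEL_INFO_MODEL "pc = 0x%lx\n", pc) produce the same message, so adding the level by hand on top of the wrapper would be redundant.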

Re: [f2fs-dev] [PATCH] f2fs: use bio count instead of F2FS_WRITEBACK page count

2016-05-17 Thread Jaegeuk Kim

On Wed, May 18, 2016 at 09:17:00AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> On 2016/5/18 8:44, Jaegeuk Kim wrote:
> > This can reduce page counting overhead.
> 
> We change to increase one reference for one bio, but block layer can split or
> merge bios by itself, and write_end will be called per bio, so the reference
> may be maintained incorrectly?

Well, the block layer will merge bios into the same request, and then finally
call end_io for each original bio, no?
So far I've seen no errors in any test cases.
Am I missing something?

Thanks,

> 
> Thanks,
> 
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/checkpoint.c |  2 +-
> >  fs/f2fs/data.c   | 26 +++---
> >  fs/f2fs/debug.c  |  6 +++---
> >  fs/f2fs/f2fs.h   |  4 ++--
> >  fs/f2fs/super.c  |  2 +-
> >  5 files changed, 22 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index d04113b..447e2a9 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -914,7 +914,7 @@ static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
> > for (;;) {
> > prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
> >  
> > -   if (!get_pages(sbi, F2FS_WRITEBACK))
> > +   if (!atomic_read(&sbi->nr_wb_bios))
> > break;
> >  
> > io_schedule_timeout(5*HZ);
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 1013836..faef666 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -71,10 +71,9 @@ static void f2fs_write_end_io(struct bio *bio)
> > f2fs_stop_checkpoint(sbi);
> > }
> > end_page_writeback(page);
> > -   dec_page_count(sbi, F2FS_WRITEBACK);
> > }
> > -
> > -   if (!get_pages(sbi, F2FS_WRITEBACK) && wq_has_sleeper(&sbi->cp_wait))
> > +   if (atomic_dec_and_test(&sbi->nr_wb_bios) &&
> > +   wq_has_sleeper(&sbi->cp_wait))
> > wake_up(&sbi->cp_wait);
> >  
> > bio_put(bio);
> > @@ -98,6 +97,14 @@ static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, 
> > block_t blk_addr,
> > return bio;
> >  }
> >  
> > +static inline void __submit_bio(struct f2fs_sb_info *sbi, int rw,
> > +   struct bio *bio)
> > +{
> > +   if (!is_read_io(rw))
> > +   atomic_inc(&sbi->nr_wb_bios);
> > +   submit_bio(rw, bio);
> > +}
> > +
> >  static void __submit_merged_bio(struct f2fs_bio_info *io)
> >  {
> > struct f2fs_io_info *fio = &io->fio;
> > @@ -110,7 +117,7 @@ static void __submit_merged_bio(struct f2fs_bio_info 
> > *io)
> > else
> > trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio);
> >  
> > -   submit_bio(fio->rw, io->bio);
> > +   __submit_bio(io->sbi, fio->rw, io->bio);
> > io->bio = NULL;
> >  }
> >  
> > @@ -228,7 +235,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
> > return -EFAULT;
> > }
> >  
> > -   submit_bio(fio->rw, bio);
> > +   __submit_bio(fio->sbi, fio->rw, bio);
> > return 0;
> >  }
> >  
> > @@ -248,9 +255,6 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
> >  
> > down_write(&io->io_rwsem);
> >  
> > -   if (!is_read)
> > -   inc_page_count(sbi, F2FS_WRITEBACK);
> > -
> > if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
> > io->fio.rw != fio->rw))
> > __submit_merged_bio(io);
> > @@ -1047,7 +1051,7 @@ got_it:
> >  */
> > if (bio && (last_block_in_bio != block_nr - 1)) {
> >  submit_and_realloc:
> > -   submit_bio(READ, bio);
> > +   __submit_bio(F2FS_I_SB(inode), READ, bio);
> > bio = NULL;
> > }
> > if (bio == NULL) {
> > @@ -1090,7 +1094,7 @@ set_error_page:
> > goto next_page;
> >  confused:
> > if (bio) {
> > -   submit_bio(READ, bio);
> > +   __submit_bio(F2FS_I_SB(inode), READ, bio);
> > bio = NULL;
> > }
> > unlock_page(page);
> > @@ -1100,7 +1104,7 @@ next_page:
> > }
> > BUG_ON(pages && !list_empty(pages));
> > if (bio)
> > -   submit_bio(READ, bio);
> > +   __submit_bio(F2FS_I_SB(inode), READ, bio);
> > return 0;
> >  }
> >  
> > diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> > index 37615b2..a188973 100644
> > --- a/fs/f2fs/debug.c
> > +++ b/fs/f2fs/debug.c
> > @@ -48,7 +48,7 @@ static void update_general_status(struct f2fs_sb_info 
> > *sbi)
> > si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> > si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> > si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
> > -   si->wb_pages = get_pages(sbi, F2FS_WRITEBACK);
> > +   si->wb_bios = atomic_read(&sbi->nr_wb_bios);
> > si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
> > si->rsvd_segs = reserved_segments(sbi);
> >
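
The counting scheme under discussion can be sketched in userspace with C11 atomics: one increment per submitted write bio, one decrement per completion, and a wakeup only when the count reaches zero. The names below (nr_wb_bios_model, submit_write_model, end_write_model) are hypothetical, and the sketch assumes exactly one completion per submission, which is precisely the assumption Chao questions. It also leaves out the wq_has_sleeper() check for brevity.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int nr_wb_bios_model;	/* in-flight write bios */
static int wakeups_model;		/* wakeups issued (single-threaded demo) */

/* Called once when a write bio is submitted. */
static void submit_write_model(void)
{
	atomic_fetch_add(&nr_wb_bios_model, 1);
}

/*
 * Called once when a write bio completes. atomic_fetch_sub() returns
 * the previous value, so a result of 1 means this call dropped the
 * counter to zero (the role atomic_dec_and_test() plays in the patch).
 */
static bool end_write_model(void)
{
	if (atomic_fetch_sub(&nr_wb_bios_model, 1) == 1) {
		wakeups_model++;	/* stands in for waking the checkpoint waiter */
		return true;
	}
	return false;
}
```

If completions and submissions ever stopped being one-to-one, the counter in this model would either never reach zero or reach it too early, which is the failure mode raised in the review comment.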

Re: [PATCH] Staging: comedi: quatech_daqp_cs.c: fixed a warning issue

2016-05-17 Thread Amit Ghadge

On Tue, May 17, 2016 at 06:47:56AM -0700, Greg KH wrote:
> A: http://en.wikipedia.org/wiki/Top_post
> Q: Where do I find info about this thing called top-posting?
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> A: No.
> Q: Should I include quotations after my reply?
> 
> http://daringfireball.net/2007/07/on_top
Thanks for this valuable information.

> 
> On Tue, May 17, 2016 at 09:31:56AM +0530, Amit Ghadge wrote:
> > Hello Greg KH,
> > 
> > I made the patch the same way as the others. I'm new and I have never seen a
> > changelog in other patches.
> > 
> > Where do I add the changelog? I followed your tutorial.
> 
> It's the area in the email before the patch, it ends up in the changelog
> when the patch is committed to the kernel tree.  You wrote something
> this time, but it was vague and didn't make sense.  Please fix that up
> and resend.
I resend this patch with patch description.

> 
> greg k-h

[PATCH v4 3/5] locking/rwsem: Don't wake up one's own task

2016-05-17 Thread Waiman Long

As rwsem_down_read_failed() will queue itself and potentially call
__rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
will try to wake up its own task. This patch adds a check to make
sure that this won't happen.

Signed-off-by: Waiman Long 
Reviewed-by: Peter Hurley 
---
 kernel/locking/rwsem-xadd.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index c278f5a..007814f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -202,7 +202,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 */
smp_mb();
waiter->task = NULL;
-   wake_up_process(tsk);
+   if (tsk != current)
+   wake_up_process(tsk);
put_task_struct(tsk);
} while (--loop);
 
-- 
1.7.1
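
A minimal userspace sketch of the check this patch adds: when walking the wait queue, skip the wakeup for the entry that belongs to the caller. struct waiter_model, wake_readers_model() and the integer task ids are inventions for this example; the real code operates on task pointers and wake_up_process().

```c
#include <stddef.h>

struct waiter_model {
	int task_id;	/* stand-in for a task pointer */
	int woken;	/* set when a wakeup was issued for this waiter */
};

/*
 * Issue a wakeup for every waiter except the caller itself.
 * Returns the number of wakeups actually issued.
 */
static size_t wake_readers_model(struct waiter_model *w, size_t n,
				 int current_id)
{
	size_t issued = 0;

	for (size_t i = 0; i < n; i++) {
		if (w[i].task_id == current_id)
			continue;	/* the caller is already running */
		w[i].woken = 1;
		issued++;
	}
	return issued;
}
```

The saving is one wakeup call per queue walk at most, but it is a call that can never do useful work, since the caller is by definition already running.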

[PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code

2016-05-17 Thread Waiman Long

This patch moves the owner loading and checking code entirely inside of
rwsem_spin_on_owner() to simplify the logic of rwsem_optimistic_spin()
loop.

Suggested-by: Peter Hurley 
Signed-off-by: Waiman Long 
Reviewed-by: Peter Hurley 
---
 kernel/locking/rwsem-xadd.c |   38 --
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e3a7e06..a85a2bd 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -332,9 +332,16 @@ done:
return ret;
 }
 
-static noinline
-bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
+/*
+ * Return true only if we can still spin on the owner field of the rwsem.
+ */
+static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
+   struct task_struct *owner = READ_ONCE(sem->owner);
+
+   if (!rwsem_owner_is_writer(owner))
+   goto out;
+
rcu_read_lock();
while (sem->owner == owner) {
/*
@@ -354,7 +361,7 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
cpu_relax_lowlatency();
}
rcu_read_unlock();
-
+out:
/*
 * If there is a new owner or the owner is not set, we continue
 * spinning.
@@ -364,7 +371,6 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
-   struct task_struct *owner;
bool taken = false;
 
preempt_disable();
@@ -376,21 +382,17 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
if (!osq_lock(&sem->osq))
goto done;
 
-   while (true) {
-   owner = READ_ONCE(sem->owner);
+   /*
+* Optimistically spin on the owner field and attempt to acquire the
+* lock whenever the owner changes. Spinning will be stopped when:
+*  1) the owning writer isn't running; or
+*  2) readers own the lock as we can't determine if they are
+* actively running or not.
+*/
+   while (rwsem_spin_on_owner(sem)) {
/*
-* Don't spin if
-* 1) the owner is a reader as we we can't determine if the
-*reader is actively running or not.
-* 2) The rwsem_spin_on_owner() returns false which means
-*the owner isn't running.
+* Try to acquire the lock
 */
-   if (rwsem_owner_is_reader(owner) ||
-  (rwsem_owner_is_writer(owner) &&
-  !rwsem_spin_on_owner(sem, owner)))
-   break;
-
-   /* wait_lock will be acquired if write_lock is obtained */
if (rwsem_try_write_lock_unqueued(sem)) {
taken = true;
break;
@@ -402,7 +404,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 * we're an RT task that will live-lock because we won't let
 * the owner complete.
 */
-   if (!owner && (need_resched() || rt_task(current)))
+   if (!sem->owner && (need_resched() || rt_task(current)))
break;
 
/*
-- 
1.7.1

[PATCH v4 0/5] [PATCH v3 0/4] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Waiman Long

 v3->v4:
  - Add a new patch 2 to use WRITE_ONCE() for all rwsem->owner stores
to prevent store tearing.

 v2->v3:
  - Make minor code changes as suggested by PeterZ & Peter Hurley.
  - Add 2 minor patches (#2 & #3) to improve the rwsem code
  - Add a 4th patch to streamline the rwsem_optimistic_spin() code.

 v1->v2:
  - Add rwsem_is_reader_owned() helper & rename rwsem_reader_owned()
to rwsem_set_reader_owned().
  - Add more comments to clarify the purpose of some of the code
changes.

Patch 1 is the main patch of this series.

Patch 2 protects against store tearing of the rwsem->owner field, which
can cause problems when a reader tries to dereference it.

Patch 3 eliminates redundant wakeup caused by a reader waking itself.

Patch 4 improves the efficiency of the reader wakeup code.

Patch 5 streamlines the rwsem_optimistic_spin() to make it simpler.

Waiman Long (5):
  locking/rwsem: Add reader-owned state to the owner field
  locking/rwsem: Protect all writes to owner by WRITE_ONCE()
  locking/rwsem: Don't wake up one's own task
  locking/rwsem: Improve reader wakeup code
  locking/rwsem: Streamline the rwsem_optimistic_spin() code

 kernel/locking/rwsem-xadd.c |   75 --
 kernel/locking/rwsem.c  |8 +++-
 kernel/locking/rwsem.h  |   52 -
 3 files changed, 99 insertions(+), 36 deletions(-)

[PATCH v4 1/5] locking/rwsem: Add reader-owned state to the owner field

2016-05-17 Thread Waiman Long

Currently, it is not possible to determine for sure if a reader
owns a rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:

 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
the owner field is NULL which can mean either the readers own
the lock or the owning writer hasn't set the owner field yet.
In the latter case, we miss the chance to do optimistic spinning.

 2) While a writer is waiting in the OSQ and a reader takes the lock,
the writer will continue to spin when out of the OSQ in the main
rwsem_optimistic_spin() loop as the owner field is NULL, wasting
CPU cycles if some of the readers are sleeping.

Adding the new state will allow optimistic spinning to go forward as
long as the owner field is not RWSEM_READER_OWNED and the owner is
running, if set, but stop immediately when that state has been reached.
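
The three owner states described above (unset, reader-owned, writer-owned)
can be sketched in user space as follows. This is an illustration under
stated assumptions, not the contents of rwsem.h: struct task, READER_OWNED
and its value of 1, and should_spin() are hypothetical, chosen only so that
the sentinel is non-NULL and can never be a valid task pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for task_struct; only the on_cpu flag matters here. */
struct task {
	bool on_cpu;
};

/* Hypothetical sentinel marking "readers own the lock". */
#define READER_OWNED ((struct task *)1UL)

/* A writer owns the lock only if owner is a real task pointer. */
static inline bool owner_is_writer(const struct task *owner)
{
	return owner != NULL && owner != READER_OWNED;
}

static inline bool owner_is_reader(const struct task *owner)
{
	return owner == READER_OWNED;
}

/*
 * Spinning policy from the description above:
 *  - reader-owned: stop immediately;
 *  - writer-owned: spin only while that writer is running;
 *  - unset (NULL): keep spinning, the lock may be free or the
 *    new writer may not have set the owner field yet.
 */
static inline bool should_spin(const struct task *owner)
{
	if (owner_is_reader(owner))
		return false;
	if (owner_is_writer(owner))
		return owner->on_cpu;
	return true;
}

/* Usage example covering each state. */
void owner_state_demo(void)
{
	struct task running  = { .on_cpu = true };
	struct task sleeping = { .on_cpu = false };

	assert(should_spin(NULL));
	assert(!should_spin(READER_OWNED));
	assert(should_spin(&running));
	assert(!should_spin(&sleeping));
}
```

Encoding the reader-owned state in the owner field itself lets a spinner
tell all three cases apart from a single load.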

On a 4-socket Haswell machine running a 4.6-rc1 based kernel, fio
multithreaded randrw and randwrite tests were run on the same file on
an XFS partition on top of an NVDIMM. The aggregated bandwidths before
and after the patch were as follows:

  Test        BW before patch   BW after patch   % change
  ----        ---------------   --------------   --------
  randrw          988 MB/s         1192 MB/s       +21%
  randwrite      1513 MB/s         1623 MB/s       +7.3%

The perf profiles of the rwsem_down_write_failed() function in randrw
before and after the patch were:

   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

The actual CPU cycles spent in rwsem_down_write_failed() dropped from
5.88% to 1.52% after the patch.

xfstests was also run and no regression was observed.

Signed-off-by: Waiman Long 
Acked-by: Jason Low 
Acked-by: Davidlohr Bueso 
---
 kernel/locking/rwsem-xadd.c |   41 ++---
 kernel/locking/rwsem.c  |8 ++--
 kernel/locking/rwsem.h  |   41 +
 3 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09e30c6..c278f5a 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -155,6 +155,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
/* Last active locker left. Retry waking readers. */
goto try_reader_grant;
}
+   /*
+* It is not really necessary to set it to reader-owned here,
+* but it gives the spinners an early indication that the
+* readers now have the lock.
+*/
+   rwsem_set_reader_owned(sem);
}
 
/* Grant an infinite number of read locks to the readers at the front
@@ -306,16 +312,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
rcu_read_lock();
owner = READ_ONCE(sem->owner);
-   if (!owner) {
-   long count = READ_ONCE(sem->count);
+   if (!rwsem_owner_is_writer(owner)) {
/*
-* If sem->owner is not set, yet we have just recently entered the
-* slowpath with the lock being active, then there is a possibility
-* reader(s) may have the lock. To be safe, bail spinning in these
-* situations.
+* Don't spin if the rwsem is readers owned.
 */
-   if (count & RWSEM_ACTIVE_MASK)
-   ret = false;
+   ret = !rwsem_owner_is_reader(owner);
goto done;
}
 
@@ -328,8 +329,6 @@ done:
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
-   long count;
-
rcu_read_lock();
while (sem->owner == owner) {
/*
@@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
}
rcu_read_unlock();
 
-   if (READ_ONCE(sem->owner))
-   return true; /* new owner, continue spinning */
-
/*
-* When the owner is not set, the lock could be free or
-* held by readers. Check the counter to verify the
-* state.
+* If there is a new owner or the owner is not set, we continue
+* spinning.
 */
-   count = READ_ONCE(sem->count);
-   return (count == 0 || count == RWSEM_WAITING_BIAS);
+   return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
@@ -378,7 +372,16 @@ static bool rwsem_optimistic_spin(struct

[PATCH v4 3/5] locking/rwsem: Don't wake up one's own task

2016-05-17 Thread Waiman Long

As rwsem_down_read_failed() will queue itself and potentially call
__rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
will try to wake up its own task. This patch adds a check to make
sure that this won't happen.
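
The effect of the check can be sketched outside the kernel. Everything
below is hypothetical scaffolding (struct waiter, grant_and_wake(), integer
task ids) meant only to show the idea: every queued reader is granted the
lock, but no wakeup is issued for the task that is doing the waking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
	int task_id;   /* hypothetical task identifier */
	bool granted;  /* lock handed over (analogue of waiter->task = NULL) */
	bool woken;    /* a wakeup was issued for this task */
};

/*
 * Grant the lock to every queued reader and wake each one,
 * except the caller itself, which is already running.
 * Returns the number of wakeups issued.
 */
static size_t grant_and_wake(struct waiter *q, size_t n, int current_id)
{
	size_t wakeups = 0;

	for (size_t i = 0; i < n; i++) {
		q[i].granted = true;
		if (q[i].task_id != current_id) {
			q[i].woken = true;
			wakeups++;
		}
	}
	return wakeups;
}

/* Usage example: task 2 runs the wake loop while queued itself. */
void self_wake_demo(void)
{
	struct waiter q[3] = {
		{ .task_id = 1 }, { .task_id = 2 }, { .task_id = 3 },
	};

	assert(grant_and_wake(q, 3, 2) == 2);
	assert(q[1].granted && !q[1].woken);
}
```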

Signed-off-by: Waiman Long 
Reviewed-by: Peter Hurley 
---
 kernel/locking/rwsem-xadd.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index c278f5a..007814f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -202,7 +202,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 */
smp_mb();
waiter->task = NULL;
-   wake_up_process(tsk);
+   if (tsk != current)
+   wake_up_process(tsk);
put_task_struct(tsk);
} while (--loop);
 
-- 
1.7.1

[PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code

2016-05-17 Thread Waiman Long

This patch moves the owner loading and checking code entirely inside of
rwsem_spin_on_owner() to simplify the logic of the rwsem_optimistic_spin()
loop.

Suggested-by: Peter Hurley 
Signed-off-by: Waiman Long 
Reviewed-by: Peter Hurley 
---
 kernel/locking/rwsem-xadd.c |   38 --
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e3a7e06..a85a2bd 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -332,9 +332,16 @@ done:
return ret;
 }
 
-static noinline
-bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
+/*
+ * Return true only if we can still spin on the owner field of the rwsem.
+ */
+static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
+   struct task_struct *owner = READ_ONCE(sem->owner);
+
+   if (!rwsem_owner_is_writer(owner))
+   goto out;
+
rcu_read_lock();
while (sem->owner == owner) {
/*
@@ -354,7 +361,7 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
cpu_relax_lowlatency();
}
rcu_read_unlock();
-
+out:
/*
 * If there is a new owner or the owner is not set, we continue
 * spinning.
@@ -364,7 +371,6 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
-   struct task_struct *owner;
bool taken = false;
 
preempt_disable();
@@ -376,21 +382,17 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
if (!osq_lock(&sem->osq))
goto done;
 
-   while (true) {
-   owner = READ_ONCE(sem->owner);
+   /*
+* Optimistically spin on the owner field and attempt to acquire the
+* lock whenever the owner changes. Spinning will be stopped when:
+*  1) the owning writer isn't running; or
+*  2) readers own the lock as we can't determine if they are
+* actively running or not.
+*/
+   while (rwsem_spin_on_owner(sem)) {
/*
-* Don't spin if
-* 1) the owner is a reader as we we can't determine if the
-*reader is actively running or not.
-* 2) The rwsem_spin_on_owner() returns false which means
-*the owner isn't running.
+* Try to acquire the lock
 */
-   if (rwsem_owner_is_reader(owner) ||
-  (rwsem_owner_is_writer(owner) &&
-  !rwsem_spin_on_owner(sem, owner)))
-   break;
-
-   /* wait_lock will be acquired if write_lock is obtained */
if (rwsem_try_write_lock_unqueued(sem)) {
taken = true;
break;
@@ -402,7 +404,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 * we're an RT task that will live-lock because we won't let
 * the owner complete.
 */
-   if (!owner && (need_resched() || rt_task(current)))
+   if (!sem->owner && (need_resched() || rt_task(current)))
break;
 
/*
-- 
1.7.1

