Re: Linux-next parallel cp workload hang
On Wed, May 18, 2016 at 09:46:15AM +0800, Xiong Zhou wrote:
> Hi,
>
> Parallel cp workload (xfstests generic/273) hangs like below.
> It's reproducible with a small chance, less than 1/100 I think.
>
> Have hit this in linux-next 20160504 0506 0510 trees, testing on
> xfs with loop or block device. Ext4 survived several rounds
> of testing.
>
> Linux next 20160510 tree hangs within 500 rounds testing several
> times. The same tree with vfs parallel lookup patchset reverted
> survived 900 rounds testing. Reverted commits are attached.

What hardware?

> Bisecting in this patchset identified this commit:
>
> 3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit
> commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f
> Author: Al Viro
> Date: Wed Apr 20 23:42:46 2016 -0400
>
>     simple local filesystems: switch to ->iterate_shared()
>
>     no changes needed (XFS isn't simple, but it has the same parallelism
>     in the interesting parts exercised from CXFS).
>
> With this commit reverted on top of Linux next 0510 tree, 5000+ rounds
> of testing passed.
>
> Although 2000 rounds testing had been conducted before good/bad
> verdict, I'm not 100 percent sure about all this, since it's
> so hard to hit, and I am not that lucky..
>
> Bisect log and full blocked state process dump log are also attached.
>
> Furthermore, this was first hit when testing fs dax on nvdimm,
> however it's reproducible without dax mount option, and also
> reproducible on loop device, just seems harder to hit.
>
> Thanks,
> Xiong
>
> [0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
> [0.794263] Not tainted 4.6.0-rc6-next-20160504 #5
> [0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [0.841801] cp D 880b4e977928 0 49033 49014
> 0x0080
> [0.868923] 880b4e977928 880ba275d380 880b8d712b80
> 880b4e978000
> [0.897504] 7fff 0002
> 880b8d712b80
> [0.925234] 880b4e977940 816cbc25 88035a1dabb0
> 880b4e9779e8
> [0.953237] Call Trace:
> [0.958314] [] schedule+0x35/0x80
> [0.974854] [] schedule_timeout+0x231/0x2d0
> [0.995728] [] ? down_trylock+0x2d/0x40
> [1.015351] [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
> [1.040182] [] __down_common+0xaa/0x104
> [1.059021] [] ? _xfs_buf_find+0x162/0x340 [xfs]
> [1.081357] [] __down+0x1d/0x1f
> [1.097166] [] down+0x41/0x50
> [1.112869] [] xfs_buf_lock+0x3c/0xf0 [xfs]
> [1.134504] [] _xfs_buf_find+0x162/0x340 [xfs]
> [1.156871] [] xfs_buf_get_map+0x2a/0x270 [xfs]

So what's holding that directory data buffer lock? It should only be
held if there is either IO in progress, or a modification of the
buffer in progress that is blocked somewhere else.

> [1.180010] [] xfs_buf_read_map+0x2d/0x180 [xfs]
> [1.203538] [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
> [1.229194] [] xfs_da_read_buf+0xd1/0x100 [xfs]
> [1.251948] [] xfs_dir3_data_read+0x26/0x60 [xfs]
> [1.275736] [] xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0
> [xfs]
> [1.305094] [] ? down_read+0x12/0x30
> [1.323787] [] ? xfs_ilock+0xe4/0x110 [xfs]
> [1.345114] [] xfs_dir2_leaf_getdents+0x13b/0x3d0 [xfs]
> [1.371818] [] xfs_readdir+0x1a6/0x1c0 [xfs]

So we should be holding the ilock in shared mode here...

> [1.393471] [] xfs_file_readdir+0x2b/0x30 [xfs]
> [1.416874] [] iterate_dir+0x173/0x190
> [1.436709] [] ? do_audit_syscall_entry+0x66/0x70
> [1.460951] [] SyS_getdents+0x98/0x120
> [1.480566] [] ? iterate_dir+0x190/0x190
> [1.500909] [] do_syscall_64+0x62/0x110
> [1.520847] [] entry_SYSCALL64_slow_path+0x25/0x25
> [1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
> [1.568933] Not tainted 4.6.0-rc6-next-20160504 #5
> [1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [1.618544] cp D 880b91463b00 0 49040 49016
> 0x0080
> [1.645502] 880b91463b00 880464d5c140 88029b475700
> 880b91464000
> [1.674145] 880411c42610 880411c42628
> 8802c10bc610
> [1.702834] 880b91463b18 816cbc25 88029b475700
> 880b91463b88
> [1.731501] Call Trace:
> [1.736866] [] schedule+0x35/0x80
> [1.754119] [] rwsem_down_read_failed+0xf2/0x140
> [1.777411] [] ? xfs_ilock_data_map_shared+0x30/0x40
> [xfs]
> [1.805090] [] call_rwsem_down_read_failed+0x18/0x30
> [1.830482] [] down_read+0x20/0x30
> [1.848505] [] xfs_ilock+0xe4/0x110 [xfs]
> [1.869293] [] xfs_ilock_data_map_shared+0x30/0x40

And this is an attempt to lock the inode shared, so if that is failing
while there's another shared holder, that means there's an exclusive
waiter queued up (i.e. read held -> write blocked -> read blocked).

So looking at dump-g273xfs0510:

[ 845.727907] INFO: task cp:40126 blocked for more than 120 seconds.
[ 845.751175] Not tainted 4.6.0-rc7-next-20160510 #9
[ 845.770011] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
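The inference in the reply above, that a shared request blocking behind an existing shared holder implies a queued exclusive waiter (read held -> write blocked -> read blocked), can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's rwsem implementation: the struct and function names are hypothetical, and the only property modelled is that new readers are refused once a writer is queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer lock state. A simplified model for illustration,
 * not the kernel's rwsem implementation. */
struct toy_rwsem {
	int readers;         /* tasks currently holding the lock shared */
	bool writer_active;  /* a task currently holds the lock exclusive */
	int writers_waiting; /* exclusive requests queued behind the readers */
};

/* A new shared request must wait if a writer holds the lock or, to avoid
 * starving writers, if any writer is already queued. */
static bool toy_read_would_block(const struct toy_rwsem *s)
{
	return s->writer_active || s->writers_waiting > 0;
}

/* A new exclusive request must wait while anyone holds the lock. */
static bool toy_write_would_block(const struct toy_rwsem *s)
{
	return s->writer_active || s->readers > 0;
}

/* Release one shared hold; only legal if one is actually held. */
static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}
```

Read against the traces: the first cp holds the ilock shared while it waits on the directory buffer lock; if a writer then queues on the ilock, the second cp's shared request blocks even though only a reader holds the lock, and nothing moves until that first reader gets its buffer.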
Re: [PATCH 1/2] arm64: dts: NS2: Add all of the UARTs
On 2016/5/18 3:56, Florian Fainelli wrote:
> On 05/15/2016 06:11 PM, Kefeng Wang wrote:
>>> I can confirm that with your change and the change to the bootargs you
>>> describe above, it works as desired. Was your change already accepted?
>>>
>>
>> Great, thanks a lot. It is still being reviewed and is waiting for
>> a response.
>
> Then I would be inclined to take Jon's patch as-is, and follow up with
> an additional patch to the NS2 DTS once yours lands in.
>

Sure, please.

Kefeng
[RFC][PATCH 8/7] sched/fair: Use utilization distance to filter affine sync wakeups
On Mon, 2016-05-09 at 12:48 +0200, Peter Zijlstra wrote:
> Hai,

(got some of the frozen variety handy?:)

> here be a semi coherent patch series for the recent select_idle_siblings()
> tinkering. Happy benchmarking..

And tinkering on top of your rewrite series...

sched/fair: Use utilization distance to filter affine sync wakeups

Identifying truly synchronous tasks accurately is annoyingly fragile,
which led to the demise of the old avg_overlap heuristic. As a result, we
now schedule high-frequency localhost communicating buddies to L3 rather
than L2, causing them to take painful cache misses needlessly.

To combat this, track average utilization distance, and when both waker
and wakee are short-duration tasks cycling at about the same frequency
(i.e. they can't have any appreciable reclaimable overlap), and the sync
hint has been passed, take that as a cue that pulling the wakee to the
hot L2 is very likely to be a win.

Changes in behavior, such as taking a long nap, bursts of other activity,
or sharing the rq with tasks that are not cycling rapidly, will quickly
encourage the pair to search for a new home, where they can again find
each other.

This only helps really fast movers, but that's ok (if we can get away
with it at all), as these are the ones that need some help. It's dirt
simple, cheap, and seems to work pretty well. It does help fast movers,
does not wreck the lmbench AF_UNIX/TCP throughput gains that
select_idle_sibling() provided, and didn't change pgbench numbers one
bit on my desktop box; i.e. the tight discrimination criteria seem to
work out ok in light testing, so _maybe_ not completely useless...
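The filter described above can be sketched as a pure function. This is a hypothetical illustration, not the patch's code: only the idea (both tasks cycle fast, at roughly the same rate, and the wakeup carries the sync hint) comes from the description. Treating "utilization distance" as the average gap between successive wakeups, the struct, the 1/8 averaging weight and both thresholds are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task state. "Utilization distance" is modelled here
 * as a running average of the time between successive wakeups. */
struct toy_task {
	uint64_t last_wake_us; /* timestamp of the previous wakeup */
	uint64_t avg_dist_us;  /* running average gap between wakeups */
};

/* Assumed thresholds, for illustration only. */
#define TOY_FAST_US  50 /* both tasks must cycle faster than this */
#define TOY_SLACK_US 10 /* ...and within this much of each other */

/* Fold a new wakeup into the average with an assumed 1/8 weight. */
static void toy_update_dist(struct toy_task *t, uint64_t now_us)
{
	assert(now_us >= t->last_wake_us); /* clock must not run backwards */
	t->avg_dist_us = (7 * t->avg_dist_us + (now_us - t->last_wake_us)) / 8;
	t->last_wake_us = now_us;
}

/* Pull the wakee next to the waker only for sync wakeups between two
 * fast, similarly paced tasks. */
static bool toy_want_affine_sync(const struct toy_task *waker,
				 const struct toy_task *wakee, bool sync)
{
	uint64_t a = waker->avg_dist_us, b = wakee->avg_dist_us;
	uint64_t gap = a > b ? a - b : b - a;

	return sync && a < TOY_FAST_US && b < TOY_FAST_US && gap <= TOY_SLACK_US;
}
```

Because the average is updated on every wakeup, a long nap or a burst of unrelated activity quickly pushes avg_dist_us above the threshold, which is the "search for a new home" behaviour the description relies on.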
4 x E7-8890 tbench
Throughput 598.158 MB/sec 1 clients 1 procs max_latency=0.287 ms 1.000
Throughput 1166.26 MB/sec 2 clients 2 procs max_latency=0.076 ms 1.000
Throughput 2214.55 MB/sec 4 clients 4 procs max_latency=0.087 ms 1.000
Throughput 4264.44 MB/sec 8 clients 8 procs max_latency=0.164 ms 1.000
Throughput 7780.58 MB/sec 16 clients 16 procs max_latency=0.109 ms 1.000
Throughput 15199.3 MB/sec 32 clients 32 procs max_latency=0.293 ms 1.000
Throughput 21714.8 MB/sec 64 clients 64 procs max_latency=0.872 ms 1.000
Throughput 44916.1 MB/sec 128 clients 128 procs max_latency=4.821 ms 1.000
Throughput 76294.5 MB/sec 256 clients 256 procs max_latency=7.375 ms 1.000

+IDLE_SYNC
Throughput 737.781 MB/sec 1 clients 1 procs max_latency=0.248 ms 1.233
Throughput 1478.49 MB/sec 2 clients 2 procs max_latency=0.321 ms 1.267
Throughput 2506.98 MB/sec 4 clients 4 procs max_latency=0.413 ms 1.132
Throughput 4359.15 MB/sec 8 clients 8 procs max_latency=0.306 ms 1.022
Throughput 9025.05 MB/sec 16 clients 16 procs max_latency=0.349 ms 1.159
Throughput 18703.1 MB/sec 32 clients 32 procs max_latency=0.290 ms 1.230
Throughput 33600.8 MB/sec 64 clients 64 procs max_latency=6.469 ms 1.547
Throughput 59084.3 MB/sec 128 clients 128 procs max_latency=5.031 ms 1.315
Throughput 75705.8 MB/sec 256 clients 256 procs max_latency=24.113 ms 0.992

1 x i4790 lmbench3

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host      OS            Pipe AF   TCP  File   Mmap   Bcopy  Bcopy  Mem  Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
IDLE_CORE+IDLE_CPU+IDLE_SMT
homer     4.6.0-masterx 6027 14.K 9773 8905.2  15.2K  10.1K 6775.0 15.K 10.0K
homer     4.6.0-masterx 5962 14.K 9881 8900.7  15.0K  10.1K 6785.2 15.K 10.0K
homer     4.6.0-masterx 5935 14.K 9917 8946.2  15.0K  10.1K 6761.8 15.K 9826.
+IDLE_SYNC
homer     4.6.0-masterx 8865 14.K 9807 8880.6  14.7K  10.1K 6777.9 15.K 9966.
homer     4.6.0-masterx 8855 13.K 9856 8844.5  15.2K  10.1K 6752.1 15.K 10.0K
homer     4.6.0-masterx 8896 14.K 9836 8880.1  15.0K  10.2K 6771.6 15.K 9941.
                        ^++  ^+-  ^+-
select_idle_sibling() completely disabled
homer     4.6.0-masterx 8810 9807 7109 8982.8  15.4K  10.2K 6831.7 15.K 10.1K
homer     4.6.0-masterx 8877 9757 6864 8970.1  15.3K  10.2K 6826.6 15.K 10.1K
homer     4.6.0-masterx 8779 9736 10.K 8975.6  15.4K  10.1K 6830.2 15.K 10.1K
                        ^++  ^--  ^--

Signed-off-by: Mike Galbraith
---
 include/linux/sched.h   |  2 -
 kernel/sched/core.c     |  6 -
 kernel/sched/fair.c     | 49 +---
 kernel/sched/features.h |  1
 kernel/sched/sched.h    |  7 ++
 5 files changed, 51 insertions(+), 14 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1302,7 +1302,7 @@ struct load_weight {
  * issues.
  */
 struct sched_avg {
-	u64 last_update_time, load_sum;
+	u64 last_update_time, load_sum, util_dist_us;
 	u32 util_sum, period_contrib;
 	unsigned
Re: UBSAN: Undefined behaviour in block/blk-mq.c:1459:27 with pata_amd
> Does the patch below help?
>
> From: Bartlomiej Zolnierkiewicz
> Subject: [PATCH] blk-mq: fix undefined behaviour in order_to_size()
>
> When this_order variable in blk_mq_init_rq_map() becomes zero
> the code incorrectly decrements the variable and passes the result
> to order_to_size() helper causing undefined behaviour:
>
> UBSAN: Undefined behaviour in block/blk-mq.c:1459:27
> shift exponent 4294967295 is too large for 32-bit type 'unsigned int'
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc6-00072-g33656a1 #22
>
> Fix the code by checking this_order variable for not having the zero
> value first.
>
> Reported-by: Meelis Roos
> Fixes: 320ae51feed5 ("blk-mq: new multi-queue block IO queueing mechanism")
> Signed-off-by: Bartlomiej Zolnierkiewicz

It fixes the warning independently of the PATA driver: ata_piix, pata_via,
pata_serverworks, pata_macio and 3ware were fixed too.

--
Meelis Roos (mr...@linux.ee)
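The fix described above relies on C's short-circuit evaluation of &&: testing this_order first means order_to_size(this_order - 1) is never evaluated once this_order reaches zero, so the unsigned subtraction cannot wrap to 4294967295. A user-space sketch, assuming 4 KiB pages and a simplified loop modelled on the patch description (pick_order() is a hypothetical name, and this is not the kernel code itself):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Simplified stand-in for the blk-mq helper: bytes in 2^order pages,
 * assuming a 4 KiB page size. */
static size_t order_to_size(unsigned int order)
{
	/* Shifting by >= the width of the type is undefined behaviour;
	 * this is what UBSAN reported for order == UINT_MAX. */
	assert(order < CHAR_BIT * sizeof(size_t));
	return (size_t)4096 << order;
}

/* Step this_order down while 'left' is still smaller than the
 * next-smaller chunk. Checking this_order first short-circuits the
 * call, so this_order - 1 is never computed when this_order == 0. */
static unsigned int pick_order(unsigned int max_order, size_t left)
{
	unsigned int this_order = max_order;

	while (this_order && left < order_to_size(this_order - 1))
		this_order--;
	return this_order;
}
```

With the operands of && swapped, pick_order(4, 100) would end up calling order_to_size(UINT_MAX) and trip the assertion; without the assertion, that is the undefined shift UBSAN flagged.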
Re: [PATCH] arc: axs103_smp: Fix CPU frequency to 100MHz for dual-core
On Monday 16 May 2016 03:27 PM, Alexey Brodkin wrote:
> The most recent release of AXS103 [v1.1] is proven to work
> at 100 MHz in dual-core mode so this change uses mentioned feature.
> For that we:
> * Update axc003_idu.dtsi with mention of really-used CPU clock freq
> * Remove clock override in AXS platform code for dual-core HW
>
> Note we're still leaving a hack for clock "downgrade" on early boot
> for quad-core hardware.
>
> Also note this change will break functionality of AXS103 v1.0 hardware.
> That means all users of AXS103 __must__ upgrade their boards with the
> most recent firmware.
>
> Signed-off-by: Alexey Brodkin

Applied for 4.7

Thx,
-Vineet
Re: [PATCH] ARC: Troubleshoot execution of UP Linux on SMP HW and vice versa
On Wednesday 18 May 2016 02:36 AM, Alexey Brodkin wrote:
> ARC SMP hardware heavily relies on Interrupt Distribution Unit (IDU)
> for all interrupts serving. And UP ARC hardware lacks this block.
>
> That leads to incompatibility between UP and SMP Linux builds.
>
> Even though UP build of Linux will run on SMP hardware at some
> point strange behavior will appear. Very good example is serial port
> will stop functioning once it switches from earlycon driver (which
> doesn't use interrupts) to full-scale serial driver (that will rely
> on interrupts).
>
> The same is applicable to reverse combination: SMP build won't
> work on UP hardware and symptoms will be pretty much the same.

This is not necessarily correct. I'm pretty sure that if you have the right
DT (even though it is embedded, you can have U-Boot provide a different
one), an SMP kernel will in fact boot on UP hardware, and if not, we should
actively try to achieve that. That is where the world is moving: FWIW, the
ARM64 kernel forces CONFIG_SMP and doesn't even support UP anymore.

> And so to save [especially newcomers] from spending hours in
> frustration we're doing a check very early on boot if the kernel was
> configured with CONFIG_ARC_MCIP (which is automatically selected as
> a dependency of CONFIG_SMP) and in run-time we're seeing SMP-specific
> register that holds a number of SMP cores.
>
> Signed-off-by: Alexey Brodkin
> ---
> arch/arc/kernel/setup.c | 12 ...
> @@ -374,6 +375,8 @@ static inline int is_kernel(unsigned long addr)
>
>  void __init setup_arch(char **cmdline_p)
>  {
> +	unsigned int num_cores;
> +
>  #ifdef CONFIG_ARC_UBOOT_SUPPORT
>  	/* make sure that uboot passed pointer to cmdline/dtb is valid */
>  	if (uboot_tag && is_kernel((unsigned long)uboot_arg))
> @@ -413,6 +416,15 @@ void __init setup_arch(char **cmdline_p)
>  	if (machine_desc->init_early)
>  		machine_desc->init_early();
>
> +	num_cores = (read_aux_reg(ARC_REG_MCIP_BCR) >> 16) & 0x3F;
> +#ifdef CONFIG_ARC_MCIP
> +	if (!num_cores)
> +		panic("SMP kernel is run on a UP hardware!\n");
> +#else
> +	if (num_cores)
> +		panic("UP kernel is run on a SMP hardware!\n");
> +#endif

This is ugly: if the AXS platform has trouble booting with a UP/SMP hw/sw
mismatch, do the check in the platform early init code, w/o littering
platform-agnostic code unless absolutely necessary.

> +
>  	smp_init_cpus();
>
>  	setup_processor();
>
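The check in the quoted hunk boils down to extracting a 6-bit field (bits 21:16) from the MCIP build configuration register and comparing it against the kernel configuration. A pure-C sketch of that logic, with the register read replaced by a plain integer argument; mcip_num_cores() and up_smp_mismatch() are hypothetical names, and where such a check should live (generic setup_arch() or platform early init) is exactly what the review is questioning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same field extraction as the quoted hunk: bits 21:16 of the raw
 * ARC_REG_MCIP_BCR value hold the number of cores. */
static unsigned int mcip_num_cores(uint32_t bcr)
{
	return (bcr >> 16) & 0x3F;
}

/* Hypothetical helper: true when the build and the hardware disagree,
 * i.e. an MCIP (SMP) kernel sees no cores, or a UP kernel sees some. */
static bool up_smp_mismatch(uint32_t bcr, bool kernel_has_mcip)
{
	unsigned int num_cores = mcip_num_cores(bcr);

	return kernel_has_mcip ? num_cores == 0 : num_cores != 0;
}
```

Keeping the decision in a side-effect-free helper like this would let a platform's early init code call it and choose its own reaction, rather than panicking from common code.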
Re: [PATCH] net: au1000 eth: simplify logical expression
On 17/05/2016 16:58, Heinrich Schuchardt wrote:
> (a && a > 0) is equivalent to (a > 0).
>
> Signed-off-by: Heinrich Schuchardt

Acked-by: Florian Fainelli
--
Florian
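The equivalence holds because a > 0 already implies a != 0: for positive a both forms are true, and for a <= 0 both are false. A small check over integer values, assuming the driver's variable is an ordinary integer (the function names are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as written before the patch. */
static bool cond_before(int a)
{
	return a && a > 0;
}

/* The simplified condition: a > 0 already implies a != 0. */
static bool cond_after(int a)
{
	return a > 0;
}

/* Both forms must agree for every value. */
static void check_equivalent(int a)
{
	assert(cond_before(a) == cond_after(a));
}
```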
Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'
On 17/05/2016 21:37, Guenter Roeck wrote:
> Hi,
>
> my xtensa qemu tests crash in -next as follows.
>
> [ ... ]
>
> [9.366256] libphy: ethoc-mdio: probed
> [9.367389] (null): could not attach to PHY
> [9.368555] (null): failed to probe MDIO bus
> [9.371540] Unable to handle kernel paging request at virtual address 001c
> [9.371540] pc = d0320926, ra = 903209d1
> [9.375358] Oops: sig: 11 [#1]
> [9.376081] PREEMPT
> [9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
> [9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
> [9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001 d7f45c00 d7c31bd0
> [9.382298] a08: 00060100 d04b0c10 d7f45dfc d7c31bb0
> [9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 001c
> [9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 0011
> [9.388173]
> Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00
>        d0485dcc d0485dcc d7fb5810 d7c2c000 d7c31c30 d7f45c00 d025befc
>        d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 d7f45c34
> [9.396652] Call Trace:
> [9.397469] [] __device_release_driver+0x7d/0x98
> [9.398869] [] device_release_driver+0x15/0x20
> [9.400247] [] bus_remove_device+0xc1/0xd4
> [9.401569] [] device_del+0x109/0x15c
> [9.402794] [] phy_mdio_device_remove+0xd/0x18
> [9.404124] [] mdiobus_unregister+0x40/0x5c
> [9.405444] [] ethoc_probe+0x534/0x5b8
> [9.406742] [] platform_drv_probe+0x28/0x48
> [9.408122] [] driver_probe_device+0x101/0x234
> [9.409499] [] __driver_attach+0x7d/0x98
> [9.410809] [] bus_for_each_dev+0x30/0x5c
> [9.412104] [] driver_attach+0x14/0x18
> [9.413385] [] bus_add_driver+0xc9/0x198
> [9.414686] [] driver_register+0x70/0xa0
> [9.416001] [] __platform_driver_register+0x24/0x28
> [9.417463] [] ethoc_driver_init+0x10/0x14
> [9.418824] [] do_one_initcall+0x80/0x1ac
> [9.420083] [] kernel_init_freeable+0x131/0x198
> [9.421504] [] kernel_init+0xc/0xb0
> [9.422693] [] ret_from_kernel_thread+0x8/0xc
>
> Bisect points to commit da47b4572056 ("phy: add support for a reset-gpio
> specification").
> Bisect log is attached. Reverting the patch fixes the problem.

Aside from what you pointed out, this patch was still in discussion when it got merged, since we got a concurrent patch from Sergei which tries to deal with the same kind of problem.

Do you mind sending a revert? Otherwise I can do that first thing in the morning.

>
> I think there may be a number of problems, all of them exposed by the patch
> but really separate.
>
> GPIOLIB is not configured in my test case, meaning gpiod_get_optional()
> returns -ENOSYS, and phy_probe() thus returns an error. Question here is if
> it is really appropriate for the XXX_optional() gpiolib functions to return
> an error if GPIOLIB is not configured. Either case, result is that pretty
> much all phy registrations will now fail if GPIOLIB is not configured.
>
> Also, I suspect that there may be a bug in the error handling path
> of ethoc_probe(). No idea what exactly is wrong, though. Other drivers
> use pretty much the same code sequence for mdio registration and associated
> error handling.
>
> Last but not least, something seems to be wrong with the use of dev_err()
> with >dev if register_netdev() has not yet been called. Maybe someone
> has some insight ?

It all depends on whether SET_NETDEV_DEV() has had a chance to run, but in general it is kind of a bad idea to use netdev_* before the interface has been registered, since it won't have any valid name.

--
Florian
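Florian's point about logging before register_netdev() can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct model_dev is a hypothetical stand-in for a device and does not reproduce struct device or netdev_printk(). It captures only two facts. First, the kernel's printf-style formatting prints "(null)" for a NULL string, which is presumably where the "(null):" prefix in the log above comes from. Second, once a parent has been set (which is what SET_NETDEV_DEV() does), a better-named device is available to fall back on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-simplified device model (NOT the kernel's struct device). */
struct model_dev {
	const char *name;               /* NULL until a name has been assigned */
	const struct model_dev *parent; /* NULL until something like SET_NETDEV_DEV() runs */
};

/* Like the kernel's vsnprintf(), print "(null)" for a NULL string. */
static const char *shown_name(const struct model_dev *dev)
{
	return dev->name ? dev->name : "(null)";
}

/*
 * Build "<name>: <msg>". If the device itself has no name yet, fall back
 * to its parent (when one is set), which usually already has a name.
 */
static void format_log(char *buf, size_t len, const struct model_dev *dev,
		       const char *msg)
{
	const struct model_dev *who = dev;

	assert(buf != NULL && len > 0);
	if (!dev->name && dev->parent)
		who = dev->parent;
	snprintf(buf, len, "%s: %s", shown_name(who), msg);
}
```

In a real probe routine, logging through the platform device that is already registered (for example dev_err(&pdev->dev, ...)) is one common way to avoid an unnamed prefix. Whether that is the right change for ethoc_probe() is an assumption here and is not stated in the thread.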
[PATCH 1/1] Staging: comedi: fix CHECK: Prefer using the BIT macro issues in pcmmio.c
This patch replaces all occurrences of (1 << n) with the BIT(n) macro.

---
 drivers/staging/comedi/drivers/pcmmio.c | 40 -
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/comedi/drivers/pcmmio.c b/drivers/staging/comedi/drivers/pcmmio.c
index 10472e6..70ad497 100644
--- a/drivers/staging/comedi/drivers/pcmmio.c
+++ b/drivers/staging/comedi/drivers/pcmmio.c
@@ -84,25 +84,25 @@
 #define PCMMIO_AI_LSB_REG 0x00
 #define PCMMIO_AI_MSB_REG 0x01
 #define PCMMIO_AI_CMD_REG 0x02
-#define PCMMIO_AI_CMD_SE (1 << 7)
-#define PCMMIO_AI_CMD_ODD_CHAN (1 << 6)
+#define PCMMIO_AI_CMD_SE BIT(7)
+#define PCMMIO_AI_CMD_ODD_CHAN BIT(6)
 #define PCMMIO_AI_CMD_CHAN_SEL(x) (((x) & 0x3) << 4)
 #define PCMMIO_AI_CMD_RANGE(x) (((x) & 0x3) << 2)
 #define PCMMIO_RESOURCE_REG 0x02
 #define PCMMIO_RESOURCE_IRQ(x) (((x) & 0xf) << 0)
 #define PCMMIO_AI_STATUS_REG 0x03
-#define PCMMIO_AI_STATUS_DATA_READY (1 << 7)
-#define PCMMIO_AI_STATUS_DATA_DMA_PEND (1 << 6)
-#define PCMMIO_AI_STATUS_CMD_DMA_PEND (1 << 5)
-#define PCMMIO_AI_STATUS_IRQ_PEND (1 << 4)
-#define PCMMIO_AI_STATUS_DATA_DRQ_ENA (1 << 2)
-#define PCMMIO_AI_STATUS_REG_SEL (1 << 3)
-#define PCMMIO_AI_STATUS_CMD_DRQ_ENA (1 << 1)
-#define PCMMIO_AI_STATUS_IRQ_ENA (1 << 0)
+#define PCMMIO_AI_STATUS_DATA_READY BIT(7)
+#define PCMMIO_AI_STATUS_DATA_DMA_PEND BIT(6)
+#define PCMMIO_AI_STATUS_CMD_DMA_PEND BIT(5)
+#define PCMMIO_AI_STATUS_IRQ_PEND BIT(4)
+#define PCMMIO_AI_STATUS_DATA_DRQ_ENA BIT(2)
+#define PCMMIO_AI_STATUS_REG_SEL BIT(3)
+#define PCMMIO_AI_STATUS_CMD_DRQ_ENA BIT(1)
+#define PCMMIO_AI_STATUS_IRQ_ENA BIT(0)
 #define PCMMIO_AI_RES_ENA_REG 0x03
 #define PCMMIO_AI_RES_ENA_CMD_REG_ACCESS (0 << 3)
-#define PCMMIO_AI_RES_ENA_AI_RES_ACCESS (1 << 3)
-#define PCMMIO_AI_RES_ENA_DIO_RES_ACCESS (1 << 4)
+#define PCMMIO_AI_RES_ENA_AI_RES_ACCESS BIT(3)
+#define PCMMIO_AI_RES_ENA_DIO_RES_ACCESS BIT(4)
 #define PCMMIO_AI_2ND_ADC_OFFSET 0x04
 
 #define PCMMIO_AO_LSB_REG 0x08
@@ -125,14 +125,14 @@
 #define PCMMIO_AO_CMD_CHAN_SEL(x) (((x) & 0x03) << 1)
 #define PCMMIO_AO_CMD_CHAN_SEL_ALL (0x0f << 0)
 #define PCMMIO_AO_STATUS_REG 0x0b
-#define PCMMIO_AO_STATUS_DATA_READY (1 << 7)
-#define PCMMIO_AO_STATUS_DATA_DMA_PEND (1 << 6)
-#define PCMMIO_AO_STATUS_CMD_DMA_PEND (1 << 5)
-#define PCMMIO_AO_STATUS_IRQ_PEND (1 << 4)
-#define PCMMIO_AO_STATUS_DATA_DRQ_ENA (1 << 2)
-#define PCMMIO_AO_STATUS_REG_SEL (1 << 3)
-#define PCMMIO_AO_STATUS_CMD_DRQ_ENA (1 << 1)
-#define PCMMIO_AO_STATUS_IRQ_ENA (1 << 0)
+#define PCMMIO_AO_STATUS_DATA_READY BIT(7)
+#define PCMMIO_AO_STATUS_DATA_DMA_PEND BIT(6)
+#define PCMMIO_AO_STATUS_CMD_DMA_PEND BIT(5)
+#define PCMMIO_AO_STATUS_IRQ_PEND BIT(4)
+#define PCMMIO_AO_STATUS_DATA_DRQ_ENA BIT(2)
+#define PCMMIO_AO_STATUS_REG_SEL BIT(3)
+#define PCMMIO_AO_STATUS_CMD_DRQ_ENA BIT(1)
+#define PCMMIO_AO_STATUS_IRQ_ENA BIT(0)
 #define PCMMIO_AO_RESOURCE_ENA_REG 0x0b
 #define PCMMIO_AO_2ND_DAC_OFFSET 0x04
 
-- 
1.9.1
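As a quick sanity check of a conversion like this one, the sketch below copies the kernel's definition of BIT() from <linux/bitops.h> (`#define BIT(nr) (1UL << (nr))` in kernels of this era) and compares a few of the old and new masks from the patch. The OLD_/NEW_ prefixed names are hypothetical and exist only for this comparison.

```c
#include <assert.h>

/* Local copy of the kernel's definition from <linux/bitops.h>. */
#define BIT(nr) (1UL << (nr))

/* Old and new spellings of a few pcmmio.c masks (prefixes are hypothetical). */
#define OLD_PCMMIO_AI_CMD_SE          (1 << 7)
#define NEW_PCMMIO_AI_CMD_SE          BIT(7)
#define OLD_PCMMIO_AI_STATUS_REG_SEL  (1 << 3)
#define NEW_PCMMIO_AI_STATUS_REG_SEL  BIT(3)
#define OLD_PCMMIO_AO_STATUS_IRQ_ENA  (1 << 0)
#define NEW_PCMMIO_AO_STATUS_IRQ_ENA  BIT(0)

/* Returns 1 if every old/new pair has the same numeric value, else 0. */
static int masks_match(void)
{
	return (unsigned long)OLD_PCMMIO_AI_CMD_SE == NEW_PCMMIO_AI_CMD_SE &&
	       (unsigned long)OLD_PCMMIO_AI_STATUS_REG_SEL == NEW_PCMMIO_AI_STATUS_REG_SEL &&
	       (unsigned long)OLD_PCMMIO_AO_STATUS_IRQ_ENA == NEW_PCMMIO_AO_STATUS_IRQ_ENA;
}
```

The values are identical, but the type is not: BIT() yields an unsigned long, while (1 << n) is an int. That is usually harmless for register masks like these. It can matter where a mask is inverted with ~ or passed to a printf-style format.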
Crashes in -next due to 'phy: add support for a reset-gpio specification'
Hi,

my xtensa qemu tests crash in -next as follows.

[ ... ]

[9.366256] libphy: ethoc-mdio: probed
[9.367389] (null): could not attach to PHY
[9.368555] (null): failed to probe MDIO bus
[9.371540] Unable to handle kernel paging request at virtual address 001c
[9.371540] pc = d0320926, ra = 903209d1
[9.375358] Oops: sig: 11 [#1]
[9.376081] PREEMPT
[9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
[9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
[9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001 d7f45c00 d7c31bd0
[9.382298] a08: 00060100 d04b0c10 d7f45dfc d7c31bb0
[9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 001c
[9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 0011
[9.388173]
Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00
       d0485dcc d0485dcc d7fb5810 d7c2c000 d7c31c30 d7f45c00 d025befc
       d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 d7f45c34
[9.396652] Call Trace:
[9.397469] [] __device_release_driver+0x7d/0x98
[9.398869] [] device_release_driver+0x15/0x20
[9.400247] [] bus_remove_device+0xc1/0xd4
[9.401569] [] device_del+0x109/0x15c
[9.402794] [] phy_mdio_device_remove+0xd/0x18
[9.404124] [] mdiobus_unregister+0x40/0x5c
[9.405444] [] ethoc_probe+0x534/0x5b8
[9.406742] [] platform_drv_probe+0x28/0x48
[9.408122] [] driver_probe_device+0x101/0x234
[9.409499] [] __driver_attach+0x7d/0x98
[9.410809] [] bus_for_each_dev+0x30/0x5c
[9.412104] [] driver_attach+0x14/0x18
[9.413385] [] bus_add_driver+0xc9/0x198
[9.414686] [] driver_register+0x70/0xa0
[9.416001] [] __platform_driver_register+0x24/0x28
[9.417463] [] ethoc_driver_init+0x10/0x14
[9.418824] [] do_one_initcall+0x80/0x1ac
[9.420083] [] kernel_init_freeable+0x131/0x198
[9.421504] [] kernel_init+0xc/0xb0
[9.422693] [] ret_from_kernel_thread+0x8/0xc

Bisect points to commit da47b4572056 ("phy: add support for a reset-gpio specification").
Bisect log is attached. Reverting the patch fixes the problem.

I think there may be a number of problems, all of them exposed by the patch but really separate.

GPIOLIB is not configured in my test case, meaning gpiod_get_optional() returns -ENOSYS, and phy_probe() thus returns an error. Question here is if it is really appropriate for the XXX_optional() gpiolib functions to return an error if GPIOLIB is not configured. Either case, result is that pretty much all phy registrations will now fail if GPIOLIB is not configured.

Also, I suspect that there may be a bug in the error handling path of ethoc_probe(). No idea what exactly is wrong, though. Other drivers use pretty much the same code sequence for mdio registration and associated error handling.

Last but not least, something seems to be wrong with the use of dev_err() with >dev if register_netdev() has not yet been called. Maybe someone has some insight ?

Test scripts and root file system used for the test are available at https://github.com/groeck/linux-build-test/tree/master/rootfs/xtensa.

Guenter

---
# bad: [31b8ce4d1f8150fdc29d2f8a649dc4835e7f2961] arm: Use _rcuidle suffix to allow clk_core_enable() to used from idle
# good: [2dcd0af568b0cf583645c8a317dd12e344b1c72a] Linux 4.6
git bisect start 'HEAD' 'v4.6'
# bad: [dfd08ad591ff4f6d19896f21fb6c10dc4998dae4] Merge remote-tracking branch 'net-next/master'
git bisect bad dfd08ad591ff4f6d19896f21fb6c10dc4998dae4
# good: [eeb1cd39e9e27d89375b33c3a907807fb5adba7e] Merge remote-tracking branch 'xfs/for-next'
git bisect good eeb1cd39e9e27d89375b33c3a907807fb5adba7e
# good: [b75803d52a2ce1f6cbaf7ae0ae40a369210070cf] tcp: refactor struct tcp_skb_cb
git bisect good b75803d52a2ce1f6cbaf7ae0ae40a369210070cf
# good: [c2f40435ab0963284d348993b10ac66de6329b74] Merge remote-tracking branch 'v4l-dvb/master'
git bisect good c2f40435ab0963284d348993b10ac66de6329b74
# good: [678c657e09034d6f87d254b3183873d6e4a493e4] Merge remote-tracking branch 'slave-dma/next'
git bisect good 678c657e09034d6f87d254b3183873d6e4a493e4
# good: [6a47a570321fdcd2b6fc9e6537b2a3650d0fd04b] Merge branch 'mlx5-next'
git bisect good 6a47a570321fdcd2b6fc9e6537b2a3650d0fd04b
# good: [06566e5dd4e53f57fc3daa12fb8b5252772d70de] i40e: Refactor ethtool get_settings
git bisect good 06566e5dd4e53f57fc3daa12fb8b5252772d70de
# bad: [10cbc6843446165ee250e1ee80dc19ee325f1e6d] net/sched: cls_flower: Hardware offloaded filters statistics support
git bisect bad 10cbc6843446165ee250e1ee80dc19ee325f1e6d
# bad: [da47b4572056487fd7941c26f73b3e8815ff712a] phy: add support for a reset-gpio specification
git bisect bad da47b4572056487fd7941c26f73b3e8815ff712a
# good: [5049e33b559a44e9f216d86c58c7c7fce6f5df2f] bnxt_en: Add BCM57314 device ID.
git bisect good 5049e33b559a44e9f216d86c58c7c7fce6
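The GPIOLIB question above comes down to how an "optional" lookup reports that nothing is there. The sketch below is a userspace model, not kernel code, and its names are assumptions:

- ERR_PTR(), IS_ERR() and PTR_ERR() are minimal re-implementations of the <linux/err.h> helpers.
- stub_get_optional_gpio() is a hypothetical stand-in for gpiod_get_optional() when GPIOLIB is compiled out.
- probe_reset_gpio() is a hypothetical probe step.

The model shows that propagating -ENOSYS fails the whole probe. Treating -ENOSYS like a NULL ("no GPIO described") result lets probing continue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses encode negative errno values. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stub: with the subsystem compiled out, report -ENOSYS. */
static void *stub_get_optional_gpio(void)
{
	return ERR_PTR(-ENOSYS);
}

/*
 * Hypothetical probe step. With tolerate_nosys == false, any error aborts
 * the probe (the behaviour described in the report). With
 * tolerate_nosys == true, -ENOSYS is treated like "no reset GPIO".
 */
static int probe_reset_gpio(bool tolerate_nosys, void **gpio_out)
{
	void *gpio = stub_get_optional_gpio();

	if (IS_ERR(gpio)) {
		if (tolerate_nosys && PTR_ERR(gpio) == -ENOSYS)
			gpio = NULL;            /* continue without a reset line */
		else
			return (int)PTR_ERR(gpio);
	}
	*gpio_out = gpio;
	return 0;
}
```

Whether -ENOSYS should be absorbed by the gpiolib stubs or by each caller is exactly the open question in the report. The sketch only contrasts the two behaviours.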
Re: [PATCH v2 1/9] powerpc/powernv: Move CHECK_HMI_INTERRUPT to exception-64s header
On Tue, May 03, 2016 at 01:54:30PM +0530, Shreyas B. Prabhu wrote:
> CHECK_HMI_INTERRUPT is used to check for HMI's in reset vector. Move
> the macro to a common location (exception-64s.h)
> This patch does not change any functionality.
>

I suppose this code movement is to facilitate the invocation of CHECK_HMI_INTERRUPT in some later patch? If so, you could add this to the commit message.

Otherwise,
Reviewed-by: Gautham R. Shenoy

> ---
> arch/powerpc/include/asm/exception-64s.h | 18 ++
> arch/powerpc/kernel/idle_power7.S        | 20 +---
> 2 files changed, 19 insertions(+), 19 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
> index 93ae809..6a625af 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -545,4 +545,22 @@ END_FTR_SECTION_IFSET(CPU_FTR_CAN_NAP)
>  #define FINISH_NAP
>  #endif
>
> +#define CHECK_HMI_INTERRUPT \
> + mfspr r0,SPRN_SRR1; \
> +BEGIN_FTR_SECTION_NESTED(66); \
> + rlwinm r0,r0,45-31,0xf; /* extract wake reason field (P8) */ \
> +FTR_SECTION_ELSE_NESTED(66); \
> + rlwinm r0,r0,45-31,0xe; /* P7 wake reason field is 3 bits */ \
> +ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
> + cmpwi r0,0xa; /* Hypervisor maintenance ? */ \
> + bne 20f; \
> + /* Invoke opal call to handle hmi */ \
> + ld r2,PACATOC(r13); \
> + ld r1,PACAR1(r13); \
> + std r3,ORIG_GPR3(r1); /* Save original r3 */ \
> + li r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/ \
> + bl opal_call_realmode; \
> + ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \
> +20: nop;
> +
>  #endif /* _ASM_POWERPC_EXCEPTION_H */
> diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
> index 470ceeb..6b3404b 100644
> --- a/arch/powerpc/kernel/idle_power7.S
> +++ b/arch/powerpc/kernel/idle_power7.S
> @@ -19,6 +19,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>
>  #undef DEBUG
> @@ -257,25 +258,6 @@ _GLOBAL(power7_winkle)
>   b power7_powersave_common
>   /* No return */
>
> -#define CHECK_HMI_INTERRUPT \
> - mfspr r0,SPRN_SRR1; \
> -BEGIN_FTR_SECTION_NESTED(66); \
> - rlwinm r0,r0,45-31,0xf; /* extract wake reason field (P8) */ \
> -FTR_SECTION_ELSE_NESTED(66); \
> - rlwinm r0,r0,45-31,0xe; /* P7 wake reason field is 3 bits */ \
> -ALT_FTR_SECTION_END_NESTED_IFSET(CPU_FTR_ARCH_207S, 66); \
> - cmpwi r0,0xa; /* Hypervisor maintenance ? */ \
> - bne 20f; \
> - /* Invoke opal call to handle hmi */ \
> - ld r2,PACATOC(r13); \
> - ld r1,PACAR1(r13); \
> - std r3,ORIG_GPR3(r1); /* Save original r3 */ \
> - li r0,OPAL_HANDLE_HMI; /* Pass opal token argument*/ \
> - bl opal_call_realmode; \
> - ld r3,ORIG_GPR3(r1); /* Restore original r3 */ \
> -20: nop;
> -
> -
>  _GLOBAL(power7_wakeup_tb_loss)
>   ld r2,PACATOC(r13);
>   ld r1,PACAR1(r13)
> --
> 2.4.11
>
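The bit manipulation in CHECK_HMI_INTERRUPT is easier to follow as a C model of the wake-reason check. The model assumes the usual rlwinm semantics: rotate the low 32 bits left by SH, then AND with the mask. Rotating left by 45-31 = 14 and keeping the low four bits is therefore the same as (srr1 >> 18) & 0xf. The function names are hypothetical; only the rotate amount, the two masks and the 0xa comparison come from the macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit word left, as rlwinm does before applying its mask. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/*
 * Model of "rlwinm r0,r0,45-31,MASK" on the low word of SRR1.
 * MASK is 0xf on POWER8 (4-bit wake reason) and 0xe on POWER7 (3-bit
 * field), following the comments in the macro.
 */
static uint32_t wake_reason(uint64_t srr1, bool is_p8)
{
	uint32_t mask = is_p8 ? 0xf : 0xe;

	return rotl32((uint32_t)srr1, 45 - 31) & mask;
}

/* The macro calls the OPAL HMI handler only when the reason is 0xa. */
static bool is_hmi_wakeup(uint64_t srr1, bool is_p8)
{
	return wake_reason(srr1, is_p8) == 0xa;
}
```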
linux-next: Tree for May 18
Hi all,

Please do not add any v4.8 destined material to your linux-next included branches until after v4.7-rc1 has been released.

Changes since 20160517:

New tree: dax-misc

The dax-misc tree gained a conflict against the nvdimm tree.

The akpm-current tree gained a conflict against the dax-misc tree.

Non-merge commits (relative to Linus' tree): 8785
 7390 files changed, 389214 insertions(+), 158994 deletions(-)

I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master.

You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig (this fails its final link) and pseries_le_defconfig and i386, sparc and sparc64 defconfig.

Below is a summary of the state of the merge.

I am currently merging 236 trees (counting Linus' and 35 trees of patches pending for Linus' tree).

Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html .

Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to adding more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes.

--
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (7f427d3a6029 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs)
Merging fixes/master (b507146bb6b9 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on module install)
Merging arc-current/for-curr (44549e8f5eea Linux 4.6-rc7)
Merging arm-current/fixes (ec953b70f368 ARM: 8573/1: domain: move {set,get}_domain under config guard)
Merging m68k-current/for-linus (9a6462763b17 m68k/mvme16x: Include generic )
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging powerpc-fixes/fixes (b4c112114aab powerpc: Fix bad inline asm constraint in create_zero_mask())
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (33656a1f2ee5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs)
Merging net/master (2dcd0af568b0 Linux 4.6)
Merging ipsec/master (d6af1a31cc72 vti: Add pmtu handling to vti_xmit.)
Merging ipvs/master (f28f20da704d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging wireless-drivers/master (cbbba30f1ac9 Merge tag 'iwlwifi-for-kalle-2016-05-04' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging mac80211/master (e6436be21e77 mac80211: fix statistics leak if dev_alloc_name() fails)
Merging sound-current/for-linus (c7c5856b6f6f sound: oss: Use setup_timer and mod_timer.)
Merging pci-current/for-linus (9a2a5a638f8e PCI: Do not treat EPROBE_DEFER as device attach failure)
Merging driver-core.current/driver-core-linus (c3b46c73264b Linux 4.6-rc4)
Merging tty.current/tty-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb.current/usb-linus (44549e8f5eea Linux 4.6-rc7)
Merging usb-gadget-fixes/fixes (38740a5b87d5 usb: gadget: f_fs: Fix use-after-free)
Merging usb-serial-fixes/usb-linus (74d2a91aec97 USB: serial: option: add even more ZTE device ids)
Merging usb-chipidea-fixes/ci-for-usb-stable (d144dfea8af7 usb: chipidea: otg: change workqueue ci_otg as freezable)
Merging staging.current/staging-linus (44549e8f5eea Linux 4.6-rc7)
Merging char-misc.current/char-misc-linus (44549e8f5eea Linux 4.6-rc7)
Merging input-current/for-linus (23ea5967d6bd Merge branch 'next' into for-linus)
Merging crypto-current/master (4a6b27b79da5 crypto: sha1-mb - make sha1_x8_avx2() conform to C function ABI)
Merging ide/master (1993b176a822 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test for PPC_PSERIES)
Merging rr-fixes/fixes (8244062ef1e5 modules: fix longstanding /proc/kallsyms vs module insertion race.)
Merging vfio-fixes/for-linus (8160c4e45582 v
Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors
Hi, On Tue, May 17, 2016 at 7:08 PM, Jaehoon Chung wrote: > On 05/18/2016 09:47 AM, Doug Anderson wrote: >> Jaehoon, >> >> On Mon, Mar 30, 2015 at 8:47 AM, Doug Anderson wrote: >>> Jaehoon, >>> >>> On Sun, Mar 29, 2015 at 5:55 PM, Jaehoon Chung >>> wrote: Dear Doug, I'm considering to control HLE error..So holding this patch. If this is absolutely necessary patch, let me know, plz. Best Regards, Jaehoon Chung >>> >>> Sounds OK. I have certainly applied this locally and the driver isn't >>> robust against insertions / removals without it, but once the card is >>> inserted things are OK so it's probably not urgent that it be applied >>> upstream. Hopefully we can figure out a better solution... >> >> I'm now testing a nice new rebased kernel and I'm hitting this again. >> >> Of course I'll just pick my same patch to my new kernel tree, but >> since it's been a year and nobody has done anything better, would you >> consider landing my patch? It is certainly better than nothing. > > Sure, it's right. > I think that main reason of HLE is wait_prvdata_complete. (I'm guessing..) > On other hands, dwmmc controller is handling something wrong. (I found that > HLE is occurred the similar case.) > After find the main solution, it's not bad that your patch is applied on > dwmmc controller. > > Ulf have sent PR for next..So if we needs to apply this, i will apply on fix. It's not new, so I'd say just queue it up for the next version whenever it's convenient. -Doug
Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors
Hi, On Tue, May 17, 2016 at 6:59 PM, Shawn Lin wrote: > Could you try this patch to see if you can still find HLE? > > @@ -2356,12 +2356,22 @@ static void dw_mci_cmd_interrupt(struct dw_mci > *host, u32 status) > static void dw_mci_handle_cd(struct dw_mci *host) > { > int i; > + int present; > > for (i = 0; i < host->num_slots; i++) { > struct dw_mci_slot *slot = host->slot[i]; > > if (!slot) > continue; > > + present = !(mci_readl(slot->host, CDETECT) & (1 << > slot->id)); > + if (present) > + set_bit(DW_MMC_CARD_PRESENT, >flags); > + else > + clear_bit(DW_MMC_CARD_PRESENT, >flags); No, because we don't use the builtin card detect on veyron. ;) We use GPIO card detect because we didn't like the way JTAG and SD interacted. Also on rk3288 the builtin card detect line had the wrong voltage domain (you couldn't detect a card when the IO lines were powered off). The builtin card detect line is always driven low on veyron. I'm nearly certain that the root cause of my HLE errors is actually related to the same problem addressed by the commit 7c5209c315ea ("mmc: core: Increase delay for voltage to stabilize from 3.3V to 1.8V"). I think that on minnie we're still on the hairy edge and sometimes the line doesn't transition fast enough. It appears that increasing this to 30ms avoids the HLE errors. I _think_ I can actually fully fix this properly by temporarily engaging the internal pull-ups while the voltage switch is happening. This will bleed away the voltage just a little bit faster (since lines are driven low here). I'll try to confirm that. In any case, it seems like we should take this patch since (without this patch) the failure case when you get HLE errors is that the interrupt controller fires over and over again (with no printouts) and your system stalls with no error messages. -Doug
Re: QRTR merge conflict resolution
On Tue 17 May 17:43 PDT 2016, Stephen Rothwell wrote: > Hi David, > > On Tue, 17 May 2016 14:11:54 -0400 (EDT) David Miller > wrote: > > > > From: Bjorn Andersson > > Date: Fri, 13 May 2016 15:19:09 -0700 > > > > > I have prepared the merge of net-next and the conflicting tag from the > > > Qualcomm SOC, please include this in your pull towards Linus to avoid > > > the merge conflict. > > > > Pulled, thanks. > > Except in the merge resolution, the 2 new functions added to > include/linux/soc/qcom/smd.h (qcom_smd_get_drvdata and > qcom_smd_set_drvdata) were not marked "static inline" :-( > How silly of me to miss that, sorry about that. I didn't spot this in my compile testing either, because this is the only driver in the tree including that file that doesn't depend on QCOM_SMD. As there is no immediate problem with moving forward I suggest that I'll fix this, through arm-soc, once the code has landed. Regards, Bjorn
Re: [PATCH] sched/cputime: add steal time support to full dynticks CPU time accounting
On Tue, 2016-05-10 at 13:34 +0800, Wanpeng Li wrote: > From: Wanpeng Li > > This patch adds steal guest time support to full dynticks CPU time > accounting. After commit ff9a9b4c(sched, time: Switch > VIRT_CPU_ACCOUNTING_GEN > to jiffy granularity), time is jiffy based sampling even if it's > still listened to ring boundaries, so steal_account_process_tick() > is reused to account how much 'ticks' are steal time after the > last accumulation. > > Suggested-by: Rik van Riel > Cc: Ingo Molnar > Cc: Peter Zijlstra (Intel) > Cc: Rik van Riel > Cc: Thomas Gleixner > Cc: Frederic Weisbecker > Cc: Paolo Bonzini > Cc: Radim > Signed-off-by: Wanpeng Li > Acked-by: Rik van Riel -- All Rights Reversed.
Re: [PATCH v12 05/10] arm64: Kprobes with single stepping support
On Thu, 12 May 2016 16:01:54 +0100 James Morse wrote: > Hi David, Sandeepa, > > On 27/04/16 19:53, David Long wrote: > > From: Sandeepa Prabhu > > > diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c > > new file mode 100644 > > index 000..dfa1b1f > > --- /dev/null > > +++ b/arch/arm64/kernel/kprobes.c > > @@ -0,0 +1,520 @@ > > +/* > > + * arch/arm64/kernel/kprobes.c > > + * > > + * Kprobes support for ARM64 > > + * > > + * Copyright (C) 2013 Linaro Limited. > > + * Author: Sandeepa Prabhu > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License version 2 as > > + * published by the Free Software Foundation. > > + * > > + * This program is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + * General Public License for more details. > > + * > > + */ > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include "kprobes-arm64.h" > > + > > +#define MIN_STACK_SIZE(addr) min((unsigned long)MAX_STACK_SIZE, > > \ > > + (unsigned long)current_thread_info() + THREAD_START_SP - (addr)) > > What if we probe something called on the irq stack? > This needs the on_irq_stack() checks too, the start/end can be found from the > per-cpu irq_stack value. > > [ ... ] > > > +int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs) > > +{ > > + struct jprobe *jp = container_of(p, struct jprobe, kp); > > + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); > > + long stack_ptr = kernel_stack_pointer(regs); > > + > > + kcb->jprobe_saved_regs = *regs; > > + memcpy(kcb->jprobes_stack, (void *)stack_ptr, > > + MIN_STACK_SIZE(stack_ptr)); > > I wonder if we need this stack save/restore? 
> > The comment next to the equivalent code for x86 says: > > gcc assumes that the callee owns the argument space and could overwrite it, > > e.g. tailcall optimization. So, to be absolutely safe we also save and > > restore enough stack bytes to cover the argument area. > > On arm64 the first eight arguments are passed in registers, so we might not > need > this stack copy. (sparc and powerpc work like this too, their versions of this > function don't copy chunks of the stack). Hmm, maybe sparc and powerpc implementation should also be fixed... > ... then I went looking for functions with >8 arguments... > > Looking at the arm64 defconfig dwarf debug data, there are 71 of these that > don't get inlined, picking at random: > > rockchip_clk_register_pll() has 13 > > fib_dump_info() has 11 > > vma_merge() has 10 > > vring_create_virtqueue() has 10 > etc... > > So we do need this stack copying, so that we can probe these function without > risking the arguments being modified. > > It may be worth including a comment to the effect that this stack save/restore > is needed for functions that pass >8 arguments where the pre-handler may > change > these values on the stack. Indeed, commenting on this code can help us to understand the reason why. Thank you! > > > > + preempt_enable_no_resched(); > > + return 1; > > +} > > + > > > Thanks, > > James -- Masami Hiramatsu
linux-next: manual merge of the akpm-current tree with the dax-misc tree
Hi Andrew, Today's linux-next merge of the akpm-current tree got a conflict in: include/linux/dax.h between commit: ecdb4bf9e327 ("dax: export a low-level __dax_zero_page_range helper") from the dax-misc tree and commit: 29d44f6759f6 ("dax: add dax_get_unmapped_area for pmd mappings") from the akpm-current tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc include/linux/dax.h index 7743e51f826c,184b1714900c.. --- a/include/linux/dax.h +++ b/include/linux/dax.h @@@ -14,19 -17,15 +14,22 @@@ int __dax_fault(struct vm_area_struct * #ifdef CONFIG_FS_DAX struct page *read_dax_sector(struct block_device *bdev, sector_t n); +int __dax_zero_page_range(struct block_device *bdev, sector_t sector, + unsigned int offset, unsigned int length); + unsigned long dax_get_unmapped_area(struct file *filp, unsigned long addr, + unsigned long len, unsigned long pgoff, unsigned long flags); #else static inline struct page *read_dax_sector(struct block_device *bdev, sector_t n) { return ERR_PTR(-ENXIO); } +static inline int __dax_zero_page_range(struct block_device *bdev, + sector_t sector, unsigned int offset, unsigned int length) +{ + return -ENXIO; +} + #define dax_get_unmapped_area NULL #endif #ifdef CONFIG_TRANSPARENT_HUGEPAGE
[PATCH] MM: increase safety margin provided by PF_LESS_THROTTLE
When nfsd is exporting a filesystem over NFS which is then NFS-mounted on the local machine, there is a risk of deadlock. This happens when there are lots of dirty pages in the NFS filesystem and they cause NFSD to be throttled, either in throttle_vm_writeout() or in balance_dirty_pages(). To avoid this problem the PF_LESS_THROTTLE flag is set for NFSD threads, and it provides a 25% increase to the limits that affect NFSD. Any process writing to an NFS filesystem will be throttled well before the number of dirty NFS pages reaches the limit imposed on NFSD, so NFSD will not deadlock on pages that it needs to write out. At least it shouldn't. All processes are allowed a small excess margin to avoid performing too many calculations: ratelimit_pages. ratelimit_pages is set so that if a thread on every CPU uses the entire margin, the total will only go 3% over the limit, and this is much less than the 25% bonus that PF_LESS_THROTTLE provides, so this margin shouldn't be a problem. But it is. The "total memory" that these 3% and 25% are calculated against is not really total memory but global_dirtyable_memory(), which doesn't include anonymous memory, just free memory and page-cache memory. The "ratelimit_pages" number is based on whatever global_dirtyable_memory() was at the last CPU hotplug, which might not be what you expect, but is probably close to the total freeable memory. The throttle threshold uses global_dirtyable_memory() at the moment the throttling happens, which could be much less than at the last CPU hotplug. So if lots of anonymous memory has been allocated, thus pushing out lots of page-cache pages, then NFSD might end up being throttled due to dirty NFS pages, because the "25%" bonus it gets is calculated against a rather small amount of dirtyable memory, while the "3%" margin that other processes are allowed to dirty without penalty is calculated against a much larger number.
To remove this possibility of deadlock we need to make sure that the margin granted to PF_LESS_THROTTLE exceeds that rate-limit margin. Simply adding ratelimit_pages isn't enough, as that should be multiplied by the number of CPUs. So add "global_wb_domain.dirty_limit / 32", as that more accurately reflects the current total over-shoot margin. This ensures that the number of dirty NFS pages never gets so high that nfsd will be throttled waiting for them to be written.

Signed-off-by: NeilBrown

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index bc5149d5ec38..bbdcd7ccef57 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -407,8 +407,8 @@ static void domain_dirty_limits(struct dirty_throttle_control *dtc)
 		bg_thresh = thresh / 2;
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
-		bg_thresh += bg_thresh / 4;
-		thresh += thresh / 4;
+		bg_thresh += bg_thresh / 4 + global_wb_domain.dirty_limit / 32;
+		thresh += thresh / 4 + global_wb_domain.dirty_limit / 32;
 	}
 	dtc->thresh = thresh;
 	dtc->bg_thresh = bg_thresh;
Re: [GIT] Networking
On Wed, May 18, 2016 at 4:00 AM, Linus Torvalds wrote: > On Tue, May 17, 2016 at 12:11 PM, David Miller wrote: >> >> Highlights: > > Lowlights: > > 1) the iwlwifi driver seems to be broken > > My laptop that uses the intel 7680 iwlwifi module no longer connects > to the network. It fails with a "Microcode SW error detected." and > spews out register state over and over again. Can we have the register state and the ASSERT / NMI / whatever that goes along with it? This clearly means that the firmware is crashing, but I don't know why, I copied here the lines that I need from another bug with another device with another firmware, but the log that we will still explain what I need: [ 800.880402] iwlwifi :02:00.0: Start IWL Error Log Dump: [ 800.880406] iwlwifi :02:00.0: Status: 0x, count: 6 [ 800.880409] iwlwifi :02:00.0: Loaded firmware version: 21.311951.0 [ 800.880413] iwlwifi :02:00.0: 0x0394 | ADVANCED_SYSASSERT [ 800.880416] iwlwifi :02:00.0: 0x0220 | trm_hw_status0 [ 800.880419] iwlwifi :02:00.0: 0x | trm_hw_status1 [ 800.880422] iwlwifi :02:00.0: 0x0BD8 | branchlink2 [ 800.880425] iwlwifi :02:00.0: 0x00026AC4 | interruptlink1 [ 800.880428] iwlwifi :02:00.0: 0x | interruptlink2 [ 800.880431] iwlwifi :02:00.0: 0x0001 | data1 [ 800.880434] iwlwifi :02:00.0: 0x02039845 | data2 [ 800.880437] iwlwifi :02:00.0: 0x0056 | data3 [ 800.880440] iwlwifi :02:00.0: 0x8E4184A7 | beacon time [ 800.880443] iwlwifi :02:00.0: 0x30E2CB41 | tsf low [ 800.880446] iwlwifi :02:00.0: 0x0027 | tsf hi [ 800.880449] iwlwifi :02:00.0: 0x | time gp1 [ 800.880451] iwlwifi :02:00.0: 0x2F842F8A | time gp2 [ 800.880454] iwlwifi :02:00.0: 0x | uCode revision type [ 800.880457] iwlwifi :02:00.0: 0x0015 | uCode version major [ 800.880460] iwlwifi :02:00.0: 0x0004C28F | uCode version minor [ 800.880463] iwlwifi :02:00.0: 0x0201 | hw version [ 800.880466] iwlwifi :02:00.0: 0x00489008 | board version [ 800.880469] iwlwifi :02:00.0: 0x001C | hcmd [ 800.880472] iwlwifi :02:00.0: 0x24022000 | isr0 [ 
800.880475] iwlwifi :02:00.0: 0x0100 | isr1 [ 800.880478] iwlwifi :02:00.0: 0x580A | isr2 [ 800.880481] iwlwifi :02:00.0: 0x4041FCC1 | isr3 [ 800.880483] iwlwifi :02:00.0: 0x | isr4 [ 800.880486] iwlwifi :02:00.0: 0x00800110 | last cmd Id [ 800.880489] iwlwifi :02:00.0: 0x | wait_event [ 800.880492] iwlwifi :02:00.0: 0x02C8 | l2p_control [ 800.880495] iwlwifi :02:00.0: 0x00018030 | l2p_duration [ 800.880498] iwlwifi :02:00.0: 0x00BF | l2p_mhvalid [ 800.880501] iwlwifi :02:00.0: 0x00EF | l2p_addr_match [ 800.880503] iwlwifi :02:00.0: 0x000D | lmpm_pmg_sel [ 800.880506] iwlwifi :02:00.0: 0x30031805 | timestamp [ 800.880509] iwlwifi :02:00.0: 0xE0F0 | flow_handler > > The last thing it says before falling over is: > > wlp1s0: authenticate with xx:xx:xx:xx:xx:xx > wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 1/3) > wlp1s0: send auth to xx:xx:xx:xx:xx:xx (try 2/3) > > and then it goes all titsup. > > I thought that it might be because I had downloaded one of the daily > firmware versions (it calls itself iwlwifi-7260-17.ucode, but isn't a > real release afaik - but it has worked fien for me before), but the > problem persists with the ver-16 ucode too, so that wasn't it. > > I haven't bisected it, but there is absolutely nothing odd in my hardware. > > I do have a 802.11ac network, which apparently not everybody does, > judging by previous bug-reports of mine.. > > Intel iwlwifi people: please check this out. > >Linus > -- > To unsubscribe from this list: send the line "unsubscribe linux-wireless" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html
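Every entry in the error log requested above has the same shape, `0x<value> | <field name>`, so the dump can be post-processed mechanically when comparing crashes. A minimal sketch, assuming only that line format; `parse_dump_line()` is a hypothetical helper, not part of iwlwifi or any existing tool. Lines whose hex digits were lost are rejected rather than silently read as zero.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split one iwlwifi error-log line such as
 * "... 0x00000394 | ADVANCED_SYSASSERT" into its value and field name.
 * Returns 0 on success, -1 if the line has no "0x<hex> | <name>" pair
 * or the hex digits after "0x" are missing.
 */
static int parse_dump_line(const char *line, unsigned long *value,
                           char *name, size_t name_len)
{
    const char *bar = strstr(line, " | ");
    const char *hex = strstr(line, "0x");
    char *end;

    if (!bar || !hex || hex > bar || name_len == 0)
        return -1;

    /* strtoul() with base 16 accepts the leading "0x" itself. */
    *value = strtoul(hex, &end, 16);
    if (end <= hex + 2)         /* "0x" with no digits after it */
        return -1;

    /* The field name runs from after " | " to the end of the line. */
    snprintf(name, name_len, "%s", bar + 3);
    name[strcspn(name, "\r\n")] = '\0';
    return 0;
}
```

Feeding each line of a saved `dmesg` through this yields (field, value) pairs that can be diffed between two firmware crashes.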
linux-next: manual merge of the dax-misc tree with the nvdimm tree
Hi all, Today's linux-next merge of the dax-misc tree got a conflict in: fs/block_dev.c between commit: 8044aae6f374 ("Revert "block: enable dax for raw block devices"") from the nvdimm tree and commit: 02fbd139759f ("dax: Remove complete_unwritten argument") from the dax-misc tree. I fixed it up (the former removed the code modified by the latter) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell
Re: 45aebeaf4f67 "ovl: Ensure upper filesystem supports d_type" breaking Docker
Hi Vivek, My sincere apologies - it turns out I *was* running on xfs with ftype=0. Someone in the office had moved docker's storage without me noticing. Apologies to all whose time I wasted. Regards, Daniel Vivek Goyal writes: > On Tue, May 17, 2016 at 10:15:21AM +0200, Miklos Szeredi wrote: >> On Tue, May 17, 2016 at 8:28 AM, Al Viro wrote: >> > On Mon, May 16, 2016 at 09:07:27AM -0400, Vivek Goyal wrote: >> >> So it became clear that we need a check at mount time to make sure >> >> d_type is supported otherwise error out. This will require users to >> >> do mkfs.xfs with ftype=1 to make progress. >> >> >> >> I think new defaults for mkfs.xfs are such that ftype=1 is set. I am >> >> not sure which version that change was made in. >> > >> > Dumb question - can we end up with empty workdir at that point? Because >> > if we do, the check would appear to return a false negative, no matter >> > what fs supports... >> >> ovl_workdir_create() creates a subdirectory of workdir ("work") so >> workdir itself won't be empty after that. If somebody else messes >> with workdir, then we are screwed anyway. > > Right. Initially I was creating a directory of my own and later realized > that ovl_workdir_create() already creates one. > > Having said that, what happens when ovl_workdir_create() fails and we > mount overlayfs read only. In that case I think we will conclude that > underlying fs does not support d_type and mounting will fail. > > Any thoughts, on how to handle this failure path better? > > Daniel, > > Yesterday Eric Sandeen told me that I can run "xfs_info " to > figure out if ftype is 0 or 1. You might want to run "xfs_info /" and > ensure ftype=0 in your case and overlay is not detecting it wrong. > > Thanks > Vivek
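The mount-time check discussed above runs inside overlayfs; the following is only a userspace approximation of the same idea, for checking a storage directory by hand. It assumes glibc, where `struct dirent` carries `d_type`; `dir_reports_d_type()` is a hypothetical name. It also illustrates Al's concern: a directory containing nothing but `.` and `..` gives no evidence either way, so that case is reported separately instead of as "unsupported".

```c
#define _DEFAULT_SOURCE          /* exposes DT_* constants in glibc's <dirent.h> */
#include <assert.h>
#include <dirent.h>
#include <string.h>

#ifndef DT_UNKNOWN
#define DT_UNKNOWN 0             /* glibc's value, in case the macro is hidden */
#endif

/*
 * Hypothetical userspace check: does the filesystem fill in d_type?
 * Returns 1 if some entry reports a real type, 0 if entries exist but all
 * report DT_UNKNOWN (e.g. XFS with ftype=0), and -1 if the directory can't
 * be read or holds nothing besides "." and ".." (inconclusive).
 */
static int dir_reports_d_type(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    int seen = 0, supported = 0;

    if (!dir)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        /* Skip "." and "..", which a filesystem may report specially. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        seen = 1;
        if (de->d_type != DT_UNKNOWN)
            supported = 1;
    }
    closedir(dir);
    return seen ? supported : -1;
}
```

For XFS specifically, the `ftype` field printed by `xfs_info`, as suggested in the thread, remains the authoritative answer.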
Re: [PATCH 02/17] perf tools: Add evlist channel helpers
On 2016/5/13 21:05, Arnaldo Carvalho de Melo wrote: Em Fri, May 13, 2016 at 07:55:59AM +, Wang Nan escreveu: In this commit sereval helpers are introduced to support the principle several of channel. Channels hold different groups of evsels which configured differently. It will be used for overwritable evsels, which allows perf why not use multiple evlists? An "evlist" is a "list of evsels", why do we need yet another way of grouping evlists? - Arnaldo There's an assumption all over perf that there's only one evlist: in 'struct record' there's an 'evlist' pointer, in 'struct session' there's also an 'evlist' pointer. Trying to change them to an array results in 181 errors, so I think fundamentally moving to multiple evlists is nearly impossible. Now I'm thinking of introducing auxiliary evlists to perf record. We would still obey the one-evlist assumption and only create separate evlists for mmap. Thank you.
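A toy model of the auxiliary-evlist idea: the single main evlist that `struct record` and `struct session` point to stays untouched, and a second list merely references the evsels that need a differently configured mmap (here, the overwritable ones). The `sketch_*` types and `build_aux_evlist()` are hypothetical simplifications, not perf's real evlist/evsel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_EVSELS 8

/* Hypothetical, heavily simplified stand-ins for perf's evsel/evlist. */
struct sketch_evsel {
    const char *name;
    bool overwrite;            /* read through an overwritable ring buffer */
};

struct sketch_evlist {
    struct sketch_evsel *entries[SKETCH_MAX_EVSELS];
    size_t nr;
};

/*
 * Build an auxiliary evlist that only *references* evsels from the main
 * list whose overwrite setting matches; the main list is not modified,
 * so code relying on the one-evlist assumption keeps working.
 * Returns the number of evsels collected.
 */
static size_t build_aux_evlist(const struct sketch_evlist *main_list,
                               struct sketch_evlist *aux, bool overwrite)
{
    aux->nr = 0;
    for (size_t i = 0; i < main_list->nr; i++) {
        struct sketch_evsel *evsel = main_list->entries[i];

        if (evsel->overwrite == overwrite && aux->nr < SKETCH_MAX_EVSELS)
            aux->entries[aux->nr++] = evsel;
    }
    return aux->nr;
}
```

Because the auxiliary list holds pointers into the main one, only the mmap setup needs to know it exists.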
Re: [PATCH v12 05/10] arm64: Kprobes with single stepping support
On Tue, 17 May 2016 16:58:09 +0800 Huang Shijie wrote: > On Wed, Apr 27, 2016 at 02:53:00PM -0400, David Long wrote: > > + > > +/* > > + * Interrupts need to be disabled before single-step mode is set, and not > > + * reenabled until after single-step mode ends. > > + * Without disabling interrupt on local CPU, there is a chance of > > + * interrupt occurrence in the period of exception return and start of > > + * out-of-line single-step, that result in wrongly single stepping > > + * into the interrupt handler. > > + */ > > +static void __kprobes kprobes_save_local_irqflag(struct pt_regs *regs) > > +{ > > + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); > > Why not add a parameter for this function to save the @kcb? Good catch; it should use the same kcb as the caller. > > > + > > + kcb->saved_irqflag = regs->pstate; > > + regs->pstate |= PSR_I_BIT; > > +} > > + > > +static void __kprobes kprobes_restore_local_irqflag(struct pt_regs *regs) > > +{ > > + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); > ditto.
> > > + > > + if (kcb->saved_irqflag & PSR_I_BIT) > > + regs->pstate |= PSR_I_BIT; > > + else > > + regs->pstate &= ~PSR_I_BIT; > > +} > > + > > +static void __kprobes > > +set_ss_context(struct kprobe_ctlblk *kcb, unsigned long addr) > > +{ > > + kcb->ss_ctx.ss_pending = true; > > + kcb->ss_ctx.match_addr = addr + sizeof(kprobe_opcode_t); > > +} > > + > > +static void __kprobes clear_ss_context(struct kprobe_ctlblk *kcb) > > +{ > > + kcb->ss_ctx.ss_pending = false; > > + kcb->ss_ctx.match_addr = 0; > > +} > > + > > +static void __kprobes setup_singlestep(struct kprobe *p, > > +struct pt_regs *regs, > > +struct kprobe_ctlblk *kcb, int reenter) > > +{ > > + unsigned long slot; > > + > > + if (reenter) { > > + save_previous_kprobe(kcb); > > + set_current_kprobe(p); > > + kcb->kprobe_status = KPROBE_REENTER; > > + } else { > > + kcb->kprobe_status = KPROBE_HIT_SS; > > + } > > + > > + if (p->ainsn.insn) { > > + /* prepare for single stepping */ > > + slot = (unsigned long)p->ainsn.insn; > > + > > + set_ss_context(kcb, slot); /* mark pending ss */ > > + > > + if (kcb->kprobe_status == KPROBE_REENTER) > > + spsr_set_debug_flag(regs, 0); > > + > > + /* IRQs and single stepping do not mix well. 
*/ > > + kprobes_save_local_irqflag(regs); > > + kernel_enable_single_step(regs); > > + instruction_pointer(regs) = slot; > > + } else { > > + BUG(); You'd better use BUG_ON(!p->ainsn.insn); > > + } > > +} > > + > > +static int __kprobes reenter_kprobe(struct kprobe *p, > > + struct pt_regs *regs, > > + struct kprobe_ctlblk *kcb) > > +{ > > + switch (kcb->kprobe_status) { > > + case KPROBE_HIT_SSDONE: > > + case KPROBE_HIT_ACTIVE: > > + kprobes_inc_nmissed_count(p); > > + setup_singlestep(p, regs, kcb, 1); > > + break; > > + case KPROBE_HIT_SS: > > + case KPROBE_REENTER: > > + pr_warn("Unrecoverable kprobe detected at %p.\n", p->addr); > > + dump_kprobe(p); > > + BUG(); > > + break; > > + default: > > + WARN_ON(1); > > + return 0; > > + } > > + > > + return 1; > > +} > > + > > +static void __kprobes > > +post_kprobe_handler(struct kprobe_ctlblk *kcb, struct pt_regs *regs) > > +{ > > + struct kprobe *cur = kprobe_running(); > > + > > + if (!cur) > > + return; > > + > > + /* return addr restore if non-branching insn */ > > + if (cur->ainsn.restore.type == RESTORE_PC) { > > + instruction_pointer(regs) = cur->ainsn.restore.addr; > > + if (!instruction_pointer(regs)) > > + BUG(); > > + } > > + > > + /* restore back original saved kprobe variables and continue */ > > + if (kcb->kprobe_status == KPROBE_REENTER) { > > + restore_previous_kprobe(kcb); > > + return; > > + } > > + /* call post handler */ > > + kcb->kprobe_status = KPROBE_HIT_SSDONE; > > + if (cur->post_handler) { > > + /* post_handler can hit breakpoint and single step > > + * again, so we enable D-flag for recursive exception. > > + */ > > + cur->post_handler(cur, regs, 0); > > + } > > + > > + reset_current_kprobe(); > > +} > > + > > +int __kprobes kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr) > > +{ > > + struct kprobe *cur = kprobe_running(); > > + struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); > > + > > + switch (kcb->kprobe_status) { > > + case KPROBE_HIT_SS: > > + case
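Huang's suggestion, which Masami agrees with, is to pass the caller's `kcb` into the save/restore helpers instead of looking it up again inside each one. A userspace model of that signature change follows. The structs are minimal stand-ins for `struct pt_regs` and `struct kprobe_ctlblk`, and `SKETCH_PSR_I_BIT` assumes arm64's `PSR_I_BIT` value of 0x80; this is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors arm64's PSR_I_BIT (IRQ mask bit in PSTATE) = 0x80. */
#define SKETCH_PSR_I_BIT 0x00000080ULL

/* Minimal stand-ins for struct pt_regs and struct kprobe_ctlblk. */
struct sketch_regs { uint64_t pstate; };
struct sketch_kcb  { uint64_t saved_irqflag; };

/* The caller hands in its own kcb rather than re-fetching it here. */
static void save_local_irqflag(struct sketch_kcb *kcb, struct sketch_regs *regs)
{
    kcb->saved_irqflag = regs->pstate;
    regs->pstate |= SKETCH_PSR_I_BIT;       /* mask IRQs while single-stepping */
}

/* Put the I bit back exactly as it was before single-stepping began. */
static void restore_local_irqflag(struct sketch_kcb *kcb, struct sketch_regs *regs)
{
    if (kcb->saved_irqflag & SKETCH_PSR_I_BIT)
        regs->pstate |= SKETCH_PSR_I_BIT;
    else
        regs->pstate &= ~SKETCH_PSR_I_BIT;
}
```

In `setup_singlestep()` the call would then read `kprobes_save_local_irqflag(kcb, regs)`, using the `kcb` that is already a parameter there.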
Re: [PATCH v8 13/14] usb: gadget: udc: adapt to OTG core
On Mon, May 16, 2016 at 12:51:53PM +0300, Roger Quadros wrote: > On 16/05/16 12:23, Peter Chen wrote: > > On Mon, May 16, 2016 at 11:26:57AM +0300, Roger Quadros wrote: > >> Hi, > >> > >> On 16/05/16 10:02, Peter Chen wrote: > >>> On Fri, May 13, 2016 at 01:03:27PM +0300, Roger Quadros wrote: > + > +static int usb_gadget_connect_control(struct usb_gadget *gadget, bool > connect) > +{ > +struct usb_udc *udc; > + > +mutex_lock(_lock); > +udc = usb_gadget_to_udc(gadget); > +if (!udc) { > +dev_err(gadget->dev.parent, "%s: gadget not > registered.\n", > +__func__); > +mutex_unlock(_lock); > +return -EINVAL; > +} > + > +if (connect) { > +if (!gadget->connected) > +usb_gadget_connect(udc->gadget); > +} else { > +if (gadget->connected) { > +usb_gadget_disconnect(udc->gadget); > +udc->driver->disconnect(udc->gadget); > +} > +} > + > +mutex_unlock(_lock); > + > +return 0; > +} > + > >>> > >>> Since this is called for vbus interrupt, why not using > >>> usb_udc_vbus_handler directly, and call udc->driver->disconnect > >>> at usb_gadget_stop. > >> > >> We can't assume that this is always called for vbus interrupt so > >> I decided not to call usb_udc_vbus_handler. > >> > >> udc->vbus is really pointless for us. We keep vbus states in our > >> state machine and leave udc->vbus as ture always. > >> > >> Why do you want to move udc->driver->disconnect() to stop? > >> If USB controller disconnected from bus then the gadget driver > >> must be notified about the disconnect immediately. The controller > >> may or may not be stopped by the core. > >> > > > > Then, would you give some comments when this API will be used? > > I was assumed it is only used for drd state machine. > > drd_state machine didn't even need this API in the first place :). > You guys wanted me to separate out start/stop and connect/disconnect for full > OTG case. > Won't full OTG state machine want to use this API? If not what would it use? > Oh, I meant that only the drd and full OTG state machines need it.
I am wondering whether we need a new API for it. Two questions: - Except for the vbus interrupt, is there any chance this API will be used by the current logic? - When would this API be called without a subsequent gadget->stop? -- Best Regards, Peter Chen
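The behaviour under discussion is an idempotent toggle: connect only if not already connected, and on disconnect notify the gadget driver immediately, as Roger argues, rather than deferring the notification to a later stop. A userspace model of that state logic, using a hypothetical `sketch_gadget` with counters in place of the real `usb_gadget_connect()`/`->disconnect()` calls; the `udc_lock` locking of the real function is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a gadget: only the state the toggle needs. */
struct sketch_gadget {
    bool connected;
    int connect_calls;         /* times the "connect" path actually ran */
    int disconnect_notifies;   /* times the function driver was told */
};

/*
 * Same shape as usb_gadget_connect_control() in the patch: act only on a
 * state change, and on disconnect notify the driver right away.
 * (The real function also holds udc_lock; locking is omitted here.)
 */
static void sketch_connect_control(struct sketch_gadget *g, bool connect)
{
    if (connect && !g->connected) {
        g->connected = true;
        g->connect_calls++;
    } else if (!connect && g->connected) {
        g->connected = false;
        g->disconnect_notifies++;   /* stands in for udc->driver->disconnect() */
    }
}
```

Repeated calls with the same argument are no-ops, which is why the question of who else might call it (beyond the vbus path) matters less for correctness than for API surface.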
Re: Re: [PATCH v8 3/5] mfd: hi655x: Add MFD driver for hi655x
On 19 April 2016 at 14:53, Lee Jones <lee.jo...@linaro.org> wrote: > > On Tue, 19 Apr 2016, Guodong Xu wrote: > > > On 13 April 2016 at 08:51, Chen Feng <puck.c...@hisilicon.com> wrote: > > > > > > > > > > > > Forwarded Message > > > Subject: Re: [PATCH v8 3/5] mfd: hi655x: Add MFD driver for hi655x > > > Date: Mon, 11 Apr 2016 11:41:06 +0100 > > > From: Lee Jones <lee.jo...@linaro.org> > > > To: Chen Feng <puck.c...@hisilicon.com> > > > CC: lgirdw...@gmail.com, broo...@kernel.org, > > > linux-kernel@vger.kernel.org, w...@huawei.com, > > > kong.kongxin...@hisilicon.com, haojian.zhu...@linaro.org, > > > suzhuangl...@hisilicon.com, dan.z...@hisilicon.com > > > > > > On Sun, 14 Feb 2016, Chen Feng wrote: > > > > > > > Add PMIC MFD driver to support hisilicon hi665x. > > > > > > > > Signed-off-by: Chen Feng <puck.c...@hisilicon.com> > > > > Signed-off-by: Fei Wang <w...@huawei.com> > > > > Signed-off-by: Xinwei Kong <kong.kongxin...@hisilicon.com> > > > > Reviewed-by: Haojian Zhuang <haojian.zhu...@linaro.org> > > > > Acked-by: Lee Jones <lee.jo...@linaro.org> > > > > --- > > > > drivers/mfd/Kconfig | 10 +++ > > > > drivers/mfd/Makefile| 1 + > > > > drivers/mfd/hi655x-pmic.c | 162 > > > > > > > > include/linux/mfd/hi655x-pmic.h | 55 ++ > > > > 4 files changed, 228 insertions(+) > > > > create mode 100644 drivers/mfd/hi655x-pmic.c > > > > create mode 100644 include/linux/mfd/hi655x-pmic.h > > > > > > Applied, thanks. > > > > Hi, Lee, Mark > > > > I still didn't see this patch in linux-next (next-20160418) since your > > replied "Applied". Are you expecting anything else? Dependencies? > > > > I didn't see any unsolved review comments actually. But if there is, > > please let us know, so I can send an updated version. > > When I applied your patch, I also added ~40 other patches. I haven't > yet got around to editing and pushing them all to -next. I will put > some time aside this morning in order to complete the push. 
Hi, Lee As of this morning, I still cannot see hi655x in your for-mfd-next branch or in linux-next (next-20160517). I saw this patch integrated on Apr 19: mfd: hi655x: Add document for hi665x PMIC but this one is apparently missing: [PATCH v8 3/5] mfd: hi655x: Add MFD driver for hi655x Would you please check? Sorry if I'm asking something stupid. Looking forward to seeing it in the v4.7-rcs. Thank you. -Guodong > > > > > > diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig > > > > index 9ca66de..5b1c091 100644 > > > > --- a/drivers/mfd/Kconfig > > > > +++ b/drivers/mfd/Kconfig > > > > @@ -284,6 +284,16 @@ config MFD_HI6421_PMIC > > > > menus in order to enable them. > > > > We communicate with the Hi6421 via memory-mapped I/O. > > > > > > > > +config MFD_HI655X_PMIC > > > > + tristate "HiSilicon Hi655X series PMU/Codec IC" > > > > + depends on ARCH_HISI || COMPILE_TEST > > > > + depends on OF > > > > + select MFD_CORE > > > > + select REGMAP_MMIO > > > > + select REGMAP_IRQ > > > > + help > > > > + Select this option to enable Hisilicon hi655x series pmic > > > > driver. > > > > + > > > > config HTC_EGPIO > > > > bool "HTC EGPIO support" > > > > depends on GPIOLIB && ARM > > > > diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile > > > > index 0f230a6..1e166c1 100644 > > > > --- a/drivers/mfd/Makefile > > > > +++ b/drivers/mfd/Makefile > > > > @@ -190,6 +190,7 @@ obj-$(CONFIG_MFD_STW481X) += stw481x.o > > > > obj-$(CONFIG_MFD_IPAQ_MICRO) += ipaq-micro.o > > > > obj-$(CONFIG_MFD_MENF21BMC) += menf21bmc.o > > > > obj-$(CONFIG_MFD_HI6421_PMIC)+= hi6421-pmic-core.o > > > > +obj-$(CONFIG_MFD_HI655X_PMIC) += hi655x-pmic.o > > > > obj-$(CONFIG_MFD_DLN2) += dln2.o > > > > obj-$(CONFIG_MFD_RT5033) += rt5033.o > > > > obj-$(CONFIG_MFD_SKY81452) += sky81452.o > > > > diff --git a/drivers/mfd/hi655x-pmic.c b/drivers/mfd/hi655x-pmic.c > > >
RE: [PATCH 1/3] ACPI: table upgrade: use cacheable map for tables
Hi, > From: Aleksey Makarov [mailto:aleksey.maka...@linaro.org] > Subject: [PATCH 1/3] ACPI: table upgrade: use cacheable map for tables > > The new memory allocated in acpi_table_initrd_init() is used to > copy the upgraded tables to it. So it should be mapped with > early_memunmap() instead of early_ioremap(). > > This is critical for ARM. > > Signed-off-by: Aleksey Makarov [Lv Zheng] Acked-by: Lv Zheng Thanks -Lv > --- > drivers/acpi/tables.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c > index a372f9e..449a649 100644 > --- a/drivers/acpi/tables.c > +++ b/drivers/acpi/tables.c > @@ -578,10 +578,10 @@ static void __init acpi_table_initrd_init(void *data, > size_t size) > clen = size; > if (clen > MAP_CHUNK_SIZE - slop) > clen = MAP_CHUNK_SIZE - slop; > - dest_p = early_ioremap(dest_addr & PAGE_MASK, > + dest_p = early_memremap(dest_addr & PAGE_MASK, >clen + slop); > memcpy(dest_p + slop, src_p, clen); > - early_iounmap(dest_p, clen + slop); > + early_memunmap(dest_p, clen + slop); > src_p += clen; > dest_addr += clen; > size -= clen; > -- > 2.8.2
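The loop in the patch copies the tables through a temporary mapping window: `slop` is the offset of `dest_addr` within its page, and each chunk is capped so that `clen + slop` never exceeds `MAP_CHUNK_SIZE`. The userspace sketch below reproduces only that arithmetic, with an ordinary buffer standing in for the `early_memremap()` window. The page and chunk sizes are assumed values for illustration, and `dest_addr` is treated as an offset into `dest_base`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed sizes for illustration; the kernel's real values may differ. */
#define SKETCH_PAGE_SIZE      4096UL
#define SKETCH_PAGE_MASK      (~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_MAP_CHUNK_SIZE (4 * SKETCH_PAGE_SIZE)

/*
 * Copy size bytes from src to offset dest_addr inside dest_base, one
 * page-aligned window at a time, mirroring the patch's loop.
 * Returns the number of chunks (i.e. map/unmap pairs) needed.
 */
static size_t chunked_copy(unsigned char *dest_base, uintptr_t dest_addr,
                           const unsigned char *src, size_t size)
{
    size_t chunks = 0;

    while (size) {
        size_t slop = dest_addr & ~SKETCH_PAGE_MASK;   /* offset within page */
        size_t clen = size;

        if (clen > SKETCH_MAP_CHUNK_SIZE - slop)
            clen = SKETCH_MAP_CHUNK_SIZE - slop;
        /* The kernel maps this window with early_memremap(dest_addr & PAGE_MASK, clen + slop). */
        unsigned char *window = dest_base + (dest_addr & SKETCH_PAGE_MASK);
        memcpy(window + slop, src, clen);
        src += clen;
        dest_addr += clen;
        size -= clen;
        chunks++;
    }
    return chunks;
}
```

With a 100-byte misalignment and 20000 bytes to copy, the first window carries 16284 bytes (16384 minus the slop) and the second the remaining 3716, so two map/unmap pairs are needed.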
Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs
On 5/17/16 8:48 PM, Hekuang wrote:
>> I don't understand why dso-prefix option is needed? Why make me type
>> yet more options to the analysis command? Why can't the directory be
>> located under the symfs tree in a known location and populated the
>> same way it is without symfs?
>
> Because the default buildid folder path is $HOME/.debug/.buildid, and
> this $HOME is on the target machine, not the same as $HOME on the host.
> Without that option, we need to copy $HOME/.debug/.buildid to the
> 'known location in symfs', which is also extra work.

My argument for symfs is that $HOME is not relevant, or if it is, the
path is symfs/$HOME.

The use case is dealing with countless images -- some development, some
production. I should be able to nuke the symfs when the analysis is done
and everything related to it is gone. With the $HOME/.debug path it just
grows on and on with no real means of pruning it.

If the vdsos are for a particular symfs, then why aren't the vdsos under
it in a known location?
Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs
On 2016/5/18 9:51, David Ahern wrote:
> On 5/17/16 7:47 PM, Hekuang wrote:
>> On 2016/5/16 10:50, David Ahern wrote:
>>> On 5/15/16 7:30 PM, Hekuang wrote:
>>>> In previous patch, I use 'perf buildid-cache -a' to add vdso binary
>>>> into the HOST buildid dir.
>>>
>>> So 'perf buildid-cache' needs the symfs option?
>>
>> With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is like this:
>>
>> ├── debug($(dso-prefix))
>> │   ├── .build-id
>> │   │   ├── 3a
>> │   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> ../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
>> │   │   └── 84
>> │   │       └── dbd75729adba57cc42f5544b25de571c0c8731 -> ../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
>> │   ├── [kernel.kallsyms]
>> │   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
>> │   ├── [vdso]
>> │   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
>> │   └── [vdso32]
>> │       └── 84dbd75729adba57cc42f5544b25de571c0c8731
>> ├── lib
>> │   ├── ld-2.22.so
>> │   └── libc-2.22.so
>> ├── tmp
>> │   └── hello
>> └── xxx
>>
>> So all binaries we need are included in the symfs dir. I think this is
>> consistent with your idea explained in previous mails.
>>
>> With this symfs, we do not need the buildid dir anymore. What is your
>> view on whether 'perf buildid-cache' needs a symfs option? After all,
>> that only affects the buildid dir.
>
> I don't understand why dso-prefix option is needed? Why make me type
> yet more options to the analysis command? Why can't the directory be
> located under the symfs tree in a known location and populated the
> same way it is without symfs?

Because the default buildid folder path is $HOME/.debug/.buildid, and
this $HOME is on the target machine, not the same as $HOME on the host.
Without that option, we need to copy $HOME/.debug/.buildid to the
'known location in symfs', which is also extra work.
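If the build-id files live at a fixed directory under the symfs root, as David suggests, finding a DSO reduces to path construction following the `.build-id/<first two hex digits>/<remaining digits>` layout in the tree Hekuang posted. The sketch below assumes that layout; `symfs_buildid_path` and the `dir` argument (standing in for either a fixed name or a dso-prefix value) are hypothetical and are not perf's actual code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<symfs>/<dir>/.build-id/<xx>/<rest>" where
 * <xx> is the first two hex digits of the build id and <rest> the remainder.
 * Returns 0 on success, -1 if the id is too short or the buffer too small.
 */
static int symfs_buildid_path(char *buf, size_t len, const char *symfs,
			      const char *dir, const char *build_id)
{
	int n;

	if (strlen(build_id) < 3)
		return -1;

	/* "%.2s" prints at most the first two characters of build_id. */
	n = snprintf(buf, len, "%s/%s/.build-id/%.2s/%s",
		     symfs, dir, build_id, build_id + 2);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

With a fixed `dir`, the analysis command needs no extra option, and deleting the symfs tree removes the cache along with it, which is the pruning property David asks for.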
Re: CQ and RDMA READ/WRITE APIs
Hi Doug,

On Tue, May 17, 2016 at 11:02 PM, Doug Ledford wrote:
> Nice catch there Bart. That was well before my role as maintainer and
> so settles things well enough for me. IOW, I don't feel I need to worry
> about trying to maintain the dual license nature of the RDMA stack as it
> was broken long before I took over. Thanks for pointing that out.
>
Does this mean we can submit new code files under a GPL-only license?
I submitted RDMA cgroup related code in ib_core under a GPLv2-only license.
Will existing files that call those new APIs continue to be dual licensed
(similar to the CQ and RDMA APIs)?

Parav
Re: [RFC][PATCH 5/5] sched/core: Add debug code to catch missing update_rq_clock()
On Tue, May 17, 2016 at 01:24:15PM +0100, Matt Fleming wrote:
> So, if the code looks like the following, either now or in the future,
>
> 	static void __schedule(bool preempt)
> 	{
> 		...
> 		/* Clear RQCF_ACT_SKIP */
> 		rq->clock_update_flags = 0;
> 		...
> 		delta = rq_clock();
> 	}

Sigh, you even said "Clear RQCF_ACT_SKIP", but you do not only clear it,
you clear everything. And if you clear RQCF_UPDATE as well (maybe you
shouldn't, but actually it does not matter), of course you will get a
warning...

In addition, it looks like multiple skips are possible, so

	update_rq_clock()
	{
		rq->clock_update_flags |= RQCF_UPDATE;
		...
	}

instead of clearing the skip flag there.
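The objection comes down to the difference between assigning zero and masking out one bit. The sketch below models it with a hypothetical `struct rq_sketch`; the flag values are assumptions for illustration and are not taken from the RFC patch.

```c
/* Flag values are assumptions for this sketch, not the RFC's definitions. */
#define RQCF_REQ_SKIP 0x01u
#define RQCF_ACT_SKIP 0x02u
#define RQCF_UPDATE   0x04u

struct rq_sketch {
	unsigned int clock_update_flags;
};

/* What the quoted snippet does: wipe every flag, RQCF_UPDATE included. */
static void clear_all_flags(struct rq_sketch *rq)
{
	rq->clock_update_flags = 0;
}

/* What its comment says it does: drop only RQCF_ACT_SKIP, keep the rest. */
static void clear_act_skip(struct rq_sketch *rq)
{
	rq->clock_update_flags &= ~RQCF_ACT_SKIP;
}

/* Suggested update_rq_clock() bookkeeping: set RQCF_UPDATE, leave skip bits alone. */
static void note_clock_update(struct rq_sketch *rq)
{
	rq->clock_update_flags |= RQCF_UPDATE;
}

/* Debug-style check: a clock read is only valid after an update was noted. */
static int clock_read_is_valid(const struct rq_sketch *rq)
{
	return (rq->clock_update_flags & RQCF_UPDATE) != 0;
}
```

Clearing only the skip bit keeps a prior update visible to the debug check, while assigning zero makes the next clock read look unupdated and trips the warning.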
[PATCH] mm: fix duplicate words and typos
Signed-off-by: Li Peng
---
 mm/memcontrol.c | 2 +-
 mm/page_alloc.c | 6 +++---
 mm/vmscan.c     | 7 +++----
 mm/zswap.c      | 2 +-
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index fe787f5..4b74255 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2293,7 +2293,7 @@ struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
 	/*
 	 * If we are in a safe context (can wait, and not in interrupt
-	 * context), we could be be predictable and return right away.
+	 * context), we could be predictable and return right away.
 	 * This would guarantee that the allocation being performed
 	 * already belongs in the new cache.
 	 *
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c1069ef..93824cb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3030,7 +3030,7 @@ retry:
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
 	 * pages are pinned on the per-cpu lists or in high alloc reserves.
-	 * Shrink them them and try again
+	 * Shrink them and try again.
 	 */
 	if (!page && !drained) {
 		unreserve_highatomic_pageblock(ac);
@@ -4812,7 +4812,7 @@ static int zone_batchsize(struct zone *zone)
  * locking.
  *
  * Any new users of pcp->batch and pcp->high should ensure they can cope with
- * those fields changing asynchronously (acording the the above rule).
+ * those fields changing asynchronously (according to the above rule).
  *
  * mutex_is_locked(&pcp_batch_high_lock) required when calling this function
  * outside of boot time (or some other assurance that no concurrent updaters
@@ -5024,7 +5024,7 @@ int __meminit __early_pfn_to_nid(unsigned long pfn,
  * @max_low_pfn: The highest PFN that will be passed to memblock_free_early_nid
  *
  * If an architecture guarantees that all ranges registered contain no holes
- * and may be freed, this this function may be used instead of calling
+ * and may be freed, this function may be used instead of calling
  * memblock_free_early_nid() manually.
  */
 void __init free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 142cb61..8ff5a79 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1683,8 +1683,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 			set_bit(ZONE_DIRTY, &zone->flags);
 		/*
-		 * If kswapd scans pages marked marked for immediate
-		 * reclaim and under writeback (nr_immediate), it implies
+		 * If kswapd scans pages marked for immediate reclaim
+		 * and under writeback (nr_immediate), it implies
 		 * that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
@@ -3267,8 +3267,7 @@ static int balance_pgdat(pg_data_t *pgdat, int order, int classzone_idx)
 			/*
 			 * There should be no need to raise the scanning
 			 * priority if enough pages are already being scanned
-			 * that that high watermark would be met at 100%
-			 * efficiency.
+			 * that high watermark would be met at 100% efficiency.
 			 */
 			if (kswapd_shrink_zone(zone, end_zone, &sc))
 				raise_priority = false;
diff --git a/mm/zswap.c b/mm/zswap.c
index de0f119b..6d829d7 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -928,7 +928,7 @@ static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
 	* a load may happening concurrently
 	* it is safe and okay to not free the entry
 	* if we free the entry in the following put
-	* it it either okay to return !0
+	* it either okay to return !0
 	*/
 fail:
 	spin_lock(&tree->lock);
--
1.8.3.1
Re: [PATCH v12 00/10] arm64: Add kernel probes (kprobes) support
On Thu, May 12, 2016 at 10:26:40AM +0800, Li Bin wrote:
>
>
> on 2016/5/11 23:33, James Morse wrote:
> > Hi David,
> >
> > On 27/04/16 19:52, David Long wrote:
> >> From: "David A. Long"
> >>
> >> This patchset is heavily based on Sandeepa Prabhu's ARM v8 kprobes patches,
> >> first seen in October 2013. This version attempts to address concerns raised by
> >> reviewers and also fixes problems discovered during testing.
> >>
> >> This patchset adds support for kernel probes(kprobes), jump probes(jprobes)
> >> and return probes(kretprobes) support for ARM64.
> >>
> >> The kprobes mechanism makes use of software breakpoint and single stepping
> >> support available in the ARM v8 kernel.
> >
> > I applied this series on v4.6-rc7, and built the sample kprobes. They work fine,
> > unless I throw ftrace into the mix too.
> >
> > I enabled the function_graph tracer, then tried to load the jprobe example module:
> > -%<-
> > root@ubuntu:/sys/kernel/debug/tracing# insmod /root/jprobe_example.ko
> > Planted jprobe at ff80080c8f20, handler addr ff8000bb3000
> > root@ubuntu:/sys/kernel/debug/tracing# jprobe: clone_flags = 0x1200011, stack_start = 0x0 stack_size = 0x0
> > Bad mode in Synchronous Abort handler detected, code 0x8605 -- IABT (current EL)
> > CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> > Hardware name: ARM Juno development board (r1) (DT)
> > task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> > PC is at 0x0
> > LR is at 0x0
> >
> > pc : [<>] lr : [<>] pstate: 6145
> > sp : ffc974e4ff00
> > x29: 01200011 x28: ffc974e4c000
> > x27: ff80088d x26: 00dc
> > x25: 0120 x24: 0015
> > x23: 6000 x22: 007fa1b40e60
> > x21: 007fa1ce70d0 x20:
> > x19: x18: 0a03
> > x17: 007fa1b40d90 x16: ff80080c9708
> > x15: 003b9aca x14: 007fddb7e5c0
> > x13: 007fa1b40e2c x12: 00d00ff0
> > x11: ff8009c4d000 x10: ff800920c000
> > x9 : ff8008f5c000 x8 : ffc976c06800
> > x7 : 0006daf2 x6 : 0015
> > x5 : 0004 x4 : ffc96e8690a0
> > x3 : 001ed7cbab74 x2 : ffc96e869000
> > x1 : x0 :
> >
> > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP
> > Modules linked in: jprobe_example
> > CPU: 5 PID: 1047 Comm: systemd-udevd Not tainted 4.6.0-rc7+ #4064
> > Hardware name: ARM Juno development board (r1) (DT)
> > task: ffc975948300 ti: ffc974e4c000 task.ti: ffc974e4c000
> > PC is at 0x0
> > LR is at 0x0
> >
> > pc : [<>] lr : [<>] pstate: 6145
> > sp : ffc974e4ff00
> > x29: 01200011 x28: ffc974e4c000
> > x27: ff80088d x26: 00dc
> > x25: 0120 x24: 0015
> > x23: 6000 x22: 007fa1b40e60
> > x21: 007fa1ce70d0 x20:
> > x19: x18: 0a03
> > x17: 007fa1b40d90 x16: ff80080c9708
> > x15: 003b9aca x14: 007fddb7e5c0
> > x13: 007fa1b40e2c x12: 00d00ff0
> > x11: ff8009c4d000 x10: ff800920c000
> > x9 : ff8008f5c000 x8 : ffc976c06800
> > x7 : 0006daf2 x6 : 0015
> > x5 : 0004 x4 : ffc96e8690a0
> > x3 : 001ed7cbab74 x2 : ffc96e869000
> > x1 : x0 :
> >
> > Process systemd-udevd (pid: 1047, stack limit = 0xffc974e4c020)
> > Stack: (0xffc974e4ff00 to 0xffc974e5)
> > ff00: 0417 007fa1ce76f0 00dc 0417
> > ff20: 007fddb7ecf8 0005
> > ff40: ff01 003b9aca 00555b3868b0 007fa1b40d90
> > ff60: 0a03 007fddb7e5c0 007fddb7e5e0
> > ff80: 00555b358000 00558f56f0e0 00558f574f00
> > ffa0: 00558f574f00 04fa 00558f56f010 007fddb7e600
> > ffc0: 007fa1b40e2c 007fddb7e5c0 007fa1b40e60 6000
> > ffe0: 01200011 00dc 000484000200 08000200
> > Call trace:
> > [< (null)>] (null)
> > Code: bad PC value
> > ---[ end trace 35d24aad799c2941 ]---
> > -%<-
> >
>
> To solve this, it should pause function tracing before the jprobe handler is called
> and unpause it before it returns back to the function it probed.
>
> diff --git a/arch/arm64/kernel/kprobes.c b/arch/arm64/kernel/kprobes.c
> index db2d95c..b21ed00 100644
> --- a/arch/arm64/kernel/kprobes.c
> +++ b/arch/arm64/kernel/kprobes.c
> @@ -714,6 +714,7 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
>
>  	instruction_pointer_set(regs,
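Li Bin's proposal is a bracket: pause function tracing before the jprobe handler runs and resume it before control goes back to the probed function. The truncated hunk does not show which helpers the proposal calls, so the toy model below uses hypothetical names (`struct task_sketch`, `pause_tracing`, `run_jprobe_handler`) and a plain per-task counter. It only illustrates the bracketing; it does not model the arm64 stack copy or the return-address rewriting that presumably lead to the PC=0 crash above.

```c
#include <stddef.h>

/* Hypothetical per-task tracer state for this sketch. */
struct task_sketch {
	int graph_pause;   /* > 0: tracer must leave return addresses alone */
	size_t hooked;     /* calls whose return path the tracer would rewrite */
};

static void pause_tracing(struct task_sketch *t)
{
	t->graph_pause++;
}

static void unpause_tracing(struct task_sketch *t)
{
	t->graph_pause--;
}

/* Tracer hook on function entry: only hook the call when not paused. */
static void trace_function_entry(struct task_sketch *t)
{
	if (t->graph_pause == 0)
		t->hooked++;
}

/*
 * jprobe-like flow (hypothetical): tracing is paused before the handler
 * runs and resumed before control returns to the probed function.
 */
static void run_jprobe_handler(struct task_sketch *t, int calls_in_handler)
{
	pause_tracing(t);
	for (int i = 0; i < calls_in_handler; i++)
		trace_function_entry(t);
	unpause_tracing(t);
}
```

A counter rather than a boolean keeps nested pause/unpause pairs balanced, so a handler that itself triggers another paused region does not re-enable tracing too early.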
Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors
On 05/18/2016 09:47 AM, Doug Anderson wrote:
> Jaehoon,
>
> On Mon, Mar 30, 2015 at 8:47 AM, Doug Anderson wrote:
>> Jaehoon,
>>
>> On Sun, Mar 29, 2015 at 5:55 PM, Jaehoon Chung
>> wrote:
>>> Dear Doug,
>>>
>>> I'm considering to control HLE error..So holding this patch.
>>> If this is absolutely necessary patch, let me know, plz.
>>>
>>> Best Regards,
>>> Jaehoon Chung
>>
>> Sounds OK. I have certainly applied this locally and the driver isn't
>> robust against insertions / removals without it, but once the card is
>> inserted things are OK so it's probably not urgent that it be applied
>> upstream. Hopefully we can figure out a better solution...
>
> I'm now testing a nice new rebased kernel and I'm hitting this again.
>
> Of course I'll just pick my same patch to my new kernel tree, but
> since it's been a year and nobody has done anything better, would you
> consider landing my patch? It is certainly better than nothing.

Sure, you're right.
I think the main cause of HLE is wait_prvdata_complete (I'm guessing...).
On the other hand, the dwmmc controller may be handling something wrong
(I found that HLE occurs in a similar case).
Until we find the real solution, it's fine to apply your patch to the
dwmmc controller.
Ulf has already sent the PR for next, so if we need to apply this, I will
apply it as a fix.

Best Regards,
Jaehoon Chung

>
> -Doug
[PATCH v2] Drivers: hv: vmbus: fix the race when querying & updating the percpu list
There is a rare race when we remove an entry from the global list
hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
process_chn_event() -> pcpu_relid2channel() is trying to query the list,
we can get the general protection fault:

general protection fault: [#1] SMP
...
RIP: 0010:[] [] vmbus_on_event+0xc4/0x149

Similarly, we also have the issue in the code path: vmbus_process_offer() ->
percpu_channel_enq().

We can resolve the issue by disabling the tasklet when updating the list.

Reported-by: Rolf Neugebauer
Cc: Vitaly Kuznetsov
Signed-off-by: Dexuan Cui
---
v2: added tasklet_schedule() after tasklet_enable(). Thanks, Vitaly!

 drivers/hv/channel.c      |  5 +++++
 drivers/hv/channel_mgmt.c | 24 +++++++++++++-----------
 include/linux/hyperv.h    |  3 +++
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 56dd261..17c4711 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -546,8 +546,11 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 		put_cpu();
 		smp_call_function_single(channel->target_cpu, reset_channel_cb,
 					 channel, true);
+		smp_call_function_single(channel->target_cpu,
+					 percpu_channel_deq, channel, true);
 	} else {
 		reset_channel_cb(channel);
+		percpu_channel_deq(channel);
 		put_cpu();
 	}
@@ -592,6 +595,8 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 out:
 	tasklet_enable(tasklet);
+	/* for possible pending event */
+	tasklet_schedule(tasklet);
 	return ret;
 }
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 38b682ba..8e251e3 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -21,6 +21,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 #include
+#include
 #include
 #include
 #include
@@ -277,7 +278,7 @@ static void free_channel(struct vmbus_channel *channel)
 	kfree(channel);
 }
-static void percpu_channel_enq(void *arg)
+void percpu_channel_enq(void *arg)
 {
 	struct vmbus_channel *channel = arg;
 	int cpu = smp_processor_id();
@@ -285,7 +286,7 @@ static void percpu_channel_enq(void *arg)
 	list_add_tail(&channel->percpu_list, &hv_context.percpu_list[cpu]);
 }
-static void percpu_channel_deq(void *arg)
+void percpu_channel_deq(void *arg)
 {
 	struct vmbus_channel *channel = arg;
@@ -313,15 +314,6 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
 	BUG_ON(!channel->rescind);
 	BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
-	if (channel->target_cpu != get_cpu()) {
-		put_cpu();
-		smp_call_function_single(channel->target_cpu,
-					 percpu_channel_deq, channel, true);
-	} else {
-		percpu_channel_deq(channel);
-		put_cpu();
-	}
-
 	if (channel->primary_channel == NULL) {
 		list_del(&channel->listentry);
@@ -363,6 +355,7 @@ void vmbus_free_channels(void)
  */
 static void vmbus_process_offer(struct vmbus_channel *newchannel)
 {
+	struct tasklet_struct *tasklet;
 	struct vmbus_channel *channel;
 	bool fnew = true;
 	unsigned long flags;
@@ -409,6 +402,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 	init_vp_index(newchannel, dev_type);
+	tasklet = hv_context.event_dpc[newchannel->target_cpu];
+	tasklet_disable(tasklet);
 	if (newchannel->target_cpu != get_cpu()) {
 		put_cpu();
 		smp_call_function_single(newchannel->target_cpu,
@@ -418,6 +413,9 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 		percpu_channel_enq(newchannel);
 		put_cpu();
 	}
+	tasklet_enable(tasklet);
+	/* for possible pending event */
+	tasklet_schedule(tasklet);
 	/*
 	 * This state is used to indicate a successful open
@@ -469,6 +467,7 @@ err_deq_chan:
 	list_del(&newchannel->listentry);
 	mutex_unlock(&vmbus_connection.channel_mutex);
+	tasklet_disable(tasklet);
 	if (newchannel->target_cpu != get_cpu()) {
 		put_cpu();
 		smp_call_function_single(newchannel->target_cpu,
@@ -477,6 +476,9 @@ err_deq_chan:
 		percpu_channel_deq(newchannel);
 		put_cpu();
 	}
+	tasklet_enable(tasklet);
+	/* for possible pending event */
+	tasklet_schedule(tasklet);
 err_free_chan:
 	free_channel(newchannel);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7be7237..95aea09 100644
--- a/include/linux/hyperv.h
+++
[PATCH v2] Drivers: hv: vmbus: fix the race when querying & updating the percpu list
There is a rare race when we remove an entry from the global list hv_context.percpu_list[cpu] in hv_process_channel_removal() -> percpu_channel_deq() -> list_del(): if, at the same time, vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel() is querying the list, we can get a general protection fault:

general protection fault: [#1] SMP
...
RIP: 0010:[] [] vmbus_on_event+0xc4/0x149

The same issue exists in the code path vmbus_process_offer() -> percpu_channel_enq().

We can resolve the issue by disabling the tasklet while updating the list.

Reported-by: Rolf Neugebauer
Cc: Vitaly Kuznetsov
Signed-off-by: Dexuan Cui
---
v2: added tasklet_schedule() after tasklet_enable(). Thanks, Vitaly!

 drivers/hv/channel.c      |  5 +++++
 drivers/hv/channel_mgmt.c | 24 +++++++++++++-----------
 include/linux/hyperv.h    |  3 +++
 3 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 56dd261..17c4711 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -546,8 +546,11 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 		put_cpu();
 		smp_call_function_single(channel->target_cpu, reset_channel_cb,
 					 channel, true);
+		smp_call_function_single(channel->target_cpu,
+					 percpu_channel_deq, channel, true);
 	} else {
 		reset_channel_cb(channel);
+		percpu_channel_deq(channel);
 		put_cpu();
 	}
 
@@ -592,6 +595,8 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
 
 out:
 	tasklet_enable(tasklet);
+	/* for possible pending event */
+	tasklet_schedule(tasklet);
 
 	return ret;
 }
diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 38b682ba..8e251e3 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -21,6 +21,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include
+#include
 #include
 #include
 #include
@@ -277,7 +278,7 @@ static void free_channel(struct vmbus_channel *channel)
 	kfree(channel);
 }
 
-static void percpu_channel_enq(void *arg)
+void percpu_channel_enq(void *arg)
 {
 	struct vmbus_channel *channel = arg;
 	int cpu = smp_processor_id();
@@ -285,7 +286,7 @@ static void percpu_channel_enq(void *arg)
 	list_add_tail(&channel->percpu_list, &hv_context.percpu_list[cpu]);
 }
 
-static void percpu_channel_deq(void *arg)
+void percpu_channel_deq(void *arg)
 {
 	struct vmbus_channel *channel = arg;
 
@@ -313,15 +314,6 @@ void hv_process_channel_removal(struct vmbus_channel *channel, u32 relid)
 	BUG_ON(!channel->rescind);
 	BUG_ON(!mutex_is_locked(&vmbus_connection.channel_mutex));
 
-	if (channel->target_cpu != get_cpu()) {
-		put_cpu();
-		smp_call_function_single(channel->target_cpu,
-					 percpu_channel_deq, channel, true);
-	} else {
-		percpu_channel_deq(channel);
-		put_cpu();
-	}
-
 	if (channel->primary_channel == NULL) {
 		list_del(&channel->listentry);
 
@@ -363,6 +355,7 @@ void vmbus_free_channels(void)
  */
 static void vmbus_process_offer(struct vmbus_channel *newchannel)
 {
+	struct tasklet_struct *tasklet;
 	struct vmbus_channel *channel;
 	bool fnew = true;
 	unsigned long flags;
@@ -409,6 +402,8 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 
 	init_vp_index(newchannel, dev_type);
 
+	tasklet = hv_context.event_dpc[newchannel->target_cpu];
+	tasklet_disable(tasklet);
 	if (newchannel->target_cpu != get_cpu()) {
 		put_cpu();
 		smp_call_function_single(newchannel->target_cpu,
@@ -418,6 +413,9 @@ static void vmbus_process_offer(struct vmbus_channel *newchannel)
 		percpu_channel_enq(newchannel);
 		put_cpu();
 	}
+	tasklet_enable(tasklet);
+	/* for possible pending event */
+	tasklet_schedule(tasklet);
 
 	/*
 	 * This state is used to indicate a successful open
@@ -469,6 +467,7 @@ err_deq_chan:
 	list_del(&newchannel->listentry);
 	mutex_unlock(&vmbus_connection.channel_mutex);
 
+	tasklet_disable(tasklet);
 	if (newchannel->target_cpu != get_cpu()) {
 		put_cpu();
 		smp_call_function_single(newchannel->target_cpu,
@@ -477,6 +476,9 @@ err_deq_chan:
 		percpu_channel_deq(newchannel);
 		put_cpu();
 	}
+	tasklet_enable(tasklet);
+	/* for possible pending event */
+	tasklet_schedule(tasklet);
 
 err_free_chan:
 	free_channel(newchannel);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7be7237..95aea09 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1328,6 +1328,9 @@ extern bool
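The ordering the patch relies on (stop the event tasklet, change the per-CPU list, start the tasklet again, then reschedule it in case an event arrived meanwhile) can be illustrated with a toy, single-threaded model. `struct toy_tasklet` and the `toy_*` helpers are hypothetical stand-ins, not the kernel's tasklet API, and the model makes no attempt to reproduce real tasklet or softirq semantics; it only shows why a reschedule after re-enabling matters when enabling by itself does not replay deferred work.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model only: NOT the kernel's struct tasklet_struct or tasklet_*() API.
 * list_len stands in for the per-CPU channel list the real handler walks.
 */
struct toy_tasklet {
	int disable_count;	/* > 0: the handler must not run */
	bool pending;		/* an event is waiting to be processed */
	int runs;		/* how many times the handler actually ran */
	int list_len;		/* stand-in for hv_context.percpu_list[cpu] */
};

/* Run the handler only if work is pending and the tasklet is enabled. */
static void toy_try_run(struct toy_tasklet *t)
{
	if (t->pending && t->disable_count == 0) {
		t->pending = false;
		t->runs++;	/* the real handler would walk the list here */
	}
}

void toy_schedule(struct toy_tasklet *t)
{
	t->pending = true;
	toy_try_run(t);
}

void toy_disable(struct toy_tasklet *t)
{
	t->disable_count++;
}

/* In this model, enabling does NOT replay pending work by itself. */
void toy_enable(struct toy_tasklet *t)
{
	assert(t->disable_count > 0);
	t->disable_count--;
}

/* Mirror of the enqueue path: the list is only touched while disabled. */
void toy_enqueue_channel(struct toy_tasklet *t)
{
	toy_disable(t);
	assert(t->disable_count > 0);	/* no reader can run now */
	t->list_len++;
	toy_enable(t);
	toy_schedule(t);		/* pick up any event that arrived meanwhile */
}
```

In the model, an event that arrives while the tasklet is disabled stays pending until something schedules the tasklet again, which is the role the patch gives to the `tasklet_schedule()` call placed after each `tasklet_enable()`.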
Re: [PATCH] mmc: dw_mmc: Consider HLE errors to be data and command errors
Hi Doug, On 2016-5-18 8:47, Doug Anderson wrote: Jaehoon, On Mon, Mar 30, 2015 at 8:47 AM, Doug Anderson wrote: Jaehoon, On Sun, Mar 29, 2015 at 5:55 PM, Jaehoon Chung wrote: Dear Doug, I'm considering how to control the HLE error, so I'm holding this patch. If this patch is absolutely necessary, please let me know. Best Regards, Jaehoon Chung Sounds OK. I have certainly applied this locally and the driver isn't robust against insertions / removals without it, but once the card is inserted things are OK so it's probably not urgent that it be applied upstream. Hopefully we can figure out a better solution... I'm now testing a nice new rebased kernel and I'm hitting this again. Of course I'll just pick my same patch to my new kernel tree, but since it's been a year and nobody has done anything better, would you consider landing my patch? It is certainly better than nothing. Could you try this patch to see if you can still find HLE?

@@ -2356,12 +2356,22 @@ static void dw_mci_cmd_interrupt(struct dw_mci *host, u32 status)
 static void dw_mci_handle_cd(struct dw_mci *host)
 {
 	int i;
+	int present;
 
 	for (i = 0; i < host->num_slots; i++) {
 		struct dw_mci_slot *slot = host->slot[i];
 
 		if (!slot)
 			continue;
 
+		present = !(mci_readl(slot->host, CDETECT) & (1 << slot->id));
+		if (present)
+			set_bit(DW_MMC_CARD_PRESENT, &slot->flags);
+		else
+			clear_bit(DW_MMC_CARD_PRESENT, &slot->flags);
 		if (slot->mmc->ops->card_event)
 			slot->mmc->ops->card_event(slot->mmc);

-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
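The card-detect change reads `CDETECT` and treats each slot's bit as active-low: a cleared bit means a card is present. A small userspace sketch of just that decoding step follows; `slot_card_present()` is a hypothetical helper that takes an already-read register value, so it says nothing about how `mci_readl()` accesses the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one slot's card-detect state from a raw
 * CDETECT value, using the same expression as the proposed patch.
 * A cleared bit for the slot means "card present" (active-low).
 */
bool slot_card_present(uint32_t cdetect, int slot_id)
{
	assert(slot_id >= 0 && slot_id < 32);	/* one bit per slot */
	return !(cdetect & (UINT32_C(1) << slot_id));
}
```

In the patch, the result then selects between `set_bit()` and `clear_bit()` on `DW_MMC_CARD_PRESENT` in `slot->flags` before the `card_event` callback runs.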
Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs
On 2016/5/16 10:50, David Ahern wrote: On 5/15/16 7:30 PM, Hekuang wrote: In the previous patch, I used 'perf buildid-cache -a' to add the vdso binary into the HOST buildid dir. So 'perf buildid-cache' needs the symfs option? With this patch ('PATCH v3 3/7 UPDATE'), the symfs directory tree looks like this:

├── debug($(dso-prefix))
│   ├── .build-id
│   │   ├── 3a
│   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> ../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   │   └── 84
│   │       └── dbd75729adba57cc42f5544b25de571c0c8731 -> ../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
│   ├── [kernel.kallsyms]
│   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   ├── [vdso]
│   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
│   └── [vdso32]
│       └── 84dbd75729adba57cc42f5544b25de571c0c8731
├── lib
│   ├── ld-2.22.so
│   └── libc-2.22.so
├── tmp
│   └── hello
└── xxx

So all the binaries we need are included in the symfs dir. I think this is consistent with the idea you explained in previous mails. With this symfs layout we no longer need the buildid dir, so what do you think about whether 'perf buildid-cache' needs a symfs option? After all, it only affects the buildid dir. Thanks.
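Every `.build-id` entry in the tree above follows the same pattern: the first two hex digits of the build-id name a subdirectory, the remaining digits name a symlink, and the link points back up two levels to `<dso name>/<full build-id>`. The following is a minimal sketch that produces both paths, assuming that layout; `buildid_paths()` is hypothetical, and perf's own code may build these paths differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper. Paths are relative to the debug ($(dso-prefix)) dir:
 *   link:   .build-id/<first 2 hex chars>/<remaining hex chars>
 *   target: ../../<dso name>/<full build-id>
 * Returns 0 on success, -1 if the id is too short or a buffer is too small.
 */
int buildid_paths(const char *build_id, const char *dso_name,
		  char *link, size_t link_len,
		  char *target, size_t target_len)
{
	int n;

	assert(build_id && dso_name && link && target);
	if (strlen(build_id) < 3)
		return -1;

	/* "%.2s" takes only the first two characters of the build-id. */
	n = snprintf(link, link_len, ".build-id/%.2s/%s", build_id, build_id + 2);
	if (n < 0 || (size_t)n >= link_len)
		return -1;

	n = snprintf(target, target_len, "../../%s/%s", dso_name, build_id);
	if (n < 0 || (size_t)n >= target_len)
		return -1;
	return 0;
}
```

Called with the `[vdso32]` build-id, this reproduces the `.build-id/84/...` symlink and its target exactly as they appear in the listing.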
Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs
On 5/17/16 7:47 PM, Hekuang wrote: On 2016/5/16 10:50, David Ahern wrote: On 5/15/16 7:30 PM, Hekuang wrote: In the previous patch, I used 'perf buildid-cache -a' to add the vdso binary into the HOST buildid dir. So 'perf buildid-cache' needs the symfs option? With this patch ('PATCH v3 3/7 UPDATE'), the symfs directory tree looks like this:

├── debug($(dso-prefix))
│   ├── .build-id
│   │   ├── 3a
│   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> ../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   │   └── 84
│   │       └── dbd75729adba57cc42f5544b25de571c0c8731 -> ../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
│   ├── [kernel.kallsyms]
│   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   ├── [vdso]
│   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
│   └── [vdso32]
│       └── 84dbd75729adba57cc42f5544b25de571c0c8731
├── lib
│   ├── ld-2.22.so
│   └── libc-2.22.so
├── tmp
│   └── hello
└── xxx

So all the binaries we need are included in the symfs dir. I think this is consistent with the idea you explained in previous mails. With this symfs layout we no longer need the buildid dir, so what do you think about whether 'perf buildid-cache' needs a symfs option? After all, it only affects the buildid dir.

I don't understand why the dso-prefix option is needed. Why make me type yet more options to the analysis command? Why can't the directory be located under the symfs tree in a known location and populated the same way it is without symfs?
Linux-next parallel cp workload hang
Hi, a parallel cp workload (xfstests generic/273) hangs as shown below. It's reproducible with a small chance, less than 1/100 I think. I have hit this in the linux-next 20160504, 0506 and 0510 trees, testing on xfs with a loop or block device. Ext4 survived several rounds of testing. The linux-next 20160510 tree hung within 500 rounds of testing several times. The same tree with the vfs parallel lookup patchset reverted survived 900 rounds of testing. Reverted commits are attached. Bisecting within this patchset identified this commit: 3b0a3c1ac1598722fc289da19219d14f2a37b31f is the first bad commit commit 3b0a3c1ac1598722fc289da19219d14f2a37b31f Author: Al Viro Date: Wed Apr 20 23:42:46 2016 -0400 simple local filesystems: switch to ->iterate_shared() no changes needed (XFS isn't simple, but it has the same parallelism in the interesting parts exercised from CXFS). With this commit reverted on top of the linux-next 0510 tree, 5000+ rounds of testing passed. Although 2000 rounds of testing were run before each good/bad verdict, I'm not 100 percent sure about all this, since it's so hard to hit, and I am not that lucky. The bisect log and the full blocked-state process dump log are also attached. Furthermore, this was first hit when testing fs dax on nvdimm; however, it's also reproducible without the dax mount option, and on a loop device, though it seems harder to hit there. Thanks, Xiong

[0.771475] INFO: task cp:49033 blocked for more than 120 seconds.
[0.794263] Not tainted 4.6.0-rc6-next-20160504 #5
[0.812515] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[0.841801] cp D 880b4e977928 0 49033 49014 0x0080
[0.868923] 880b4e977928 880ba275d380 880b8d712b80 880b4e978000
[0.897504] 7fff 0002 880b8d712b80
[0.925234] 880b4e977940 816cbc25 88035a1dabb0 880b4e9779e8
[0.953237] Call Trace:
[0.958314] [] schedule+0x35/0x80
[0.974854] [] schedule_timeout+0x231/0x2d0
[0.995728] [] ? down_trylock+0x2d/0x40
[1.015351] [] ? xfs_iext_bno_to_ext+0xa2/0x190 [xfs]
[1.040182] [] __down_common+0xaa/0x104
[1.059021] [] ? _xfs_buf_find+0x162/0x340 [xfs]
[1.081357] [] __down+0x1d/0x1f
[1.097166] [] down+0x41/0x50
[1.112869] [] xfs_buf_lock+0x3c/0xf0 [xfs]
[1.134504] [] _xfs_buf_find+0x162/0x340 [xfs]
[1.156871] [] xfs_buf_get_map+0x2a/0x270 [xfs]
[1.180010] [] xfs_buf_read_map+0x2d/0x180 [xfs]
[1.203538] [] xfs_trans_read_buf_map+0xf1/0x300 [xfs]
[1.229194] [] xfs_da_read_buf+0xd1/0x100 [xfs]
[1.251948] [] xfs_dir3_data_read+0x26/0x60 [xfs]
[1.275736] [] xfs_dir2_leaf_readbuf.isra.12+0x1be/0x4a0 [xfs]
[1.305094] [] ? down_read+0x12/0x30
[1.323787] [] ? xfs_ilock+0xe4/0x110 [xfs]
[1.345114] [] xfs_dir2_leaf_getdents+0x13b/0x3d0 [xfs]
[1.371818] [] xfs_readdir+0x1a6/0x1c0 [xfs]
[1.393471] [] xfs_file_readdir+0x2b/0x30 [xfs]
[1.416874] [] iterate_dir+0x173/0x190
[1.436709] [] ? do_audit_syscall_entry+0x66/0x70
[1.460951] [] SyS_getdents+0x98/0x120
[1.480566] [] ? iterate_dir+0x190/0x190
[1.500909] [] do_syscall_64+0x62/0x110
[1.520847] [] entry_SYSCALL64_slow_path+0x25/0x25
[1.545372] INFO: task cp:49040 blocked for more than 120 seconds.
[1.568933] Not tainted 4.6.0-rc6-next-20160504 #5
[1.587943] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1.618544] cp D 880b91463b00 0 49040 49016 0x0080
[1.645502] 880b91463b00 880464d5c140 88029b475700 880b91464000
[1.674145] 880411c42610 880411c42628 8802c10bc610
[1.702834] 880b91463b18 816cbc25 88029b475700 880b91463b88
[1.731501] Call Trace:
[1.736866] [] schedule+0x35/0x80
[1.754119] [] rwsem_down_read_failed+0xf2/0x140
[1.777411] [] ? xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[1.805090] [] call_rwsem_down_read_failed+0x18/0x30
[1.830482] [] down_read+0x20/0x30
[1.848505] [] xfs_ilock+0xe4/0x110 [xfs]
[1.869293] [] xfs_ilock_data_map_shared+0x30/0x40 [xfs]
[1.896775] [] xfs_dir_open+0x30/0x60 [xfs]
[1.917882] [] do_dentry_open+0x20f/0x320
[1.938919] [] ? xfs_file_mmap+0x50/0x50 [xfs]
[1.961532] [] vfs_open+0x57/0x60
[1.978945] [] path_openat+0x325/0x14e0
[1.999273] [] ? putname+0x53/0x60
[2.017695] [] do_filp_open+0x91/0x100
[2.036893] [] ? __alloc_fd+0x46/0x180
[2.055479] [] do_sys_open+0x124/0x210
[2.073783] [] ? __audit_syscall_exit+0x1db/0x260
[2.096426] [] SyS_openat+0x14/0x20
[2.113690] [] do_syscall_64+0x62/0x110
[2.132417] [] entry_SYSCALL64_slow_path+0x25/0x25

g273-block-dumps.tar.gz Description: application/gzip
Re: [PATCH] doc: self-protection: provide initial details
On Tue, May 17, 2016 at 6:26 PM, Jonathan Corbet wrote: > On Mon, 16 May 2016 19:27:28 -0700 > Kees Cook wrote: > >> This document attempts to codify the intent around kernel self-protection >> along with discussion of both existing and desired technologies, with >> attention given to the rationale behind them, and the expectations of >> their usage. > > I've applied this to the docs tree. In the process, I took the liberty > of applying the suggestions from Randy, hope you don't mind... Ah, thanks! I'll send a follow-up. I had a suggestion for another section and a typo fix. -Kees -- Kees Cook Chrome OS & Brillo Security
[PATCH] mm: disable fault around on emulated access bit architecture
On Tue, May 17, 2016 at 03:34:23PM +0300, Kirill A. Shutemov wrote: > On Mon, May 16, 2016 at 11:56:32PM +0900, Minchan Kim wrote: > > On Mon, May 16, 2016 at 05:29:00PM +0300, Kirill A. Shutemov wrote: > > > > Kirill, > > > > You wanted to test a non-HW access bit system and I did. > > > > What's your opinion? > > > > > > Sorry for the late response. > > > > > > My patch is incomplete: we need to find a way to not mark the pte as old if we > > > handle a page fault for the address the pte represents. > > > > I'm sure you can handle it, but my point is there wouldn't be a big gain > > even if you handle it on a non-HW access bit system. Okay, let's be > > more clear, because I don't have every non-HW access bit architecture. > > At least the current mobile workload on the ARM system I have wouldn't see a huge > > benefit. > > I will say one more thing. > > I tested the workload on a quad-core system, and its core speed is not slow > > compared to other recent mobile phone SoCs. Even when I tested the benchmark > > without pte_mkold, the benefit was within noise because the storage is really > > slow, so major faults are the dominant factor. So I switched the test storage from eMMC > > to eSATA, and finally I managed to see a little benefit with > > fault_around without pte_mkold. > > > > However, let's consider the side effects of fault_around. > > > > 1. Increased slab shrinking compared to old > > 2. Higher vmpressure compared to old > > > > Considering those regressions on my system, it's really not worth > > trying at the moment. > > That's why I wanted to disable fault_around by default on non-HW access > > bit systems. > > Feel free to post such a patch. I guess it's reasonable.
From d926a2a19cd0921b34279c3f6a3bae8b7508646d Mon Sep 17 00:00:00 2001
From: Minchan Kim
Date: Wed, 18 May 2016 08:36:59 +0900
Subject: [PATCH] mm: disable fault around on emulated access bit architecture

fault_around aims to reduce minor faults on file-backed pages by speculatively mapping ptes ahead of the fault, relying on the readahead logic. However, on architectures without a HW access bit, the benefit is highly limited because they must emulate the young bit with a minor fault for the page-aging algorithm of reclaim. In other words, we cannot reduce minor faults on those architectures.

I did a quick test on my ARM machine: a 512M file, mmapped and read sequentially word by word, on an eSATA drive, 4 times. stdev is stable.

= fault_around 4096 =
elapsed time(usec): 6747645

= fault_around 65536 =
elapsed time(usec): 6709263

0.5% gain.

Also, when I tested it with eMMC, there was no gain, I guess because with slow storage major faults are the more dominant factor.

As well, fault_around has the side effect of shrinking slab more aggressively and raising vmpressure, so if the speculation fails, it can evict more slab, which can result in page I/O (e.g., inode cache). In the end, that would void the benefit of fault_around.

So let's disable it by default on those architectures.

Cc: Kirill A. Shutemov
Cc: linux-a...@vger.kernel.org
Signed-off-by: Minchan Kim
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index b762b17aa4c5..9f652fdc0295 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2897,8 +2897,16 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 	update_mmu_cache(vma, address, pte);
 }
 
+/*
+ * If architecture emulates "accessed" or "young" bit without HW support,
+ * there is not much gain with fault_around.
+ */
 static unsigned long fault_around_bytes __read_mostly =
+#ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
+	PAGE_SIZE;
+#else
 	rounddown_pow_of_two(65536);
+#endif
 
 #ifdef CONFIG_DEBUG_FS
 static int fault_around_bytes_get(void *data, u64 *val)
--
1.9.1
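To see why a default of `PAGE_SIZE` effectively turns the feature off while `65536` does not, it helps to count how many pages the window covers. The sketch below assumes 4 KiB pages and is only a simplified model of the window computation: the `toy_` names are hypothetical, VMA and page-table boundaries are ignored, and it assumes the fault path only attempts fault-around when the window spans more than one page.

```c
#include <assert.h>

/* Assumption: 4 KiB pages. */
#define TOY_PAGE_SHIFT	12
#define TOY_PAGE_SIZE	(1UL << TOY_PAGE_SHIFT)

/* Largest power of two <= n (n > 0), like the kernel's rounddown_pow_of_two(). */
unsigned long toy_rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	assert(n > 0);
	while (p <= n / 2)
		p <<= 1;
	return p;
}

/* Number of pages a fault-around window of 'bytes' covers. */
unsigned long fault_around_pages(unsigned long bytes)
{
	return bytes >> TOY_PAGE_SHIFT;
}

/*
 * Start of the window around 'address', aligned down to a boundary of
 * nr_pages pages (VMA bounds ignored in this model).
 */
unsigned long fault_around_start(unsigned long address, unsigned long bytes)
{
	unsigned long nr_pages = fault_around_pages(bytes);
	unsigned long mask;

	assert(nr_pages > 0);
	mask = ~(nr_pages * TOY_PAGE_SIZE - 1) & ~(TOY_PAGE_SIZE - 1);
	return address & mask;
}
```

With 4 KiB pages, 65536 bytes gives a 16-page window aligned to a 64 KiB boundary, while `PAGE_SIZE` gives a single page, i.e. only the faulting page itself.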
[PATCH 1/2] kprobes: add a new module parameter
This patch adds a new module parameter that sets the name of the symbol to probe. With this parameter, the module becomes more flexible.

Signed-off-by: Huang Shijie
---
 samples/kprobes/kprobe_example.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 727eb21..2bb190d 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -14,9 +14,13 @@
 #include
 #include
 
+#define MAX_SYMBOL_LEN	64
+static char symbol[MAX_SYMBOL_LEN] = "_do_fork";
+module_param_string(symbol, symbol, sizeof(symbol), 0644);
+
 /* For each probe you need to allocate a kprobe structure */
 static struct kprobe kp = {
-	.symbol_name	= "_do_fork",
+	.symbol_name	= symbol,
 };
 
 /* kprobe pre_handler: called just before the probed instruction is executed */
--
2.5.5
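`module_param_string()` ties the `symbol` buffer to a load-time parameter, so the probe target can be chosen with something like `insmod kprobe_example.ko symbol=do_sys_open`. When a value is assigned, the kernel's string-parameter setter (`param_set_copystring()`) rejects values that do not fit in the buffer, including the terminating NUL. The userspace model below sketches that check under the patch's 64-byte limit; `set_symbol_param()` is hypothetical, and the real setter also logs an error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_SYMBOL_LEN 64	/* same buffer size as in the patch */

/*
 * Hypothetical userspace model of assigning a string module parameter:
 * refuse values that do not fit (with their NUL) and leave dst untouched,
 * otherwise copy the value in.
 */
int set_symbol_param(char *dst, size_t maxlen, const char *val)
{
	size_t n;

	assert(dst && val);
	n = strlen(val);
	if (n + 1 > maxlen)		/* value plus NUL must fit */
		return -ENOSPC;
	memcpy(dst, val, n + 1);	/* copy including the terminating NUL */
	return 0;
}
```

A 63-character symbol is the longest that fits; a 64-character one is rejected and the previous value stays in place.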
[PATCH 2/2] kprobes: print out the symbol name for the hooks
Print out the symbol name in the hooks; this makes the logs more readable.

Signed-off-by: Huang Shijie
---
 samples/kprobes/kprobe_example.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 2bb190d..ed0ca0c 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -27,24 +27,24 @@ static struct kprobe kp = {
 static int handler_pre(struct kprobe *p, struct pt_regs *regs)
 {
 #ifdef CONFIG_X86
-	printk(KERN_INFO "pre_handler: p->addr = 0x%p, ip = %lx,"
+	printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, ip = %lx,"
 			" flags = 0x%lx\n",
-		p->addr, regs->ip, regs->flags);
+		p->symbol_name, p->addr, regs->ip, regs->flags);
 #endif
 #ifdef CONFIG_PPC
-	printk(KERN_INFO "pre_handler: p->addr = 0x%p, nip = 0x%lx,"
+	printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, nip = 0x%lx,"
 			" msr = 0x%lx\n",
-		p->addr, regs->nip, regs->msr);
+		p->symbol_name, p->addr, regs->nip, regs->msr);
 #endif
 #ifdef CONFIG_MIPS
-	printk(KERN_INFO "pre_handler: p->addr = 0x%p, epc = 0x%lx,"
+	printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, epc = 0x%lx,"
 			" status = 0x%lx\n",
-		p->addr, regs->cp0_epc, regs->cp0_status);
+		p->symbol_name, p->addr, regs->cp0_epc, regs->cp0_status);
 #endif
 #ifdef CONFIG_TILEGX
-	printk(KERN_INFO "pre_handler: p->addr = 0x%p, pc = 0x%lx,"
+	printk(KERN_INFO "<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx,"
 			" ex1 = 0x%lx\n",
-		p->addr, regs->pc, regs->ex1);
+		p->symbol_name, p->addr, regs->pc, regs->ex1);
 #endif
 
 	/* A dump_stack() here will give a stack backtrace */
@@ -56,20 +56,20 @@ static void handler_post(struct kprobe *p, struct pt_regs *regs,
 				unsigned long flags)
 {
 #ifdef CONFIG_X86
-	printk(KERN_INFO "post_handler: p->addr = 0x%p, flags = 0x%lx\n",
-		p->addr, regs->flags);
+	printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, flags = 0x%lx\n",
+		p->symbol_name, p->addr, regs->flags);
 #endif
 #ifdef CONFIG_PPC
-	printk(KERN_INFO "post_handler: p->addr = 0x%p, msr = 0x%lx\n",
-		p->addr, regs->msr);
+	printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, msr = 0x%lx\n",
+		p->symbol_name, p->addr, regs->msr);
 #endif
 #ifdef CONFIG_MIPS
-	printk(KERN_INFO "post_handler: p->addr = 0x%p, status = 0x%lx\n",
-		p->addr, regs->cp0_status);
+	printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, status = 0x%lx\n",
+		p->symbol_name, p->addr, regs->cp0_status);
 #endif
 #ifdef CONFIG_TILEGX
-	printk(KERN_INFO "post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
-		p->addr, regs->ex1);
+	printk(KERN_INFO "<%s> post_handler: p->addr = 0x%p, ex1 = 0x%lx\n",
+		p->symbol_name, p->addr, regs->ex1);
 #endif
 }
 
--
2.5.5
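After this change, each handler line is prefixed with the probed symbol in angle brackets. The sketch below reproduces the shape of the x86 `post_handler` line in userspace so the resulting log format is easy to see; `format_post_handler()` is hypothetical, and it prints the address with `%lx` rather than the kernel's `%p`, whose output differs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace sketch of the patched x86 post_handler message.
 * Returns the number of characters written, or -1 on error/truncation.
 */
int format_post_handler(char *buf, size_t len, const char *symbol,
			unsigned long addr, unsigned long flags)
{
	int n;

	assert(buf && symbol && strlen(symbol) > 0);
	n = snprintf(buf, len,
		     "<%s> post_handler: p->addr = 0x%lx, flags = 0x%lx\n",
		     symbol, addr, flags);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* encoding error or truncated output */
	return n;
}
```

With the sample's default symbol, a line then starts with `<_do_fork> post_handler:`, which makes output from several loaded instances (each probing a different symbol) easy to tell apart.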
[PATCH 1/2] kprobes: add a new module parameter
This patch adds a new module parameter which sets the name of the
symbol to probe. With this parameter, the module becomes more flexible.

Signed-off-by: Huang Shijie
---
 samples/kprobes/kprobe_example.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index 727eb21..2bb190d 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -14,9 +14,13 @@
 #include
 #include
 
+#define MAX_SYMBOL_LEN	64
+static char symbol[MAX_SYMBOL_LEN] = "_do_fork";
+module_param_string(symbol, symbol, sizeof(symbol), 0644);
+
 /* For each probe you need to allocate a kprobe structure */
 static struct kprobe kp = {
-	.symbol_name	= "_do_fork",
+	.symbol_name	= symbol,
 };
 
 /* kprobe pre_handler: called just before the probed instruction is executed */
-- 
2.5.5
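A user-space model of what the string parameter does may help. The buffer and its default come from the patch; set_symbol_param() is a hypothetical helper modeled on (not copied from) the kernel's string-parameter setter, which rejects a value that does not fit in the buffer, terminating NUL included, with -ENOSPC and leaves the old contents in place.

```c
#include <errno.h>
#include <string.h>

#define MAX_SYMBOL_LEN 64

/* Buffer and default taken from the patch. */
static char symbol[MAX_SYMBOL_LEN] = "_do_fork";

/*
 * Hypothetical model of storing a string module parameter: the new value
 * must fit, including its NUL terminator, into the fixed-size buffer;
 * otherwise it is rejected and the previous value is kept.
 */
static int set_symbol_param(const char *val, char *buf, size_t maxlen)
{
	if (strlen(val) + 1 > maxlen)
		return -ENOSPC;
	strcpy(buf, val);
	return 0;
}
```

With the patch applied, the probed function would be picked at load time, e.g. `insmod kprobe_example.ko symbol=do_sys_open`. Since the sample registers its kprobe from the module init function, later writes to the 0644 sysfs copy would presumably only change the stored string, not move an already-registered probe.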
Re: [PATCH v12 10/10] kprobes: Add arm64 case in kprobe example module
On Tue, May 17, 2016 at 11:24:27AM +0100, Mark Brown wrote:
> On Tue, May 17, 2016 at 05:57:33PM +0800, Huang Shijie wrote:
> > On Wed, Apr 27, 2016 at 02:53:05PM -0400, David Long wrote:
> > 
> > > +#ifdef CONFIG_ARM64
> > > +	pr_info("pre_handler: p->addr = 0x%p, pc = 0x%lx\n",
> 
> > I think you miss the KERN_INFO here.
> 
> That's what pr_info() does over printk() - it adds the KERN_INFO more
> cleanly.

Sorry, I misread "pr_info" as "printk" when I first read this code.

thanks
Huang Shijie
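To make Mark's point concrete: in the kernel, pr_info(fmt, ...) expands to printk(KERN_INFO pr_fmt(fmt), ...), so the log level is pasted onto the format by string-literal concatenation. The sketch below reproduces that expansion in user space; fake_printk() and last_msg are hypothetical stand-ins that record the formatted message, and KERN_SOH being "\001" matches kernels since 3.6. The `, ##__VA_ARGS__` form is a GNU extension, as in the kernel headers.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Log-level prefix, split into two literals as the kernel does. */
#define KERN_SOH  "\001"
#define KERN_INFO KERN_SOH "6"

#define pr_fmt(fmt) fmt

/* Hypothetical printk stand-in: remembers the last formatted message. */
static char last_msg[256];

static int fake_printk(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(last_msg, sizeof(last_msg), fmt, ap);
	va_end(ap);
	return (int)strlen(last_msg);
}

/* Same shape as the kernel's definition, with printk swapped out. */
#define pr_info(fmt, ...) fake_printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
```

Keeping "\001" and "6" as separate literals matters: written as one literal, "\0016" would be parsed as the single octal escape \001 followed by '6' only by luck of the three-digit limit, and a different level digit could change the meaning, so the split form is the safer spelling.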
Re: [f2fs-dev] [PATCH] f2fs: use bio count instead of F2FS_WRITEBACK page count
On Wed, May 18, 2016 at 09:17:00AM +0800, Chao Yu wrote:
> Hi Jaegeuk,
> 
> On 2016/5/18 8:44, Jaegeuk Kim wrote:
> > This can reduce page counting overhead.
> 
> We change to increase one reference for one bio, but block layer can split or
> merge bios by itself, and write_end will be called per bio, so the reference may
> be maintained incorrectly?

Well, the block layer will merge bios into the same request, and then finally
call end_io for each original bio, no?
So far I've seen no error in any test cases.
Am I missing something?

Thanks,

> 
> Thanks,
> 
> > 
> > Signed-off-by: Jaegeuk Kim
> > ---
> >  fs/f2fs/checkpoint.c |  2 +-
> >  fs/f2fs/data.c       | 26 +++---
> >  fs/f2fs/debug.c      |  6 +++---
> >  fs/f2fs/f2fs.h       |  4 ++--
> >  fs/f2fs/super.c      |  2 +-
> >  5 files changed, 22 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index d04113b..447e2a9 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -914,7 +914,7 @@ static void wait_on_all_pages_writeback(struct f2fs_sb_info *sbi)
> >  	for (;;) {
> >  		prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
> >  
> > -		if (!get_pages(sbi, F2FS_WRITEBACK))
> > +		if (!atomic_read(&sbi->nr_wb_bios))
> >  			break;
> >  
> >  		io_schedule_timeout(5*HZ);
> > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > index 1013836..faef666 100644
> > --- a/fs/f2fs/data.c
> > +++ b/fs/f2fs/data.c
> > @@ -71,10 +71,9 @@ static void f2fs_write_end_io(struct bio *bio)
> >  			f2fs_stop_checkpoint(sbi);
> >  		}
> >  		end_page_writeback(page);
> > -		dec_page_count(sbi, F2FS_WRITEBACK);
> >  	}
> > -
> > -	if (!get_pages(sbi, F2FS_WRITEBACK) && wq_has_sleeper(&sbi->cp_wait))
> > +	if (atomic_dec_and_test(&sbi->nr_wb_bios) &&
> > +				wq_has_sleeper(&sbi->cp_wait))
> >  		wake_up(&sbi->cp_wait);
> >  
> >  	bio_put(bio);
> > @@ -98,6 +97,14 @@ static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
> >  	return bio;
> >  }
> >  
> > +static inline void __submit_bio(struct f2fs_sb_info *sbi, int rw,
> > +						struct bio *bio)
> > +{
> > +	if (!is_read_io(rw))
> > +		atomic_inc(&sbi->nr_wb_bios);
> > +	submit_bio(rw, bio);
> > +}
> > +
> >  static void __submit_merged_bio(struct f2fs_bio_info *io)
> >  {
> >  	struct f2fs_io_info *fio = &io->fio;
> > @@ -110,7 +117,7 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
> >  	else
> >  		trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio);
> >  
> > -	submit_bio(fio->rw, io->bio);
> > +	__submit_bio(io->sbi, fio->rw, io->bio);
> >  	io->bio = NULL;
> >  }
> >  
> > @@ -228,7 +235,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
> >  		return -EFAULT;
> >  	}
> >  
> > -	submit_bio(fio->rw, bio);
> > +	__submit_bio(fio->sbi, fio->rw, bio);
> >  	return 0;
> >  }
> >  
> > @@ -248,9 +255,6 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
> >  
> >  	down_write(&io->io_rwsem);
> >  
> > -	if (!is_read)
> > -		inc_page_count(sbi, F2FS_WRITEBACK);
> > -
> >  	if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
> >  	    io->fio.rw != fio->rw))
> >  		__submit_merged_bio(io);
> > @@ -1047,7 +1051,7 @@ got_it:
> >  		 */
> >  		if (bio && (last_block_in_bio != block_nr - 1)) {
> >  submit_and_realloc:
> > -			submit_bio(READ, bio);
> > +			__submit_bio(F2FS_I_SB(inode), READ, bio);
> >  			bio = NULL;
> >  		}
> >  		if (bio == NULL) {
> > @@ -1090,7 +1094,7 @@ set_error_page:
> >  		goto next_page;
> >  confused:
> >  		if (bio) {
> > -			submit_bio(READ, bio);
> > +			__submit_bio(F2FS_I_SB(inode), READ, bio);
> >  			bio = NULL;
> >  		}
> >  		unlock_page(page);
> > @@ -1100,7 +1104,7 @@ next_page:
> >  	}
> >  	BUG_ON(pages && !list_empty(pages));
> >  	if (bio)
> > -		submit_bio(READ, bio);
> > +		__submit_bio(F2FS_I_SB(inode), READ, bio);
> >  	return 0;
> >  }
> >  
> > diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
> > index 37615b2..a188973 100644
> > --- a/fs/f2fs/debug.c
> > +++ b/fs/f2fs/debug.c
> > @@ -48,7 +48,7 @@ static void update_general_status(struct f2fs_sb_info *sbi)
> >  	si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
> >  	si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
> >  	si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
> > -	si->wb_pages = get_pages(sbi, F2FS_WRITEBACK);
> > +	si->wb_bios = atomic_read(&sbi->nr_wb_bios);
> >  	si->total_count = (int)sbi->user_block_count / sbi->blocks_per_seg;
> >  	si->rsvd_segs = reserved_segments(sbi);
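The counting scheme in the patch can be modeled in user space with C11 atomics. struct sb_info, submit_bio_model(), write_end_io_model() and the waiters/wakeups fields are hypothetical stand-ins for the f2fs code. The model assumes exactly one completion per submitted write bio, which is the property Chao questions and Jaegeuk argues for above, so it only illustrates the bookkeeping and does not settle that question.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the f2fs state touched by the patch. */
struct sb_info {
	atomic_int nr_wb_bios; /* write bios submitted but not yet completed */
	int waiters;           /* models wq_has_sleeper(&sbi->cp_wait) */
	int wakeups;           /* counts modeled wake_up(&sbi->cp_wait) calls */
};

/* Like __submit_bio(): count write bios once each, ignore reads. */
static void submit_bio_model(struct sb_info *sbi, bool is_write)
{
	if (is_write)
		atomic_fetch_add(&sbi->nr_wb_bios, 1);
	/* submit_bio(rw, bio) would go here */
}

/*
 * Like f2fs_write_end_io(): atomic_dec_and_test() is true only for the
 * completion that drops the count to zero. atomic_fetch_sub() returns the
 * old value, so "old == 1" is the same test.
 */
static void write_end_io_model(struct sb_info *sbi)
{
	if (atomic_fetch_sub(&sbi->nr_wb_bios, 1) == 1 && sbi->waiters)
		sbi->wakeups++;
}

/* The condition wait_on_all_pages_writeback() now checks. */
static bool all_writeback_done(struct sb_info *sbi)
{
	return atomic_load(&sbi->nr_wb_bios) == 0;
}
```

If some path completed a bio more than once, or never called the end_io handler for a submitted write bio, the counter would underflow or never reach zero, and the checkpoint waiter would either be woken early or wait on its 5*HZ timeout loop indefinitely.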
Re: [PATCH] Staging: comedi: quatech_daqp_cs.c: fixed a warning issue
On Tue, May 17, 2016 at 06:47:56AM -0700, Greg KH wrote:
> A: http://en.wikipedia.org/wiki/Top_post
> Q: Were do I find info about this thing called top-posting?
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
> 
> A: No.
> Q: Should I include quotations after my reply?
> 
> http://daringfireball.net/2007/07/on_top

Thanks for this valuable information.

> 
> On Tue, May 17, 2016 at 09:31:56AM +0530, Amit Ghadge wrote:
> > Hello Greg KH,
> > 
> > I make patch same like other, I'm new and I nerver see changelog in other
> > patches.
> > 
> > Where to add changelog? I followed you are tutorial.
> 
> It's the area in the email before the patch, it ends up in the changelog
> when the patch is committed to the kernel tree.  You wrote something
> this time, but it was vague and didn't make sense.  Please fix that up
> and resend.

I resend this patch with patch description.

> 
> greg k-h
[PATCH v4 3/5] locking/rwsem: Don't wake up one's own task
As rwsem_down_read_failed() will queue itself and potentially call
__rwsem_do_wake(sem, RWSEM_WAKE_ANY), it is possible that a reader
will try to wake up its own task. This patch adds a check to make
sure that this won't happen.

Signed-off-by: Waiman Long
Reviewed-by: Peter Hurley
---
 kernel/locking/rwsem-xadd.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index c278f5a..007814f 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -202,7 +202,8 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 		 */
 		smp_mb();
 		waiter->task = NULL;
-		wake_up_process(tsk);
+		if (tsk != current)
+			wake_up_process(tsk);
 		put_task_struct(tsk);
 	} while (--loop);
 
-- 
1.7.1
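A minimal sketch of the readers-wakeup loop with the new check, using hypothetical struct task and struct waiter types and an explicit current_task argument in place of the kernel's current. The reader that called __rwsem_do_wake() is still running and, in the read slow path, is expected to notice on its own that its waiter->task was cleared, so calling wake_up_process() on itself is redundant. The sketch leaves out smp_mb() and the task reference counting.

```c
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and rwsem_waiter. */
struct task { int woken; };
struct waiter { struct task *task; };

/*
 * Grant the lock to each queued reader by clearing waiter->task, but skip
 * the wakeup for the task that is running this code, as the patch does.
 */
static void wake_readers(struct waiter *w, size_t n, struct task *current_task)
{
	for (size_t i = 0; i < n; i++) {
		struct task *tsk = w[i].task;

		w[i].task = NULL;            /* lock granted to this reader */
		if (tsk != current_task)     /* the check added by the patch */
			tsk->woken++;        /* models wake_up_process(tsk) */
	}
}
```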
[PATCH v4 5/5] locking/rwsem: Streamline the rwsem_optimistic_spin() code
This patch moves the owner loading and checking code entirely inside of
rwsem_spin_on_owner() to simplify the logic of the rwsem_optimistic_spin()
loop.

Suggested-by: Peter Hurley
Signed-off-by: Waiman Long
Reviewed-by: Peter Hurley
---
 kernel/locking/rwsem-xadd.c |   38 --
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index e3a7e06..a85a2bd 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -332,9 +332,16 @@ done:
 	return ret;
 }
 
-static noinline
-bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
+/*
+ * Return true only if we can still spin on the owner field of the rwsem.
+ */
+static noinline bool rwsem_spin_on_owner(struct rw_semaphore *sem)
 {
+	struct task_struct *owner = READ_ONCE(sem->owner);
+
+	if (!rwsem_owner_is_writer(owner))
+		goto out;
+
 	rcu_read_lock();
 	while (sem->owner == owner) {
 		/*
@@ -354,7 +361,7 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 		cpu_relax_lowlatency();
 	}
 	rcu_read_unlock();
-
+out:
 	/*
 	 * If there is a new owner or the owner is not set, we continue
 	 * spinning.
@@ -364,7 +371,6 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 {
-	struct task_struct *owner;
 	bool taken = false;
 
 	preempt_disable();
@@ -376,21 +382,17 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 	if (!osq_lock(&sem->osq))
 		goto done;
 
-	while (true) {
-		owner = READ_ONCE(sem->owner);
+	/*
+	 * Optimistically spin on the owner field and attempt to acquire the
+	 * lock whenever the owner changes. Spinning will be stopped when:
+	 *  1) the owning writer isn't running; or
+	 *  2) readers own the lock as we can't determine if they are
+	 *     actively running or not.
+	 */
+	while (rwsem_spin_on_owner(sem)) {
 		/*
-		 * Don't spin if
-		 * 1) the owner is a reader as we we can't determine if the
-		 *    reader is actively running or not.
-		 * 2) The rwsem_spin_on_owner() returns false which means
-		 *    the owner isn't running.
+		 * Try to acquire the lock
 		 */
-		if (rwsem_owner_is_reader(owner) ||
-		   (rwsem_owner_is_writer(owner) &&
-		   !rwsem_spin_on_owner(sem, owner)))
-			break;
-
-		/* wait_lock will be acquired if write_lock is obtained */
 		if (rwsem_try_write_lock_unqueued(sem)) {
 			taken = true;
 			break;
@@ -402,7 +404,7 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
 		 * we're an RT task that will live-lock because we won't let
 		 * the owner complete.
 		 */
-		if (!owner && (need_resched() || rt_task(current)))
+		if (!sem->owner && (need_resched() || rt_task(current)))
 			break;
 
 		/*
-- 
1.7.1
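The control flow after this patch can be sketched in user space. Everything here is hypothetical: READER_OWNED stands in for the series' reader-owned marker, struct task carries only an on_cpu flag, and try_write_lock() simply fails a set number of times. Unlike the real rwsem_spin_on_owner(), the model does not busy-wait while a running writer keeps the lock, and it omits the need_resched()/rt_task() exit; it only shows that the loop condition alone now decides whether to keep spinning.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical owner encoding: NULL = none recorded, READER_OWNED =
 * readers hold the lock, any other pointer = the owning writer. */
struct task { bool on_cpu; };
#define READER_OWNED ((struct task *)1UL)

struct sem_model {
	struct task *owner;
	int busy_attempts; /* lock attempts that fail before one succeeds */
};

static bool owner_is_writer(struct task *o) { return o && o != READER_OWNED; }
static bool owner_is_reader(struct task *o) { return o == READER_OWNED; }

/* Shape of the new rwsem_spin_on_owner(): it loads the owner itself. */
static bool spin_on_owner(struct sem_model *sem)
{
	struct task *owner = sem->owner; /* READ_ONCE(sem->owner) in the kernel */

	if (owner_is_writer(owner) && !owner->on_cpu)
		return false;            /* owning writer is not running */
	return !owner_is_reader(sem->owner);
}

static bool try_write_lock(struct sem_model *sem)
{
	if (sem->busy_attempts > 0) {
		sem->busy_attempts--;
		return false;
	}
	return true;
}

/* Shape of the streamlined rwsem_optimistic_spin() loop. */
static bool optimistic_spin(struct sem_model *sem)
{
	bool taken = false;

	while (spin_on_owner(sem)) {
		if (try_write_lock(sem)) {
			taken = true;
			break;
		}
	}
	return taken;
}
```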
[PATCH v4 0/5] [PATCH v3 0/4] locking/rwsem: Add reader-owned state to the owner field
v3->v4:
 - Add a new patch 2 to use WRITE_ONCE() for all rwsem->owner stores
   to prevent store tearing.

v2->v3:
 - Make minor code changes as suggested by PeterZ & Peter Hurley.
 - Add 2 minor patches (#2 & #3) to improve the rwsem code
 - Add a 4th patch to streamline the rwsem_optimistic_spin() code.

v1->v2:
 - Add rwsem_is_reader_owned() helper & rename rwsem_reader_owned()
   to rwsem_set_reader_owned().
 - Add more comments to clarify the purpose of some of the code
   changes.

Patch 1 is the main patch of this series.

Patch 2 protects against store tearing of the rwsem->owner field, which
can cause problems when a reader tries to dereference it.

Patch 3 eliminates redundant wakeup caused by a reader waking itself.

Patch 4 improves the efficiency of the reader wakeup code.

Patch 5 streamlines the rwsem_optimistic_spin() to make it simpler.

Waiman Long (5):
  locking/rwsem: Add reader-owned state to the owner field
  locking/rwsem: Protect all writes to owner by WRITE_ONCE()
  locking/rwsem: Don't wake up one's own task
  locking/rwsem: Improve reader wakeup code
  locking/rwsem: Streamline the rwsem_optimistic_spin() code

 kernel/locking/rwsem-xadd.c |   75 --
 kernel/locking/rwsem.c      |    8 +++-
 kernel/locking/rwsem.h      |   52 -
 3 files changed, 99 insertions(+), 36 deletions(-)
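As a rough illustration of what patch 2 guards against, the classic volatile-cast approximation of READ_ONCE()/WRITE_ONCE() is sketched below; the kernel's own macros are more elaborate, and the _SKETCH names, structs and helpers here are hypothetical. In practice GCC and Clang emit a single access for an aligned, pointer-sized volatile object, which keeps a concurrent reader from seeing half of an updated owner pointer; ISO C itself does not promise this, which is why the kernel wraps the accesses in macros instead of relying on plain assignments.

```c
#include <stddef.h>

/* Simplified volatile-cast approximations (GCC/Clang __typeof__). */
#define READ_ONCE_SKETCH(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

struct task_struct { int pid; };                  /* stand-in */
struct rwsem_model { struct task_struct *owner; };

/* Every store to ->owner goes through the "once" wrapper. */
static void rwsem_set_owner_model(struct rwsem_model *sem,
				  struct task_struct *t)
{
	WRITE_ONCE_SKETCH(sem->owner, t);
}

/* Lockless readers load ->owner exactly once per use. */
static struct task_struct *rwsem_get_owner_model(struct rwsem_model *sem)
{
	return READ_ONCE_SKETCH(sem->owner);
}
```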
[PATCH v4 1/5] locking/rwsem: Add reader-owned state to the owner field
Currently, it is not possible to determine for sure if a reader
owns a rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:

 1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
    the owner field is NULL, which can mean either the readers own
    the lock or the owning writer hasn't set the owner field yet.
    In the latter case, we miss the chance to do optimistic spinning.

 2) While a writer is waiting in the OSQ and a reader takes the lock,
    the writer will continue to spin when out of the OSQ in the main
    rwsem_optimistic_spin() loop as the owner field is NULL, wasting
    CPU cycles if some of the readers are sleeping.

Adding the new state will allow optimistic spinning to go forward as
long as the owner field is not RWSEM_READER_OWNED and the owner is
running, if set, but stop immediately when that state has been reached.

On a 4-socket Haswell machine running a 4.6-rc1 based kernel, the fio
test with multithreaded randrw and randwrite tests on the same file on
an XFS partition on top of an NVDIMM was run. The aggregated bandwidths
before and after the patch were as follows:

  Test       BW before patch     BW after patch     % change
  ----       ---------------     --------------     --------
  randrw        988 MB/s            1192 MB/s         +21%
  randwrite    1513 MB/s            1623 MB/s         +7.3%

The perf profiles of the rwsem_down_write_failed() function in randrw
before and after the patch were:

   19.95%  5.88%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed
   14.20%  1.52%  fio  [kernel.vmlinux]  [k] rwsem_down_write_failed

The actual CPU cycles spent in rwsem_down_write_failed() dropped from
5.88% to 1.52% after the patch.

xfstests was also run and no regression was observed.

Signed-off-by: Waiman Long
Acked-by: Jason Low
Acked-by: Davidlohr Bueso
---
 kernel/locking/rwsem-xadd.c |   41 ++---
 kernel/locking/rwsem.c      |    8 ++--
 kernel/locking/rwsem.h      |   41 +
 3 files changed, 69 insertions(+), 21 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09e30c6..c278f5a 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -155,6 +155,12 @@ __rwsem_do_wake(struct rw_semaphore *sem, enum rwsem_wake_type wake_type)
 			/* Last active locker left. Retry waking readers. */
 			goto try_reader_grant;
 		}
+		/*
+		 * It is not really necessary to set it to reader-owned here,
+		 * but it gives the spinners an early indication that the
+		 * readers now have the lock.
+		 */
+		rwsem_set_reader_owned(sem);
 	}
 
 	/* Grant an infinite number of read locks to the readers at the front
@@ -306,16 +312,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 
 	rcu_read_lock();
 	owner = READ_ONCE(sem->owner);
-	if (!owner) {
-		long count = READ_ONCE(sem->count);
+	if (!rwsem_owner_is_writer(owner)) {
 		/*
-		 * If sem->owner is not set, yet we have just recently entered the
-		 * slowpath with the lock being active, then there is a possibility
-		 * reader(s) may have the lock. To be safe, bail spinning in these
-		 * situations.
+		 * Don't spin if the rwsem is readers owned.
 		 */
-		if (count & RWSEM_ACTIVE_MASK)
-			ret = false;
+		ret = !rwsem_owner_is_reader(owner);
 		goto done;
 	}
 
@@ -328,8 +329,6 @@ done:
 static noinline
 bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 {
-	long count;
-
 	rcu_read_lock();
 	while (sem->owner == owner) {
 		/*
@@ -350,16 +349,11 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 	}
 	rcu_read_unlock();
 
-	if (READ_ONCE(sem->owner))
-		return true; /* new owner, continue spinning */
-
 	/*
-	 * When the owner is not set, the lock could be free or
-	 * held by readers. Check the counter to verify the
-	 * state.
+	 * If there is a new owner or the owner is not set, we continue
+	 * spinning.
 	 */
-	count = READ_ONCE(sem->count);
-	return (count == 0 || count == RWSEM_WAITING_BIAS);
+	return !rwsem_owner_is_reader(READ_ONCE(sem->owner));
 }
 
 static bool rwsem_optimistic_spin(struct rw_semaphore *sem)
@@ -378,7 +372,16 @@ static bool rwsem_optimistic_spin(struct
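Helpers with the names used in this diff can be sketched as below. Their real definitions are in kernel/locking/rwsem.h, which this excerpt does not show, so the value 1 for RWSEM_READER_OWNED and the check-before-store in the setter are assumptions about one plausible implementation, compiled here in user space against hypothetical stand-in structs.

```c
#include <stdbool.h>
#include <stddef.h>

struct task_struct { int pid; };                     /* stand-in */
struct rw_semaphore_model { struct task_struct *owner; };

/* Assumed sentinel: never a valid task pointer, distinct from NULL. */
#define RWSEM_READER_OWNED ((struct task_struct *)1UL)

static inline bool rwsem_owner_is_writer(struct task_struct *owner)
{
	return owner && owner != RWSEM_READER_OWNED;
}

static inline bool rwsem_owner_is_reader(struct task_struct *owner)
{
	return owner == RWSEM_READER_OWNED;
}

/* Only store when needed, to avoid dirtying a shared cacheline. */
static inline void rwsem_set_reader_owned_model(struct rw_semaphore_model *sem)
{
	if (sem->owner != RWSEM_READER_OWNED)
		sem->owner = RWSEM_READER_OWNED;
}

/*
 * The decision rwsem_can_spin_on_owner() now makes when no writer is
 * recorded: NULL (writer not yet set, or lock free) still allows spinning,
 * while a reader-owned rwsem does not.
 */
static bool can_spin_without_writer(struct task_struct *owner)
{
	return !rwsem_owner_is_reader(owner);
}
```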