Re: Linux 5.10
On Mon, Dec 14, 2020 at 10:21:59AM -0700, Jens Axboe wrote:
 > [   87.290698] attempt to access beyond end of device
 > md0: rw=4096, want=13996467328, limit=6261202944
 > [   87.293371] attempt to access beyond end of device
 > md0: rw=4096, want=13998564480, limit=6261202944
 > [   87.296045] BTRFS warning (device md0): couldn't read tree root
 > [   87.300056] BTRFS error (device md0): open_ctree failed
 >
 > Reverting it goes back to the -rc7 behaviour where it mounts fine.
 > >>>
 > >>> If the developer/maintainer(s) agree, I can revert this and push out a
 > >>> 5.10.1, just let me know.
 > >>
 > >> Yes, these should be reverted from 5.10 via 5.10.1:
 > >>
 > >> e0910c8e4f87 dm raid: fix discard limits for raid1 and raid10
 > >> f075cfb1dc59 md: change mddev 'chunk_sectors' from int to unsigned
 > >
 > > Sorry, f075cfb1dc59 was my local commit id, the corresponding upstream
 > > commit as staged by Jens is:
 > >
 > > 6ffeb1c3f82 md: change mddev 'chunk_sectors' from int to unsigned
 > >
 > > So please revert:
 > > 6ffeb1c3f822 md: change mddev 'chunk_sectors' from int to unsigned
 > > and then revert:
 > > e0910c8e4f87 dm raid: fix discard limits for raid1 and raid10
 >
 > Working with Song on understanding the failure case here. raid6 was
 > tested prior to this being shipped. We'll be back with more soon...

FYI, mixup in my original mail, it was raid5 (I forgot I converted it
from raid6->raid5 a few months back).  But I wouldn't be surprised if
they were both equally affected given what that header touched.

	Dave
Re: Linux 5.10
On Sun, Dec 13, 2020 at 03:03:29PM -0800, Linus Torvalds wrote:
 > Ok, here it is - 5.10 is tagged and pushed out.
 >
 > I pretty much always wish that the last week was even calmer than it
 > was, and that's true here too. There's a fair amount of fixes in here,
 > including a few last-minute reverts for things that didn't get fixed,
 > but nothing makes me go "we need another week".

...

 > Mike Snitzer (1):
 >       md: change mddev 'chunk_sectors' from int to unsigned

Seems to be broken.  This breaks mounting my raid6 partition:

[   87.290698] attempt to access beyond end of device
md0: rw=4096, want=13996467328, limit=6261202944
[   87.293371] attempt to access beyond end of device
md0: rw=4096, want=13998564480, limit=6261202944
[   87.296045] BTRFS warning (device md0): couldn't read tree root
[   87.300056] BTRFS error (device md0): open_ctree failed

Reverting it goes back to the -rc7 behaviour where it mounts fine.

	Dave
Re: Linux 5.10
On Mon, Dec 14, 2020 at 12:31:47AM -0500, Dave Jones wrote:
 > On Sun, Dec 13, 2020 at 03:03:29PM -0800, Linus Torvalds wrote:
 > > Ok, here it is - 5.10 is tagged and pushed out.
 > >
 > > I pretty much always wish that the last week was even calmer than it
 > > was, and that's true here too. There's a fair amount of fixes in here,
 > > including a few last-minute reverts for things that didn't get fixed,
 > > but nothing makes me go "we need another week".
 >
 > ...
 >
 > > Mike Snitzer (1):
 > >       md: change mddev 'chunk_sectors' from int to unsigned
 >
 > Seems to be broken.  This breaks mounting my raid6 partition:
 >
 > [   87.290698] attempt to access beyond end of device
 > md0: rw=4096, want=13996467328, limit=6261202944
 > [   87.293371] attempt to access beyond end of device
 > md0: rw=4096, want=13998564480, limit=6261202944
 > [   87.296045] BTRFS warning (device md0): couldn't read tree root
 > [   87.300056] BTRFS error (device md0): open_ctree failed
 >
 > Reverting it goes back to the -rc7 behaviour where it mounts fine.

Another data point from the md setup in dmesg..

good:

[    4.614957] md/raid:md0: device sdd1 operational as raid disk 3
[    4.614960] md/raid:md0: device sda1 operational as raid disk 0
[    4.614962] md/raid:md0: device sdc1 operational as raid disk 2
[    4.614963] md/raid:md0: device sdf1 operational as raid disk 4
[    4.614964] md/raid:md0: device sdg1 operational as raid disk 1
[    4.615156] md/raid:md0: raid level 5 active with 5 out of 5 devices, algorithm 2
[    4.645563] md0: detected capacity change from 0 to 12001828929536

bad:

[    5.315036] md/raid:md0: device sda1 operational as raid disk 0
[    5.316220] md/raid:md0: device sdd1 operational as raid disk 3
[    5.317389] md/raid:md0: device sdc1 operational as raid disk 2
[    5.318613] md/raid:md0: device sdf1 operational as raid disk 4
[    5.319748] md/raid:md0: device sdg1 operational as raid disk 1
[    5.321155] md/raid:md0: raid level 5 active with 5 out of 5 devices, algorithm 2
[    5.370257] md0: detected capacity change from 0 to 3205735907328
Re: weird loadavg on idle machine post 5.7
On Mon, Jul 06, 2020 at 04:59:52PM +0200, Peter Zijlstra wrote:
 > On Fri, Jul 03, 2020 at 04:51:53PM -0400, Dave Jones wrote:
 > > On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote:
 > >
 > > looked promising the first few hours, but as soon as it hit four hours
 > > of uptime, loadavg spiked and is now pinned to at least 1.00
 >
 > OK, lots of cursing later, I now have the below...
 >
 > The TL;DR is that while schedule() doesn't change p->state once it
 > starts, it does read it quite a bit, and ttwu() will actually change it
 > to TASK_WAKING. So if ttwu() changes it to WAKING before schedule()
 > reads it to do loadavg accounting, things go sideways.
 >
 > The below is extra complicated by the fact that I've had to scrounge up
 > a bunch of load-store ordering without actually adding barriers. It adds
 > yet another control dependency to ttwu(), so take that C standard :-)

Man this stuff is subtle.  I could've read this a hundred times and not
even come close to approaching this.  Basically me reading scheduler code:
http://www.quickmeme.com/img/96/9642ed212bbced00885592b39880ec55218e922245e0637cf94db2e41857d558.jpg

 > I've booted it, and build a few kernels with it and checked loadavg
 > drops to 0 after each build, so from that pov all is well, but since
 > I'm not confident I can reproduce the issue, I can't tell this actually
 > fixes anything, except maybe phantoms of my imagination.

Five hours in, looking good so far.  I think you nailed it.

	Dave
Re: weird loadavg on idle machine post 5.7
On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote:
 > So ARM/Power/etc.. can speculate the load such that the
 > task_contributes_to_load() value is from before ->on_rq.
 >
 > The compiler might similar re-order things -- although I've not found it
 > doing so with the few builds I looked at.
 >
 > So I think at the very least we should do something like this. But i've
 > no idea how to reproduce this problem.
 >
 > Mel's patch placed it too far down, as the WF_ON_CPU path also relies on
 > this, and by not resetting p->sched_contributes_to_load it would skew
 > accounting even worse.

looked promising the first few hours, but as soon as it hit four hours
of uptime, loadavg spiked and is now pinned to at least 1.00

	Dave
Re: weird loadavg on idle machine post 5.7
On Thu, Jul 02, 2020 at 10:36:27PM +0100, Mel Gorman wrote:
 > I'm thinking that the !!task_contributes_to_load(p) should still happen
 > after smp_cond_load_acquire() when on_cpu is stable and the pi_lock is
 > held to stabilised p->state against a parallel wakeup or updating the
 > task rq. I do not see any hazards with respect to smp_rmb and the value
 > of p->state in this particular path but I've confused myself enough in
 > the various scheduler and wakeup paths that I don't want to bet money on
 > it late in the evening
 >
 > It builds, not booted, it's for discussion but maybe Dave is feeling brave!

stalls, and then panics during boot :(

[   16.933212] igb :02:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[   69.572840] watchdog: BUG: soft lockup - CPU#3 stuck for 44s! [kworker/u8:0:7]
[   69.572849] CPU: 3 PID: 7 Comm: kworker/u8:0 Kdump: loaded Not tainted 5.8.0-rc3-firewall+ #2
[   69.572852] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Q3XXG4-P, BIOS 5.6.5 06/30/2018
[   69.572861] Workqueue: 0x0 (events_power_efficient)
[   69.572877] RIP: 0010:finish_task_switch+0x71/0x1a0
[   69.572884] Code: 00 00 4d 8b 7c 24 10 65 4c 8b 34 25 c0 6c 01 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 44 24 2c 00 00 00 00 c6 03 00 fb 4d 85 ed <74> 0b f0 41 ff 4d 4c 0f 84 d9 00 00 00 49 83 c7 80 74 7a 48 89 d8
[   69.572887] RSP: 0018:b36700067e40 EFLAGS: 0246
[   69.572893] RAX: 94654eab RBX: 9465575a8b40 RCX:
[   69.572895] RDX: RSI: 9465565c RDI: 94654eab
[   69.572898] RBP: b36700067e68 R08: 0001 R09: 000283c0
[   69.572901] R10: R11: R12: 94654eab
[   69.572904] R13: R14: 9465565c R15: 0001
[   69.572909] FS: () GS:94655758() knlGS:
[   69.572912] CS: 0010 DS: ES: CR0: 80050033
[   69.572917] CR2: 7f29b26abc30 CR3: 00020812d001 CR4: 001606e0
[   69.572919] Call Trace:
[   69.572937]  __schedule+0x28d/0x570
[   69.572946]  ? _cond_resched+0x15/0x30
[   69.572954]  schedule+0x38/0xa0
[   69.572962]  worker_thread+0xaa/0x3c0
[   69.572968]  ? process_one_work+0x3c0/0x3c0
[   69.572972]  kthread+0x116/0x130
[   69.572977]  ? __kthread_create_on_node+0x180/0x180
[   69.572982]  ret_from_fork+0x22/0x30
[   69.572988] Kernel panic - not syncing: softlockup: hung tasks
[   69.572993] CPU: 3 PID: 7 Comm: kworker/u8:0 Kdump: loaded Tainted: G L 5.8.0-rc3-firewall+ #2
[   69.572995] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Q3XXG4-P, BIOS 5.6.5 06/30/2018
[   69.572998] Workqueue: 0x0 (events_power_efficient)
[   69.573001] Call Trace:
[   69.573004]
[   69.573010]  dump_stack+0x57/0x70
[   69.573016]  panic+0xfb/0x2cb
[   69.573024]  watchdog_timer_fn.cold.12+0x7d/0x96
[   69.573030]  ? softlockup_fn+0x30/0x30
[   69.573035]  __hrtimer_run_queues+0x100/0x280
[   69.573041]  hrtimer_interrupt+0xf4/0x210
[   69.573049]  __sysvec_apic_timer_interrupt+0x5d/0xf0
[   69.573055]  asm_call_on_stack+0x12/0x20
[   69.573058]
[   69.573064]  sysvec_apic_timer_interrupt+0x6d/0x80
[   69.573069]  asm_sysvec_apic_timer_interrupt+0xf/0x20
[   69.573078] RIP: 0010:finish_task_switch+0x71/0x1a0
[   69.573082] Code: 00 00 4d 8b 7c 24 10 65 4c 8b 34 25 c0 6c 01 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 44 24 2c 00 00 00 00 c6 03 00 fb 4d 85 ed <74> 0b f0 41 ff 4d 4c 0f 84 d9 00 00 00 49 83 c7 80 74 7a 48 89 d8
[   69.573085] RSP: 0018:b36700067e40 EFLAGS: 0246
[   69.573088] RAX: 94654eab RBX: 9465575a8b40 RCX:
[   69.573090] RDX: RSI: 9465565c RDI: 94654eab
[   69.573092] RBP: b36700067e68 R08: 0001 R09: 000283c0
[   69.573094] R10: R11: R12: 94654eab
[   69.573096] R13: R14: 9465565c R15: 0001
[   69.573106]  __schedule+0x28d/0x570
[   69.573113]  ? _cond_resched+0x15/0x30
[   69.573119]  schedule+0x38/0xa0
[   69.573125]  worker_thread+0xaa/0x3c0
[   69.573130]  ? process_one_work+0x3c0/0x3c0
[   69.573134]  kthread+0x116/0x130
[   69.573149]  ? __kthread_create_on_node+0x180/0x180
[   69.792344]  ret_from_fork+0x22/0x30
Re: weird loadavg on idle machine post 5.7
On Thu, Jul 02, 2020 at 01:15:48PM -0400, Dave Jones wrote:
 > When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly
 > idle machine (that usually sees loadavg hover in the 0.xx range)
 > that it was consistently above 1.00 even when there was nothing running.
 > All that perf showed was the kernel was spending time in the idle loop
 > (and running perf).

Unfortunate typo there, I meant 5.8-rc2, and just confirmed the bug
persists in 5.8-rc3.

	Dave
weird loadavg on idle machine post 5.7
When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly
idle machine (that usually sees loadavg hover in the 0.xx range)
that it was consistently above 1.00 even when there was nothing running.
All that perf showed was the kernel was spending time in the idle loop
(and running perf).

For the first hour or so after boot, everything seems fine, but over
time loadavg creeps up, and once it's established a new baseline, it
never seems to ever drop below that again.

One morning I woke up to find loadavg at '7.xx', after almost as many
hours of uptime, which makes me wonder if perhaps this is triggered by
something in cron.  I have a bunch of scripts that fire off every hour
that involve thousands of shortlived runs of iptables/ipset, but running
them manually didn't seem to automatically trigger the bug.

Given it took a few hours of runtime to confirm good/bad, bisecting this
took the last two weeks.  I did it four different times, the first
producing bogus results from over-eager 'good', but the last two runs
both implicated this commit:

commit c6e7bd7afaeb3af55ffac122828035f1c01d1d7b (refs/bisect/bad)
Author: Peter Zijlstra
Date:   Sun May 24 21:29:55 2020 +0100

    sched/core: Optimize ttwu() spinning on p->on_cpu

    Both Rik and Mel reported seeing ttwu() spend significant time on:

      smp_cond_load_acquire(&p->on_cpu, !VAL);

    Attempt to avoid this by queueing the wakeup on the CPU that owns the
    p->on_cpu value. This will then allow the ttwu() to complete without
    further waiting.

    Since we run schedule() with interrupts disabled, the IPI is
    guaranteed to happen after p->on_cpu is cleared, this is what makes
    it safe to queue early.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Mel Gorman
    Signed-off-by: Ingo Molnar
    Cc: Jirka Hladky
    Cc: Vincent Guittot
    Cc: valentin.schnei...@arm.com
    Cc: Hillf Danton
    Cc: Rik van Riel
    Link: https://lore.kernel.org/r/20200524202956.27665-2-mgor...@techsingularity.net

Unfortunately it doesn't revert cleanly on top of rc3 so I haven't
confirmed 100% that it's the cause yet, but the two separate bisects
seem promising.

I don't see any obvious correlation between what's changing there and
the symptoms (other than "scheduler magic") but maybe those closer to
this have ideas what could be going awry ?

	Dave
ntp audit spew.
I have some hosts that are constantly spewing audit messages like so:

[46897.591182] audit: type=1333 audit(1569250288.663:220): op=offset old=2543677901372 new=2980866217213
[46897.591184] audit: type=1333 audit(1569250288.663:221): op=freq old=-2443166611284 new=-2436281764244
[48850.604005] audit: type=1333 audit(1569252241.675:222): op=offset old=1850302393317 new=3190241577926
[48850.604008] audit: type=1333 audit(1569252241.675:223): op=freq old=-2436281764244 new=-2413071187316
[49926.567270] audit: type=1333 audit(1569253317.638:224): op=offset old=2453141035832 new=2372389610455
[49926.567273] audit: type=1333 audit(1569253317.638:225): op=freq old=-2413071187316 new=-2403561671476

This gets emitted every time ntp makes an adjustment, which is
apparently very frequent on some hosts.

Audit isn't even enabled on these machines.

# auditctl -l
No rules
# auditctl -s
enabled 0
failure 1
pid 0
rate_limit 0
backlog_limit 64
lost 0
backlog 0
loginuid_immutable 0 unlocked

Aside from the log spew, why is this code doing _anything_ when audit
isn't enabled ?

Something like this:

diff --git a/kernel/audit.c b/kernel/audit.c
index da8dc0db5bd3..1291d826c024 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2340,6 +2340,9 @@ void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
 	struct audit_buffer *ab;
 	va_list args;
 
+	if (audit_initialized != AUDIT_INITIALIZED)
+		return;
+
 	ab = audit_log_start(ctx, gfp_mask, type);
 	if (ab) {
 		va_start(args, fmt);

Might silence the spew, but I'm concerned that the amount of work that
audit is doing on an unconfigured machine might warrant further
investigation.

("turn off CONFIG_AUDIT" isn't an option unfortunately, as this is a
one-size-fits-all kernel that runs on some other hosts that /do/ have
audit configured)

	Dave
5.3-rc1 panic in dma_direct_max_mapping_size
only got a partial panic, but when I threw 5.3-rc1 on a linode vm, it
hit this:

 bus_add_driver+0x1a9/0x1c0
 ? scsi_init_sysctl+0x22/0x22
 driver_register+0x6b/0xa6
 ? scsi_init_sysctl+0x22/0x22
 init+0x86/0xcc
 do_one_initcall+0x69/0x334
 kernel_init_freeable+0x367/0x3ff
 ? rest_init+0x247/0x247
 kernel_init+0xa/0xf9
 ret_from_fork+0x3a/0x50
CR2:
---[ end trace 2967cd16f7b1a303 ]---
RIP: 0010:dma_direct_max_mapping_size+0x21/0x71
Code: 0f b6 c0 c3 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 89 fb e8 21 0e 00 00 84 c0 74 2c 48 8b 83 20 03 00 00 48 8b ab 30 03 00 00 <48> 8b 00 48 85 c0 75 20 48 89 df e8 ff f3 ff ff 48 39 e8 77 2c 83
RSP: 0018:b58f00013ae8 EFLAGS: 00010202
RAX: RBX: a35ff8914ac8 RCX: b58f00013a1c
RDX: a35ff81d4658 RSI: 007e RDI: a35ff8914ac8
RBP: R08: a35ff81d4cc0 R09: a35ff82e3bc8
R10: R11: R12: a35ff8914ac8
R13: R14: a35ff826c160 R15:
FS: () GS:a35ffba0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: CR3: 00012d220001 CR4: 003606f0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009
Kernel Offset: 0x1b00 from 0x8100 (relocation range: 0x8000-0xbfff)

Will try and get some more debug info this evening if it isn't obvious
from the above.

	Dave
kernel BUG at kernel/cred.c:825!
[   53.980701] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[   53.981216] NFSD: starting 45-second grace period (net f098)
[   54.006802] CRED: Invalid credentials
[   54.006880] CRED: At ./include/linux/cred.h:253
[   54.006899] CRED: Specified credentials: 5daa4529
[   54.006916] CRED: ->magic=0, put_addr= (null)
[   54.006927] CRED: ->usage=1, subscr=0
[   54.006935] CRED: ->*uid = { 0,0,0,0 }
[   54.006944] CRED: ->*gid = { 0,0,0,0 }
[   54.006954] ------------[ cut here ]------------
[   54.006964] kernel BUG at kernel/cred.c:825!
[   54.006977] invalid opcode: [#1] SMP
RIP: __invalid_creds+0x48/0x50
[   54.006987] CPU: 2 PID: 814 Comm: mount.nfs Tainted: G W 5.0.0-rc1-backup+ #1
[   54.006997] Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
[   54.007171] RIP: 0010:__invalid_creds+0x48/0x50
[   54.007184] Code: 44 89 e2 48 89 ee 48 c7 c7 37 3e 53 ba e8 f7 8f 03 00 48 c7 c6 49 3e 53 ba 48 89 df 65 48 8b 14 25 80 4e 01 00 e8 48 fd ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 49 89 fe
[   54.007207] RSP: 0018:c9e33a30 EFLAGS: 00010286
[   54.007219] RAX: 001a RBX: ba960300 RCX: 0006
[   54.007234] RDX: RSI: 8884276f8818 RDI: 88842f895710
[   54.007246] RBP: ba5274c3 R08: 0001 R09:
[   54.007254] R10: c9e33a50 R11: R12: 00fd
[   54.007261] R13: 88842c1a6a08 R14: ba960300 R15: c9e33d60
[   54.007269] FS: 7f73770cb140() GS:88842f88() knlGS:
[   54.007277] CS: 0010 DS: ES: CR0: 80050033
[   54.007283] CR2: 5557d17d1000 CR3: 0004122ba006 CR4: 001606e0
[   54.007359] Call Trace:
[   54.007366]  nfs4_discover_server_trunking+0x286/0x310
[   54.007376]  nfs4_init_client+0xe8/0x260
[   54.007389]  ? nfs_get_client+0x519/0x610
[   54.007401]  ? _raw_spin_unlock+0x24/0x30
[   54.007412]  ? nfs_get_client+0x519/0x610
[   54.007424]  nfs4_set_client+0xb8/0x100
[   54.007439]  nfs4_create_server+0xfe/0x270
[   54.007451]  ? pcpu_alloc+0x611/0x8a0
[   54.007462]  nfs4_remote_mount+0x28/0x50
[   54.007474]  mount_fs+0xf/0x80
[   54.007487]  vfs_kern_mount+0x62/0x160
[   54.007498]  nfs_do_root_mount+0x7f/0xc0
[   54.007510]  nfs4_try_mount+0x3f/0xc0
[   54.007521]  ? get_nfs_version+0x11/0x50
[   54.007536]  nfs_fs_mount+0x61b/0xbd0
[   54.007548]  ? rcu_read_lock_sched_held+0x66/0x70
[   54.007560]  ? nfs_clone_super+0x70/0x70
[   54.007571]  ? nfs_destroy_inode+0x20/0x20
[   54.007585]  ? mount_fs+0xf/0x80
[   54.007595]  mount_fs+0xf/0x80
[   54.007606]  vfs_kern_mount+0x62/0x160
[   54.007618]  do_mount+0x1d1/0xd40
[   54.007631]  ? copy_mount_options+0xd2/0x170
[   54.007643]  ksys_mount+0x7e/0xd0
[   54.007654]  __x64_sys_mount+0x21/0x30
[   54.007665]  do_syscall_64+0x6d/0x660
[   54.007677]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   54.007690]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   54.007702] RIP: 0033:0x7f7377e97a1a
[   54.007713] Code: 48 8b 0d 71 e4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3e e4 0b 00 f7 d8 64 89 01 48
[   54.007736] RSP: 002b:7ffc73d9b4a8 EFLAGS: 0202 ORIG_RAX: 00a5
[   54.007751] RAX: ffda RBX: RCX: 7f7377e97a1a
[   54.007764] RDX: 5632beb51b50 RSI: 5632beb51b70 RDI: 5632beb53880
[   54.007780] RBP: 7ffc73d9b600 R08: 5632beb556b0 R09: 33643a303036343a
[   54.007794] R10: 0c00 R11: 0202 R12: 7ffc73d9b600
[   54.007807] R13: 5632beb548a0 R14: 001c R15: 7ffc73d9b510
Re: Kernel 4.17.4 lockup
On Wed, Jul 11, 2018 at 10:50:22AM -0700, Dave Hansen wrote:
 > On 07/11/2018 10:29 AM, H.J. Lu wrote:
 > >> I have seen it on machines with various amounts of cores and RAMs.
 > >> It triggers the fastest on 8 cores with 6GB RAM reliably.
 > > Here is the first kernel message.
 >
 > This looks like random corruption again.  It's probably a bogus 'struct
 > page' that fails the move_freepages() pfn_valid() checks.  I'm too lazy
 > to go reproduce the likely stack trace (not sure why it didn't show up
 > on your screen), but this could just be another symptom of the same
 > issue that caused the TLB batching oops.
 >
 > My money is on this being some kind of odd stack corruption, maybe
 > interrupt-induced, but that's a total guess at this point.

So, maybe related.. I reported this to linux-mm a few days ago:

When I ran an rsync on my machine I use for backups, it eventually hits
this trace..

kernel BUG at mm/page_alloc.c:2016!
invalid opcode: [#1] SMP
RIP: move_freepages_block+0x120/0x2d0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.18.0-rc4-backup+ #1
Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
RIP: 0010:move_freepages_block+0x120/0x2d0
Code: 05 48 01 c8 74 3b f6 00 02 74 36 48 8b 03 48 c1 e8 3e 48 8d 0c 40 48 8b 86 c0 7f 00 00 48 c1 e8 3e 48 8d 04 40 48 39 c8 74 17 <0f> 0b 45 31 f6 48 83 c4 28 44 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f
RSP: 0018:88043fac3af8 EFLAGS: 00010093
RAX: RBX: ea0002e2 RCX: 0003
RDX: RSI: ea0002e2 RDI:
RBP: R08: 88043fac3b5c R09: 9295e110
R10: 88043fdf4000 R11: ea0002e20008 R12: ea0002e2
R13: 9295dd40 R14: 0008 R15: ea0002e27fc0
FS: () GS:88043fac() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f2a75f71fe8 CR3: 0001e380f006 CR4: 001606e0
Call Trace:
 ? lock_acquire+0xe6/0x1dc
 steal_suitable_fallback+0x152/0x1a0
 get_page_from_freelist+0x1029/0x1650
 ? free_debug_processing+0x271/0x410
 __alloc_pages_nodemask+0x111/0x310
 page_frag_alloc+0x74/0x120
 __netdev_alloc_skb+0x95/0x110
 e1000_alloc_rx_buffers+0x225/0x2b0
 e1000_clean_rx_irq+0x2ee/0x450
 e1000e_poll+0x7c/0x2e0
 net_rx_action+0x273/0x4d0
 __do_softirq+0xc6/0x4d6
 irq_exit+0xbb/0xc0
 do_IRQ+0x60/0x110
 common_interrupt+0xf/0xf
RIP: 0010:cpuidle_enter_state+0xb5/0x390
Code: 89 04 24 0f 1f 44 00 00 31 ff e8 86 26 64 ff 80 7c 24 0f 00 0f 85 fb 01 00 00 e8 66 02 66 ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <48> 8b 0c 24 4c 29 f9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f
RSP: 0018:c90abe70 EFLAGS: 0202 ORIG_RAX: ffdc
RAX: 880107fe8040 RBX: 0003 RCX: 0001
RDX: 20c49ba5e353f7cf RSI: 0001 RDI: 880107fe8040
RBP: 88043fae8c20 R08: 0001 R09: 0018
R10: R11: R12: 928fb7d8
R13: 0003 R14: 0003 R15: 015e55aecf23
 do_idle+0x128/0x230
 cpu_startup_entry+0x6f/0x80
 start_secondary+0x192/0x1f0
 secondary_startup_64+0xa5/0xb0
NMI watchdog: Watchdog detected hard LOCKUP on cpu 4

Everything then locks up & reboots.

It's fairly reproducible, though every time I run it my rsync gets
further, and eventually I suspect it won't create enough load to
reproduce.

2006 #ifndef CONFIG_HOLES_IN_ZONE
2007 	/*
2008 	 * page_zone is not safe to call in this context when
2009 	 * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
2010 	 * anyway as we check zone boundaries in move_freepages_block().
2011 	 * Remove at a later date when no bug reports exist related to
2012 	 * grouping pages by mobility
2013 	 */
2014 	VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
2015 	          pfn_valid(page_to_pfn(end_page)) &&
2016 	          page_zone(start_page) != page_zone(end_page));
2017 #endif
2018

I could trigger it fairly quickly last week, but it seemed dependent on
just how much rsync is actually transferring.  (There are millions of
files, and only a few thousand had changed).  When there's nothing
changed, the rsync was running to completion every time.

	Dave
fscache kasan splat on v4.17-rc3
[   46.333213] ==================================================================
[   46.336298] BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x129/0x310
[   46.338208] Read of size 4 at addr 8803ea90261c by task mount.nfs/839
[   46.342780] CPU: 2 PID: 839 Comm: mount.nfs Not tainted 4.17.0-rc3-backup-debug+ #1
[   46.342783] Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
[   46.342784] Call Trace:
[   46.342790]  dump_stack+0x74/0xbb
[   46.342795]  print_address_description+0x9b/0x2b0
[   46.342797]  kasan_report+0x258/0x380
[   46.355407]  ? fscache_alloc_cookie+0x129/0x310
[   46.355410]  fscache_alloc_cookie+0x129/0x310
[   46.355413]  __fscache_acquire_cookie+0xd2/0x570
[   46.355417]  nfs_fscache_get_client_cookie+0x206/0x220
[   46.355419]  ? nfs_readpage_from_fscache_complete+0xa0/0xa0
[   46.355422]  ? rcu_read_lock_sched_held+0x8a/0xa0
[   46.355426]  ? memcpy+0x34/0x50
[   46.355428]  nfs_alloc_client+0x1d9/0x1f0
[   46.371854]  nfs4_alloc_client+0x22/0x420
[   46.371857]  nfs_get_client+0x47d/0x8f0
[   46.371860]  ? pcpu_alloc+0x599/0xaf0
[   46.371862]  nfs4_set_client+0x155/0x1e0
[   46.371865]  ? nfs4_check_serverowner_major_id+0x50/0x50
[   46.371867]  nfs4_create_server+0x261/0x4e0
[   46.371870]  ? nfs4_set_ds_client+0x200/0x200
[   46.371872]  ? alloc_vfsmnt+0xa6/0x360
[   46.371875]  ? __lockdep_init_map+0xaa/0x290
[   46.371878]  nfs4_remote_mount+0x31/0x60
[   46.371880]  mount_fs+0x2f/0xd0
[   46.371884]  vfs_kern_mount+0x68/0x200
[   46.396948]  nfs_do_root_mount+0x7f/0xc0
[   46.396952]  ? do_raw_spin_unlock+0xa2/0x130
[   46.396954]  nfs4_try_mount+0x7f/0x110
[   46.396957]  nfs_fs_mount+0xca5/0x1450
[   46.396960]  ? pcpu_alloc+0x599/0xaf0
[   46.396962]  ? nfs_remount+0x8a0/0x8a0
[   46.396964]  ? mark_held_locks+0x1c/0xb0
[   46.396967]  ? __raw_spin_lock_init+0x1c/0x70
[   46.412631]  ? trace_hardirqs_on_caller+0x187/0x260
[   46.412633]  ? nfs_clone_super+0x150/0x150
[   46.412635]  ? nfs_destroy_inode+0x20/0x20
[   46.412637]  ? __lockdep_init_map+0xaa/0x290
[   46.412639]  ? __lockdep_init_map+0xaa/0x290
[   46.412641]  ? mount_fs+0x2f/0xd0
[   46.412642]  mount_fs+0x2f/0xd0
[   46.412645]  vfs_kern_mount+0x68/0x200
[   46.412648]  ? do_raw_read_unlock+0x28/0x50
[   46.412651]  do_mount+0x2ac/0x14f0
[   46.412653]  ? copy_mount_string+0x20/0x20
[   46.431590]  ? copy_mount_options+0xe6/0x1b0
[   46.431592]  ? copy_mount_options+0x100/0x1b0
[   46.431594]  ? copy_mount_options+0xe6/0x1b0
[   46.431596]  ksys_mount+0x7e/0xd0
[   46.431599]  __x64_sys_mount+0x62/0x70
[   46.431601]  do_syscall_64+0xc7/0x8a0
[   46.431603]  ? syscall_return_slowpath+0x3c0/0x3c0
[   46.431605]  ? mark_held_locks+0x1c/0xb0
[   46.431609]  ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[   46.431611]  ? trace_hardirqs_off_caller+0xc2/0x110
[   46.431613]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   46.431615]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   46.431617] RIP: 0033:0x7f546ceb97fa
[   46.431619] RSP: 002b:7ffdf1c9d078 EFLAGS: 0206 ORIG_RAX: 00a5
[   46.431622] RAX: ffda RBX: RCX: 7f546ceb97fa
[   46.431623] RDX: 55decf202b20 RSI: 55decf202b40 RDI: 55decf204850
[   46.431625] RBP: 7ffdf1c9d1d0 R08: 55decf206680 R09: 62353a303036343a
[   46.431626] R10: 0c00 R11: 0206 R12: 7ffdf1c9d1d0
[   46.431627] R13: 55decf205870 R14: 001c R15: 7ffdf1c9d0e0

[   46.431631] Allocated by task 839:
[   46.431634]  kasan_kmalloc+0xa0/0xd0
[   46.431636]  __kmalloc+0x156/0x350
[   46.431639]  fscache_alloc_cookie+0x2e4/0x310
[   46.431640]  __fscache_acquire_cookie+0xd2/0x570
[   46.431643]  nfs_fscache_get_client_cookie+0x206/0x220
[   46.431645]  nfs_alloc_client+0x1d9/0x1f0
[   46.431648]  nfs4_alloc_client+0x22/0x420
[   46.431650]  nfs_get_client+0x47d/0x8f0
[   46.431652]  nfs4_set_client+0x155/0x1e0
[   46.431653]  nfs4_create_server+0x261/0x4e0
[   46.431655]  nfs4_remote_mount+0x31/0x60
[   46.431657]  mount_fs+0x2f/0xd0
[   46.431659]  vfs_kern_mount+0x68/0x200
[   46.431662]  nfs_do_root_mount+0x7f/0xc0
[   46.484441]  nfs4_try_mount+0x7f/0x110
[   46.484443]  nfs_fs_mount+0xca5/0x1450
[   46.484445]  mount_fs+0x2f/0xd0
[   46.484447]  vfs_kern_mount+0x68/0x200
[   46.484449]  do_mount+0x2ac/0x14f0
[   46.484451]  ksys_mount+0x7e/0xd0
[   46.484452]  __x64_sys_mount+0x62/0x70
[   46.484455]  do_syscall_64+0xc7/0x8a0
[   46.484458]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[   46.484461] Freed by task 407:
[   46.499159]  __kasan_slab_free+0x11d/0x160
[   46.499161]  kfree+0xe5/0x320
[   46.499163]  kobject_uevent_env+0x1ab/0x760
[   46.499165]  kobject_synth_uevent+0x470/0x4e0
[   46.499168]  uevent_store+0x1c/0x40
[   46.499171]  kernfs_fop_write+0x196/0x230
[   46.499174]  __vfs_write+0xc5/0x310
[   46.499175]  vfs_write+0xfb/0x250
[   46.499177]  ksys_write+0xa7/0x130
[   46.499180]  do_syscall_64+0xc7/0x8a0
[   46.512915]
46.484452] __x64_sys_mount+0x62/0x70 [ 46.484455] do_syscall_64+0xc7/0x8a0 [ 46.484458] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 46.484461] Freed by task 407: [ 46.499159] __kasan_slab_free+0x11d/0x160 [ 46.499161] kfree+0xe5/0x320 [ 46.499163] kobject_uevent_env+0x1ab/0x760 [ 46.499165] kobject_synth_uevent+0x470/0x4e0 [ 46.499168] uevent_store+0x1c/0x40 [ 46.499171] kernfs_fop_write+0x196/0x230 [ 46.499174] __vfs_write+0xc5/0x310 [ 46.499175] vfs_write+0xfb/0x250 [ 46.499177] ksys_write+0xa7/0x130 [ 46.499180] do_syscall_64+0xc7/0x8a0 [ 46.512915]
Re: Linux messages full of `random: get_random_u32 called from`
On Sun, Apr 29, 2018 at 07:02:02PM -0400, Dave Jones wrote:
 > On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:
 >  > Can you tell me a bit about your system?  What distribution, what
 >  > hardware is present in your sytsem (what architecture, what
 >  > peripherals are attached, etc.)?
 >  >
 >  > There's a reason why we made this --- we were declaring the random
 >  > number pool to be fully intialized before it really was, and that was
 >  > a potential security concern.  It's not as bad as the weakness
 >  > discovered by Nadia Heninger in 2012.  (See https://factorable.net for
 >  > more details.)  However, this is not one of those things where we like
 >  > to fool around.
 >  >
 >  > So I want to understand if this is an issue with a particular hardware
 >  > configuration, or whether it's just a badly designed Linux init system
 >  > or embedded setup, or something else.  After all, you wouldn't want
 >  > the NSA spying on all of your network traffic, would you?  :-)
 >
 > Why do we continue to print this stuff out when crng_init=1 though ?

answering my own question, I think.. This is a tristate, and we need it to be >1 to be quiet, which doesn't happen until..

 > [  165.806247] random: crng init done

this point.

	Dave
Re: Linux messages full of `random: get_random_u32 called from`
On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote: > Can you tell me a bit about your system? What distribution, what > hardware is present in your sytsem (what architecture, what > peripherals are attached, etc.)? > > There's a reason why we made this --- we were declaring the random > number pool to be fully intialized before it really was, and that was > a potential security concern. It's not as bad as the weakness > discovered by Nadia Heninger in 2012. (See https://factorable.net for > more details.) However, this is not one of those things where we like > to fool around. > > So I want to understand if this is an issue with a particular hardware > configuration, or whether it's just a badly designed Linux init system > or embedded setup, or something else. After all, you wouldn't want > the NSA spying on all of your network traffic, would you? :-) Why do we continue to print this stuff out when crng_init=1 though ? (This from debian stable, on a pretty basic atom box, but similar dmesg's on everything else I've put 4.17-rc on so far) [0.00] random: get_random_bytes called from start_kernel+0x96/0x519 with crng_init=0 [0.00] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [0.00] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [0.151401] calling initialize_ptr_random+0x0/0x36 @ 1 [0.151527] initcall initialize_ptr_random+0x0/0x36 returned 0 after 0 usecs [0.294661] calling prandom_init+0x0/0xbd @ 1 [0.294763] initcall prandom_init+0x0/0xbd returned 0 after 0 usecs [1.430529] _warn_unseeded_randomness: 165 callbacks suppressed [1.430540] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [1.430860] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [1.452240] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0 [2.954901] _warn_unseeded_randomness: 54 callbacks suppressed 
[2.954910] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [2.955185] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [2.957701] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [6.017364] _warn_unseeded_randomness: 88 callbacks suppressed [6.017373] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [6.042652] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [6.060333] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [6.951978] calling prandom_reseed+0x0/0x2a @ 1 [6.960627] initcall prandom_reseed+0x0/0x2a returned 0 after 105 usecs [7.371745] _warn_unseeded_randomness: 37 callbacks suppressed [7.371759] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0 [7.395926] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0 [7.411549] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=0 [7.553379] random: systemd-udevd: uninitialized urandom read (16 bytes read) [7.563210] random: systemd-udevd: uninitialized urandom read (16 bytes read) [7.571498] random: systemd-udevd: uninitialized urandom read (16 bytes read) [8.449679] _warn_unseeded_randomness: 154 callbacks suppressed [8.449691] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0 [8.483097] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0 [8.497999] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0 [9.353904] random: fast init done [9.770384] _warn_unseeded_randomness: 187 callbacks suppressed [9.770398] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1 [9.791514] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1 [9.834909] random: get_random_u64 called from 
copy_process.part.67+0x1ae/0x1e60 with crng_init=1 [ 10.802200] _warn_unseeded_randomness: 168 callbacks suppressed [ 10.802214] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1 [ 10.802276] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1 [ 10.802289] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1 [ 11.821109] _warn_unseeded_randomness: 160 callbacks suppressed [ 11.821122] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1 [ 11.863770] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1 [ 11.869384] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1 [ 12.843237]
Re: [Intel-gfx] 4.17-rc2: Could not determine valid watermarks for inherited state
On Thu, Apr 26, 2018 at 06:25:13PM +0300, Ville Syrjälä wrote:
 > On Thu, Apr 26, 2018 at 06:16:41PM +0300, Ville Syrjälä wrote:
 > > On Thu, Apr 26, 2018 at 05:56:14PM +0300, Ville Syrjälä wrote:
 > > > On Thu, Apr 26, 2018 at 10:27:19AM -0400, Dave Jones wrote:
 > > > > [1.176131] [drm:i9xx_get_initial_plane_config] pipe A/primary A with fb: size=800x600@32, offset=0, pitch 3200, size 0x1d4c00
 > > > > [1.176161] [drm:i915_gem_object_create_stolen_for_preallocated] creating preallocated stolen object: stolen_offset=0x, gtt_offset=0x, size=0x001d5000
 > > > > [1.176312] [drm:intel_alloc_initial_plane_obj.isra.127] initial plane fb obj (ptrval)
 > > > > [1.176351] [drm:intel_modeset_init] pipe A active planes 0x1
 > > > > [1.176456] [drm:drm_atomic_helper_check_plane_state] Plane must cover entire CRTC
 > > > > [1.176481] [drm:drm_rect_debug_print] dst: 800x600+0+0
 > > > > [1.176494] [drm:drm_rect_debug_print] clip: 1366x768+0+0
 > > >
 > > > OK, so that's the problem right there. The fb we took over from the
 > > > BIOS was 800x600, but now we're trying to set up a 1366x768 mode.
 > > >
 > > > We seem to be missing checks to make sure the initial fb is actually
 > > > big enough for the mode we're currently using :(
 > >
 > > Hmm. Or maybe we should just stick to the pipe src size.
 >
 > I'm curious whether this fixes the problem?
 > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
 > index 0f8c7389e87d..30824beedef7 100644
 > --- a/drivers/gpu/drm/i915/intel_display.c
 > +++ b/drivers/gpu/drm/i915/intel_display.c
 > @@ -15284,6 +15284,8 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev)
 >  		memset(&crtc->base.mode, 0, sizeof(crtc->base.mode));
 >  		if (crtc_state->base.active) {
 >  			intel_mode_from_pipe_config(&crtc->base.mode, crtc_state);
 > +			crtc->base.mode.hdisplay = crtc_state->pipe_src_w;
 > +			crtc->base.mode.vdisplay = crtc_state->pipe_src_h;
 >  			intel_mode_from_pipe_config(&crtc_state->base.adjusted_mode, crtc_state);
 >  			WARN_ON(drm_atomic_set_mode_for_crtc(crtc->base.state, &crtc->base.mode));

It does! Feel free to throw a Tested-by: Dave Jones <da...@codemonkey.org.uk> in there.

	Dave
Re: [Intel-gfx] 4.17-rc2: Could not determine valid watermarks for inherited state
On Thu, Apr 26, 2018 at 04:10:45PM +0300, Ville Syrjälä wrote: > On Mon, Apr 23, 2018 at 11:27:13AM -0400, Dave Jones wrote: > > This warning just started appearing during boot on a machine I upgraded > > to 4.17-rc2. The warning seems to have been there since 2015, but it > > has never triggered before today. > > Looks like we have bug open about this. I just asked for more > information there: > https://bugs.freedesktop.org/show_bug.cgi?id=105992#c5 > > If you can also boot with drm.debug=0xe maybe we can see some more > details about the supposedly bad watermarks. [1.153294] calling drm_kms_helper_init+0x0/0x15 @ 1 [1.153768] initcall drm_kms_helper_init+0x0/0x15 returned 0 after 0 usecs [1.154242] calling drm_core_init+0x0/0xea @ 1 [1.154760] initcall drm_core_init+0x0/0xea returned 0 after 53 usecs [1.156781] [drm:intel_pch_type] Found LynxPoint PCH [1.157254] [drm:intel_power_domains_init] Allowed DC state mask 00 [1.158717] [drm:i915_driver_load] ppgtt mode: 1 [1.159187] [drm:intel_uc_sanitize_options] enable_guc=0 (submission:no huc:no) [1.159665] [drm:i915_driver_load] guc_log_level=0 (enabled:no verbosity:-1) [1.160247] [drm:i915_ggtt_probe_hw] GGTT size = 2048M [1.160720] [drm:i915_ggtt_probe_hw] GMADR size = 256M [1.161189] [drm:i915_ggtt_probe_hw] DSM size = 64M [1.162126] fb: switching to inteldrmfb from EFI VGA [1.163161] fb: switching to inteldrmfb from VGA16 VGA [1.163511] [drm] Replacing VGA console driver [1.163819] [drm:i915_gem_init_stolen] Memory reserved for graphics device: 65536K, usable: 64512K [1.163868] [drm:intel_opregion_setup] graphic opregion physical addr: 0xd9a13018 [1.163908] [drm:intel_opregion_setup] Public ACPI methods supported [1.163924] [drm:intel_opregion_setup] SWSCI supported [1.168084] [drm:intel_opregion_setup] SWSCI GBDA callbacks 0cb3, SBCB callbacks 00300483 [1.168107] [drm:intel_opregion_setup] ASLE supported [1.168120] [drm:intel_opregion_setup] ASLE extension supported [1.168136] [drm:intel_opregion_setup] Found 
valid VBT in ACPI OpRegion (Mailbox #4) [1.168325] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [1.168341] [drm] Driver supports precise vblank timestamp query. [1.168357] [drm:intel_bios_init] Set default to SSC at 12 kHz [1.168373] [drm:intel_bios_init] VBT signature "$VBT HASWELL", BDB version 174 [1.168392] [drm:intel_bios_init] BDB_GENERAL_FEATURES int_tv_support 0 int_crt_support 1 lvds_use_ssc 0 lvds_ssc_freq 12 display_clock_mode 0 fdi_rx_polarity_inverted 0 [1.168425] [drm:intel_bios_init] crt_ddc_bus_pin: 5 [1.171131] [drm:intel_opregion_get_panel_type] Ignoring OpRegion panel type (0) [1.171151] [drm:intel_bios_init] Panel type: 2 (VBT) [1.171164] [drm:intel_bios_init] DRRS supported mode is static [1.171185] [drm:intel_bios_init] Found panel mode in BIOS VBT tables: [1.171203] [drm:drm_mode_debug_printmodeline] Modeline 0:"1024x768" 0 65000 1024 1048 1184 1344 768 771 777 806 0x8 0xa [1.171227] [drm:intel_bios_init] VBT initial LVDS value 300 [1.171242] [drm:intel_bios_init] VBT backlight PWM modulation frequency 200 Hz, active high, min brightness 0, level 255, controller 0 [1.171272] [drm:intel_bios_init] Found SDVO panel mode in BIOS VBT tables: [1.171289] [drm:drm_mode_debug_printmodeline] Modeline 0:"1600x1200" 0 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x8 0xa [1.171314] [drm:intel_bios_init] DRRS State Enabled:1 [1.171327] [drm:intel_bios_init] No SDVO device info is found in VBT [1.171344] [drm:intel_bios_init] Port B VBT info: DP:0 HDMI:0 DVI:1 EDP:0 CRT:0 [1.171362] [drm:intel_bios_init] VBT HDMI level shift for port B: 6 [1.171377] [drm:intel_bios_init] Port D VBT info: DP:0 HDMI:1 DVI:1 EDP:0 CRT:0 [1.171395] [drm:intel_bios_init] VBT HDMI level shift for port D: 11 [1.171470] [drm:intel_dsm_detect] no _DSM method for intel device [1.171492] [drm:i915_driver_load] rawclk rate: 125000 kHz [1.171524] [drm:intel_power_well_enable] enabling always-on [1.171549] [drm:intel_power_well_enable] enabling display [1.172946] 
[drm:intel_fbc_init] Sanitized enable_fbc value: 0 [1.172964] [drm:intel_print_wm_latency] Primary WM0 latency 20 (2.0 usec) [1.172981] [drm:intel_print_wm_latency] Primary WM1 latency 4 (2.0 usec) [1.172997] [drm:intel_print_wm_latency] Primary WM2 latency 36 (18.0 usec) [1.173014] [drm:intel_print_wm_latency] Primary WM3 latency 90 (45.0 usec) [1.173030] [drm:intel_print_wm_latency] Primary WM4 latency 160 (80.0 usec) [1.173047] [drm:intel_print_wm_latency] Sprite WM0 latency 20 (2.0 usec) [1.173063] [drm:intel_print_wm_latency] Sprite WM1 latency 4 (2.0 usec) [1.173080] [drm:intel_print_wm_latency] Sprite WM2 latency 36 (18.0 usec) [
4.17-rc2: Could not determine valid watermarks for inherited state
This warning just started appearing during boot on a machine I upgraded to 4.17-rc2. The warning seems to have been there since 2015, but it has never triggered before today. Dave [1.158500] fb: switching to inteldrmfb from EFI VGA [1.159073] Console: switching to colour dummy device 80x25 [1.159523] checking generic (a 1) vs hw (e000 1000) [1.159539] fb: switching to inteldrmfb from VGA16 VGA [1.159752] [drm] Replacing VGA console driver [1.164454] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [1.164472] [drm] Driver supports precise vblank timestamp query. [1.167285] i915 :00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem [1.170212] [ cut here ] [1.170230] Could not determine valid watermarks for inherited state [1.170267] WARNING: CPU: 1 PID: 1 at drivers/gpu/drm/i915/intel_display.c:14584 sanitize_watermarks+0x17b/0x1c0 [1.170291] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc2+ #1 [1.170308] Hardware name: Shuttle Inc. SH87R/FH87, BIOS 2.03 06/19/2014 [1.170325] RIP: 0010:sanitize_watermarks+0x17b/0x1c0 [1.170338] RSP: :a944c0023bf0 EFLAGS: 00010246 [1.170352] RAX: RBX: 9193508c RCX: [1.170369] RDX: 0001 RSI: 990b7399 RDI: 990b7399 [1.170385] RBP: 9193508c R08: 0001 R09: 0001 [1.170401] R10: R11: R12: ffea [1.170418] R13: 9193508faa88 R14: 919350823528 R15: 9193508c0a08 [1.170434] FS: () GS:91935640() knlGS: [1.170453] CS: 0010 DS: ES: CR0: 80050033 [1.170466] CR2: CR3: 00011d224001 CR4: 000606e0 [1.170483] Call Trace: [1.170493] intel_modeset_init+0x769/0x18f0 [1.170506] i915_driver_load+0x9b9/0xf30 [1.170519] ? _raw_spin_unlock_irqrestore+0x3f/0x70 [1.170534] pci_device_probe+0xa3/0x120 [1.170546] driver_probe_device+0x28a/0x320 [1.170557] __driver_attach+0x9e/0xb0 [1.170568] ? driver_probe_device+0x320/0x320 [1.170581] bus_for_each_dev+0x68/0xc0 [1.170592] bus_add_driver+0x11d/0x210 [1.170604] ? 
mipi_dsi_bus_init+0x11/0x11 [1.170615] driver_register+0x5b/0xd0 [1.170627] do_one_initcall+0x10b/0x33f [1.170638] ? do_early_param+0x8b/0x8b [1.170651] ? rcu_read_lock_sched_held+0x66/0x70 [1.170663] ? do_early_param+0x8b/0x8b [1.170674] kernel_init_freeable+0x1c3/0x249 [1.170687] ? rest_init+0xc0/0xc0 [1.170697] kernel_init+0xa/0x100 [1.170707] ret_from_fork+0x24/0x30 [1.170717] Code: 00 00 00 65 48 33 04 25 28 00 00 00 75 4f 48 8d a4 24 88 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 48 c7 c7 e0 5d 04 9a e8 25 33 b1 ff <0f> 0b eb a4 48 c7 c6 d5 73 04 9a 48 c7 c7 0f c6 fe 99 e8 0e 33 [1.170847] irq event stamp: 1449710 [1.170858] hardirqs last enabled at (1449709): [] console_unlock+0x51b/0x6b0 [1.170879] hardirqs last disabled at (1449710): [] error_entry+0x86/0x100 [1.170900] softirqs last enabled at (1449580): [] __do_softirq+0x3dd/0x521 [1.170922] softirqs last disabled at (1449563): [] irq_exit+0xb7/0xc0 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (That's 8086:0402 fwiw)
Re: [4.15-rc9] fs_reclaim lockdep trace
On Sun, Jan 28, 2018 at 02:55:28PM +0900, Tetsuo Handa wrote: > Dave, would you try below patch? > > >From cae2cbf389ae3cdef1b492622722b4aeb07eb284 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> > Date: Sun, 28 Jan 2018 14:17:14 +0900 > Subject: [PATCH] lockdep: Fix fs_reclaim warning. Seems to suppress the warning for me. Tested-by: Dave Jones <da...@codemonkey.org.uk>
Re: [4.15-rc9] fs_reclaim lockdep trace
On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote: > Just triggered this on a server I was rsync'ing to. Actually, I can trigger this really easily, even with an rsync from one disk to another. Though that also smells a little like networking in the traces. Maybe netdev has ideas. The first instance: > > WARNING: possible recursive locking detected > 4.15.0-rc9-backup-debug+ #1 Not tainted > > sshd/24800 is trying to acquire lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > but task is already holding lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > other info that might help us debug this: > Possible unsafe locking scenario: > >CPU0 > > lock(fs_reclaim); > lock(fs_reclaim); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by sshd/24800: > #0: (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] > tcp_sendmsg+0x19/0x40 > #1: (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > stack backtrace: > CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 > Call Trace: > dump_stack+0xbc/0x13f > ? _atomic_dec_and_lock+0x101/0x101 > ? fs_reclaim_acquire.part.102+0x5/0x30 > ? print_lock+0x54/0x68 > __lock_acquire+0xa09/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? mutex_destroy+0x120/0x120 > ? hlock_class+0xa0/0xa0 > ? kernel_text_address+0x5c/0x90 > ? __kernel_text_address+0xe/0x30 > ? unwind_get_return_address+0x2f/0x50 > ? __save_stack_trace+0x92/0x100 > ? graph_lock+0x8d/0x100 > ? check_noncircular+0x20/0x20 > ? __lock_acquire+0x616/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? __lock_acquire+0x616/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? print_irqtrace_events+0x110/0x110 > ? active_load_balance_cpu_stop+0x7b0/0x7b0 > ? debug_show_all_locks+0x2f0/0x2f0 > ? mark_lock+0x1b1/0xa00 > ? lock_acquire+0x12e/0x350 > lock_acquire+0x12e/0x350 > ? fs_reclaim_acquire.part.102+0x5/0x30 > ? 
lockdep_rcu_suspicious+0x100/0x100 > ? set_next_entity+0x20e/0x10d0 > ? mark_lock+0x1b1/0xa00 > ? match_held_lock+0x8d/0x440 > ? mark_lock+0x1b1/0xa00 > ? save_trace+0x1e0/0x1e0 > ? print_irqtrace_events+0x110/0x110 > ? alloc_extent_state+0xa7/0x410 > fs_reclaim_acquire.part.102+0x29/0x30 > ? fs_reclaim_acquire.part.102+0x5/0x30 > kmem_cache_alloc+0x3d/0x2c0 > ? rb_erase+0xe63/0x1240 > alloc_extent_state+0xa7/0x410 > ? lock_extent_buffer_for_io+0x3f0/0x3f0 > ? find_held_lock+0x6d/0xd0 > ? test_range_bit+0x197/0x210 > ? lock_acquire+0x350/0x350 > ? do_raw_spin_unlock+0x147/0x220 > ? do_raw_spin_trylock+0x100/0x100 > ? iotree_fs_info+0x30/0x30 > __clear_extent_bit+0x3ea/0x570 > ? clear_state_bit+0x270/0x270 > ? count_range_bits+0x2f0/0x2f0 > ? lock_acquire+0x350/0x350 > ? rb_prev+0x21/0x90 > try_release_extent_mapping+0x21a/0x260 > __btrfs_releasepage+0xb0/0x1c0 > ? btrfs_submit_direct+0xca0/0xca0 > ? check_new_page_bad+0x1f0/0x1f0 > ? match_held_lock+0xa5/0x440 > ? debug_show_all_locks+0x2f0/0x2f0 > btrfs_releasepage+0x161/0x170 > ? __btrfs_releasepage+0x1c0/0x1c0 > ? page_rmapping+0xd0/0xd0 > ? rmap_walk+0x100/0x100 > try_to_release_page+0x162/0x1c0 > ? generic_file_write_iter+0x3c0/0x3c0 > ? page_evictable+0xcc/0x110 > ? lookup_address_in_pgd+0x107/0x190 > shrink_page_list+0x1d5a/0x2fb0 > ? putback_lru_page+0x3f0/0x3f0 > ? save_trace+0x1e0/0x1e0 > ? _lookup_address_cpa.isra.13+0x40/0x60 > ? debug_show_all_locks+0x2f0/0x2f0 > ? kmem_cache_free+0x8c/0x280 > ? free_extent_state+0x1c8/0x3b0 > ? mark_lock+0x1b1/0xa00 > ? page_rmapping+0xd0/0xd0 > ? print_irqtrace_events+0x110/0x110 > ? shrink_node_memcg.constprop.88+0x4c9/0x5e0 > ? shrink_node+0x12d/0x260 > ? try_to_free_pages+0x418/0xaf0 > ? __alloc_pages_slowpath+0x976/0x1790 > ? __alloc_pages_nodemask+0x52c/0x5c0 > ? delete_node+0x28d/0x5c0 > ? find_held_lock+0x6d/0xd0 > ? free_pcppages_bulk+0x381/0x570 > ? lock_acquire+0x350/0x350 > ? do_raw_spin_unlock+0x147/0x220 > ? do_raw_spin_trylock+0x100/0x100 > ? 
__lock_is_held+0x51/0xc0 > ? _raw_spin_unlock+0x24/0x30 > ? free_pcppages_bulk+0x381/0x570 > ? mark_lock+0x1b1/0xa00 > ? free_compound_page+0x30/0x30 > ? print_irqtrace_events+0x110/0x110 > ? __kernel_map_pages+0x2c9/0x310 > ? mark_lock+0
[4.15-rc9] fs_reclaim lockdep trace
Just triggered this on a server I was rsync'ing to. WARNING: possible recursive locking detected 4.15.0-rc9-backup-debug+ #1 Not tainted sshd/24800 is trying to acquire lock: (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 but task is already holding lock: (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(fs_reclaim); lock(fs_reclaim); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by sshd/24800: #0: (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] tcp_sendmsg+0x19/0x40 #1: (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 stack backtrace: CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 Call Trace: dump_stack+0xbc/0x13f ? _atomic_dec_and_lock+0x101/0x101 ? fs_reclaim_acquire.part.102+0x5/0x30 ? print_lock+0x54/0x68 __lock_acquire+0xa09/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? mutex_destroy+0x120/0x120 ? hlock_class+0xa0/0xa0 ? kernel_text_address+0x5c/0x90 ? __kernel_text_address+0xe/0x30 ? unwind_get_return_address+0x2f/0x50 ? __save_stack_trace+0x92/0x100 ? graph_lock+0x8d/0x100 ? check_noncircular+0x20/0x20 ? __lock_acquire+0x616/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? __lock_acquire+0x616/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? print_irqtrace_events+0x110/0x110 ? active_load_balance_cpu_stop+0x7b0/0x7b0 ? debug_show_all_locks+0x2f0/0x2f0 ? mark_lock+0x1b1/0xa00 ? lock_acquire+0x12e/0x350 lock_acquire+0x12e/0x350 ? fs_reclaim_acquire.part.102+0x5/0x30 ? lockdep_rcu_suspicious+0x100/0x100 ? set_next_entity+0x20e/0x10d0 ? mark_lock+0x1b1/0xa00 ? match_held_lock+0x8d/0x440 ? mark_lock+0x1b1/0xa00 ? save_trace+0x1e0/0x1e0 ? print_irqtrace_events+0x110/0x110 ? alloc_extent_state+0xa7/0x410 fs_reclaim_acquire.part.102+0x29/0x30 ? fs_reclaim_acquire.part.102+0x5/0x30 kmem_cache_alloc+0x3d/0x2c0 ? rb_erase+0xe63/0x1240 alloc_extent_state+0xa7/0x410 ? 
lock_extent_buffer_for_io+0x3f0/0x3f0 ? find_held_lock+0x6d/0xd0 ? test_range_bit+0x197/0x210 ? lock_acquire+0x350/0x350 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? iotree_fs_info+0x30/0x30 __clear_extent_bit+0x3ea/0x570 ? clear_state_bit+0x270/0x270 ? count_range_bits+0x2f0/0x2f0 ? lock_acquire+0x350/0x350 ? rb_prev+0x21/0x90 try_release_extent_mapping+0x21a/0x260 __btrfs_releasepage+0xb0/0x1c0 ? btrfs_submit_direct+0xca0/0xca0 ? check_new_page_bad+0x1f0/0x1f0 ? match_held_lock+0xa5/0x440 ? debug_show_all_locks+0x2f0/0x2f0 btrfs_releasepage+0x161/0x170 ? __btrfs_releasepage+0x1c0/0x1c0 ? page_rmapping+0xd0/0xd0 ? rmap_walk+0x100/0x100 try_to_release_page+0x162/0x1c0 ? generic_file_write_iter+0x3c0/0x3c0 ? page_evictable+0xcc/0x110 ? lookup_address_in_pgd+0x107/0x190 shrink_page_list+0x1d5a/0x2fb0 ? putback_lru_page+0x3f0/0x3f0 ? save_trace+0x1e0/0x1e0 ? _lookup_address_cpa.isra.13+0x40/0x60 ? debug_show_all_locks+0x2f0/0x2f0 ? kmem_cache_free+0x8c/0x280 ? free_extent_state+0x1c8/0x3b0 ? mark_lock+0x1b1/0xa00 ? page_rmapping+0xd0/0xd0 ? print_irqtrace_events+0x110/0x110 ? shrink_node_memcg.constprop.88+0x4c9/0x5e0 ? shrink_node+0x12d/0x260 ? try_to_free_pages+0x418/0xaf0 ? __alloc_pages_slowpath+0x976/0x1790 ? __alloc_pages_nodemask+0x52c/0x5c0 ? delete_node+0x28d/0x5c0 ? find_held_lock+0x6d/0xd0 ? free_pcppages_bulk+0x381/0x570 ? lock_acquire+0x350/0x350 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? __lock_is_held+0x51/0xc0 ? _raw_spin_unlock+0x24/0x30 ? free_pcppages_bulk+0x381/0x570 ? mark_lock+0x1b1/0xa00 ? free_compound_page+0x30/0x30 ? print_irqtrace_events+0x110/0x110 ? __kernel_map_pages+0x2c9/0x310 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? __delete_from_page_cache+0x2e7/0x4e0 ? save_trace+0x1e0/0x1e0 ? __add_to_page_cache_locked+0x680/0x680 ? find_held_lock+0x6d/0xd0 ? __list_add_valid+0x29/0xa0 ? free_unref_page_commit+0x198/0x270 ? drain_local_pages_wq+0x20/0x20 ? 
stop_critical_timings+0x210/0x210 ? mark_lock+0x1b1/0xa00 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? __lock_acquire+0x616/0x2040 ? mark_lock+0x1b1/0xa00 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? __phys_addr_symbol+0x23/0x40 ? __change_page_attr_set_clr+0xe86/0x1640 ? __btrfs_releasepage+0x1c0/0x1c0 ? mark_lock+0x1b1/0xa00 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? mark_lock+0x1b1/0xa00 ? __lock_acquire+0x616/0x2040 ? __lock_acquire+0x616/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? swiotlb_free_coherent+0x60/0x60 ? __phys_addr+0x32/0x80 ? igb_xmit_frame_ring+0xad7/0x1890 ? stack_access_ok+0x35/0x80 ? deref_stack_reg+0xa1/0xe0 ? __read_once_size_nocheck.constprop.6+0x10/0x10 ?
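For readers unfamiliar with the fs_reclaim pseudo-lock: what lockdep is reporting above is re-entry into reclaim from an allocation made while already inside the reclaim path. A toy userspace model of that check (names hypothetical; this is not the kernel's lockdep machinery):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the fs_reclaim annotation: a thread-local flag stands in
 * for the fs_reclaim pseudo-lock that lockdep tracks. */
static _Thread_local bool in_fs_reclaim;

static int fs_reclaim_acquire(void)
{
    if (in_fs_reclaim)
        return -1;              /* recursive acquire: what the splat reports */
    in_fs_reclaim = true;
    return 0;
}

static void fs_reclaim_release(void)
{
    in_fs_reclaim = false;
}

/* An allocation that may enter reclaim; a nested allocation made from
 * inside the reclaim path (as btrfs_releasepage does in the trace)
 * would try to take the pseudo-lock a second time. */
static int toy_alloc(void)
{
    if (fs_reclaim_acquire())
        return -1;
    int nested = toy_alloc();   /* models allocating while reclaiming */
    fs_reclaim_release();
    return nested;              /* -1: the nested attempt was refused */
}
```

The real annotation is more subtle (it distinguishes allocation contexts such as GFP_NOFS), which is why the thread below ends in a lockdep fix rather than a btrfs one.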
problematic rc9 futex changes.
c1e2f0eaf015fb: "futex: Avoid violating the 10th rule of futex" seems to make up a few new rules to violate. Coverity picked up these two problems in the same code.

First it ORs a value with stack garbage.

*** CID 1427826: Uninitialized variables (UNINIT)
/kernel/futex.c: 2316 in fixup_pi_state_owner()
2310
2311	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
2312
2313	oldowner = pi_state->owner;
2314	/* Owner died? */
2315	if (!pi_state->owner)
>>> CID 1427826: Uninitialized variables (UNINIT)
>>> Using uninitialized value "newtid".
2316		newtid |= FUTEX_OWNER_DIED;
2317
2318	/*
2319	 * We are here because either:
2320	 *
2321	 * - we stole the lock and pi_state->owner needs updating to reflect

Then it notices that value is never read from before it's written anyway.

*** CID 1427824: Code maintainability issues (UNUSED_VALUE)
/kernel/futex.c: 2316 in fixup_pi_state_owner()
2310
2311	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
2312
2313	oldowner = pi_state->owner;
2314	/* Owner died? */
2315	if (!pi_state->owner)
>>> CID 1427824: Code maintainability issues (UNUSED_VALUE)
>>> Assigning value from "newtid | 0x40000000U" to "newtid" here, but that
>>> stored value is overwritten before it can be used.
2316		newtid |= FUTEX_OWNER_DIED;
2317
2318	/*
2319	 * We are here because either:
2320	 *
2321	 * - we stole the lock and pi_state->owner needs updating to reflect

(The next reference of newtid being..

2369	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;

Dave
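For readers following along, here is a minimal userspace reduction of the two findings (function and variable names are hypothetical, and this is not the actual futex.c fix): ORing into a never-assigned variable reads indeterminate stack contents, and in the flagged code the result was overwritten before use anyway. Assigning before ORing avoids both complaints.

```c
#include <assert.h>

#define FUTEX_OWNER_DIED 0x40000000u

/* The buggy shape Coverity flagged, shown only as a comment because it
 * is undefined behavior in C:
 *
 *     unsigned int newtid;                // never initialized
 *     if (owner_died)
 *         newtid |= FUTEX_OWNER_DIED;     // UNINIT: ORs stack garbage
 *     ...
 *     newtid = owner_tid | FUTEX_WAITERS; // and the OR above was dead
 */

/* Hypothetical corrected ordering: derive newtid first, then OR flags in. */
static unsigned int make_tid(unsigned int owner_tid, int owner_died)
{
    unsigned int newtid = owner_tid;    /* assigned before any OR */

    if (owner_died)
        newtid |= FUTEX_OWNER_DIED;
    return newtid;
}
```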
Re: proc_flush_task oops
On Thu, Dec 21, 2017 at 07:31:26PM -0600, Eric W. Biederman wrote: > Dave Jones <da...@codemonkey.org.uk> writes: > > > On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote: > > > > > > with proc_mnt still set to NULL is a mystery to me. > > > > > > > > Is there any chance the idr code doesn't always return the lowest > > valid > > > > free number? So init gets assigned something other than 1? > > > > > > Well, this theory is easy to test (attached). > > > > I didn't hit this BUG, but I hit the same oops in proc_flush_task. > > Scratch one idea. > > If it isn't too much trouble can you try this. > > I am wondering if somehow the proc_mnt that is NULL is somewhere in the > middle of the stack of pid namespaces. > > This adds two warnings. The first just reports which pid namespace in > the stack of pid namespaces is problematic, and the pid number in that > pid namespace. Which should give a whole lot more to go by. > > The second warning complains if we manage to create a pid namespace > where the parent pid namespace is not properly set up. The test to > prevent that looks quite robust, but at this point I don't know where to > look. Progress ? [ 1653.030190] [ cut here ] [ 1653.030852] 1/1: 2 no proc_mnt [ 1653.030946] WARNING: CPU: 2 PID: 4420 at kernel/pid.c:213 alloc_pid+0x24f/0x2a0
Re: proc_flush_task oops
On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote: > > with proc_mnt still set to NULL is a mystery to me. > > > > Is there any chance the idr code doesn't always return the lowest valid > > free number? So init gets assigned something other than 1? > > Well, this theory is easy to test (attached). I didn't hit this BUG, but I hit the same oops in proc_flush_task. Dave
Re: proc_flush_task oops
On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote:
> On 12/21/17, Eric W. Biederman wrote:
> > I have stared at this code, and written some test programs and I can't
> > see what is going on. alloc_pid by design and in implementation (as far
> > as I can see) is always single threaded when allocating the first pid
> > in a pid namespace. idr_init always initialized idr_next to 0.
> >
> > So how we can get past:
> >
> >	if (unlikely(is_child_reaper(pid))) {
> >		if (pid_ns_prepare_proc(ns)) {
> >			disable_pid_allocation(ns);
> >			goto out_free;
> >		}
> >	}
> >
> > with proc_mnt still set to NULL is a mystery to me.
> >
> > Is there any chance the idr code doesn't always return the lowest valid
> > free number? So init gets assigned something other than 1?
>
> Well, this theory is easy to test (attached).

I'll give this a shot and report back when I get to the office.

> There is a "valid" way to break the code via kernel.ns_last_pid:
> unshare+write+fork but the reproducer doesn't seem to use it (or it does?)

that sysctl is root only, so that isn't at play here.

Dave
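For anyone following along, the invariant Eric's theory questions is that the allocator always hands back the lowest free number, so the first pid allocated in a fresh namespace must be 1. A toy userspace stand-in for that invariant (not the kernel IDR implementation) looks like:

```c
#include <stdint.h>

/* Toy lowest-free-ID allocator over a 32-slot bitmap -- a userspace
 * stand-in for the property under test: like the old pid bitmap, the
 * IDR is expected to always return the lowest free number. */
static uint32_t used;                   /* bit i set => id i allocated */

static int alloc_lowest_id(void)
{
    for (int i = 0; i < 32; i++) {
        if (!(used & (UINT32_C(1) << i))) {
            used |= UINT32_C(1) << i;
            return i;
        }
    }
    return -1;                          /* all 32 ids in use */
}

static void free_id(int id)
{
    used &= ~(UINT32_C(1) << id);
}
```

If the real allocator ever violated this (returning something other than the lowest free slot for a fresh namespace), the is_child_reaper() check keyed on pid 1 would silently never run, which is the failure mode being probed.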
Re: proc_flush_task oops
On Wed, Dec 20, 2017 at 12:25:52PM -0600, Eric W. Biederman wrote: > > > > > > If the warning triggers it means the bug is in alloc_pid and somehow > > > something has gotten past the is_child_reaper check. > > > > You're onto something. > > > I am not seeing where things go wrong, but that puts the recent pid bitmap, > bit > hash to idr change in the suspect zone. > > Can you try reverting that change: > > e8cfbc245e24 ("pid: remove pidhash") > 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") > > While keeping the warning in place so we can see if this fixes the > allocation problem? So I can't trigger this any more with those reverted. I seem to hit a bunch of other long-standing bugs first. I'll keep running it overnight, but it looks like this is where the problem lies. Dave
Re: proc_flush_task oops
On Tue, Dec 19, 2017 at 07:54:24PM -0600, Eric W. Biederman wrote:
>
> *Scratches my head* I am not seeing anything obvious.
>
> Can you try this patch as you reproduce this issue?
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index b13b624e2c49..df9e5d4d8f83 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -210,6 +210,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  		goto out_unlock;
>  	for ( ; upid >= pid->numbers; --upid) {
>  		/* Make the PID visible to find_pid_ns. */
> +		WARN_ON(!upid->ns->proc_mnt);
>  		idr_replace(&upid->ns->idr, pid, upid->nr);
>  		upid->ns->pid_allocated++;
>  	}
>
> If the warning triggers it means the bug is in alloc_pid and somehow
> something has gotten past the is_child_reaper check.

You're onto something.

WARNING: CPU: 1 PID: 12020 at kernel/pid.c:213 alloc_pid+0x230/0x280
CPU: 1 PID: 12020 Comm: trinity-c29 Not tainted 4.15.0-rc4-think+ #3
RIP: 0010:alloc_pid+0x230/0x280
RSP: 0018:c90009977d48 EFLAGS: 00010046
RAX: 0030 RBX: 8804fb431280 RCX: 8f5c28f5c28f5c29
RDX: 88050a00de40 RSI: 82005218 RDI: 8804fc6aa9a8
RBP: 8804fb431270 R08: R09: 0001 R10: c90009977cc0
R11: eab94e31da7171b7 R12: 8804fb431260 R13: 8804fb431240
R14: 82005200 R15: 8804fb431268
FS: 7f49b9065700() GS:88050a00() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f49b906a000 CR3: 0004f7446001 CR4: 001606e0
DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600
Call Trace:
 copy_process.part.41+0x14fa/0x1e30
 _do_fork+0xe7/0x720
 ? rcu_read_lock_sched_held+0x6c/0x80
 ? syscall_trace_enter+0x2d7/0x340
 do_syscall_64+0x60/0x210
 entry_SYSCALL64_slow_path+0x25/0x25

followed immediately by...

Oops: [#1] SMP
CPU: 1 PID: 12020 Comm: trinity-c29 Tainted: GW 4.15.0-rc4-think+ #3
RIP: 0010:proc_flush_task+0x8e/0x1b0
RSP: 0018:c90009977c40 EFLAGS: 00010286
RAX: 0001 RBX: 0001 RCX: fffb
RDX: RSI: c90009977c50 RDI:
RBP: c90009977c63 R08: R09: 0002 R10: c90009977b70
R11: c90009977c64 R12: 0004 R13: R14: 0004 R15: 8804fb431240
FS: 7f49b9065700() GS:88050a00() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: CR3: 0004f7446001 CR4: 001606e0
DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600
Call Trace:
 ? release_task+0xaf/0x680
 release_task+0xd2/0x680
 ? wait_consider_task+0xb82/0xce0
 wait_consider_task+0xbe9/0xce0
 ? do_wait+0xe1/0x330
 do_wait+0x151/0x330
 kernel_wait4+0x8d/0x150
 ? task_stopped_code+0x50/0x50
 SYSC_wait4+0x95/0xa0
 ? rcu_read_lock_sched_held+0x6c/0x80
 ? syscall_trace_enter+0x2d7/0x340
 ? do_syscall_64+0x60/0x210
 do_syscall_64+0x60/0x210
 entry_SYSCALL64_slow_path+0x25/0x25
Re: proc_flush_task oops
On Tue, Dec 19, 2017 at 07:54:24PM -0600, Eric W. Biederman wrote: > > *Scratches my head* I am not seeing anything obvious. > > Can you try this patch as you reproduce this issue? > > diff --git a/kernel/pid.c b/kernel/pid.c > index b13b624e2c49..df9e5d4d8f83 100644 > --- a/kernel/pid.c > +++ b/kernel/pid.c > @@ -210,6 +210,7 @@ struct pid *alloc_pid(struct pid_namespace *ns) > goto out_unlock; > for ( ; upid >= pid->numbers; --upid) { > /* Make the PID visible to find_pid_ns. */ > + WARN_ON(!upid->ns->proc_mnt); > idr_replace(&upid->ns->idr, pid, upid->nr); > upid->ns->pid_allocated++; > } > > > If the warning triggers it means the bug is in alloc_pid and somehow > something has gotten past the is_child_reaper check. You're onto something. WARNING: CPU: 1 PID: 12020 at kernel/pid.c:213 alloc_pid+0x230/0x280 CPU: 1 PID: 12020 Comm: trinity-c29 Not tainted 4.15.0-rc4-think+ #3 RIP: 0010:alloc_pid+0x230/0x280 RSP: 0018:c90009977d48 EFLAGS: 00010046 RAX: 0030 RBX: 8804fb431280 RCX: 8f5c28f5c28f5c29 RDX: 88050a00de40 RSI: 82005218 RDI: 8804fc6aa9a8 RBP: 8804fb431270 R08: R09: 0001 R10: c90009977cc0 R11: eab94e31da7171b7 R12: 8804fb431260 R13: 8804fb431240 R14: 82005200 R15: 8804fb431268 FS: 7f49b9065700() GS:88050a00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f49b906a000 CR3: 0004f7446001 CR4: 001606e0 DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: copy_process.part.41+0x14fa/0x1e30 _do_fork+0xe7/0x720 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25 followed immediately by... 
Oops: [#1] SMP CPU: 1 PID: 12020 Comm: trinity-c29 Tainted: GW 4.15.0-rc4-think+ #3 RIP: 0010:proc_flush_task+0x8e/0x1b0 RSP: 0018:c90009977c40 EFLAGS: 00010286 RAX: 0001 RBX: 0001 RCX: fffb RDX: RSI: c90009977c50 RDI: RBP: c90009977c63 R08: R09: 0002 R10: c90009977b70 R11: c90009977c64 R12: 0004 R13: R14: 0004 R15: 8804fb431240 FS: 7f49b9065700() GS:88050a00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 0004f7446001 CR4: 001606e0 DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: ? release_task+0xaf/0x680 release_task+0xd2/0x680 ? wait_consider_task+0xb82/0xce0 wait_consider_task+0xbe9/0xce0 ? do_wait+0xe1/0x330 do_wait+0x151/0x330 kernel_wait4+0x8d/0x150 ? task_stopped_code+0x50/0x50 SYSC_wait4+0x95/0xa0 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 ? do_syscall_64+0x60/0x210 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25
Re: proc_flush_task oops
On Tue, Dec 19, 2017 at 12:27:30PM -0600, Eric W. Biederman wrote: > Dave Jones <da...@codemonkey.org.uk> writes: > > > On Mon, Dec 18, 2017 at 03:50:52PM -0800, Linus Torvalds wrote: > > > > > But I don't see what would have changed in this area recently. > > > > > > Do you end up saving the seeds that cause crashes? Is this > > > reproducible? (Other than seeing it twoce, of course) > > > > Only clue so far, is every time I'm able to trigger it, the last thing > > the child process that triggers it did, was an execveat. > > Is there any chance the excveat might be called from a child thread? If trinity chooses one of the exec syscalls, it forks off an extra child to do it in, on the off-chance that it succeeds and we never return. https://github.com/kernelslacker/trinity/blob/master/syscall.c#L139 > That switching pids between tasks of a process during exec can get a > little bit tricky. > > > Telling it to just fuzz execveat doesn't instantly trigger it, so it > > must be a combination of some other syscall. I'll leave a script running > > overnight to see if I can binary search the other syscalls in > > combination with it. > > Could we have a buggy syscall that is stomping something? Not totally impossible I guess, though I would expect that would manifest in additional random failures, whereas this seems remarkably consistent. Dave
Re: proc_flush_task oops
On Mon, Dec 18, 2017 at 03:50:52PM -0800, Linus Torvalds wrote: > But I don't see what would have changed in this area recently. > > Do you end up saving the seeds that cause crashes? Is this > reproducible? (Other than seeing it twoce, of course) Only clue so far is that every time I'm able to trigger it, the last thing the child process that triggers it did was an execveat. Telling it to just fuzz execveat doesn't instantly trigger it, so it must be in combination with some other syscall. I'll leave a script running overnight to see if I can binary search the other syscalls in combination with it. One other thing: I said this was rc4, but it was actually rc4 + all the x86 stuff from today. There's enough creepy stuff in that pile that I'll try with just plain rc4 tomorrow too. Dave
Re: proc_flush_task oops
On Mon, Dec 18, 2017 at 03:50:52PM -0800, Linus Torvalds wrote: > On Mon, Dec 18, 2017 at 3:10 PM, Dave Jones <da...@codemonkey.org.uk> wrote: > > On Mon, Dec 18, 2017 at 10:15:41PM +, Al Viro wrote: > > > On Mon, Dec 18, 2017 at 04:44:38PM -0500, Dave Jones wrote: > > > > I've hit this twice today. It's odd, because afaics, none of this code > > > > has really changed in a long time. > > > > > > Which tree had that been? > > > > Linus, rc4. > > Ok, so the original report was marked as spam for me for whatever > reason. I ended up re-analyzing the oops, but came to the same > conclusion you did: it's a NULL mnt pointer in proc_flush_task_mnt(). > .. > But I don't see what would have changed in this area recently. > > Do you end up saving the seeds that cause crashes? Is this > reproducible? (Other than seeing it twoce, of course) Hit it another two times in the last hour, so it's pretty reproducible. Running it now with some more logging; will see if that yields any extra clues. Dave
Re: proc_flush_task oops
On Mon, Dec 18, 2017 at 10:15:41PM +, Al Viro wrote: > On Mon, Dec 18, 2017 at 04:44:38PM -0500, Dave Jones wrote: > > I've hit this twice today. It's odd, because afaics, none of this code > > has really changed in a long time. > > Which tree had that been? Linus, rc4. Dave
proc_flush_task oops
I've hit this twice today. It's odd, because afaics, none of this code has really changed in a long time. Dave Oops: [#1] SMP CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2 RIP: 0010:proc_flush_task+0x8e/0x1b0 RSP: 0018:c9000bbffc40 EFLAGS: 00010286 RAX: 0001 RBX: 0001 RCX: fffb RDX: RSI: c9000bbffc50 RDI: RBP: c9000bbffc63 R08: R09: 0002 R10: c9000bbffb70 R11: c9000bbffc64 R12: 0003 R13: R14: 0003 R15: 8804c10d7840 FS: 7f7cb8965700() GS:88050a20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 0003e21ae003 CR4: 001606e0 DR0: 7fb1d6c22000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: ? release_task+0xaf/0x680 release_task+0xd2/0x680 ? wait_consider_task+0xb82/0xce0 wait_consider_task+0xbe9/0xce0 ? do_wait+0xe1/0x330 do_wait+0x151/0x330 kernel_wait4+0x8d/0x150 ? task_stopped_code+0x50/0x50 SYSC_wait4+0x95/0xa0 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 ? do_syscall_64+0x60/0x210 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f7cb82603aa RSP: 002b:7ffd60770bc8 EFLAGS: 0246 ORIG_RAX: 003d RAX: ffda RBX: 7f7cb6cd4000 RCX: 7f7cb82603aa RDX: 000b RSI: 7ffd60770bd0 RDI: 7cca RBP: 7cca R08: 7f7cb8965700 R09: 7ffd607c7080 R10: R11: 0246 R12: R13: 7ffd60770bd0 R14: 7f7cb6cd4058 R15: cccd Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24 RIP: proc_flush_task+0x8e/0x1b0 RSP: c9000bbffc40 CR2: ---[ end trace 53d67a6481059862 ]--- All code 0: c1 e2 04shl$0x4,%edx 3: 44 8b 60 30 mov0x30(%rax),%r12d 7: 48 8b 40 38 mov0x38(%rax),%rax b: 44 8b 34 11 mov(%rcx,%rdx,1),%r14d f: 48 c7 c2 60 3a f5 81mov$0x81f53a60,%rdx 16: 44 89 e1mov%r12d,%ecx 19: 4c 8b 68 58 mov0x58(%rax),%r13 1d: e8 4b b4 77 00 callq 0x77b46d 22: 89 44 24 14 mov%eax,0x14(%rsp) 26: 48 8d 74 24 10 lea0x10(%rsp),%rsi 2b:* 49 8b 7d 00 mov0x0(%r13),%rdi <-- trapping instruction 2f: e8 b9 6a f9 ff 
callq 0xfff96aed 34: 48 85 c0test %rax,%rax 37: 74 1a je 0x53 39: 48 89 c7mov%rax,%rdi 3c: 48 rex.W 3d: 89 .byte 0x89 3e: 44 rex.R 3f: 24 .byte 0x24 Code starting with the faulting instruction === 0: 49 8b 7d 00 mov0x0(%r13),%rdi 4: e8 b9 6a f9 ff callq 0xfff96ac2 9: 48 85 c0test %rax,%rax c: 74 1a je 0x28 e: 48 89 c7mov%rax,%rdi 11: 48 rex.W 12: 89 .byte 0x89 13: 44 rex.R 14: 24 .byte 0x24 This looks like an inlined part of proc_flush_task_mnt dentry = d_hash_and_lookup(mnt->mnt_root, &name); 4f99: 48 8d 74 24 10 lea0x10(%rsp),%rsi 4f9e: 49 8b 7d 00 mov0x0(%r13),%rdi 4fa2: e8 00 00 00 00 callq 4fa7 So it looks like this.. 3097 for (i = 0; i <= pid->level; i++) { 3098 upid = &pid->numbers[i]; 3099 proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr, 3100 tgid->numbers[i].nr); 3101 } somehow passed a null upid->ns->proc_mnt down there. I'll try and narrow down a reproducer tomorrow. Any obvious recent changes that might explain this, or did I just finally appease the entropy gods enough to find the right combination of args to hit this ? Dave
Re: drm/amd/display: Restructuring and cleaning up DML
On Sat, Nov 18, 2017 at 12:02:01AM +, Linux Kernel wrote: > Web: > https://git.kernel.org/torvalds/c/6d04ee9dc10149db842d41de66eca201c9d91b60 > Commit: 6d04ee9dc10149db842d41de66eca201c9d91b60 > Parent: 19b7fe4a48efbe0f7e8c496b040c4eb16ff02313 > Refname:refs/heads/master > Author: Dmytro Laktyushkin > AuthorDate: Wed Aug 23 16:43:17 2017 -0400 > Committer: Alex Deucher > CommitDate: Sat Oct 21 16:45:24 2017 -0400 > > drm/amd/display: Restructuring and cleaning up DML > > Signed-off-by: Dmytro Laktyushkin > Reviewed-by: Tony Cheng > Acked-by: Harry Wentland > Signed-off-by: Alex Deucher > --- > diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > index a18474437990..b6abe0f3bb15 100644 > --- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > @@ -27,20 +27,36 @@ > > float dcn_bw_mod(const float arg1, const float arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 - arg1 * ((int) (arg1 / arg2)); > } > > float dcn_bw_min2(const float arg1, const float arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 < arg2 ? arg1 : arg2; > } > > unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 > arg2 ? arg1 : arg2; > } > float dcn_bw_max2(const float arg1, const float arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 > arg2 ? arg1 : arg2; > } This looks really, really bizarre. What was the intention here ? (This, and a bunch of other stuff in this driver picked up by Coverity, sign up at scan.coverity.com if you want access, and I'll approve.) Dave
Re: [trinity] WARNING: CPU: 0 PID: 515 at drivers/pci/pci-sysfs.c:1224 pci_mmap_resource+0xd6/0x10e
On Fri, Nov 24, 2017 at 08:11:39AM +0800, Fengguang Wu wrote: > Hello, > > FYI this happens in mainline kernel 4.14.0-12995-g0c86a6b. > It at least dates back to v4.9 . > > I wonder where can we avoid this warning, by improving trinity (or how > we use it), or the pci subsystem? > > [main] Added 42 filenames from /dev > [main] Added 13651 filenames from /proc > [main] Added 11163 filenames from /sys > [ 19.452176] [ cut here ] > [ 19.452938] process "trinity-main" tried to map 0x4000 bytes at page > 0x0001 on :00:06.0 BAR 4 (start 0xfe008000, size 0x > 4000) > [ 19.454804] WARNING: CPU: 0 PID: 515 at drivers/pci/pci-sysfs.c:1224 > pci_mmap_resource+0xd6/0x10e That's a root-only operation, where we allow the user to shoot themselves in the foot afaik. What you could do now that you're running an up to date trinity in 0day, is pass the --dropprivs flag to setuid to nobody in the child processes. Dave
Re: x86/umip: Enable User-Mode Instruction Prevention at runtime
On Mon, Nov 13, 2017 at 11:44:02PM +, Linux Kernel wrote: > Web: > https://git.kernel.org/torvalds/c/aa35f896979d9610bb11df485cf7bb6ca241febb > Commit: aa35f896979d9610bb11df485cf7bb6ca241febb > Parent: c6a960bbf6a36572a06bde866d94a7338c7f256a > Refname:refs/heads/master > Author: Ricardo Neri > AuthorDate: Sun Nov 5 18:27:54 2017 -0800 > Committer: Ingo Molnar > CommitDate: Wed Nov 8 11:16:23 2017 +0100 > > x86/umip: Enable User-Mode Instruction Prevention at runtime > +config X86_INTEL_UMIP > +def_bool n > +depends on CPU_SUP_INTEL > +prompt "Intel User Mode Instruction Prevention" if EXPERT > +---help--- > + The User Mode Instruction Prevention (UMIP) is a security > + feature in newer Intel processors. Can we start noting in Kconfigs which CPU generation a feature first appears in ? In six months' time, "newer" will mean even less than it does today. It'd be nice to be able to answer oldconfig without having to look things up in the SDM. Dave
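For example, the help text could name the generation directly (the generation named below is an illustrative guess and would need checking against the SDM):

```
config X86_INTEL_UMIP
	def_bool n
	depends on CPU_SUP_INTEL
	prompt "Intel User Mode Instruction Prevention" if EXPERT
	---help---
	  The User Mode Instruction Prevention (UMIP) is a security
	  feature present in Intel processors starting with the
	  Cannon Lake / Gemini Lake (2017) generation.
```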
Re: [4.14-rc7] task struct corruption after fork
On Mon, Oct 30, 2017 at 11:59:30AM -0700, Linus Torvalds wrote: > and that location would *almost* make sense in that it's the end of > the same page that contained a "struct task_struct". > > Are you running with VMAP_STACK? Is there perhaps some stale code that > ends up doing the old "stack pointer is in the same allocation as task > struct"? yeah, it's enabled. > If you have the kernel symbols for that image, can you look up if any > of those addresses look like any static kernel symbol addresses? Those > things that have the pattern 8xxx might be symbol > addresses and give us a clue about where the values came from. it got clobbered by another build, but I managed to rebuild it from the older config. Modulo modules that were loaded, things should be the same. > 81172d1e r15 81172d00 t usage_match 81172d30 t HARDIRQ_verbose > 8426daec r14 841cee60 b lock_classes 844eed00 B nr_lock_classes > ed008b17e001 r13 > 811737e2 r12 81173540 t __bfs 81173970 t check_noncircular > 8426dbe0 rbp 841cee60 b lock_classes 844eed00 B nr_lock_classes > 880458bf0008 rbx > 84590d00 r11 841cee60 b lock_classes 844eed00 B nr_lock_classes > 5 r10 > 81172d00 r9 81172d00 t usage_match > 1 r8 > 11008b17dfed rax > 880458bf00f0 rcx > ed008b17dff9 rdx > dc00 rsi > 41b58ab3 rdi > 82a349a8 orig_eax 828a7f40 R inat_primary_table 82a42840 r POLY > 81173540 rip 81173540 t __bfs > 5a5a5a5a5a5a5a5a > 8450f080 flags 844eed40 b list_entries 846eed60 B nr_list_entries So a bunch of lockdep stuff, and not much else afaics. Dave
[4.14-rc7] task struct corruption after fork
Something scary for halloween. Only saw this once so far. [10737.049397] = [10737.052151] BUG task_struct (Not tainted): Padding overwritten. 0x880458befef8-0x880458beffcf [10737.055172] - [10737.061267] Disabling lock debugging due to kernel taint [10737.064384] INFO: Slab 0xea001162fa00 objects=4 used=4 fp=0x (null) flags=0x2ffc0008100 [10737.067771] CPU: 2 PID: 26357 Comm: trinity-c13 Tainted: GB 4.14.0-rc7-think+ #1 [10737.074807] Call Trace: [10737.078329] dump_stack+0xbc/0x145 [10737.081919] ? dma_virt_map_sg+0xfb/0xfb [10737.085566] ? lock_release+0x890/0x890 [10737.089264] slab_err+0xad/0xd0 [10737.092997] ? memchr_inv+0x160/0x180 [10737.096769] slab_pad_check.part.43+0xfa/0x160 [10737.100681] ? copy_process.part.42+0x101c/0x29e0 [10737.104600] check_slab+0xa6/0xd0 [10737.108563] alloc_debug_processing+0x85/0x1b0 [10737.112612] ___slab_alloc+0x525/0x5d0 [10737.116672] ? __lock_is_held+0x2e/0xd0 [10737.120810] ? copy_process.part.42+0x101c/0x29e0 [10737.125019] ? ___might_sleep.part.69+0x118/0x320 [10737.129267] ? copy_process.part.42+0x101c/0x29e0 [10737.133556] ? __slab_alloc+0x3e/0x80 [10737.137803] __slab_alloc+0x3e/0x80 [10737.142100] kmem_cache_alloc_node+0xbd/0x360 [10737.146464] ? copy_process.part.42+0x101c/0x29e0 [10737.150932] copy_process.part.42+0x101c/0x29e0 [10737.155473] ? jbd2_buffer_abort_trigger+0x50/0x50 [10737.160040] ? __might_sleep+0x58/0xe0 [10737.164670] ? __cleanup_sighand+0x30/0x30 [10737.169308] ? mark_lock+0x16f/0x9b0 [10737.174016] ? balance_dirty_pages_ratelimited+0x744/0x10d0 [10737.178868] ? print_irqtrace_events+0x110/0x110 [10737.183779] ? mark_lock+0x16f/0x9b0 [10737.188731] ? print_irqtrace_events+0x110/0x110 [10737.193696] ? block_write_end+0x150/0x150 [10737.198745] ? match_held_lock+0xa6/0x410 [10737.203887] ? save_trace+0x1c0/0x1c0 [10737.209040] ? native_sched_clock+0xf9/0x1a0 [10737.214255] ? cyc2ns_read_end+0x10/0x10 [10737.219500] ? ext4_da_write_end+0x301/0x690 [10737.224771] ? 
sched_clock_cpu+0x14/0xf0 [10737.230077] ? __lock_acquire+0x6b3/0x2050 [10737.235438] ? sched_clock_cpu+0x14/0xf0 [10737.240833] ? debug_check_no_locks_freed+0x1a0/0x1a0 [10737.246272] ? debug_check_no_locks_freed+0x1a0/0x1a0 [10737.251754] ? lock_downgrade+0x310/0x310 [10737.257255] ? __lock_page_killable+0x100/0x100 [10737.262801] ? __mnt_drop_write_file+0x26/0x40 [10737.268432] ? current_time+0x70/0x70 [10737.274106] ? fsnotify+0xe99/0x1020 [10737.279744] ? up_write+0x97/0xe0 [10737.285470] ? match_held_lock+0x93/0x410 [10737.291287] ? save_trace+0x1c0/0x1c0 [10737.297097] ? __fsnotify_update_child_dentry_flags.part.2+0x160/0x160 [10737.303127] ? native_sched_clock+0xf9/0x1a0 [10737.309193] ? cyc2ns_read_end+0x10/0x10 [10737.315280] ? ext4_file_mmap+0xb0/0xb0 [10737.321446] ? match_held_lock+0x93/0x410 [10737.327627] ? sched_clock_cpu+0x14/0xf0 [10737.333831] ? save_trace+0x1c0/0x1c0 [10737.340108] ? native_sched_clock+0xf9/0x1a0 [10737.346408] ? cyc2ns_read_end+0x10/0x10 [10737.352788] _do_fork+0x1c4/0xa30 [10737.359190] ? fork_idle+0x120/0x120 [10737.365607] ? lock_downgrade+0x310/0x310 [10737.371992] ? native_sched_clock+0xf9/0x1a0 [10737.378485] ? cyc2ns_read_end+0x10/0x10 [10737.385060] ? syscall_trace_enter+0x2a6/0x670 [10737.391669] ? exit_to_usermode_loop+0x180/0x180 [10737.398368] ? __lock_is_held+0x2e/0xd0 [10737.405116] ? rcu_read_lock_sched_held+0x90/0xa0 [10737.411917] ? __context_tracking_exit.part.4+0x223/0x290 [10737.418798] ? context_tracking_recursion_enter+0x50/0x50 [10737.425758] ? __task_pid_nr_ns+0x1c4/0x300 [10737.432746] ? free_pidmap.isra.0+0x40/0x40 [10737.439743] ? SyS_read+0x140/0x140 [10737.446788] ? mark_held_locks+0x1b/0xa0 [10737.453883] ? do_syscall_64+0xae/0x400 [10737.461040] ? ptregs_sys_rt_sigreturn+0x10/0x10 [10737.468260] do_syscall_64+0x182/0x400 [10737.475508] ? syscall_return_slowpath+0x270/0x270 [10737.482801] ? rcu_read_lock_sched_held+0x90/0xa0 [10737.490156] ? 
__context_tracking_exit.part.4+0x223/0x290 [10737.497608] ? mark_held_locks+0x1b/0xa0 [10737.505087] ? return_from_SYSCALL_64+0x2d/0x7a [10737.512662] ? trace_hardirqs_on_caller+0x17a/0x250 [10737.520339] ? trace_hardirqs_on_thunk+0x1a/0x1c [10737.527991] entry_SYSCALL64_slow_path+0x25/0x25 [10737.535684] RIP: 0033:0x7f8917f3837b [10737.543411] RSP: 002b:7ffdca212e00 EFLAGS: 0246 [10737.551299] ORIG_RAX: 0038 [10737.559237] RAX: ffda RBX: 7ffdca212e00 RCX: 7f8917f3837b [10737.567378] RDX: RSI: RDI: 01200011 [10737.575550] RBP: 7ffdca212e50 R08: 7f891863a700 R09: 7ffdca3ef080 [10737.583786] R10: 7f891863a9d0 R11: 0246 R12: [10737.592129] R13: 0020 R14:
[4.14rc6] suspicious nfs rcu dereference
WARNING: suspicious RCU usage 4.14.0-rc6-think+ #2 Not tainted - net/sunrpc/clnt.c:1206 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/2:0/9104: #0: ( "rpciod" ){+.+.} , at: [] process_one_work+0x66e/0xea0 #1: ( (>u.tk_work) ){+.+.} , at: [] process_one_work+0x66e/0xea0 stack backtrace: CPU: 2 PID: 9104 Comm: kworker/2:0 Not tainted 4.14.0-rc6-think+ #2 Workqueue: rpciod rpc_async_schedule [sunrpc] Call Trace: dump_stack+0xbc/0x145 ? dma_virt_map_sg+0xfb/0xfb ? lockdep_rcu_suspicious+0xda/0x100 rpc_peeraddr2str+0x11a/0x130 [sunrpc] ? call_start+0x1e0/0x1e0 [sunrpc] perf_trace_nfs4_clientid_event+0xde/0x420 [nfsv4] ? do_raw_spin_unlock+0x147/0x220 ? save_trace+0x1c0/0x1c0 ? perf_trace_nfs4_open_event+0x5f0/0x5f0 [nfsv4] ? nfs41_sequence_process+0xba/0x5a0 [nfsv4] ? _raw_spin_unlock+0x24/0x30 ? nfs41_sequence_free_slot.isra.47+0x143/0x230 [nfsv4] ? __lock_is_held+0x51/0xd0 nfs41_sequence_call_done+0x29a/0x430 [nfsv4] ? perf_trace_nfs4_open_event+0x5f0/0x5f0 [nfsv4] ? nfs4_proc_unlink_done+0xb0/0xb0 [nfsv4] ? __internal_add_timer+0x11b/0x170 ? call_connect_status+0x490/0x490 [sunrpc] ? __lock_is_held+0x51/0xd0 ? call_decode+0x33f/0xdd0 [sunrpc] ? nfs4_proc_unlink_done+0xb0/0xb0 [nfsv4] ? rpc_make_runnable+0x180/0x180 [sunrpc] rpc_exit_task+0x61/0x100 [sunrpc] ? rpc_make_runnable+0x180/0x180 [sunrpc] __rpc_execute+0x1c8/0x9e0 [sunrpc] ? rpc_wake_up_queued_task+0x40/0x40 [sunrpc] ? lock_downgrade+0x310/0x310 ? match_held_lock+0xa6/0x410 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? save_trace+0x1c0/0x1c0 ? lock_acquire+0x12e/0x350 ? lock_acquire+0x12e/0x350 ? process_one_work+0x66e/0xea0 ? lock_release+0x890/0x890 ? do_raw_spin_trylock+0x100/0x100 ? __lock_is_held+0x51/0xd0 process_one_work+0x766/0xea0 ? pwq_dec_nr_in_flight+0x1e0/0x1e0 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x5cc/0x1310 ? __sched_text_start+0x8/0x8 ? match_held_lock+0x93/0x410 ? 
save_trace+0x1c0/0x1c0 ? native_sched_clock+0xf9/0x1a0 ? cyc2ns_read_end+0x10/0x10 ? cyc2ns_read_end+0x10/0x10 ? find_held_lock+0x74/0xd0 ? lock_contended+0x790/0x790 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? retint_kernel+0x10/0x10 ? do_raw_spin_trylock+0xb3/0x100 ? do_raw_spin_lock+0x110/0x110 ? mark_held_locks+0x1b/0xa0 worker_thread+0x1cf/0xcf0 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? process_one_work+0xea0/0xea0 ? get_vtime_delta+0x13/0x80 ? mark_held_locks+0x1b/0xa0 ? trace_hardirqs_on_caller+0x17a/0x250 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x183/0x470 ? finish_task_switch+0x101/0x470 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x5cc/0x1310 ? try_to_wake_up+0xe7/0xbb0 ? save_stack+0x32/0xb0 ? kasan_kmalloc+0xa0/0xd0 ? native_sched_clock+0xf9/0x1a0 ? ret_from_fork+0x27/0x40 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? lock_downgrade+0x310/0x310 ? __schedule+0x1310/0x1310 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? do_raw_spin_lock+0x110/0x110 ? __init_waitqueue_head+0xbe/0xf0 ? mark_held_locks+0x1b/0xa0 ? _raw_spin_unlock_irqrestore+0x32/0x50 ? process_one_work+0xea0/0xea0 kthread+0x1c9/0x1f0 ? kthread_create_on_node+0xc0/0xc0 ret_from_fork+0x27/0x40
Re: out of bounds strscpy from seccomp_actions_logged_handler
On Tue, Oct 24, 2017 at 06:54:25PM -0500, Tyler Hicks wrote: > On 10/24/2017 06:46 PM, Dave Jones wrote: > > (Triggered with trinity, but it seems just a 'cat > > /proc/sys/kernel/seccomp/actions_logged' reproduces just as easily). > > Hi Dave - Thanks for the report. This is a false positive that was > previously discussed here: > > https://lkml.kernel.org/r/<20171010182805.52b9b...@cakuba.netronome.com> Bah, I thought this smelled familiar. I'll just roll Andrey's workaround diff into my builds for fuzzing runs until someone figures out something better. Dave
out of bounds strscpy from seccomp_actions_logged_handler
(Triggered with trinity, but it seems just a 'cat /proc/sys/kernel/seccomp/actions_logged' reproduces just as easily).

BUG: KASAN: global-out-of-bounds in strscpy+0x133/0x2d0
Read of size 8 at addr 824b0028 by task trinity-c63/6883

CPU: 3 PID: 6883 Comm: trinity-c63 Not tainted 4.14.0-rc6-think+ #1
Call Trace:
 dump_stack+0xbc/0x145
 ? dma_virt_map_sg+0xfb/0xfb
 print_address_description+0x2d/0x260
 kasan_report+0x277/0x360
 ? strscpy+0x133/0x2d0
 strscpy+0x133/0x2d0
 ? strcasecmp+0xb0/0xb0
 seccomp_actions_logged_handler+0x2c5/0x440
 ? seccomp_send_sigsys+0xd0/0xd0
 ? lock_downgrade+0x310/0x310
 ? lock_release+0x890/0x890
 ? do_raw_spin_unlock+0x147/0x220
 ? do_raw_spin_trylock+0x100/0x100
 ? do_raw_spin_trylock+0x40/0x100
 ? do_raw_spin_lock+0x110/0x110
 proc_sys_call_handler+0x1b1/0x1f0
 ? seccomp_send_sigsys+0xd0/0xd0
 ? proc_sys_readdir+0x6d0/0x6d0
 do_iter_read+0x23b/0x280
 vfs_readv+0x107/0x180
 ? compat_rw_copy_check_uvector+0x1d0/0x1d0
 ? native_sched_clock+0xf9/0x1a0
 ? cyc2ns_read_end+0x10/0x10
 ? __fget_light+0x181/0x200
 ? fget_raw+0x10/0x10
 ? __lock_is_held+0x2e/0xd0
 ? rcu_read_lock_sched_held+0x90/0xa0
 ? __context_tracking_exit.part.4+0x223/0x290
 ? context_tracking_recursion_enter+0x50/0x50
 ? __task_pid_nr_ns+0x1c4/0x300
 ? do_preadv+0xb0/0xf0
 do_preadv+0xb0/0xf0
 ? SyS_preadv+0x10/0x10
 do_syscall_64+0x182/0x400
 ? syscall_return_slowpath+0x270/0x270
 ? rcu_read_lock_sched_held+0x90/0xa0
 ? __context_tracking_exit.part.4+0x223/0x290
 ? mark_held_locks+0x1b/0xa0
 ? return_from_SYSCALL_64+0x2d/0x7a
 ? trace_hardirqs_on_caller+0x17a/0x250
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f52d5f45219
RSP: 002b:7fff8a422838 EFLAGS: 0246 ORIG_RAX: 0147
RAX: ffda RBX: 0147 RCX: 7f52d5f45219
RDX: 00f3 RSI: 55d6d5b413d0 RDI: 00b2
RBP: 7fff8a4228e0 R08: 316c1272491c R09:
R10: 725c3dd7 R11: 0246 R12: 0002
R13: 7f52d645b058 R14: 7f52d661b698 R15: 7f52d645b000

The buggy address belongs to the variable:
 kdb_rwtypes+0x1268/0x1320

Memory state around the buggy address:
 824aff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 824aff80: 00 00 00 00 00 00 00 00 00 00 00 00 07 fa fa fa
>824b: fa fa fa fa 00 05 fa fa fa fa fa fa 02 fa fa fa
       ^
 824b0080: fa fa fa fa 00 00 01 fa fa fa fa fa 00 00 04 fa
 824b0100: fa fa fa fa 00 06 fa fa fa fa fa fa 00 07 fa fa
==
Disabling lock debugging due to kernel taint
[4.14rc5] corrupted stack end detected inside scheduler
Just hit this fairly quickly by fuzzing writev calls. Attempting to reproduce, but so far only seeing floods of page allocation stalls. Kernel panic - not syncing: corrupted stack end detected inside scheduler\x0a CPU: 1 PID: 2531 Comm: kworker/u8:4 Not tainted 4.14.0-rc5-think+ #1 Workqueue: writeback wb_workfn (flush-8:16) Call Trace: dump_stack+0xbc/0x145 ? dma_virt_map_sg+0xfb/0xfb ? sched_clock_cpu+0x14/0xf0 ? vsnprintf+0x331/0x7e0 panic+0x14e/0x2b5 ? __warn+0x12b/0x12b ? __schedule+0x111/0x1310 __schedule+0x12fd/0x1310 ? isolate_lru_page+0x890/0x890 ? __sched_text_start+0x8/0x8 ? blk_init_request_from_bio+0x150/0x150 ? lock_downgrade+0x310/0x310 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? mark_held_locks+0x70/0xa0 ? _raw_spin_unlock_irqrestore+0x32/0x50 schedule+0xc3/0x260 ? __schedule+0x1310/0x1310 ? __wake_up_locked_key_bookmark+0x20/0x20 ? match_held_lock+0x93/0x410 ? match_held_lock+0x93/0x410 ? save_trace+0x1c0/0x1c0 ? save_trace+0x1c0/0x1c0 io_schedule+0x1c/0x50 wbt_wait+0x45a/0x7f0 ? wbt_update_limits+0x40/0x40 ? sched_clock_cpu+0x14/0xf0 ? lock_downgrade+0x310/0x310 ? finish_wait+0x200/0x200 ? elv_rb_find+0x32/0x60 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? blk_mq_sched_try_merge+0x74/0x250 ? init_emergency_isa_pool+0x50/0x50 ? _raw_spin_unlock+0x24/0x30 ? dd_bio_merge+0xd3/0x120 ? save_trace+0x1c0/0x1c0 ? __blk_mq_sched_bio_merge+0x106/0x350 blk_mq_make_request+0x298/0x1160 ? __blk_mq_insert_request+0x4c0/0x4c0 ? cyc2ns_read_end+0x10/0x10 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? lock_downgrade+0x310/0x310 ? lock_release+0x890/0x890 ? rcu_is_watching+0x88/0xd0 ? blk_queue_enter+0x188/0x4e0 ? blk_exit_rl+0x40/0x40 ? lock_page_memcg+0xf6/0x240 ? rcu_is_watching+0x88/0xd0 ? rcutorture_record_progress+0x10/0x10 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? __test_set_page_writeback+0x45f/0x950 ? mark_held_locks+0x70/0xa0 ? 
_raw_spin_unlock_irqrestore+0x32/0x50 ? balance_dirty_pages_ratelimited+0x10d0/0x10d0 ? mempool_alloc+0x1d6/0x2f0 generic_make_request+0x316/0x7f0 ? bio_add_page+0x140/0x140 ? blk_queue_enter+0x4e0/0x4e0 ? debug_check_no_locks_freed+0x1a0/0x1a0 ? bio_alloc_bioset+0x1e8/0x3b0 ? bvec_alloc+0x160/0x160 ? cyc2ns_read_end+0x10/0x10 ? match_held_lock+0x93/0x410 ? bio_add_page+0xdb/0x140 ? submit_bio+0xe1/0x270 submit_bio+0xe1/0x270 ? wake_up_page_bit+0x300/0x300 ? generic_make_request+0x7f0/0x7f0 ? __lock_acquire+0x6b3/0x2050 ? lock_release+0x890/0x890 ? bdev_write_page+0x50/0x160 __swap_writepage+0x3c6/0xb20 ? SyS_madvise+0xf60/0xf60 ? generic_swapfile_activate+0x2b0/0x2b0 ? lock_downgrade+0x310/0x310 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? do_raw_spin_trylock+0xb0/0x100 ? do_raw_spin_lock+0x110/0x110 ? _raw_spin_unlock+0x24/0x30 ? page_swapcount+0x9f/0xc0 ? page_swapped+0x179/0x190 ? page_trans_huge_map_swapcount+0x700/0x700 ? save_trace+0x1c0/0x1c0 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? try_to_free_swap+0x264/0x330 ? reuse_swap_page+0x560/0x560 ? lock_downgrade+0x310/0x310 ? clear_page_dirty_for_io+0x1a9/0x5a0 ? redirty_page_for_writepage+0x40/0x40 ? ___might_sleep.part.69+0x118/0x320 ? cyc2ns_read_end+0x10/0x10 ? page_remove_rmap+0x690/0x690 ? up_read+0x1c/0x40 pageout.isra.54+0x520/0xb50 ? move_active_pages_to_lru+0x920/0x920 ? do_raw_spin_unlock+0x147/0x220 ? mark_held_locks+0x70/0xa0 ? page_mapping+0x274/0x2b0 ? kstrndup+0x90/0x90 ? __add_to_swap_cache+0x63a/0x710 ? swap_readpage+0x610/0x610 ? swap_set_page_dirty+0x1dd/0x1f0 ? swap_readpage+0x610/0x610 ? show_swap_cache_info+0x130/0x130 ? wait_for_completion+0x3e0/0x3e0 ? rmap_walk+0x175/0x190 ? __anon_vma_prepare+0x360/0x360 ? set_page_dirty+0x1a7/0x380 ? __writepage+0x80/0x80 ? __anon_vma_prepare+0x360/0x360 ? drop_buffers+0x2a0/0x2a0 ? page_rmapping+0x9c/0xd0 ? try_to_unmap+0x34c/0x3a0 ? rmap_walk_locked+0x190/0x190 ? 
free_swap_slot+0x150/0x150 ? page_remove_rmap+0x690/0x690 ? rcu_read_unlock+0x60/0x60 ? page_get_anon_vma+0x2c0/0x2c0 ? mem_cgroup_swapout+0x4a0/0x4a0 ? page_mapping+0x274/0x2b0 ? kstrndup+0x90/0x90 ? page_get_anon_vma+0x2c0/0x2c0 ? add_to_swap+0x1ae/0x1d0 ? __delete_from_swap_cache+0x4b0/0x4b0 ? page_evictable+0xcc/0x110 shrink_page_list+0x242b/0x2cc0 ? putback_lru_page+0x430/0x430 ? native_flush_tlb_others+0x480/0x480 ? mark_lock+0x16f/0x9b0 ? mark_lock+0x16f/0x9b0 ? print_irqtrace_events+0x110/0x110 ? make_huge_pte+0xa0/0xa0 ? ptep_clear_flush+0xf7/0x140 ? pmd_clear_bad+0x40/0x40 ? mark_lock+0x16f/0x9b0 ? _find_next_bit+0x30/0xb0 ? print_irqtrace_events+0x110/0x110 ? try_to_unmap_one+0x10ff/0x14b0 ? match_held_lock+0x93/0x410 ? native_sched_clock+0xf9/0x1a0 ? match_held_lock+0x93/0x410 ? save_trace+0x1c0/0x1c0 ? save_trace+0x1c0/0x1c0 ? native_sched_clock+0xf9/0x1a0 ?
Re: WARN_ON_ONCE in fs/iomap.c:993
On Mon, Sep 11, 2017 at 06:56:05AM -0400, Shankara Pailoor wrote: > Hi, > > I am fuzzing linux 4.13-rc7 with XFS using syzkaller on x86_64 and I > found the following warning: > > WARNING: CPU: 2 PID: 5391 at fs/iomap.c:993 iomap_dio_rw+0xc79/0xe70 > > Here is a reproducer program: https://pastebin.com/tc014k97 pwrite in one thread, sendfile on another. Same thing trinity has been hitting. See thread "Subject: Re: iov_iter_pipe warning". Dave
Re: iov_iter_pipe warning.
On Sun, Sep 10, 2017 at 09:05:48PM +0100, Al Viro wrote: > On Sun, Sep 10, 2017 at 12:07:10PM -0400, Dave Jones wrote: > > On Sun, Sep 10, 2017 at 03:57:21AM +0100, Al Viro wrote: > > > On Sat, Sep 09, 2017 at 09:07:56PM -0400, Dave Jones wrote: > > > > > > > With this in place, I'm still seeing -EBUSY from > > invalidate_inode_pages2_range > > > > which doesn't end well... > > > > > > Different issue, and I'm not sure why that WARN_ON() is there in the > > > first place. Note that in a similar situation > > generic_file_direct_write() > > > simply buggers off and lets the caller do buffered write... > > > > > > iov_iter_pipe() warning is a sign of ->read_iter() on pipe-backed > > iov_iter > > > putting into the pipe more than it claims to have done. > > > > (from a rerun after hitting that EBUSY warn; hence the taint) > > > > WARNING: CPU: 0 PID: 14154 at fs/iomap.c:1055 iomap_dio_rw+0x78e/0x840 > > ... and that's another invalidate_inode_pages2_range() in the same > sucker. Again, compare with generic_file_direct_write()... > > I don't believe that this one has anything splice-specific to do with it. > And its only relation to iov_iter_pipe() splat is that it's in the same > fs/iomap.c... The interesting part is that I'm hitting these two over and over now rather than the iov_iter_pipe warning. Could just be unlucky randomness though.. Dave
Re: iov_iter_pipe warning.
On Sun, Sep 10, 2017 at 03:57:21AM +0100, Al Viro wrote: > On Sat, Sep 09, 2017 at 09:07:56PM -0400, Dave Jones wrote: > > > With this in place, I'm still seeing -EBUSY from > > invalidate_inode_pages2_range > > which doesn't end well... > > Different issue, and I'm not sure why that WARN_ON() is there in the > first place. Note that in a similar situation generic_file_direct_write() > simply buggers off and lets the caller do buffered write... > > iov_iter_pipe() warning is a sign of ->read_iter() on pipe-backed iov_iter > putting into the pipe more than it claims to have done. (from a rerun after hitting that EBUSY warn; hence the taint) WARNING: CPU: 0 PID: 14154 at fs/iomap.c:1055 iomap_dio_rw+0x78e/0x840 CPU: 0 PID: 14154 Comm: trinity-c33 Tainted: GW 4.13.0-think+ #9 task: 8801027e3e40 task.stack: 8801632d8000 RIP: 0010:iomap_dio_rw+0x78e/0x840 RSP: 0018:8801632df370 EFLAGS: 00010286 RAX: fff0 RBX: 880428666428 RCX: ffea RDX: ed002c65bdef RSI: RDI: ed002c65be5f RBP: 8801632df550 R08: 88046ae176c0 R09: R10: 8801632de960 R11: 0001 R12: 8801632df7f0 R13: ffea R14: 11002c65be7c R15: 8801632df988 FS: 7f3da2100700() GS:88046ae0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 0002f6223001 CR4: 001606f0 DR0: 7f3da1f3d000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: ? iomap_seek_data+0xb0/0xb0 ? find_inode_fast+0xd0/0xd0 ? xfs_file_aio_write_checks+0x295/0x320 [xfs] ? __lock_is_held+0x51/0xc0 ? xfs_file_dio_aio_write+0x286/0x7e0 [xfs] ? rcu_read_lock_sched_held+0x90/0xa0 xfs_file_dio_aio_write+0x286/0x7e0 [xfs] ? xfs_file_aio_write_checks+0x320/0x320 [xfs] ? unwind_get_return_address+0x2f/0x50 ? __save_stack_trace+0x92/0x100 ? memcmp+0x45/0x70 ? depot_save_stack+0x12e/0x480 ? save_stack+0x89/0xb0 ? save_stack+0x32/0xb0 ? kasan_kmalloc+0xa0/0xd0 ? __kmalloc+0x157/0x360 ? iter_file_splice_write+0x154/0x760 ? direct_splice_actor+0x86/0xa0 ? splice_direct_to_actor+0x1c4/0x420 ? do_splice_direct+0x173/0x1e0 ? do_sendfile+0x3a2/0x6d0 ? SyS_sendfile64+0xa4/0x130 ? 
do_syscall_64+0x182/0x3e0 ? entry_SYSCALL64_slow_path+0x25/0x25 ? match_held_lock+0xa6/0x410 ? iter_file_splice_write+0x154/0x760 xfs_file_write_iter+0x227/0x280 [xfs] do_iter_readv_writev+0x267/0x330 ? vfs_dedupe_file_range+0x400/0x400 do_iter_write+0xd7/0x280 ? splice_from_pipe_next.part.9+0x28/0x160 iter_file_splice_write+0x4d5/0x760 ? page_cache_pipe_buf_steal+0x2b0/0x2b0 ? generic_file_splice_read+0x2e1/0x340 ? pipe_to_user+0x80/0x80 direct_splice_actor+0x86/0xa0 splice_direct_to_actor+0x1c4/0x420 ? generic_pipe_buf_nosteal+0x10/0x10 ? do_splice_to+0xc0/0xc0 do_splice_direct+0x173/0x1e0 ? splice_direct_to_actor+0x420/0x420 ? rcu_read_lock_sched_held+0x90/0xa0 ? rcu_sync_lockdep_assert+0x43/0x70 ? __sb_start_write+0x179/0x1e0 do_sendfile+0x3a2/0x6d0 ? do_compat_pwritev64+0xa0/0xa0 ? __lock_is_held+0x2e/0xc0 SyS_sendfile64+0xa4/0x130 ? SyS_sendfile+0x140/0x140 ? mark_held_locks+0x1c/0x90 ? do_syscall_64+0xae/0x3e0 ? SyS_sendfile+0x140/0x140 do_syscall_64+0x182/0x3e0 ? syscall_return_slowpath+0x250/0x250 ? rcu_read_lock_sched_held+0x90/0xa0 ? __context_tracking_exit.part.4+0x223/0x290 ? mark_held_locks+0x1c/0x90 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x17a/0x250 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f3da1a2b219 RSP: 002b:7ffdd1642f38 EFLAGS: 0246 ORIG_RAX: 0028 RAX: ffda RBX: 0028 RCX: 7f3da1a2b219 RDX: 7f3da1f3d000 RSI: 005f RDI: 0060 RBP: 7ffdd1642fe0 R08: 30503123188dbe3f R09: e7e7e7e7 R10: f000 R11: 0246 R12: 0002 R13: 7f3da2012058 R14: 7f3da2100698 R15: 7f3da2012000
Re: iov_iter_pipe warning.
On Fri, Sep 08, 2017 at 02:04:41AM +0100, Al Viro wrote: > There's at least one suspicious place in iomap_dio_actor() - > if (!(dio->flags & IOMAP_DIO_WRITE)) { > iov_iter_zero(length, dio->submit.iter); > dio->size += length; > return length; > } > which assumes that iov_iter_zero() always succeeds. That's very > much _not_ true - neither for iovec-backed, not for pipe-backed. > Orangefs read_one_page() is fine (it calls that sucker for bvec-backed > iov_iter it's just created), but iomap_dio_actor() is not. > > I'm not saying that it will suffice, but we definitely need this: > > diff --git a/fs/iomap.c b/fs/iomap.c > index 269b24a01f32..4a671263475f 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -843,7 +843,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t > length, > /*FALLTHRU*/ > case IOMAP_UNWRITTEN: > if (!(dio->flags & IOMAP_DIO_WRITE)) { > -iov_iter_zero(length, dio->submit.iter); > +length = iov_iter_zero(length, dio->submit.iter); > dio->size += length; > return length; With this in place, I'm still seeing -EBUSY from invalidate_inode_pages2_range which doesn't end well... WARNING: CPU: 3 PID: 11443 at fs/iomap.c:993 iomap_dio_rw+0x825/0x840 CPU: 3 PID: 11443 Comm: trinity-c39 Not tainted 4.13.0-think+ #9 task: 880461080040 task.stack: 88043d72 RIP: 0010:iomap_dio_rw+0x825/0x840 RSP: 0018:88043d727730 EFLAGS: 00010286 RAX: fff0 RBX: 88044f036428 RCX: RDX: ed0087ae4e67 RSI: RDI: ed0087ae4ed7 RBP: 88043d727910 R08: 88046b4176c0 R09: R10: 88043d726d20 R11: 0001 R12: 88043d727a90 R13: 027253f7 R14: 110087ae4ef4 R15: 88043d727c10 FS: 7f5d8613e700() GS:88046b40() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f5d84503000 CR3: 0004594e1000 CR4: 001606e0 Call Trace: ? iomap_seek_data+0xb0/0xb0 ? down_read_nested+0xd3/0x160 ? down_read_non_owner+0x40/0x40 ? xfs_ilock+0x3cb/0x460 [xfs] ? sched_clock_cpu+0x14/0xf0 ? __lock_is_held+0x51/0xc0 ? xfs_file_dio_aio_read+0x123/0x350 [xfs] xfs_file_dio_aio_read+0x123/0x350 [xfs] ? 
xfs_file_fallocate+0x550/0x550 [xfs] ? lock_release+0xa00/0xa00 ? ___might_sleep.part.70+0x118/0x320 xfs_file_read_iter+0x1b1/0x1d0 [xfs] do_iter_readv_writev+0x2ea/0x330 ? vfs_dedupe_file_range+0x400/0x400 do_iter_read+0x149/0x280 vfs_readv+0x107/0x180 ? vfs_iter_read+0x60/0x60 ? fget_raw+0x10/0x10 ? native_sched_clock+0xf9/0x1a0 ? __fdget_pos+0xd6/0x110 ? __fdget_pos+0xd6/0x110 ? __fdget_raw+0x10/0x10 ? do_readv+0xc0/0x1b0 do_readv+0xc0/0x1b0 ? vfs_readv+0x180/0x180 ? mark_held_locks+0x1c/0x90 ? do_syscall_64+0xae/0x3e0 ? compat_rw_copy_check_uvector+0x1b0/0x1b0 do_syscall_64+0x182/0x3e0 ? syscall_return_slowpath+0x250/0x250 ? rcu_read_lock_sched_held+0x90/0xa0 ? mark_held_locks+0x1c/0x90 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x17a/0x250 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f5d85a69219 RSP: 002b:7ffdf090afd8 EFLAGS: 0246 ORIG_RAX: 0013 RAX: ffda RBX: 0013 RCX: 7f5d85a69219 RDX: 00ae RSI: 565183cd5490 RDI: 0056 RBP: 7ffdf090b080 R08: 0141082b00011c63 R09: R10: e000 R11: 0246 R12: 0002 R13: 7f5d86026058 R14: 7f5d8613e698 R15: 7f5d86026000
Re: iov_iter_pipe warning.
On Thu, Sep 07, 2017 at 09:46:17AM +1000, Dave Chinner wrote: > On Wed, Sep 06, 2017 at 04:03:37PM -0400, Dave Jones wrote: > > On Mon, Aug 28, 2017 at 09:25:42PM -0700, Darrick J. Wong wrote: > > > On Mon, Aug 28, 2017 at 04:31:30PM -0400, Dave Jones wrote: > > > > I'm still trying to narrow down an exact reproducer, but it seems > > having > > > > trinity do a combination of sendfile & writev, with pipes and regular > > > > files as fd's is the best repro. > > > > > > > > Is this a real problem, or am I chasing ghosts ? That it doesn't > > happen > > > > on ext4 or btrfs is making me wonder... > > > > > > I haven't heard of any problems w/ directio xfs lately, but OTOH > > > I think it's the only filesystem that uses iomap_dio_rw, which would > > > explain why ext4/btrfs don't have this problem. > > > > Another warning, from likely the same root cause. > > > > WARNING: CPU: 3 PID: 572 at lib/iov_iter.c:962 iov_iter_pipe+0xe2/0xf0 > > WARN_ON(pipe->nrbufs == pipe->buffers); > > * @nrbufs: the number of non-empty pipe buffers in this pipe > * @buffers: total number of buffers (should be a power of 2) > > So that's warning that the pipe buffer is already full before we > try to read from the filesystem? > > That doesn't seem like an XFS problem - it indicates the pipe we are > filling in generic_file_splice_read() is not being emptied by > whatever we are splicing the file data to The puzzling part is this runs for a day on ext4 or btrfs, whereas I can make xfs fall over pretty quickly. As Darrick pointed out though, this could be due to xfs being the only user of iomap_dio_rw. I'm juggling a few other things right now, so probably not going to have much time to dig further on this until after plumbers + 1 wk. Dave
Re: x86/kconfig: Consolidate unwinders into multiple choice selection
On Wed, Sep 06, 2017 at 04:49:45PM -0500, Josh Poimboeuf wrote: > > Choose kernel unwinder > > > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) > > 2. ORC unwinder (ORC_UNWINDER) (NEW) > > 3. Guess unwinder (GUESS_UNWINDER) (NEW) > > choice[1-3?]: > > This is a quirk of the config tool. It's not very intuitive, but to see > the help for a given option you have to type the number appended with a > '?', like: > > > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) > 2. ORC unwinder (ORC_UNWINDER) (NEW) > choice[1-2?]: 1? Hey, I learned something today! thanks, Dave
Re: iov_iter_pipe warning.
On Mon, Aug 28, 2017 at 09:25:42PM -0700, Darrick J. Wong wrote: > On Mon, Aug 28, 2017 at 04:31:30PM -0400, Dave Jones wrote: > > On Mon, Aug 07, 2017 at 04:18:18PM -0400, Dave Jones wrote: > > > On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote: > > > > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote: > > > > > currently running v4.11-rc8-75-gf83246089ca0 > > > > > > > > > > sunrpc bit is for the other unrelated problem I'm chasing. > > > > > > > > > > note also, I saw the backtrace without the fs/splice.c changes. > > > > > > > > Interesting... Could you add this and see if that triggers? > > > > > > > > diff --git a/fs/splice.c b/fs/splice.c > > > > index 540c4a44756c..12a12d9c313f 100644 > > > > --- a/fs/splice.c > > > > +++ b/fs/splice.c > > > > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file > > *in, loff_t *ppos, > > > > kiocb.ki_pos = *ppos; > > > > ret = call_read_iter(in, , ); > > > > if (ret > 0) { > > > > +if (WARN_ON(iov_iter_count() != len - ret)) > > > > +printk(KERN_ERR "ops %p: was %zd, left %zd, > > returned %d\n", > > > > +in->f_op, len, iov_iter_count(), > > ret); > > > > *ppos = kiocb.ki_pos; > > > > file_accessed(in); > > > > } else if (ret < 0) { > > > > > > Hey Al, > > > Due to a git stash screw up on my part, I've had this leftover WARN_ON > > > in my tree for the last couple months. (That screw-up might turn out to > > be > > > serendipitous if this is a real bug..) > > > > > > Today I decided to change things up and beat up on xfs for a change, and > > > was able to trigger this again. > > > > > > Is this check no longer valid, or am I triggering the same bug we were > > chased > > > down in nfs, but now in xfs ? (None of the other detritus from that > > debugging > > > back in April made it, just those three lines above). > > > > Revisiting this. I went back and dug out some of the other debug diffs [1] > > from that old thread. > > > > I can easily trigger this spew on xfs. 
> > > > > > WARNING: CPU: 1 PID: 2251 at fs/splice.c:292 test_it+0xd4/0x1d0 > > CPU: 1 PID: 2251 Comm: trinity-c42 Not tainted 4.13.0-rc7-think+ #1 > > task: 880459173a40 task.stack: 88044f7d > > RIP: 0010:test_it+0xd4/0x1d0 > > RSP: 0018:88044f7d7878 EFLAGS: 00010283 > > RAX: RBX: 88044f44b968 RCX: 81511ea0 > > RDX: 0003 RSI: dc00 RDI: 88044f44ba68 > > RBP: 88044f7d78c8 R08: 88046b218ec0 R09: > > R10: 88044f7d7518 R11: R12: 1000 > > R13: 0001 R14: R15: 0001 > > FS: 7fdbc09b2700() GS:88046b20() > > knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: CR3: 000459e1d000 CR4: 001406e0 > > Call Trace: > > generic_file_splice_read+0x414/0x4e0 > > ? opipe_prep.part.14+0x180/0x180 > > ? lockdep_init_map+0xb2/0x2b0 > > ? rw_verify_area+0x65/0x150 > > do_splice_to+0xab/0xc0 > > splice_direct_to_actor+0x1f5/0x540 > > ? generic_pipe_buf_nosteal+0x10/0x10 > > ? do_splice_to+0xc0/0xc0 > > ? rw_verify_area+0x9d/0x150 > > do_splice_direct+0x1b9/0x230 > > ? splice_direct_to_actor+0x540/0x540 > > ? __sb_start_write+0x164/0x1c0 > > ? do_sendfile+0x7b3/0x840 > > do_sendfile+0x428/0x840 > > ? do_compat_pwritev64+0xb0/0xb0 > > ? __might_sleep+0x72/0xe0 > > ? kasan_check_write+0x14/0x20 > > SyS_sendfile64+0xa4/0x120 > > ? SyS_sendfile+0x150/0x150 > > ? mark_held_locks+0x23/0xb0 > > ? do_syscall_64+0xc0/0x3e0 > > ? SyS_sendfile+0x150/0x150 > > do_syscall_64+0x1bc/0x3e0 > > ? syscall_return_slowpath+0x240/0x240 > > ? mark_held_locks+0x23/0xb0 > > ? return_from_SYSCALL_64+0x2d/0x7a > > ? trace_hardirqs_on_caller+0x182/0x260 > > ? trace_hardirqs_on_thunk+0x1a/0x1c > > entry_SYSCALL64_slow_path+0x25/0x25 > > RIP: 0033:0x7fdbc02dd219 > > RSP: 002b:7ffc5024
Re: x86/kconfig: Consolidate unwinders into multiple choice selection
On Mon, Sep 04, 2017 at 08:05:13PM +, Linux Kernel wrote: > Web: > https://git.kernel.org/torvalds/c/81d387190039c14edac8de2b3ec789beb899afd9 > Commit: 81d387190039c14edac8de2b3ec789beb899afd9 > Parent: a34a766ff96d9e88572e35a45066279e40a85d84 > Refname:refs/heads/master > Author: Josh Poimboeuf> AuthorDate: Tue Jul 25 08:54:24 2017 -0500 > Committer: Ingo Molnar > CommitDate: Wed Jul 26 14:05:36 2017 +0200 > > x86/kconfig: Consolidate unwinders into multiple choice selection > > There are three mutually exclusive unwinders. Make that more obvious by > combining them into a multiple-choice selection: > > CONFIG_FRAME_POINTER_UNWINDER > CONFIG_ORC_UNWINDER > CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y) The help texts for the various unwinders are now attached to the wrong kconfig item. > +choice > +prompt "Choose kernel unwinder" > +default FRAME_POINTER_UNWINDER > +---help--- > + This determines which method will be used for unwinding kernel stack > + traces for panics, oopses, bugs, warnings, perf, /proc//stack, > + livepatch, lockdep, and more. This is what gets displayed, but tells me nothing about what the benefits/downsides are of each (or even what they are; I had to read the Kconfig file to figure out what 'GUESS' meant) an oldconfig run .. Choose kernel unwinder > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) 2. ORC unwinder (ORC_UNWINDER) (NEW) 3. Guess unwinder (GUESS_UNWINDER) (NEW) choice[1-3?]: ? This determines which method will be used for unwinding kernel stack traces for panics, oopses, bugs, warnings, perf, /proc//stack, livepatch, lockdep, and more. Prompt: Choose kernel unwinder Location: -> Kernel hacking Defined at arch/x86/Kconfig.debug:359 Selected by: m Choose kernel unwinder > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) 2. ORC unwinder (ORC_UNWINDER) (NEW) 3. Guess unwinder (GUESS_UNWINDER) (NEW) choice[1-3?]: Dave
Re: iov_iter_pipe warning.
On Wed, Aug 30, 2017 at 10:13:43AM -0700, Darrick J. Wong wrote: > > I reverted the debug patches mentioned above, and ran trinity for a while > > again, > > and got this which smells really suspiciously related > > > > WARNING: CPU: 1 PID: 10380 at fs/iomap.c:993 iomap_dio_rw+0x825/0x840 > > RAX: fff0 RBX: 88046a64d0e8 RCX: > > > > > > > > That's this.. > > > > 987 ret = filemap_write_and_wait_range(mapping, start, end); > > 988 if (ret) > > 989 goto out_free_dio; > > 990 > > 991 ret = invalidate_inode_pages2_range(mapping, > > 992 start >> PAGE_SHIFT, end >> PAGE_SHIFT); > > 993 WARN_ON_ONCE(ret); > > > > > > Plot thickens.. > > Hm, that's the WARN_ON that comes from a failed pagecache invalidation > prior to a dio operation, which implies that something's mixing buffered > and dio? Plausible. Judging by RAX, we got -EBUSY > Given that it's syzkaller it wouldn't surprise me to hear that it's > doing that... :) s/syzkaller/trinity/, but yes. Dave
Re: iov_iter_pipe warning.
On Mon, Aug 28, 2017 at 09:25:42PM -0700, Darrick J. Wong wrote: > On Mon, Aug 28, 2017 at 04:31:30PM -0400, Dave Jones wrote: > > On Mon, Aug 07, 2017 at 04:18:18PM -0400, Dave Jones wrote: > > > On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote: > > > > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote: > > > > > > > > diff --git a/fs/splice.c b/fs/splice.c > > > > index 540c4a44756c..12a12d9c313f 100644 > > > > --- a/fs/splice.c > > > > +++ b/fs/splice.c > > > > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file > > *in, loff_t *ppos, > > > > kiocb.ki_pos = *ppos; > > > > ret = call_read_iter(in, , ); > > > > if (ret > 0) { > > > > +if (WARN_ON(iov_iter_count() != len - ret)) > > > > +printk(KERN_ERR "ops %p: was %zd, left %zd, > > returned %d\n", > > > > +in->f_op, len, iov_iter_count(), > > ret); > > > > *ppos = kiocb.ki_pos; > > > > file_accessed(in); > > > > } else if (ret < 0) { > > > > > > Hey Al, > > > Due to a git stash screw up on my part, I've had this leftover WARN_ON > > > in my tree for the last couple months. (That screw-up might turn out to > > be > > > serendipitous if this is a real bug..) > > > > > > Today I decided to change things up and beat up on xfs for a change, and > > > was able to trigger this again. > > > > > > Is this check no longer valid, or am I triggering the same bug we were > > chased > > > down in nfs, but now in xfs ? (None of the other detritus from that > > debugging > > > back in April made it, just those three lines above). > > > > Revisiting this. I went back and dug out some of the other debug diffs [1] > > from that old thread. > > > > I can easily trigger this spew on xfs. > > > > ... 
> > > > asked to read 4096, claims to have read 1 > > actual size of data in pipe 4096 > > [0:4096] > > f_op: a058c920, f_flags: 49154, pos: 0/1, size: 0 > > > > > > I'm still trying to narrow down an exact reproducer, but it seems having > > trinity do a combination of sendfile & writev, with pipes and regular > > files as fd's is the best repro. > > > > Is this a real problem, or am I chasing ghosts ? That it doesn't happen > > on ext4 or btrfs is making me wonder... > > I haven't heard of any problems w/ directio xfs lately, but OTOH > I think it's the only filesystem that uses iomap_dio_rw, which would > explain why ext4/btrfs don't have this problem. > > Granted that's idle speculation; is there a reproducer/xfstest for this? I reverted the debug patches mentioned above, and ran trinity for a while again, and got this which smells really suspiciously related WARNING: CPU: 1 PID: 10380 at fs/iomap.c:993 iomap_dio_rw+0x825/0x840 CPU: 1 PID: 10380 Comm: trinity-c30 Not tainted 4.13.0-rc7-think+ #3 task: 8804613a5740 task.stack: 88043212 RIP: 0010:iomap_dio_rw+0x825/0x840 RSP: 0018:880432127890 EFLAGS: 00010286 RAX: fff0 RBX: 88046a64d0e8 RCX: RDX: ed0086424e9b RSI: RDI: ed0086424f03 RBP: 880432127a70 R08: 88046b239840 R09: 0001 R10: 880432126f50 R11: R12: 880432127c40 R13: 0e0a R14: 110086424f20 R15: 880432127ca0 FS: 7f4cda32f700() GS:88046b20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f181e02f000 CR3: 00043d32a000 CR4: 001406e0 Call Trace: ? iomap_seek_data+0xc0/0xc0 ? down_read_non_owner+0x40/0x40 ? xfs_ilock+0x3f2/0x490 [xfs] ? touch_atime+0x9c/0x180 ? __atime_needs_update+0x440/0x440 xfs_file_dio_aio_read+0x12d/0x390 [xfs] ? xfs_file_dio_aio_read+0x12d/0x390 [xfs] ? xfs_file_fallocate+0x660/0x660 [xfs] ? cyc2ns_read_end+0x10/0x10 xfs_file_read_iter+0x1bb/0x1d0 [xfs] __vfs_read+0x332/0x440 ? default_llseek+0x140/0x140 ? cyc2ns_read_end+0x10/0x10 ? __fget_light+0x1ae/0x230 ? rcu_is_watching+0x8d/0xd0 ? exit_to_usermode_loop+0x1b0/0x1b0 ? 
rw_verify_area+0x9d/0x150 vfs_read+0xc8/0x1c0 SyS_pread64+0x11a/0x140 ? SyS_write+0x160/0x160 ? do_syscall_64+0xc0/0x3e0 ? SyS_write+0x160/0x160 do_syscall_64+0x1bc/0x3e0 ? syscall_return_slowpath+0x240/0x240 ? cpumask_check.part.2+0x10/0x10 ? cpumask_check.part.2+0x10/0x10 ? mark_held_locks+0x23/0xb0 ? return_from_SYSCAL
Re: iov_iter_pipe warning.
On Mon, Aug 07, 2017 at 04:18:18PM -0400, Dave Jones wrote:
 > On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote:
 >  > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote:
 >  >  > currently running v4.11-rc8-75-gf83246089ca0
 >  >  >
 >  >  > sunrpc bit is for the other unrelated problem I'm chasing.
 >  >  >
 >  >  > note also, I saw the backtrace without the fs/splice.c changes.
 >  >
 >  > Interesting... Could you add this and see if that triggers?
 >  >
 >  > diff --git a/fs/splice.c b/fs/splice.c
 >  > index 540c4a44756c..12a12d9c313f 100644
 >  > --- a/fs/splice.c
 >  > +++ b/fs/splice.c
 >  > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 >  >  	kiocb.ki_pos = *ppos;
 >  >  	ret = call_read_iter(in, &kiocb, &to);
 >  >  	if (ret > 0) {
 >  > +		if (WARN_ON(iov_iter_count(&to) != len - ret))
 >  > +			printk(KERN_ERR "ops %p: was %zd, left %zd, returned %d\n",
 >  > +				in->f_op, len, iov_iter_count(&to), ret);
 >  >  		*ppos = kiocb.ki_pos;
 >  >  		file_accessed(in);
 >  >  	} else if (ret < 0) {
 >
 > Hey Al,
 > Due to a git stash screw up on my part, I've had this leftover WARN_ON
 > in my tree for the last couple months. (That screw-up might turn out to be
 > serendipitous if this is a real bug..)
 >
 > Today I decided to change things up and beat up on xfs for a change, and
 > was able to trigger this again.
 >
 > Is this check no longer valid, or am I triggering the same bug we chased
 > down in nfs, but now in xfs? (None of the other detritus from that
 > debugging back in April made it, just those three lines above).

Revisiting this. I went back and dug out some of the other debug diffs [1]
from that old thread.

I can easily trigger this spew on xfs.
WARNING: CPU: 1 PID: 2251 at fs/splice.c:292 test_it+0xd4/0x1d0 CPU: 1 PID: 2251 Comm: trinity-c42 Not tainted 4.13.0-rc7-think+ #1 task: 880459173a40 task.stack: 88044f7d RIP: 0010:test_it+0xd4/0x1d0 RSP: 0018:88044f7d7878 EFLAGS: 00010283 RAX: RBX: 88044f44b968 RCX: 81511ea0 RDX: 0003 RSI: dc00 RDI: 88044f44ba68 RBP: 88044f7d78c8 R08: 88046b218ec0 R09: R10: 88044f7d7518 R11: R12: 1000 R13: 0001 R14: R15: 0001 FS: 7fdbc09b2700() GS:88046b20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 000459e1d000 CR4: 001406e0 Call Trace: generic_file_splice_read+0x414/0x4e0 ? opipe_prep.part.14+0x180/0x180 ? lockdep_init_map+0xb2/0x2b0 ? rw_verify_area+0x65/0x150 do_splice_to+0xab/0xc0 splice_direct_to_actor+0x1f5/0x540 ? generic_pipe_buf_nosteal+0x10/0x10 ? do_splice_to+0xc0/0xc0 ? rw_verify_area+0x9d/0x150 do_splice_direct+0x1b9/0x230 ? splice_direct_to_actor+0x540/0x540 ? __sb_start_write+0x164/0x1c0 ? do_sendfile+0x7b3/0x840 do_sendfile+0x428/0x840 ? do_compat_pwritev64+0xb0/0xb0 ? __might_sleep+0x72/0xe0 ? kasan_check_write+0x14/0x20 SyS_sendfile64+0xa4/0x120 ? SyS_sendfile+0x150/0x150 ? mark_held_locks+0x23/0xb0 ? do_syscall_64+0xc0/0x3e0 ? SyS_sendfile+0x150/0x150 do_syscall_64+0x1bc/0x3e0 ? syscall_return_slowpath+0x240/0x240 ? mark_held_locks+0x23/0xb0 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x182/0x260 ? 
trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdbc02dd219 RSP: 002b:7ffc5024fa48 EFLAGS: 0246 ORIG_RAX: 0028 RAX: ffda RBX: 0028 RCX: 7fdbc02dd219 RDX: 7fdbbe348000 RSI: 0011 RDI: 0015 RBP: 7ffc5024faf0 R08: 006d R09: 0094e82f2c730a50 R10: 1000 R11: 0246 R12: 0002 R13: 7fdbc0885058 R14: 7fdbc09b2698 R15: 7fdbc0885000 ---[ end trace a5847ef0f7be7e20 ]--- asked to read 4096, claims to have read 1 actual size of data in pipe 4096 [0:4096] f_op: a058c920, f_flags: 49154, pos: 0/1, size: 0 I'm still trying to narrow down an exact reproducer, but it seems having trinity do a combination of sendfile & writev, with pipes and regular files as fd's is the best repro. Is this a real problem, or am I chasing ghosts ? That it doesn't happen on ext4 or btrfs is making me wonder... Dave [1] https://lkml.org/lkml/2017/4/11/921
Re: nvmet_fc: add defer_req callback for deferment of cmd buffer return
On Fri, Aug 11, 2017 at 07:44:19PM +, Linux Kernel wrote:
 > Web: https://git.kernel.org/torvalds/c/0fb228d30b8d72bfee51f57e638d412324d44a11
 > Commit: 0fb228d30b8d72bfee51f57e638d412324d44a11
 > Parent: 758f3735580c21b8a36d644128af6608120a1dde
 > Refname: refs/heads/master
 > Author: James Smart
 > AuthorDate: Tue Aug 1 15:12:39 2017 -0700
 > Committer: Christoph Hellwig
 > CommitDate: Thu Aug 10 11:06:38 2017 +0200
 >
 > nvmet_fc: add defer_req callback for deferment of cmd buffer return
 >
 > +	/* Cleanup defer'ed IOs in queue */
 > +	list_for_each_entry(deferfcp, >avail_defer_list, req_list) {
 > +		list_del(&deferfcp->req_list);
 > +		kfree(deferfcp);
 > +	}

Shouldn't this be list_for_each_entry_safe ?

Dave
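Dave's question is about the classic delete-while-iterating hazard: calling list_del() and kfree() on the current entry inside list_for_each_entry() frees the very node the iterator is about to dereference to find its successor. A minimal userspace sketch (plain malloc/free, not the kernel's <linux/list.h>) of why the _safe variant's extra cursor matters:

```c
#include <assert.h>
#include <stdlib.h>

struct node { int v; struct node *next; };

/* Build a singly-linked list of n nodes. */
static struct node *build(int n)
{
    struct node *head = NULL;
    while (n--) {
        struct node *p = malloc(sizeof(*p));
        p->v = n;
        p->next = head;
        head = p;
    }
    return head;
}

/* Free every node. The 'tmp' cursor is saved *before* free(), which is
 * exactly what list_for_each_entry_safe() adds over list_for_each_entry():
 * the plain iterator would read n->next out of just-freed memory to
 * advance, a use-after-free. Returns the number of nodes freed. */
static int free_all(struct node *head)
{
    int freed = 0;
    struct node *n, *tmp;
    for (n = head; n; n = tmp) {
        tmp = n->next;  /* grab the successor before freeing */
        free(n);
        freed++;
    }
    return freed;
}
```

The kernel's _safe macro does the same thing with a second "next entry" variable, so the loop body is free to unlink and release the current entry.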
Re: iov_iter_pipe warning.
On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote:
 > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote:
 >  > currently running v4.11-rc8-75-gf83246089ca0
 >  >
 >  > sunrpc bit is for the other unrelated problem I'm chasing.
 >  >
 >  > note also, I saw the backtrace without the fs/splice.c changes.
 >
 > Interesting... Could you add this and see if that triggers?
 >
 > diff --git a/fs/splice.c b/fs/splice.c
 > index 540c4a44756c..12a12d9c313f 100644
 > --- a/fs/splice.c
 > +++ b/fs/splice.c
 > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 >  	kiocb.ki_pos = *ppos;
 >  	ret = call_read_iter(in, &kiocb, &to);
 >  	if (ret > 0) {
 > +		if (WARN_ON(iov_iter_count(&to) != len - ret))
 > +			printk(KERN_ERR "ops %p: was %zd, left %zd, returned %d\n",
 > +				in->f_op, len, iov_iter_count(&to), ret);
 >  		*ppos = kiocb.ki_pos;
 >  		file_accessed(in);
 >  	} else if (ret < 0) {

Hey Al,
Due to a git stash screw up on my part, I've had this leftover WARN_ON
in my tree for the last couple months. (That screw-up might turn out to be
serendipitous if this is a real bug..)

Today I decided to change things up and beat up on xfs for a change, and
was able to trigger this again.

Is this check no longer valid, or am I triggering the same bug we chased
down in nfs, but now in xfs? (None of the other detritus from that
debugging back in April made it, just those three lines above).
Dave WARNING: CPU: 1 PID: 18377 at fs/splice.c:309 generic_file_splice_read+0x3e4/0x430 CPU: 1 PID: 18377 Comm: trinity-c17 Not tainted 4.13.0-rc4-think+ #1 task: 88045d2855c0 task.stack: 88045ca28000 RIP: 0010:generic_file_splice_read+0x3e4/0x430 RSP: 0018:88045ca2f900 EFLAGS: 00010206 RAX: 001f RBX: 88045c36e200 RCX: RDX: 0fe1 RSI: dc00 RDI: 88045ca2f960 RBP: 88045ca2fa38 R08: 88046b26b880 R09: 001f R10: 88045ca2f540 R11: R12: 88045ca2f9b0 R13: 88045ca2fa10 R14: 11008b945f26 R15: 88045c36e228 FS: 7f5580594700() GS:88046b20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f5580594698 CR3: 00045d3ef000 CR4: 001406e0 Call Trace: ? pipe_to_user+0xa0/0xa0 ? lockdep_init_map+0xb2/0x2b0 ? rw_verify_area+0x9d/0x150 do_splice_to+0xab/0xc0 splice_direct_to_actor+0x1ac/0x480 ? generic_pipe_buf_nosteal+0x10/0x10 ? do_splice_to+0xc0/0xc0 ? rw_verify_area+0x9d/0x150 do_splice_direct+0x1b9/0x230 ? splice_direct_to_actor+0x480/0x480 ? retint_kernel+0x10/0x10 ? rw_verify_area+0x9d/0x150 do_sendfile+0x428/0x840 ? do_compat_pwritev64+0xb0/0xb0 ? copy_user_generic_unrolled+0x83/0xb0 SyS_sendfile64+0xa4/0x120 ? SyS_sendfile+0x150/0x150 ? mark_held_locks+0x23/0xb0 ? do_syscall_64+0xc0/0x3e0 ? SyS_sendfile+0x150/0x150 do_syscall_64+0x1bc/0x3e0 ? syscall_return_slowpath+0x240/0x240 ? mark_held_locks+0x23/0xb0 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x182/0x260 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f557febf219 RSP: 002b:7ffc25086db8 EFLAGS: 0246 ORIG_RAX: 0028 RAX: ffda RBX: 0028 RCX: 7f557febf219 RDX: 7f557e559000 RSI: 0187 RDI: 0199 RBP: 7ffc25086e60 R08: 0100 R09: 6262 R10: 1000 R11: 0246 R12: 0002 R13: 7f5580516058 R14: 7f5580594698 R15: 7f5580516000 ---[ end trace e2f2217aba545e92 ]--- ops a09e4920: was 4096, left 0, returned 31 $ grep a09e4920 /proc/kallsyms a09e4920 r xfs_file_operations [xfs]
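The WARN_ON fires when a ->read_iter implementation returns one byte count but advances the iterator by a different amount; "was 4096, left 0, returned 31" means the iterator's capacity was fully consumed while only 31 bytes were reported read. A toy model of the accounting the check enforces (hypothetical names, not the kernel's struct iov_iter):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter: 'count' is the remaining capacity. */
struct toy_iter { size_t count; };

/* A well-behaved read: consumes exactly as many bytes as it reports. */
static long good_read(struct toy_iter *it, size_t avail)
{
    size_t got = avail < it->count ? avail : it->count;
    it->count -= got;
    return (long)got;
}

/* A buggy read in the style of the report above: drains the whole
 * iterator but claims a short count. */
static long buggy_read(struct toy_iter *it, size_t avail)
{
    (void)avail;
    it->count = 0;      /* "left 0" */
    return 31;          /* "returned 31" */
}

/* The invariant the debug patch in generic_file_splice_read() checks:
 * remaining capacity must equal initial capacity minus bytes reported. */
static int invariant_holds(size_t len, long ret, const struct toy_iter *it)
{
    return it->count == len - (size_t)ret;
}
```

If the invariant is violated, the pipe ends up holding more (or fewer) bytes than the splice machinery thinks it wrote, which matches the "asked to read 4096, claims to have read 31 / actual size of data in pipe 4096" debug output.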
use-after-free. [libata/block]
Found this in the logs this morning after an overnight fuzz run.. BUG: KASAN: use-after-free in __lock_acquire+0x1aa/0x1970 Read of size 8 at addr 880406805e30 by task trinity-c8/25954 CPU: 1 PID: 25954 Comm: trinity-c8 Not tainted 4.13.0-rc2-think+ #1 Call Trace: dump_stack+0x68/0xa1 print_address_description+0xd9/0x270 kasan_report+0x257/0x370 ? __lock_acquire+0x1aa/0x1970 __asan_load8+0x54/0x90 __lock_acquire+0x1aa/0x1970 ? save_stack+0xb1/0xd0 ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 ? try_to_wake_up+0x9b/0xa20 ? end_swap_bio_read+0xbe/0x1a0 ? debug_check_no_locks_freed+0x1b0/0x1b0 ? scsi_softirq_done+0x1a3/0x1d0 ? __blk_mq_complete_request+0x14a/0x2a0 ? blk_mq_complete_request+0x33/0x40 ? scsi_mq_done+0x4e/0x190 ? ata_scsi_qc_complete+0x15b/0x700 ? __ata_qc_complete+0x16d/0x2e0 ? ata_qc_complete+0x1a4/0x740 ? ata_qc_complete_multiple+0xeb/0x140 ? ahci_handle_port_interrupt+0x19e/0xa10 ? ahci_handle_port_intr+0xd9/0x130 ? ahci_single_level_irq_intr+0x62/0x90 ? __handle_irq_event_percpu+0x6e/0x450 ? handle_irq_event_percpu+0x70/0xf0 ? handle_irq_event+0x5a/0x90 ? handle_edge_irq+0xd9/0x2f0 ? handle_irq+0xb4/0x190 ? do_IRQ+0x67/0x140 ? common_interrupt+0x97/0x97 ? do_syscall_64+0x45/0x260 ? entry_SYSCALL64_slow_path+0x25/0x25 lock_acquire+0xfc/0x220 ? lock_acquire+0xfc/0x220 ? try_to_wake_up+0x9b/0xa20 _raw_spin_lock_irqsave+0x40/0x80 ? try_to_wake_up+0x9b/0xa20 try_to_wake_up+0x9b/0xa20 ? rcu_read_lock_sched_held+0x8f/0xa0 ? kmem_cache_free+0x2d3/0x300 ? migrate_swap_stop+0x3f0/0x3f0 ? mempool_free+0x5f/0xd0 wake_up_process+0x15/0x20 end_swap_bio_read+0xc6/0x1a0 bio_endio+0x12f/0x300 blk_update_request+0x12e/0x5c0 scsi_end_request+0x63/0x2f0 scsi_io_completion+0x3f3/0xa50 ? scsi_end_request+0x2f0/0x2f0 ? lock_downgrade+0x2c0/0x2c0 ? lock_acquire+0xfc/0x220 ? blk_stat_add+0x62/0x340 ? scsi_handle_queue_ramp_up+0x42/0x1e0 scsi_finish_command+0x1b1/0x220 scsi_softirq_done+0x1a3/0x1d0 __blk_mq_complete_request+0x14a/0x2a0 ? 
scsi_prep_state_check.isra.26+0xa0/0xa0 blk_mq_complete_request+0x33/0x40 scsi_mq_done+0x4e/0x190 ? scsi_prep_state_check.isra.26+0xa0/0xa0 ata_scsi_qc_complete+0x15b/0x700 ? lock_downgrade+0x2c0/0x2c0 ? msleep_interruptible+0xb0/0xb0 ? ata_scsi_activity_show+0xb0/0xb0 ? trace_hardirqs_off_caller+0x70/0x110 ? trace_hardirqs_off+0xd/0x10 ? _raw_spin_unlock_irqrestore+0x4b/0x50 ? intel_unmap+0x20b/0x300 ? intel_unmap_sg+0x9e/0xc0 __ata_qc_complete+0x16d/0x2e0 ? intel_unmap+0x300/0x300 ata_qc_complete+0x1a4/0x740 ata_qc_complete_multiple+0xeb/0x140 ahci_handle_port_interrupt+0x19e/0xa10 ? ahci_single_level_irq_intr+0x57/0x90 ahci_handle_port_intr+0xd9/0x130 ahci_single_level_irq_intr+0x62/0x90 ? ahci_handle_port_intr+0x130/0x130 __handle_irq_event_percpu+0x6e/0x450 handle_irq_event_percpu+0x70/0xf0 ? __handle_irq_event_percpu+0x450/0x450 ? lock_contended+0x810/0x810 ? handle_edge_irq+0x30/0x2f0 ? do_raw_spin_unlock+0x97/0x130 handle_irq_event+0x5a/0x90 handle_edge_irq+0xd9/0x2f0 handle_irq+0xb4/0x190 do_IRQ+0x67/0x140 common_interrupt+0x97/0x97 RIP: 0010:do_syscall_64+0x45/0x260 RSP: 0018:8803b2bd7f08 EFLAGS: 0246 ORIG_RAX: ff1e RAX: RBX: 8803b2bd7f58 RCX: 81146032 RDX: 0007 RSI: dc00 RDI: 88044e6c9cc0 RBP: 8803b2bd7f48 R08: 88046b21d1c0 R09: R10: R11: R12: 0023 R13: 88044e6c9cc0 R14: 8803b2bd7fd0 R15: cccd ? trace_hardirqs_on_caller+0x182/0x260 ? 
do_syscall_64+0x41/0x260 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f80c5932230 RSP: 002b:7fff521ac2e8 EFLAGS: 0246 ORIG_RAX: 0023 RAX: ffda RBX: 7f80c5ff6058 RCX: 7f80c5932230 RDX: RSI: RDI: 7fff521ac2f0 RBP: 6562 R08: 7f80c5c130a4 R09: 7f80c5c13120 R10: 0001 R11: 0246 R12: 005a R13: 7f80c5ff6058 R14: 004df61cd3a0 R15: cccd Allocated by task 14480: save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0xe0/0x2f0 copy_process.part.44+0xbe0/0x2f90 _do_fork+0x173/0x8a0 SyS_clone+0x19/0x20 do_syscall_64+0xea/0x260 return_from_SYSCALL_64+0x0/0x7a Freed by task 0: save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x72/0xc0 kmem_cache_free+0xa8/0x300 free_task+0x69/0x70 __put_task_struct+0xdc/0x220 delayed_put_task_struct+0x59/0x1a0 rcu_process_callbacks+0x49a/0x1580 __do_softirq+0x109/0x5bc The buggy address belongs to the object at 8804068055c0 which belongs to the cache task_struct of size 6848 The buggy address is located 2160 bytes inside of 6848-byte region [8804068055c0, 880406807080) The buggy address belongs to the page: page:ea00101a count:1 mapcount:0 mapping: (null) index:0x0
Re: [PATCH] lib/strscpy: avoid KASAN false positive
On Wed, Jul 19, 2017 at 11:39:32AM -0400, Chris Metcalf wrote:
 >  > We could just remove all that word-at-a-time logic. Do we have any
 >  > evidence that this would harm anything?
 >
 > The word-at-a-time logic was part of the initial commit since I wanted
 > to ensure that strscpy could be used to replace strlcpy or strncpy without
 > serious concerns about performance.

I'm curious what the typical length of the strings we're concerned with
here is, and whether it's long enough for this to make a difference.

Dave
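For context, the word-at-a-time win comes from testing eight bytes per load for the terminating NUL with a bit trick instead of comparing byte by byte. A userspace sketch of that zero-byte test (the kernel's real helpers live in <asm/word-at-a-time.h> and additionally handle alignment and mask generation; reading a full word that straddles the end of the allocation is also what makes KASAN complain, hence this thread):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff any byte of x is 0x00: (x - 0x01..01) sets a byte's high
 * bit when that byte was zero, ~x keeps that bit only when the byte was
 * below 0x80, and the HIGHS mask discards everything else. */
static int has_zero_byte(uint64_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

/* strlen() scanning one 64-bit word per iteration. The caller must
 * guarantee the buffer is readable out to a word boundary past the NUL
 * (here the tests use a 16-byte buffer, so every load stays in bounds). */
static size_t wordwise_strlen(const char *s)
{
    size_t i = 0;
    for (;;) {
        uint64_t w;
        memcpy(&w, s + i, sizeof(w));   /* one 8-byte load */
        if (has_zero_byte(w)) {
            while (s[i])                /* locate the NUL inside the word */
                i++;
            return i;
        }
        i += sizeof(w);
    }
}
```

Whether the per-word setup cost pays off depends on exactly the question Dave asks: for the short strings most kernel strscpy() callers copy, the byte loop may be just as fast.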