Re: Linux 5.10
On Mon, Dec 14, 2020 at 10:21:59AM -0700, Jens Axboe wrote:
 > [   87.290698] attempt to access beyond end of device
 > md0: rw=4096, want=13996467328, limit=6261202944
 > [   87.293371] attempt to access beyond end of device
 > md0: rw=4096, want=13998564480, limit=6261202944
 > [   87.296045] BTRFS warning (device md0): couldn't read tree root
 > [   87.300056] BTRFS error (device md0): open_ctree failed
 >
 > Reverting it goes back to the -rc7 behaviour where it mounts fine.
 > >>>
 > >>> If the developer/maintainer(s) agree, I can revert this and push out a
 > >>> 5.10.1, just let me know.
 > >>
 > >> Yes, these should be reverted from 5.10 via 5.10.1:
 > >>
 > >> e0910c8e4f87 dm raid: fix discard limits for raid1 and raid10
 > >> f075cfb1dc59 md: change mddev 'chunk_sectors' from int to unsigned
 > >
 > > Sorry, f075cfb1dc59 was my local commit id, the corresponding upstream
 > > commit as staged by Jens is:
 > >
 > > 6ffeb1c3f82 md: change mddev 'chunk_sectors' from int to unsigned
 > >
 > > So please revert:
 > > 6ffeb1c3f822 md: change mddev 'chunk_sectors' from int to unsigned
 > > and then revert:
 > > e0910c8e4f87 dm raid: fix discard limits for raid1 and raid10
 >
 > Working with Song on understanding the failure case here. raid6 was
 > tested prior to this being shipped. We'll be back with more soon...

FYI, mixup in my original mail, it was raid5 (I forgot I converted it
from raid6->raid5 a few months back).  But I wouldn't be surprised if
they were both equally affected given what that header touched.

	Dave
Re: Linux 5.10
On Sun, Dec 13, 2020 at 03:03:29PM -0800, Linus Torvalds wrote:
 > Ok, here it is - 5.10 is tagged and pushed out.
 >
 > I pretty much always wish that the last week was even calmer than it
 > was, and that's true here too. There's a fair amount of fixes in here,
 > including a few last-minute reverts for things that didn't get fixed,
 > but nothing makes me go "we need another week".

...

 > Mike Snitzer (1):
 >       md: change mddev 'chunk_sectors' from int to unsigned

Seems to be broken.  This breaks mounting my raid6 partition:

[   87.290698] attempt to access beyond end of device
md0: rw=4096, want=13996467328, limit=6261202944
[   87.293371] attempt to access beyond end of device
md0: rw=4096, want=13998564480, limit=6261202944
[   87.296045] BTRFS warning (device md0): couldn't read tree root
[   87.300056] BTRFS error (device md0): open_ctree failed

Reverting it goes back to the -rc7 behaviour where it mounts fine.

	Dave
Re: Linux 5.10
On Mon, Dec 14, 2020 at 12:31:47AM -0500, Dave Jones wrote:
 > On Sun, Dec 13, 2020 at 03:03:29PM -0800, Linus Torvalds wrote:
 > > Ok, here it is - 5.10 is tagged and pushed out.
 > >
 > > I pretty much always wish that the last week was even calmer than it
 > > was, and that's true here too. There's a fair amount of fixes in here,
 > > including a few last-minute reverts for things that didn't get fixed,
 > > but nothing makes me go "we need another week".
 >
 > ...
 >
 > > Mike Snitzer (1):
 > >       md: change mddev 'chunk_sectors' from int to unsigned
 >
 > Seems to be broken.  This breaks mounting my raid6 partition:
 >
 > [   87.290698] attempt to access beyond end of device
 > md0: rw=4096, want=13996467328, limit=6261202944
 > [   87.293371] attempt to access beyond end of device
 > md0: rw=4096, want=13998564480, limit=6261202944
 > [   87.296045] BTRFS warning (device md0): couldn't read tree root
 > [   87.300056] BTRFS error (device md0): open_ctree failed
 >
 > Reverting it goes back to the -rc7 behaviour where it mounts fine.

Another data point from the md setup in dmesg..

good:

[    4.614957] md/raid:md0: device sdd1 operational as raid disk 3
[    4.614960] md/raid:md0: device sda1 operational as raid disk 0
[    4.614962] md/raid:md0: device sdc1 operational as raid disk 2
[    4.614963] md/raid:md0: device sdf1 operational as raid disk 4
[    4.614964] md/raid:md0: device sdg1 operational as raid disk 1
[    4.615156] md/raid:md0: raid level 5 active with 5 out of 5 devices, algorithm 2
[    4.645563] md0: detected capacity change from 0 to 12001828929536

bad:

[    5.315036] md/raid:md0: device sda1 operational as raid disk 0
[    5.316220] md/raid:md0: device sdd1 operational as raid disk 3
[    5.317389] md/raid:md0: device sdc1 operational as raid disk 2
[    5.318613] md/raid:md0: device sdf1 operational as raid disk 4
[    5.319748] md/raid:md0: device sdg1 operational as raid disk 1
[    5.321155] md/raid:md0: raid level 5 active with 5 out of 5 devices, algorithm 2
[    5.370257] md0: detected capacity change from 0 to 3205735907328
Re: weird loadavg on idle machine post 5.7
On Mon, Jul 06, 2020 at 04:59:52PM +0200, Peter Zijlstra wrote:
 > On Fri, Jul 03, 2020 at 04:51:53PM -0400, Dave Jones wrote:
 > > On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote:
 > >
 > > looked promising the first few hours, but as soon as it hit four hours
 > > of uptime, loadavg spiked and is now pinned to at least 1.00
 >
 > OK, lots of cursing later, I now have the below...
 >
 > The TL;DR is that while schedule() doesn't change p->state once it
 > starts, it does read it quite a bit, and ttwu() will actually change it
 > to TASK_WAKING. So if ttwu() changes it to WAKING before schedule()
 > reads it to do loadavg accounting, things go sideways.
 >
 > The below is extra complicated by the fact that I've had to scrounge up
 > a bunch of load-store ordering without actually adding barriers. It adds
 > yet another control dependency to ttwu(), so take that C standard :-)

Man this stuff is subtle.  I could've read this a hundred times and not
even come close to approaching this.  Basically me reading scheduler code:
http://www.quickmeme.com/img/96/9642ed212bbced00885592b39880ec55218e922245e0637cf94db2e41857d558.jpg

 > I've booted it, and build a few kernels with it and checked loadavg
 > drops to 0 after each build, so from that pov all is well, but since
 > I'm not confident I can reproduce the issue, I can't tell this actually
 > fixes anything, except maybe phantoms of my imagination.

Five hours in, looking good so far.  I think you nailed it.

	Dave
Re: weird loadavg on idle machine post 5.7
On Fri, Jul 03, 2020 at 12:40:33PM +0200, Peter Zijlstra wrote:
 > So ARM/Power/etc.. can speculate the load such that the
 > task_contributes_to_load() value is from before ->on_rq.
 >
 > The compiler might similar re-order things -- although I've not found it
 > doing so with the few builds I looked at.
 >
 > So I think at the very least we should do something like this. But i've
 > no idea how to reproduce this problem.
 >
 > Mel's patch placed it too far down, as the WF_ON_CPU path also relies on
 > this, and by not resetting p->sched_contributes_to_load it would skew
 > accounting even worse.

looked promising the first few hours, but as soon as it hit four hours
of uptime, loadavg spiked and is now pinned to at least 1.00

	Dave
Re: weird loadavg on idle machine post 5.7
On Thu, Jul 02, 2020 at 10:36:27PM +0100, Mel Gorman wrote:
 > I'm thinking that the !!task_contributes_to_load(p) should still happen
 > after smp_cond_load_acquire() when on_cpu is stable and the pi_lock is
 > held to stabilised p->state against a parallel wakeup or updating the
 > task rq. I do not see any hazards with respect to smp_rmb and the value
 > of p->state in this particular path but I've confused myself enough in
 > the various scheduler and wakeup paths that I don't want to bet money on
 > it late in the evening
 >
 > It builds, not booted, it's for discussion but maybe Dave is feeling brave!

stalls, and then panics during boot :(

[   16.933212] igb :02:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[   69.572840] watchdog: BUG: soft lockup - CPU#3 stuck for 44s! [kworker/u8:0:7]
[   69.572849] CPU: 3 PID: 7 Comm: kworker/u8:0 Kdump: loaded Not tainted 5.8.0-rc3-firewall+ #2
[   69.572852] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Q3XXG4-P, BIOS 5.6.5 06/30/2018
[   69.572861] Workqueue: 0x0 (events_power_efficient)
[   69.572877] RIP: 0010:finish_task_switch+0x71/0x1a0
[   69.572884] Code: 00 00 4d 8b 7c 24 10 65 4c 8b 34 25 c0 6c 01 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 44 24 2c 00 00 00 00 c6 03 00 fb 4d 85 ed <74> 0b f0 41 ff 4d 4c 0f 84 d9 00 00 00 49 83 c7 80 74 7a 48 89 d8
[   69.572887] RSP: 0018:b36700067e40 EFLAGS: 0246
[   69.572893] RAX: 94654eab RBX: 9465575a8b40 RCX:
[   69.572895] RDX: RSI: 9465565c RDI: 94654eab
[   69.572898] RBP: b36700067e68 R08: 0001 R09: 000283c0
[   69.572901] R10: R11: R12: 94654eab
[   69.572904] R13: R14: 9465565c R15: 0001
[   69.572909] FS: () GS:94655758() knlGS:
[   69.572912] CS: 0010 DS: ES: CR0: 80050033
[   69.572917] CR2: 7f29b26abc30 CR3: 00020812d001 CR4: 001606e0
[   69.572919] Call Trace:
[   69.572937]  __schedule+0x28d/0x570
[   69.572946]  ? _cond_resched+0x15/0x30
[   69.572954]  schedule+0x38/0xa0
[   69.572962]  worker_thread+0xaa/0x3c0
[   69.572968]  ? process_one_work+0x3c0/0x3c0
[   69.572972]  kthread+0x116/0x130
[   69.572977]  ? __kthread_create_on_node+0x180/0x180
[   69.572982]  ret_from_fork+0x22/0x30
[   69.572988] Kernel panic - not syncing: softlockup: hung tasks
[   69.572993] CPU: 3 PID: 7 Comm: kworker/u8:0 Kdump: loaded Tainted: G L 5.8.0-rc3-firewall+ #2
[   69.572995] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Q3XXG4-P, BIOS 5.6.5 06/30/2018
[   69.572998] Workqueue: 0x0 (events_power_efficient)
[   69.573001] Call Trace:
[   69.573004]
[   69.573010]  dump_stack+0x57/0x70
[   69.573016]  panic+0xfb/0x2cb
[   69.573024]  watchdog_timer_fn.cold.12+0x7d/0x96
[   69.573030]  ? softlockup_fn+0x30/0x30
[   69.573035]  __hrtimer_run_queues+0x100/0x280
[   69.573041]  hrtimer_interrupt+0xf4/0x210
[   69.573049]  __sysvec_apic_timer_interrupt+0x5d/0xf0
[   69.573055]  asm_call_on_stack+0x12/0x20
[   69.573058]
[   69.573064]  sysvec_apic_timer_interrupt+0x6d/0x80
[   69.573069]  asm_sysvec_apic_timer_interrupt+0xf/0x20
[   69.573078] RIP: 0010:finish_task_switch+0x71/0x1a0
[   69.573082] Code: 00 00 4d 8b 7c 24 10 65 4c 8b 34 25 c0 6c 01 00 0f 1f 44 00 00 0f 1f 44 00 00 41 c7 44 24 2c 00 00 00 00 c6 03 00 fb 4d 85 ed <74> 0b f0 41 ff 4d 4c 0f 84 d9 00 00 00 49 83 c7 80 74 7a 48 89 d8
[   69.573085] RSP: 0018:b36700067e40 EFLAGS: 0246
[   69.573088] RAX: 94654eab RBX: 9465575a8b40 RCX:
[   69.573090] RDX: RSI: 9465565c RDI: 94654eab
[   69.573092] RBP: b36700067e68 R08: 0001 R09: 000283c0
[   69.573094] R10: R11: R12: 94654eab
[   69.573096] R13: R14: 9465565c R15: 0001
[   69.573106]  __schedule+0x28d/0x570
[   69.573113]  ? _cond_resched+0x15/0x30
[   69.573119]  schedule+0x38/0xa0
[   69.573125]  worker_thread+0xaa/0x3c0
[   69.573130]  ? process_one_work+0x3c0/0x3c0
[   69.573134]  kthread+0x116/0x130
[   69.573149]  ? __kthread_create_on_node+0x180/0x180
[   69.792344]  ret_from_fork+0x22/0x30
Re: weird loadavg on idle machine post 5.7
On Thu, Jul 02, 2020 at 01:15:48PM -0400, Dave Jones wrote:
 > When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly
 > idle machine (that usually sees loadavg hover in the 0.xx range)
 > that it was consistently above 1.00 even when there was nothing running.
 > All that perf showed was the kernel was spending time in the idle loop
 > (and running perf).

Unfortunate typo there, I meant 5.8-rc2, and just confirmed the bug
persists in 5.8-rc3.

	Dave
weird loadavg on idle machine post 5.7
When I upgraded my firewall to 5.7-rc2 I noticed that on a mostly
idle machine (that usually sees loadavg hover in the 0.xx range)
that it was consistently above 1.00 even when there was nothing running.
All that perf showed was the kernel was spending time in the idle loop
(and running perf).

For the first hour or so after boot, everything seems fine, but over
time loadavg creeps up, and once it's established a new baseline, it
never seems to ever drop below that again.

One morning I woke up to find loadavg at '7.xx', after almost as many
hours of uptime, which makes me wonder if perhaps this is triggered by
something in cron.  I have a bunch of scripts that fire off every hour
that involve thousands of shortlived runs of iptables/ipset, but running
them manually didn't seem to automatically trigger the bug.

Given it took a few hours of runtime to confirm good/bad, bisecting this
took the last two weeks.  I did it four different times, the first
producing bogus results from over-eager 'good', but the last two runs
both implicated this commit:

commit c6e7bd7afaeb3af55ffac122828035f1c01d1d7b (refs/bisect/bad)
Author: Peter Zijlstra
Date:   Sun May 24 21:29:55 2020 +0100

    sched/core: Optimize ttwu() spinning on p->on_cpu

    Both Rik and Mel reported seeing ttwu() spend significant time on:

      smp_cond_load_acquire(&p->on_cpu, !VAL);

    Attempt to avoid this by queueing the wakeup on the CPU that owns the
    p->on_cpu value. This will then allow the ttwu() to complete without
    further waiting.

    Since we run schedule() with interrupts disabled, the IPI is
    guaranteed to happen after p->on_cpu is cleared, this is what makes
    it safe to queue early.

    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Mel Gorman
    Signed-off-by: Ingo Molnar
    Cc: Jirka Hladky
    Cc: Vincent Guittot
    Cc: valentin.schnei...@arm.com
    Cc: Hillf Danton
    Cc: Rik van Riel
    Link: https://lore.kernel.org/r/20200524202956.27665-2-mgor...@techsingularity.net

Unfortunately it doesn't revert cleanly on top of rc3 so I haven't
confirmed 100% that it's the cause yet, but the two separate bisects
seem promising.

I don't see any obvious correlation between what's changing there and
the symptoms (other than "scheduler magic") but maybe those closer to
this have ideas what could be going awry ?

	Dave
ntp audit spew.
I have some hosts that are constantly spewing audit messages like so:

[46897.591182] audit: type=1333 audit(1569250288.663:220): op=offset old=2543677901372 new=2980866217213
[46897.591184] audit: type=1333 audit(1569250288.663:221): op=freq old=-2443166611284 new=-2436281764244
[48850.604005] audit: type=1333 audit(1569252241.675:222): op=offset old=1850302393317 new=3190241577926
[48850.604008] audit: type=1333 audit(1569252241.675:223): op=freq old=-2436281764244 new=-2413071187316
[49926.567270] audit: type=1333 audit(1569253317.638:224): op=offset old=2453141035832 new=2372389610455
[49926.567273] audit: type=1333 audit(1569253317.638:225): op=freq old=-2413071187316 new=-2403561671476

This gets emitted every time ntp makes an adjustment, which is
apparently very frequent on some hosts.

Audit isn't even enabled on these machines.

# auditctl -l
No rules
# auditctl -s
enabled 0
failure 1
pid 0
rate_limit 0
backlog_limit 64
lost 0
backlog 0
loginuid_immutable 0 unlocked

Aside from the log spew, why is this code doing _anything_ when audit
isn't enabled ?

Something like this:

diff --git a/kernel/audit.c b/kernel/audit.c
index da8dc0db5bd3..1291d826c024 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -2340,6 +2340,9 @@ void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
 	struct audit_buffer *ab;
 	va_list args;
 
+	if (audit_initialized != AUDIT_INITIALIZED)
+		return;
+
 	ab = audit_log_start(ctx, gfp_mask, type);
 	if (ab) {
 		va_start(args, fmt);

Might silence the spew, but I'm concerned that the amount of work that
audit is doing on an unconfigured machine might warrant further
investigation.

("turn off CONFIG_AUDIT" isn't an option unfortunately, as this is a
one-size-fits-all kernel that runs on some other hosts that /do/ have
audit configured)

	Dave
5.3-rc1 panic in dma_direct_max_mapping_size
only got a partial panic, but when I threw 5.3-rc1 on a linode vm, it
hit this:

 bus_add_driver+0x1a9/0x1c0
 ? scsi_init_sysctl+0x22/0x22
 driver_register+0x6b/0xa6
 ? scsi_init_sysctl+0x22/0x22
 init+0x86/0xcc
 do_one_initcall+0x69/0x334
 kernel_init_freeable+0x367/0x3ff
 ? rest_init+0x247/0x247
 kernel_init+0xa/0xf9
 ret_from_fork+0x3a/0x50
CR2:
---[ end trace 2967cd16f7b1a303 ]---
RIP: 0010:dma_direct_max_mapping_size+0x21/0x71
Code: 0f b6 c0 c3 0f 1f 44 00 00 0f 1f 44 00 00 55 53 48 89 fb e8 21 0e 00 00 84 c0 74 2c 48 8b 83 20 03 00 00 48 8b ab 30 03 00 00 <48> 8b 00 48 85 c0 75 20 48 89 df e8 ff f3 ff ff 48 39 e8 77 2c 83
RSP: 0018:b58f00013ae8 EFLAGS: 00010202
RAX: RBX: a35ff8914ac8 RCX: b58f00013a1c
RDX: a35ff81d4658 RSI: 007e RDI: a35ff8914ac8
RBP: R08: a35ff81d4cc0 R09: a35ff82e3bc8
R10: R11: R12: a35ff8914ac8
R13: R14: a35ff826c160 R15:
FS: () GS:a35ffba0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: CR3: 00012d220001 CR4: 003606f0
DR0: DR1: DR2:
DR3: DR6: fffe0ff0 DR7: 0400
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009
Kernel Offset: 0x1b00 from 0x8100 (relocation range: 0x8000-0xbfff)

Will try and get some more debug info this evening if it isn't obvious
from the above.

	Dave
kernel BUG at kernel/cred.c:825!
[   53.980701] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
[   53.981216] NFSD: starting 45-second grace period (net f098)
[   54.006802] CRED: Invalid credentials
[   54.006880] CRED: At ./include/linux/cred.h:253
[   54.006899] CRED: Specified credentials: 5daa4529
[   54.006916] CRED: ->magic=0, put_addr= (null)
[   54.006927] CRED: ->usage=1, subscr=0
[   54.006935] CRED: ->*uid = { 0,0,0,0 }
[   54.006944] CRED: ->*gid = { 0,0,0,0 }
[   54.006954] ------------[ cut here ]------------
[   54.006964] kernel BUG at kernel/cred.c:825!
[   54.006977] invalid opcode: [#1] SMP
RIP: __invalid_creds+0x48/0x50
[   54.006987] CPU: 2 PID: 814 Comm: mount.nfs Tainted: G W 5.0.0-rc1-backup+ #1
[   54.006997] Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
[   54.007171] RIP: 0010:__invalid_creds+0x48/0x50
[   54.007184] Code: 44 89 e2 48 89 ee 48 c7 c7 37 3e 53 ba e8 f7 8f 03 00 48 c7 c6 49 3e 53 ba 48 89 df 65 48 8b 14 25 80 4e 01 00 e8 48 fd ff ff <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 49 89 fe
[   54.007207] RSP: 0018:c9e33a30 EFLAGS: 00010286
[   54.007219] RAX: 001a RBX: ba960300 RCX: 0006
[   54.007234] RDX: RSI: 8884276f8818 RDI: 88842f895710
[   54.007246] RBP: ba5274c3 R08: 0001 R09:
[   54.007254] R10: c9e33a50 R11: R12: 00fd
[   54.007261] R13: 88842c1a6a08 R14: ba960300 R15: c9e33d60
[   54.007269] FS: 7f73770cb140() GS:88842f88() knlGS:
[   54.007277] CS: 0010 DS: ES: CR0: 80050033
[   54.007283] CR2: 5557d17d1000 CR3: 0004122ba006 CR4: 001606e0
[   54.007359] Call Trace:
[   54.007366]  nfs4_discover_server_trunking+0x286/0x310
[   54.007376]  nfs4_init_client+0xe8/0x260
[   54.007389]  ? nfs_get_client+0x519/0x610
[   54.007401]  ? _raw_spin_unlock+0x24/0x30
[   54.007412]  ? nfs_get_client+0x519/0x610
[   54.007424]  nfs4_set_client+0xb8/0x100
[   54.007439]  nfs4_create_server+0xfe/0x270
[   54.007451]  ? pcpu_alloc+0x611/0x8a0
[   54.007462]  nfs4_remote_mount+0x28/0x50
[   54.007474]  mount_fs+0xf/0x80
[   54.007487]  vfs_kern_mount+0x62/0x160
[   54.007498]  nfs_do_root_mount+0x7f/0xc0
[   54.007510]  nfs4_try_mount+0x3f/0xc0
[   54.007521]  ? get_nfs_version+0x11/0x50
[   54.007536]  nfs_fs_mount+0x61b/0xbd0
[   54.007548]  ? rcu_read_lock_sched_held+0x66/0x70
[   54.007560]  ? nfs_clone_super+0x70/0x70
[   54.007571]  ? nfs_destroy_inode+0x20/0x20
[   54.007585]  ? mount_fs+0xf/0x80
[   54.007595]  mount_fs+0xf/0x80
[   54.007606]  vfs_kern_mount+0x62/0x160
[   54.007618]  do_mount+0x1d1/0xd40
[   54.007631]  ? copy_mount_options+0xd2/0x170
[   54.007643]  ksys_mount+0x7e/0xd0
[   54.007654]  __x64_sys_mount+0x21/0x30
[   54.007665]  do_syscall_64+0x6d/0x660
[   54.007677]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   54.007690]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   54.007702] RIP: 0033:0x7f7377e97a1a
[   54.007713] Code: 48 8b 0d 71 e4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3e e4 0b 00 f7 d8 64 89 01 48
[   54.007736] RSP: 002b:7ffc73d9b4a8 EFLAGS: 0202 ORIG_RAX: 00a5
[   54.007751] RAX: ffda RBX: RCX: 7f7377e97a1a
[   54.007764] RDX: 5632beb51b50 RSI: 5632beb51b70 RDI: 5632beb53880
[   54.007780] RBP: 7ffc73d9b600 R08: 5632beb556b0 R09: 33643a303036343a
[   54.007794] R10: 0c00 R11: 0202 R12: 7ffc73d9b600
[   54.007807] R13: 5632beb548a0 R14: 001c R15: 7ffc73d9b510
Re: Kernel 4.17.4 lockup
On Wed, Jul 11, 2018 at 10:50:22AM -0700, Dave Hansen wrote:
 > On 07/11/2018 10:29 AM, H.J. Lu wrote:
 > >> I have seen it on machines with various amounts of cores and RAMs.
 > >> It triggers the fastest on 8 cores with 6GB RAM reliably.
 > > Here is the first kernel message.
 >
 > This looks like random corruption again.  It's probably a bogus 'struct
 > page' that fails the move_freepages() pfn_valid() checks.  I'm too lazy
 > to go reproduce the likely stack trace (not sure why it didn't show up
 > on your screen), but this could just be another symptom of the same
 > issue that caused the TLB batching oops.
 >
 > My money is on this being some kind of odd stack corruption, maybe
 > interrupt-induced, but that's a total guess at this point.

So, maybe related.. I reported this to linux-mm a few days ago:

When I ran an rsync on my machine I use for backups, it eventually hits
this trace..

kernel BUG at mm/page_alloc.c:2016!
invalid opcode: [#1] SMP
RIP: move_freepages_block+0x120/0x2d0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.18.0-rc4-backup+ #1
Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
RIP: 0010:move_freepages_block+0x120/0x2d0
Code: 05 48 01 c8 74 3b f6 00 02 74 36 48 8b 03 48 c1 e8 3e 48 8d 0c 40 48 8b 86 c0 7f 00 00 48 c1 e8 3e 48 8d 04 40 48 39 c8 74 17 <0f> 0b 45 31 f6 48 83 c4 28 44 89 f0 5b 5d 41 5c 41 5d 41 5e 41 5f
RSP: 0018:88043fac3af8 EFLAGS: 00010093
RAX: RBX: ea0002e2 RCX: 0003
RDX: RSI: ea0002e2 RDI:
RBP: R08: 88043fac3b5c R09: 9295e110
R10: 88043fdf4000 R11: ea0002e20008 R12: ea0002e2
R13: 9295dd40 R14: 0008 R15: ea0002e27fc0
FS: () GS:88043fac() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f2a75f71fe8 CR3: 0001e380f006 CR4: 001606e0
Call Trace:
 ? lock_acquire+0xe6/0x1dc
 steal_suitable_fallback+0x152/0x1a0
 get_page_from_freelist+0x1029/0x1650
 ? free_debug_processing+0x271/0x410
 __alloc_pages_nodemask+0x111/0x310
 page_frag_alloc+0x74/0x120
 __netdev_alloc_skb+0x95/0x110
 e1000_alloc_rx_buffers+0x225/0x2b0
 e1000_clean_rx_irq+0x2ee/0x450
 e1000e_poll+0x7c/0x2e0
 net_rx_action+0x273/0x4d0
 __do_softirq+0xc6/0x4d6
 irq_exit+0xbb/0xc0
 do_IRQ+0x60/0x110
 common_interrupt+0xf/0xf
RIP: 0010:cpuidle_enter_state+0xb5/0x390
Code: 89 04 24 0f 1f 44 00 00 31 ff e8 86 26 64 ff 80 7c 24 0f 00 0f 85 fb 01 00 00 e8 66 02 66 ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <48> 8b 0c 24 4c 29 f9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f
RSP: 0018:c90abe70 EFLAGS: 0202 ORIG_RAX: ffdc
RAX: 880107fe8040 RBX: 0003 RCX: 0001
RDX: 20c49ba5e353f7cf RSI: 0001 RDI: 880107fe8040
RBP: 88043fae8c20 R08: 0001 R09: 0018
R10: R11: R12: 928fb7d8
R13: 0003 R14: 0003 R15: 015e55aecf23
 do_idle+0x128/0x230
 cpu_startup_entry+0x6f/0x80
 start_secondary+0x192/0x1f0
 secondary_startup_64+0xa5/0xb0
NMI watchdog: Watchdog detected hard LOCKUP on cpu 4

Everything then locks up & reboots.

It's fairly reproducible, though every time I run it my rsync gets
further, and eventually I suspect it won't create enough load to
reproduce.

2006 #ifndef CONFIG_HOLES_IN_ZONE
2007 	/*
2008 	 * page_zone is not safe to call in this context when
2009 	 * CONFIG_HOLES_IN_ZONE is set. This bug check is probably redundant
2010 	 * anyway as we check zone boundaries in move_freepages_block().
2011 	 * Remove at a later date when no bug reports exist related to
2012 	 * grouping pages by mobility
2013 	 */
2014 	VM_BUG_ON(pfn_valid(page_to_pfn(start_page)) &&
2015 	          pfn_valid(page_to_pfn(end_page)) &&
2016 	          page_zone(start_page) != page_zone(end_page));
2017 #endif
2018

I could trigger it fairly quickly last week, but it seemed dependent on
just how much rsync is actually transferring.  (There are millions of
files, and only a few thousand had changed).  When there's nothing
changed, the rsync was running to completion every time.

	Dave
fscache kasan splat on v4.17-rc3
[   46.333213] ==================================================================
[   46.336298] BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x129/0x310
[   46.338208] Read of size 4 at addr 8803ea90261c by task mount.nfs/839
[   46.342780] CPU: 2 PID: 839 Comm: mount.nfs Not tainted 4.17.0-rc3-backup-debug+ #1
[   46.342783] Hardware name: ASUS All Series/Z97-DELUXE, BIOS 2602 08/18/2015
[   46.342784] Call Trace:
[   46.342790]  dump_stack+0x74/0xbb
[   46.342795]  print_address_description+0x9b/0x2b0
[   46.342797]  kasan_report+0x258/0x380
[   46.355407]  ? fscache_alloc_cookie+0x129/0x310
[   46.355410]  fscache_alloc_cookie+0x129/0x310
[   46.355413]  __fscache_acquire_cookie+0xd2/0x570
[   46.355417]  nfs_fscache_get_client_cookie+0x206/0x220
[   46.355419]  ? nfs_readpage_from_fscache_complete+0xa0/0xa0
[   46.355422]  ? rcu_read_lock_sched_held+0x8a/0xa0
[   46.355426]  ? memcpy+0x34/0x50
[   46.355428]  nfs_alloc_client+0x1d9/0x1f0
[   46.371854]  nfs4_alloc_client+0x22/0x420
[   46.371857]  nfs_get_client+0x47d/0x8f0
[   46.371860]  ? pcpu_alloc+0x599/0xaf0
[   46.371862]  nfs4_set_client+0x155/0x1e0
[   46.371865]  ? nfs4_check_serverowner_major_id+0x50/0x50
[   46.371867]  nfs4_create_server+0x261/0x4e0
[   46.371870]  ? nfs4_set_ds_client+0x200/0x200
[   46.371872]  ? alloc_vfsmnt+0xa6/0x360
[   46.371875]  ? __lockdep_init_map+0xaa/0x290
[   46.371878]  nfs4_remote_mount+0x31/0x60
[   46.371880]  mount_fs+0x2f/0xd0
[   46.371884]  vfs_kern_mount+0x68/0x200
[   46.396948]  nfs_do_root_mount+0x7f/0xc0
[   46.396952]  ? do_raw_spin_unlock+0xa2/0x130
[   46.396954]  nfs4_try_mount+0x7f/0x110
[   46.396957]  nfs_fs_mount+0xca5/0x1450
[   46.396960]  ? pcpu_alloc+0x599/0xaf0
[   46.396962]  ? nfs_remount+0x8a0/0x8a0
[   46.396964]  ? mark_held_locks+0x1c/0xb0
[   46.396967]  ? __raw_spin_lock_init+0x1c/0x70
[   46.412631]  ? trace_hardirqs_on_caller+0x187/0x260
[   46.412633]  ? nfs_clone_super+0x150/0x150
[   46.412635]  ? nfs_destroy_inode+0x20/0x20
[   46.412637]  ? __lockdep_init_map+0xaa/0x290
[   46.412639]  ? __lockdep_init_map+0xaa/0x290
[   46.412641]  ? mount_fs+0x2f/0xd0
[   46.412642]  mount_fs+0x2f/0xd0
[   46.412645]  vfs_kern_mount+0x68/0x200
[   46.412648]  ? do_raw_read_unlock+0x28/0x50
[   46.412651]  do_mount+0x2ac/0x14f0
[   46.412653]  ? copy_mount_string+0x20/0x20
[   46.431590]  ? copy_mount_options+0xe6/0x1b0
[   46.431592]  ? copy_mount_options+0x100/0x1b0
[   46.431594]  ? copy_mount_options+0xe6/0x1b0
[   46.431596]  ksys_mount+0x7e/0xd0
[   46.431599]  __x64_sys_mount+0x62/0x70
[   46.431601]  do_syscall_64+0xc7/0x8a0
[   46.431603]  ? syscall_return_slowpath+0x3c0/0x3c0
[   46.431605]  ? mark_held_locks+0x1c/0xb0
[   46.431609]  ? entry_SYSCALL_64_after_hwframe+0x59/0xbe
[   46.431611]  ? trace_hardirqs_off_caller+0xc2/0x110
[   46.431613]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   46.431615]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   46.431617] RIP: 0033:0x7f546ceb97fa
[   46.431619] RSP: 002b:7ffdf1c9d078 EFLAGS: 0206 ORIG_RAX: 00a5
[   46.431622] RAX: ffda RBX: RCX: 7f546ceb97fa
[   46.431623] RDX: 55decf202b20 RSI: 55decf202b40 RDI: 55decf204850
[   46.431625] RBP: 7ffdf1c9d1d0 R08: 55decf206680 R09: 62353a303036343a
[   46.431626] R10: 0c00 R11: 0206 R12: 7ffdf1c9d1d0
[   46.431627] R13: 55decf205870 R14: 001c R15: 7ffdf1c9d0e0

[   46.431631] Allocated by task 839:
[   46.431634]  kasan_kmalloc+0xa0/0xd0
[   46.431636]  __kmalloc+0x156/0x350
[   46.431639]  fscache_alloc_cookie+0x2e4/0x310
[   46.431640]  __fscache_acquire_cookie+0xd2/0x570
[   46.431643]  nfs_fscache_get_client_cookie+0x206/0x220
[   46.431645]  nfs_alloc_client+0x1d9/0x1f0
[   46.431648]  nfs4_alloc_client+0x22/0x420
[   46.431650]  nfs_get_client+0x47d/0x8f0
[   46.431652]  nfs4_set_client+0x155/0x1e0
[   46.431653]  nfs4_create_server+0x261/0x4e0
[   46.431655]  nfs4_remote_mount+0x31/0x60
[   46.431657]  mount_fs+0x2f/0xd0
[   46.431659]  vfs_kern_mount+0x68/0x200
[   46.431662]  nfs_do_root_mount+0x7f/0xc0
[   46.484441]  nfs4_try_mount+0x7f/0x110
[   46.484443]  nfs_fs_mount+0xca5/0x1450
[   46.484445]  mount_fs+0x2f/0xd0
[   46.484447]  vfs_kern_mount+0x68/0x200
[   46.484449]  do_mount+0x2ac/0x14f0
[   46.484451]  ksys_mount+0x7e/0xd0
[   46.484452]  __x64_sys_mount+0x62/0x70
[   46.484455]  do_syscall_64+0xc7/0x8a0
[   46.484458]  entry_SYSCALL_64_after_hwframe+0x49/0xbe

[   46.484461] Freed by task 407:
[   46.499159]  __kasan_slab_free+0x11d/0x160
[   46.499161]  kfree+0xe5/0x320
[   46.499163]  kobject_uevent_env+0x1ab/0x760
[   46.499165]  kobject_synth_uevent+0x470/0x4e0
[   46.499168]  uevent_store+0x1c/0x40
[   46.499171]  kernfs_fop_write+0x196/0x230
[   46.499174]  __vfs_write+0xc5/0x310
[   46.499175]  vfs_write+0xfb/0x250
[   46.499177]  ksys_write+0xa7/0x130
[   46.499180]  do_syscall_64+0xc7/0x8a0
[   46.512915]
46.484452] __x64_sys_mount+0x62/0x70 [ 46.484455] do_syscall_64+0xc7/0x8a0 [ 46.484458] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 46.484461] Freed by task 407: [ 46.499159] __kasan_slab_free+0x11d/0x160 [ 46.499161] kfree+0xe5/0x320 [ 46.499163] kobject_uevent_env+0x1ab/0x760 [ 46.499165] kobject_synth_uevent+0x470/0x4e0 [ 46.499168] uevent_store+0x1c/0x40 [ 46.499171] kernfs_fop_write+0x196/0x230 [ 46.499174] __vfs_write+0xc5/0x310 [ 46.499175] vfs_write+0xfb/0x250 [ 46.499177] ksys_write+0xa7/0x130 [ 46.499180] do_syscall_64+0xc7/0x8a0 [ 46.512915]
Re: Linux messages full of `random: get_random_u32 called from`
On Sun, Apr 29, 2018 at 07:02:02PM -0400, Dave Jones wrote:
 > On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote:
 >  > Can you tell me a bit about your system?  What distribution, what
 >  > hardware is present in your sytsem (what architecture, what
 >  > peripherals are attached, etc.)?
 >  >
 >  > There's a reason why we made this --- we were declaring the random
 >  > number pool to be fully intialized before it really was, and that was
 >  > a potential security concern.  It's not as bad as the weakness
 >  > discovered by Nadia Heninger in 2012.  (See https://factorable.net for
 >  > more details.)  However, this is not one of those things where we like
 >  > to fool around.
 >  >
 >  > So I want to understand if this is an issue with a particular hardware
 >  > configuration, or whether it's just a badly designed Linux init system
 >  > or embedded setup, or something else.  After all, you wouldn't want
 >  > the NSA spying on all of your network traffic, would you?  :-)
 >
 > Why do we continue to print this stuff out when crng_init=1 though ?

answering my own question, I think.. This is a tristate, and we need it to be >1 to be quiet, which doesn't happen until..

 > [  165.806247] random: crng init done

this point.

	Dave
Re: Linux messages full of `random: get_random_u32 called from`
On Tue, Apr 24, 2018 at 09:56:21AM -0400, Theodore Y. Ts'o wrote: > Can you tell me a bit about your system? What distribution, what > hardware is present in your sytsem (what architecture, what > peripherals are attached, etc.)? > > There's a reason why we made this --- we were declaring the random > number pool to be fully intialized before it really was, and that was > a potential security concern. It's not as bad as the weakness > discovered by Nadia Heninger in 2012. (See https://factorable.net for > more details.) However, this is not one of those things where we like > to fool around. > > So I want to understand if this is an issue with a particular hardware > configuration, or whether it's just a badly designed Linux init system > or embedded setup, or something else. After all, you wouldn't want > the NSA spying on all of your network traffic, would you? :-) Why do we continue to print this stuff out when crng_init=1 though ? (This from debian stable, on a pretty basic atom box, but similar dmesg's on everything else I've put 4.17-rc on so far) [0.00] random: get_random_bytes called from start_kernel+0x96/0x519 with crng_init=0 [0.00] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [0.00] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [0.151401] calling initialize_ptr_random+0x0/0x36 @ 1 [0.151527] initcall initialize_ptr_random+0x0/0x36 returned 0 after 0 usecs [0.294661] calling prandom_init+0x0/0xbd @ 1 [0.294763] initcall prandom_init+0x0/0xbd returned 0 after 0 usecs [1.430529] _warn_unseeded_randomness: 165 callbacks suppressed [1.430540] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [1.430860] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [1.452240] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0 [2.954901] _warn_unseeded_randomness: 54 callbacks suppressed 
[2.954910] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [2.955185] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [2.957701] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [6.017364] _warn_unseeded_randomness: 88 callbacks suppressed [6.017373] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [6.042652] random: get_random_u64 called from cache_random_seq_create+0x76/0x120 with crng_init=0 [6.060333] random: get_random_u64 called from __kmem_cache_create+0x39/0x450 with crng_init=0 [6.951978] calling prandom_reseed+0x0/0x2a @ 1 [6.960627] initcall prandom_reseed+0x0/0x2a returned 0 after 105 usecs [7.371745] _warn_unseeded_randomness: 37 callbacks suppressed [7.371759] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0 [7.395926] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0 [7.411549] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=0 [7.553379] random: systemd-udevd: uninitialized urandom read (16 bytes read) [7.563210] random: systemd-udevd: uninitialized urandom read (16 bytes read) [7.571498] random: systemd-udevd: uninitialized urandom read (16 bytes read) [8.449679] _warn_unseeded_randomness: 154 callbacks suppressed [8.449691] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=0 [8.483097] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=0 [8.497999] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=0 [9.353904] random: fast init done [9.770384] _warn_unseeded_randomness: 187 callbacks suppressed [9.770398] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1 [9.791514] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1 [9.834909] random: get_random_u64 called from 
copy_process.part.67+0x1ae/0x1e60 with crng_init=1 [ 10.802200] _warn_unseeded_randomness: 168 callbacks suppressed [ 10.802214] random: get_random_u64 called from arch_pick_mmap_layout+0x64/0x130 with crng_init=1 [ 10.802276] random: get_random_u64 called from load_elf_binary+0x4ae/0x1720 with crng_init=1 [ 10.802289] random: get_random_u32 called from arch_align_stack+0x37/0x50 with crng_init=1 [ 11.821109] _warn_unseeded_randomness: 160 callbacks suppressed [ 11.821122] random: get_random_u64 called from copy_process.part.67+0x1ae/0x1e60 with crng_init=1 [ 11.863770] random: get_random_u32 called from bucket_table_alloc+0x84/0x1b0 with crng_init=1 [ 11.869384] random: get_random_u32 called from new_slab+0x174/0x680 with crng_init=1 [ 12.843237]
Re: [Intel-gfx] 4.17-rc2: Could not determine valid watermarks for inherited state
On Thu, Apr 26, 2018 at 06:25:13PM +0300, Ville Syrjälä wrote:
 > On Thu, Apr 26, 2018 at 06:16:41PM +0300, Ville Syrjälä wrote:
 > > On Thu, Apr 26, 2018 at 05:56:14PM +0300, Ville Syrjälä wrote:
 > > > On Thu, Apr 26, 2018 at 10:27:19AM -0400, Dave Jones wrote:
 > > > > [1.176131] [drm:i9xx_get_initial_plane_config] pipe A/primary A with fb: size=800x600@32, offset=0, pitch 3200, size 0x1d4c00
 > > > > [1.176161] [drm:i915_gem_object_create_stolen_for_preallocated] creating preallocated stolen object: stolen_offset=0x, gtt_offset=0x, size=0x001d5000
 > > > > [1.176312] [drm:intel_alloc_initial_plane_obj.isra.127] initial plane fb obj (ptrval)
 > > > > [1.176351] [drm:intel_modeset_init] pipe A active planes 0x1
 > > > > [1.176456] [drm:drm_atomic_helper_check_plane_state] Plane must cover entire CRTC
 > > > > [1.176481] [drm:drm_rect_debug_print] dst: 800x600+0+0
 > > > > [1.176494] [drm:drm_rect_debug_print] clip: 1366x768+0+0
 > > >
 > > > OK, so that's the problem right there. The fb we took over from the
 > > > BIOS was 800x600, but now we're trying to set up a 1366x768 mode.
 > > >
 > > > We seem to be missing checks to make sure the initial fb is actually
 > > > big enough for the mode we're currently using :(
 > >
 > > Hmm. Or maybe we should just stick to the pipe src size.
 >
 > I'm curious whether this fixes the problem?
 > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
 > index 0f8c7389e87d..30824beedef7 100644
 > --- a/drivers/gpu/drm/i915/intel_display.c
 > +++ b/drivers/gpu/drm/i915/intel_display.c
 > @@ -15284,6 +15284,8 @@ static void intel_modeset_readout_hw_state(struct drm_device *dev)
 >  		memset(&crtc->base.mode, 0, sizeof(crtc->base.mode));
 >  		if (crtc_state->base.active) {
 >  			intel_mode_from_pipe_config(&crtc->base.mode, crtc_state);
 > +			crtc->base.mode.hdisplay = crtc_state->pipe_src_w;
 > +			crtc->base.mode.vdisplay = crtc_state->pipe_src_h;
 >  			intel_mode_from_pipe_config(&crtc_state->base.adjusted_mode, crtc_state);
 >  			WARN_ON(drm_atomic_set_mode_for_crtc(crtc->base.state, &crtc->base.mode));

It does! Feel free to throw a Tested-by: Dave Jones <da...@codemonkey.org.uk> in there.

	Dave
Re: [Intel-gfx] 4.17-rc2: Could not determine valid watermarks for inherited state
On Thu, Apr 26, 2018 at 04:10:45PM +0300, Ville Syrjälä wrote: > On Mon, Apr 23, 2018 at 11:27:13AM -0400, Dave Jones wrote: > > This warning just started appearing during boot on a machine I upgraded > > to 4.17-rc2. The warning seems to have been there since 2015, but it > > has never triggered before today. > > Looks like we have bug open about this. I just asked for more > information there: > https://bugs.freedesktop.org/show_bug.cgi?id=105992#c5 > > If you can also boot with drm.debug=0xe maybe we can see some more > details about the supposedly bad watermarks. [1.153294] calling drm_kms_helper_init+0x0/0x15 @ 1 [1.153768] initcall drm_kms_helper_init+0x0/0x15 returned 0 after 0 usecs [1.154242] calling drm_core_init+0x0/0xea @ 1 [1.154760] initcall drm_core_init+0x0/0xea returned 0 after 53 usecs [1.156781] [drm:intel_pch_type] Found LynxPoint PCH [1.157254] [drm:intel_power_domains_init] Allowed DC state mask 00 [1.158717] [drm:i915_driver_load] ppgtt mode: 1 [1.159187] [drm:intel_uc_sanitize_options] enable_guc=0 (submission:no huc:no) [1.159665] [drm:i915_driver_load] guc_log_level=0 (enabled:no verbosity:-1) [1.160247] [drm:i915_ggtt_probe_hw] GGTT size = 2048M [1.160720] [drm:i915_ggtt_probe_hw] GMADR size = 256M [1.161189] [drm:i915_ggtt_probe_hw] DSM size = 64M [1.162126] fb: switching to inteldrmfb from EFI VGA [1.163161] fb: switching to inteldrmfb from VGA16 VGA [1.163511] [drm] Replacing VGA console driver [1.163819] [drm:i915_gem_init_stolen] Memory reserved for graphics device: 65536K, usable: 64512K [1.163868] [drm:intel_opregion_setup] graphic opregion physical addr: 0xd9a13018 [1.163908] [drm:intel_opregion_setup] Public ACPI methods supported [1.163924] [drm:intel_opregion_setup] SWSCI supported [1.168084] [drm:intel_opregion_setup] SWSCI GBDA callbacks 0cb3, SBCB callbacks 00300483 [1.168107] [drm:intel_opregion_setup] ASLE supported [1.168120] [drm:intel_opregion_setup] ASLE extension supported [1.168136] [drm:intel_opregion_setup] Found 
valid VBT in ACPI OpRegion (Mailbox #4) [1.168325] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [1.168341] [drm] Driver supports precise vblank timestamp query. [1.168357] [drm:intel_bios_init] Set default to SSC at 12 kHz [1.168373] [drm:intel_bios_init] VBT signature "$VBT HASWELL", BDB version 174 [1.168392] [drm:intel_bios_init] BDB_GENERAL_FEATURES int_tv_support 0 int_crt_support 1 lvds_use_ssc 0 lvds_ssc_freq 12 display_clock_mode 0 fdi_rx_polarity_inverted 0 [1.168425] [drm:intel_bios_init] crt_ddc_bus_pin: 5 [1.171131] [drm:intel_opregion_get_panel_type] Ignoring OpRegion panel type (0) [1.171151] [drm:intel_bios_init] Panel type: 2 (VBT) [1.171164] [drm:intel_bios_init] DRRS supported mode is static [1.171185] [drm:intel_bios_init] Found panel mode in BIOS VBT tables: [1.171203] [drm:drm_mode_debug_printmodeline] Modeline 0:"1024x768" 0 65000 1024 1048 1184 1344 768 771 777 806 0x8 0xa [1.171227] [drm:intel_bios_init] VBT initial LVDS value 300 [1.171242] [drm:intel_bios_init] VBT backlight PWM modulation frequency 200 Hz, active high, min brightness 0, level 255, controller 0 [1.171272] [drm:intel_bios_init] Found SDVO panel mode in BIOS VBT tables: [1.171289] [drm:drm_mode_debug_printmodeline] Modeline 0:"1600x1200" 0 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x8 0xa [1.171314] [drm:intel_bios_init] DRRS State Enabled:1 [1.171327] [drm:intel_bios_init] No SDVO device info is found in VBT [1.171344] [drm:intel_bios_init] Port B VBT info: DP:0 HDMI:0 DVI:1 EDP:0 CRT:0 [1.171362] [drm:intel_bios_init] VBT HDMI level shift for port B: 6 [1.171377] [drm:intel_bios_init] Port D VBT info: DP:0 HDMI:1 DVI:1 EDP:0 CRT:0 [1.171395] [drm:intel_bios_init] VBT HDMI level shift for port D: 11 [1.171470] [drm:intel_dsm_detect] no _DSM method for intel device [1.171492] [drm:i915_driver_load] rawclk rate: 125000 kHz [1.171524] [drm:intel_power_well_enable] enabling always-on [1.171549] [drm:intel_power_well_enable] enabling display [1.172946] 
[drm:intel_fbc_init] Sanitized enable_fbc value: 0 [1.172964] [drm:intel_print_wm_latency] Primary WM0 latency 20 (2.0 usec) [1.172981] [drm:intel_print_wm_latency] Primary WM1 latency 4 (2.0 usec) [1.172997] [drm:intel_print_wm_latency] Primary WM2 latency 36 (18.0 usec) [1.173014] [drm:intel_print_wm_latency] Primary WM3 latency 90 (45.0 usec) [1.173030] [drm:intel_print_wm_latency] Primary WM4 latency 160 (80.0 usec) [1.173047] [drm:intel_print_wm_latency] Sprite WM0 latency 20 (2.0 usec) [1.173063] [drm:intel_print_wm_latency] Sprite WM1 latency 4 (2.0 usec) [1.173080] [drm:intel_print_wm_latency] Sprite WM2 latency 36 (18.0 usec) [
4.17-rc2: Could not determine valid watermarks for inherited state
This warning just started appearing during boot on a machine I upgraded to 4.17-rc2. The warning seems to have been there since 2015, but it has never triggered before today. Dave [1.158500] fb: switching to inteldrmfb from EFI VGA [1.159073] Console: switching to colour dummy device 80x25 [1.159523] checking generic (a 1) vs hw (e000 1000) [1.159539] fb: switching to inteldrmfb from VGA16 VGA [1.159752] [drm] Replacing VGA console driver [1.164454] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [1.164472] [drm] Driver supports precise vblank timestamp query. [1.167285] i915 :00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem [1.170212] [ cut here ] [1.170230] Could not determine valid watermarks for inherited state [1.170267] WARNING: CPU: 1 PID: 1 at drivers/gpu/drm/i915/intel_display.c:14584 sanitize_watermarks+0x17b/0x1c0 [1.170291] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc2+ #1 [1.170308] Hardware name: Shuttle Inc. SH87R/FH87, BIOS 2.03 06/19/2014 [1.170325] RIP: 0010:sanitize_watermarks+0x17b/0x1c0 [1.170338] RSP: :a944c0023bf0 EFLAGS: 00010246 [1.170352] RAX: RBX: 9193508c RCX: [1.170369] RDX: 0001 RSI: 990b7399 RDI: 990b7399 [1.170385] RBP: 9193508c R08: 0001 R09: 0001 [1.170401] R10: R11: R12: ffea [1.170418] R13: 9193508faa88 R14: 919350823528 R15: 9193508c0a08 [1.170434] FS: () GS:91935640() knlGS: [1.170453] CS: 0010 DS: ES: CR0: 80050033 [1.170466] CR2: CR3: 00011d224001 CR4: 000606e0 [1.170483] Call Trace: [1.170493] intel_modeset_init+0x769/0x18f0 [1.170506] i915_driver_load+0x9b9/0xf30 [1.170519] ? _raw_spin_unlock_irqrestore+0x3f/0x70 [1.170534] pci_device_probe+0xa3/0x120 [1.170546] driver_probe_device+0x28a/0x320 [1.170557] __driver_attach+0x9e/0xb0 [1.170568] ? driver_probe_device+0x320/0x320 [1.170581] bus_for_each_dev+0x68/0xc0 [1.170592] bus_add_driver+0x11d/0x210 [1.170604] ? 
mipi_dsi_bus_init+0x11/0x11 [1.170615] driver_register+0x5b/0xd0 [1.170627] do_one_initcall+0x10b/0x33f [1.170638] ? do_early_param+0x8b/0x8b [1.170651] ? rcu_read_lock_sched_held+0x66/0x70 [1.170663] ? do_early_param+0x8b/0x8b [1.170674] kernel_init_freeable+0x1c3/0x249 [1.170687] ? rest_init+0xc0/0xc0 [1.170697] kernel_init+0xa/0x100 [1.170707] ret_from_fork+0x24/0x30 [1.170717] Code: 00 00 00 65 48 33 04 25 28 00 00 00 75 4f 48 8d a4 24 88 00 00 00 5b 5d 41 5c 41 5d 41 5e c3 48 c7 c7 e0 5d 04 9a e8 25 33 b1 ff <0f> 0b eb a4 48 c7 c6 d5 73 04 9a 48 c7 c7 0f c6 fe 99 e8 0e 33 [1.170847] irq event stamp: 1449710 [1.170858] hardirqs last enabled at (1449709): [] console_unlock+0x51b/0x6b0 [1.170879] hardirqs last disabled at (1449710): [] error_entry+0x86/0x100 [1.170900] softirqs last enabled at (1449580): [] __do_softirq+0x3dd/0x521 [1.170922] softirqs last disabled at (1449563): [] irq_exit+0xb7/0xc0 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) (That's 8086:0402 fwiw)
Re: [4.15-rc9] fs_reclaim lockdep trace
On Sun, Jan 28, 2018 at 02:55:28PM +0900, Tetsuo Handa wrote: > Dave, would you try below patch? > > >From cae2cbf389ae3cdef1b492622722b4aeb07eb284 Mon Sep 17 00:00:00 2001 > From: Tetsuo Handa <penguin-ker...@i-love.sakura.ne.jp> > Date: Sun, 28 Jan 2018 14:17:14 +0900 > Subject: [PATCH] lockdep: Fix fs_reclaim warning. Seems to suppress the warning for me. Tested-by: Dave Jones <da...@codemonkey.org.uk>
Re: [4.15-rc9] fs_reclaim lockdep trace
On Tue, Jan 23, 2018 at 08:36:51PM -0500, Dave Jones wrote: > Just triggered this on a server I was rsync'ing to. Actually, I can trigger this really easily, even with an rsync from one disk to another. Though that also smells a little like networking in the traces. Maybe netdev has ideas. The first instance: > > WARNING: possible recursive locking detected > 4.15.0-rc9-backup-debug+ #1 Not tainted > > sshd/24800 is trying to acquire lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > but task is already holding lock: > (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > other info that might help us debug this: > Possible unsafe locking scenario: > >CPU0 > > lock(fs_reclaim); > lock(fs_reclaim); > > *** DEADLOCK *** > > May be due to missing lock nesting notation > > 2 locks held by sshd/24800: > #0: (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] > tcp_sendmsg+0x19/0x40 > #1: (fs_reclaim){+.+.}, at: [<84f438c2>] > fs_reclaim_acquire.part.102+0x5/0x30 > > stack backtrace: > CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 > Call Trace: > dump_stack+0xbc/0x13f > ? _atomic_dec_and_lock+0x101/0x101 > ? fs_reclaim_acquire.part.102+0x5/0x30 > ? print_lock+0x54/0x68 > __lock_acquire+0xa09/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? mutex_destroy+0x120/0x120 > ? hlock_class+0xa0/0xa0 > ? kernel_text_address+0x5c/0x90 > ? __kernel_text_address+0xe/0x30 > ? unwind_get_return_address+0x2f/0x50 > ? __save_stack_trace+0x92/0x100 > ? graph_lock+0x8d/0x100 > ? check_noncircular+0x20/0x20 > ? __lock_acquire+0x616/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? __lock_acquire+0x616/0x2040 > ? debug_show_all_locks+0x2f0/0x2f0 > ? print_irqtrace_events+0x110/0x110 > ? active_load_balance_cpu_stop+0x7b0/0x7b0 > ? debug_show_all_locks+0x2f0/0x2f0 > ? mark_lock+0x1b1/0xa00 > ? lock_acquire+0x12e/0x350 > lock_acquire+0x12e/0x350 > ? fs_reclaim_acquire.part.102+0x5/0x30 > ? 
lockdep_rcu_suspicious+0x100/0x100 > ? set_next_entity+0x20e/0x10d0 > ? mark_lock+0x1b1/0xa00 > ? match_held_lock+0x8d/0x440 > ? mark_lock+0x1b1/0xa00 > ? save_trace+0x1e0/0x1e0 > ? print_irqtrace_events+0x110/0x110 > ? alloc_extent_state+0xa7/0x410 > fs_reclaim_acquire.part.102+0x29/0x30 > ? fs_reclaim_acquire.part.102+0x5/0x30 > kmem_cache_alloc+0x3d/0x2c0 > ? rb_erase+0xe63/0x1240 > alloc_extent_state+0xa7/0x410 > ? lock_extent_buffer_for_io+0x3f0/0x3f0 > ? find_held_lock+0x6d/0xd0 > ? test_range_bit+0x197/0x210 > ? lock_acquire+0x350/0x350 > ? do_raw_spin_unlock+0x147/0x220 > ? do_raw_spin_trylock+0x100/0x100 > ? iotree_fs_info+0x30/0x30 > __clear_extent_bit+0x3ea/0x570 > ? clear_state_bit+0x270/0x270 > ? count_range_bits+0x2f0/0x2f0 > ? lock_acquire+0x350/0x350 > ? rb_prev+0x21/0x90 > try_release_extent_mapping+0x21a/0x260 > __btrfs_releasepage+0xb0/0x1c0 > ? btrfs_submit_direct+0xca0/0xca0 > ? check_new_page_bad+0x1f0/0x1f0 > ? match_held_lock+0xa5/0x440 > ? debug_show_all_locks+0x2f0/0x2f0 > btrfs_releasepage+0x161/0x170 > ? __btrfs_releasepage+0x1c0/0x1c0 > ? page_rmapping+0xd0/0xd0 > ? rmap_walk+0x100/0x100 > try_to_release_page+0x162/0x1c0 > ? generic_file_write_iter+0x3c0/0x3c0 > ? page_evictable+0xcc/0x110 > ? lookup_address_in_pgd+0x107/0x190 > shrink_page_list+0x1d5a/0x2fb0 > ? putback_lru_page+0x3f0/0x3f0 > ? save_trace+0x1e0/0x1e0 > ? _lookup_address_cpa.isra.13+0x40/0x60 > ? debug_show_all_locks+0x2f0/0x2f0 > ? kmem_cache_free+0x8c/0x280 > ? free_extent_state+0x1c8/0x3b0 > ? mark_lock+0x1b1/0xa00 > ? page_rmapping+0xd0/0xd0 > ? print_irqtrace_events+0x110/0x110 > ? shrink_node_memcg.constprop.88+0x4c9/0x5e0 > ? shrink_node+0x12d/0x260 > ? try_to_free_pages+0x418/0xaf0 > ? __alloc_pages_slowpath+0x976/0x1790 > ? __alloc_pages_nodemask+0x52c/0x5c0 > ? delete_node+0x28d/0x5c0 > ? find_held_lock+0x6d/0xd0 > ? free_pcppages_bulk+0x381/0x570 > ? lock_acquire+0x350/0x350 > ? do_raw_spin_unlock+0x147/0x220 > ? do_raw_spin_trylock+0x100/0x100 > ? 
__lock_is_held+0x51/0xc0 > ? _raw_spin_unlock+0x24/0x30 > ? free_pcppages_bulk+0x381/0x570 > ? mark_lock+0x1b1/0xa00 > ? free_compound_page+0x30/0x30 > ? print_irqtrace_events+0x110/0x110 > ? __kernel_map_pages+0x2c9/0x310 > ? mark_lock+0
[4.15-rc9] fs_reclaim lockdep trace
Just triggered this on a server I was rsync'ing to. WARNING: possible recursive locking detected 4.15.0-rc9-backup-debug+ #1 Not tainted sshd/24800 is trying to acquire lock: (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 but task is already holding lock: (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 lock(fs_reclaim); lock(fs_reclaim); *** DEADLOCK *** May be due to missing lock nesting notation 2 locks held by sshd/24800: #0: (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] tcp_sendmsg+0x19/0x40 #1: (fs_reclaim){+.+.}, at: [<84f438c2>] fs_reclaim_acquire.part.102+0x5/0x30 stack backtrace: CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1 Call Trace: dump_stack+0xbc/0x13f ? _atomic_dec_and_lock+0x101/0x101 ? fs_reclaim_acquire.part.102+0x5/0x30 ? print_lock+0x54/0x68 __lock_acquire+0xa09/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? mutex_destroy+0x120/0x120 ? hlock_class+0xa0/0xa0 ? kernel_text_address+0x5c/0x90 ? __kernel_text_address+0xe/0x30 ? unwind_get_return_address+0x2f/0x50 ? __save_stack_trace+0x92/0x100 ? graph_lock+0x8d/0x100 ? check_noncircular+0x20/0x20 ? __lock_acquire+0x616/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? __lock_acquire+0x616/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? print_irqtrace_events+0x110/0x110 ? active_load_balance_cpu_stop+0x7b0/0x7b0 ? debug_show_all_locks+0x2f0/0x2f0 ? mark_lock+0x1b1/0xa00 ? lock_acquire+0x12e/0x350 lock_acquire+0x12e/0x350 ? fs_reclaim_acquire.part.102+0x5/0x30 ? lockdep_rcu_suspicious+0x100/0x100 ? set_next_entity+0x20e/0x10d0 ? mark_lock+0x1b1/0xa00 ? match_held_lock+0x8d/0x440 ? mark_lock+0x1b1/0xa00 ? save_trace+0x1e0/0x1e0 ? print_irqtrace_events+0x110/0x110 ? alloc_extent_state+0xa7/0x410 fs_reclaim_acquire.part.102+0x29/0x30 ? fs_reclaim_acquire.part.102+0x5/0x30 kmem_cache_alloc+0x3d/0x2c0 ? rb_erase+0xe63/0x1240 alloc_extent_state+0xa7/0x410 ? 
lock_extent_buffer_for_io+0x3f0/0x3f0 ? find_held_lock+0x6d/0xd0 ? test_range_bit+0x197/0x210 ? lock_acquire+0x350/0x350 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? iotree_fs_info+0x30/0x30 __clear_extent_bit+0x3ea/0x570 ? clear_state_bit+0x270/0x270 ? count_range_bits+0x2f0/0x2f0 ? lock_acquire+0x350/0x350 ? rb_prev+0x21/0x90 try_release_extent_mapping+0x21a/0x260 __btrfs_releasepage+0xb0/0x1c0 ? btrfs_submit_direct+0xca0/0xca0 ? check_new_page_bad+0x1f0/0x1f0 ? match_held_lock+0xa5/0x440 ? debug_show_all_locks+0x2f0/0x2f0 btrfs_releasepage+0x161/0x170 ? __btrfs_releasepage+0x1c0/0x1c0 ? page_rmapping+0xd0/0xd0 ? rmap_walk+0x100/0x100 try_to_release_page+0x162/0x1c0 ? generic_file_write_iter+0x3c0/0x3c0 ? page_evictable+0xcc/0x110 ? lookup_address_in_pgd+0x107/0x190 shrink_page_list+0x1d5a/0x2fb0 ? putback_lru_page+0x3f0/0x3f0 ? save_trace+0x1e0/0x1e0 ? _lookup_address_cpa.isra.13+0x40/0x60 ? debug_show_all_locks+0x2f0/0x2f0 ? kmem_cache_free+0x8c/0x280 ? free_extent_state+0x1c8/0x3b0 ? mark_lock+0x1b1/0xa00 ? page_rmapping+0xd0/0xd0 ? print_irqtrace_events+0x110/0x110 ? shrink_node_memcg.constprop.88+0x4c9/0x5e0 ? shrink_node+0x12d/0x260 ? try_to_free_pages+0x418/0xaf0 ? __alloc_pages_slowpath+0x976/0x1790 ? __alloc_pages_nodemask+0x52c/0x5c0 ? delete_node+0x28d/0x5c0 ? find_held_lock+0x6d/0xd0 ? free_pcppages_bulk+0x381/0x570 ? lock_acquire+0x350/0x350 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? __lock_is_held+0x51/0xc0 ? _raw_spin_unlock+0x24/0x30 ? free_pcppages_bulk+0x381/0x570 ? mark_lock+0x1b1/0xa00 ? free_compound_page+0x30/0x30 ? print_irqtrace_events+0x110/0x110 ? __kernel_map_pages+0x2c9/0x310 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? __delete_from_page_cache+0x2e7/0x4e0 ? save_trace+0x1e0/0x1e0 ? __add_to_page_cache_locked+0x680/0x680 ? find_held_lock+0x6d/0xd0 ? __list_add_valid+0x29/0xa0 ? free_unref_page_commit+0x198/0x270 ? drain_local_pages_wq+0x20/0x20 ? 
stop_critical_timings+0x210/0x210 ? mark_lock+0x1b1/0xa00 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? __lock_acquire+0x616/0x2040 ? mark_lock+0x1b1/0xa00 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? __phys_addr_symbol+0x23/0x40 ? __change_page_attr_set_clr+0xe86/0x1640 ? __btrfs_releasepage+0x1c0/0x1c0 ? mark_lock+0x1b1/0xa00 ? mark_lock+0x1b1/0xa00 ? print_irqtrace_events+0x110/0x110 ? mark_lock+0x1b1/0xa00 ? __lock_acquire+0x616/0x2040 ? __lock_acquire+0x616/0x2040 ? debug_show_all_locks+0x2f0/0x2f0 ? swiotlb_free_coherent+0x60/0x60 ? __phys_addr+0x32/0x80 ? igb_xmit_frame_ring+0xad7/0x1890 ? stack_access_ok+0x35/0x80 ? deref_stack_reg+0xa1/0xe0 ? __read_once_size_nocheck.constprop.6+0x10/0x10 ?
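For readers unfamiliar with the fs_reclaim pseudo-lock: what lockdep is reporting above is re-entry into reclaim from an allocation made while already inside the reclaim path. A toy userspace model of that check (names hypothetical; this is not the kernel's lockdep machinery):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the fs_reclaim annotation: a thread-local flag stands in
 * for the fs_reclaim pseudo-lock that lockdep tracks. */
static _Thread_local bool in_fs_reclaim;

static int fs_reclaim_acquire(void)
{
    if (in_fs_reclaim)
        return -1;              /* recursive acquire: what the splat reports */
    in_fs_reclaim = true;
    return 0;
}

static void fs_reclaim_release(void)
{
    in_fs_reclaim = false;
}

/* An allocation that may enter reclaim; a nested allocation made from
 * inside the reclaim path (as btrfs_releasepage does in the trace)
 * would try to take the pseudo-lock a second time. */
static int toy_alloc(void)
{
    if (fs_reclaim_acquire())
        return -1;
    int nested = toy_alloc();   /* models allocating while reclaiming */
    fs_reclaim_release();
    return nested;              /* -1: the nested attempt was refused */
}
```

The real annotation is more subtle (it distinguishes allocation contexts such as GFP_NOFS), which is why the thread below ends in a lockdep fix rather than a btrfs one.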
problematic rc9 futex changes.
c1e2f0eaf015fb: "futex: Avoid violating the 10th rule of futex" seems to make up a few new rules to violate. Coverity picked up these two problems in the same code.

First it ORs a value with stack garbage.

*** CID 1427826: Uninitialized variables (UNINIT)
/kernel/futex.c: 2316 in fixup_pi_state_owner()
2310
2311	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
2312
2313	oldowner = pi_state->owner;
2314	/* Owner died? */
2315	if (!pi_state->owner)
>>> CID 1427826: Uninitialized variables (UNINIT)
>>> Using uninitialized value "newtid".
2316		newtid |= FUTEX_OWNER_DIED;
2317
2318	/*
2319	 * We are here because either:
2320	 *
2321	 * - we stole the lock and pi_state->owner needs updating to reflect

Then it notices that value is never read from before it's written anyway.

*** CID 1427824: Code maintainability issues (UNUSED_VALUE)
/kernel/futex.c: 2316 in fixup_pi_state_owner()
2310
2311	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
2312
2313	oldowner = pi_state->owner;
2314	/* Owner died? */
2315	if (!pi_state->owner)
>>> CID 1427824: Code maintainability issues (UNUSED_VALUE)
>>> Assigning value from "newtid | 0x40000000U" to "newtid" here, but that
>>> stored value is overwritten before it can be used.
2316		newtid |= FUTEX_OWNER_DIED;
2317
2318	/*
2319	 * We are here because either:
2320	 *
2321	 * - we stole the lock and pi_state->owner needs updating to reflect

(The next reference of newtid being..

2369	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;

Dave
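For readers following along, here is a minimal userspace reduction of the two findings (function and variable names are hypothetical, and this is not the actual futex.c fix): ORing into a never-assigned variable reads indeterminate stack contents, and in the flagged code the result was overwritten before use anyway. Assigning before ORing avoids both complaints.

```c
#include <assert.h>

#define FUTEX_OWNER_DIED 0x40000000u

/* The buggy shape Coverity flagged, shown only as a comment because it
 * is undefined behavior in C:
 *
 *     unsigned int newtid;                // never initialized
 *     if (owner_died)
 *         newtid |= FUTEX_OWNER_DIED;     // UNINIT: ORs stack garbage
 *     ...
 *     newtid = owner_tid | FUTEX_WAITERS; // and the OR above was dead
 */

/* Hypothetical corrected ordering: derive newtid first, then OR flags in. */
static unsigned int make_tid(unsigned int owner_tid, int owner_died)
{
    unsigned int newtid = owner_tid;    /* assigned before any OR */

    if (owner_died)
        newtid |= FUTEX_OWNER_DIED;
    return newtid;
}
```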
Re: proc_flush_task oops
On Thu, Dec 21, 2017 at 07:31:26PM -0600, Eric W. Biederman wrote: > Dave Jones <da...@codemonkey.org.uk> writes: > > > On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote: > > > > > > with proc_mnt still set to NULL is a mystery to me. > > > > > > > > Is there any chance the idr code doesn't always return the lowest > > valid > > > > free number? So init gets assigned something other than 1? > > > > > > Well, this theory is easy to test (attached). > > > > I didn't hit this BUG, but I hit the same oops in proc_flush_task. > > Scratch one idea. > > If it isn't too much trouble can you try this. > > I am wondering if somehow the proc_mnt that is NULL is somewhere in the > middle of the stack of pid namespaces. > > This adds two warnings. The first just reports which pid namespace in > the stack of pid namespaces is problematic, and the pid number in that > pid namespace. Which should give a whole lot more to go by. > > The second warning complains if we manage to create a pid namespace > where the parent pid namespace is not properly set up. The test to > prevent that looks quite robust, but at this point I don't know where to > look. Progress ? [ 1653.030190] [ cut here ] [ 1653.030852] 1/1: 2 no proc_mnt [ 1653.030946] WARNING: CPU: 2 PID: 4420 at kernel/pid.c:213 alloc_pid+0x24f/0x2a0
Re: proc_flush_task oops
On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote: > > with proc_mnt still set to NULL is a mystery to me. > > > > Is there any chance the idr code doesn't always return the lowest valid > > free number? So init gets assigned something other than 1? > > Well, this theory is easy to test (attached). I didn't hit this BUG, but I hit the same oops in proc_flush_task. Dave
Re: proc_flush_task oops
On Thu, Dec 21, 2017 at 12:38:12PM +0200, Alexey Dobriyan wrote:
> On 12/21/17, Eric W. Biederman wrote:
> > I have stared at this code, and written some test programs and I can't
> > see what is going on. alloc_pid by design and in implementation (as far
> > as I can see) is always single threaded when allocating the first pid
> > in a pid namespace. idr_init always initialized idr_next to 0.
> >
> > So how we can get past:
> >
> >	if (unlikely(is_child_reaper(pid))) {
> >		if (pid_ns_prepare_proc(ns)) {
> >			disable_pid_allocation(ns);
> >			goto out_free;
> >		}
> >	}
> >
> > with proc_mnt still set to NULL is a mystery to me.
> >
> > Is there any chance the idr code doesn't always return the lowest valid
> > free number? So init gets assigned something other than 1?
>
> Well, this theory is easy to test (attached).

I'll give this a shot and report back when I get to the office.

> There is a "valid" way to break the code via kernel.ns_last_pid:
> unshare+write+fork but the reproducer doesn't seem to use it (or it does?)

that sysctl is root only, so that isn't at play here.

Dave
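For anyone following along, the invariant Eric's theory questions is that the allocator always hands back the lowest free number, so the first pid allocated in a fresh namespace must be 1. A toy userspace stand-in for that invariant (not the kernel IDR implementation) looks like:

```c
#include <stdint.h>

/* Toy lowest-free-ID allocator over a 32-slot bitmap -- a userspace
 * stand-in for the property under test: like the old pid bitmap, the
 * IDR is expected to always return the lowest free number. */
static uint32_t used;                   /* bit i set => id i allocated */

static int alloc_lowest_id(void)
{
    for (int i = 0; i < 32; i++) {
        if (!(used & (UINT32_C(1) << i))) {
            used |= UINT32_C(1) << i;
            return i;
        }
    }
    return -1;                          /* all 32 ids in use */
}

static void free_id(int id)
{
    used &= ~(UINT32_C(1) << id);
}
```

If the real allocator ever violated this (returning something other than the lowest free slot for a fresh namespace), the is_child_reaper() check keyed on pid 1 would silently never run, which is the failure mode being probed.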
Re: proc_flush_task oops
On Wed, Dec 20, 2017 at 12:25:52PM -0600, Eric W. Biederman wrote: > > > > > > If the warning triggers it means the bug is in alloc_pid and somehow > > > something has gotten past the is_child_reaper check. > > > > You're onto something. > > > I am not seeing where things go wrong, but that puts the recent pid bitmap, > bit > hash to idr change in the suspect zone. > > Can you try reverting that change: > > e8cfbc245e24 ("pid: remove pidhash") > 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API") > > While keeping the warning in place so we can see if this fixes the > allocation problem? So I can't trigger this any more with those reverted. I seem to hit a bunch of other long-standing bugs first. I'll keep running it overnight, but it looks like this is where the problem lies. Dave
Re: proc_flush_task oops
On Tue, Dec 19, 2017 at 07:54:24PM -0600, Eric W. Biederman wrote:
>
> *Scratches my head* I am not seeing anything obvious.
>
> Can you try this patch as you reproduce this issue?
>
> diff --git a/kernel/pid.c b/kernel/pid.c
> index b13b624e2c49..df9e5d4d8f83 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -210,6 +210,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>  		goto out_unlock;
>  	for ( ; upid >= pid->numbers; --upid) {
>  		/* Make the PID visible to find_pid_ns. */
> +		WARN_ON(!upid->ns->proc_mnt);
>  		idr_replace(&upid->ns->idr, pid, upid->nr);
>  		upid->ns->pid_allocated++;
>  	}
>
> If the warning triggers it means the bug is in alloc_pid and somehow
> something has gotten past the is_child_reaper check.

You're onto something.

WARNING: CPU: 1 PID: 12020 at kernel/pid.c:213 alloc_pid+0x230/0x280
CPU: 1 PID: 12020 Comm: trinity-c29 Not tainted 4.15.0-rc4-think+ #3
RIP: 0010:alloc_pid+0x230/0x280
RSP: 0018:c90009977d48 EFLAGS: 00010046
RAX: 0030 RBX: 8804fb431280 RCX: 8f5c28f5c28f5c29
RDX: 88050a00de40 RSI: 82005218 RDI: 8804fc6aa9a8
RBP: 8804fb431270 R08: R09: 0001 R10: c90009977cc0
R11: eab94e31da7171b7 R12: 8804fb431260 R13: 8804fb431240
R14: 82005200 R15: 8804fb431268
FS: 7f49b9065700() GS:88050a00() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 7f49b906a000 CR3: 0004f7446001 CR4: 001606e0
DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600
Call Trace:
 copy_process.part.41+0x14fa/0x1e30
 _do_fork+0xe7/0x720
 ? rcu_read_lock_sched_held+0x6c/0x80
 ? syscall_trace_enter+0x2d7/0x340
 do_syscall_64+0x60/0x210
 entry_SYSCALL64_slow_path+0x25/0x25

followed immediately by...

Oops: [#1] SMP
CPU: 1 PID: 12020 Comm: trinity-c29 Tainted: GW 4.15.0-rc4-think+ #3
RIP: 0010:proc_flush_task+0x8e/0x1b0
RSP: 0018:c90009977c40 EFLAGS: 00010286
RAX: 0001 RBX: 0001 RCX: fffb
RDX: RSI: c90009977c50 RDI:
RBP: c90009977c63 R08: R09: 0002 R10: c90009977b70
R11: c90009977c64 R12: 0004 R13: R14: 0004 R15: 8804fb431240
FS: 7f49b9065700() GS:88050a00() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: CR3: 0004f7446001 CR4: 001606e0
DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600
Call Trace:
 ? release_task+0xaf/0x680
 release_task+0xd2/0x680
 ? wait_consider_task+0xb82/0xce0
 wait_consider_task+0xbe9/0xce0
 ? do_wait+0xe1/0x330
 do_wait+0x151/0x330
 kernel_wait4+0x8d/0x150
 ? task_stopped_code+0x50/0x50
 SYSC_wait4+0x95/0xa0
 ? rcu_read_lock_sched_held+0x6c/0x80
 ? syscall_trace_enter+0x2d7/0x340
 ? do_syscall_64+0x60/0x210
 do_syscall_64+0x60/0x210
 entry_SYSCALL64_slow_path+0x25/0x25
Re: proc_flush_task oops
On Tue, Dec 19, 2017 at 07:54:24PM -0600, Eric W. Biederman wrote: > > *Scratches my head* I am not seeing anything obvious. > > Can you try this patch as you reproduce this issue? > > diff --git a/kernel/pid.c b/kernel/pid.c > index b13b624e2c49..df9e5d4d8f83 100644 > --- a/kernel/pid.c > +++ b/kernel/pid.c > @@ -210,6 +210,7 @@ struct pid *alloc_pid(struct pid_namespace *ns) > goto out_unlock; > for ( ; upid >= pid->numbers; --upid) { > /* Make the PID visible to find_pid_ns. */ > + WARN_ON(!upid->ns->proc_mnt); > idr_replace(&upid->ns->idr, pid, upid->nr); > upid->ns->pid_allocated++; > } > > > If the warning triggers it means the bug is in alloc_pid and somehow > something has gotten past the is_child_reaper check. You're onto something. WARNING: CPU: 1 PID: 12020 at kernel/pid.c:213 alloc_pid+0x230/0x280 CPU: 1 PID: 12020 Comm: trinity-c29 Not tainted 4.15.0-rc4-think+ #3 RIP: 0010:alloc_pid+0x230/0x280 RSP: 0018:c90009977d48 EFLAGS: 00010046 RAX: 0030 RBX: 8804fb431280 RCX: 8f5c28f5c28f5c29 RDX: 88050a00de40 RSI: 82005218 RDI: 8804fc6aa9a8 RBP: 8804fb431270 R08: R09: 0001 R10: c90009977cc0 R11: eab94e31da7171b7 R12: 8804fb431260 R13: 8804fb431240 R14: 82005200 R15: 8804fb431268 FS: 7f49b9065700() GS:88050a00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f49b906a000 CR3: 0004f7446001 CR4: 001606e0 DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: copy_process.part.41+0x14fa/0x1e30 _do_fork+0xe7/0x720 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25 followed immediately by... 
Oops: [#1] SMP CPU: 1 PID: 12020 Comm: trinity-c29 Tainted: GW 4.15.0-rc4-think+ #3 RIP: 0010:proc_flush_task+0x8e/0x1b0 RSP: 0018:c90009977c40 EFLAGS: 00010286 RAX: 0001 RBX: 0001 RCX: fffb RDX: RSI: c90009977c50 RDI: RBP: c90009977c63 R08: R09: 0002 R10: c90009977b70 R11: c90009977c64 R12: 0004 R13: R14: 0004 R15: 8804fb431240 FS: 7f49b9065700() GS:88050a00() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 0004f7446001 CR4: 001606e0 DR0: 7f0b4c405000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: ? release_task+0xaf/0x680 release_task+0xd2/0x680 ? wait_consider_task+0xb82/0xce0 wait_consider_task+0xbe9/0xce0 ? do_wait+0xe1/0x330 do_wait+0x151/0x330 kernel_wait4+0x8d/0x150 ? task_stopped_code+0x50/0x50 SYSC_wait4+0x95/0xa0 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 ? do_syscall_64+0x60/0x210 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25
Re: proc_flush_task oops
On Tue, Dec 19, 2017 at 12:27:30PM -0600, Eric W. Biederman wrote: > Dave Jones <da...@codemonkey.org.uk> writes: > > > On Mon, Dec 18, 2017 at 03:50:52PM -0800, Linus Torvalds wrote: > > > > > But I don't see what would have changed in this area recently. > > > > > > Do you end up saving the seeds that cause crashes? Is this > > > reproducible? (Other than seeing it twoce, of course) > > > > Only clue so far, is every time I'm able to trigger it, the last thing > > the child process that triggers it did, was an execveat. > > Is there any chance the excveat might be called from a child thread? If trinity chooses one of the exec syscalls, it forks off an extra child to do it in, on the off-chance that it succeeds and we never return. https://github.com/kernelslacker/trinity/blob/master/syscall.c#L139 > That switching pids between tasks of a process during exec can get a > little bit tricky. > > > Telling it to just fuzz execveat doesn't instantly trigger it, so it > > must be a combination of some other syscall. I'll leave a script running > > overnight to see if I can binary search the other syscalls in > > combination with it. > > Could we have a buggy syscall that is stomping something? Not totally impossible I guess, though I would expect that would manifest in additional random failures, whereas this seems remarkably consistent. Dave
Re: proc_flush_task oops
On Mon, Dec 18, 2017 at 03:50:52PM -0800, Linus Torvalds wrote: > But I don't see what would have changed in this area recently. > > Do you end up saving the seeds that cause crashes? Is this > reproducible? (Other than seeing it twoce, of course) Only clue so far is that every time I'm able to trigger it, the last thing the child process that triggers it did was an execveat. Telling it to just fuzz execveat doesn't instantly trigger it, so it must be in combination with some other syscall. I'll leave a script running overnight to see if I can binary search the other syscalls in combination with it. One other thing: I said this was rc4, but it was actually rc4 + all the x86 stuff from today. There's enough creepy stuff in that pile that I'll try with just plain rc4 tomorrow too. Dave
Re: proc_flush_task oops
On Mon, Dec 18, 2017 at 03:50:52PM -0800, Linus Torvalds wrote: > On Mon, Dec 18, 2017 at 3:10 PM, Dave Jones <da...@codemonkey.org.uk> wrote: > > On Mon, Dec 18, 2017 at 10:15:41PM +, Al Viro wrote: > > > On Mon, Dec 18, 2017 at 04:44:38PM -0500, Dave Jones wrote: > > > > I've hit this twice today. It's odd, because afaics, none of this code > > > > has really changed in a long time. > > > > > > Which tree had that been? > > > > Linus, rc4. > > Ok, so the original report was marked as spam for me for whatever > reason. I ended up re-analyzing the oops, but came to the same > conclusion you did: it's a NULL mnt pointer in proc_flush_task_mnt(). > .. > But I don't see what would have changed in this area recently. > > Do you end up saving the seeds that cause crashes? Is this > reproducible? (Other than seeing it twoce, of course) Hit it another two times in the last hour, so it's pretty reproducible. Running it now with some more logging; will see if that yields any extra clues. Dave
Re: proc_flush_task oops
On Mon, Dec 18, 2017 at 10:15:41PM +, Al Viro wrote: > On Mon, Dec 18, 2017 at 04:44:38PM -0500, Dave Jones wrote: > > I've hit this twice today. It's odd, because afaics, none of this code > > has really changed in a long time. > > Which tree had that been? Linus, rc4. Dave
proc_flush_task oops
I've hit this twice today. It's odd, because afaics, none of this code has really changed in a long time. Dave Oops: [#1] SMP CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2 RIP: 0010:proc_flush_task+0x8e/0x1b0 RSP: 0018:c9000bbffc40 EFLAGS: 00010286 RAX: 0001 RBX: 0001 RCX: fffb RDX: RSI: c9000bbffc50 RDI: RBP: c9000bbffc63 R08: R09: 0002 R10: c9000bbffb70 R11: c9000bbffc64 R12: 0003 R13: R14: 0003 R15: 8804c10d7840 FS: 7f7cb8965700() GS:88050a20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 0003e21ae003 CR4: 001606e0 DR0: 7fb1d6c22000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: ? release_task+0xaf/0x680 release_task+0xd2/0x680 ? wait_consider_task+0xb82/0xce0 wait_consider_task+0xbe9/0xce0 ? do_wait+0xe1/0x330 do_wait+0x151/0x330 kernel_wait4+0x8d/0x150 ? task_stopped_code+0x50/0x50 SYSC_wait4+0x95/0xa0 ? rcu_read_lock_sched_held+0x6c/0x80 ? syscall_trace_enter+0x2d7/0x340 ? do_syscall_64+0x60/0x210 do_syscall_64+0x60/0x210 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f7cb82603aa RSP: 002b:7ffd60770bc8 EFLAGS: 0246 ORIG_RAX: 003d RAX: ffda RBX: 7f7cb6cd4000 RCX: 7f7cb82603aa RDX: 000b RSI: 7ffd60770bd0 RDI: 7cca RBP: 7cca R08: 7f7cb8965700 R09: 7ffd607c7080 R10: R11: 0246 R12: R13: 7ffd60770bd0 R14: 7f7cb6cd4058 R15: cccd Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24 RIP: proc_flush_task+0x8e/0x1b0 RSP: c9000bbffc40 CR2: ---[ end trace 53d67a6481059862 ]--- All code 0: c1 e2 04shl$0x4,%edx 3: 44 8b 60 30 mov0x30(%rax),%r12d 7: 48 8b 40 38 mov0x38(%rax),%rax b: 44 8b 34 11 mov(%rcx,%rdx,1),%r14d f: 48 c7 c2 60 3a f5 81mov$0x81f53a60,%rdx 16: 44 89 e1mov%r12d,%ecx 19: 4c 8b 68 58 mov0x58(%rax),%r13 1d: e8 4b b4 77 00 callq 0x77b46d 22: 89 44 24 14 mov%eax,0x14(%rsp) 26: 48 8d 74 24 10 lea0x10(%rsp),%rsi 2b:* 49 8b 7d 00 mov0x0(%r13),%rdi <-- trapping instruction 2f: e8 b9 6a f9 ff 
callq 0xfff96aed 34: 48 85 c0test %rax,%rax 37: 74 1a je 0x53 39: 48 89 c7mov%rax,%rdi 3c: 48 rex.W 3d: 89 .byte 0x89 3e: 44 rex.R 3f: 24 .byte 0x24 Code starting with the faulting instruction === 0: 49 8b 7d 00 mov0x0(%r13),%rdi 4: e8 b9 6a f9 ff callq 0xfff96ac2 9: 48 85 c0test %rax,%rax c: 74 1a je 0x28 e: 48 89 c7mov%rax,%rdi 11: 48 rex.W 12: 89 .byte 0x89 13: 44 rex.R 14: 24 .byte 0x24 This looks like an inlined part of proc_flush_task_mnt dentry = d_hash_and_lookup(mnt->mnt_root, &name); 4f99: 48 8d 74 24 10 lea0x10(%rsp),%rsi 4f9e: 49 8b 7d 00 mov0x0(%r13),%rdi 4fa2: e8 00 00 00 00 callq 4fa7 So it looks like this.. 3097 for (i = 0; i <= pid->level; i++) { 3098 upid = &pid->numbers[i]; 3099 proc_flush_task_mnt(upid->ns->proc_mnt, upid->nr, 3100 tgid->numbers[i].nr); 3101 } somehow passed a null upid->ns->proc_mnt down there. I'll try and narrow down a reproducer tomorrow. Any obvious recent changes that might explain this, or did I just finally appease the entropy gods enough to find the right combination of args to hit this ? Dave
Re: drm/amd/display: Restructuring and cleaning up DML
On Sat, Nov 18, 2017 at 12:02:01AM +, Linux Kernel wrote: > Web: > https://git.kernel.org/torvalds/c/6d04ee9dc10149db842d41de66eca201c9d91b60 > Commit: 6d04ee9dc10149db842d41de66eca201c9d91b60 > Parent: 19b7fe4a48efbe0f7e8c496b040c4eb16ff02313 > Refname:refs/heads/master > Author: Dmytro Laktyushkin > AuthorDate: Wed Aug 23 16:43:17 2017 -0400 > Committer: Alex Deucher > CommitDate: Sat Oct 21 16:45:24 2017 -0400 > > drm/amd/display: Restructuring and cleaning up DML > > Signed-off-by: Dmytro Laktyushkin > Reviewed-by: Tony Cheng > Acked-by: Harry Wentland > Signed-off-by: Alex Deucher > --- > diff --git a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > index a18474437990..b6abe0f3bb15 100644 > --- a/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > +++ b/drivers/gpu/drm/amd/display/dc/calcs/dcn_calc_math.c > @@ -27,20 +27,36 @@ > > float dcn_bw_mod(const float arg1, const float arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 - arg1 * ((int) (arg1 / arg2)); > } > > float dcn_bw_min2(const float arg1, const float arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 < arg2 ? arg1 : arg2; > } > > unsigned int dcn_bw_max(const unsigned int arg1, const unsigned int arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 > arg2 ? arg1 : arg2; > } > float dcn_bw_max2(const float arg1, const float arg2) > { > +if (arg1 != arg1) > +return arg2; > +if (arg2 != arg2) > +return arg1; > return arg1 > arg2 ? arg1 : arg2; > } This looks really, really bizarre. What was the intention here ? (This, and a bunch of other stuff in this driver picked up by Coverity, sign up at scan.coverity.com if you want access, and I'll approve.) Dave
Re: [trinity] WARNING: CPU: 0 PID: 515 at drivers/pci/pci-sysfs.c:1224 pci_mmap_resource+0xd6/0x10e
On Fri, Nov 24, 2017 at 08:11:39AM +0800, Fengguang Wu wrote: > Hello, > > FYI this happens in mainline kernel 4.14.0-12995-g0c86a6b. > It at least dates back to v4.9 . > > I wonder where can we avoid this warning, by improving trinity (or how > we use it), or the pci subsystem? > > [main] Added 42 filenames from /dev > [main] Added 13651 filenames from /proc > [main] Added 11163 filenames from /sys > [ 19.452176] [ cut here ] > [ 19.452938] process "trinity-main" tried to map 0x4000 bytes at page > 0x0001 on :00:06.0 BAR 4 (start 0xfe008000, size 0x > 4000) > [ 19.454804] WARNING: CPU: 0 PID: 515 at drivers/pci/pci-sysfs.c:1224 > pci_mmap_resource+0xd6/0x10e That's a root-only operation, where we allow the user to shoot themselves in the foot afaik. What you could do now that you're running an up to date trinity in 0day, is pass the --dropprivs flag to setuid to nobody in the child processes. Dave
Re: x86/umip: Enable User-Mode Instruction Prevention at runtime
On Mon, Nov 13, 2017 at 11:44:02PM +, Linux Kernel wrote: > Web: > https://git.kernel.org/torvalds/c/aa35f896979d9610bb11df485cf7bb6ca241febb > Commit: aa35f896979d9610bb11df485cf7bb6ca241febb > Parent: c6a960bbf6a36572a06bde866d94a7338c7f256a > Refname:refs/heads/master > Author: Ricardo Neri > AuthorDate: Sun Nov 5 18:27:54 2017 -0800 > Committer: Ingo Molnar > CommitDate: Wed Nov 8 11:16:23 2017 +0100 > > x86/umip: Enable User-Mode Instruction Prevention at runtime > +config X86_INTEL_UMIP > +def_bool n > +depends on CPU_SUP_INTEL > +prompt "Intel User Mode Instruction Prevention" if EXPERT > +---help--- > + The User Mode Instruction Prevention (UMIP) is a security > + feature in newer Intel processors. Can we start noting in Kconfigs which CPU generation a feature first appears in ? In six months' time, "newer" will mean even less than it does today. It'd be nice to be able to answer oldconfig without having to look things up in the SDM. Dave
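For example, the help text could name the generation directly (the generation named below is an illustrative guess and would need checking against the SDM):

```
config X86_INTEL_UMIP
	def_bool n
	depends on CPU_SUP_INTEL
	prompt "Intel User Mode Instruction Prevention" if EXPERT
	---help---
	  The User Mode Instruction Prevention (UMIP) is a security
	  feature present in Intel processors starting with the
	  Cannon Lake / Gemini Lake (2017) generation.
```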
Re: [4.14-rc7] task struct corruption after fork
On Mon, Oct 30, 2017 at 11:59:30AM -0700, Linus Torvalds wrote: > and that location would *almost* make sense in that it's the end of > the same page that contained a "struct task_struct". > > Are you running with VMAP_STACK? Is there perhaps some stale code that > ends up doing the old "stack pointer is in the same allocation as task > struct"? yeah, it's enabled. > If you have the kernel symbols for that image, can you look up if any > of those addresses look like any static kernel symbol addresses? Those > things that have the pattern 8xxx might be symbol > addresses and give us a clue about where the values came from. it got clobbered by another build, but I managed to rebuild it from the older config. Modulo modules that were loaded, things should be the same. > 81172d1e r15 81172d00 t usage_match 81172d30 t HARDIRQ_verbose > 8426daec r14 841cee60 b lock_classes 844eed00 B nr_lock_classes > ed008b17e001 r13 > 811737e2 r12 81173540 t __bfs 81173970 t check_noncircular > 8426dbe0 rbp 841cee60 b lock_classes 844eed00 B nr_lock_classes > 880458bf0008 rbx > 84590d00 r11 841cee60 b lock_classes 844eed00 B nr_lock_classes > 5 r10 > 81172d00 r9 81172d00 t usage_match > 1 r8 > 11008b17dfed rax > 880458bf00f0 rcx > ed008b17dff9 rdx > dc00 rsi > 41b58ab3 rdi > 82a349a8 orig_eax 828a7f40 R inat_primary_table 82a42840 r POLY > 81173540 rip 81173540 t __bfs > 5a5a5a5a5a5a5a5a > 8450f080 flags 844eed40 b list_entries 846eed60 B nr_list_entries So a bunch of lockdep stuff, and not much else afaics. Dave
[4.14-rc7] task struct corruption after fork
Something scary for halloween. Only saw this once so far. [10737.049397] = [10737.052151] BUG task_struct (Not tainted): Padding overwritten. 0x880458befef8-0x880458beffcf [10737.055172] - [10737.061267] Disabling lock debugging due to kernel taint [10737.064384] INFO: Slab 0xea001162fa00 objects=4 used=4 fp=0x (null) flags=0x2ffc0008100 [10737.067771] CPU: 2 PID: 26357 Comm: trinity-c13 Tainted: GB 4.14.0-rc7-think+ #1 [10737.074807] Call Trace: [10737.078329] dump_stack+0xbc/0x145 [10737.081919] ? dma_virt_map_sg+0xfb/0xfb [10737.085566] ? lock_release+0x890/0x890 [10737.089264] slab_err+0xad/0xd0 [10737.092997] ? memchr_inv+0x160/0x180 [10737.096769] slab_pad_check.part.43+0xfa/0x160 [10737.100681] ? copy_process.part.42+0x101c/0x29e0 [10737.104600] check_slab+0xa6/0xd0 [10737.108563] alloc_debug_processing+0x85/0x1b0 [10737.112612] ___slab_alloc+0x525/0x5d0 [10737.116672] ? __lock_is_held+0x2e/0xd0 [10737.120810] ? copy_process.part.42+0x101c/0x29e0 [10737.125019] ? ___might_sleep.part.69+0x118/0x320 [10737.129267] ? copy_process.part.42+0x101c/0x29e0 [10737.133556] ? __slab_alloc+0x3e/0x80 [10737.137803] __slab_alloc+0x3e/0x80 [10737.142100] kmem_cache_alloc_node+0xbd/0x360 [10737.146464] ? copy_process.part.42+0x101c/0x29e0 [10737.150932] copy_process.part.42+0x101c/0x29e0 [10737.155473] ? jbd2_buffer_abort_trigger+0x50/0x50 [10737.160040] ? __might_sleep+0x58/0xe0 [10737.164670] ? __cleanup_sighand+0x30/0x30 [10737.169308] ? mark_lock+0x16f/0x9b0 [10737.174016] ? balance_dirty_pages_ratelimited+0x744/0x10d0 [10737.178868] ? print_irqtrace_events+0x110/0x110 [10737.183779] ? mark_lock+0x16f/0x9b0 [10737.188731] ? print_irqtrace_events+0x110/0x110 [10737.193696] ? block_write_end+0x150/0x150 [10737.198745] ? match_held_lock+0xa6/0x410 [10737.203887] ? save_trace+0x1c0/0x1c0 [10737.209040] ? native_sched_clock+0xf9/0x1a0 [10737.214255] ? cyc2ns_read_end+0x10/0x10 [10737.219500] ? ext4_da_write_end+0x301/0x690 [10737.224771] ? 
sched_clock_cpu+0x14/0xf0 [10737.230077] ? __lock_acquire+0x6b3/0x2050 [10737.235438] ? sched_clock_cpu+0x14/0xf0 [10737.240833] ? debug_check_no_locks_freed+0x1a0/0x1a0 [10737.246272] ? debug_check_no_locks_freed+0x1a0/0x1a0 [10737.251754] ? lock_downgrade+0x310/0x310 [10737.257255] ? __lock_page_killable+0x100/0x100 [10737.262801] ? __mnt_drop_write_file+0x26/0x40 [10737.268432] ? current_time+0x70/0x70 [10737.274106] ? fsnotify+0xe99/0x1020 [10737.279744] ? up_write+0x97/0xe0 [10737.285470] ? match_held_lock+0x93/0x410 [10737.291287] ? save_trace+0x1c0/0x1c0 [10737.297097] ? __fsnotify_update_child_dentry_flags.part.2+0x160/0x160 [10737.303127] ? native_sched_clock+0xf9/0x1a0 [10737.309193] ? cyc2ns_read_end+0x10/0x10 [10737.315280] ? ext4_file_mmap+0xb0/0xb0 [10737.321446] ? match_held_lock+0x93/0x410 [10737.327627] ? sched_clock_cpu+0x14/0xf0 [10737.333831] ? save_trace+0x1c0/0x1c0 [10737.340108] ? native_sched_clock+0xf9/0x1a0 [10737.346408] ? cyc2ns_read_end+0x10/0x10 [10737.352788] _do_fork+0x1c4/0xa30 [10737.359190] ? fork_idle+0x120/0x120 [10737.365607] ? lock_downgrade+0x310/0x310 [10737.371992] ? native_sched_clock+0xf9/0x1a0 [10737.378485] ? cyc2ns_read_end+0x10/0x10 [10737.385060] ? syscall_trace_enter+0x2a6/0x670 [10737.391669] ? exit_to_usermode_loop+0x180/0x180 [10737.398368] ? __lock_is_held+0x2e/0xd0 [10737.405116] ? rcu_read_lock_sched_held+0x90/0xa0 [10737.411917] ? __context_tracking_exit.part.4+0x223/0x290 [10737.418798] ? context_tracking_recursion_enter+0x50/0x50 [10737.425758] ? __task_pid_nr_ns+0x1c4/0x300 [10737.432746] ? free_pidmap.isra.0+0x40/0x40 [10737.439743] ? SyS_read+0x140/0x140 [10737.446788] ? mark_held_locks+0x1b/0xa0 [10737.453883] ? do_syscall_64+0xae/0x400 [10737.461040] ? ptregs_sys_rt_sigreturn+0x10/0x10 [10737.468260] do_syscall_64+0x182/0x400 [10737.475508] ? syscall_return_slowpath+0x270/0x270 [10737.482801] ? rcu_read_lock_sched_held+0x90/0xa0 [10737.490156] ? 
__context_tracking_exit.part.4+0x223/0x290 [10737.497608] ? mark_held_locks+0x1b/0xa0 [10737.505087] ? return_from_SYSCALL_64+0x2d/0x7a [10737.512662] ? trace_hardirqs_on_caller+0x17a/0x250 [10737.520339] ? trace_hardirqs_on_thunk+0x1a/0x1c [10737.527991] entry_SYSCALL64_slow_path+0x25/0x25 [10737.535684] RIP: 0033:0x7f8917f3837b [10737.543411] RSP: 002b:7ffdca212e00 EFLAGS: 0246 [10737.551299] ORIG_RAX: 0038 [10737.559237] RAX: ffda RBX: 7ffdca212e00 RCX: 7f8917f3837b [10737.567378] RDX: RSI: RDI: 01200011 [10737.575550] RBP: 7ffdca212e50 R08: 7f891863a700 R09: 7ffdca3ef080 [10737.583786] R10: 7f891863a9d0 R11: 0246 R12: [10737.592129] R13: 0020 R14:
[4.14rc6] suspicious nfs rcu dereference
WARNING: suspicious RCU usage 4.14.0-rc6-think+ #2 Not tainted - net/sunrpc/clnt.c:1206 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 2 locks held by kworker/2:0/9104: #0: ( "rpciod" ){+.+.} , at: [] process_one_work+0x66e/0xea0 #1: ( (>u.tk_work) ){+.+.} , at: [] process_one_work+0x66e/0xea0 stack backtrace: CPU: 2 PID: 9104 Comm: kworker/2:0 Not tainted 4.14.0-rc6-think+ #2 Workqueue: rpciod rpc_async_schedule [sunrpc] Call Trace: dump_stack+0xbc/0x145 ? dma_virt_map_sg+0xfb/0xfb ? lockdep_rcu_suspicious+0xda/0x100 rpc_peeraddr2str+0x11a/0x130 [sunrpc] ? call_start+0x1e0/0x1e0 [sunrpc] perf_trace_nfs4_clientid_event+0xde/0x420 [nfsv4] ? do_raw_spin_unlock+0x147/0x220 ? save_trace+0x1c0/0x1c0 ? perf_trace_nfs4_open_event+0x5f0/0x5f0 [nfsv4] ? nfs41_sequence_process+0xba/0x5a0 [nfsv4] ? _raw_spin_unlock+0x24/0x30 ? nfs41_sequence_free_slot.isra.47+0x143/0x230 [nfsv4] ? __lock_is_held+0x51/0xd0 nfs41_sequence_call_done+0x29a/0x430 [nfsv4] ? perf_trace_nfs4_open_event+0x5f0/0x5f0 [nfsv4] ? nfs4_proc_unlink_done+0xb0/0xb0 [nfsv4] ? __internal_add_timer+0x11b/0x170 ? call_connect_status+0x490/0x490 [sunrpc] ? __lock_is_held+0x51/0xd0 ? call_decode+0x33f/0xdd0 [sunrpc] ? nfs4_proc_unlink_done+0xb0/0xb0 [nfsv4] ? rpc_make_runnable+0x180/0x180 [sunrpc] rpc_exit_task+0x61/0x100 [sunrpc] ? rpc_make_runnable+0x180/0x180 [sunrpc] __rpc_execute+0x1c8/0x9e0 [sunrpc] ? rpc_wake_up_queued_task+0x40/0x40 [sunrpc] ? lock_downgrade+0x310/0x310 ? match_held_lock+0xa6/0x410 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? save_trace+0x1c0/0x1c0 ? lock_acquire+0x12e/0x350 ? lock_acquire+0x12e/0x350 ? process_one_work+0x66e/0xea0 ? lock_release+0x890/0x890 ? do_raw_spin_trylock+0x100/0x100 ? __lock_is_held+0x51/0xd0 process_one_work+0x766/0xea0 ? pwq_dec_nr_in_flight+0x1e0/0x1e0 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x5cc/0x1310 ? __sched_text_start+0x8/0x8 ? match_held_lock+0x93/0x410 ? 
save_trace+0x1c0/0x1c0 ? native_sched_clock+0xf9/0x1a0 ? cyc2ns_read_end+0x10/0x10 ? cyc2ns_read_end+0x10/0x10 ? find_held_lock+0x74/0xd0 ? lock_contended+0x790/0x790 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? retint_kernel+0x10/0x10 ? do_raw_spin_trylock+0xb3/0x100 ? do_raw_spin_lock+0x110/0x110 ? mark_held_locks+0x1b/0xa0 worker_thread+0x1cf/0xcf0 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? process_one_work+0xea0/0xea0 ? get_vtime_delta+0x13/0x80 ? mark_held_locks+0x1b/0xa0 ? trace_hardirqs_on_caller+0x17a/0x250 ? _raw_spin_unlock_irq+0x29/0x40 ? finish_task_switch+0x183/0x470 ? finish_task_switch+0x101/0x470 ? preempt_notifier_dec+0x20/0x20 ? __schedule+0x5cc/0x1310 ? try_to_wake_up+0xe7/0xbb0 ? save_stack+0x32/0xb0 ? kasan_kmalloc+0xa0/0xd0 ? native_sched_clock+0xf9/0x1a0 ? ret_from_fork+0x27/0x40 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? lock_downgrade+0x310/0x310 ? __schedule+0x1310/0x1310 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? do_raw_spin_lock+0x110/0x110 ? __init_waitqueue_head+0xbe/0xf0 ? mark_held_locks+0x1b/0xa0 ? _raw_spin_unlock_irqrestore+0x32/0x50 ? process_one_work+0xea0/0xea0 kthread+0x1c9/0x1f0 ? kthread_create_on_node+0xc0/0xc0 ret_from_fork+0x27/0x40
Re: out of bounds strscpy from seccomp_actions_logged_handler
On Tue, Oct 24, 2017 at 06:54:25PM -0500, Tyler Hicks wrote: > On 10/24/2017 06:46 PM, Dave Jones wrote: > > (Triggered with trinity, but it seems just a 'cat > > /proc/sys/kernel/seccomp/actions_logged' reproduces just as easily). > > Hi Dave - Thanks for the report. This is a false positive that was > previously discussed here: > > https://lkml.kernel.org/r/<20171010182805.52b9b...@cakuba.netronome.com> Bah, I thought this smelled familiar. I'll just roll Andrey's workaround diff into my builds for fuzzing runs until someone figures out something better. Dave
out of bounds strscpy from seccomp_actions_logged_handler
(Triggered with trinity, but it seems just a 'cat /proc/sys/kernel/seccomp/actions_logged' reproduces just as easily).

BUG: KASAN: global-out-of-bounds in strscpy+0x133/0x2d0
Read of size 8 at addr 824b0028 by task trinity-c63/6883

CPU: 3 PID: 6883 Comm: trinity-c63 Not tainted 4.14.0-rc6-think+ #1
Call Trace:
 dump_stack+0xbc/0x145
 ? dma_virt_map_sg+0xfb/0xfb
 print_address_description+0x2d/0x260
 kasan_report+0x277/0x360
 ? strscpy+0x133/0x2d0
 strscpy+0x133/0x2d0
 ? strcasecmp+0xb0/0xb0
 seccomp_actions_logged_handler+0x2c5/0x440
 ? seccomp_send_sigsys+0xd0/0xd0
 ? lock_downgrade+0x310/0x310
 ? lock_release+0x890/0x890
 ? do_raw_spin_unlock+0x147/0x220
 ? do_raw_spin_trylock+0x100/0x100
 ? do_raw_spin_trylock+0x40/0x100
 ? do_raw_spin_lock+0x110/0x110
 proc_sys_call_handler+0x1b1/0x1f0
 ? seccomp_send_sigsys+0xd0/0xd0
 ? proc_sys_readdir+0x6d0/0x6d0
 do_iter_read+0x23b/0x280
 vfs_readv+0x107/0x180
 ? compat_rw_copy_check_uvector+0x1d0/0x1d0
 ? native_sched_clock+0xf9/0x1a0
 ? cyc2ns_read_end+0x10/0x10
 ? __fget_light+0x181/0x200
 ? fget_raw+0x10/0x10
 ? __lock_is_held+0x2e/0xd0
 ? rcu_read_lock_sched_held+0x90/0xa0
 ? __context_tracking_exit.part.4+0x223/0x290
 ? context_tracking_recursion_enter+0x50/0x50
 ? __task_pid_nr_ns+0x1c4/0x300
 ? do_preadv+0xb0/0xf0
 do_preadv+0xb0/0xf0
 ? SyS_preadv+0x10/0x10
 do_syscall_64+0x182/0x400
 ? syscall_return_slowpath+0x270/0x270
 ? rcu_read_lock_sched_held+0x90/0xa0
 ? __context_tracking_exit.part.4+0x223/0x290
 ? mark_held_locks+0x1b/0xa0
 ? return_from_SYSCALL_64+0x2d/0x7a
 ? trace_hardirqs_on_caller+0x17a/0x250
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f52d5f45219
RSP: 002b:7fff8a422838 EFLAGS: 0246 ORIG_RAX: 0147
RAX: ffda RBX: 0147 RCX: 7f52d5f45219
RDX: 00f3 RSI: 55d6d5b413d0 RDI: 00b2
RBP: 7fff8a4228e0 R08: 316c1272491c R09:
R10: 725c3dd7 R11: 0246 R12: 0002
R13: 7f52d645b058 R14: 7f52d661b698 R15: 7f52d645b000

The buggy address belongs to the variable:
 kdb_rwtypes+0x1268/0x1320

Memory state around the buggy address:
 824aff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 824aff80: 00 00 00 00 00 00 00 00 00 00 00 00 07 fa fa fa
>824b: fa fa fa fa 00 05 fa fa fa fa fa fa 02 fa fa fa
       ^
 824b0080: fa fa fa fa 00 00 01 fa fa fa fa fa 00 00 04 fa
 824b0100: fa fa fa fa 00 06 fa fa fa fa fa fa 00 07 fa fa
==
Disabling lock debugging due to kernel taint
[4.14rc5] corrupted stack end detected inside scheduler
Just hit this fairly quickly by fuzzing writev calls. Attempting to reproduce, but so far only seeing floods of page allocation stalls. Kernel panic - not syncing: corrupted stack end detected inside scheduler\x0a CPU: 1 PID: 2531 Comm: kworker/u8:4 Not tainted 4.14.0-rc5-think+ #1 Workqueue: writeback wb_workfn (flush-8:16) Call Trace: dump_stack+0xbc/0x145 ? dma_virt_map_sg+0xfb/0xfb ? sched_clock_cpu+0x14/0xf0 ? vsnprintf+0x331/0x7e0 panic+0x14e/0x2b5 ? __warn+0x12b/0x12b ? __schedule+0x111/0x1310 __schedule+0x12fd/0x1310 ? isolate_lru_page+0x890/0x890 ? __sched_text_start+0x8/0x8 ? blk_init_request_from_bio+0x150/0x150 ? lock_downgrade+0x310/0x310 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? mark_held_locks+0x70/0xa0 ? _raw_spin_unlock_irqrestore+0x32/0x50 schedule+0xc3/0x260 ? __schedule+0x1310/0x1310 ? __wake_up_locked_key_bookmark+0x20/0x20 ? match_held_lock+0x93/0x410 ? match_held_lock+0x93/0x410 ? save_trace+0x1c0/0x1c0 ? save_trace+0x1c0/0x1c0 io_schedule+0x1c/0x50 wbt_wait+0x45a/0x7f0 ? wbt_update_limits+0x40/0x40 ? sched_clock_cpu+0x14/0xf0 ? lock_downgrade+0x310/0x310 ? finish_wait+0x200/0x200 ? elv_rb_find+0x32/0x60 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? blk_mq_sched_try_merge+0x74/0x250 ? init_emergency_isa_pool+0x50/0x50 ? _raw_spin_unlock+0x24/0x30 ? dd_bio_merge+0xd3/0x120 ? save_trace+0x1c0/0x1c0 ? __blk_mq_sched_bio_merge+0x106/0x350 blk_mq_make_request+0x298/0x1160 ? __blk_mq_insert_request+0x4c0/0x4c0 ? cyc2ns_read_end+0x10/0x10 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? lock_downgrade+0x310/0x310 ? lock_release+0x890/0x890 ? rcu_is_watching+0x88/0xd0 ? blk_queue_enter+0x188/0x4e0 ? blk_exit_rl+0x40/0x40 ? lock_page_memcg+0xf6/0x240 ? rcu_is_watching+0x88/0xd0 ? rcutorture_record_progress+0x10/0x10 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? __test_set_page_writeback+0x45f/0x950 ? mark_held_locks+0x70/0xa0 ? 
_raw_spin_unlock_irqrestore+0x32/0x50 ? balance_dirty_pages_ratelimited+0x10d0/0x10d0 ? mempool_alloc+0x1d6/0x2f0 generic_make_request+0x316/0x7f0 ? bio_add_page+0x140/0x140 ? blk_queue_enter+0x4e0/0x4e0 ? debug_check_no_locks_freed+0x1a0/0x1a0 ? bio_alloc_bioset+0x1e8/0x3b0 ? bvec_alloc+0x160/0x160 ? cyc2ns_read_end+0x10/0x10 ? match_held_lock+0x93/0x410 ? bio_add_page+0xdb/0x140 ? submit_bio+0xe1/0x270 submit_bio+0xe1/0x270 ? wake_up_page_bit+0x300/0x300 ? generic_make_request+0x7f0/0x7f0 ? __lock_acquire+0x6b3/0x2050 ? lock_release+0x890/0x890 ? bdev_write_page+0x50/0x160 __swap_writepage+0x3c6/0xb20 ? SyS_madvise+0xf60/0xf60 ? generic_swapfile_activate+0x2b0/0x2b0 ? lock_downgrade+0x310/0x310 ? lock_release+0x890/0x890 ? do_raw_spin_unlock+0x147/0x220 ? do_raw_spin_trylock+0x100/0x100 ? do_raw_spin_trylock+0xb0/0x100 ? do_raw_spin_lock+0x110/0x110 ? _raw_spin_unlock+0x24/0x30 ? page_swapcount+0x9f/0xc0 ? page_swapped+0x179/0x190 ? page_trans_huge_map_swapcount+0x700/0x700 ? save_trace+0x1c0/0x1c0 ? sched_clock_cpu+0x14/0xf0 ? sched_clock_cpu+0x14/0xf0 ? try_to_free_swap+0x264/0x330 ? reuse_swap_page+0x560/0x560 ? lock_downgrade+0x310/0x310 ? clear_page_dirty_for_io+0x1a9/0x5a0 ? redirty_page_for_writepage+0x40/0x40 ? ___might_sleep.part.69+0x118/0x320 ? cyc2ns_read_end+0x10/0x10 ? page_remove_rmap+0x690/0x690 ? up_read+0x1c/0x40 pageout.isra.54+0x520/0xb50 ? move_active_pages_to_lru+0x920/0x920 ? do_raw_spin_unlock+0x147/0x220 ? mark_held_locks+0x70/0xa0 ? page_mapping+0x274/0x2b0 ? kstrndup+0x90/0x90 ? __add_to_swap_cache+0x63a/0x710 ? swap_readpage+0x610/0x610 ? swap_set_page_dirty+0x1dd/0x1f0 ? swap_readpage+0x610/0x610 ? show_swap_cache_info+0x130/0x130 ? wait_for_completion+0x3e0/0x3e0 ? rmap_walk+0x175/0x190 ? __anon_vma_prepare+0x360/0x360 ? set_page_dirty+0x1a7/0x380 ? __writepage+0x80/0x80 ? __anon_vma_prepare+0x360/0x360 ? drop_buffers+0x2a0/0x2a0 ? page_rmapping+0x9c/0xd0 ? try_to_unmap+0x34c/0x3a0 ? rmap_walk_locked+0x190/0x190 ? 
free_swap_slot+0x150/0x150 ? page_remove_rmap+0x690/0x690 ? rcu_read_unlock+0x60/0x60 ? page_get_anon_vma+0x2c0/0x2c0 ? mem_cgroup_swapout+0x4a0/0x4a0 ? page_mapping+0x274/0x2b0 ? kstrndup+0x90/0x90 ? page_get_anon_vma+0x2c0/0x2c0 ? add_to_swap+0x1ae/0x1d0 ? __delete_from_swap_cache+0x4b0/0x4b0 ? page_evictable+0xcc/0x110 shrink_page_list+0x242b/0x2cc0 ? putback_lru_page+0x430/0x430 ? native_flush_tlb_others+0x480/0x480 ? mark_lock+0x16f/0x9b0 ? mark_lock+0x16f/0x9b0 ? print_irqtrace_events+0x110/0x110 ? make_huge_pte+0xa0/0xa0 ? ptep_clear_flush+0xf7/0x140 ? pmd_clear_bad+0x40/0x40 ? mark_lock+0x16f/0x9b0 ? _find_next_bit+0x30/0xb0 ? print_irqtrace_events+0x110/0x110 ? try_to_unmap_one+0x10ff/0x14b0 ? match_held_lock+0x93/0x410 ? native_sched_clock+0xf9/0x1a0 ? match_held_lock+0x93/0x410 ? save_trace+0x1c0/0x1c0 ? save_trace+0x1c0/0x1c0 ? native_sched_clock+0xf9/0x1a0 ?
Re: WARN_ON_ONCE in fs/iomap.c:993
On Mon, Sep 11, 2017 at 06:56:05AM -0400, Shankara Pailoor wrote: > Hi, > > I am fuzzing linux 4.13-rc7 with XFS using syzkaller on x86_64 and I > found the following warning: > > WARNING: CPU: 2 PID: 5391 at fs/iomap.c:993 iomap_dio_rw+0xc79/0xe70 > > Here is a reproducer program: https://pastebin.com/tc014k97 pwrite in one thread, sendfile on another. Same thing trinity has been hitting. See thread "Subject: Re: iov_iter_pipe warning". Dave
Re: iov_iter_pipe warning.
On Sun, Sep 10, 2017 at 09:05:48PM +0100, Al Viro wrote: > On Sun, Sep 10, 2017 at 12:07:10PM -0400, Dave Jones wrote: > > On Sun, Sep 10, 2017 at 03:57:21AM +0100, Al Viro wrote: > > > On Sat, Sep 09, 2017 at 09:07:56PM -0400, Dave Jones wrote: > > > > > > > With this in place, I'm still seeing -EBUSY from > > invalidate_inode_pages2_range > > > > which doesn't end well... > > > > > > Different issue, and I'm not sure why that WARN_ON() is there in the > > > first place. Note that in a similar situation > > generic_file_direct_write() > > > simply buggers off and lets the caller do buffered write... > > > > > > iov_iter_pipe() warning is a sign of ->read_iter() on pipe-backed > > iov_iter > > > putting into the pipe more than it claims to have done. > > > > (from a rerun after hitting that EBUSY warn; hence the taint) > > > > WARNING: CPU: 0 PID: 14154 at fs/iomap.c:1055 iomap_dio_rw+0x78e/0x840 > > ... and that's another invalidate_inode_pages2_range() in the same > sucker. Again, compare with generic_file_direct_write()... > > I don't believe that this one has anything splice-specific to do with it. > And its only relation to iov_iter_pipe() splat is that it's in the same > fs/iomap.c... The interesting part is that I'm hitting these two over and over now rather than the iov_iter_pipe warning. Could just be unlucky randomness though.. Dave
Re: iov_iter_pipe warning.
On Sun, Sep 10, 2017 at 03:57:21AM +0100, Al Viro wrote: > On Sat, Sep 09, 2017 at 09:07:56PM -0400, Dave Jones wrote: > > > With this in place, I'm still seeing -EBUSY from > > invalidate_inode_pages2_range > > which doesn't end well... > > Different issue, and I'm not sure why that WARN_ON() is there in the > first place. Note that in a similar situation generic_file_direct_write() > simply buggers off and lets the caller do buffered write... > > iov_iter_pipe() warning is a sign of ->read_iter() on pipe-backed iov_iter > putting into the pipe more than it claims to have done. (from a rerun after hitting that EBUSY warn; hence the taint) WARNING: CPU: 0 PID: 14154 at fs/iomap.c:1055 iomap_dio_rw+0x78e/0x840 CPU: 0 PID: 14154 Comm: trinity-c33 Tainted: GW 4.13.0-think+ #9 task: 8801027e3e40 task.stack: 8801632d8000 RIP: 0010:iomap_dio_rw+0x78e/0x840 RSP: 0018:8801632df370 EFLAGS: 00010286 RAX: fff0 RBX: 880428666428 RCX: ffea RDX: ed002c65bdef RSI: RDI: ed002c65be5f RBP: 8801632df550 R08: 88046ae176c0 R09: R10: 8801632de960 R11: 0001 R12: 8801632df7f0 R13: ffea R14: 11002c65be7c R15: 8801632df988 FS: 7f3da2100700() GS:88046ae0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 0002f6223001 CR4: 001606f0 DR0: 7f3da1f3d000 DR1: DR2: DR3: DR6: 0ff0 DR7: 0600 Call Trace: ? iomap_seek_data+0xb0/0xb0 ? find_inode_fast+0xd0/0xd0 ? xfs_file_aio_write_checks+0x295/0x320 [xfs] ? __lock_is_held+0x51/0xc0 ? xfs_file_dio_aio_write+0x286/0x7e0 [xfs] ? rcu_read_lock_sched_held+0x90/0xa0 xfs_file_dio_aio_write+0x286/0x7e0 [xfs] ? xfs_file_aio_write_checks+0x320/0x320 [xfs] ? unwind_get_return_address+0x2f/0x50 ? __save_stack_trace+0x92/0x100 ? memcmp+0x45/0x70 ? depot_save_stack+0x12e/0x480 ? save_stack+0x89/0xb0 ? save_stack+0x32/0xb0 ? kasan_kmalloc+0xa0/0xd0 ? __kmalloc+0x157/0x360 ? iter_file_splice_write+0x154/0x760 ? direct_splice_actor+0x86/0xa0 ? splice_direct_to_actor+0x1c4/0x420 ? do_splice_direct+0x173/0x1e0 ? do_sendfile+0x3a2/0x6d0 ? SyS_sendfile64+0xa4/0x130 ? 
do_syscall_64+0x182/0x3e0 ? entry_SYSCALL64_slow_path+0x25/0x25 ? match_held_lock+0xa6/0x410 ? iter_file_splice_write+0x154/0x760 xfs_file_write_iter+0x227/0x280 [xfs] do_iter_readv_writev+0x267/0x330 ? vfs_dedupe_file_range+0x400/0x400 do_iter_write+0xd7/0x280 ? splice_from_pipe_next.part.9+0x28/0x160 iter_file_splice_write+0x4d5/0x760 ? page_cache_pipe_buf_steal+0x2b0/0x2b0 ? generic_file_splice_read+0x2e1/0x340 ? pipe_to_user+0x80/0x80 direct_splice_actor+0x86/0xa0 splice_direct_to_actor+0x1c4/0x420 ? generic_pipe_buf_nosteal+0x10/0x10 ? do_splice_to+0xc0/0xc0 do_splice_direct+0x173/0x1e0 ? splice_direct_to_actor+0x420/0x420 ? rcu_read_lock_sched_held+0x90/0xa0 ? rcu_sync_lockdep_assert+0x43/0x70 ? __sb_start_write+0x179/0x1e0 do_sendfile+0x3a2/0x6d0 ? do_compat_pwritev64+0xa0/0xa0 ? __lock_is_held+0x2e/0xc0 SyS_sendfile64+0xa4/0x130 ? SyS_sendfile+0x140/0x140 ? mark_held_locks+0x1c/0x90 ? do_syscall_64+0xae/0x3e0 ? SyS_sendfile+0x140/0x140 do_syscall_64+0x182/0x3e0 ? syscall_return_slowpath+0x250/0x250 ? rcu_read_lock_sched_held+0x90/0xa0 ? __context_tracking_exit.part.4+0x223/0x290 ? mark_held_locks+0x1c/0x90 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x17a/0x250 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f3da1a2b219 RSP: 002b:7ffdd1642f38 EFLAGS: 0246 ORIG_RAX: 0028 RAX: ffda RBX: 0028 RCX: 7f3da1a2b219 RDX: 7f3da1f3d000 RSI: 005f RDI: 0060 RBP: 7ffdd1642fe0 R08: 30503123188dbe3f R09: e7e7e7e7 R10: f000 R11: 0246 R12: 0002 R13: 7f3da2012058 R14: 7f3da2100698 R15: 7f3da2012000
Re: iov_iter_pipe warning.
On Fri, Sep 08, 2017 at 02:04:41AM +0100, Al Viro wrote: > There's at least one suspicious place in iomap_dio_actor() - > if (!(dio->flags & IOMAP_DIO_WRITE)) { > iov_iter_zero(length, dio->submit.iter); > dio->size += length; > return length; > } > which assumes that iov_iter_zero() always succeeds. That's very > much _not_ true - neither for iovec-backed, not for pipe-backed. > Orangefs read_one_page() is fine (it calls that sucker for bvec-backed > iov_iter it's just created), but iomap_dio_actor() is not. > > I'm not saying that it will suffice, but we definitely need this: > > diff --git a/fs/iomap.c b/fs/iomap.c > index 269b24a01f32..4a671263475f 100644 > --- a/fs/iomap.c > +++ b/fs/iomap.c > @@ -843,7 +843,7 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t > length, > /*FALLTHRU*/ > case IOMAP_UNWRITTEN: > if (!(dio->flags & IOMAP_DIO_WRITE)) { > -iov_iter_zero(length, dio->submit.iter); > +length = iov_iter_zero(length, dio->submit.iter); > dio->size += length; > return length; With this in place, I'm still seeing -EBUSY from invalidate_inode_pages2_range which doesn't end well... WARNING: CPU: 3 PID: 11443 at fs/iomap.c:993 iomap_dio_rw+0x825/0x840 CPU: 3 PID: 11443 Comm: trinity-c39 Not tainted 4.13.0-think+ #9 task: 880461080040 task.stack: 88043d72 RIP: 0010:iomap_dio_rw+0x825/0x840 RSP: 0018:88043d727730 EFLAGS: 00010286 RAX: fff0 RBX: 88044f036428 RCX: RDX: ed0087ae4e67 RSI: RDI: ed0087ae4ed7 RBP: 88043d727910 R08: 88046b4176c0 R09: R10: 88043d726d20 R11: 0001 R12: 88043d727a90 R13: 027253f7 R14: 110087ae4ef4 R15: 88043d727c10 FS: 7f5d8613e700() GS:88046b40() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f5d84503000 CR3: 0004594e1000 CR4: 001606e0 Call Trace: ? iomap_seek_data+0xb0/0xb0 ? down_read_nested+0xd3/0x160 ? down_read_non_owner+0x40/0x40 ? xfs_ilock+0x3cb/0x460 [xfs] ? sched_clock_cpu+0x14/0xf0 ? __lock_is_held+0x51/0xc0 ? xfs_file_dio_aio_read+0x123/0x350 [xfs] xfs_file_dio_aio_read+0x123/0x350 [xfs] ? 
xfs_file_fallocate+0x550/0x550 [xfs] ? lock_release+0xa00/0xa00 ? ___might_sleep.part.70+0x118/0x320 xfs_file_read_iter+0x1b1/0x1d0 [xfs] do_iter_readv_writev+0x2ea/0x330 ? vfs_dedupe_file_range+0x400/0x400 do_iter_read+0x149/0x280 vfs_readv+0x107/0x180 ? vfs_iter_read+0x60/0x60 ? fget_raw+0x10/0x10 ? native_sched_clock+0xf9/0x1a0 ? __fdget_pos+0xd6/0x110 ? __fdget_pos+0xd6/0x110 ? __fdget_raw+0x10/0x10 ? do_readv+0xc0/0x1b0 do_readv+0xc0/0x1b0 ? vfs_readv+0x180/0x180 ? mark_held_locks+0x1c/0x90 ? do_syscall_64+0xae/0x3e0 ? compat_rw_copy_check_uvector+0x1b0/0x1b0 do_syscall_64+0x182/0x3e0 ? syscall_return_slowpath+0x250/0x250 ? rcu_read_lock_sched_held+0x90/0xa0 ? mark_held_locks+0x1c/0x90 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x17a/0x250 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f5d85a69219 RSP: 002b:7ffdf090afd8 EFLAGS: 0246 ORIG_RAX: 0013 RAX: ffda RBX: 0013 RCX: 7f5d85a69219 RDX: 00ae RSI: 565183cd5490 RDI: 0056 RBP: 7ffdf090b080 R08: 0141082b00011c63 R09: R10: e000 R11: 0246 R12: 0002 R13: 7f5d86026058 R14: 7f5d8613e698 R15: 7f5d86026000
Re: iov_iter_pipe warning.
On Thu, Sep 07, 2017 at 09:46:17AM +1000, Dave Chinner wrote: > On Wed, Sep 06, 2017 at 04:03:37PM -0400, Dave Jones wrote: > > On Mon, Aug 28, 2017 at 09:25:42PM -0700, Darrick J. Wong wrote: > > > On Mon, Aug 28, 2017 at 04:31:30PM -0400, Dave Jones wrote: > > > > I'm still trying to narrow down an exact reproducer, but it seems > > having > > > > trinity do a combination of sendfile & writev, with pipes and regular > > > > files as fd's is the best repro. > > > > > > > > Is this a real problem, or am I chasing ghosts ? That it doesn't > > happen > > > > on ext4 or btrfs is making me wonder... > > > > > > I haven't heard of any problems w/ directio xfs lately, but OTOH > > > I think it's the only filesystem that uses iomap_dio_rw, which would > > > explain why ext4/btrfs don't have this problem. > > > > Another warning, from likely the same root cause. > > > > WARNING: CPU: 3 PID: 572 at lib/iov_iter.c:962 iov_iter_pipe+0xe2/0xf0 > > WARN_ON(pipe->nrbufs == pipe->buffers); > > * @nrbufs: the number of non-empty pipe buffers in this pipe > * @buffers: total number of buffers (should be a power of 2) > > So that's warning that the pipe buffer is already full before we > try to read from the filesystem? > > That doesn't seem like an XFS problem - it indicates the pipe we are > filling in generic_file_splice_read() is not being emptied by > whatever we are splicing the file data to The puzzling part is this runs for a day on ext4 or btrfs, whereas I can make xfs fall over pretty quickly. As Darrick pointed out though, this could be due to xfs being the only user of iomap_dio_rw. I'm juggling a few other things right now, so probably not going to have much time to dig further on this until after plumbers + 1 wk. Dave
Re: x86/kconfig: Consolidate unwinders into multiple choice selection
On Wed, Sep 06, 2017 at 04:49:45PM -0500, Josh Poimboeuf wrote: > > Choose kernel unwinder > > > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) > > 2. ORC unwinder (ORC_UNWINDER) (NEW) > > 3. Guess unwinder (GUESS_UNWINDER) (NEW) > > choice[1-3?]: > > This is a quirk of the config tool. It's not very intuitive, but to see > the help for a given option you have to type the number appended with a > '?', like: > > > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) > 2. ORC unwinder (ORC_UNWINDER) (NEW) > choice[1-2?]: 1? Hey, I learned something today! thanks, Dave
Re: iov_iter_pipe warning.
On Mon, Aug 28, 2017 at 09:25:42PM -0700, Darrick J. Wong wrote: > On Mon, Aug 28, 2017 at 04:31:30PM -0400, Dave Jones wrote: > > On Mon, Aug 07, 2017 at 04:18:18PM -0400, Dave Jones wrote: > > > On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote: > > > > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote: > > > > > currently running v4.11-rc8-75-gf83246089ca0 > > > > > > > > > > sunrpc bit is for the other unrelated problem I'm chasing. > > > > > > > > > > note also, I saw the backtrace without the fs/splice.c changes. > > > > > > > > Interesting... Could you add this and see if that triggers? > > > > > > > > diff --git a/fs/splice.c b/fs/splice.c > > > > index 540c4a44756c..12a12d9c313f 100644 > > > > --- a/fs/splice.c > > > > +++ b/fs/splice.c > > > > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file > > *in, loff_t *ppos, > > > > kiocb.ki_pos = *ppos; > > > > ret = call_read_iter(in, , ); > > > > if (ret > 0) { > > > > +if (WARN_ON(iov_iter_count() != len - ret)) > > > > +printk(KERN_ERR "ops %p: was %zd, left %zd, > > returned %d\n", > > > > +in->f_op, len, iov_iter_count(), > > ret); > > > > *ppos = kiocb.ki_pos; > > > > file_accessed(in); > > > > } else if (ret < 0) { > > > > > > Hey Al, > > > Due to a git stash screw up on my part, I've had this leftover WARN_ON > > > in my tree for the last couple months. (That screw-up might turn out to > > be > > > serendipitous if this is a real bug..) > > > > > > Today I decided to change things up and beat up on xfs for a change, and > > > was able to trigger this again. > > > > > > Is this check no longer valid, or am I triggering the same bug we were > > chased > > > down in nfs, but now in xfs ? (None of the other detritus from that > > debugging > > > back in April made it, just those three lines above). > > > > Revisiting this. I went back and dug out some of the other debug diffs [1] > > from that old thread. > > > > I can easily trigger this spew on xfs. 
> > > > > > WARNING: CPU: 1 PID: 2251 at fs/splice.c:292 test_it+0xd4/0x1d0 > > CPU: 1 PID: 2251 Comm: trinity-c42 Not tainted 4.13.0-rc7-think+ #1 > > task: 880459173a40 task.stack: 88044f7d > > RIP: 0010:test_it+0xd4/0x1d0 > > RSP: 0018:88044f7d7878 EFLAGS: 00010283 > > RAX: RBX: 88044f44b968 RCX: 81511ea0 > > RDX: 0003 RSI: dc00 RDI: 88044f44ba68 > > RBP: 88044f7d78c8 R08: 88046b218ec0 R09: > > R10: 88044f7d7518 R11: R12: 1000 > > R13: 0001 R14: R15: 0001 > > FS: 7fdbc09b2700() GS:88046b20() > > knlGS: > > CS: 0010 DS: ES: CR0: 80050033 > > CR2: CR3: 000459e1d000 CR4: 001406e0 > > Call Trace: > > generic_file_splice_read+0x414/0x4e0 > > ? opipe_prep.part.14+0x180/0x180 > > ? lockdep_init_map+0xb2/0x2b0 > > ? rw_verify_area+0x65/0x150 > > do_splice_to+0xab/0xc0 > > splice_direct_to_actor+0x1f5/0x540 > > ? generic_pipe_buf_nosteal+0x10/0x10 > > ? do_splice_to+0xc0/0xc0 > > ? rw_verify_area+0x9d/0x150 > > do_splice_direct+0x1b9/0x230 > > ? splice_direct_to_actor+0x540/0x540 > > ? __sb_start_write+0x164/0x1c0 > > ? do_sendfile+0x7b3/0x840 > > do_sendfile+0x428/0x840 > > ? do_compat_pwritev64+0xb0/0xb0 > > ? __might_sleep+0x72/0xe0 > > ? kasan_check_write+0x14/0x20 > > SyS_sendfile64+0xa4/0x120 > > ? SyS_sendfile+0x150/0x150 > > ? mark_held_locks+0x23/0xb0 > > ? do_syscall_64+0xc0/0x3e0 > > ? SyS_sendfile+0x150/0x150 > > do_syscall_64+0x1bc/0x3e0 > > ? syscall_return_slowpath+0x240/0x240 > > ? mark_held_locks+0x23/0xb0 > > ? return_from_SYSCALL_64+0x2d/0x7a > > ? trace_hardirqs_on_caller+0x182/0x260 > > ? trace_hardirqs_on_thunk+0x1a/0x1c > > entry_SYSCALL64_slow_path+0x25/0x25 > > RIP: 0033:0x7fdbc02dd219 > > RSP: 002b:7ffc5024
Re: x86/kconfig: Consolidate unwinders into multiple choice selection
On Mon, Sep 04, 2017 at 08:05:13PM +, Linux Kernel wrote: > Web: > https://git.kernel.org/torvalds/c/81d387190039c14edac8de2b3ec789beb899afd9 > Commit: 81d387190039c14edac8de2b3ec789beb899afd9 > Parent: a34a766ff96d9e88572e35a45066279e40a85d84 > Refname:refs/heads/master > Author: Josh Poimboeuf> AuthorDate: Tue Jul 25 08:54:24 2017 -0500 > Committer: Ingo Molnar > CommitDate: Wed Jul 26 14:05:36 2017 +0200 > > x86/kconfig: Consolidate unwinders into multiple choice selection > > There are three mutually exclusive unwinders. Make that more obvious by > combining them into a multiple-choice selection: > > CONFIG_FRAME_POINTER_UNWINDER > CONFIG_ORC_UNWINDER > CONFIG_GUESS_UNWINDER (if CONFIG_EXPERT=y) The help texts for the various unwinders are now attached to the wrong kconfig item. > +choice > +prompt "Choose kernel unwinder" > +default FRAME_POINTER_UNWINDER > +---help--- > + This determines which method will be used for unwinding kernel stack > + traces for panics, oopses, bugs, warnings, perf, /proc//stack, > + livepatch, lockdep, and more. This is what gets displayed, but tells me nothing about what the benefits/downsides are of each (or even what they are; I had to read the Kconfig file to figure out what 'GUESS' meant) an oldconfig run .. Choose kernel unwinder > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) 2. ORC unwinder (ORC_UNWINDER) (NEW) 3. Guess unwinder (GUESS_UNWINDER) (NEW) choice[1-3?]: ? This determines which method will be used for unwinding kernel stack traces for panics, oopses, bugs, warnings, perf, /proc//stack, livepatch, lockdep, and more. Prompt: Choose kernel unwinder Location: -> Kernel hacking Defined at arch/x86/Kconfig.debug:359 Selected by: m Choose kernel unwinder > 1. Frame pointer unwinder (FRAME_POINTER_UNWINDER) (NEW) 2. ORC unwinder (ORC_UNWINDER) (NEW) 3. Guess unwinder (GUESS_UNWINDER) (NEW) choice[1-3?]: Dave
Re: iov_iter_pipe warning.
On Wed, Aug 30, 2017 at 10:13:43AM -0700, Darrick J. Wong wrote: > > I reverted the debug patches mentioned above, and ran trinity for a while > > again, > > and got this which smells really suspiciously related > > > > WARNING: CPU: 1 PID: 10380 at fs/iomap.c:993 iomap_dio_rw+0x825/0x840 > > RAX: fff0 RBX: 88046a64d0e8 RCX: > > > > > > > > That's this.. > > > > 987 ret = filemap_write_and_wait_range(mapping, start, end); > > 988 if (ret) > > 989 goto out_free_dio; > > 990 > > 991 ret = invalidate_inode_pages2_range(mapping, > > 992 start >> PAGE_SHIFT, end >> PAGE_SHIFT); > > 993 WARN_ON_ONCE(ret); > > > > > > Plot thickens.. > > Hm, that's the WARN_ON that comes from a failed pagecache invalidation > prior to a dio operation, which implies that something's mixing buffered > and dio? Plausible. Judging by RAX, we got -EBUSY > Given that it's syzkaller it wouldn't surprise me to hear that it's > doing that... :) s/syzkaller/trinity/, but yes. Dave
Re: iov_iter_pipe warning.
On Mon, Aug 28, 2017 at 09:25:42PM -0700, Darrick J. Wong wrote: > On Mon, Aug 28, 2017 at 04:31:30PM -0400, Dave Jones wrote: > > On Mon, Aug 07, 2017 at 04:18:18PM -0400, Dave Jones wrote: > > > On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote: > > > > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote: > > > > > > > > diff --git a/fs/splice.c b/fs/splice.c > > > > index 540c4a44756c..12a12d9c313f 100644 > > > > --- a/fs/splice.c > > > > +++ b/fs/splice.c > > > > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file > > *in, loff_t *ppos, > > > > kiocb.ki_pos = *ppos; > > > > ret = call_read_iter(in, , ); > > > > if (ret > 0) { > > > > +if (WARN_ON(iov_iter_count() != len - ret)) > > > > +printk(KERN_ERR "ops %p: was %zd, left %zd, > > returned %d\n", > > > > +in->f_op, len, iov_iter_count(), > > ret); > > > > *ppos = kiocb.ki_pos; > > > > file_accessed(in); > > > > } else if (ret < 0) { > > > > > > Hey Al, > > > Due to a git stash screw up on my part, I've had this leftover WARN_ON > > > in my tree for the last couple months. (That screw-up might turn out to > > be > > > serendipitous if this is a real bug..) > > > > > > Today I decided to change things up and beat up on xfs for a change, and > > > was able to trigger this again. > > > > > > Is this check no longer valid, or am I triggering the same bug we were > > chased > > > down in nfs, but now in xfs ? (None of the other detritus from that > > debugging > > > back in April made it, just those three lines above). > > > > Revisiting this. I went back and dug out some of the other debug diffs [1] > > from that old thread. > > > > I can easily trigger this spew on xfs. > > > > ... 
> > > > asked to read 4096, claims to have read 1 > > actual size of data in pipe 4096 > > [0:4096] > > f_op: a058c920, f_flags: 49154, pos: 0/1, size: 0 > > > > > > I'm still trying to narrow down an exact reproducer, but it seems having > > trinity do a combination of sendfile & writev, with pipes and regular > > files as fd's is the best repro. > > > > Is this a real problem, or am I chasing ghosts ? That it doesn't happen > > on ext4 or btrfs is making me wonder... > > I haven't heard of any problems w/ directio xfs lately, but OTOH > I think it's the only filesystem that uses iomap_dio_rw, which would > explain why ext4/btrfs don't have this problem. > > Granted that's idle speculation; is there a reproducer/xfstest for this? I reverted the debug patches mentioned above, and ran trinity for a while again, and got this which smells really suspiciously related WARNING: CPU: 1 PID: 10380 at fs/iomap.c:993 iomap_dio_rw+0x825/0x840 CPU: 1 PID: 10380 Comm: trinity-c30 Not tainted 4.13.0-rc7-think+ #3 task: 8804613a5740 task.stack: 88043212 RIP: 0010:iomap_dio_rw+0x825/0x840 RSP: 0018:880432127890 EFLAGS: 00010286 RAX: fff0 RBX: 88046a64d0e8 RCX: RDX: ed0086424e9b RSI: RDI: ed0086424f03 RBP: 880432127a70 R08: 88046b239840 R09: 0001 R10: 880432126f50 R11: R12: 880432127c40 R13: 0e0a R14: 110086424f20 R15: 880432127ca0 FS: 7f4cda32f700() GS:88046b20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f181e02f000 CR3: 00043d32a000 CR4: 001406e0 Call Trace: ? iomap_seek_data+0xc0/0xc0 ? down_read_non_owner+0x40/0x40 ? xfs_ilock+0x3f2/0x490 [xfs] ? touch_atime+0x9c/0x180 ? __atime_needs_update+0x440/0x440 xfs_file_dio_aio_read+0x12d/0x390 [xfs] ? xfs_file_dio_aio_read+0x12d/0x390 [xfs] ? xfs_file_fallocate+0x660/0x660 [xfs] ? cyc2ns_read_end+0x10/0x10 xfs_file_read_iter+0x1bb/0x1d0 [xfs] __vfs_read+0x332/0x440 ? default_llseek+0x140/0x140 ? cyc2ns_read_end+0x10/0x10 ? __fget_light+0x1ae/0x230 ? rcu_is_watching+0x8d/0xd0 ? exit_to_usermode_loop+0x1b0/0x1b0 ? 
rw_verify_area+0x9d/0x150 vfs_read+0xc8/0x1c0 SyS_pread64+0x11a/0x140 ? SyS_write+0x160/0x160 ? do_syscall_64+0xc0/0x3e0 ? SyS_write+0x160/0x160 do_syscall_64+0x1bc/0x3e0 ? syscall_return_slowpath+0x240/0x240 ? cpumask_check.part.2+0x10/0x10 ? cpumask_check.part.2+0x10/0x10 ? mark_held_locks+0x23/0xb0 ? return_from_SYSCAL
Re: iov_iter_pipe warning.
On Mon, Aug 07, 2017 at 04:18:18PM -0400, Dave Jones wrote:
 > On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote:
 >  > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote:
 >  >  > currently running v4.11-rc8-75-gf83246089ca0
 >  >  >
 >  >  > sunrpc bit is for the other unrelated problem I'm chasing.
 >  >  >
 >  >  > note also, I saw the backtrace without the fs/splice.c changes.
 >  >
 >  > Interesting... Could you add this and see if that triggers?
 >  >
 >  > diff --git a/fs/splice.c b/fs/splice.c
 >  > index 540c4a44756c..12a12d9c313f 100644
 >  > --- a/fs/splice.c
 >  > +++ b/fs/splice.c
 >  > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 >  >  	kiocb.ki_pos = *ppos;
 >  >  	ret = call_read_iter(in, &kiocb, &to);
 >  >  	if (ret > 0) {
 >  > +		if (WARN_ON(iov_iter_count(&to) != len - ret))
 >  > +			printk(KERN_ERR "ops %p: was %zd, left %zd, returned %d\n",
 >  > +				in->f_op, len, iov_iter_count(&to), ret);
 >  >  		*ppos = kiocb.ki_pos;
 >  >  		file_accessed(in);
 >  >  	} else if (ret < 0) {
 >
 > Hey Al,
 > Due to a git stash screw up on my part, I've had this leftover WARN_ON
 > in my tree for the last couple months. (That screw-up might turn out to be
 > serendipitous if this is a real bug..)
 >
 > Today I decided to change things up and beat up on xfs for a change, and
 > was able to trigger this again.
 >
 > Is this check no longer valid, or am I triggering the same bug we chased
 > down in nfs, but now in xfs? (None of the other detritus from that
 > debugging back in April made it, just those three lines above).

Revisiting this. I went back and dug out some of the other debug diffs [1]
from that old thread.

I can easily trigger this spew on xfs.
WARNING: CPU: 1 PID: 2251 at fs/splice.c:292 test_it+0xd4/0x1d0 CPU: 1 PID: 2251 Comm: trinity-c42 Not tainted 4.13.0-rc7-think+ #1 task: 880459173a40 task.stack: 88044f7d RIP: 0010:test_it+0xd4/0x1d0 RSP: 0018:88044f7d7878 EFLAGS: 00010283 RAX: RBX: 88044f44b968 RCX: 81511ea0 RDX: 0003 RSI: dc00 RDI: 88044f44ba68 RBP: 88044f7d78c8 R08: 88046b218ec0 R09: R10: 88044f7d7518 R11: R12: 1000 R13: 0001 R14: R15: 0001 FS: 7fdbc09b2700() GS:88046b20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: CR3: 000459e1d000 CR4: 001406e0 Call Trace: generic_file_splice_read+0x414/0x4e0 ? opipe_prep.part.14+0x180/0x180 ? lockdep_init_map+0xb2/0x2b0 ? rw_verify_area+0x65/0x150 do_splice_to+0xab/0xc0 splice_direct_to_actor+0x1f5/0x540 ? generic_pipe_buf_nosteal+0x10/0x10 ? do_splice_to+0xc0/0xc0 ? rw_verify_area+0x9d/0x150 do_splice_direct+0x1b9/0x230 ? splice_direct_to_actor+0x540/0x540 ? __sb_start_write+0x164/0x1c0 ? do_sendfile+0x7b3/0x840 do_sendfile+0x428/0x840 ? do_compat_pwritev64+0xb0/0xb0 ? __might_sleep+0x72/0xe0 ? kasan_check_write+0x14/0x20 SyS_sendfile64+0xa4/0x120 ? SyS_sendfile+0x150/0x150 ? mark_held_locks+0x23/0xb0 ? do_syscall_64+0xc0/0x3e0 ? SyS_sendfile+0x150/0x150 do_syscall_64+0x1bc/0x3e0 ? syscall_return_slowpath+0x240/0x240 ? mark_held_locks+0x23/0xb0 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x182/0x260 ? 
trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdbc02dd219 RSP: 002b:7ffc5024fa48 EFLAGS: 0246 ORIG_RAX: 0028 RAX: ffda RBX: 0028 RCX: 7fdbc02dd219 RDX: 7fdbbe348000 RSI: 0011 RDI: 0015 RBP: 7ffc5024faf0 R08: 006d R09: 0094e82f2c730a50 R10: 1000 R11: 0246 R12: 0002 R13: 7fdbc0885058 R14: 7fdbc09b2698 R15: 7fdbc0885000 ---[ end trace a5847ef0f7be7e20 ]--- asked to read 4096, claims to have read 1 actual size of data in pipe 4096 [0:4096] f_op: a058c920, f_flags: 49154, pos: 0/1, size: 0 I'm still trying to narrow down an exact reproducer, but it seems having trinity do a combination of sendfile & writev, with pipes and regular files as fd's is the best repro. Is this a real problem, or am I chasing ghosts ? That it doesn't happen on ext4 or btrfs is making me wonder... Dave [1] https://lkml.org/lkml/2017/4/11/921
Re: nvmet_fc: add defer_req callback for deferment of cmd buffer return
On Fri, Aug 11, 2017 at 07:44:19PM +, Linux Kernel wrote:
 > Web: https://git.kernel.org/torvalds/c/0fb228d30b8d72bfee51f57e638d412324d44a11
 > Commit: 0fb228d30b8d72bfee51f57e638d412324d44a11
 > Parent: 758f3735580c21b8a36d644128af6608120a1dde
 > Refname: refs/heads/master
 > Author: James Smart
 > AuthorDate: Tue Aug 1 15:12:39 2017 -0700
 > Committer: Christoph Hellwig
 > CommitDate: Thu Aug 10 11:06:38 2017 +0200
 >
 > nvmet_fc: add defer_req callback for deferment of cmd buffer return
 >
 > +	/* Cleanup defer'ed IOs in queue */
 > +	list_for_each_entry(deferfcp, >avail_defer_list, req_list) {
 > +		list_del(&deferfcp->req_list);
 > +		kfree(deferfcp);
 > +	}

Shouldn't this be list_for_each_entry_safe ?

Dave
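Dave's question is about the classic delete-while-iterating hazard: calling list_del() and kfree() on the current entry inside list_for_each_entry() frees the very node the iterator is about to dereference to find its successor. A minimal userspace sketch (plain malloc/free, not the kernel's <linux/list.h>) of why the _safe variant's extra cursor matters:

```c
#include <assert.h>
#include <stdlib.h>

struct node { int v; struct node *next; };

/* Build a singly-linked list of n nodes. */
static struct node *build(int n)
{
    struct node *head = NULL;
    while (n--) {
        struct node *p = malloc(sizeof(*p));
        p->v = n;
        p->next = head;
        head = p;
    }
    return head;
}

/* Free every node. The 'tmp' cursor is saved *before* free(), which is
 * exactly what list_for_each_entry_safe() adds over list_for_each_entry():
 * the plain iterator would read n->next out of just-freed memory to
 * advance, a use-after-free. Returns the number of nodes freed. */
static int free_all(struct node *head)
{
    int freed = 0;
    struct node *n, *tmp;
    for (n = head; n; n = tmp) {
        tmp = n->next;  /* grab the successor before freeing */
        free(n);
        freed++;
    }
    return freed;
}
```

The kernel's _safe macro does the same thing with a second "next entry" variable, so the loop body is free to unlink and release the current entry.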
Re: iov_iter_pipe warning.
On Fri, Apr 28, 2017 at 06:20:25PM +0100, Al Viro wrote:
 > On Fri, Apr 28, 2017 at 12:50:24PM -0400, Dave Jones wrote:
 >  > currently running v4.11-rc8-75-gf83246089ca0
 >  >
 >  > sunrpc bit is for the other unrelated problem I'm chasing.
 >  >
 >  > note also, I saw the backtrace without the fs/splice.c changes.
 >
 > Interesting... Could you add this and see if that triggers?
 >
 > diff --git a/fs/splice.c b/fs/splice.c
 > index 540c4a44756c..12a12d9c313f 100644
 > --- a/fs/splice.c
 > +++ b/fs/splice.c
 > @@ -306,6 +306,9 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 >  	kiocb.ki_pos = *ppos;
 >  	ret = call_read_iter(in, &kiocb, &to);
 >  	if (ret > 0) {
 > +		if (WARN_ON(iov_iter_count(&to) != len - ret))
 > +			printk(KERN_ERR "ops %p: was %zd, left %zd, returned %d\n",
 > +				in->f_op, len, iov_iter_count(&to), ret);
 >  		*ppos = kiocb.ki_pos;
 >  		file_accessed(in);
 >  	} else if (ret < 0) {

Hey Al,
Due to a git stash screw up on my part, I've had this leftover WARN_ON
in my tree for the last couple months. (That screw-up might turn out to be
serendipitous if this is a real bug..)

Today I decided to change things up and beat up on xfs for a change, and
was able to trigger this again.

Is this check no longer valid, or am I triggering the same bug we chased
down in nfs, but now in xfs? (None of the other detritus from that
debugging back in April made it, just those three lines above).
Dave WARNING: CPU: 1 PID: 18377 at fs/splice.c:309 generic_file_splice_read+0x3e4/0x430 CPU: 1 PID: 18377 Comm: trinity-c17 Not tainted 4.13.0-rc4-think+ #1 task: 88045d2855c0 task.stack: 88045ca28000 RIP: 0010:generic_file_splice_read+0x3e4/0x430 RSP: 0018:88045ca2f900 EFLAGS: 00010206 RAX: 001f RBX: 88045c36e200 RCX: RDX: 0fe1 RSI: dc00 RDI: 88045ca2f960 RBP: 88045ca2fa38 R08: 88046b26b880 R09: 001f R10: 88045ca2f540 R11: R12: 88045ca2f9b0 R13: 88045ca2fa10 R14: 11008b945f26 R15: 88045c36e228 FS: 7f5580594700() GS:88046b20() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f5580594698 CR3: 00045d3ef000 CR4: 001406e0 Call Trace: ? pipe_to_user+0xa0/0xa0 ? lockdep_init_map+0xb2/0x2b0 ? rw_verify_area+0x9d/0x150 do_splice_to+0xab/0xc0 splice_direct_to_actor+0x1ac/0x480 ? generic_pipe_buf_nosteal+0x10/0x10 ? do_splice_to+0xc0/0xc0 ? rw_verify_area+0x9d/0x150 do_splice_direct+0x1b9/0x230 ? splice_direct_to_actor+0x480/0x480 ? retint_kernel+0x10/0x10 ? rw_verify_area+0x9d/0x150 do_sendfile+0x428/0x840 ? do_compat_pwritev64+0xb0/0xb0 ? copy_user_generic_unrolled+0x83/0xb0 SyS_sendfile64+0xa4/0x120 ? SyS_sendfile+0x150/0x150 ? mark_held_locks+0x23/0xb0 ? do_syscall_64+0xc0/0x3e0 ? SyS_sendfile+0x150/0x150 do_syscall_64+0x1bc/0x3e0 ? syscall_return_slowpath+0x240/0x240 ? mark_held_locks+0x23/0xb0 ? return_from_SYSCALL_64+0x2d/0x7a ? trace_hardirqs_on_caller+0x182/0x260 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f557febf219 RSP: 002b:7ffc25086db8 EFLAGS: 0246 ORIG_RAX: 0028 RAX: ffda RBX: 0028 RCX: 7f557febf219 RDX: 7f557e559000 RSI: 0187 RDI: 0199 RBP: 7ffc25086e60 R08: 0100 R09: 6262 R10: 1000 R11: 0246 R12: 0002 R13: 7f5580516058 R14: 7f5580594698 R15: 7f5580516000 ---[ end trace e2f2217aba545e92 ]--- ops a09e4920: was 4096, left 0, returned 31 $ grep a09e4920 /proc/kallsyms a09e4920 r xfs_file_operations [xfs]
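The WARN_ON fires when a ->read_iter implementation returns one byte count but advances the iterator by a different amount; "was 4096, left 0, returned 31" means the iterator's capacity was fully consumed while only 31 bytes were reported read. A toy model of the accounting the check enforces (hypothetical names, not the kernel's struct iov_iter):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter: 'count' is the remaining capacity. */
struct toy_iter { size_t count; };

/* A well-behaved read: consumes exactly as many bytes as it reports. */
static long good_read(struct toy_iter *it, size_t avail)
{
    size_t got = avail < it->count ? avail : it->count;
    it->count -= got;
    return (long)got;
}

/* A buggy read in the style of the report above: drains the whole
 * iterator but claims a short count. */
static long buggy_read(struct toy_iter *it, size_t avail)
{
    (void)avail;
    it->count = 0;      /* "left 0" */
    return 31;          /* "returned 31" */
}

/* The invariant the debug patch in generic_file_splice_read() checks:
 * remaining capacity must equal initial capacity minus bytes reported. */
static int invariant_holds(size_t len, long ret, const struct toy_iter *it)
{
    return it->count == len - (size_t)ret;
}
```

If the invariant is violated, the pipe ends up holding more (or fewer) bytes than the splice machinery thinks it wrote, which matches the "asked to read 4096, claims to have read 31 / actual size of data in pipe 4096" debug output.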
use-after-free. [libata/block]
Found this in the logs this morning after an overnight fuzz run.. BUG: KASAN: use-after-free in __lock_acquire+0x1aa/0x1970 Read of size 8 at addr 880406805e30 by task trinity-c8/25954 CPU: 1 PID: 25954 Comm: trinity-c8 Not tainted 4.13.0-rc2-think+ #1 Call Trace: dump_stack+0x68/0xa1 print_address_description+0xd9/0x270 kasan_report+0x257/0x370 ? __lock_acquire+0x1aa/0x1970 __asan_load8+0x54/0x90 __lock_acquire+0x1aa/0x1970 ? save_stack+0xb1/0xd0 ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 ? try_to_wake_up+0x9b/0xa20 ? end_swap_bio_read+0xbe/0x1a0 ? debug_check_no_locks_freed+0x1b0/0x1b0 ? scsi_softirq_done+0x1a3/0x1d0 ? __blk_mq_complete_request+0x14a/0x2a0 ? blk_mq_complete_request+0x33/0x40 ? scsi_mq_done+0x4e/0x190 ? ata_scsi_qc_complete+0x15b/0x700 ? __ata_qc_complete+0x16d/0x2e0 ? ata_qc_complete+0x1a4/0x740 ? ata_qc_complete_multiple+0xeb/0x140 ? ahci_handle_port_interrupt+0x19e/0xa10 ? ahci_handle_port_intr+0xd9/0x130 ? ahci_single_level_irq_intr+0x62/0x90 ? __handle_irq_event_percpu+0x6e/0x450 ? handle_irq_event_percpu+0x70/0xf0 ? handle_irq_event+0x5a/0x90 ? handle_edge_irq+0xd9/0x2f0 ? handle_irq+0xb4/0x190 ? do_IRQ+0x67/0x140 ? common_interrupt+0x97/0x97 ? do_syscall_64+0x45/0x260 ? entry_SYSCALL64_slow_path+0x25/0x25 lock_acquire+0xfc/0x220 ? lock_acquire+0xfc/0x220 ? try_to_wake_up+0x9b/0xa20 _raw_spin_lock_irqsave+0x40/0x80 ? try_to_wake_up+0x9b/0xa20 try_to_wake_up+0x9b/0xa20 ? rcu_read_lock_sched_held+0x8f/0xa0 ? kmem_cache_free+0x2d3/0x300 ? migrate_swap_stop+0x3f0/0x3f0 ? mempool_free+0x5f/0xd0 wake_up_process+0x15/0x20 end_swap_bio_read+0xc6/0x1a0 bio_endio+0x12f/0x300 blk_update_request+0x12e/0x5c0 scsi_end_request+0x63/0x2f0 scsi_io_completion+0x3f3/0xa50 ? scsi_end_request+0x2f0/0x2f0 ? lock_downgrade+0x2c0/0x2c0 ? lock_acquire+0xfc/0x220 ? blk_stat_add+0x62/0x340 ? scsi_handle_queue_ramp_up+0x42/0x1e0 scsi_finish_command+0x1b1/0x220 scsi_softirq_done+0x1a3/0x1d0 __blk_mq_complete_request+0x14a/0x2a0 ? 
scsi_prep_state_check.isra.26+0xa0/0xa0 blk_mq_complete_request+0x33/0x40 scsi_mq_done+0x4e/0x190 ? scsi_prep_state_check.isra.26+0xa0/0xa0 ata_scsi_qc_complete+0x15b/0x700 ? lock_downgrade+0x2c0/0x2c0 ? msleep_interruptible+0xb0/0xb0 ? ata_scsi_activity_show+0xb0/0xb0 ? trace_hardirqs_off_caller+0x70/0x110 ? trace_hardirqs_off+0xd/0x10 ? _raw_spin_unlock_irqrestore+0x4b/0x50 ? intel_unmap+0x20b/0x300 ? intel_unmap_sg+0x9e/0xc0 __ata_qc_complete+0x16d/0x2e0 ? intel_unmap+0x300/0x300 ata_qc_complete+0x1a4/0x740 ata_qc_complete_multiple+0xeb/0x140 ahci_handle_port_interrupt+0x19e/0xa10 ? ahci_single_level_irq_intr+0x57/0x90 ahci_handle_port_intr+0xd9/0x130 ahci_single_level_irq_intr+0x62/0x90 ? ahci_handle_port_intr+0x130/0x130 __handle_irq_event_percpu+0x6e/0x450 handle_irq_event_percpu+0x70/0xf0 ? __handle_irq_event_percpu+0x450/0x450 ? lock_contended+0x810/0x810 ? handle_edge_irq+0x30/0x2f0 ? do_raw_spin_unlock+0x97/0x130 handle_irq_event+0x5a/0x90 handle_edge_irq+0xd9/0x2f0 handle_irq+0xb4/0x190 do_IRQ+0x67/0x140 common_interrupt+0x97/0x97 RIP: 0010:do_syscall_64+0x45/0x260 RSP: 0018:8803b2bd7f08 EFLAGS: 0246 ORIG_RAX: ff1e RAX: RBX: 8803b2bd7f58 RCX: 81146032 RDX: 0007 RSI: dc00 RDI: 88044e6c9cc0 RBP: 8803b2bd7f48 R08: 88046b21d1c0 R09: R10: R11: R12: 0023 R13: 88044e6c9cc0 R14: 8803b2bd7fd0 R15: cccd ? trace_hardirqs_on_caller+0x182/0x260 ? 
do_syscall_64+0x41/0x260 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f80c5932230 RSP: 002b:7fff521ac2e8 EFLAGS: 0246 ORIG_RAX: 0023 RAX: ffda RBX: 7f80c5ff6058 RCX: 7f80c5932230 RDX: RSI: RDI: 7fff521ac2f0 RBP: 6562 R08: 7f80c5c130a4 R09: 7f80c5c13120 R10: 0001 R11: 0246 R12: 005a R13: 7f80c5ff6058 R14: 004df61cd3a0 R15: cccd Allocated by task 14480: save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 kasan_slab_alloc+0x12/0x20 kmem_cache_alloc+0xe0/0x2f0 copy_process.part.44+0xbe0/0x2f90 _do_fork+0x173/0x8a0 SyS_clone+0x19/0x20 do_syscall_64+0xea/0x260 return_from_SYSCALL_64+0x0/0x7a Freed by task 0: save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x72/0xc0 kmem_cache_free+0xa8/0x300 free_task+0x69/0x70 __put_task_struct+0xdc/0x220 delayed_put_task_struct+0x59/0x1a0 rcu_process_callbacks+0x49a/0x1580 __do_softirq+0x109/0x5bc The buggy address belongs to the object at 8804068055c0 which belongs to the cache task_struct of size 6848 The buggy address is located 2160 bytes inside of 6848-byte region [8804068055c0, 880406807080) The buggy address belongs to the page: page:ea00101a count:1 mapcount:0 mapping: (null) index:0x0
Re: [PATCH] lib/strscpy: avoid KASAN false positive
On Wed, Jul 19, 2017 at 11:39:32AM -0400, Chris Metcalf wrote:
 >  > We could just remove all that word-at-a-time logic. Do we have any
 >  > evidence that this would harm anything?
 >
 > The word-at-a-time logic was part of the initial commit since I wanted
 > to ensure that strscpy could be used to replace strlcpy or strncpy without
 > serious concerns about performance.

I'm curious what the typical length of the strings we're concerned with
here is, and whether it's long enough for this to make a difference.

Dave
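For context, the word-at-a-time win comes from testing eight bytes per load for the terminating NUL with a bit trick instead of comparing byte by byte. A userspace sketch of that zero-byte test (the kernel's real helpers live in <asm/word-at-a-time.h> and additionally handle alignment and mask generation; reading a full word that straddles the end of the allocation is also what makes KASAN complain, hence this thread):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Nonzero iff any byte of x is 0x00: (x - 0x01..01) sets a byte's high
 * bit when that byte was zero, ~x keeps that bit only when the byte was
 * below 0x80, and the HIGHS mask discards everything else. */
static int has_zero_byte(uint64_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

/* strlen() scanning one 64-bit word per iteration. The caller must
 * guarantee the buffer is readable out to a word boundary past the NUL
 * (here the tests use a 16-byte buffer, so every load stays in bounds). */
static size_t wordwise_strlen(const char *s)
{
    size_t i = 0;
    for (;;) {
        uint64_t w;
        memcpy(&w, s + i, sizeof(w));   /* one 8-byte load */
        if (has_zero_byte(w)) {
            while (s[i])                /* locate the NUL inside the word */
                i++;
            return i;
        }
        i += sizeof(w);
    }
}
```

Whether the per-word setup cost pays off depends on exactly the question Dave asks: for the short strings most kernel strscpy() callers copy, the byte loop may be just as fast.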