On 5 December 2016 at 19:11, Andy Lutomirski <l...@kernel.org> wrote: > On Sun, Dec 4, 2016 at 3:04 PM, Vegard Nossum <vegard.nos...@gmail.com> wrote: >> On 23 November 2016 at 20:58, Dave Jones <da...@codemonkey.org.uk> wrote: >>> On Wed, Nov 23, 2016 at 02:34:19PM -0500, Dave Jones wrote: >>> >>> > [ 317.689216] BUG: Bad page state in process kworker/u8:8 pfn:4d8fd4 >>> > trace from just before this happened. Does this shed any light ? >>> > >>> > https://codemonkey.org.uk/junk/trace.txt >>> >>> crap, I just noticed the timestamps in the trace come from quite a bit >>> later. I'll tweak the code to do the taint checking/ftrace stop after >>> every syscall, that should narrow the window some more. >> >> FWIW I hit this as well: >> >> BUG: unable to handle kernel paging request at ffffffff81ff08b7 > > We really ought to improve this message. If nothing else, it should > say whether it was a read, a write, or an instruction fetch. > >> IP: [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30 >> PGD 461e067 PUD 461f063 >> PMD 1e001e1 > > Too lazy to manually decode this right now, but I don't think it matters. > >> Oops: 0003 [#1] PREEMPT SMP KASAN > > Note this is SMP, but that just means CONFIG_SMP=y. Vegard, how many > CPUs does your kernel think you have?
My first crash was running on a 1-CPU guest (not intentionally, but because of a badly configured qemu). I'm honestly surprised it triggered at all with 1 CPU, but I guess it shows that it's not a true concurrency issue at least! > >> RIP: 0010:[<ffffffff8135f2ea>] [<ffffffff8135f2ea>] >> __lock_acquire.isra.32+0xda/0x1a30 >> RSP: 0018:ffff8801bab8f730 EFLAGS: 00010082 >> RAX: ffffffff81ff071f RBX: 0000000000000000 RCX: 0000000000000000 > > RAX points to kernel text. Yes, it's somewhere in the middle of iov_iter_init() -- other crashes also had put_prev_entity(), a null pointer, and some garbage values I couldn't identify. > >> Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85 >> c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <3e> >> ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00 > > 2b: 3e ff 80 98 01 00 00 incl %ds:*0x198(%rax) <-- > trapping instruction > > That's very strange. I think this is: > > atomic_inc((atomic_t *)&class->ops); > > but my kernel contains: > > 3cb4: f0 ff 80 98 01 00 00 lock incl 0x198(%rax) > > So your kernel has been smp-alternatived. That 3e comes from > alternatives_smp_unlock. If you're running on SMP with UP > alternatives, things will break. Yes, indeed. It was running on 1 CPU by mistake and still triggered the bug. The crashes started really pouring in once I got my qemu fixed. Just to reassure you, here's another crash which shows it's using the correct instruction on an actual multicore guest: BUG: unable to handle kernel paging request at 00000003030001de IP: [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30 PGD 183fd2067 PUD 0 Oops: 0002 [#1] PREEMPT SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) CPU: 23 PID: 9584 Comm: trinity-c104 Not tainted 4.9.0-rc7+ #219 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff880189f68000 task.stack: ffff88017fe50000 RIP: 0010:[<ffffffff8135f2ea>] [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30 RSP: 0018:ffff88017fe575e0 EFLAGS: 00010002 RAX: 0000000303000046 RBX: 0000000000000000 RCX: 0000000000000000 [...] Code: 89 4d b8 44 89 45 c0 89 4d c8 4c 89 55 d0 e8 ee c3 ff ff 48 85 c0 4c 8b 55 d0 8b 4d c8 44 8b 45 c0 4c 8b 4d b8 0f 84 c6 01 00 00 <f0> ff 80 98 01 00 00 49 8d be 48 07 00 00 48 ba 00 00 00 00 00 RIP [<ffffffff8135f2ea>] __lock_acquire.isra.32+0xda/0x1a30 RSP <ffff88017fe575e0> CR2: 00000003030001de ---[ end trace 2846425104eb6141 ]--- Kernel panic - not syncing: Fatal exception ------------[ cut here ]------------ 2b:* f0 ff 80 98 01 00 00 lock incl 0x198(%rax) <-- trapping instruction > What's your kernel command line? Can we have your entire kernel log from > boot? Just in case you still want this, I've attached the boot log for the "true SMP" guest above. Vegard
5.txt.gz
Description: GNU Zip compressed data