On Mon, Nov 20, 2017 at 05:39:34PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 20, 2017 at 1:55 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> > On Mon, Nov 20, 2017 at 01:30:12PM -0800, Andy Lutomirski wrote:
> >> On Mon, Nov 20, 2017 at 1:27 PM, Josh Poimboeuf <jpoim...@redhat.com> wrote:
> >> > On Mon, Nov 20, 2017 at 01:07:16PM -0800, Andy Lutomirski wrote:
> >> >> >> but, more importantly, the OOPS unwinder will just bail without this
> >> >> >> patch. With the patch, we get a valid unwind, except that everything
> >> >> >> has a ? in front.
> >> >> >
> >> >> > Hm. I can't even fathom how that's possible. Are you talking about the
> >> >> > "unwind from NMI to SYSENTER stack" path? Or any unwind to a syscall?
> >> >> > Either way I'm baffled... If the unwinder only encounters the SYSENTER
> >> >> > stack at the end, how could that cause everything beforehand to have a
> >> >> > question mark?
> >> >>
> >> >> I mean that, if I put a ud2 or other bug in the code that runs on the
> >> >> SYSENTER stack, without this patch, I get a totally blank call trace.
> >> >
> >> > I would expect a blank call trace either way...
> >>
> >> Try making sync_regs use a few kB of stack space or, better yet, call
> >> a non-inlined function that uses too much stack.
> >
> > You mean overflow the exception stack? I still don't see how that would
> > do it.
> >
> > If you could show a specific example, with splats from before/after,
> > that would be helpful. Because I still have no idea how this patch
> > could possibly help.
>
> I added BUG() to sync_regs(). With the patch, I get:
>
> [ 4.211553] PANIC: double fault, error_code: 0x0
> [ 4.212113] CPU: 0 PID: 1 Comm: sh Not tainted 4.14.0+ #920
> [ 4.212741] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
> [ 4.213536] task: ffff88001aa18000 task.stack: ffff88001aa20000
> [ 4.214059] RIP: 0010:do_error_trap+0x33/0x1c0
> [ 4.214449] RSP: 0000:ffffffffff1b8f78 EFLAGS: 00010096
> [ 4.214934] RAX: dffffc0000000000 RBX: ffffffffff1b8f90 RCX: 0000000000000006
> [ 4.215554] RDX: ffffffff82048b20 RSI: 0000000000000000 RDI: ffffffffff1b9110
> [ 4.216176] RBP: ffffffffff1b9088 R08: 0000000000000004 R09: 0000000000000000
> [ 4.216793] R10: 0000000000000000 R11: fffffbffffe3723f R12: 0000000000000006
> [ 4.217419] R13: 0000000000000000 R14: 0000000000000004 R15: 0000000000000000
> [ 4.218046] FS: 0000000000000000(0000) GS:ffff88001ae00000(0000) knlGS:0000000000000000
> [ 4.218775] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 4.219280] CR2: ffffffffff1b8f68 CR3: 00000000193da002 CR4: 00000000003606f0
> [ 4.219931] Call Trace:
> [ 4.220156] <SYSENTER>
> [ 4.220383] ? async_page_fault+0x36/0x60
> [ 4.220768] ? invalid_op+0x22/0x40
> [ 4.221087] ? async_page_fault+0x36/0x60
> [ 4.221442] ? sync_regs+0x3c/0x40
> [ 4.221745] ? sync_regs+0x2e/0x40
> [ 4.222051] ? error_entry+0x6c/0xd0
> [ 4.222395] ? async_page_fault+0x36/0x60
> [ 4.222748] </SYSENTER>

Ah, page fault. I thought you were talking about an NMI. I get it now.

Did it overflow the stack? I think that would explain the question
marks.

-- 
Josh