On Wed, Aug 5, 2020 at 4:03 AM Jason A. Donenfeld <ja...@zx2c4.com> wrote:
>
> The commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap
> mappings") causes the OOPS below, in Linus' tree and in linux-next,
> unearthed by my CI on <https://www.wireguard.com/build-status/>.
> Bisecting reveals 8bb9bf242d1f, and reverting this makes the OOPS go
> away.

The oops happens early in the function, and the "Code:" line actually
gets almost the whole function prologue in it (missing first two bytes
are probably "push %r15"):

   0:  41 56           push   %r14
   2:  41 55           push   %r13
   4:  41 54           push   %r12
   6:  55              push   %rbp
   7:  48 89 f5        mov    %rsi,%rbp
   a:  53              push   %rbx
   b:  48 89 fb        mov    %rdi,%rbx
   e:  48 83 ec 08     sub    $0x8,%rsp
  12:  48 8b 06        mov    (%rsi),%rax
  15:  4c 8b 67 40     mov    0x40(%rdi),%r12
  19:  49 89 c6        mov    %rax,%r14
  1c:  45 30 f6        xor    %r14b,%r14b
  1f:  a8 04           test   $0x4,%al
  21:  b8 00 00 00 00  mov    $0x0,%eax
  26:  4c 0f 44 f0     cmove  %rax,%r14
  2a:* 49 8b 46 08     mov    0x8(%r14),%rax   <-- trapping instruction


> BUG: unable to handle page fault for address: ffffe8ffffd00608
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0

Yeah, missing page table because it wasn't copied.

Presumably because that kthread is using the active_mm of some random
user space process that didn't get sync'ed.

And the sync_global_pgds() may have ended up being sufficient
synchronization with whoever allocated things, even if it wasn't about
the TLB contents themselves.

So apparently the "the page-table pages are all pre-allocated now" is
simply not true. Joerg?

Unless somebody can figure this out fairly quickly, I think it should
just be reverted.

               Linus
