On Thu, Aug 6, 2020 at 12:23 PM Joerg Roedel <jroe...@suse.de> wrote:
>
> Yes, that's the best for now. My gut feeling is that the fault Jason is
> seeing didn't happen on a vmalloc address, but I can't prove that yet.

No, it's definitely fairly high in the vmalloc space. Look at the
faulting address:

    BUG: unable to handle page fault for address: ffffe8ffffd00608

and the code sequence is this:

>   12: 48 8b 06                mov    (%rsi),%rax
>   15: 4c 8b 67 40             mov    0x40(%rdi),%r12
>   19: 49 89 c6                mov    %rax,%r14
>   1c: 45 30 f6                xor    %r14b,%r14b
>   1f: a8 04                   test   $0x4,%al
>   21: b8 00 00 00 00          mov    $0x0,%eax
>   26: 4c 0f 44 f0             cmove  %rax,%r14

that admittedly odd sequence is get_work_pwq(work)

And then the faulting instruction is:

>   2a:* 49 8b 46 08             mov    0x8(%r14),%rax <-- trapping instruction

and this is the "->wq" dereference. So it's the pwq->wq that traps,
with 'pwq' being the trapping base pointer, and clearly being in the
vmalloc space.

I think pwq may be a percpu allocation, so not _directly_ vmalloc().

Adding Tejun to the cc in case he can clarify ("No, silly Linus, it's
allocated here..").

             Linus