On Mon, Jul 9, 2018 at 5:44 PM, Dave Hansen <dave.han...@intel.com> wrote:
> ...  cc'ing a few folks who I know have been looking at this code
> lately.  The full oops is below if any of you want to take a look.
>
> OK, well, annotating the disassembly a bit:
>
>> (gdb) disass free_pages_and_swap_cache
>> Dump of assembler code for function free_pages_and_swap_cache:
>>    0xffffffff8124c0d0 <+0>: callq  0xffffffff81a017a0 <__fentry__>
>>    0xffffffff8124c0d5 <+5>: push   %r14
>>    0xffffffff8124c0d7 <+7>: push   %r13
>>    0xffffffff8124c0d9 <+9>: push   %r12
>>    0xffffffff8124c0db <+11>: mov    %rdi,%r12         // %r12 = pages
>>    0xffffffff8124c0de <+14>: push   %rbp
>>    0xffffffff8124c0df <+15>: mov    %esi,%ebp         // %ebp = nr
>>    0xffffffff8124c0e1 <+17>: push   %rbx
>>    0xffffffff8124c0e2 <+18>: callq  0xffffffff81205a10 <lru_add_drain>
>>    0xffffffff8124c0e7 <+23>: test   %ebp,%ebp         // test nr==0
>>    0xffffffff8124c0e9 <+25>: jle    0xffffffff8124c156 <free_pages_and_swap_cache+134>
>>    0xffffffff8124c0eb <+27>: lea    -0x1(%rbp),%eax
>>    0xffffffff8124c0ee <+30>: mov    %r12,%rbx         // %rbx = pages
>>    0xffffffff8124c0f1 <+33>: lea    0x8(%r12,%rax,8),%r14 // load &pages[nr] into %r14?
>>    0xffffffff8124c0f6 <+38>: mov    (%rbx),%r13       // %r13 = pages[i]
>>    0xffffffff8124c0f9 <+41>: mov    0x20(%r13),%rdx   //<<<<<<<<<<<<<<<<<<<< GPF here.
> %r13 is 64-byte aligned, so looks like a halfway reasonable 'struct page *'.
>
> %R14 looks OK (0xffff93d4abb5f000) because it points to the end of a
> dynamically-allocated (not on-stack) mmu_gather_batch page.  %RBX is
> pointing 50 pages up from the start of the previous page.  That makes it
> the 48th page in pages[] after a pointer and two integers in the
> beginning of the structure.  That 48 is important because it's way
> larger than the on-stack size of 8.
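
If I'm following the arithmetic (my own numbers, with "base"/"batch" as
shorthand for that dynamically-allocated batch page, and assuming "50 up"
means 50 pointer-sized slots):

        base = 0xffff93d4abb5f000 - 4096   = 0xffff93d4abb5e000
        %rbx = base + 50 * 8               = base + 0x190
             = base + 16 + 48 * 8          = &batch->pages[48]
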
>
> It's hard to make much sense of %R13 (pages[48] / 0xfffbf0809e304bc0)
> because the vmemmap addresses get randomized.  But, I _think_ that's too
> high of an address for a 4-level paging vmemmap[] entry.  Does anybody
> else know offhand?
>
> I'd really want to see this reproduced without KASLR to make the oops
> easier to read.  It would also be handy to try your workload with all
> the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
> and see if it still triggers.

How can I turn them on at boot time?
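
My guess, assuming the debug options are already built in
(CONFIG_SLUB_DEBUG=y, CONFIG_DEBUG_PAGEALLOC=y), is something like this
on the kernel command line:

        slub_debug=FZPU debug_pagealloc=on nokaslr

KASAN looks like a compile-time-only option (CONFIG_KASAN=y), though, so
at least that one would need a rebuild.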

> Some relevant functions and structures below for reference.
>
> void free_pages_and_swap_cache(struct page **pages, int nr)
> {
>         int i;
>
>         for (i = 0; i < nr; i++)
>                 free_swap_cache(pages[i]);
> }
>
>
> static void tlb_flush_mmu_free(struct mmu_gather *tlb)
> {
>         struct mmu_gather_batch *batch;
>
>         for (batch = &tlb->local; batch && batch->nr;
>              batch = batch->next) {
>                 free_pages_and_swap_cache(batch->pages, batch->nr);
>         }
> }
>
> zap_pte_range()
> {
>         if (force_flush)
>                 tlb_flush_mmu_free(tlb);
> }
>
> ... all the way up to the on-stack-allocated mmu_gather:
>
> void zap_page_range(struct vm_area_struct *vma, unsigned long start,
>                 unsigned long size)
> {
>         struct mmu_gather tlb;
>
>
> #define MMU_GATHER_BUNDLE       8
>
> struct mmu_gather {
> ...
>         struct mmu_gather_batch local;
>         struct page             *__pages[MMU_GATHER_BUNDLE];
> };
>
> struct mmu_gather_batch {
>         struct mmu_gather_batch *next;
>         unsigned int            nr;
>         unsigned int            max;
>         struct page             *pages[0];
> };
>
> #define MAX_GATHER_BATCH        \
>         ((PAGE_SIZE - sizeof(struct mmu_gather_batch)) / sizeof(void *))
>
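For what it's worth, plugging 4K pages into that macro (my arithmetic):

        sizeof(struct mmu_gather_batch) = 8 + 4 + 4 = 16
        MAX_GATHER_BATCH = (4096 - 16) / 8 = 510
        &pages[510] = batch + 16 + 510 * 8 = batch + 4096

so a full dynamically-allocated batch ends exactly on a page boundary,
which lines up with %r14 above, and an index of 48 is well inside the
510 entries of such a batch (though far past the on-stack 8).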



-- 
H.J.
