On Mon, 21 Mar 2005 14:31:36 -0800 "Luck, Tony" <[EMAIL PROTECTED]> wrote:
> Builds clean and boots on ia64. > > I haven't tried any hugetlb operations on it though. It works on ia64 because it doesn't actually do anything in flush_tlb_pgtables(), I bet. Hugh, I know the exact trigger case, it's unmapping a VMA right before the stack segment. So the free_pgtables() call happens with this state: prev->vm_end == 0x70186000 next->vm_start == 0xefab8000 vma->vm_start == 0x70186000 vma->vm_end == 0x70188000 (so we're doing munmap(0x70186000, PAGE_SIZE), the sparc64 stack segment for 32-bit tasks grows down from 0xf0000000, the bottom of it is at 0xefab8000 at this point in time) So the free_pgtables() call will be with: floor == 0x70186000 ceiling == 0xefab8000 This should be fairly simple, so let's analyze exactly what happens: 1) vma == the munmap() call's VMA next == stack segment VMA, which sits right after "vma" addr == 0x70186000 (base of munmap() area) 2) VMA optimization loop runs: next->vm_start is 0xefab8000 vma->vm_end is 0x70188000 on sparc64 PMD_SIZE is 1UL << 23 or 0x800000 therefore vma->vm_end + (2 * PMD_SIZE) is 0x71188000 this is much less than 0xefab8000 so the loop terminates immediately Therefore, next is unchanged. 3) free_pgd_range() is invoked with: addr == 0x70186000 end == 0x70188000 floor == 0x70186000 ceiling == 0xefab8000 4) We mask addr with PMD_MASK (which is 0xffffffffff800000) This sets addr to 0x70000000, which makes it less than floor, therefore addr has PMD_SIZE added to it. Now, addr is 0x70800000, this is the source of the problems as this value determines the "start" argument passed to flush_tlb_pgtables(). Note how it is larger than "end". 5) We also mask ceiling with PMD_MASK. This sets ceiling to 0xef800000. Now addr is less than or equal to ceiling - 1 so we continue. 6) start is set to addr, which as stated is 0x70800000, the free_pud_range() loop is executed 7) start is 0x70800000 and end is 0x70188000 and here we have the problem that start > end, flush_tlb_pgtables() is called with the arguments like this and we trigger the aforementioned BUG(). This adjustment of addr relative to floor is very strange, it can advance "addr" (and thus "start") past the end of the VMA we are unmapping. In fact, it is miraculious that this free_pud_range() calling loop terminates properly! Actually, it is no mystery, since the next PGD address is the same for both the original and adjusted value of "addr". So the loop terminates after the first iteration. Anyways, there's the full analysis, what do you make of this Hugh? :-) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/