On Thu, Dec 13, 2007 at 11:47:29AM -0800, Christoph Lameter wrote:
> On Wed, 12 Dec 2007, Jeremy Fitzhardinge wrote:
>
> > I'm looking at unifying the various pgalloc+pgd_lists mechanisms between
> > 32-bit (PAE and non-PAE) and 64-bit, so I'm trying to understand why
> > these differences exist in the first place.
> >
> > Change da8f153e51290e7438ba7da66234a864e5d3e1c1 reverted the use of
> > quicklists for allocating pagetables, because of concerns about ordering
> > with respect to tlb flushes.
>
> These issues only exist with NUMA because of the freeing of off node pages
> before the TLB flush was done. There was a discussion about this issue and
> my fix of simply not freeing the offnode pages early was ignored. Instead
> the x86_64 implementation (which has no problems that I know of) was

The NUMA bug might not be the only problem. I think there are more issues,
as Linus noticed.

<snip>
Oh, and I see what's wrong: you not only switched "free_page()" to
"quicklist_free()", you *also* switched "tlb_remove_page()" to
"quicklist_free()".
</snip>

The above comment refers to the following portion of the code:

-#define __pte_free_tlb(tlb,pte) tlb_remove_page((tlb),(pte))
+#define __pte_free_tlb(tlb,pte) quicklist_free_page(QUICK_PT, NULL,(pte))

tlb_remove_page() was marking tlb->need_flush, which is later used by
tlb_flush_mmu(). With quicklist_free_page() we lose all that.

Now consider a corner case with a big munmap() that calls unmap_region().
If the region being unmapped happens to have page tables set up but with
all PTEs set to NULL, unmap_region() may free the page table pages without
flushing the TLBs.
[ tlb_finish_mmu() calls check_pgt_cache(), which trims the quicklist. ]
[ tlb_finish_mmu() also calls tlb_flush_mmu(), but that will not do much
  because need_flush is not set. ]

Similarly, Linus brought up preemption issues associated with quicklist
usage.

thanks,
suresh
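
For illustration, here is a minimal user-space sketch of the failure mode
described above. It is a toy model, not kernel code: struct toy_gather and
every toy_* function are hypothetical stand-ins that only mimic the
need_flush bookkeeping attributed above to tlb_remove_page(),
tlb_flush_mmu() and tlb_finish_mmu(), and the counters replace real page
freeing and real TLB flushes.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the gather state; only the bookkeeping that matters here. */
struct toy_gather {
    bool need_flush;      /* set when a queued page requires a TLB flush */
    int  tlb_flushes;     /* number of simulated TLB flushes */
    int  queued_pages;    /* pages waiting to be freed after the flush */
    int  quicklist_pages; /* pages parked on the simulated quicklist */
    int  freed_unflushed; /* pages released although no flush happened */
};

/* Models tlb_remove_page(): queue the page and record that a flush is needed. */
static void toy_tlb_remove_page(struct toy_gather *tlb)
{
    tlb->need_flush = true;
    tlb->queued_pages++;
}

/* Models quicklist_free_page(): park the page, need_flush stays untouched. */
static void toy_quicklist_free_page(struct toy_gather *tlb)
{
    tlb->quicklist_pages++;
}

/* Models tlb_flush_mmu(): does nothing unless need_flush was set. */
static void toy_tlb_flush_mmu(struct toy_gather *tlb)
{
    if (!tlb->need_flush)
        return;
    tlb->need_flush = false;
    tlb->tlb_flushes++;
    tlb->queued_pages = 0; /* queued pages may be freed after the flush */
}

/* Models tlb_finish_mmu(): flush, then trim the quicklist. */
static void toy_tlb_finish_mmu(struct toy_gather *tlb)
{
    toy_tlb_flush_mmu(tlb);
    assert(tlb->queued_pages == 0); /* nothing queued survives the finish */

    /* The trim releases parked pages whether or not a flush happened. */
    if (tlb->tlb_flushes == 0)
        tlb->freed_unflushed += tlb->quicklist_pages;
    tlb->quicklist_pages = 0;
}

/*
 * Unmap a region that has one page table page and no populated PTEs.
 * Returns how many page table pages were released without a TLB flush.
 */
int toy_unmap_empty_region(bool use_quicklist)
{
    struct toy_gather tlb = {0};

    if (use_quicklist)
        toy_quicklist_free_page(&tlb);   /* the "+" side of the diff */
    else
        toy_tlb_remove_page(&tlb);       /* the "-" side of the diff */

    toy_tlb_finish_mmu(&tlb);
    return tlb.freed_unflushed;
}
```

In this model, toy_unmap_empty_region(false) reports no unflushed frees,
while toy_unmap_empty_region(true) reports the page table page as released
with no flush, which is the corner case described above.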