On Wed, 2013-06-05 at 20:58 +0530, Aneesh Kumar K.V wrote: > This is the second patchset needed to support THP on ppc64. Some of > the changes > included in this series are tricky in that it changes the powerpc > linux page table > walk subtly. We also overload few of the pte flags for ptes at PMD > level (huge > page PTEs). > > The related mm/ changes are already merged to Andrew's -mm tree.
[Andrea, question for you near the end ] So I'm trying to understand how you handle races between hash_page and collapse. The generic collapse code does: _pmd = pmdp_clear_flush(vma, address, pmd); Which expects the architecture to essentially have stopped any concurrent walk by the time it returns. Your implementation of the above does this: pmd = *pmdp; pmd_clear(pmdp); /* * Now invalidate the hpte entries in the range * covered by pmd. This make sure we take a * fault and will find the pmd as none, which will * result in a major fault which takes mmap_sem and * hence wait for collapse to complete. Without this * the __collapse_huge_page_copy can result in copying * the old content. */ flush_tlb_pmd_range(vma->vm_mm, &pmd, address); So we clear the pmd after making a copy of it. This will eventually prevent a tablewalk but only eventually, once that store becomes visible to other processors, which may take a while. Then you proceed to flush the hash table for all the underlying PTEs. So at this point, hash_page might *still* see the old pmd. Unless I missed something, you did nothing that will prevent that (the only way to lock against hash_page is really an IPI & wait or to take the PTE's busy and make them !present or something). So as far as I can tell, a concurrent hash_page can still sneak into the hash some "small" entries after you have supposedly flushed them. In addition, my reading of __collapse_huge_page_isolate() is that it expects page_young() to be stable at that point, while because of the above, a concurrent hash_page() might still be setting _PAGE_ACCESSED (and _PAGE_DIRTY). So it might be that you have a sneaky way to perform the synchronization that I have missed :-) But so far I haven't seen it.... Also, a more general question from Andrea. Looking at the code, I was originally thinking that there is a similar race with dirty. But then I noticed that the collapse code doesn't look at dirty at all on the sub pages, it just ignores it. That stroke me as broken until I also noticed that you seem to always make the THPs dirty.... Is there a reason for that rather than harvesting dirty in the sub pages and making the THP's dirty the logical OR of the small pages one ? I understand that anonymous memory is often either zero or dirty, but I suppose it can be clear with content as well as a result of swap out and back in, no ? Or is there other reasons why THPs must be dirty always ? Cheers, Ben. _______________________________________________ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev