Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
Hi Oren, On Thu, May 15, 2014 at 10:05:05AM +0300, Oren Twaig wrote: > > On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > > I have dismissed this case, since I assume that there are many more > > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > > plus other related overhead (e.g. sched) than in updating the pte/pmd > > so I am not sure how possible it would be to hit this condition. > > Hi Anthony, > > I have a question about the above statement. What will happen with multi > cpu VMs ? won't the race described above can happen ? I.e one virtual CPU > can will visit the host and the other will continue to encounter your race ? I don't think there will be any race for the vcpu cases, since the sptes will be cleared via the mmu_notifier, so this should take care of it in the same manner as ptep_get_and_clear() does, as described by Dave earlier in this thread. Regards, Anthony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > I have dismissed this case, since I assume that there are many more > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > plus other related overhead (e.g. sched) than in updating the pte/pmd > so I am not sure how possible it would be to hit this condition. Hi Anthony, I have a question about the above statement. What will happen with multi cpu VMs ? won't the race described above can happen ? I.e one virtual CPU can will visit the host and the other will continue to encounter your race ? Thanks, Oren. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
On 05/15/2014 10:00 AM, Anthony Iliopoulos wrote: > I have actually also wondered about another related thing: > even when we actually do invalidate the page, there may still be a > race between a thread on a core that reads the page in some tight > loop (e.g. on a spinlock), and the page fault handler running on > a different core, at the point where the pte is set. Since we > invalidate the page via the TLB shootdowns *before* we update > the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() > as well as hugetlb_cow()), there may be some tiny window where the > thread might re-read the page before the pte is set. Don't forget about the "clear" part. ptep_clear_flush() does: pte = ptep_get_and_clear(mm, address, ptep); if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); so it makes the pte !present and guarantees that any other CPUs looking at it after the flush but before the set_pte() will also end up in the page fault handler, and they'll wait until the first fault has finished with the page tables. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote: > On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > > The invalidation is required in order to maintain proper semantics > > under CoW conditions. In scenarios where a process clones several > > threads, a thread operating on a core whose DTLB entry for a > > particular hugepage has not been invalidated, will be reading from > > the hugepage that belongs to the forked child process, even after > > hugetlb_cow(). > > > > The thread will not see the updated page as long as the stale DTLB > > entry remains cached, the thread attempts to write into the page, > > the child process exits, or the thread gets migrated to a different > > processor. > > No to be too nitpicky, but this applies to ITLB too, right? Quite true, this does apply to ITLB too. I suppose there might be cases like self-modifying code that touches the private text pages, or some JIT compiler writing on mmap-ed executable regions that would observe the same behavior under similar conditions. > I believe this bug came all the way back from: > > > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > > Author: David Gibson > > Date: Fri Jan 6 00:10:44 2006 -0800 > > > > [PATCH] Hugetlb: Copy on Write support > > It was probably the first time that we ever changed an _existing_ > hugetlbfs pte, and that patch probably just missed the TLB flush because > none of the other pte-setting hugetlb.c code needed TLB flushes. This seems to be the case, I assume that our internal use case was also probably the first that needed to rely on proper semantics under a multithreaded CoW scenario with hugetlb pages, so this came into the surface. I have actually also wondered about another related thing: even when we actually do invalidate the page, there may still be a race between a thread on a core that reads the page in some tight loop (e.g. on a spinlock), and the page fault handler running on a different core, at the point where the pte is set. Since we invalidate the page via the TLB shootdowns *before* we update the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() as well as hugetlb_cow()), there may be some tiny window where the thread might re-read the page before the pte is set. I have dismissed this case, since I assume that there are many more cycles spent in servicing the TLB invalidation IPI, walking the pgtable plus other related overhead (e.g. sched) than in updating the pte/pmd so I am not sure how possible it would be to hit this condition. I am also hesitant to simply submit a patch that reverses the order of flushing and setting the pte, due to the following commit: commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f Author: Siddha, Suresh B Date: Fri Sep 29 01:58:42 2006 -0700 [PATCH] mm: fix a race condition under SMC + COW > The bogus x86 version of huge_ptep_clear_flush() came from the s390 > guys, so double-shame on IBM! :P > > > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > > Author: Gerald Schaefer > > Date: Mon Apr 28 02:13:28 2008 -0700 > > > > hugetlbfs: add missing TLB flush to hugetlb_cow() > > This is probably an opportunity for all the other architecture > maintainers to make sure that they have proper copies of > huge_ptep_clear_flush(). > > I went through the hugetlb code on x86 and couldn't find another TLB > flush that fixes this issue, and I believe this is correct, so: > > Acked-by: Dave Hansen Many thanks for confirming, and for your prompt response. Regards, Anthony -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > The invalidation is required in order to maintain proper semantics > under CoW conditions. In scenarios where a process clones several > threads, a thread operating on a core whose DTLB entry for a > particular hugepage has not been invalidated, will be reading from > the hugepage that belongs to the forked child process, even after > hugetlb_cow(). > > The thread will not see the updated page as long as the stale DTLB > entry remains cached, the thread attempts to write into the page, > the child process exits, or the thread gets migrated to a different > processor. No to be too nitpicky, but this applies to ITLB too, right? I believe this bug came all the way back from: > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > Author: David Gibson > Date: Fri Jan 6 00:10:44 2006 -0800 > > [PATCH] Hugetlb: Copy on Write support It was probably the first time that we ever changed an _existing_ hugetlbfs pte, and that patch probably just missed the TLB flush because none of the other pte-setting hugetlb.c code needed TLB flushes. The bogus x86 version of huge_ptep_clear_flush() came from the s390 guys, so double-shame on IBM! :P > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > Author: Gerald Schaefer > Date: Mon Apr 28 02:13:28 2008 -0700 > > hugetlbfs: add missing TLB flush to hugetlb_cow() This is probably an opportunity for all the other architecture maintainers to make sure that they have proper copies of huge_ptep_clear_flush(). I went through the hugetlb code on x86 and couldn't find another TLB flush that fixes this issue, and I believe this is correct, so: Acked-by: Dave Hansen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos Suggested-by: Shay Goikhman --- arch/x86/include/asm/hugetlb.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h index a809121..68c0539 100644 --- a/arch/x86/include/asm/hugetlb.h +++ b/arch/x86/include/asm/hugetlb.h @@ -52,6 +52,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { + ptep_clear_flush(vma, addr, ptep); } static inline int huge_pte_none(pte_t pte) -- 1.8.1.2 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/