Re: [PATCHv2 4/4] mm: make compound_head() robust
On Tue, Aug 18, 2015 at 06:41:13PM +0200, Michal Hocko wrote: > On Tue 18-08-15 13:20:22, Michal Hocko wrote: > > On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote: > > > Hugh has pointed that compound_head() call can be unsafe in some > > > context. There's one example: > > > > > > CPU0CPU1 > > > > > > isolate_migratepages_block() > > > page_count() > > > compound_head() > > > !!PageTail() == true > > > put_page() > > > tail->first_page = NULL > > > head = tail->first_page > > > alloc_pages(__GFP_COMP) > > > prep_compound_page() > > >tail->first_page = head > > >__SetPageTail(p); > > > !!PageTail() == true > > > > > > > > > The race is pure theoretical. I don't it's possible to trigger it in > > > practice. But who knows. > > > > > > We can fix the race by changing how encode PageTail() and compound_head() > > > within struct page to be able to update them in one shot. > > > > > > The patch introduces page->compound_head into third double word block in > > > front of compound_dtor and compound_order. That means it shares storage > > > space with: > > > > > > - page->lru.next; > > > - page->next; > > > - page->rcu_head.next; > > > - page->pmd_huge_pte; > > > > > > That's too long list to be absolutely sure, but looks like nobody uses > > > bit 0 of the word. It can be used to encode PageTail(). And if the bit > > > set, rest of the word is pointer to head page. > > > > I didn't look too closely but the general idea makes sense to me and the > > overal code simplification is sound. I will give it more detailed review > > after I sort out other stuff. > > AFICS page::first_page wasn't used outside of compound page logic so you > should remove it in this patch. The rest looks good to me. I missed it by accident during rework for v2. Will fix. > Acked-by: Michal Hocko Thanks! -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2 4/4] mm: make compound_head() robust
On Tue 18-08-15 13:20:22, Michal Hocko wrote: > On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote: > > Hugh has pointed that compound_head() call can be unsafe in some > > context. There's one example: > > > > CPU0CPU1 > > > > isolate_migratepages_block() > > page_count() > > compound_head() > > !!PageTail() == true > > put_page() > > tail->first_page = NULL > > head = tail->first_page > > alloc_pages(__GFP_COMP) > >prep_compound_page() > > tail->first_page = head > > __SetPageTail(p); > > !!PageTail() == true > > > > > > The race is pure theoretical. I don't it's possible to trigger it in > > practice. But who knows. > > > > We can fix the race by changing how encode PageTail() and compound_head() > > within struct page to be able to update them in one shot. > > > > The patch introduces page->compound_head into third double word block in > > front of compound_dtor and compound_order. That means it shares storage > > space with: > > > > - page->lru.next; > > - page->next; > > - page->rcu_head.next; > > - page->pmd_huge_pte; > > > > That's too long list to be absolutely sure, but looks like nobody uses > > bit 0 of the word. It can be used to encode PageTail(). And if the bit > > set, rest of the word is pointer to head page. > > I didn't look too closely but the general idea makes sense to me and the > overal code simplification is sound. I will give it more detailed review > after I sort out other stuff. AFICS page::first_page wasn't used outside of compound page logic so you should remove it in this patch. The rest looks good to me. Acked-by: Michal Hocko > > Signed-off-by: Kirill A. Shutemov > > Cc: Hugh Dickins > > Cc: David Rientjes > > Cc: Vlastimil Babka > > Signed-off-by: Kirill A. Shutemov > > --- > > Documentation/vm/split_page_table_lock | 4 +- > > arch/xtensa/configs/iss_defconfig | 1 - > > include/linux/mm.h | 53 ++ > > include/linux/mm_types.h | 8 +++- > > include/linux/page-flags.h | 80 > > -- > > mm/Kconfig | 12 - > > mm/debug.c | 5 --- > > mm/huge_memory.c | 3 +- > > mm/hugetlb.c | 8 +--- > > mm/internal.h | 4 +- > > mm/memory-failure.c| 7 --- > > mm/page_alloc.c| 28 +++- > > mm/swap.c | 4 +- > > 13 files changed, 53 insertions(+), 164 deletions(-) > > > > diff --git a/Documentation/vm/split_page_table_lock > > b/Documentation/vm/split_page_table_lock > > index 6dea4fd5c961..62842a857dab 100644 > > --- a/Documentation/vm/split_page_table_lock > > +++ b/Documentation/vm/split_page_table_lock > > @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and > > pgtable_page_dtor(), > > which must be called on PTE table allocation / freeing. > > > > Make sure the architecture doesn't use slab allocator for page table > > -allocation: slab uses page->slab_cache and page->first_page for its pages. > > -These fields share storage with page->ptl. > > +allocation: slab uses page->slab_cache for its pages. > > +This field shares storage with page->ptl. > > > > PMD split lock only makes sense if you have more than two page table > > levels. > > diff --git a/arch/xtensa/configs/iss_defconfig > > b/arch/xtensa/configs/iss_defconfig > > index e4d193e7a300..5c7c385f21c4 100644 > > --- a/arch/xtensa/configs/iss_defconfig > > +++ b/arch/xtensa/configs/iss_defconfig > > @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y > > # CONFIG_SPARSEMEM_MANUAL is not set > > CONFIG_FLATMEM=y > > CONFIG_FLAT_NODE_MEM_MAP=y > > -CONFIG_PAGEFLAGS_EXTENDED=y > > CONFIG_SPLIT_PTLOCK_CPUS=4 > > # CONFIG_PHYS_ADDR_T_64BIT is not set > > CONFIG_ZONE_DMA_FLAG=1 > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 9c21bbb8875a..5090a0b9bb43 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct > > page *page, > > #endif > > } > > > > -static inline struct page *compound_head_by_tail(struct page *tail) > > -{ > > - struct page *head = tail->first_page; > > - > > - /* > > -* page->first_page may be a dangling pointer to an old > > -* compound page, so recheck that it is still a tail > > -* page before returning. > > -*/ > > - smp_rmb(); > > - if (likely(PageTail(tail))) > > - return head; > > - return tail; > > -} > > - > > -/* > > - * Since either compound page could be dismantled asynchronously in THP > > - *
Re: [PATCHv2 4/4] mm: make compound_head() robust
On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote: > Hugh has pointed that compound_head() call can be unsafe in some > context. There's one example: > > CPU0CPU1 > > isolate_migratepages_block() > page_count() > compound_head() > !!PageTail() == true > put_page() > tail->first_page = NULL > head = tail->first_page > alloc_pages(__GFP_COMP) > prep_compound_page() >tail->first_page = head >__SetPageTail(p); > !!PageTail() == true > > > The race is pure theoretical. I don't it's possible to trigger it in > practice. But who knows. > > We can fix the race by changing how encode PageTail() and compound_head() > within struct page to be able to update them in one shot. > > The patch introduces page->compound_head into third double word block in > front of compound_dtor and compound_order. That means it shares storage > space with: > > - page->lru.next; > - page->next; > - page->rcu_head.next; > - page->pmd_huge_pte; > > That's too long list to be absolutely sure, but looks like nobody uses > bit 0 of the word. It can be used to encode PageTail(). And if the bit > set, rest of the word is pointer to head page. I didn't look too closely but the general idea makes sense to me and the overal code simplification is sound. I will give it more detailed review after I sort out other stuff. > > Signed-off-by: Kirill A. Shutemov > Cc: Hugh Dickins > Cc: David Rientjes > Cc: Vlastimil Babka > Signed-off-by: Kirill A. Shutemov > --- > Documentation/vm/split_page_table_lock | 4 +- > arch/xtensa/configs/iss_defconfig | 1 - > include/linux/mm.h | 53 ++ > include/linux/mm_types.h | 8 +++- > include/linux/page-flags.h | 80 > -- > mm/Kconfig | 12 - > mm/debug.c | 5 --- > mm/huge_memory.c | 3 +- > mm/hugetlb.c | 8 +--- > mm/internal.h | 4 +- > mm/memory-failure.c| 7 --- > mm/page_alloc.c| 28 +++- > mm/swap.c | 4 +- > 13 files changed, 53 insertions(+), 164 deletions(-) > > diff --git a/Documentation/vm/split_page_table_lock > b/Documentation/vm/split_page_table_lock > index 6dea4fd5c961..62842a857dab 100644 > --- a/Documentation/vm/split_page_table_lock > +++ b/Documentation/vm/split_page_table_lock > @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and > pgtable_page_dtor(), > which must be called on PTE table allocation / freeing. > > Make sure the architecture doesn't use slab allocator for page table > -allocation: slab uses page->slab_cache and page->first_page for its pages. > -These fields share storage with page->ptl. > +allocation: slab uses page->slab_cache for its pages. > +This field shares storage with page->ptl. > > PMD split lock only makes sense if you have more than two page table > levels. > diff --git a/arch/xtensa/configs/iss_defconfig > b/arch/xtensa/configs/iss_defconfig > index e4d193e7a300..5c7c385f21c4 100644 > --- a/arch/xtensa/configs/iss_defconfig > +++ b/arch/xtensa/configs/iss_defconfig > @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y > # CONFIG_SPARSEMEM_MANUAL is not set > CONFIG_FLATMEM=y > CONFIG_FLAT_NODE_MEM_MAP=y > -CONFIG_PAGEFLAGS_EXTENDED=y > CONFIG_SPLIT_PTLOCK_CPUS=4 > # CONFIG_PHYS_ADDR_T_64BIT is not set > CONFIG_ZONE_DMA_FLAG=1 > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 9c21bbb8875a..5090a0b9bb43 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct > page *page, > #endif > } > > -static inline struct page *compound_head_by_tail(struct page *tail) > -{ > - struct page *head = tail->first_page; > - > - /* > - * page->first_page may be a dangling pointer to an old > - * compound page, so recheck that it is still a tail > - * page before returning. > - */ > - smp_rmb(); > - if (likely(PageTail(tail))) > - return head; > - return tail; > -} > - > -/* > - * Since either compound page could be dismantled asynchronously in THP > - * or we access asynchronously arbitrary positioned struct page, there > - * would be tail flag race. To handle this race, we should call > - * smp_rmb() before checking tail flag. compound_head_by_tail() did it. > - */ > -static inline struct page *compound_head(struct page *page) > -{ > - if (unlikely(PageTail(page))) > - return compound_head_by_tail(page); > - return page; > -} > - > -/*
Re: [PATCHv2 4/4] mm: make compound_head() robust
On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote: Hugh has pointed that compound_head() call can be unsafe in some context. There's one example: CPU0CPU1 isolate_migratepages_block() page_count() compound_head() !!PageTail() == true put_page() tail-first_page = NULL head = tail-first_page alloc_pages(__GFP_COMP) prep_compound_page() tail-first_page = head __SetPageTail(p); !!PageTail() == true head == NULL dereferencing The race is pure theoretical. I don't it's possible to trigger it in practice. But who knows. We can fix the race by changing how encode PageTail() and compound_head() within struct page to be able to update them in one shot. The patch introduces page-compound_head into third double word block in front of compound_dtor and compound_order. That means it shares storage space with: - page-lru.next; - page-next; - page-rcu_head.next; - page-pmd_huge_pte; That's too long list to be absolutely sure, but looks like nobody uses bit 0 of the word. It can be used to encode PageTail(). And if the bit set, rest of the word is pointer to head page. I didn't look too closely but the general idea makes sense to me and the overal code simplification is sound. I will give it more detailed review after I sort out other stuff. Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com Cc: Hugh Dickins hu...@google.com Cc: David Rientjes rient...@google.com Cc: Vlastimil Babka vba...@suse.cz Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com --- Documentation/vm/split_page_table_lock | 4 +- arch/xtensa/configs/iss_defconfig | 1 - include/linux/mm.h | 53 ++ include/linux/mm_types.h | 8 +++- include/linux/page-flags.h | 80 -- mm/Kconfig | 12 - mm/debug.c | 5 --- mm/huge_memory.c | 3 +- mm/hugetlb.c | 8 +--- mm/internal.h | 4 +- mm/memory-failure.c| 7 --- mm/page_alloc.c| 28 +++- mm/swap.c | 4 +- 13 files changed, 53 insertions(+), 164 deletions(-) diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock index 6dea4fd5c961..62842a857dab 100644 --- a/Documentation/vm/split_page_table_lock +++ b/Documentation/vm/split_page_table_lock @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(), which must be called on PTE table allocation / freeing. Make sure the architecture doesn't use slab allocator for page table -allocation: slab uses page-slab_cache and page-first_page for its pages. -These fields share storage with page-ptl. +allocation: slab uses page-slab_cache for its pages. +This field shares storage with page-ptl. PMD split lock only makes sense if you have more than two page table levels. diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig index e4d193e7a300..5c7c385f21c4 100644 --- a/arch/xtensa/configs/iss_defconfig +++ b/arch/xtensa/configs/iss_defconfig @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y -CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 # CONFIG_PHYS_ADDR_T_64BIT is not set CONFIG_ZONE_DMA_FLAG=1 diff --git a/include/linux/mm.h b/include/linux/mm.h index 9c21bbb8875a..5090a0b9bb43 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page, #endif } -static inline struct page *compound_head_by_tail(struct page *tail) -{ - struct page *head = tail-first_page; - - /* - * page-first_page may be a dangling pointer to an old - * compound page, so recheck that it is still a tail - * page before returning. - */ - smp_rmb(); - if (likely(PageTail(tail))) - return head; - return tail; -} - -/* - * Since either compound page could be dismantled asynchronously in THP - * or we access asynchronously arbitrary positioned struct page, there - * would be tail flag race. To handle this race, we should call - * smp_rmb() before checking tail flag. compound_head_by_tail() did it. - */ -static inline struct page *compound_head(struct page *page) -{ - if (unlikely(PageTail(page))) - return compound_head_by_tail(page); - return page; -} - -/* - *
Re: [PATCHv2 4/4] mm: make compound_head() robust
On Tue, Aug 18, 2015 at 06:41:13PM +0200, Michal Hocko wrote: On Tue 18-08-15 13:20:22, Michal Hocko wrote: On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote: Hugh has pointed that compound_head() call can be unsafe in some context. There's one example: CPU0CPU1 isolate_migratepages_block() page_count() compound_head() !!PageTail() == true put_page() tail-first_page = NULL head = tail-first_page alloc_pages(__GFP_COMP) prep_compound_page() tail-first_page = head __SetPageTail(p); !!PageTail() == true head == NULL dereferencing The race is pure theoretical. I don't it's possible to trigger it in practice. But who knows. We can fix the race by changing how encode PageTail() and compound_head() within struct page to be able to update them in one shot. The patch introduces page-compound_head into third double word block in front of compound_dtor and compound_order. That means it shares storage space with: - page-lru.next; - page-next; - page-rcu_head.next; - page-pmd_huge_pte; That's too long list to be absolutely sure, but looks like nobody uses bit 0 of the word. It can be used to encode PageTail(). And if the bit set, rest of the word is pointer to head page. I didn't look too closely but the general idea makes sense to me and the overal code simplification is sound. I will give it more detailed review after I sort out other stuff. AFICS page::first_page wasn't used outside of compound page logic so you should remove it in this patch. The rest looks good to me. I missed it by accident during rework for v2. Will fix. Acked-by: Michal Hocko mho...@suse.com Thanks! -- Kirill A. Shutemov -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCHv2 4/4] mm: make compound_head() robust
On Tue 18-08-15 13:20:22, Michal Hocko wrote: On Mon 17-08-15 18:09:05, Kirill A. Shutemov wrote: Hugh has pointed that compound_head() call can be unsafe in some context. There's one example: CPU0CPU1 isolate_migratepages_block() page_count() compound_head() !!PageTail() == true put_page() tail-first_page = NULL head = tail-first_page alloc_pages(__GFP_COMP) prep_compound_page() tail-first_page = head __SetPageTail(p); !!PageTail() == true head == NULL dereferencing The race is pure theoretical. I don't it's possible to trigger it in practice. But who knows. We can fix the race by changing how encode PageTail() and compound_head() within struct page to be able to update them in one shot. The patch introduces page-compound_head into third double word block in front of compound_dtor and compound_order. That means it shares storage space with: - page-lru.next; - page-next; - page-rcu_head.next; - page-pmd_huge_pte; That's too long list to be absolutely sure, but looks like nobody uses bit 0 of the word. It can be used to encode PageTail(). And if the bit set, rest of the word is pointer to head page. I didn't look too closely but the general idea makes sense to me and the overal code simplification is sound. I will give it more detailed review after I sort out other stuff. AFICS page::first_page wasn't used outside of compound page logic so you should remove it in this patch. The rest looks good to me. Acked-by: Michal Hocko mho...@suse.com Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com Cc: Hugh Dickins hu...@google.com Cc: David Rientjes rient...@google.com Cc: Vlastimil Babka vba...@suse.cz Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com --- Documentation/vm/split_page_table_lock | 4 +- arch/xtensa/configs/iss_defconfig | 1 - include/linux/mm.h | 53 ++ include/linux/mm_types.h | 8 +++- include/linux/page-flags.h | 80 -- mm/Kconfig | 12 - mm/debug.c | 5 --- mm/huge_memory.c | 3 +- mm/hugetlb.c | 8 +--- mm/internal.h | 4 +- mm/memory-failure.c| 7 --- mm/page_alloc.c| 28 +++- mm/swap.c | 4 +- 13 files changed, 53 insertions(+), 164 deletions(-) diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock index 6dea4fd5c961..62842a857dab 100644 --- a/Documentation/vm/split_page_table_lock +++ b/Documentation/vm/split_page_table_lock @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(), which must be called on PTE table allocation / freeing. Make sure the architecture doesn't use slab allocator for page table -allocation: slab uses page-slab_cache and page-first_page for its pages. -These fields share storage with page-ptl. +allocation: slab uses page-slab_cache for its pages. +This field shares storage with page-ptl. PMD split lock only makes sense if you have more than two page table levels. diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig index e4d193e7a300..5c7c385f21c4 100644 --- a/arch/xtensa/configs/iss_defconfig +++ b/arch/xtensa/configs/iss_defconfig @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y -CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 # CONFIG_PHYS_ADDR_T_64BIT is not set CONFIG_ZONE_DMA_FLAG=1 diff --git a/include/linux/mm.h b/include/linux/mm.h index 9c21bbb8875a..5090a0b9bb43 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page, #endif } -static inline struct page *compound_head_by_tail(struct page *tail) -{ - struct page *head = tail-first_page; - - /* -* page-first_page may be a dangling pointer to an old -* compound page, so recheck that it is still a tail -* page before returning. -*/ - smp_rmb(); - if (likely(PageTail(tail))) - return head; - return tail; -} - -/* - * Since either compound page could be dismantled asynchronously in THP - * or we access asynchronously arbitrary positioned struct page, there - * would be tail flag race. To
[PATCHv2 4/4] mm: make compound_head() robust
Hugh has pointed that compound_head() call can be unsafe in some context. There's one example: CPU0CPU1 isolate_migratepages_block() page_count() compound_head() !!PageTail() == true put_page() tail->first_page = NULL head = tail->first_page alloc_pages(__GFP_COMP) prep_compound_page() tail->first_page = head __SetPageTail(p); !!PageTail() == true The race is pure theoretical. I don't it's possible to trigger it in practice. But who knows. We can fix the race by changing how encode PageTail() and compound_head() within struct page to be able to update them in one shot. The patch introduces page->compound_head into third double word block in front of compound_dtor and compound_order. That means it shares storage space with: - page->lru.next; - page->next; - page->rcu_head.next; - page->pmd_huge_pte; That's too long list to be absolutely sure, but looks like nobody uses bit 0 of the word. It can be used to encode PageTail(). And if the bit set, rest of the word is pointer to head page. Signed-off-by: Kirill A. Shutemov Cc: Hugh Dickins Cc: David Rientjes Cc: Vlastimil Babka Signed-off-by: Kirill A. Shutemov --- Documentation/vm/split_page_table_lock | 4 +- arch/xtensa/configs/iss_defconfig | 1 - include/linux/mm.h | 53 ++ include/linux/mm_types.h | 8 +++- include/linux/page-flags.h | 80 -- mm/Kconfig | 12 - mm/debug.c | 5 --- mm/huge_memory.c | 3 +- mm/hugetlb.c | 8 +--- mm/internal.h | 4 +- mm/memory-failure.c| 7 --- mm/page_alloc.c| 28 +++- mm/swap.c | 4 +- 13 files changed, 53 insertions(+), 164 deletions(-) diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock index 6dea4fd5c961..62842a857dab 100644 --- a/Documentation/vm/split_page_table_lock +++ b/Documentation/vm/split_page_table_lock @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(), which must be called on PTE table allocation / freeing. Make sure the architecture doesn't use slab allocator for page table -allocation: slab uses page->slab_cache and page->first_page for its pages. -These fields share storage with page->ptl. +allocation: slab uses page->slab_cache for its pages. +This field shares storage with page->ptl. PMD split lock only makes sense if you have more than two page table levels. diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig index e4d193e7a300..5c7c385f21c4 100644 --- a/arch/xtensa/configs/iss_defconfig +++ b/arch/xtensa/configs/iss_defconfig @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y -CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 # CONFIG_PHYS_ADDR_T_64BIT is not set CONFIG_ZONE_DMA_FLAG=1 diff --git a/include/linux/mm.h b/include/linux/mm.h index 9c21bbb8875a..5090a0b9bb43 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page, #endif } -static inline struct page *compound_head_by_tail(struct page *tail) -{ - struct page *head = tail->first_page; - - /* -* page->first_page may be a dangling pointer to an old -* compound page, so recheck that it is still a tail -* page before returning. -*/ - smp_rmb(); - if (likely(PageTail(tail))) - return head; - return tail; -} - -/* - * Since either compound page could be dismantled asynchronously in THP - * or we access asynchronously arbitrary positioned struct page, there - * would be tail flag race. To handle this race, we should call - * smp_rmb() before checking tail flag. compound_head_by_tail() did it. - */ -static inline struct page *compound_head(struct page *page) -{ - if (unlikely(PageTail(page))) - return compound_head_by_tail(page); - return page; -} - -/* - * If we access compound page synchronously such as access to - * allocated page, there is no need to handle tail flag race, so we can - * check tail flag directly without any synchronization primitive. - */ -static inline struct page *compound_head_fast(struct page *page) -{ - if (unlikely(PageTail(page))) - return page->first_page; - return page; -} - /* * The atomic page->_mapcount, starts from -1: so that
[PATCHv2 4/4] mm: make compound_head() robust
Hugh has pointed that compound_head() call can be unsafe in some context. There's one example: CPU0CPU1 isolate_migratepages_block() page_count() compound_head() !!PageTail() == true put_page() tail-first_page = NULL head = tail-first_page alloc_pages(__GFP_COMP) prep_compound_page() tail-first_page = head __SetPageTail(p); !!PageTail() == true head == NULL dereferencing The race is pure theoretical. I don't it's possible to trigger it in practice. But who knows. We can fix the race by changing how encode PageTail() and compound_head() within struct page to be able to update them in one shot. The patch introduces page-compound_head into third double word block in front of compound_dtor and compound_order. That means it shares storage space with: - page-lru.next; - page-next; - page-rcu_head.next; - page-pmd_huge_pte; That's too long list to be absolutely sure, but looks like nobody uses bit 0 of the word. It can be used to encode PageTail(). And if the bit set, rest of the word is pointer to head page. Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com Cc: Hugh Dickins hu...@google.com Cc: David Rientjes rient...@google.com Cc: Vlastimil Babka vba...@suse.cz Signed-off-by: Kirill A. Shutemov kirill.shute...@linux.intel.com --- Documentation/vm/split_page_table_lock | 4 +- arch/xtensa/configs/iss_defconfig | 1 - include/linux/mm.h | 53 ++ include/linux/mm_types.h | 8 +++- include/linux/page-flags.h | 80 -- mm/Kconfig | 12 - mm/debug.c | 5 --- mm/huge_memory.c | 3 +- mm/hugetlb.c | 8 +--- mm/internal.h | 4 +- mm/memory-failure.c| 7 --- mm/page_alloc.c| 28 +++- mm/swap.c | 4 +- 13 files changed, 53 insertions(+), 164 deletions(-) diff --git a/Documentation/vm/split_page_table_lock b/Documentation/vm/split_page_table_lock index 6dea4fd5c961..62842a857dab 100644 --- a/Documentation/vm/split_page_table_lock +++ b/Documentation/vm/split_page_table_lock @@ -54,8 +54,8 @@ everything required is done by pgtable_page_ctor() and pgtable_page_dtor(), which must be called on PTE table allocation / freeing. Make sure the architecture doesn't use slab allocator for page table -allocation: slab uses page-slab_cache and page-first_page for its pages. -These fields share storage with page-ptl. +allocation: slab uses page-slab_cache for its pages. +This field shares storage with page-ptl. PMD split lock only makes sense if you have more than two page table levels. diff --git a/arch/xtensa/configs/iss_defconfig b/arch/xtensa/configs/iss_defconfig index e4d193e7a300..5c7c385f21c4 100644 --- a/arch/xtensa/configs/iss_defconfig +++ b/arch/xtensa/configs/iss_defconfig @@ -169,7 +169,6 @@ CONFIG_FLATMEM_MANUAL=y # CONFIG_SPARSEMEM_MANUAL is not set CONFIG_FLATMEM=y CONFIG_FLAT_NODE_MEM_MAP=y -CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 # CONFIG_PHYS_ADDR_T_64BIT is not set CONFIG_ZONE_DMA_FLAG=1 diff --git a/include/linux/mm.h b/include/linux/mm.h index 9c21bbb8875a..5090a0b9bb43 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -437,46 +437,6 @@ static inline void compound_unlock_irqrestore(struct page *page, #endif } -static inline struct page *compound_head_by_tail(struct page *tail) -{ - struct page *head = tail-first_page; - - /* -* page-first_page may be a dangling pointer to an old -* compound page, so recheck that it is still a tail -* page before returning. -*/ - smp_rmb(); - if (likely(PageTail(tail))) - return head; - return tail; -} - -/* - * Since either compound page could be dismantled asynchronously in THP - * or we access asynchronously arbitrary positioned struct page, there - * would be tail flag race. To handle this race, we should call - * smp_rmb() before checking tail flag. compound_head_by_tail() did it. - */ -static inline struct page *compound_head(struct page *page) -{ - if (unlikely(PageTail(page))) - return compound_head_by_tail(page); - return page; -} - -/* - * If we access compound page synchronously such as access to - * allocated page, there is no need to handle tail flag race, so we can - * check tail flag directly without any synchronization primitive. - */ -static inline struct page *compound_head_fast(struct page *page) -{ - if (unlikely(PageTail(page))) -