Re: [PATCH] mm: hwpoison: fix thp split handing in soft_offline_in_use_page()

2019-02-28 Thread zhong jiang
On 2019/3/1 15:29, Naoya Horiguchi wrote:
> On Tue, Feb 26, 2019 at 10:34:32PM +0800, zhong jiang wrote:
>> On 2019/2/26 21:51, Kirill A. Shutemov wrote:
>>> On Tue, Feb 26, 2019 at 07:18:00PM +0800, zhong jiang wrote:
>>>> From: zhongjiang 
>>>>
>>>> When soft_offline_in_use_page() runs on a thp tail page after pmd is plit,
>>> s/plit/split/
>>>
>>>> we trigger the following VM_BUG_ON_PAGE():
>>>>
>>>> Memory failure: 0x3755ff: non anonymous thp
>>>> __get_any_page: 0x3755ff: unknown zero refcount page type 2f8000
>>>> Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
>>>> page:ea000d360140 count:0 mapcount:0 mapping: index:0x1
>>>> flags: 0x2f8000()
>>>> raw: 002f8000 ea000d360108 ea000d360188 
>>>> raw: 0001   
>>>> page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
>>>> [ cut here ]
>>>> kernel BUG at ./include/linux/mm.h:519!
>>>>
>>>> soft_offline_in_use_page() passed refcount and page lock from tail page to
>>>> head page, which is not needed because we can pass any subpage to
>>>> split_huge_page().
>>> I don't see a description of what is going wrong and why the change will fix
>>> it. From the description, it appears to be a cosmetic-only change.
>>>
>>> Please elaborate.
>> When soft_offline_in_use_page() runs on a thp tail page after the pmd is split
>> and we pass the head page to split_huge_page(), the tail page can
>> unfortunately be freed, or its refcount can drop to zero.
> I guess you have a similar fix for memory_failure() in mind:
>
>   commit c3901e722b2975666f42748340df798114742d6d
>   Author: Naoya Horiguchi 
>   Date:   Thu Nov 10 10:46:23 2016 -0800
>   
>   mm: hwpoison: fix thp split handling in memory_failure()
>
> So it seems that I somehow missed fixing soft offline when I wrote commit
> c3901e722b29, and now you have found and fixed that. Thank you very much.
> If you resend the patch with the typo fixed, could you add a reference to
> c3901e722b29 in the patch description to show the linkage?
> And you can add the following tags:
Yep, I found that it is a similar issue, hence I referred to the description in
the patch you mentioned.

I will add the description you mentioned above in V2.

Thanks,
zhong jiang
> Fixes: 61f5d698cc97 ("mm: re-enable THP")
> Acked-by: Naoya Horiguchi 
>
> Thanks,
> Naoya Horiguchi
>
> .
>




Re: [PATCH] mm: hwpoison: fix thp split handing in soft_offline_in_use_page()

2019-02-28 Thread Naoya Horiguchi
On Tue, Feb 26, 2019 at 10:34:32PM +0800, zhong jiang wrote:
> On 2019/2/26 21:51, Kirill A. Shutemov wrote:
> > On Tue, Feb 26, 2019 at 07:18:00PM +0800, zhong jiang wrote:
> >> From: zhongjiang 
> >>
> >> When soft_offline_in_use_page() runs on a thp tail page after pmd is plit,
> > s/plit/split/
> >
> >> we trigger the following VM_BUG_ON_PAGE():
> >>
> >> Memory failure: 0x3755ff: non anonymous thp
> >> __get_any_page: 0x3755ff: unknown zero refcount page type 2f8000
> >> Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
> >> page:ea000d360140 count:0 mapcount:0 mapping: index:0x1
> >> flags: 0x2f8000()
> >> raw: 002f8000 ea000d360108 ea000d360188 
> >> raw: 0001   
> >> page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
> >> [ cut here ]
> >> kernel BUG at ./include/linux/mm.h:519!
> >>
> >> soft_offline_in_use_page() passed refcount and page lock from tail page to
> >> head page, which is not needed because we can pass any subpage to
> >> split_huge_page().
> > I don't see a description of what is going wrong and why the change will fix
> > it. From the description, it appears to be a cosmetic-only change.
> >
> > Please elaborate.
> When soft_offline_in_use_page() runs on a thp tail page after the pmd is split
> and we pass the head page to split_huge_page(), the tail page can
> unfortunately be freed, or its refcount can drop to zero.

I guess you have a similar fix for memory_failure() in mind:

  commit c3901e722b2975666f42748340df798114742d6d
  Author: Naoya Horiguchi 
  Date:   Thu Nov 10 10:46:23 2016 -0800
  
  mm: hwpoison: fix thp split handling in memory_failure()

So it seems that I somehow missed fixing soft offline when I wrote commit
c3901e722b29, and now you have found and fixed that. Thank you very much.
If you resend the patch with the typo fixed, could you add a reference to
c3901e722b29 in the patch description to show the linkage?
And you can add the following tags:

Fixes: 61f5d698cc97 ("mm: re-enable THP")
Acked-by: Naoya Horiguchi 

Thanks,
Naoya Horiguchi


Re: [PATCH] mm: hwpoison: fix thp split handing in soft_offline_in_use_page()

2019-02-26 Thread zhong jiang
On 2019/2/26 21:51, Kirill A. Shutemov wrote:
> On Tue, Feb 26, 2019 at 07:18:00PM +0800, zhong jiang wrote:
>> From: zhongjiang 
>>
>> When soft_offline_in_use_page() runs on a thp tail page after pmd is plit,
> s/plit/split/
>
>> we trigger the following VM_BUG_ON_PAGE():
>>
>> Memory failure: 0x3755ff: non anonymous thp
>> __get_any_page: 0x3755ff: unknown zero refcount page type 2f8000
>> Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
>> page:ea000d360140 count:0 mapcount:0 mapping: index:0x1
>> flags: 0x2f8000()
>> raw: 002f8000 ea000d360108 ea000d360188 
>> raw: 0001   
>> page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
>> [ cut here ]
>> kernel BUG at ./include/linux/mm.h:519!
>>
>> soft_offline_in_use_page() passed refcount and page lock from tail page to
>> head page, which is not needed because we can pass any subpage to
>> split_huge_page().
> I don't see a description of what is going wrong and why the change will fix
> it. From the description, it appears to be a cosmetic-only change.
>
> Please elaborate.
When soft_offline_in_use_page() runs on a thp tail page after the pmd is split
and we pass the head page to split_huge_page(), the tail page can
unfortunately be freed, or its refcount can drop to zero.
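
To make the race concrete, here is a rough userspace sketch. It is a toy
model, not kernel code: every toy_* name is made up for illustration, and it
assumes that a split leaves the caller's extra reference on whichever subpage
was passed in, while the other subpages keep only their mapping reference.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SUBPAGES 4	/* hypothetical; a real THP has many more */

/* Toy stand-in for struct page: it models nothing but the refcount. */
struct toy_page {
	int refcount;
};

/* Toy THP: sub[0] plays the head page, the others are tail pages. */
struct toy_thp {
	struct toy_page sub[TOY_NR_SUBPAGES];
	bool split;
};

/*
 * State from the report: the pmd is already split, so each subpage is
 * mapped by its own pte. While the page is still compound, all references
 * (one per pte plus the caller's pin) are accounted on the head.
 */
static void toy_thp_init(struct toy_thp *thp)
{
	int i;

	thp->split = false;
	for (i = 0; i < TOY_NR_SUBPAGES; i++)
		thp->sub[i].refcount = 0;
	thp->sub[0].refcount = TOY_NR_SUBPAGES + 1;
}

/*
 * Assumed split behaviour: every subpage ends up with its own mapping
 * reference, and the caller's pin stays on the subpage passed as 'keep'.
 */
static void toy_split(struct toy_thp *thp, int keep)
{
	int i;

	assert(!thp->split);
	assert(keep >= 0 && keep < TOY_NR_SUBPAGES);
	for (i = 0; i < TOY_NR_SUBPAGES; i++)
		thp->sub[i].refcount = 1;
	thp->sub[keep].refcount += 1;
	thp->split = true;
}

/* The owning process unmaps one subpage; at refcount 0 it would be freed. */
static void toy_unmap(struct toy_thp *thp, int idx)
{
	assert(thp->split && thp->sub[idx].refcount > 0);
	thp->sub[idx].refcount -= 1;
}

/*
 * Old flow: split via the head, then try to move the pin to the tail.
 * Returns the tail's refcount at the point where the old code called
 * get_hwpoison_page(page).
 */
static int toy_old_flow(int tail)
{
	struct toy_thp thp;

	toy_thp_init(&thp);
	toy_split(&thp, 0);		/* pin stays on the head */
	toy_unmap(&thp, tail);		/* racing unmap of the tail */
	return thp.sub[tail].refcount;
}

/* New flow: split via the tail itself, so the pin never leaves it. */
static int toy_new_flow(int tail)
{
	struct toy_thp thp;

	toy_thp_init(&thp);
	toy_split(&thp, tail);		/* pin stays on the tail */
	toy_unmap(&thp, tail);		/* same racing unmap */
	return thp.sub[tail].refcount;
}
```

In this model toy_old_flow(1) returns 0, i.e. the tail is already gone when
we try to take the reference, while toy_new_flow(1) returns 1 because the pin
was never handed to the head.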

Thanks,
zhong jiang





Re: [PATCH] mm: hwpoison: fix thp split handing in soft_offline_in_use_page()

2019-02-26 Thread Kirill A. Shutemov
On Tue, Feb 26, 2019 at 07:18:00PM +0800, zhong jiang wrote:
> From: zhongjiang 
> 
> When soft_offline_in_use_page() runs on a thp tail page after pmd is plit,

s/plit/split/

> we trigger the following VM_BUG_ON_PAGE():
> 
> Memory failure: 0x3755ff: non anonymous thp
> __get_any_page: 0x3755ff: unknown zero refcount page type 2f8000
> Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
> page:ea000d360140 count:0 mapcount:0 mapping: index:0x1
> flags: 0x2f8000()
> raw: 002f8000 ea000d360108 ea000d360188 
> raw: 0001   
> page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
> [ cut here ]
> kernel BUG at ./include/linux/mm.h:519!
> 
> soft_offline_in_use_page() passed refcount and page lock from tail page to
> head page, which is not needed because we can pass any subpage to
> split_huge_page().

I don't see a description of what is going wrong and why the change will fix
it. From the description, it appears to be a cosmetic-only change.

Please elaborate.

-- 
 Kirill A. Shutemov


Re: [PATCH] mm: hwpoison: fix thp split handing in soft_offline_in_use_page()

2019-02-26 Thread Michal Hocko
[Cc Kirill for the THP side]

On Tue 26-02-19 19:18:00, zhong jiang wrote:
> From: zhongjiang 
> 
> When soft_offline_in_use_page() runs on a thp tail page after pmd is plit,
> we trigger the following VM_BUG_ON_PAGE():
> 
> Memory failure: 0x3755ff: non anonymous thp
> __get_any_page: 0x3755ff: unknown zero refcount page type 2f8000
> Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
> page:ea000d360140 count:0 mapcount:0 mapping: index:0x1
> flags: 0x2f8000()
> raw: 002f8000 ea000d360108 ea000d360188 
> raw: 0001   
> page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
> [ cut here ]
> kernel BUG at ./include/linux/mm.h:519!
> 
> soft_offline_in_use_page() passed refcount and page lock from tail page to
> head page, which is not needed because we can pass any subpage to
> split_huge_page().
> 
> Cc: [4.5+]
> Signed-off-by: zhongjiang 
> ---
>  mm/memory-failure.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index d9b8a24..6edc6db 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1823,19 +1823,17 @@ static int soft_offline_in_use_page(struct page *page, int flags)
>  	struct page *hpage = compound_head(page);
>  
>  	if (!PageHuge(page) && PageTransHuge(hpage)) {
> -		lock_page(hpage);
> -		if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) {
> -			unlock_page(hpage);
> -			if (!PageAnon(hpage))
> +		lock_page(page);
> +		if (!PageAnon(page) || unlikely(split_huge_page(page))) {
> +			unlock_page(page);
> +			if (!PageAnon(page))
>  				pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
>  			else
>  				pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
> -			put_hwpoison_page(hpage);
> +			put_hwpoison_page(page);
>  			return -EBUSY;
>  		}
> -		unlock_page(hpage);
> -		get_hwpoison_page(page);
> -		put_hwpoison_page(hpage);
> +		unlock_page(page);
>  	}
>  
>  	/*
> -- 
> 1.7.12.4
> 

-- 
Michal Hocko
SUSE Labs


[PATCH] mm: hwpoison: fix thp split handing in soft_offline_in_use_page()

2019-02-26 Thread zhong jiang
From: zhongjiang 

When soft_offline_in_use_page() runs on a thp tail page after pmd is plit,
we trigger the following VM_BUG_ON_PAGE():

Memory failure: 0x3755ff: non anonymous thp
__get_any_page: 0x3755ff: unknown zero refcount page type 2f8000
Soft offlining pfn 0x34d805 at process virtual address 0x20fff000
page:ea000d360140 count:0 mapcount:0 mapping: index:0x1
flags: 0x2f8000()
raw: 002f8000 ea000d360108 ea000d360188 
raw: 0001   
page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
[ cut here ]
kernel BUG at ./include/linux/mm.h:519!

soft_offline_in_use_page() passed refcount and page lock from tail page to
head page, which is not needed because we can pass any subpage to
split_huge_page().

Cc: [4.5+]
Signed-off-by: zhongjiang 
---
 mm/memory-failure.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d9b8a24..6edc6db 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1823,19 +1823,17 @@ static int soft_offline_in_use_page(struct page *page, int flags)
 	struct page *hpage = compound_head(page);
 
 	if (!PageHuge(page) && PageTransHuge(hpage)) {
-		lock_page(hpage);
-		if (!PageAnon(hpage) || unlikely(split_huge_page(hpage))) {
-			unlock_page(hpage);
-			if (!PageAnon(hpage))
+		lock_page(page);
+		if (!PageAnon(page) || unlikely(split_huge_page(page))) {
+			unlock_page(page);
+			if (!PageAnon(page))
 				pr_info("soft offline: %#lx: non anonymous thp\n", page_to_pfn(page));
 			else
 				pr_info("soft offline: %#lx: thp split failed\n", page_to_pfn(page));
-			put_hwpoison_page(hpage);
+			put_hwpoison_page(page);
 			return -EBUSY;
 		}
-		unlock_page(hpage);
-		get_hwpoison_page(page);
-		put_hwpoison_page(hpage);
+		unlock_page(page);
 	}
 
 	/*
-- 
1.7.12.4