On Fri, Apr 05, 2013 at 02:59:43PM -0400, KOSAKI Motohiro wrote:
> (4/3/13 2:35 PM), Naoya Horiguchi wrote:
> > After applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
> > initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
> > error happens on a hugepage and the affected processes try to access
> > the error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0)
> > in get_page().
> >
> > The reason for this bug is that coredump-related code doesn't recognise
> > the "hugepage hwpoison entry", with which a pmd entry is replaced when a
> > memory error occurs on a hugepage. In other words, physical address
> > information is stored in a different bit layout between a hugepage
> > hwpoison entry and a pmd entry, so follow_hugetlb_page(), which is
> > called in get_dump_page(), returns a wrong page for a given address.
> >
> > We need to filter out only hwpoisoned hugepages, so that we still have
> > data on healthy hugepages in the coredump. So this patch makes
> > follow_hugetlb_page() avoid trying to get the page when the pmd is in
> > swap-entry-like format.
> >
> > ChangeLog v3:
> > - add comment about using is_swap_pte()
> >
> > Signed-off-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
> > Reviewed-by: Michal Hocko <mho...@suse.cz>
> > Acked-by: Konstantin Khlebnikov <khlebni...@openvz.org>
> > Cc: sta...@vger.kernel.org
> > ---
> >  mm/hugetlb.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
> > index 0d1705b..3bc20bd 100644
> > --- v3.9-rc3.orig/mm/hugetlb.c
> > +++ v3.9-rc3/mm/hugetlb.c
> > @@ -2966,9 +2966,15 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> >  		 * Some archs (sparc64, sh*) have multiple pte_ts to
> >  		 * each hugepage. We have to make sure we get the
> >  		 * first, for the page indexing below to work.
> > +		 *
> > +		 * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
> > +		 * and hugepages under migration, in which case
> > +		 * hugetlb_fault waits for the migration and bails out
> > +		 * properly for HWPoisoned pages.
> >  		 */
> >  		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
> > -		absent = !pte || huge_pte_none(huge_ptep_get(pte));
> > +		absent = !pte || huge_pte_none(huge_ptep_get(pte)) ||
> > +			is_swap_pte(huge_ptep_get(pte));
>
> Hmmm...
>
> Now absent has two meanings: 1) skip hugetlb_fault() and return
> immediately if FOLL_DUMP is used; 2) call hugetlb_fault() if page
> population or COW is needed.
>
> The description of this patch only explains (2), and I'm not convinced
> why we don't need to dump pages under migration.
We can and should dump hugepages under migration, and to do that we have to
move the is_swap_pte() test into the condition that decides whether to call
hugetlb_fault(). I updated the patch as below.

# I dropped the Reviewed-by and Acked-by tags given for the previous
# version, because this is a non-minor change. If you want to restore
# them, please let me know.

Thanks,
Naoya
------------------------------------------------------------------------
From: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Date: Thu, 28 Mar 2013 10:17:38 -0400
Subject: [PATCH v4] hugetlbfs: add swap entry check in follow_hugetlb_page()

After applying the previous patch "hugetlbfs: stop setting VM_DONTDUMP in
initializing vma(VM_HUGETLB)" to reenable hugepage coredump, if a memory
error happens on a hugepage and the affected processes try to access the
error hugepage, we hit VM_BUG_ON(atomic_read(&page->_count) <= 0) in
get_page().

The reason for this bug is that coredump-related code doesn't recognise
the "hugepage hwpoison entry", with which a pmd entry is replaced when a
memory error occurs on a hugepage. In other words, physical address
information is stored in a different bit layout between a hugepage hwpoison
entry and a pmd entry, so follow_hugetlb_page(), which is called in
get_dump_page(), returns a wrong page for a given address.

The expected behavior is like this:

  absent  is_swap_pte  FOLL_DUMP  Expected behavior
  -------------------------------------------------------------
  true    false        false      hugetlb_fault
  false   true         false      hugetlb_fault
  false   false        false      return page
  true    false        true       skip page (to avoid allocation)
  false   true         true       hugetlb_fault
  false   false        true       return page

With this patch, we can call hugetlb_fault() and take proper actions (we
wait for migration entries and fail with VM_FAULT_HWPOISON_LARGE for
hwpoisoned entries), and as a result we can dump all hugepages except for
hwpoisoned ones.

ChangeLog v4:
- move the is_swap_pte() check to the right place

ChangeLog v3:
- add comment about using is_swap_pte()

Signed-off-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
Cc: sta...@vger.kernel.org
---
 mm/hugetlb.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0d1705b..f155e59 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2983,7 +2983,13 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			break;
 		}
 
-		if (absent ||
+		/*
+		 * is_swap_pte test covers both is_hugetlb_entry_hwpoisoned
+		 * and hugepages under migration, in which case
+		 * hugetlb_fault waits for the migration and bails out
+		 * properly for HWPoisoned pages.
+		 */
+		if (absent || is_swap_pte(huge_ptep_get(pte)) ||
 		    ((flags & FOLL_WRITE) && !pte_write(huge_ptep_get(pte)))) {
 			int ret;
-- 
1.7.11.7
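For context on why a single is_swap_pte() test is enough to cover both the
hwpoison and the migration case: a swap-format pte is simply one that is
mapped but not present, so it matches real swap entries, migration entries,
and hwpoison entries alike, and hugetlb_fault() then tells those cases
apart. The helpers below are a rough sketch of the v3.9-era definitions
(paraphrased from include/linux/swapops.h and mm/hugetlb.c, condensed
rather than verbatim; check your tree for the exact code):

/* A pte in swap-entry format: mapped, but not pointing at a present
 * page.  The pte_file() exclusion covers the old non-linear file
 * mappings, which were removed in later kernels. */
static inline int is_swap_pte(pte_t pte)
{
	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
}

/* Migration and hwpoison entries are both "non-swap" swap entries,
 * distinguished by the type bits of the encoded entry. */
static int is_hugetlb_entry_migration(pte_t pte)
{
	swp_entry_t swp;

	if (huge_pte_none(pte) || pte_present(pte))
		return 0;
	swp = pte_to_swp_entry(pte);
	return non_swap_entry(swp) && is_migration_entry(swp);
}

static int is_hugetlb_entry_hwpoisoned(pte_t pte)
{
	swp_entry_t swp;

	if (huge_pte_none(pte) || pte_present(pte))
		return 0;
	swp = pte_to_swp_entry(pte);
	return non_swap_entry(swp) && is_hwpoison_entry(swp);
}

/* Early in hugetlb_fault() (condensed): both cases are handled before
 * any page reference is taken, which is why follow_hugetlb_page() can
 * simply punt swap-format entries to hugetlb_fault(). */
	entry = huge_ptep_get(ptep);
	if (unlikely(is_hugetlb_entry_migration(entry))) {
		/* wait for the migration to finish, then let the
		 * caller retry the access */
		migration_entry_wait(mm, (pmd_t *)ptep, address);
		return 0;
	} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
		/* fail the access instead of touching poisoned memory */
		return VM_FAULT_HWPOISON_LARGE |
			VM_FAULT_SET_HINDEX(hstate_index(h));

This is why the table above shows hugetlb_fault for every is_swap_pte row:
a migration entry makes the dumper wait and retry, while a hwpoison entry
fails the access, so only the poisoned hugepage is skipped in the core file.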