On 9/20/19 9:21 PM, Qiujun Huang wrote:
__get_user_pages_fast tries to walk the page table, but the
hugepage PTE has been replaced by a hwpoison swap entry by the MCA path.
...


Can you describe this in more detail? I guess you are facing an issue with a PUD-level entry that hwpoison updated to a swap entry. Since we don't specifically check pud_present(), we walk the page table with wrong values, and that results in corruption?


[15798.177437] mce: Uncorrected hardware memory error in
                                user-access at 224f1761c0
[15798.180171] MCE 0x224f176: Killing pal_main:6784 due to
                                hardware memory corruption
[15798.180176] MCE 0x224f176: Killing qemu-system-x86:167336
                                due to hardware memory corruption
...
[15798.180206] BUG: unable to handle kernel
[15798.180226] paging request at ffff891200003000
[15798.180236] IP: [<ffffffff8106edae>] gup_pud_range+
                                0x13e/0x1e0
...

We need to skip the hwpoison entry in gup_pud_range.

Signed-off-by: Qiujun Huang <hqjag...@gmail.com>
---
  mm/gup.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab..6157ed9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2230,6 +2230,8 @@ static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
                next = pud_addr_end(addr, end);
                if (pud_none(pud))
                        return 0;
+               if (unlikely(!pud_present(pud)))
+                       return 0;


You should be able to remove that if (pud_none(pud)) check and just keep the pud_present() check?
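
To make the reasoning concrete, here is a minimal user-space sketch of the three variants under discussion. It is a model, not kernel code: the type model_pud_t, the bit MODEL_PAGE_PRESENT, the helper names, and the entry values are all hypothetical, and real pud_none()/pud_present() definitions are architecture-specific. The assumption is that an empty entry is all zeroes and that a hwpoison swap entry is non-zero with the present bit clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of a PUD entry; not the kernel's pud_t. */
typedef struct { unsigned long val; } model_pud_t;

/* Assumed layout: bit 0 marks a present entry (illustrative only). */
#define MODEL_PAGE_PRESENT 0x1UL

static bool model_pud_none(model_pud_t pud)
{
	return pud.val == 0;
}

static bool model_pud_present(model_pud_t pud)
{
	return (pud.val & MODEL_PAGE_PRESENT) != 0;
}

/* Before the patch: only empty entries are skipped. */
static bool walk_before_patch(model_pud_t pud)
{
	return !model_pud_none(pud);
}

/* The patch as posted: skip empty entries, then non-present ones. */
static bool walk_with_patch(model_pud_t pud)
{
	if (model_pud_none(pud))
		return false;
	if (!model_pud_present(pud))
		return false;
	return true;
}

/* The suggested simplification: a single present check. */
static bool walk_present_only(model_pud_t pud)
{
	return model_pud_present(pud);
}

/* Compare the variants on an empty, a swap-like, and a mapped entry. */
void check_model(void)
{
	const model_pud_t empty = { 0x0UL };
	const model_pud_t swap_like = { 0x1000UL }; /* arbitrary, present bit clear */
	const model_pud_t mapped = { 0x1000UL | MODEL_PAGE_PRESENT };

	/* Without a present check, the swap-like entry would be walked. */
	assert(walk_before_patch(swap_like));

	/* Both fixed variants skip it and agree on every case. */
	assert(!walk_with_patch(empty) && !walk_present_only(empty));
	assert(!walk_with_patch(swap_like) && !walk_present_only(swap_like));
	assert(walk_with_patch(mapped) && walk_present_only(mapped));
}
```

In this model an all-zero entry never has the present bit set, so the pud_none() test is redundant once the pud_present() test is in place. Whether that holds on every architecture is something to confirm against the actual definitions.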

                if (unlikely(pud_huge(pud))) {
                        if (!gup_huge_pud(pud, pudp, addr, next, flags,
                                          pages, nr))

