On Mon, Aug 03, 2020 at 11:19:09AM -0400, Qian Cai wrote:
> On Mon, Aug 03, 2020 at 01:36:58PM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> > Hello,
> > 
> > On Mon, Aug 03, 2020 at 08:39:55AM -0400, Qian Cai wrote:
> > > On Fri, Jul 31, 2020 at 12:20:56PM +0000, nao.horigu...@gmail.com wrote:
> > > > This patchset is the latest version of soft offline rework patchset
> > > > targetted for v5.9.
> > > > 
> > > > Main focus of this series is to stabilize soft offline.  Historically 
> > > > soft
> > > > offlined pages have suffered from racy conditions because PageHWPoison 
> > > > is
> > > > used to a little too aggressively, which (directly or indirectly) 
> > > > invades
> > > > other mm code which cares little about hwpoison.  This results in 
> > > > unexpected
> > > > behavior or kernel panic, which is very far from soft offline's "do not
> > > > disturb userspace or other kernel component" policy.
> > > > 
> > > > Main point of this change set is to contain target page "via buddy 
> > > > allocator",
> > > > where we first free the target page as we do for normal pages, and 
> > > > remove
> > > > from buddy only when we confirm that it reaches free list. There is 
> > > > surely
> > > > race window of page allocation, but that's fine because someone really 
> > > > want
> > > > that page and the page is still working, so soft offline can happily 
> > > > give up.
> > > > 
> > > > v4 from Oscar tries to handle the race around reallocation, but that 
> > > > part
> > > > seems still work in progress, so I decide to separate it for changes 
> > > > into
> > > > v5.9.  Thank you for your contribution, Oscar.
> > > > 
> > > > The issue reported by Qian Cai is fixed by patch 16/16.
> > > 
> > > I am still getting EIO everywhere on next-20200803 (which includes this 
> > > v5).
> > > 
> > > # ./random 1
> > > - start: migrate_huge_offline
> > > - use NUMA nodes 0,8.
> > > - mmap and free 8388608 bytes hugepages on node 0
> > > - mmap and free 8388608 bytes hugepages on node 8
> > > madvise: Input/output error
> > > 
> > > From the serial console,
> > > 
> > > [  637.164222][ T8357] soft offline: 0x118ee0: hugepage isolation failed: 
> > > 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head)
> > > [  637.164890][ T8357] Soft offlining pfn 0x20001380 at process virtual 
> > > address 0x7fff9f000000
> > > [  637.165422][ T8357] Soft offlining pfn 0x3ba00 at process virtual 
> > > address 0x7fff9f200000
> > > [  637.166409][ T8357] Soft offlining pfn 0x201914a0 at process virtual 
> > > address 0x7fff9f000000
> > > [  637.166833][ T8357] Soft offlining pfn 0x12b9a0 at process virtual 
> > > address 0x7fff9f200000
> > > [  637.168044][ T8357] Soft offlining pfn 0x1abb60 at process virtual 
> > > address 0x7fff9f000000
> > > [  637.168557][ T8357] Soft offlining pfn 0x20014820 at process virtual 
> > > address 0x7fff9f200000
> > > [  637.169493][ T8357] Soft offlining pfn 0x119720 at process virtual 
> > > address 0x7fff9f000000
> > > [  637.169603][ T8357] soft offline: 0x119720: hugepage isolation failed: 
> > > 0, page count 2, type 7fff800001000e (referenced|uptodate|dirty|head)
> > > [  637.169756][ T8357] Soft offlining pfn 0x118ee0 at process virtual 
> > > address 0x7fff9f200000
> > > [  637.170653][ T8357] Soft offlining pfn 0x200e81e0 at process virtual 
> > > address 0x7fff9f000000
> > > [  637.171067][ T8357] Soft offlining pfn 0x201c5f60 at process virtual 
> > > address 0x7fff9f200000
> > > [  637.172101][ T8357] Soft offlining pfn 0x201c8f00 at process virtual 
> > > address 0x7fff9f000000
> > > [  637.172241][ T8357] __get_any_page: 0x201c8f00: unknown zero refcount 
> > > page type 87fff8000000000
> > 
> > I might misjudge to skip the following patch, sorry about that.
> > Could you try with it?
> 
> Still getting EIO after applied this patch.
> 
> [ 1215.499030][T88982] soft offline: 0x201bdc20: hugepage isolation failed: 
> 0, page count 2, type 87fff800001000e (referenced|uptodate|dirty|head)
> [ 1215.499775][T88982] Soft offlining pfn 0x201bdc20 at process virtual 
> address 0x7fff91a00000
> [ 1215.500189][T88982] Soft offlining pfn 0x201c19c0 at process virtual 
> address 0x7fff91c00000
> [ 1215.500297][T88982] soft offline: 0x201c19c0: hugepage isolation failed: 
> 0, page count 2, type 87fff800001000e (referenced|uptodate|dirty|head)
> [ 1215.500982][T88982] Soft offlining pfn 0x1f1fa0 at process virtual address 
> 0x7fff91a00000
> [ 1215.501086][T88982] soft offline: 0x1f1fa0: hugepage isolation failed: 0, 
> page count 2, type 7fff800001000e (referenced|uptodate|dirty|head)
> [ 1215.501237][T88982] Soft offlining pfn 0x1f4520 at process virtual address 
> 0x7fff91c00000
> [ 1215.501355][T88982] soft offline: 0x1f4520: hugepage isolation failed: 0, 
> page count 2, type 7fff800001000e (referenced|uptodate|dirty|head)
> [ 1215.502196][T88982] Soft offlining pfn 0x1f4520 at process virtual address 
> 0x7fff91a00000
> [ 1215.502573][T88982] Soft offlining pfn 0x1f1fa0 at process virtual address 
> 0x7fff91c00000
> [ 1215.502687][T88982] soft offline: 0x1f1fa0: hugepage isolation failed: 0, 
> page count 2, type 7fff800001000e (referenced|uptodate|dirty|head)
> [ 1215.503245][T88982] Soft offlining pfn 0x201c3cc0 at process virtual 
> address 0x7fff91a00000
> [ 1215.503594][T88982] Soft offlining pfn 0x201c3ce0 at process virtual 
> address 0x7fff91c00000
> [ 1215.503755][T88982] __get_any_page: 0x201c3ce0: unknown zero refcount page 
> type 87fff8000000000

So I lean to cancel patch 3/16 where madvise_inject_error() releases page
refcount before calling soft_offline_page(). That should solve the issue
because refcount of the target page never becomes zero.
Patch 4/16, 8/16 and 9/16 depend on it, so they should be removed too.
I'll resubmit the series later, but I prepare the delta below (based on
next-20200805), so it would be appreciated if you can run random.c with it
to confrim the fix.

Thanks,
Naoya Horiguchi
---
>From 533090c0869aeca88d8ff14d27c3cef8fc060ccd Mon Sep 17 00:00:00 2001
From: Naoya Horiguchi <naoya.horigu...@nec.com>
Date: Thu, 6 Aug 2020 01:29:08 +0900
Subject: [PATCH] (for testing) revert change around removing
 MF_COUNT_INCREASED

revert the following patches
- mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
- mm,madvise: Refactor madvise_inject_error
- mm,hwpoison: remove MF_COUNT_INCREASED
- mm,hwpoison: remove flag argument from soft offline functions

Signed-off-by: Naoya Horiguchi <naoya.horigu...@nec.com>
---
 drivers/base/memory.c |  2 +-
 include/linux/mm.h    |  9 +++++----
 mm/madvise.c          | 36 +++++++++++++++++++-----------------
 mm/memory-failure.c   | 31 ++++++++++++++++++++-----------
 4 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 3e6d27c9dff6..4db3c660de83 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -463,7 +463,7 @@ static ssize_t soft_offline_page_store(struct device *dev,
        if (kstrtoull(buf, 0, &pfn) < 0)
                return -EINVAL;
        pfn >>= PAGE_SHIFT;
-       ret = soft_offline_page(pfn);
+       ret = soft_offline_page(pfn, 0);
        return ret == 0 ? count : ret;
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4f12b2465e80..442921a004a2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2976,9 +2976,10 @@ void register_page_bootmem_memmap(unsigned long 
section_nr, struct page *map,
                                  unsigned long nr_pages);
 
 enum mf_flags {
-       MF_ACTION_REQUIRED = 1 << 0,
-       MF_MUST_KILL = 1 << 1,
-       MF_SOFT_OFFLINE = 1 << 2,
+       MF_COUNT_INCREASED = 1 << 0,
+       MF_ACTION_REQUIRED = 1 << 1,
+       MF_MUST_KILL = 1 << 2,
+       MF_SOFT_OFFLINE = 1 << 3,
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
@@ -2988,7 +2989,7 @@ extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages __read_mostly;
-extern int soft_offline_page(unsigned long pfn);
+extern int soft_offline_page(unsigned long pfn, int flags);
 
 
 /*
diff --git a/mm/madvise.c b/mm/madvise.c
index 843f6fad3b89..5fa5f66468b3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -886,15 +886,16 @@ static long madvise_remove(struct vm_area_struct *vma,
 static int madvise_inject_error(int behavior,
                unsigned long start, unsigned long end)
 {
+       struct page *page;
        struct zone *zone;
        unsigned long size;
 
        if (!capable(CAP_SYS_ADMIN))
                return -EPERM;
 
+
        for (; start < end; start += size) {
                unsigned long pfn;
-               struct page *page;
                int ret;
 
                ret = get_user_pages_fast(start, 1, 0, &page);
@@ -909,26 +910,27 @@ static int madvise_inject_error(int behavior,
                 */
                size = page_size(compound_head(page));
 
-               /*
-                * The get_user_pages_fast() is just to get the pfn of the
-                * given address, and the refcount has nothing to do with
-                * what we try to test, so it should be released immediately.
-                * This is racy but it's intended because the real hardware
-                * errors could happen at any moment and memory error handlers
-                * must properly handle the race.
-                */
-               put_page(page);
-
                if (behavior == MADV_SOFT_OFFLINE) {
                        pr_info("Soft offlining pfn %#lx at process virtual 
address %#lx\n",
-                               pfn, start);
-                       ret = soft_offline_page(pfn);
-               } else {
-                       pr_info("Injecting memory failure for pfn %#lx at 
process virtual address %#lx\n",
-                               pfn, start);
-                       ret = memory_failure(pfn, 0);
+                                       pfn, start);
+
+                       ret = soft_offline_page(pfn, MF_COUNT_INCREASED);
+                       if (ret)
+                               return ret;
+                       continue;
                }
 
+               pr_info("Injecting memory failure for pfn %#lx at process 
virtual address %#lx\n",
+                               pfn, start);
+
+               /*
+                * Drop the page reference taken by get_user_pages_fast(). In
+                * the absence of MF_COUNT_INCREASED the memory_failure()
+                * routine is responsible for pinning the page to prevent it
+                * from being released back to the page allocator.
+                */
+               put_page(page);
+               ret = memory_failure(pfn, 0);
                if (ret)
                        return ret;
        }
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a229d4694954..fd50a6f9a60d 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1167,7 +1167,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int 
flags)
 
        num_poisoned_pages_inc();
 
-       if (!get_hwpoison_page(p)) {
+       if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
                /*
                 * Check "filter hit" and "race with other subpage."
                 */
@@ -1363,7 +1363,7 @@ int memory_failure(unsigned long pfn, int flags)
         * In fact it's dangerous to directly bump up page count from 0,
         * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
         */
-       if (!get_hwpoison_page(p)) {
+       if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
                if (is_free_buddy_page(p)) {
                        action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
                        return 0;
@@ -1392,7 +1392,10 @@ int memory_failure(unsigned long pfn, int flags)
        shake_page(p, 0);
        /* shake_page could have turned it free. */
        if (!PageLRU(p) && is_free_buddy_page(p)) {
-               action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
+               if (flags & MF_COUNT_INCREASED)
+                       action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
+               else
+                       action_result(pfn, MF_MSG_BUDDY_2ND, MF_DELAYED);
                return 0;
        }
 
@@ -1540,7 +1543,7 @@ static void memory_failure_work_func(struct work_struct 
*work)
                if (!gotten)
                        break;
                if (entry.flags & MF_SOFT_OFFLINE)
-                       soft_offline_page(entry.pfn);
+                       soft_offline_page(entry.pfn, entry.flags);
                else
                        memory_failure(entry.pfn, entry.flags);
        }
@@ -1679,10 +1682,13 @@ EXPORT_SYMBOL(unpoison_memory);
  * that is not free, and 1 for any other page type.
  * For 1 the page is returned with increased page count, otherwise not.
  */
-static int __get_any_page(struct page *p, unsigned long pfn)
+static int __get_any_page(struct page *p, unsigned long pfn, int flags)
 {
        int ret;
 
+       if (flags & MF_COUNT_INCREASED)
+               return 1;
+
        /*
         * When the target page is a free hugepage, just remove it
         * from free hugepage list.
@@ -1709,12 +1715,12 @@ static int __get_any_page(struct page *p, unsigned long 
pfn)
        return ret;
 }
 
-static int get_any_page(struct page *page, unsigned long pfn)
+static int get_any_page(struct page *page, unsigned long pfn, int flags)
 {
-       int ret = __get_any_page(page, pfn);
+       int ret = __get_any_page(page, pfn, flags);
 
        if (ret == -EBUSY)
-               ret = __get_any_page(page, pfn);
+               ret = __get_any_page(page, pfn, flags);
 
        if (ret == 1 && !PageHuge(page) &&
            !PageLRU(page) && !__PageMovable(page)) {
@@ -1727,7 +1733,7 @@ static int get_any_page(struct page *page, unsigned long 
pfn)
                /*
                 * Did it turn free?
                 */
-               ret = __get_any_page(page, pfn);
+               ret = __get_any_page(page, pfn, 0);
                if (ret == 1 && !PageLRU(page)) {
                        /* Drop page reference which is from __get_any_page() */
                        put_page(page);
@@ -1869,6 +1875,7 @@ static int soft_offline_free_page(struct page *page)
 /**
  * soft_offline_page - Soft offline a page.
  * @pfn: pfn to soft-offline
+ * @flags: flags. Same as memory_failure().
  *
  * Returns 0 on success, otherwise negated errno.
  *
@@ -1887,7 +1894,7 @@ static int soft_offline_free_page(struct page *page)
  * This is not a 100% solution for all memory, but tries to be
  * ``good enough'' for the majority of memory.
  */
-int soft_offline_page(unsigned long pfn)
+int soft_offline_page(unsigned long pfn, int flags)
 {
        int ret;
        struct page *page;
@@ -1901,11 +1908,13 @@ int soft_offline_page(unsigned long pfn)
 
        if (PageHWPoison(page)) {
                pr_info("soft offline: %#lx page already poisoned\n", pfn);
+               if (flags & MF_COUNT_INCREASED)
+                       put_page(page);
                return 0;
        }
 
        get_online_mems();
-       ret = get_any_page(page, pfn);
+       ret = get_any_page(page, pfn, flags);
        put_online_mems();
 
        if (ret > 0)
-- 
2.25.1

Reply via email to