Re: [PATCH] MCE: fix an error of mce_bad_pages statistics
On Fri, Dec 07, 2012 at 03:35:15PM +0800, Xishi Qiu wrote: > Hi Borislav, you mean we should move this to the beginning of > soft_offline_page()? > > soft_offline_page() > { > ... > get_any_page() > ... > /* >* Synchronized using the page lock with memory_failure() >*/ > if (PageHWPoison(page)) { > unlock_page(page); Basically yes, except without the unlock_page. Where do you do lock_page earlier so that you need to unlock it now? Thanks. -- Regards/Gruss, Boris. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] MCE: fix an error of mce_bad_pages statistics
On 2012/12/7 15:25, Borislav Petkov wrote: > On Fri, Dec 07, 2012 at 10:53:41AM +0800, Xishi Qiu wrote: >> On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to >> offline a >> free page twice, the value of mce_bad_pages will be added twice. So this is >> an error, >> since the page was already marked HWPoison, we should skip the page and >> don't add the >> value of mce_bad_pages. >> >> $ cat /proc/meminfo | grep HardwareCorrupted >> >> soft_offline_page() >> get_any_page() >> atomic_long_add(1, &mce_bad_pages) >> >> The free page which marked HWPoison is still managed by page buddy >> allocator. So when >> offlining it again, get_any_page() always returns 0 with >> "pr_info("%s: %#lx free buddy page\n", __func__, pfn);". >> >> When page is allocated, the PageBuddy is removed in bad_page(), then >> get_any_page() >> returns -EIO with pr_info("%s: %#lx: unknown zero refcount page type %lx\n", >> so >> mce_bad_pages will not be added. >> >> Signed-off-by: Xishi Qiu >> Signed-off-by: Jiang Liu >> --- >> mm/memory-failure.c |5 + >> 1 files changed, 5 insertions(+), 0 deletions(-) >> >> diff --git a/mm/memory-failure.c b/mm/memory-failure.c >> index 8b20278..02a522e 100644 >> --- a/mm/memory-failure.c >> +++ b/mm/memory-failure.c >> @@ -1375,6 +1375,11 @@ static int get_any_page(struct page *p, unsigned long >> pfn, int flags) >> if (flags & MF_COUNT_INCREASED) >> return 1; >> >> +if (PageHWPoison(p)) { >> +pr_info("%s: %#lx page already poisoned\n", __func__, pfn); >> +return -EBUSY; >> +} > > Shouldn't this be done in soft_offline_page() instead, like it is done > in soft_offline_huge_page() for hugepages? > > Thanks. > Hi Borislav, you mean we should move this to the beginning of soft_offline_page()? soft_offline_page() { ... get_any_page() ... /* * Synchronized using the page lock with memory_failure() */ if (PageHWPoison(page)) { unlock_page(page); put_page(page); pr_info("soft offline: %#lx page already poisoned\n", pfn); return -EBUSY; } ... } Thanks Xishi Qiu -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] MCE: fix an error of mce_bad_pages statistics
On Fri, Dec 07, 2012 at 10:53:41AM +0800, Xishi Qiu wrote: > On x86 platform, if we use "/sys/devices/system/memory/soft_offline_page" to > offline a > free page twice, the value of mce_bad_pages will be added twice. So this is > an error, > since the page was already marked HWPoison, we should skip the page and don't > add the > value of mce_bad_pages. > > $ cat /proc/meminfo | grep HardwareCorrupted > > soft_offline_page() > get_any_page() > atomic_long_add(1, &mce_bad_pages) > > The free page which marked HWPoison is still managed by page buddy allocator. > So when > offlining it again, get_any_page() always returns 0 with > "pr_info("%s: %#lx free buddy page\n", __func__, pfn);". > > When page is allocated, the PageBuddy is removed in bad_page(), then > get_any_page() > returns -EIO with pr_info("%s: %#lx: unknown zero refcount page type %lx\n", > so > mce_bad_pages will not be added. > > Signed-off-by: Xishi Qiu > Signed-off-by: Jiang Liu > --- > mm/memory-failure.c |5 + > 1 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index 8b20278..02a522e 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -1375,6 +1375,11 @@ static int get_any_page(struct page *p, unsigned long > pfn, int flags) > if (flags & MF_COUNT_INCREASED) > return 1; > > + if (PageHWPoison(p)) { > + pr_info("%s: %#lx page already poisoned\n", __func__, pfn); > + return -EBUSY; > + } Shouldn't this be done in soft_offline_page() instead, like it is done in soft_offline_huge_page() for hugepages? Thanks. -- Regards/Gruss, Boris. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/