Re: [PATCH v2 05/10] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

2020-04-14 Thread Andrew Morton
On Tue, 14 Apr 2020 12:34:26 -0400 "Michael S. Tsirkin"  wrote:

> Andrew, could you please ack merging this through the vhost tree
> together with the rest of the patches?

Acked-by: Andrew Morton 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 05/10] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

2020-04-14 Thread Michael S. Tsirkin
On Wed, Mar 11, 2020 at 06:14:17PM +0100, David Hildenbrand wrote:
> virtio-mem wants to offline memory blocks of which some parts were
> unplugged (allocated via alloc_contig_range()), in particular so that
> completely unplugged memory blocks can later be offlined and removed.
> The important part is that PageOffline() has to remain set until the
> section is offline, so these pages will never get accessed (e.g., when
> dumping). The pages should not be handed back to the buddy (which would
> require clearing PageOffline() and result in issues if offlining fails
> and the pages are suddenly in the buddy).
> 
> Let's make that possible by allowing any PageOffline() page to be
> isolated when offlining. This way, we can reach the memory hotplug notifier
> MEM_GOING_OFFLINE, where the driver can signal that it is fine with
> offlining this page by dropping its reference count. PageOffline() pages
> with a reference count of 0 can then be skipped when offlining the
> pages (as if they were free, although they are not in the buddy).
> 
> Anybody who uses PageOffline() pages and does not agree to offline them
> (e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
> decrement the reference count, so offlining fails when trying to
> migrate such an unmovable page. So there should be no observable change.
> The same applies to balloon compaction users (movable PageOffline()
> pages): the pages will simply be migrated.
> 
> Note 1: If offlining fails, a driver has to increment the reference
>   count again in MEM_CANCEL_OFFLINE.
> 
> Note 2: A driver that makes use of this has to be aware that re-onlining
>   the memory block has to be handled by hooking into onlining code
>   (online_page_callback_t), re-setting PageOffline() on the pages
>   and not giving them to the buddy.
> 
> Reviewed-by: Alexander Duyck 
> Acked-by: Michal Hocko 
> Cc: Andrew Morton 
> Cc: Juergen Gross 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Pavel Tatashin 
> Cc: Alexander Duyck 
> Cc: Vlastimil Babka 
> Cc: Johannes Weiner 
> Cc: Anthony Yznaga 
> Cc: Michal Hocko 
> Cc: Oscar Salvador 
> Cc: Mel Gorman 
> Cc: Mike Rapoport 
> Cc: Dan Williams 
> Cc: Anshuman Khandual 
> Cc: Qian Cai 
> Cc: Pingfan Liu 
> Signed-off-by: David Hildenbrand 

Andrew, could you please ack merging this through the vhost tree
together with the rest of the patches?



> ---
>  include/linux/page-flags.h | 10 +
>  mm/memory_hotplug.c        | 44 +-
>  mm/page_alloc.c            | 24 +
>  mm/page_isolation.c        |  9 
>  4 files changed, 77 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 49c2697046b9..fd6d4670ccc3 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -772,6 +772,16 @@ PAGE_TYPE_OPS(Buddy, buddy)
>   * not onlined when onlining the section).
>   * The content of these pages is effectively stale. Such pages should not
>   * be touched (read/write/dump/save) except by their owner.
> + *
> + * If a driver wants to allow to offline unmovable PageOffline() pages without
> + * putting them back to the buddy, it can do so via the memory notifier by
> + * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
> + * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
> + * pages (now with a reference count of zero) are treated like free pages,
> + * allowing the containing memory block to get offlined. A driver that
> + * relies on this feature is aware that re-onlining the memory block will
> + * require to re-set the pages PageOffline() and not giving them to the
> + * buddy via online_page_callback_t.
>   */
>  PAGE_TYPE_OPS(Offline, offline)
>  
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 1a00b5a37ef6..ab1c31e67fd1 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1221,11 +1221,17 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
>  
>  /*
>   * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
> - * non-lru movable pages and hugepages). We scan pfn because it's much
> - * easier than scanning over linked list. This function returns the pfn
> - * of the first found movable page if it's found, otherwise 0.
> + * non-lru movable pages and hugepages). Will skip over most unmovable
> + * pages (esp., pages that can be skipped when offlining), but bail out on
> + * definitely unmovable pages.
> + *
> + * Returns:
> + *   0 in case a movable page is found and movable_pfn was updated.
> + *   -ENOENT in case no movable page was found.
> + *   -EBUSY in case a definitely unmovable page was found.
>   */
> -static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> +static int scan_movable_pages(unsigned long start, unsigned long end,
> +   unsigned long *movable_pfn)
>  {
>   unsigned long pfn;
>  
> @@ -12

[PATCH v2 05/10] mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE

2020-03-11 Thread David Hildenbrand
virtio-mem wants to offline memory blocks of which some parts were
unplugged (allocated via alloc_contig_range()), in particular so that
completely unplugged memory blocks can later be offlined and removed.
The important part is that PageOffline() has to remain set until the
section is offline, so these pages will never get accessed (e.g., when
dumping). The pages should not be handed back to the buddy (which would
require clearing PageOffline() and result in issues if offlining fails
and the pages are suddenly in the buddy).

Let's make that possible by allowing any PageOffline() page to be
isolated when offlining. This way, we can reach the memory hotplug notifier
MEM_GOING_OFFLINE, where the driver can signal that it is fine with
offlining this page by dropping its reference count. PageOffline() pages
with a reference count of 0 can then be skipped when offlining the
pages (as if they were free, although they are not in the buddy).

Anybody who uses PageOffline() pages and does not agree to offline them
(e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
decrement the reference count, so offlining fails when trying to
migrate such an unmovable page. So there should be no observable change.
The same applies to balloon compaction users (movable PageOffline()
pages): the pages will simply be migrated.

Note 1: If offlining fails, a driver has to increment the reference
count again in MEM_CANCEL_OFFLINE.

Note 2: A driver that makes use of this has to be aware that re-onlining
the memory block has to be handled by hooking into onlining code
(online_page_callback_t), re-setting PageOffline() on the pages
and not giving them to the buddy.

Reviewed-by: Alexander Duyck 
Acked-by: Michal Hocko 
Cc: Andrew Morton 
Cc: Juergen Gross 
Cc: Konrad Rzeszutek Wilk 
Cc: Pavel Tatashin 
Cc: Alexander Duyck 
Cc: Vlastimil Babka 
Cc: Johannes Weiner 
Cc: Anthony Yznaga 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Mel Gorman 
Cc: Mike Rapoport 
Cc: Dan Williams 
Cc: Anshuman Khandual 
Cc: Qian Cai 
Cc: Pingfan Liu 
Signed-off-by: David Hildenbrand 
---
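
The reference-count handshake described above can be illustrated with a
small userspace model. This is only a sketch under stated assumptions, not
kernel code: struct model_page, the MODEL_* events and all model_*()
functions are hypothetical names invented for this illustration, and page
migration is left out, so any page that is neither free nor a PageOffline()
page with a reference count of 0 simply makes offlining fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: hypothetical types and names, not kernel APIs. */
enum model_event { MODEL_GOING_OFFLINE, MODEL_CANCEL_OFFLINE };

struct model_page {
	bool offline;	/* stands in for PageOffline() */
	bool free;	/* stands in for "page is in the buddy" */
	int refcount;	/* stands in for the page reference count */
};

/*
 * Driver side: agree to offlining by dropping the reference held on each
 * PageOffline() page, and take it back if offlining is cancelled.
 */
static void model_driver_notify(struct model_page *pages, size_t n,
				enum model_event ev)
{
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].offline)
			continue;
		if (ev == MODEL_GOING_OFFLINE) {
			/* The model assumes the driver holds exactly one reference. */
			assert(pages[i].refcount == 1);
			pages[i].refcount--;
		} else {
			pages[i].refcount++;
		}
	}
}

/* Core side: free pages and PageOffline() pages with refcount 0 are skipped. */
static bool model_page_skippable(const struct model_page *p)
{
	return p->free || (p->offline && p->refcount == 0);
}

/*
 * Returns true if the whole block could be offlined. On failure, a
 * cooperating driver is told to restore its references first.
 */
static bool model_offline_block(struct model_page *pages, size_t n,
				bool driver_cooperates)
{
	if (driver_cooperates)
		model_driver_notify(pages, n, MODEL_GOING_OFFLINE);

	for (size_t i = 0; i < n; i++) {
		if (model_page_skippable(&pages[i]))
			continue;
		/* Unmovable in this model: cancel and undo the handshake. */
		if (driver_cooperates)
			model_driver_notify(pages, n, MODEL_CANCEL_OFFLINE);
		return false;
	}
	return true;
}
```

Calling model_offline_block() with driver_cooperates set to false models
the existing balloon drivers: the reference count stays at 1 and offlining
fails as before. With a cooperating driver, PageOffline() stays set on
success, and the reference counts are restored when offlining is cancelled.
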
 include/linux/page-flags.h | 10 +
 mm/memory_hotplug.c        | 44 +-
 mm/page_alloc.c            | 24 +
 mm/page_isolation.c        |  9 
 4 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 49c2697046b9..fd6d4670ccc3 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -772,6 +772,16 @@ PAGE_TYPE_OPS(Buddy, buddy)
  * not onlined when onlining the section).
  * The content of these pages is effectively stale. Such pages should not
  * be touched (read/write/dump/save) except by their owner.
+ *
+ * If a driver wants to allow to offline unmovable PageOffline() pages without
+ * putting them back to the buddy, it can do so via the memory notifier by
+ * decrementing the reference count in MEM_GOING_OFFLINE and incrementing the
+ * reference count in MEM_CANCEL_OFFLINE. When offlining, the PageOffline()
+ * pages (now with a reference count of zero) are treated like free pages,
+ * allowing the containing memory block to get offlined. A driver that
+ * relies on this feature is aware that re-onlining the memory block will
+ * require to re-set the pages PageOffline() and not giving them to the
+ * buddy via online_page_callback_t.
  */
 PAGE_TYPE_OPS(Offline, offline)
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 1a00b5a37ef6..ab1c31e67fd1 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1221,11 +1221,17 @@ struct zone *test_pages_in_a_zone(unsigned long start_pfn,
 
 /*
  * Scan pfn range [start,end) to find movable/migratable pages (LRU pages,
- * non-lru movable pages and hugepages). We scan pfn because it's much
- * easier than scanning over linked list. This function returns the pfn
- * of the first found movable page if it's found, otherwise 0.
+ * non-lru movable pages and hugepages). Will skip over most unmovable
+ * pages (esp., pages that can be skipped when offlining), but bail out on
+ * definitely unmovable pages.
+ *
+ * Returns:
+ *   0 in case a movable page is found and movable_pfn was updated.
+ *   -ENOENT in case no movable page was found.
+ *   -EBUSY in case a definitely unmovable page was found.
  */
-static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
+static int scan_movable_pages(unsigned long start, unsigned long end,
+			      unsigned long *movable_pfn)
 {
 	unsigned long pfn;
 
@@ -1237,18 +1243,30 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
 			continue;
 		page = pfn_to_page(pfn);
 		if (PageLRU(page))
-			return pfn;
+			goto found;
 		if (__PageMovable(page))
-			return pfn;
+