Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range

2021-04-09 Thread David Hildenbrand

On 09.04.21 07:10, Oscar Salvador wrote:
> On Thu, Apr 08, 2021 at 10:05:18PM -0700, Andrew Morton wrote:
> > Yes please.
> 
> I just sent v7 with that yesterday [1]
> 
> Hope David/Michal find some time to review patch #4, as that is the only
> missing piece atm.


I will have a look next week. I still have to wrap my head around the
present_pages handling and whether it could be done in a cleaner way.


--
Thanks,

David / dhildenb



Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range

2021-04-08 Thread Oscar Salvador
On Thu, Apr 08, 2021 at 10:05:18PM -0700, Andrew Morton wrote:
> Yes please.

I just sent v7 with that yesterday [1]

Hope David/Michal find some time to review patch #4, as that is the only
missing piece atm.

[1] https://lkml.org/lkml/2021/4/8/546


-- 
Oscar Salvador
SUSE L3


Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range

2021-04-08 Thread Andrew Morton
On Wed, 07 Apr 2021 22:38:37 +0200 Oscar Salvador  wrote:

> On 2021-04-06 22:28, Oscar Salvador wrote:
> > Heh, it seems I spaced out today.
> > 
> > We need a few things on top:
> > 
> 

Yes please.


Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range

2021-04-07 Thread Oscar Salvador

On 2021-04-06 22:28, Oscar Salvador wrote:
> Heh, it seems I spaced out today.
> 
> We need a few things on top:

Should I send a new version with the fixup included?
I think it would ease the review but I do not want to spam.


--
Oscar Salvador
SUSE L3


Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range

2021-04-06 Thread Oscar Salvador
On Tue, Apr 06, 2021 at 01:11:11PM +0200, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
> 
> This has some disadvantages:
>  a) existing memory is consumed for that purpose
>     (eg: ~2MB per 128MB memory section on x86_64)
>  b) if the whole node is movable then we have off-node struct pages,
>     which has performance drawbacks.
>  c) it might be that there are no PMD_ALIGNED chunks, so the memmap
>     array gets populated with base pages.
> 
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
> 
> Vmemmap page tables can map arbitrary memory. That means that we can
> simply use the beginning of each memory section and map struct pages
> there. struct pages which back the allocated space then just need to
> be treated carefully.
> 
> Implementation-wise, we will reuse the vmem_altmap infrastructure to
> override the default allocator used by __populate_section_memmap.
> Part of the implementation also relies on the memory_block structure
> gaining a new field which specifies the number of vmemmap_pages at the
> beginning. This patch also introduces the following functions:
> 
>  - vmemmap_init_space: initializes vmemmap pages by calling
>    move_pfn_range_to_zone(), calls kasan_add_zero_shadow() for the
>    vmemmap range and marks online as many sections as vmemmap pages
>    fully span.
>  - vmemmap_adjust_pages: adds/subtracts vmemmap_pages to/from node and
>    zone present_pages.
>  - vmemmap_deinit_space: undoes what vmemmap_init_space does.
> 
> The new function memory_block_online() calls vmemmap_init_space() before
> doing the actual online_pages(). Should online_pages() fail, we clean up
> by calling vmemmap_adjust_pages() and vmemmap_deinit_space().
> 
> On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to
> calling offline_pages(), because offline_pages() performs the
> tearing-down of kthreads and the rebuilding of the zonelists if the
> node/zone become empty.
> If offline_pages() fails, we account back the vmemmap pages by calling
> vmemmap_adjust_pages(). If it succeeds, we call vmemmap_deinit_space().
> 
> Hot-remove:
> 
>  We need to be careful when removing memory, as adding and
>  removing memory need to be done with the same granularity.
>  To check that this assumption is not violated, we check the
>  memory range we want to remove, and if a) any memory block has
>  vmemmap pages and b) the range spans more than a single memory
>  block, we scream out loud and refuse to proceed.
> 
>  If all is good and the range was using memmap on memory (aka vmemmap
>  pages), we construct an altmap structure so free_hugepage_table does
>  the right thing and calls vmem_altmap_free instead of free_pagetable.
> 
> Signed-off-by: Oscar Salvador 

Heh, it seems I spaced out today.

We need a few things on top:

- In the !CONFIG_MEMORY_HOTREMOVE case, we still need vmemmap_deinit_space,
  as we call it from memory_block_online() too in case online_pages()
  fails. So we need to move its declaration out of the
  #ifdef CONFIG_MEMORY_HOTREMOVE block, together with the others.
- If vmemmap pages fully span sections, we need to mark those sections
  as online, since online_pages() will not do it for us, and in
  vmemmap_deinit_space we need to mark them back offline. Since
  vmemmap_deinit_space might get called from memory_block_online(),
  we also need to move offline_mem_sections() out of the
  #ifdef CONFIG_MEMORY_HOTREMOVE code (see the sketch right below).
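
For that second point, a rough sketch (not the fixup diff itself, which
follows) of what the section marking in vmemmap_init_space() could look
like, assuming pfn is section-aligned, as memory block starts are:

        /*
         * Mark as online only the sections that the vmemmap pages
         * fully span; online_pages() takes care of the rest of the
         * block.
         */
        if (nr_pages >= PAGES_PER_SECTION)
                online_mem_sections(pfn, ALIGN_DOWN(pfn + nr_pages,
                                                    PAGES_PER_SECTION));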

So, we would need the following on top:

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index cc3dbc0abf02..c7669d2accfd 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -111,6 +111,7 @@ extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 extern void vmemmap_adjust_pages(unsigned long pfn, long nr_pages);
 extern int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages,
 			      int nid, int online_type);
+extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern int online_pages(unsigned long pfn, unsigned long nr_pages,
 			int online_type, int nid);
 extern struct zone *test_pages_in_a_zone(unsigned long start_pfn,
@@ -317,7 +318,6 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}

 #ifdef CONFIG_MEMORY_HOTREMOVE

-extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 747e1c21d8e2..76f4ca5ed230 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1383,10 +1383,8 @@ static inline int online_section_nr(unsigned long nr)

 #ifdef CONFIG_MEMORY_HOTPLUG
 void 

Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range

2021-04-06 Thread kernel test robot
Hi Oscar,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on driver-core/driver-core-testing]
[also build test ERROR on pm/linux-next tip/x86/core arm64/for-next/core linus/master v5.12-rc6]
[cannot apply to hnaz-linux-mm/master next-20210406]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:      https://github.com/0day-ci/linux/commits/Oscar-Salvador/Allocate-memmap-from-hotadded-memory-per-device/20210406-191333
base:     https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 6e11b376fd74356e32d842be588e12dc9bf6e197
config:   x86_64-randconfig-a015-20210406 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a46f59a747a7273cc439efaf3b4f98d8b63d2f20)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/4d4265dd4e598c7b0971d57894685136229f5d07
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Oscar-Salvador/Allocate-memmap-from-hotadded-memory-per-device/20210406-191333
        git checkout 4d4265dd4e598c7b0971d57894685136229f5d07
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/base/memory.c:201:3: error: implicit declaration of function 'vmemmap_deinit_space' [-Werror,-Wimplicit-function-declaration]
           vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
           ^
   drivers/base/memory.c:201:3: note: did you mean 'vmemmap_init_space'?
   include/linux/memory_hotplug.h:112:12: note: 'vmemmap_init_space' declared here
   extern int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages,
              ^
   drivers/base/memory.c:231:4: error: implicit declaration of function 'vmemmap_deinit_space' [-Werror,-Wimplicit-function-declaration]
           vmemmap_deinit_space(start_pfn, nr_pages);
           ^
   2 errors generated.


vim +/vmemmap_deinit_space +201 drivers/base/memory.c

   171  
   172  static int memory_block_online(struct memory_block *mem)
   173  {
   174          unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
   175          unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
   176          unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
   177          int ret;
   178  
   179          /*
   180           * Although vmemmap pages have a different lifecycle than the pages
   181           * they describe (they remain until the memory is unplugged), doing
   182           * its initialization and accounting at hot-{online,offline} stage
   183           * simplifies things a lot
   184           */
   185          if (nr_vmemmap_pages) {
   186                  ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
   187                                           mem->online_type);
   188                  if (ret)
   189                          return ret;
   190          }
   191  
   192          ret = online_pages(start_pfn + nr_vmemmap_pages,
   193                             nr_pages - nr_vmemmap_pages, mem->online_type,
   194                             mem->nid);
   195  
   196          /*
   197           * Undo the work if online_pages() fails.
   198           */
   199          if (ret && nr_vmemmap_pages) {
   200                  vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
 > 201                  vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
   202          }
   203  
   204          return ret;
   205  }
   206  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




[PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range

2021-04-06 Thread Oscar Salvador
Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. Currently, alloc_pages_node() is used
for those allocations.

This has some disadvantages:
 a) existing memory is consumed for that purpose
    (eg: ~2MB per 128MB memory section on x86_64)
 b) if the whole node is movable then we have off-node struct pages,
    which has performance drawbacks.
 c) it might be that there are no PMD_ALIGNED chunks, so the memmap
    array gets populated with base pages.

This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.

Vmemmap page tables can map arbitrary memory. That means that we can
simply use the beginning of each memory section and map struct pages
there. struct pages which back the allocated space then just need to
be treated carefully.
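
To make the "~2MB per 128MB" figure above concrete, here is a rough
back-of-the-envelope sketch (assuming the common x86_64 layout of 4K
base pages and a 64-byte struct page):

        /*
         * A 128MB section holds 32768 base pages; its memmap needs
         * 32768 * sizeof(struct page) = 2MB, i.e. 512 vmemmap pages
         * carved out of the beginning of the section itself.
         */
        unsigned long section_pages = SZ_128M / PAGE_SIZE;                 /* 32768 */
        unsigned long memmap_bytes  = section_pages * sizeof(struct page); /* 2MB   */
        unsigned long vmemmap_pages = memmap_bytes / PAGE_SIZE;            /* 512   */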

Implementation-wise, we will reuse the vmem_altmap infrastructure to
override the default allocator used by __populate_section_memmap
(a rough sketch of the altmap wiring follows the list below).
Part of the implementation also relies on the memory_block structure
gaining a new field which specifies the number of vmemmap_pages at the
beginning. This patch also introduces the following functions:

 - vmemmap_init_space: initializes vmemmap pages by calling
   move_pfn_range_to_zone(), calls kasan_add_zero_shadow() for the
   vmemmap range and marks online as many sections as vmemmap pages
   fully span.
 - vmemmap_adjust_pages: adds/subtracts vmemmap_pages to/from node and
   zone present_pages.
 - vmemmap_deinit_space: undoes what vmemmap_init_space does.
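
As a rough sketch of that altmap wiring on the hot-add side (not the
exact patch code; it assumes the caller requested memmap-on-memory and
the range qualifies):

        /*
         * Point the altmap at the start of the hot-added range, so
         * __populate_section_memmap() hands out pages from the range
         * itself instead of calling into the page allocator.
         */
        struct vmem_altmap mhp_altmap = {
                .base_pfn = PHYS_PFN(start),    /* memmap backing starts here */
                .free = PHYS_PFN(size),         /* pages the altmap may hand out */
        };
        struct mhp_params params = {
                .pgprot = PAGE_KERNEL,
                .altmap = &mhp_altmap,
        };

        ret = arch_add_memory(nid, start, size, &params);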

The new function memory_block_online() calls vmemmap_init_space() before
doing the actual online_pages(). Should online_pages() fail, we clean up
by calling vmemmap_adjust_pages() and vmemmap_deinit_space().

On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to
calling offline_pages(), because offline_pages() performs the
tearing-down of kthreads and the rebuilding of the zonelists if the
node/zone become empty.
If offline_pages() fails, we account back the vmemmap pages by calling
vmemmap_adjust_pages(). If it succeeds, we call vmemmap_deinit_space().
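
In code, that offline ordering looks roughly like this (a sketch of the
memory_block_offline() logic; the actual hunk appears in the diff below
but is truncated by the archive):

        /*
         * Unaccount the vmemmap pages up front, so offline_pages()
         * sees a consistent present_pages when deciding whether the
         * zone/node became empty; re-account them if offlining fails.
         */
        if (nr_vmemmap_pages)
                vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);

        ret = offline_pages(start_pfn + nr_vmemmap_pages,
                            nr_pages - nr_vmemmap_pages);
        if (ret) {
                if (nr_vmemmap_pages)
                        vmemmap_adjust_pages(start_pfn, nr_vmemmap_pages);
                return ret;
        }

        if (nr_vmemmap_pages)
                vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);

        return 0;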

Hot-remove:

 We need to be careful when removing memory, as adding and
 removing memory need to be done with the same granularity.
 To check that this assumption is not violated, we check the
 memory range we want to remove, and if a) any memory block has
 vmemmap pages and b) the range spans more than a single memory
 block, we scream out loud and refuse to proceed.

 If all is good and the range was using memmap on memory (aka vmemmap
 pages), we construct an altmap structure so free_hugepage_table does
 the right thing and calls vmem_altmap_free instead of free_pagetable.
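
A minimal sketch of that remove-side altmap (assuming the single-block
check above passed and nr_vmemmap_pages was read back from the memory
block being removed):

        /*
         * Rebuild an altmap describing the pages that were handed out
         * at hot-add time, so the page-table teardown frees them via
         * vmem_altmap_free() instead of free_pagetable().
         */
        struct vmem_altmap mhp_altmap = {
                .base_pfn = PHYS_PFN(start),
                .alloc = nr_vmemmap_pages, /* pages previously carved out */
        };

        arch_remove_memory(nid, start, size,
                           nr_vmemmap_pages ? &mhp_altmap : NULL);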

Signed-off-by: Oscar Salvador 
---
 drivers/base/memory.c  |  64 +++--
 include/linux/memory.h |   8 ++-
 include/linux/memory_hotplug.h |  13 
 include/linux/memremap.h   |   2 +-
 include/linux/mmzone.h |   5 ++
 mm/Kconfig |   5 ++
 mm/memory_hotplug.c| 153 +++--
 7 files changed, 238 insertions(+), 12 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f209925a5d4e..a5e536a3e9a4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -173,16 +173,65 @@ static int memory_block_online(struct memory_block *mem)
 {
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+   unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+   int ret;
+
+   /*
+* Although vmemmap pages have a different lifecycle than the pages
+* they describe (they remain until the memory is unplugged), doing
+* its initialization and accounting at hot-{online,offline} stage
+* simplifies things a lot
+*/
+   if (nr_vmemmap_pages) {
+   ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
+mem->online_type);
+   if (ret)
+   return ret;
+   }
 
-   return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
+   ret = online_pages(start_pfn + nr_vmemmap_pages,
+  nr_pages - nr_vmemmap_pages, mem->online_type,
+  mem->nid);
+
+   /*
+* Undo the work if online_pages() fails.
+*/
+   if (ret && nr_vmemmap_pages) {
+   vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
+   vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
+   }
+
+   return ret;
 }
 
 static int memory_block_offline(struct memory_block *mem)
 {
unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+   unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+   int ret;
+
+   /*
+*