Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
On 09.04.21 07:10, Oscar Salvador wrote:
> On Thu, Apr 08, 2021 at 10:05:18PM -0700, Andrew Morton wrote:
> > Yes please.
>
> I just sent v7 with that yesterday [1]
> Hope David/Michal finds some time to review patch#4 as that is the only
> missing piece atm.

I will have a look next week. Still have to wrap my head around the
present_pages stuff and if that couldn't be handled in a cleaner way.

-- 
Thanks,

David / dhildenb
Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
On Thu, Apr 08, 2021 at 10:05:18PM -0700, Andrew Morton wrote:
> Yes please.

I just sent v7 with that yesterday [1]
Hope David/Michal finds some time to review patch#4 as that is the only
missing piece atm.

[1] https://lkml.org/lkml/2021/4/8/546

-- 
Oscar Salvador
SUSE L3
Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
On Wed, 07 Apr 2021 22:38:37 +0200 Oscar Salvador wrote:

> On 2021-04-06 22:28, Oscar Salvador wrote:
> > Heh, it seems I spaced out today.
> >
> > We need a few things on top:

Yes please.
Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
On 2021-04-06 22:28, Oscar Salvador wrote:
> Heh, it seems I spaced out today.
>
> We need a few things on top:

Should I send a new version with the fixup included?
I think it would ease the review, but I do not want to spam.

-- 
Oscar Salvador
SUSE L3
Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
On Tue, Apr 06, 2021 at 01:11:11PM +0200, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
>
> This has some disadvantages:
> a) existing memory is consumed for that purpose
>    (eg: ~2MB per 128MB memory section on x86_64)
> b) if the whole node is movable then we have off-node struct pages
>    which have performance drawbacks.
> c) It might be that there are no PMD_ALIGNED chunks, so the memmap array
>    gets populated with base pages.
>
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
>
> Vmemmap page tables can map arbitrary memory.
> That means that we can simply use the beginning of each memory section
> and map struct pages there.
> The struct pages which back the allocated space then just need to be
> treated carefully.
>
> Implementation-wise we will reuse the vmem_altmap infrastructure to
> override the default allocator used by __populate_section_memmap.
> Part of the implementation also relies on the memory_block structure
> gaining a new field which specifies the number of vmemmap_pages at the
> beginning. This patch also introduces the following functions:
>
> - vmemmap_init_space: Initializes the vmemmap pages by calling
>                       move_pfn_range_to_zone(), calls
>                       kasan_add_zero_shadow() for the vmemmap range and
>                       marks online as many sections as the vmemmap pages
>                       fully span.
> - vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages from node and
>                         zone present_pages.
> - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
>
> The new function memory_block_online() calls vmemmap_init_space() before
> doing the actual online_pages(). Should online_pages() fail, we clean up
> by calling vmemmap_adjust_pages() and vmemmap_deinit_space().
>
> On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to
> calling offline_pages(), because offline_pages() performs the
> tearing-down of kthreads and the rebuilding of the zonelists if the
> node/zone become empty.
> If offline_pages() fails, we account back the vmemmap pages via
> vmemmap_adjust_pages().
> If it succeeds, we call vmemmap_deinit_space().
>
> Hot-remove:
>
>  We need to be careful when removing memory, as adding and
>  removing memory needs to be done with the same granularity.
>  To check that this assumption is not violated, we check the
>  memory range we want to remove and if a) any memory block has
>  vmemmap pages and b) the range spans more than a single memory
>  block, we scream out loud and refuse to proceed.
>
>  If all is good and the range was using memmap on memory (aka vmemmap
>  pages), we construct an altmap structure so free_hugepage_table does
>  the right thing and calls vmem_altmap_free instead of free_pagetable.
>
> Signed-off-by: Oscar Salvador

Heh, it seems I spaced out today.

We need a few things on top:

- In case !CONFIG_MEMORY_HOTREMOVE, we still need vmemmap_deinit_space,
  as we call it from memory_block_online() too in case online_pages()
  fails. So we need to move it out of the #ifdef CONFIG_MEMORY_HOTREMOVE
  block, with the others.

- If the vmemmap pages fully span sections, we need to mark those
  sections as online, since online_pages() will not do it for us.
  In vmemmap_deinit_space, we need to mark them back offline.
  Since vmemmap_deinit_space might get called from memory_block_online(),
  we need to move offline_mem_sections() out of the
  #ifdef CONFIG_MEMORY_HOTREMOVE code as well.
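The present_pages bookkeeping described above can be illustrated with a small userspace toy model (plain C, not kernel code; the helper names mirror the patch but the signatures are simplified, and the 32768/512 figures are illustrative for a 128MB x86_64 block with a ~2MB memmap):

```c
#include <assert.h>

/* Toy model of the accounting around memory_block_online(). */
static long zone_present_pages;

/* vmemmap_adjust_pages(): add (or, with a negative argument, subtract)
 * the vmemmap pages from the zone's present_pages. */
static void vmemmap_adjust_pages(long nr_pages)
{
	zone_present_pages += nr_pages;
}

/* Stand-in for online_pages(): on success, account the onlined pages;
 * 'should_fail' lets us exercise the error path. */
static int online_pages(long nr_pages, int should_fail)
{
	if (should_fail)
		return -1;
	zone_present_pages += nr_pages;
	return 0;
}

static int memory_block_online(long nr_pages, long nr_vmemmap_pages,
			       int should_fail)
{
	int ret;

	/* vmemmap_init_space() accounts the vmemmap pages up front */
	vmemmap_adjust_pages(nr_vmemmap_pages);

	/* online only the pages not consumed by the memmap itself */
	ret = online_pages(nr_pages - nr_vmemmap_pages, should_fail);

	/* undo the accounting if onlining the rest failed */
	if (ret)
		vmemmap_adjust_pages(-nr_vmemmap_pages);
	return ret;
}
```

The invariant being modeled: whether online_pages() succeeds or fails, present_pages ends up consistent, with no vmemmap pages left accounted after a failed online.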
So, we would need the following on top:

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index cc3dbc0abf02..c7669d2accfd 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -111,6 +111,7 @@ extern int add_one_highpage(struct page *page, int pfn, int bad_ppro);
 extern void vmemmap_adjust_pages(unsigned long pfn, long nr_pages);
 extern int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages,
 			      int nid, int online_type);
+extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern int online_pages(unsigned long pfn, unsigned long nr_pages,
 			int online_type, int nid);
 extern struct zone *test_pages_in_a_zone(unsigned long start_pfn,
@@ -317,7 +318,6 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
 
 #ifdef CONFIG_MEMORY_HOTREMOVE
-extern void vmemmap_deinit_space(unsigned long pfn, unsigned long nr_pages);
 extern void try_offline_node(int nid);
 extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 747e1c21d8e2..76f4ca5ed230 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1383,10 +1383,8 @@ static inline int online_section_nr(unsigned long nr)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 void
Re: [PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
Hi Oscar,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on driver-core/driver-core-testing]
[also build test ERROR on pm/linux-next tip/x86/core arm64/for-next/core linus/master v5.12-rc6]
[cannot apply to hnaz-linux-mm/master next-20210406]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:      https://github.com/0day-ci/linux/commits/Oscar-Salvador/Allocate-memmap-from-hotadded-memory-per-device/20210406-191333
base:     https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git 6e11b376fd74356e32d842be588e12dc9bf6e197
config:   x86_64-randconfig-a015-20210406 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project a46f59a747a7273cc439efaf3b4f98d8b63d2f20)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install x86_64 cross compiling tool for clang build
        # apt-get install binutils-x86-64-linux-gnu
        # https://github.com/0day-ci/linux/commit/4d4265dd4e598c7b0971d57894685136229f5d07
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Oscar-Salvador/Allocate-memmap-from-hotadded-memory-per-device/20210406-191333
        git checkout 4d4265dd4e598c7b0971d57894685136229f5d07
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

>> drivers/base/memory.c:201:3: error: implicit declaration of function 'vmemmap_deinit_space' [-Werror,-Wimplicit-function-declaration]
                   vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
                   ^
   drivers/base/memory.c:201:3: note: did you mean 'vmemmap_init_space'?
   include/linux/memory_hotplug.h:112:12: note: 'vmemmap_init_space' declared here
   extern int vmemmap_init_space(unsigned long pfn, unsigned long nr_pages,
              ^
   drivers/base/memory.c:231:4: error: implicit declaration of function 'vmemmap_deinit_space' [-Werror,-Wimplicit-function-declaration]
                    vmemmap_deinit_space(start_pfn, nr_pages);
                    ^
   2 errors generated.


vim +/vmemmap_deinit_space +201 drivers/base/memory.c

   171	
   172	static int memory_block_online(struct memory_block *mem)
   173	{
   174		unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
   175		unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
   176		unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
   177		int ret;
   178	
   179		/*
   180		 * Although vmemmap pages have a different lifecycle than the pages
   181		 * they describe (they remain until the memory is unplugged), doing
   182		 * its initialization and accounting at hot-{online,offline} stage
   183		 * simplifies things a lot
   184		 */
   185		if (nr_vmemmap_pages) {
   186			ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
   187						 mem->online_type);
   188			if (ret)
   189				return ret;
   190		}
   191	
   192		ret = online_pages(start_pfn + nr_vmemmap_pages,
   193			           nr_pages - nr_vmemmap_pages, mem->online_type,
   194			           mem->nid);
   195	
   196		/*
   197		 * Undo the work if online_pages() fails.
   198		 */
   199		if (ret && nr_vmemmap_pages) {
   200			vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
 > 201			vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
   202		}
   203	
   204		return ret;
   205	}
   206	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org

.config.gz
Description: application/gzip
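For reference, the arithmetic behind the `online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages, ...)` call in the listing can be sketched in plain userspace C. The vmemmap sits at the head of the block, so the usable range starts `nr_vmemmap_pages` in; the 512-page figure used below is illustrative, derived from the ~2MB-per-128MB-section number in the changelog (32768 struct pages * 64 bytes / 4096-byte pages):

```c
#include <assert.h>

struct range { unsigned long start_pfn, nr_pages; };

/* Compute the part of a hot-added block that online_pages() actually
 * onlines: everything except the leading pages consumed by the
 * memmap (vmemmap) itself. */
static struct range usable_range(unsigned long start_pfn,
				 unsigned long nr_pages,
				 unsigned long nr_vmemmap_pages)
{
	struct range r = {
		.start_pfn = start_pfn + nr_vmemmap_pages,
		.nr_pages  = nr_pages - nr_vmemmap_pages,
	};
	return r;
}
```

This is only a model of the pfn arithmetic; in the kernel the same split is done inline in memory_block_online().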
[PATCH v6 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
Physical memory hotadd has to allocate a memmap (struct page array) for
the newly added memory section. Currently, alloc_pages_node() is used
for those allocations.

This has some disadvantages:
a) existing memory is consumed for that purpose
   (eg: ~2MB per 128MB memory section on x86_64)
b) if the whole node is movable then we have off-node struct pages
   which have performance drawbacks.
c) It might be that there are no PMD_ALIGNED chunks, so the memmap array
   gets populated with base pages.

This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.

Vmemmap page tables can map arbitrary memory.
That means that we can simply use the beginning of each memory section
and map struct pages there.
The struct pages which back the allocated space then just need to be
treated carefully.

Implementation-wise we will reuse the vmem_altmap infrastructure to
override the default allocator used by __populate_section_memmap.
Part of the implementation also relies on the memory_block structure
gaining a new field which specifies the number of vmemmap_pages at the
beginning. This patch also introduces the following functions:

- vmemmap_init_space: Initializes the vmemmap pages by calling
                      move_pfn_range_to_zone(), calls
                      kasan_add_zero_shadow() for the vmemmap range and
                      marks online as many sections as the vmemmap pages
                      fully span.
- vmemmap_adjust_pages: Accounts/subtracts vmemmap_pages from node and
                        zone present_pages.
- vmemmap_deinit_space: Undoes what vmemmap_init_space does.

The new function memory_block_online() calls vmemmap_init_space() before
doing the actual online_pages(). Should online_pages() fail, we clean up
by calling vmemmap_adjust_pages() and vmemmap_deinit_space().

On offline, memory_block_offline() calls vmemmap_adjust_pages() prior to
calling offline_pages(), because offline_pages() performs the
tearing-down of kthreads and the rebuilding of the zonelists if the
node/zone become empty.
If offline_pages() fails, we account back the vmemmap pages via
vmemmap_adjust_pages().
If it succeeds, we call vmemmap_deinit_space().

Hot-remove:

 We need to be careful when removing memory, as adding and
 removing memory needs to be done with the same granularity.
 To check that this assumption is not violated, we check the
 memory range we want to remove and if a) any memory block has
 vmemmap pages and b) the range spans more than a single memory
 block, we scream out loud and refuse to proceed.

 If all is good and the range was using memmap on memory (aka vmemmap
 pages), we construct an altmap structure so free_hugepage_table does
 the right thing and calls vmem_altmap_free instead of free_pagetable.

Signed-off-by: Oscar Salvador
---
 drivers/base/memory.c          |  64 +++--
 include/linux/memory.h         |   8 ++-
 include/linux/memory_hotplug.h |  13
 include/linux/memremap.h       |   2 +-
 include/linux/mmzone.h         |   5 ++
 mm/Kconfig                     |   5 ++
 mm/memory_hotplug.c            | 153 +++--
 7 files changed, 238 insertions(+), 12 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index f209925a5d4e..a5e536a3e9a4 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -173,16 +173,65 @@ static int memory_block_online(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	int ret;
+
+	/*
+	 * Although vmemmap pages have a different lifecycle than the pages
+	 * they describe (they remain until the memory is unplugged), doing
+	 * its initialization and accounting at hot-{online,offline} stage
+	 * simplifies things a lot
+	 */
+	if (nr_vmemmap_pages) {
+		ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
+					 mem->online_type);
+		if (ret)
+			return ret;
+	}
 
-	return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
+	ret = online_pages(start_pfn + nr_vmemmap_pages,
+			   nr_pages - nr_vmemmap_pages, mem->online_type,
+			   mem->nid);
+
+	/*
+	 * Undo the work if online_pages() fails.
+	 */
+	if (ret && nr_vmemmap_pages) {
+		vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
+		vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
+	}
+
+	return ret;
 }
 
 static int memory_block_offline(struct memory_block *mem)
 {
 	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
 	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
+	int ret;
+
+	/*
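The hot-remove sanity check described in the changelog (refuse when any block in the range has vmemmap pages and the range spans more than one memory block) can be modeled in userspace C. The `block_vmemmap_pages` array below is a hypothetical stand-in for walking the memory blocks in the range, not the kernel's actual iteration API:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the hot-remove check: removal must happen with the same
 * granularity as the add.  A multi-block range is rejected as soon as
 * any of its blocks carries vmemmap (memmap-on-memory) pages. */
static bool can_remove(const unsigned long *block_vmemmap_pages,
		       int nr_blocks)
{
	bool has_vmemmap = false;
	int i;

	for (i = 0; i < nr_blocks; i++)
		if (block_vmemmap_pages[i])
			has_vmemmap = true;

	/* "scream out loud and refuse to proceed" */
	return !(has_vmemmap && nr_blocks > 1);
}
```

A single block with vmemmap pages, or a multi-block range without any, is fine; only the combination of both conditions is rejected.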