Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
On Mon 27 Nov 2017, 17:39, Maciej Bielski wrote:

Hi Robin,

> Hi Robin,
>
> Thank you for your feedback, it's highly appreciated. Let me add some
> comments.
>
> Our primary goal was to have hotplug working even in the basic setup and
> publish first working results. Then we want to improve the code, building
> on top of community comments. This is a general answer to questions about
> configuration flags. The working setup is presented, a bit as a hint, and
> we do not deem it to be ultimately the best at all. The questions about
> configuration, IMHO, fall into the category of agreeing on a proper setup
> (defaults, dependencies) and, therefore, we strongly rely on the
> community's experience to advise us how it should be. So, in short, for
> questions of the form "why is this set up in such a way", the simple
> answer is that it worked as a first approximation. That said, I totally
> agree that for a server-grade system it should be different, and thanks a
> lot for sharing your opinion on that.
>
> On Mon, Nov 27, 2017 at 03:19:49PM +, Robin Murphy wrote:
> > Hi Andrea,
> >
> > I've also been looking at memory hotplug for arm64, from the perspective of
> > enabling ZONE_DEVICE for pmem. May I ask what your use-case for this series
> > is? AFAICS the real demand will be coming from server systems, which in
> > practice means both ACPI and NUMA, both of which are being resoundingly
> > ignored here.
> >
>
> Eventually we aim for aarch64 server systems.
>

Adding to what Maciej said: the original motivation and driving factor for
this development effort is this project: http://www.dredbox.eu

In short, we have a software-defined interconnect for disaggregated memory,
where memory can be connected to nodes dynamically and via software. At
reconfigurations, we need to hot add and hot remove memory from running
kernels. Our current research prototype is based on an arm64 SoC+FPGA
system. Hence memory hotplug for arm64.

Since hot-add and hot-remove are triggered by software, we do not need
ACPI. In our specific case, memory topologies can change dynamically, so we
have rather ad-hoc, project-specific NUMA support that, we believe, does not
make sense to discuss for mainlining.

> > Further review comments inline.
> >
> > On 23/11/17 11:13, Maciej Bielski wrote:
> > >Introduces memory hotplug functionality (hot-add) for arm64.
> > >
> > >Changes v1->v2:
> > >- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
> > >  all changes are additive and non destructive.
> > >
> > >- stop_machine used to update swapper on hot add, avoiding races
> > >
> > >- checking if pagealloc is under debug to stay coherent with mem_map
> > >
> > >Signed-off-by: Maciej Bielski
> > >Signed-off-by: Andrea Reale
> > >---
> > > arch/arm64/Kconfig           | 12 ++
> > > arch/arm64/configs/defconfig |  1 +
> > > arch/arm64/include/asm/mmu.h |  3 ++
> > > arch/arm64/mm/init.c         | 87
> > > arch/arm64/mm/mmu.c          | 39
> > > 5 files changed, 142 insertions(+)
> > >
> > >diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > >index 0df64a6..c736bba 100644
> > >--- a/arch/arm64/Kconfig
> > >+++ b/arch/arm64/Kconfig
> > >@@ -641,6 +641,14 @@ config HOTPLUG_CPU
> > > 	  Say Y here to experiment with turning CPUs off and on. CPUs
> > > 	  can be controlled through /sys/devices/system/cpu.
> > >+config ARCH_HAS_ADD_PAGES
> > >+	def_bool y
> > >+	depends on ARCH_ENABLE_MEMORY_HOTPLUG
> > >+
> > >+config ARCH_ENABLE_MEMORY_HOTPLUG
> > >+	def_bool y
> > >+depends on !NUMA
> >
> > As above, realistically this seems too limiting to be useful.
> >
> > >+
> > > # Common NUMA Features
> > > config NUMA
> > > 	bool "Numa Memory Allocation and Scheduler Support"
> > >@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
> > > source "mm/Kconfig"
> > >+config ARCH_MEMORY_PROBE
> > >+	def_bool y
> > >+	depends on MEMORY_HOTPLUG
> >
> > I'm particularly dubious about enabling this by default - it's useful for
> > development and testing, yes, but I think it's the kind of feature where the
> > onus should be on interested developers to turn it on, rather than
> > production configs to have to turn it off.
> >
> > >+
> > > config SECCOMP
> > > 	bool "Enable seccomp to safely compute untrusted bytecode"
> > > 	---help---
> > >diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > >index 34480e9..5fc5656 100644
> > >--- a/arch/arm64/configs/defconfig
> > >+++ b/arch/arm64/configs/defconfig
> > >@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
> > > CONFIG_SCHED_MC=y
> > > CONFIG_NUMA=y
> > > CONFIG_PREEMPT=y
> > >+CONFIG_MEMORY_HOTPLUG=y
> >
> > Note that this is effectively pointless, given two lines above...
> >

Well spotted, thanks :)

> > > CONFIG_KSM=y
> > > CONFIG_TRANSPARENT_HUGEPAGE=y
> > > CONFIG_CMA=y
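Robin's "effectively pointless" remark follows from how Kconfig resolves dependencies: with CONFIG_NUMA=y, ARCH_ENABLE_MEMORY_HOTPLUG (def_bool y, depends on !NUMA) evaluates to n, so a requested CONFIG_MEMORY_HOTPLUG=y cannot stick. The toy resolver below is a sketch under stated assumptions: only plain bool semantics are modelled, SPARSEMEM is taken as enabled, and it assumes MEMORY_HOTPLUG in mm/Kconfig depends on ARCH_ENABLE_MEMORY_HOTPLUG, as it did in kernels of that era. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the Kconfig symbols touched by this patch. Only plain bool
 * semantics are modelled (no tristate, no "select"); SPARSEMEM is assumed
 * enabled, and MEMORY_HOTPLUG is assumed to depend on
 * ARCH_ENABLE_MEMORY_HOTPLUG.
 */
struct toy_config {
	/* Values requested by the defconfig. */
	bool numa;
	bool memory_hotplug_req;
	/* Effective values computed by toy_resolve(). */
	bool arch_enable_memory_hotplug;
	bool arch_has_add_pages;
	bool memory_hotplug;
	bool arch_memory_probe;
};

void toy_resolve(struct toy_config *c)
{
	/* config ARCH_ENABLE_MEMORY_HOTPLUG: def_bool y, depends on !NUMA */
	c->arch_enable_memory_hotplug = !c->numa;
	/* config ARCH_HAS_ADD_PAGES: def_bool y, depends on ARCH_ENABLE_MEMORY_HOTPLUG */
	c->arch_has_add_pages = c->arch_enable_memory_hotplug;
	/* A requested "=y" only sticks if the symbol's dependencies hold. */
	c->memory_hotplug = c->memory_hotplug_req && c->arch_enable_memory_hotplug;
	/* config ARCH_MEMORY_PROBE: def_bool y, depends on MEMORY_HOTPLUG */
	c->arch_memory_probe = c->memory_hotplug;
}

/* The patched defconfig sets CONFIG_NUMA=y two lines above the new line. */
void toy_check_patch_defconfig(void)
{
	struct toy_config c = { .numa = true, .memory_hotplug_req = true };

	toy_resolve(&c);
	assert(!c.arch_enable_memory_hotplug);
	assert(!c.memory_hotplug);    /* CONFIG_MEMORY_HOTPLUG=y is dropped */
	assert(!c.arch_memory_probe); /* and the probe interface with it */
}
```

Only with NUMA disabled does the whole chain (ARCH_HAS_ADD_PAGES, MEMORY_HOTPLUG, ARCH_MEMORY_PROBE) come out enabled, which is also why Robin considers the `!NUMA` dependency too limiting for server systems.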
Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
Hi Robin,

Thank you for your feedback, it's highly appreciated. Let me add some
comments.

Our primary goal was to have hotplug working even in the basic setup and
publish first working results. Then we want to improve the code, building
on top of community comments. This is a general answer to questions about
configuration flags. The working setup is presented, a bit as a hint, and
we do not deem it to be ultimately the best at all. The questions about
configuration, IMHO, fall into the category of agreeing on a proper setup
(defaults, dependencies) and, therefore, we strongly rely on the
community's experience to advise us how it should be. So, in short, for
questions of the form "why is this set up in such a way", the simple answer
is that it worked as a first approximation. That said, I totally agree that
for a server-grade system it should be different, and thanks a lot for
sharing your opinion on that.

On Mon, Nov 27, 2017 at 03:19:49PM +, Robin Murphy wrote:
> Hi Andrea,
>
> I've also been looking at memory hotplug for arm64, from the perspective of
> enabling ZONE_DEVICE for pmem. May I ask what your use-case for this series
> is? AFAICS the real demand will be coming from server systems, which in
> practice means both ACPI and NUMA, both of which are being resoundingly
> ignored here.
>

Eventually we aim for aarch64 server systems.

> Further review comments inline.
>
> On 23/11/17 11:13, Maciej Bielski wrote:
> >Introduces memory hotplug functionality (hot-add) for arm64.
> >
> >Changes v1->v2:
> >- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
> >  all changes are additive and non destructive.
> >
> >- stop_machine used to update swapper on hot add, avoiding races
> >
> >- checking if pagealloc is under debug to stay coherent with mem_map
> >
> >Signed-off-by: Maciej Bielski
> >Signed-off-by: Andrea Reale
> >---
> > arch/arm64/Kconfig           | 12 ++
> > arch/arm64/configs/defconfig |  1 +
> > arch/arm64/include/asm/mmu.h |  3 ++
> > arch/arm64/mm/init.c         | 87
> > arch/arm64/mm/mmu.c          | 39
> > 5 files changed, 142 insertions(+)
> >
> >diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >index 0df64a6..c736bba 100644
> >--- a/arch/arm64/Kconfig
> >+++ b/arch/arm64/Kconfig
> >@@ -641,6 +641,14 @@ config HOTPLUG_CPU
> > 	  Say Y here to experiment with turning CPUs off and on. CPUs
> > 	  can be controlled through /sys/devices/system/cpu.
> >+config ARCH_HAS_ADD_PAGES
> >+	def_bool y
> >+	depends on ARCH_ENABLE_MEMORY_HOTPLUG
> >+
> >+config ARCH_ENABLE_MEMORY_HOTPLUG
> >+	def_bool y
> >+depends on !NUMA
>
> As above, realistically this seems too limiting to be useful.
>
> >+
> > # Common NUMA Features
> > config NUMA
> > 	bool "Numa Memory Allocation and Scheduler Support"
> >@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
> > source "mm/Kconfig"
> >+config ARCH_MEMORY_PROBE
> >+	def_bool y
> >+	depends on MEMORY_HOTPLUG
>
> I'm particularly dubious about enabling this by default - it's useful for
> development and testing, yes, but I think it's the kind of feature where the
> onus should be on interested developers to turn it on, rather than
> production configs to have to turn it off.
>
> >+
> > config SECCOMP
> > 	bool "Enable seccomp to safely compute untrusted bytecode"
> > 	---help---
> >diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> >index 34480e9..5fc5656 100644
> >--- a/arch/arm64/configs/defconfig
> >+++ b/arch/arm64/configs/defconfig
> >@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
> > CONFIG_SCHED_MC=y
> > CONFIG_NUMA=y
> > CONFIG_PREEMPT=y
> >+CONFIG_MEMORY_HOTPLUG=y
>
> Note that this is effectively pointless, given two lines above...
>
> > CONFIG_KSM=y
> > CONFIG_TRANSPARENT_HUGEPAGE=y
> > CONFIG_CMA=y
> >diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> >index 0d34bf0..2b3fa4d 100644
> >--- a/arch/arm64/include/asm/mmu.h
> >+++ b/arch/arm64/include/asm/mmu.h
> >@@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> > 			       pgprot_t prot, bool page_mappings_only);
> > extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
> > extern void mark_linear_text_alias_ro(void);
> >+#ifdef CONFIG_MEMORY_HOTPLUG
> >+extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
>
> Is there any reason for not just implementing all the hotplug code
> self-contained in mmu.c?
>

Simply put, in the first version we were supposed to build on top of the
patch by Scott Branden, who put a mock implementation of arch_add_memory()
in arch/arm64/mm/init.c; this is why hotplug_paging() needed to be declared
outside mmu.c. Looking quickly at the code now, I agree that it would be
cleaner to put everything in arch/arm64/mm/mmu.c. I will test that.
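For context on Robin's objection to enabling ARCH_MEMORY_PROBE by default: that option exposes /sys/devices/system/memory/probe, where a developer writes the physical start address of new memory to hot-add it, and the kernel rejects addresses not aligned to the memory block size reported in /sys/devices/system/memory/block_size_bytes (printed as bare hex). The helpers below are a minimal sketch of preparing such a write; the function names are hypothetical, the block size is assumed to be a power of two, and the code only builds the string without touching sysfs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the contents of /sys/devices/system/memory/block_size_bytes,
 * which is printed as a bare hexadecimal number (e.g. "40000000\n").
 * Returns 0 if nothing could be parsed.
 */
uint64_t parse_block_size(const char *text)
{
	char *end;
	unsigned long long v = strtoull(text, &end, 16);

	return end == text ? 0 : (uint64_t)v;
}

/* The probe file only accepts block-aligned addresses (power-of-two size assumed). */
bool probe_addr_aligned(uint64_t phys, uint64_t block_size)
{
	return block_size != 0 && (phys & (block_size - 1)) == 0;
}

/*
 * Format the string a developer would write to
 * /sys/devices/system/memory/probe. Returns its length, or -1 if the
 * address is misaligned or the buffer is too small.
 */
int format_probe_arg(char *buf, size_t len, uint64_t phys, uint64_t block_size)
{
	int n;

	assert(buf != NULL && len > 0);
	if (!probe_addr_aligned(phys, block_size))
		return -1;
	n = snprintf(buf, len, "0x%" PRIx64, phys);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

Because writing to the probe file hot-adds memory at an arbitrary caller-chosen address, this is the kind of debugging knob that, as Robin argues, developers can reasonably be asked to opt into.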
Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
Hi Andrea,

I've also been looking at memory hotplug for arm64, from the perspective of
enabling ZONE_DEVICE for pmem. May I ask what your use-case for this series
is? AFAICS the real demand will be coming from server systems, which in
practice means both ACPI and NUMA, both of which are being resoundingly
ignored here.

Further review comments inline.

On 23/11/17 11:13, Maciej Bielski wrote:
>Introduces memory hotplug functionality (hot-add) for arm64.
>
>Changes v1->v2:
>- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
>  all changes are additive and non destructive.
>
>- stop_machine used to update swapper on hot add, avoiding races
>
>- checking if pagealloc is under debug to stay coherent with mem_map
>
>Signed-off-by: Maciej Bielski
>Signed-off-by: Andrea Reale
>---
> arch/arm64/Kconfig           | 12 ++
> arch/arm64/configs/defconfig |  1 +
> arch/arm64/include/asm/mmu.h |  3 ++
> arch/arm64/mm/init.c         | 87
> arch/arm64/mm/mmu.c          | 39
> 5 files changed, 142 insertions(+)
>
>diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>index 0df64a6..c736bba 100644
>--- a/arch/arm64/Kconfig
>+++ b/arch/arm64/Kconfig
>@@ -641,6 +641,14 @@ config HOTPLUG_CPU
> 	  Say Y here to experiment with turning CPUs off and on. CPUs
> 	  can be controlled through /sys/devices/system/cpu.
>+config ARCH_HAS_ADD_PAGES
>+	def_bool y
>+	depends on ARCH_ENABLE_MEMORY_HOTPLUG
>+
>+config ARCH_ENABLE_MEMORY_HOTPLUG
>+	def_bool y
>+depends on !NUMA

As above, realistically this seems too limiting to be useful.

>+
> # Common NUMA Features
> config NUMA
> 	bool "Numa Memory Allocation and Scheduler Support"
>@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
> source "mm/Kconfig"
>+config ARCH_MEMORY_PROBE
>+	def_bool y
>+	depends on MEMORY_HOTPLUG

I'm particularly dubious about enabling this by default - it's useful for
development and testing, yes, but I think it's the kind of feature where the
onus should be on interested developers to turn it on, rather than
production configs to have to turn it off.

>+
> config SECCOMP
> 	bool "Enable seccomp to safely compute untrusted bytecode"
> 	---help---
>diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
>index 34480e9..5fc5656 100644
>--- a/arch/arm64/configs/defconfig
>+++ b/arch/arm64/configs/defconfig
>@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
> CONFIG_SCHED_MC=y
> CONFIG_NUMA=y
> CONFIG_PREEMPT=y
>+CONFIG_MEMORY_HOTPLUG=y

Note that this is effectively pointless, given two lines above...

> CONFIG_KSM=y
> CONFIG_TRANSPARENT_HUGEPAGE=y
> CONFIG_CMA=y
>diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
>index 0d34bf0..2b3fa4d 100644
>--- a/arch/arm64/include/asm/mmu.h
>+++ b/arch/arm64/include/asm/mmu.h
>@@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> 			       pgprot_t prot, bool page_mappings_only);
> extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
> extern void mark_linear_text_alias_ro(void);
>+#ifdef CONFIG_MEMORY_HOTPLUG
>+extern void hotplug_paging(phys_addr_t start, phys_addr_t size);

Is there any reason for not just implementing all the hotplug code
self-contained in mmu.c?

>+#endif
> #endif
>diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>index 5960bef..e96e7d3 100644
>--- a/arch/arm64/mm/init.c
>+++ b/arch/arm64/mm/init.c
>@@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
> 	return 0;
> }
> __initcall(register_mem_limit_dumper);
>+
>+#ifdef CONFIG_MEMORY_HOTPLUG
>+int add_pages(int nid, unsigned long start_pfn,
>+		unsigned long nr_pages, bool want_memblock)
>+{
>+	int ret;
>+	u64 start_addr = start_pfn << PAGE_SHIFT;
>+	/*
>+	 * Mark the first page in the range as unusable. This is needed
>+	 * because __add_section (within __add_pages) wants pfn_valid
>+	 * of it to be false, and in arm64 pfn_valid is implemented by
>+	 * just checking at the nomap flag for existing blocks.
>+	 *
>+	 * A small trick here is that __add_section() requires only
>+	 * phys_start_pfn (that is the first pfn of a section) to be
>+	 * invalid. Regardless of whether it was assumed (by the function
>+	 * author) that all pfns within a section are either all valid
>+	 * or all invalid, it allows to avoid looping twice (once here,
>+	 * second when memblock_clear_nomap() is called) through all
>+	 * pfns of the section and modify only one pfn. Thanks to that,
>+	 * further, in __add_zone() only this very first pfn is skipped
>+	 * and corresponding page is not flagged reserved. Therefore it
>+	 * is enough to correct this setup only for it.
>+	 *
>+	 * When arch_add_memory() returns the walk_memory_range() function
>+	 * is called and passed with online_memory_block() callback,
Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
On Fri, Nov 24, 2017 at 4:23 PM, Maciej Bielski wrote:
> On Fri, Nov 24, 2017 at 09:42:33AM +, Andrea Reale wrote:
>> Hi Arun,
>>
>>
>> On Fri 24 Nov 2017, 11:25, Arun KS wrote:
>> > On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski
>> > wrote:
>> >> [ ...]
>> > > Introduces memory hotplug functionality (hot-add) for arm64.
>> > > @@ -615,6 +616,44 @@ void __init paging_init(void)
>> > >                       SWAPPER_DIR_SIZE - PAGE_SIZE);
>> > >  }
>> > >
>> > > +#ifdef CONFIG_MEMORY_HOTPLUG
>> > > +
>> > > +/*
>> > > + * hotplug_paging() is used by memory hotplug to build new page tables
>> > > + * for hot added memory.
>> > > + */
>> > > +
>> > > +struct mem_range {
>> > > +       phys_addr_t base;
>> > > +       phys_addr_t size;
>> > > +};
>> > > +
>> > > +static int __hotplug_paging(void *data)
>> > > +{
>> > > +       int flags = 0;
>> > > +       struct mem_range *section = data;
>> > > +
>> > > +       if (debug_pagealloc_enabled())
>> > > +               flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>> > > +
>> > > +       __create_pgd_mapping(swapper_pg_dir, section->base,
>> > > +                       __phys_to_virt(section->base), section->size,
>> > > +                       PAGE_KERNEL, pgd_pgtable_alloc, flags);
>> >
>> > Hello Andrea,
>> >
>> > __hotplug_paging runs in stop_machine context.
>> > cpu stop callbacks must not sleep.
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479
>> >
>> > __create_pgd_mapping uses pgd_pgtable_alloc, which does
>> > __get_free_page(PGALLOC_GFP)
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342
>> >
>> > PGALLOC_GFP has GFP_KERNEL, which in turn has __GFP_RECLAIM
>> >
>> > #define PGALLOC_GFP     (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
>> > #define GFP_KERNEL      (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
>> >
>> > Now, prepare_alloc_pages() called by __alloc_pages_nodemask checks for
>> >
>> > might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
>> >
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150
>> >
>> > and then BUG()
>>
>> Well spotted, thanks for reporting the problem. One possible solution
>> would be to go back to building the updated page tables on a copy
>> pgdir (as was done in v1 of this patchset) and then replacing swapper
>> atomically with stop_machine.
>>
>> Actually, I am not sure stop_machine is strictly needed
>> if we modify the swapper pgdir live: for example, in x86_64
>> kernel_physical_mapping_init, atomicity is ensured by spin-locking on
>> init_mm.page_table_lock.
>> https://elixir.free-electrons.com/linux/v4.14/source/arch/x86/mm/init_64.c#L684
>> I'll spend some time investigating whatever else could be working
>> concurrently on the swapper pgdir.
>>
>> Any suggestion or pointer is very welcome.
>
> Hi Andrea, Arun,
>
> An alternative approach could be to implement pgd_pgtable_alloc_nosleep()
> and use it in hotplug_paging(). It could then use different flags, e.g.:
>
> #define PGALLOC_GFP_NORECLAIM (__GFP_IO | __GFP_FS | __GFP_NOTRACK |
> __GFP_ZERO)

This solves the problem with __get_free_page. But pgd_pgtable_alloc() ->
pgtable_page_ctor() -> ptlock_alloc() and then
kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL)

Same BUG again.

Regards,
Arun

>
> Is this an inefficient approach in any way?
> Do we like the fact that the memory-attaching thread can go to sleep?
>
> BR,
>
>>
>> Thanks,
>> Andrea
>>
>> > I was testing on a 4.4 kernel, but cross-checked with 4.14 as well.
>> >
>> > Regards,
>> > Arun
>> >
>> >
>> > > +
>> > > +       return 0;
>> > > +}
>> > > +
>> > > +inline void hotplug_paging(phys_addr_t start, phys_addr_t size)
>> > > +{
>> > > +       struct mem_range section = {
>> > > +               .base = start,
>> > > +               .size = size,
>> > > +       };
>> > > +
>> > > +       stop_machine(__hotplug_paging, &section, NULL);
>> > > +}
>> > > +#endif /* CONFIG_MEMORY_HOTPLUG */
>> > > +
>> > >  /*
>> > >   * Check whether a kernel address is valid (derived from arch/x86/).
>> > >   */
>> > > --
>> > > 2.7.4
>> > >
>> >
>> >
>
> --
> Maciej Bielski
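Arun's argument can be traced with a small user-space model: the stop_machine callback runs in a context that must not sleep, and any allocation whose flags include direct reclaim trips might_sleep_if(). The sketch below is illustrative only. The `T_GFP_*` bit values are made up (the real `__GFP_*` constants differ), and the `toy_*` functions are hypothetical stand-ins for stop_machine(), pgd_pgtable_alloc() and the allocator check. It shows why switching the page allocation to the proposed reclaim-free flags still leaves the GFP_KERNEL allocation in the page-table constructor that Arun points out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Illustrative flag values only: the real __GFP_* bits differ and change
 * between kernel versions. The relationships mirror the definitions Arun
 * quotes, with __GFP_RECLAIM containing __GFP_DIRECT_RECLAIM.
 */
#define T_GFP_DIRECT_RECLAIM 0x01u
#define T_GFP_KSWAPD_RECLAIM 0x02u
#define T_GFP_RECLAIM        (T_GFP_DIRECT_RECLAIM | T_GFP_KSWAPD_RECLAIM)
#define T_GFP_IO             0x04u
#define T_GFP_FS             0x08u
#define T_GFP_NOTRACK        0x10u
#define T_GFP_ZERO           0x20u
#define T_GFP_KERNEL         (T_GFP_RECLAIM | T_GFP_IO | T_GFP_FS)
#define T_PGALLOC_GFP        (T_GFP_KERNEL | T_GFP_NOTRACK | T_GFP_ZERO)
/* The variant proposed in the thread: the same flags minus reclaim. */
#define T_PGALLOC_GFP_NORECLAIM (T_GFP_IO | T_GFP_FS | T_GFP_NOTRACK | T_GFP_ZERO)

static bool toy_atomic;          /* true while a stop_machine callback runs */
static int toy_sleep_violations; /* would-be "sleeping in atomic" reports */

/* Models might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM) in the allocator. */
void toy_alloc(unsigned int gfp)
{
	if ((gfp & T_GFP_DIRECT_RECLAIM) && toy_atomic)
		toy_sleep_violations++;
}

/* Models pgd_pgtable_alloc(): the page itself, then pgtable_page_ctor() ->
 * ptlock_alloc(), which (per Arun's report) allocates with GFP_KERNEL
 * regardless of the flags chosen for the page. */
void toy_pgtable_alloc(unsigned int page_gfp)
{
	toy_alloc(page_gfp);
	toy_alloc(T_GFP_KERNEL);
}

/* Models stop_machine(fn, data, NULL): fn runs with all CPUs stopped. */
int toy_stop_machine(int (*fn)(void *), void *data)
{
	int ret;

	toy_atomic = true;
	ret = fn(data);
	toy_atomic = false;
	return ret;
}

/* Models __hotplug_paging(): data points at the caller's flags, just as
 * the real callback receives a pointer to the caller's section. */
int toy_hotplug_paging(void *data)
{
	toy_pgtable_alloc(*(const unsigned int *)data);
	return 0;
}
```

With PGALLOC_GFP both allocations are flagged; with the reclaim-free variant one remains, which is Arun's "same BUG again". The alternatives raised in the thread avoid allocating inside the callback altogether. One is to build the tables on a copy first, as in v1. Another, which Andrea notes x86 uses, is to serialise updates on init_mm.page_table_lock rather than stopping all CPUs.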
Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
On Fri, Nov 24, 2017 at 09:42:33AM +, Andrea Reale wrote: > Hi Arun, > > > On Fri 24 Nov 2017, 11:25, Arun KS wrote: > > On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski > >wrote: > >> [ ...] > > > Introduces memory hotplug functionality (hot-add) for arm64. > > > @@ -615,6 +616,44 @@ void __init paging_init(void) > > > SWAPPER_DIR_SIZE - PAGE_SIZE); > > > } > > > > > > +#ifdef CONFIG_MEMORY_HOTPLUG > > > + > > > +/* > > > + * hotplug_paging() is used by memory hotplug to build new page tables > > > + * for hot added memory. > > > + */ > > > + > > > +struct mem_range { > > > + phys_addr_t base; > > > + phys_addr_t size; > > > +}; > > > + > > > +static int __hotplug_paging(void *data) > > > +{ > > > + int flags = 0; > > > + struct mem_range *section = data; > > > + > > > + if (debug_pagealloc_enabled()) > > > + flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; > > > + > > > + __create_pgd_mapping(swapper_pg_dir, section->base, > > > + __phys_to_virt(section->base), section->size, > > > + PAGE_KERNEL, pgd_pgtable_alloc, flags); > > > > Hello Andrea, > > > > __hotplug_paging runs on stop_machine context. > > cpu stop callbacks must not sleep. > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479 > > > > __create_pgd_mapping uses pgd_pgtable_alloc. which does > > __get_free_page(PGALLOC_GFP) > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342 > > > > PGALLOC_GFP has GFP_KERNEL which inturn has __GFP_RECLAIM > > > > #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) > > #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) > > > > Now, prepare_alloc_pages() called by __alloc_pages_nodemask checks for > > > > might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM); > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150 > > > > and then BUG() > > Well spotted, thanks for reporting the problem. 
One possible solution > would be to revert back to building the updated page tables on a copy > pgdir (as it was done in v1 of this patchset) and then replacing swapper > atomically with stop_machine. > > Actually, I am not sure if stop_machine is strictly needed, > if we modify the swapper pgdir live: for example, in x86_64 > kernel_physical_mapping_init, atomicity is ensured by spin-locking on > init_mm.page_table_lock. > https://elixir.free-electrons.com/linux/v4.14/source/arch/x86/mm/init_64.c#L684 > I'll spend some time investigating whoever else could be working > concurrently on the swapper pgdir. > > Any suggestion or pointer is very welcome. Hi Andrea, Arun, An alternative approach could be to implement pgd_pgtable_alloc_nosleep() and use it in the hotplug_paging() path. It could then use different flags, e.g.: #define PGALLOC_GFP_NORECLAIM (__GFP_IO | __GFP_FS | __GFP_NOTRACK | __GFP_ZERO) Is this approach inefficient in any way? Do we like the fact that the memory-attaching thread can go to sleep? BR, > > Thanks, > Andrea > > > I was testing on 4.4 kernel, but cross checked with 4.14 as well. > > > > Regards, > > Arun > > > > > > > + > > > + return 0; > > > +} > > > + > > > +inline void hotplug_paging(phys_addr_t start, phys_addr_t size) > > > +{ > > > + struct mem_range section = { > > > + .base = start, > > > + .size = size, > > > + }; > > > + > > > + stop_machine(__hotplug_paging, &section, NULL); > > > +} > > > +#endif /* CONFIG_MEMORY_HOTPLUG */ > > > + > > > /* > > > * Check whether a kernel address is valid (derived from arch/x86/). > > > */ > > > -- > > > 2.7.4 > > > > > > -- Maciej Bielski
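Maciej's proposal amounts to swapping the allocator callback that __create_pgd_mapping() receives. The user-space sketch below is hedged accordingly: table_alloc_fn, the simulated atomic_ctx flag, the easy_pages budget and build_tables() are all hypothetical. It also illustrates the trade-off behind the efficiency question. An allocator that never reclaims can only use memory that is already free, so it can fail where a GFP_KERNEL allocation would have slept and then succeeded, and the caller must handle that failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the pgtable_alloc callback; NULL means failure. */
typedef void *(*table_alloc_fn)(void);

/* Simulated context: true while "running" as a stop_machine callback. */
static bool atomic_ctx;

/* Simulated memory that can be handed out without any reclaim. */
static int easy_pages = 4;

/* May sleep (reclaim): only legal outside atomic context. */
static void *table_alloc_sleeping(void)
{
	assert(!atomic_ctx && "sleeping allocation in atomic context");
	return calloc(1, 4096);
}

/* Never sleeps: fails instead of reclaiming once easy memory runs out. */
static void *table_alloc_nosleep(void)
{
	if (easy_pages == 0)
		return NULL;
	easy_pages--;
	return calloc(1, 4096);
}

/*
 * Allocates n tables through the given callback. Each table is freed
 * immediately because this is only a model. Returns 0 or -ENOMEM.
 */
static int build_tables(int n, table_alloc_fn alloc)
{
	for (int i = 0; i < n; i++) {
		void *t = alloc();

		if (!t)
			return -ENOMEM;
		free(t);
	}
	return 0;
}
```

In atomic context, build_tables(3, table_alloc_nosleep) succeeds while easy memory remains and returns -ENOMEM once it is exhausted, instead of sleeping; passing table_alloc_sleeping there would trip the assertion, which stands in for the kernel's BUG().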
Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
Hi Arun, On Fri 24 Nov 2017, 11:25, Arun KS wrote: > On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski >wrote: >> [ ...] > > Introduces memory hotplug functionality (hot-add) for arm64. > > @@ -615,6 +616,44 @@ void __init paging_init(void) > > SWAPPER_DIR_SIZE - PAGE_SIZE); > > } > > > > +#ifdef CONFIG_MEMORY_HOTPLUG > > + > > +/* > > + * hotplug_paging() is used by memory hotplug to build new page tables > > + * for hot added memory. > > + */ > > + > > +struct mem_range { > > + phys_addr_t base; > > + phys_addr_t size; > > +}; > > + > > +static int __hotplug_paging(void *data) > > +{ > > + int flags = 0; > > + struct mem_range *section = data; > > + > > + if (debug_pagealloc_enabled()) > > + flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS; > > + > > + __create_pgd_mapping(swapper_pg_dir, section->base, > > + __phys_to_virt(section->base), section->size, > > + PAGE_KERNEL, pgd_pgtable_alloc, flags); > > Hello Andrea, > > __hotplug_paging runs on stop_machine context. > cpu stop callbacks must not sleep. > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479 > > __create_pgd_mapping uses pgd_pgtable_alloc. which does > __get_free_page(PGALLOC_GFP) > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342 > > PGALLOC_GFP has GFP_KERNEL which inturn has __GFP_RECLAIM > > #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO) > #define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) > > Now, prepare_alloc_pages() called by __alloc_pages_nodemask checks for > > might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM); > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150 > > and then BUG() Well spotted, thanks for reporting the problem. 
One possible solution would be to revert to building the updated page tables on a copy of the pgdir (as was done in v1 of this patchset) and then replacing swapper atomically with stop_machine. Actually, I am not sure stop_machine is strictly needed if we modify the swapper pgdir live: for example, in x86_64 kernel_physical_mapping_init, atomicity is ensured by spin-locking on init_mm.page_table_lock. https://elixir.free-electrons.com/linux/v4.14/source/arch/x86/mm/init_64.c#L684 I'll spend some time investigating what else could be working concurrently on the swapper pgdir. Any suggestions or pointers are very welcome. Thanks, Andrea > I was testing on 4.4 kernel, but cross checked with 4.14 as well. > > Regards, > Arun > > > > + > > + return 0; > > +} > > + > > +inline void hotplug_paging(phys_addr_t start, phys_addr_t size) > > +{ > > + struct mem_range section = { > > + .base = start, > > + .size = size, > > + }; > > + > > + stop_machine(__hotplug_paging, &section, NULL); > > +} > > +#endif /* CONFIG_MEMORY_HOTPLUG */ > > + > > /* > > * Check whether a kernel address is valid (derived from arch/x86/). > > */ > > -- > > 2.7.4 > > >
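Andrea's alternative, updating swapper live under a lock, can be sketched in user space. The pattern modeled here, loosely after the x86_64 kernel_physical_mapping_init() code linked above, is to allocate a new lower-level table with no lock held (where sleeping is still allowed) and to take the lock only to install it if the slot is still empty. A C11 atomic_flag spinlock stands in for init_mm.page_table_lock; top_table, populate_entry() and the lock helpers are hypothetical names. The model does not capture that the hardware table walker reads entries without taking any lock, so a real update must also make the new table's contents visible before the entry that points to it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_ENTRIES 512

/* Hypothetical stand-in for init_mm.page_table_lock. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void table_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock,
						 memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void table_spin_unlock(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Top-level table; each entry points to a lower-level table. */
static void *top_table[NR_ENTRIES];

/*
 * Additive update: allocate with no lock held, then install under the
 * lock only if the slot is still empty. Returns the installed table.
 */
static void *populate_entry(unsigned int idx)
{
	void *new_table;
	void *installed;

	assert(idx < NR_ENTRIES);
	new_table = calloc(NR_ENTRIES, sizeof(void *));
	if (!new_table)
		return NULL;

	table_spin_lock();
	if (!top_table[idx])
		top_table[idx] = new_table;
	installed = top_table[idx];
	table_spin_unlock();

	if (installed != new_table)
		free(new_table); /* another caller populated the slot first */
	return installed;
}
```

Calling populate_entry() twice for the same index returns the table installed by the first call; the second allocation is freed rather than overwriting a live entry, which keeps the update purely additive.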
Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski wrote: > Introduces memory hotplug functionality (hot-add) for arm64. > > Changes v1->v2: > - swapper pgtable updated in place on hot add, avoiding unnecessary copy: > all changes are additive and non destructive. > > - stop_machine used to updated swapper on hot add, avoiding races > > - checking if pagealloc is under debug to stay coherent with mem_map > > Signed-off-by: Maciej Bielski > Signed-off-by: Andrea Reale > --- > arch/arm64/Kconfig | 12 ++ > arch/arm64/configs/defconfig | 1 + > arch/arm64/include/asm/mmu.h | 3 ++ > arch/arm64/mm/init.c | 87 > > arch/arm64/mm/mmu.c | 39 > 5 files changed, 142 insertions(+) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 0df64a6..c736bba 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -641,6 +641,14 @@ config HOTPLUG_CPU > Say Y here to experiment with turning CPUs off and on. CPUs > can be controlled through /sys/devices/system/cpu. > > +config ARCH_HAS_ADD_PAGES > + def_bool y > + depends on ARCH_ENABLE_MEMORY_HOTPLUG > + > +config ARCH_ENABLE_MEMORY_HOTPLUG > + def_bool y > +depends on !NUMA > + > # Common NUMA Features > config NUMA > bool "Numa Memory Allocation and Scheduler Support" > @@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE > > source "mm/Kconfig" > > +config ARCH_MEMORY_PROBE > + def_bool y > + depends on MEMORY_HOTPLUG > + > config SECCOMP > bool "Enable seccomp to safely compute untrusted bytecode" > ---help--- > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig > index 34480e9..5fc5656 100644 > --- a/arch/arm64/configs/defconfig > +++ b/arch/arm64/configs/defconfig > @@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y > CONFIG_SCHED_MC=y > CONFIG_NUMA=y > CONFIG_PREEMPT=y > +CONFIG_MEMORY_HOTPLUG=y > CONFIG_KSM=y > CONFIG_TRANSPARENT_HUGEPAGE=y > CONFIG_CMA=y > diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h > index 0d34bf0..2b3fa4d 100644 > --- a/arch/arm64/include/asm/mmu.h > 
+++ b/arch/arm64/include/asm/mmu.h > @@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, > phys_addr_t phys, >pgprot_t prot, bool page_mappings_only); > extern void *fixmap_remap_fdt(phys_addr_t dt_phys); > extern void mark_linear_text_alias_ro(void); > +#ifdef CONFIG_MEMORY_HOTPLUG > +extern void hotplug_paging(phys_addr_t start, phys_addr_t size); > +#endif > > #endif > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 5960bef..e96e7d3 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void) > return 0; > } > __initcall(register_mem_limit_dumper); > + > +#ifdef CONFIG_MEMORY_HOTPLUG > +int add_pages(int nid, unsigned long start_pfn, > + unsigned long nr_pages, bool want_memblock) > +{ > + int ret; > + u64 start_addr = start_pfn << PAGE_SHIFT; > + /* > +* Mark the first page in the range as unusable. This is needed > +* because __add_section (within __add_pages) wants pfn_valid > +* of it to be false, and in arm64 pfn falid is implemented by > +* just checking at the nomap flag for existing blocks. > +* > +* A small trick here is that __add_section() requires only > +* phys_start_pfn (that is the first pfn of a section) to be > +* invalid. Regardless of whether it was assumed (by the function > +* author) that all pfns within a section are either all valid > +* or all invalid, it allows to avoid looping twice (once here, > +* second when memblock_clear_nomap() is called) through all > +* pfns of the section and modify only one pfn. Thanks to that, > +* further, in __add_zone() only this very first pfn is skipped > +* and corresponding page is not flagged reserved. Therefore it > +* is enough to correct this setup only for it. 
> +* > +* When arch_add_memory() returns the walk_memory_range() function > +* is called and passed with online_memory_block() callback, > +* which execution finally reaches the memory_block_action() > +* function, where also only the first pfn of a memory block is > +* checked to be reserved. Above, it was first pfn of a section, > +* here it is a block but > +* (drivers/base/memory.c): > +* sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE; > +* (include/linux/memory.h): > +* #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS) > +* so we can consider block and section equivalently > +*/ > + memblock_mark_nomap(start_addr, 1<<PAGE_SHIFT); > + ret = __add_pages(nid, start_pfn, nr_pages,
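The comment's closing argument, that the first pfn of a memory block and the first pfn of a section coincide, is plain arithmetic on the two quoted definitions. The sketch below checks it under assumed arm64 values (4 KiB pages and SECTION_SIZE_BITS = 30); the X_-prefixed names and both helpers are hypothetical. The equivalence relies on block_sz being equal to MIN_MEMORY_BLOCK_SIZE: if a block spans several sections, only the first section's first pfn is also the block's first pfn.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages and 1 GiB sparsemem sections. */
#define X_PAGE_SHIFT        12
#define X_SECTION_SIZE_BITS 30
#define X_MIN_MEMORY_BLOCK_SIZE (UINT64_C(1) << X_SECTION_SIZE_BITS)
#define X_PAGES_PER_SECTION \
	(UINT64_C(1) << (X_SECTION_SIZE_BITS - X_PAGE_SHIFT))

/* Mirrors sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE. */
static uint64_t sections_per_block(uint64_t block_sz)
{
	assert(block_sz % X_MIN_MEMORY_BLOCK_SIZE == 0);
	return block_sz / X_MIN_MEMORY_BLOCK_SIZE;
}

/* First pfn of the section that contains pfn. */
static uint64_t section_first_pfn(uint64_t pfn)
{
	return pfn & ~(X_PAGES_PER_SECTION - 1);
}
```

Here sections_per_block(X_MIN_MEMORY_BLOCK_SIZE) is 1, so for a section-aligned hot-added range the single pfn marked nomap is both the first pfn checked by __add_section() and the first pfn checked by memory_block_action().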
[PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64
Introduces memory hotplug functionality (hot-add) for arm64.

Changes v1->v2:
- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
  all changes are additive and non-destructive.
- stop_machine used to update swapper on hot add, avoiding races
- checking if pagealloc is under debug to stay coherent with mem_map

Signed-off-by: Maciej Bielski
Signed-off-by: Andrea Reale
---
 arch/arm64/Kconfig           | 12 ++
 arch/arm64/configs/defconfig |  1 +
 arch/arm64/include/asm/mmu.h |  3 ++
 arch/arm64/mm/init.c         | 87
 arch/arm64/mm/mmu.c          | 39
 5 files changed, 142 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..c736bba 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -641,6 +641,14 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on. CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+config ARCH_HAS_ADD_PAGES
+	def_bool y
+	depends on ARCH_ENABLE_MEMORY_HOTPLUG
+
+config ARCH_ENABLE_MEMORY_HOTPLUG
+	def_bool y
+depends on !NUMA
+
 # Common NUMA Features
 config NUMA
 	bool "Numa Memory Allocation and Scheduler Support"
@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
 
 source "mm/Kconfig"
 
+config ARCH_MEMORY_PROBE
+	def_bool y
+	depends on MEMORY_HOTPLUG
+
 config SECCOMP
 	bool "Enable seccomp to safely compute untrusted bytecode"
 	---help---
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 34480e9..5fc5656 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
 CONFIG_SCHED_MC=y
 CONFIG_NUMA=y
 CONFIG_PREEMPT=y
+CONFIG_MEMORY_HOTPLUG=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_CMA=y
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 0d34bf0..2b3fa4d 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 extern void mark_linear_text_alias_ro(void);
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
+#endif
 
 #endif
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 5960bef..e96e7d3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
 	return 0;
 }
 __initcall(register_mem_limit_dumper);
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int add_pages(int nid, unsigned long start_pfn,
+		unsigned long nr_pages, bool want_memblock)
+{
+	int ret;
+	u64 start_addr = start_pfn << PAGE_SHIFT;
+	/*
+	 * Mark the first page in the range as unusable. This is needed
+	 * because __add_section (within __add_pages) wants pfn_valid
+	 * of it to be false, and in arm64 pfn_valid is implemented by
+	 * just checking the nomap flag for existing blocks.
+	 *
+	 * A small trick here is that __add_section() requires only
+	 * phys_start_pfn (that is the first pfn of a section) to be
+	 * invalid. Regardless of whether it was assumed (by the function
+	 * author) that all pfns within a section are either all valid
+	 * or all invalid, it allows to avoid looping twice (once here,
+	 * second when memblock_clear_nomap() is called) through all
+	 * pfns of the section and modify only one pfn. Thanks to that,
+	 * further, in __add_zone() only this very first pfn is skipped
+	 * and corresponding page is not flagged reserved. Therefore it
+	 * is enough to correct this setup only for it.
+	 *
+	 * When arch_add_memory() returns the walk_memory_range() function
+	 * is called and passed with online_memory_block() callback,
+	 * which execution finally reaches the memory_block_action()
+	 * function, where also only the first pfn of a memory block is
+	 * checked to be reserved. Above, it was first pfn of a section,
+	 * here it is a block but
+	 * (drivers/base/memory.c):
+	 *     sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+	 * (include/linux/memory.h):
+	 * #define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)
+	 * so we can consider block and section equivalently
+	 */
+	memblock_mark_nomap(start_addr, 1<> PAGE_SHIFT;
+	unsigned long nr_pages = size >> PAGE_SHIFT;
+	unsigned long end_pfn = start_pfn + nr_pages;
+	unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+
+	if (end_pfn > max_sparsemem_pfn) {
+		pr_err("end_pfn too big");
+		return -1;
+	}
+