Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-27 Thread Andrea Reale
On Mon 27 Nov 2017, 17:39, Maciej Bielski wrote:

Hi Robin,

> Hi Robin,
> 
> Thank you for your feedback, it's highly appreciated. Let me add some
> comments.
> 
> Our primary goal was to have hotplug working even in the basic setup and
> publish first working results. Then we want to improve the code, building on
> top of community comments. This is a general answer to the questions about
> configuration flags. The working setup is presented, a bit as a hint, and we
> do not deem it to be ultimately the best at all. The questions about
> configuration, IMHO, fall into the category of agreeing on a proper setup
> (defaults, dependencies) and, therefore, we strongly rely on the community's
> experience to advise us how it should be. So, in short, for some questions of
> the form "why is this set up in such a way" the simple answer is that it
> worked as a first approximation. I totally agree that for a server-grade
> system it should be different, and thanks a lot for sharing your opinion on
> that.
> 
> On Mon, Nov 27, 2017 at 03:19:49PM +, Robin Murphy wrote:
> > Hi Andrea,
> > 
> > I've also been looking at memory hotplug for arm64, from the perspective of
> > enabling ZONE_DEVICE for pmem. May I ask what your use-case for this series
> > is? AFAICS the real demand will be coming from server systems, which in
> > practice means both ACPI and NUMA, both of which are being resoundingly
> > ignored here.
> > 
> 
> Eventually we aim for aarch64 server systems.
> 

Adding to what Maciej said: the original motivation and driving factor
for this development effort is this project: http://www.dredbox.eu

In short, we have a software-defined interconnect for disaggregated
memory, where memory can be connected to nodes dynamically and via
software. At reconfigurations, we need to hot add and hot remove memory
from running kernels. Our current research prototype is based on an
arm64 SoC+FPGA system. Hence memory hotplug for arm64.
Since triggers for hot-add and hot-remove are software, we do not need
ACPI; in our specific case, memory topologies can change dynamically, so
we have rather ad-hoc and project-specific NUMA support that, we
believe, does not make any sense to discuss for mainlining.

> > Further review comments inline.
> > 
> > On 23/11/17 11:13, Maciej Bielski wrote:
> > >Introduces memory hotplug functionality (hot-add) for arm64.
> > >
> > >Changes v1->v2:
> > >- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
> > >   all changes are additive and non destructive.
> > >
> > >- stop_machine used to update swapper on hot add, avoiding races
> > >
> > >- checking if pagealloc is under debug to stay coherent with mem_map
> > >
> > >Signed-off-by: Maciej Bielski 
> > >Signed-off-by: Andrea Reale 
> > >---
> > >  arch/arm64/Kconfig           | 12
> > >  arch/arm64/configs/defconfig |  1
> > >  arch/arm64/include/asm/mmu.h |  3
> > >  arch/arm64/mm/init.c         | 87
> > >  arch/arm64/mm/mmu.c          | 39
> > >  5 files changed, 142 insertions(+)
> > >
> > >diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > >index 0df64a6..c736bba 100644
> > >--- a/arch/arm64/Kconfig
> > >+++ b/arch/arm64/Kconfig
> > >@@ -641,6 +641,14 @@ config HOTPLUG_CPU
> > > Say Y here to experiment with turning CPUs off and on.  CPUs
> > > can be controlled through /sys/devices/system/cpu.
> > >+config ARCH_HAS_ADD_PAGES
> > >+  def_bool y
> > >+  depends on ARCH_ENABLE_MEMORY_HOTPLUG
> > >+
> > >+config ARCH_ENABLE_MEMORY_HOTPLUG
> > >+  def_bool y
> > >+  depends on !NUMA
> > 
> > As above, realistically this seems too limiting to be useful.
> > 
> > >+
> > >  # Common NUMA Features
> > >  config NUMA
> > >   bool "Numa Memory Allocation and Scheduler Support"
> > >@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
> > >  source "mm/Kconfig"
> > >+config ARCH_MEMORY_PROBE
> > >+  def_bool y
> > >+  depends on MEMORY_HOTPLUG
> > 
> > I'm particularly dubious about enabling this by default - it's useful for
> > development and testing, yes, but I think it's the kind of feature where the
> > onus should be on interested developers to turn it on, rather than
> > production configs to have to turn it off.
> > 
> > >+
> > >  config SECCOMP
> > >   bool "Enable seccomp to safely compute untrusted bytecode"
> > >   ---help---
> > >diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > >index 34480e9..5fc5656 100644
> > >--- a/arch/arm64/configs/defconfig
> > >+++ b/arch/arm64/configs/defconfig
> > >@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
> > >  CONFIG_SCHED_MC=y
> > >  CONFIG_NUMA=y
> > >  CONFIG_PREEMPT=y
> > >+CONFIG_MEMORY_HOTPLUG=y
> > 
> > Note that this is effectively pointless, given two lines above...
> > 

Well spotted, thanks :) 

> > >  CONFIG_KSM=y
> > >  CONFIG_TRANSPARENT_HUGEPAGE=y
> > >  CONFIG_CMA=y
> 

Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-27 Thread Maciej Bielski
Hi Robin,

Thank you for your feedback, it's highly appreciated. Let me add some
comments.

Our primary goal was to have hotplug working even in the basic setup and
publish first working results. Then we want to improve the code, building on
top of community comments. This is a general answer to the questions about
configuration flags. The working setup is presented, a bit as a hint, and we
do not deem it to be ultimately the best at all. The questions about
configuration, IMHO, fall into the category of agreeing on a proper setup
(defaults, dependencies) and, therefore, we strongly rely on the community's
experience to advise us how it should be. So, in short, for some questions of
the form "why is this set up in such a way" the simple answer is that it
worked as a first approximation. I totally agree that for a server-grade
system it should be different, and thanks a lot for sharing your opinion on
that.

On Mon, Nov 27, 2017 at 03:19:49PM +, Robin Murphy wrote:
> Hi Andrea,
> 
> I've also been looking at memory hotplug for arm64, from the perspective of
> enabling ZONE_DEVICE for pmem. May I ask what your use-case for this series
> is? AFAICS the real demand will be coming from server systems, which in
> practice means both ACPI and NUMA, both of which are being resoundingly
> ignored here.
> 

Eventually we aim for aarch64 server systems.

> Further review comments inline.
> 
> On 23/11/17 11:13, Maciej Bielski wrote:
> >Introduces memory hotplug functionality (hot-add) for arm64.
> >
> >Changes v1->v2:
> >- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
> >   all changes are additive and non destructive.
> >
> >- stop_machine used to update swapper on hot add, avoiding races
> >
> >- checking if pagealloc is under debug to stay coherent with mem_map
> >
> >Signed-off-by: Maciej Bielski 
> >Signed-off-by: Andrea Reale 
> >---
> >  arch/arm64/Kconfig           | 12
> >  arch/arm64/configs/defconfig |  1
> >  arch/arm64/include/asm/mmu.h |  3
> >  arch/arm64/mm/init.c         | 87
> >  arch/arm64/mm/mmu.c          | 39
> >  5 files changed, 142 insertions(+)
> >
> >diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >index 0df64a6..c736bba 100644
> >--- a/arch/arm64/Kconfig
> >+++ b/arch/arm64/Kconfig
> >@@ -641,6 +641,14 @@ config HOTPLUG_CPU
> >   Say Y here to experiment with turning CPUs off and on.  CPUs
> >   can be controlled through /sys/devices/system/cpu.
> >+config ARCH_HAS_ADD_PAGES
> >+def_bool y
> >+depends on ARCH_ENABLE_MEMORY_HOTPLUG
> >+
> >+config ARCH_ENABLE_MEMORY_HOTPLUG
> >+def_bool y
> >+depends on !NUMA
> 
> As above, realistically this seems too limiting to be useful.
> 
> >+
> >  # Common NUMA Features
> >  config NUMA
> > bool "Numa Memory Allocation and Scheduler Support"
> >@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
> >  source "mm/Kconfig"
> >+config ARCH_MEMORY_PROBE
> >+def_bool y
> >+depends on MEMORY_HOTPLUG
> 
> I'm particularly dubious about enabling this by default - it's useful for
> development and testing, yes, but I think it's the kind of feature where the
> onus should be on interested developers to turn it on, rather than
> production configs to have to turn it off.
> 
> >+
> >  config SECCOMP
> > bool "Enable seccomp to safely compute untrusted bytecode"
> > ---help---
> >diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> >index 34480e9..5fc5656 100644
> >--- a/arch/arm64/configs/defconfig
> >+++ b/arch/arm64/configs/defconfig
> >@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
> >  CONFIG_SCHED_MC=y
> >  CONFIG_NUMA=y
> >  CONFIG_PREEMPT=y
> >+CONFIG_MEMORY_HOTPLUG=y
> 
> Note that this is effectively pointless, given two lines above...
> 
> >  CONFIG_KSM=y
> >  CONFIG_TRANSPARENT_HUGEPAGE=y
> >  CONFIG_CMA=y
> >diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> >index 0d34bf0..2b3fa4d 100644
> >--- a/arch/arm64/include/asm/mmu.h
> >+++ b/arch/arm64/include/asm/mmu.h
> >@@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> >pgprot_t prot, bool page_mappings_only);
> >  extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
> >  extern void mark_linear_text_alias_ro(void);
> >+#ifdef CONFIG_MEMORY_HOTPLUG
> >+extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
> 
> Is there any reason for not just implementing all the hotplug code
> self-contained in mmu.c?
> 

Simply put, the first version was supposed to build on top of the patch by
Scott Branden, who put a mock implementation of arch_add_memory() in
arch/arm64/mm/init.c; this is why hotplug_paging() needed to be declared
outside of mmu.c. Looking quickly at the code now, I agree that it would be
cleaner to put everything in arch/arm64/mm/mmu.c. I will test that.

> >+#endif
> >  #endif

Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-27 Thread Robin Murphy

Hi Andrea,

I've also been looking at memory hotplug for arm64, from the perspective 
of enabling ZONE_DEVICE for pmem. May I ask what your use-case for this 
series is? AFAICS the real demand will be coming from server systems, 
which in practice means both ACPI and NUMA, both of which are being 
resoundingly ignored here.


Further review comments inline.

On 23/11/17 11:13, Maciej Bielski wrote:

> Introduces memory hotplug functionality (hot-add) for arm64.
>
> Changes v1->v2:
> - swapper pgtable updated in place on hot add, avoiding unnecessary copy:
>   all changes are additive and non destructive.
>
> - stop_machine used to update swapper on hot add, avoiding races
>
> - checking if pagealloc is under debug to stay coherent with mem_map
>
> Signed-off-by: Maciej Bielski 
> Signed-off-by: Andrea Reale 
> ---
>  arch/arm64/Kconfig           | 12
>  arch/arm64/configs/defconfig |  1
>  arch/arm64/include/asm/mmu.h |  3
>  arch/arm64/mm/init.c         | 87
>  arch/arm64/mm/mmu.c          | 39
>  5 files changed, 142 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6..c736bba 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -641,6 +641,14 @@ config HOTPLUG_CPU
>       Say Y here to experiment with turning CPUs off and on.  CPUs
>       can be controlled through /sys/devices/system/cpu.
>
> +config ARCH_HAS_ADD_PAGES
> +   def_bool y
> +   depends on ARCH_ENABLE_MEMORY_HOTPLUG
> +
> +config ARCH_ENABLE_MEMORY_HOTPLUG
> +   def_bool y
> +   depends on !NUMA

As above, realistically this seems too limiting to be useful.


> +
>  # Common NUMA Features
>  config NUMA
>     bool "Numa Memory Allocation and Scheduler Support"
> @@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
>
>  source "mm/Kconfig"
>
> +config ARCH_MEMORY_PROBE
> +   def_bool y
> +   depends on MEMORY_HOTPLUG


I'm particularly dubious about enabling this by default - it's useful 
for development and testing, yes, but I think it's the kind of feature 
where the onus should be on interested developers to turn it on, rather 
than production configs to have to turn it off.



> +
>  config SECCOMP
>     bool "Enable seccomp to safely compute untrusted bytecode"
>     ---help---
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 34480e9..5fc5656 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
>  CONFIG_SCHED_MC=y
>  CONFIG_NUMA=y
>  CONFIG_PREEMPT=y
> +CONFIG_MEMORY_HOTPLUG=y


Note that this is effectively pointless, given two lines above...


>  CONFIG_KSM=y
>  CONFIG_TRANSPARENT_HUGEPAGE=y
>  CONFIG_CMA=y
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 0d34bf0..2b3fa4d 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>                                 pgprot_t prot, bool page_mappings_only);
>  extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
>  extern void mark_linear_text_alias_ro(void);
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +extern void hotplug_paging(phys_addr_t start, phys_addr_t size);


Is there any reason for not just implementing all the hotplug code 
self-contained in mmu.c?



> +#endif
>
>  #endif
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 5960bef..e96e7d3 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
>     return 0;
>  }
>  __initcall(register_mem_limit_dumper);
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int add_pages(int nid, unsigned long start_pfn,
> +              unsigned long nr_pages, bool want_memblock)
> +{
> +   int ret;
> +   u64 start_addr = start_pfn << PAGE_SHIFT;
> +   /*
> +    * Mark the first page in the range as unusable. This is needed
> +    * because __add_section (within __add_pages) wants pfn_valid
> +    * of it to be false, and in arm64 pfn_valid is implemented by
> +    * just checking at the nomap flag for existing blocks.
> +    *
> +    * A small trick here is that __add_section() requires only
> +    * phys_start_pfn (that is the first pfn of a section) to be
> +    * invalid. Regardless of whether it was assumed (by the function
> +    * author) that all pfns within a section are either all valid
> +    * or all invalid, it allows to avoid looping twice (once here,
> +    * second when memblock_clear_nomap() is called) through all
> +    * pfns of the section and modify only one pfn. Thanks to that,
> +    * further, in __add_zone() only this very first pfn is skipped
> +    * and corresponding page is not flagged reserved. Therefore it
> +    * is enough to correct this setup only for it.
> +    *
> +    * When arch_add_memory() returns the walk_memory_range() function
> +    * is called and passed with online_memory_block() callback,

Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-25 Thread Arun KS
On Fri, Nov 24, 2017 at 4:23 PM, Maciej Bielski
 wrote:
> On Fri, Nov 24, 2017 at 09:42:33AM +, Andrea Reale wrote:
>> Hi Arun,
>>
>>
>> On Fri 24 Nov 2017, 11:25, Arun KS wrote:
>> > On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski
>> >  wrote:
>> >> [ ...]
>> > > Introduces memory hotplug functionality (hot-add) for arm64.
>> > > @@ -615,6 +616,44 @@ void __init paging_init(void)
>> > >   SWAPPER_DIR_SIZE - PAGE_SIZE);
>> > >  }
>> > >
>> > > +#ifdef CONFIG_MEMORY_HOTPLUG
>> > > +
>> > > +/*
>> > > + * hotplug_paging() is used by memory hotplug to build new page tables
>> > > + * for hot added memory.
>> > > + */
>> > > +
>> > > +struct mem_range {
>> > > +   phys_addr_t base;
>> > > +   phys_addr_t size;
>> > > +};
>> > > +
>> > > +static int __hotplug_paging(void *data)
>> > > +{
>> > > +   int flags = 0;
>> > > +   struct mem_range *section = data;
>> > > +
>> > > +   if (debug_pagealloc_enabled())
>> > > +   flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
>> > > +
>> > > +   __create_pgd_mapping(swapper_pg_dir, section->base,
>> > > +   __phys_to_virt(section->base), section->size,
>> > > +   PAGE_KERNEL, pgd_pgtable_alloc, flags);
>> >
>> > Hello Andrea,
>> >
>> > __hotplug_paging runs in stop_machine context.
>> > cpu stop callbacks must not sleep.
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479
>> >
>> > __create_pgd_mapping uses pgd_pgtable_alloc, which does
>> > __get_free_page(PGALLOC_GFP)
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342
>> >
>> > PGALLOC_GFP has GFP_KERNEL which in turn has __GFP_RECLAIM
>> >
>> > #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
>> > #define GFP_KERNEL  (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
>> >
>> > Now, prepare_alloc_pages() called by __alloc_pages_nodemask checks for
>> >
>> > might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
>> >
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150
>> >
>> > and then BUG()
>>
>> Well spotted, thanks for reporting the problem. One possible solution
>> would be to go back to building the updated page tables on a copy of the
>> pgdir (as was done in v1 of this patchset) and then replacing swapper
>> atomically with stop_machine.
>>
>> Actually, I am not sure stop_machine is strictly needed if we modify
>> the swapper pgdir live: for example, in x86_64
>> kernel_physical_mapping_init, atomicity is ensured by spin-locking on
>> init_mm.page_table_lock.
>> https://elixir.free-electrons.com/linux/v4.14/source/arch/x86/mm/init_64.c#L684
>> I'll spend some time investigating what else could be working
>> concurrently on the swapper pgdir.
>>
>> Any suggestion or pointer is very welcome.
>
> Hi Andrea, Arun,
>
> An alternative approach could be implementing pgd_pgtable_alloc_nosleep() and
> pointing hotplug_paging() at it. It could then use different flags,
> e.g.:
>
> #define PGALLOC_GFP_NORECLAIM   (__GFP_IO | __GFP_FS | __GFP_NOTRACK | __GFP_ZERO)

This solves the problem with __get_free_page.

But pgd_pgtable_alloc() -> pgtable_page_ctor() -> ptlock_alloc() then
calls kmem_cache_alloc(page_ptl_cachep, GFP_KERNEL), which may sleep too.
Same BUG again.
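
To make the point concrete, here is a minimal userspace sketch (not kernel code). It assumes the constructor path allocates the page-table lock separately with a fixed GFP_KERNEL mask, as in the call chain above. All MODEL_* names and both functions are hypothetical, and the bit values are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real flags live in include/linux/gfp.h. */
#define MODEL_GFP_DIRECT_RECLAIM 0x1u
#define MODEL_GFP_OTHER          0x2u   /* stands in for IO/FS/ZERO/NOTRACK */
#define MODEL_GFP_KERNEL         (MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_OTHER)
#define MODEL_GFP_NORECLAIM      (MODEL_GFP_OTHER)

/* Mirrors the condition in might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM). */
bool model_alloc_may_sleep(unsigned int gfp_mask)
{
	return (gfp_mask & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/*
 * Step 1: allocate the page-table page with a caller-chosen mask.
 * Step 2: the constructor allocates the lock with a hard-wired
 *         GFP_KERNEL-like mask, whatever the caller chose in step 1.
 */
bool model_pgtable_alloc_may_sleep(unsigned int page_gfp)
{
	bool page_step = model_alloc_may_sleep(page_gfp);
	bool lock_step = model_alloc_may_sleep(MODEL_GFP_KERNEL);

	return page_step || lock_step;
}

/* Usage: the no-reclaim mask fixes step 1 but not the whole path. */
void model_check_ctor_path(void)
{
	assert(!model_alloc_may_sleep(MODEL_GFP_NORECLAIM));
	assert(model_pgtable_alloc_may_sleep(MODEL_GFP_NORECLAIM));
}
```

In this model the second allocation would also need a non-sleeping mask, or would have to be done before entering stop_machine.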

Regards,
Arun

>
> Is this approach inefficient in any way?
> Do we like the fact that the memory-attaching thread can go to sleep?
>
> BR,
>
>>
>> Thanks,
>> Andrea
>>
>> > I was testing on 4.4 kernel, but cross checked with 4.14 as well.
>> >
>> > Regards,
>> > Arun
>> >
>> >
>> > > +
>> > > +   return 0;
>> > > +}
>> > > +
>> > > +inline void hotplug_paging(phys_addr_t start, phys_addr_t size)
>> > > +{
>> > > +   struct mem_range section = {
>> > > +           .base = start,
>> > > +           .size = size,
>> > > +   };
>> > > +
>> > > +   stop_machine(__hotplug_paging, &section, NULL);
>> > > +}
>> > > +#endif /* CONFIG_MEMORY_HOTPLUG */
>> > > +
>> > >  /*
>> > >   * Check whether a kernel address is valid (derived from arch/x86/).
>> > >   */
>> > > --
>> > > 2.7.4
>> > >
>> >
>>
>
> --
> Maciej Bielski


Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-24 Thread Maciej Bielski
On Fri, Nov 24, 2017 at 09:42:33AM +, Andrea Reale wrote:
> Hi Arun,
>
>
> On Fri 24 Nov 2017, 11:25, Arun KS wrote:
> > On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski
> >  wrote:
> >> [ ...]
> > > Introduces memory hotplug functionality (hot-add) for arm64.
> > > @@ -615,6 +616,44 @@ void __init paging_init(void)
> > >   SWAPPER_DIR_SIZE - PAGE_SIZE);
> > >  }
> > >
> > > +#ifdef CONFIG_MEMORY_HOTPLUG
> > > +
> > > +/*
> > > + * hotplug_paging() is used by memory hotplug to build new page tables
> > > + * for hot added memory.
> > > + */
> > > +
> > > +struct mem_range {
> > > +   phys_addr_t base;
> > > +   phys_addr_t size;
> > > +};
> > > +
> > > +static int __hotplug_paging(void *data)
> > > +{
> > > +   int flags = 0;
> > > +   struct mem_range *section = data;
> > > +
> > > +   if (debug_pagealloc_enabled())
> > > +           flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > > +
> > > +   __create_pgd_mapping(swapper_pg_dir, section->base,
> > > +                   __phys_to_virt(section->base), section->size,
> > > +                   PAGE_KERNEL, pgd_pgtable_alloc, flags);
> >
> > Hello Andrea,
> >
> > __hotplug_paging runs on stop_machine context.
> > cpu stop callbacks must not sleep.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479
> >
> > __create_pgd_mapping uses pgd_pgtable_alloc, which does
> > __get_free_page(PGALLOC_GFP)
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342
> >
> > PGALLOC_GFP has GFP_KERNEL, which in turn has __GFP_RECLAIM
> >
> > #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
> > #define GFP_KERNEL  (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
> >
> > Now, prepare_alloc_pages() called by __alloc_pages_nodemask checks for
> >
> > might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150
> >
> > and then BUG()
>
> Well spotted, thanks for reporting the problem. One possible solution
> would be to go back to building the updated page tables on a copy of the
> pgdir (as was done in v1 of this patchset) and then replacing swapper
> atomically with stop_machine.
>
> Actually, I am not sure stop_machine is strictly needed if we modify
> the swapper pgdir live: for example, in x86_64
> kernel_physical_mapping_init, atomicity is ensured by spin-locking on
> init_mm.page_table_lock.
> https://elixir.free-electrons.com/linux/v4.14/source/arch/x86/mm/init_64.c#L684
> I'll spend some time investigating what else could be working
> concurrently on the swapper pgdir.
>
> Any suggestion or pointer is very welcome.

Hi Andrea, Arun,

An alternative approach could be implementing pgd_pgtable_alloc_nosleep() and
pointing hotplug_paging() at it. It could then use different flags,
e.g.:

#define PGALLOC_GFP_NORECLAIM   (__GFP_IO | __GFP_FS | __GFP_NOTRACK | __GFP_ZERO)

Is this approach inefficient in any way?
Do we like the fact that the memory-attaching thread can go to sleep?
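
As a rough illustration of what the different flags change, here is a userspace model (not kernel code) of the check quoted above, might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM). The MODEL_* names and bit values are placeholders chosen for the sketch. The composition of the masks follows the defines quoted in this thread, plus the assumption that __GFP_RECLAIM covers both direct and kswapd reclaim.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values; the real ones live in include/linux/gfp.h. */
#define MODEL_GFP_IO             0x01u
#define MODEL_GFP_FS             0x02u
#define MODEL_GFP_ZERO           0x04u
#define MODEL_GFP_NOTRACK        0x08u
#define MODEL_GFP_DIRECT_RECLAIM 0x10u
#define MODEL_GFP_KSWAPD_RECLAIM 0x20u

#define MODEL_GFP_RECLAIM \
	(MODEL_GFP_DIRECT_RECLAIM | MODEL_GFP_KSWAPD_RECLAIM)
#define MODEL_GFP_KERNEL \
	(MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_PGALLOC_GFP \
	(MODEL_GFP_KERNEL | MODEL_GFP_NOTRACK | MODEL_GFP_ZERO)
#define MODEL_PGALLOC_GFP_NORECLAIM \
	(MODEL_GFP_IO | MODEL_GFP_FS | MODEL_GFP_NOTRACK | MODEL_GFP_ZERO)

/* True when the allocator's might_sleep_if() condition would fire. */
bool model_may_sleep(unsigned int gfp_mask)
{
	return (gfp_mask & MODEL_GFP_DIRECT_RECLAIM) != 0;
}

/* Usage: the current mask trips the check, the proposed one does not. */
void model_check_masks(void)
{
	assert(model_may_sleep(MODEL_PGALLOC_GFP));
	assert(!model_may_sleep(MODEL_PGALLOC_GFP_NORECLAIM));
}
```

Note that in this model the proposed mask drops kswapd reclaim along with direct reclaim, so such an allocation would neither wait for memory nor wake background reclaim.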

BR,

>
> Thanks,
> Andrea
>
> > I was testing on 4.4 kernel, but cross checked with 4.14 as well.
> >
> > Regards,
> > Arun
> >
> >
> > > +
> > > +   return 0;
> > > +}
> > > +
> > > +inline void hotplug_paging(phys_addr_t start, phys_addr_t size)
> > > +{
> > > +   struct mem_range section = {
> > > +           .base = start,
> > > +           .size = size,
> > > +   };
> > > +
> > > +   stop_machine(__hotplug_paging, &section, NULL);
> > > +}
> > > +#endif /* CONFIG_MEMORY_HOTPLUG */
> > > +
> > >  /*
> > >   * Check whether a kernel address is valid (derived from arch/x86/).
> > >   */
> > > --
> > > 2.7.4
> > >
> >
>

--
Maciej Bielski


Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-24 Thread Andrea Reale
Hi Arun,


On Fri 24 Nov 2017, 11:25, Arun KS wrote:
> On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski
>  wrote:
>> [ ...]
> > Introduces memory hotplug functionality (hot-add) for arm64.
> > @@ -615,6 +616,44 @@ void __init paging_init(void)
> >   SWAPPER_DIR_SIZE - PAGE_SIZE);
> >  }
> >
> > +#ifdef CONFIG_MEMORY_HOTPLUG
> > +
> > +/*
> > + * hotplug_paging() is used by memory hotplug to build new page tables
> > + * for hot added memory.
> > + */
> > +
> > +struct mem_range {
> > +   phys_addr_t base;
> > +   phys_addr_t size;
> > +};
> > +
> > +static int __hotplug_paging(void *data)
> > +{
> > +   int flags = 0;
> > +   struct mem_range *section = data;
> > +
> > +   if (debug_pagealloc_enabled())
> > +           flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> > +
> > +   __create_pgd_mapping(swapper_pg_dir, section->base,
> > +                   __phys_to_virt(section->base), section->size,
> > +                   PAGE_KERNEL, pgd_pgtable_alloc, flags);
> 
> Hello Andrea,
> 
> __hotplug_paging runs on stop_machine context.
> cpu stop callbacks must not sleep.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/stop_machine.c?h=v4.14#n479
> 
> __create_pgd_mapping uses pgd_pgtable_alloc, which does
> __get_free_page(PGALLOC_GFP)
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/mm/mmu.c?h=v4.14#n342
> 
> PGALLOC_GFP has GFP_KERNEL, which in turn has __GFP_RECLAIM
> 
> #define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO)
> #define GFP_KERNEL  (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
> 
> Now, prepare_alloc_pages() called by __alloc_pages_nodemask checks for
> 
> might_sleep_if(gfp_mask & __GFP_DIRECT_RECLAIM);
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/page_alloc.c?h=v4.14#n4150
> 
> and then BUG()

Well spotted, thanks for reporting the problem. One possible solution
would be to go back to building the updated page tables on a copy of the
pgdir (as was done in v1 of this patchset) and then replacing swapper
atomically with stop_machine.

Actually, I am not sure stop_machine is strictly needed if we modify
the swapper pgdir live: for example, in x86_64
kernel_physical_mapping_init, atomicity is ensured by spin-locking on
init_mm.page_table_lock.
https://elixir.free-electrons.com/linux/v4.14/source/arch/x86/mm/init_64.c#L684
I'll spend some time investigating what else could be working
concurrently on the swapper pgdir.

Any suggestion or pointer is very welcome.
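
For reference, the first option (build on a copy, then replace atomically) can be modelled in userspace roughly as below. This is only an analogy under stated assumptions: struct model_pgdir, model_live and both functions are hypothetical, a C11 atomic exchange stands in for the stop_machine step, and a single writer is assumed. The point is that the allocation, which may sleep, happens before the atomic step.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ENTRIES 8

struct model_pgdir {
	unsigned long entry[MODEL_ENTRIES];
};

/* Pointer that readers follow; plays the role of the live swapper pgdir. */
static _Atomic(struct model_pgdir *) model_live;

/* Reader side: look up one entry in whatever table is currently live. */
unsigned long model_lookup(unsigned int idx)
{
	struct model_pgdir *cur = atomic_load(&model_live);

	if (!cur || idx >= MODEL_ENTRIES)
		return 0;
	return cur->entry[idx];
}

/*
 * Writer side (single writer assumed): allocate and fill a copy first,
 * which is the part that may block, then publish it with one atomic
 * exchange. On success the previous table is handed back via old_out so
 * the caller can free it once no reader can still be using it.
 * Returns 0 on success, -1 on a bad index or allocation failure.
 */
int model_hot_add(unsigned int idx, unsigned long val,
		  struct model_pgdir **old_out)
{
	struct model_pgdir *old, *copy;

	if (idx >= MODEL_ENTRIES)
		return -1;

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;

	old = atomic_load(&model_live);
	if (old)
		memcpy(copy, old, sizeof(*copy));
	else
		memset(copy, 0, sizeof(*copy));
	copy->entry[idx] = val;

	*old_out = atomic_exchange(&model_live, copy);
	return 0;
}
```

Existing entries are carried over unchanged, which matches the "additive and non destructive" property claimed for hot add. The sketch says nothing about when it is safe to free the old table, which is the harder part in the kernel.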

Thanks,
Andrea

> I was testing on 4.4 kernel, but cross checked with 4.14 as well.
> 
> Regards,
> Arun
> 
> 
> > +
> > +   return 0;
> > +}
> > +
> > +inline void hotplug_paging(phys_addr_t start, phys_addr_t size)
> > +{
> > +   struct mem_range section = {
> > +           .base = start,
> > +           .size = size,
> > +   };
> > +
> > +   stop_machine(__hotplug_paging, &section, NULL);
> > +}
> > +#endif /* CONFIG_MEMORY_HOTPLUG */
> > +
> >  /*
> >   * Check whether a kernel address is valid (derived from arch/x86/).
> >   */
> > --
> > 2.7.4
> >
> 



Re: [PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-23 Thread Arun KS
On Thu, Nov 23, 2017 at 4:43 PM, Maciej Bielski
 wrote:
> Introduces memory hotplug functionality (hot-add) for arm64.
>
> Changes v1->v2:
> - swapper pgtable updated in place on hot add, avoiding unnecessary copy:
>   all changes are additive and non destructive.
>
> - stop_machine used to update swapper on hot add, avoiding races
>
> - checking if pagealloc is under debug to stay coherent with mem_map
>
> Signed-off-by: Maciej Bielski 
> Signed-off-by: Andrea Reale 
> ---
>  arch/arm64/Kconfig   | 12 ++
>  arch/arm64/configs/defconfig |  1 +
>  arch/arm64/include/asm/mmu.h |  3 ++
>  arch/arm64/mm/init.c | 87
>  arch/arm64/mm/mmu.c  | 39 
>  5 files changed, 142 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6..c736bba 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -641,6 +641,14 @@ config HOTPLUG_CPU
>   Say Y here to experiment with turning CPUs off and on.  CPUs
>   can be controlled through /sys/devices/system/cpu.
>
> +config ARCH_HAS_ADD_PAGES
> +   def_bool y
> +   depends on ARCH_ENABLE_MEMORY_HOTPLUG
> +
> +config ARCH_ENABLE_MEMORY_HOTPLUG
> +   def_bool y
> +depends on !NUMA
> +
>  # Common NUMA Features
>  config NUMA
> bool "Numa Memory Allocation and Scheduler Support"
> @@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
>
>  source "mm/Kconfig"
>
> +config ARCH_MEMORY_PROBE
> +   def_bool y
> +   depends on MEMORY_HOTPLUG
> +
>  config SECCOMP
> bool "Enable seccomp to safely compute untrusted bytecode"
> ---help---
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 34480e9..5fc5656 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
>  CONFIG_SCHED_MC=y
>  CONFIG_NUMA=y
>  CONFIG_PREEMPT=y
> +CONFIG_MEMORY_HOTPLUG=y
>  CONFIG_KSM=y
>  CONFIG_TRANSPARENT_HUGEPAGE=y
>  CONFIG_CMA=y
> diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
> index 0d34bf0..2b3fa4d 100644
> --- a/arch/arm64/include/asm/mmu.h
> +++ b/arch/arm64/include/asm/mmu.h
> @@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>pgprot_t prot, bool page_mappings_only);
>  extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
>  extern void mark_linear_text_alias_ro(void);
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
> +#endif
>
>  #endif
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 5960bef..e96e7d3 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
> return 0;
>  }
>  __initcall(register_mem_limit_dumper);
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +int add_pages(int nid, unsigned long start_pfn,
> +   unsigned long nr_pages, bool want_memblock)
> +{
> +   int ret;
> +   u64 start_addr = start_pfn << PAGE_SHIFT;
> +   /*
> +* Mark the first page in the range as unusable. This is needed
> +* because __add_section (within __add_pages) wants pfn_valid
> +* of it to be false, and in arm64 pfn_valid is implemented by
> +* just checking the nomap flag for existing blocks.
> +*
> +* A small trick here is that __add_section() requires only
> +* phys_start_pfn (that is the first pfn of a section) to be
> +* invalid. Regardless of whether it was assumed (by the function
> +* author) that all pfns within a section are either all valid
> +* or all invalid, it allows to avoid looping twice (once here,
> +* second when memblock_clear_nomap() is called) through all
> +* pfns of the section and modify only one pfn. Thanks to that,
> +* further, in __add_zone() only this very first pfn is skipped
> +* and corresponding page is not flagged reserved. Therefore it
> +* is enough to correct this setup only for it.
> +*
> +* When arch_add_memory() returns the walk_memory_range() function
> +* is called and passed with online_memory_block() callback,
> +* which execution finally reaches the memory_block_action()
> +* function, where also only the first pfn of a memory block is
> +* checked to be reserved. Above, it was first pfn of a section,
> +* here it is a block but
> +* (drivers/base/memory.c):
> +* sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
> +* (include/linux/memory.h):
> +* #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
> +* so we can consider block and section equivalently
> +*/
> +   memblock_mark_nomap(start_addr, 1< +   ret = __add_pages(nid, start_pfn, nr_pages, 

[PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-23 Thread Maciej Bielski
Introduces memory hotplug functionality (hot-add) for arm64.

Changes v1->v2:
- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
  all changes are additive and non destructive.

- stop_machine used to updated swapper on hot add, avoiding races

- checking if pagealloc is under debug to stay coherent with mem_map

Signed-off-by: Maciej Bielski 
Signed-off-by: Andrea Reale 
---
 arch/arm64/Kconfig   | 12 ++
 arch/arm64/configs/defconfig |  1 +
 arch/arm64/include/asm/mmu.h |  3 ++
 arch/arm64/mm/init.c | 87 
 arch/arm64/mm/mmu.c  | 39 
 5 files changed, 142 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..c736bba 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -641,6 +641,14 @@ config HOTPLUG_CPU
  Say Y here to experiment with turning CPUs off and on.  CPUs
  can be controlled through /sys/devices/system/cpu.
 
+config ARCH_HAS_ADD_PAGES
+   def_bool y
+   depends on ARCH_ENABLE_MEMORY_HOTPLUG
+
+config ARCH_ENABLE_MEMORY_HOTPLUG
+   def_bool y
+depends on !NUMA
+
 # Common NUMA Features
 config NUMA
bool "Numa Memory Allocation and Scheduler Support"
@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
 
 source "mm/Kconfig"
 
+config ARCH_MEMORY_PROBE
+   def_bool y
+   depends on MEMORY_HOTPLUG
+
 config SECCOMP
bool "Enable seccomp to safely compute untrusted bytecode"
---help---
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 34480e9..5fc5656 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
 CONFIG_SCHED_MC=y
 CONFIG_NUMA=y
 CONFIG_PREEMPT=y
+CONFIG_MEMORY_HOTPLUG=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_CMA=y
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 0d34bf0..2b3fa4d 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, 
phys_addr_t phys,
   pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 extern void mark_linear_text_alias_ro(void);
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
+#endif
 
 #endif
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 5960bef..e96e7d3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
return 0;
 }
 __initcall(register_mem_limit_dumper);
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int add_pages(int nid, unsigned long start_pfn,
+   unsigned long nr_pages, bool want_memblock)
+{
+   int ret;
+   u64 start_addr = start_pfn << PAGE_SHIFT;
+   /*
+* Mark the first page in the range as unusable. This is needed
+* because __add_section (within __add_pages) wants pfn_valid
+* of it to be false, and in arm64 pfn falid is implemented by
+* just checking at the nomap flag for existing blocks.
+*
+* A small trick here is that __add_section() requires only
+* phys_start_pfn (that is the first pfn of a section) to be
+* invalid. Regardless of whether it was assumed (by the function
+* author) that all pfns within a section are either all valid
+* or all invalid, it allows to avoid looping twice (once here,
+* second when memblock_clear_nomap() is called) through all
+* pfns of the section and modify only one pfn. Thanks to that,
+* further, in __add_zone() only this very first pfn is skipped
+* and corresponding page is not flagged reserved. Therefore it
+* is enough to correct this setup only for it.
+*
+* When arch_add_memory() returns, the walk_memory_range() function
+* is called with the online_memory_block() callback,
+* whose execution finally reaches the memory_block_action()
+* function, where also only the first pfn of a memory block is
+* checked to be reserved. Above, it was first pfn of a section,
+* here it is a block but
+* (drivers/base/memory.c):
+* sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+* (include/linux/memory.h):
+* #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
+* so we can consider block and section equivalently
+*/
+   memblock_mark_nomap(start_addr, 1<<PAGE_SHIFT);
[...]

[PATCH v2 1/5] mm: memory_hotplug: Memory hotplug (add) support for arm64

2017-11-23 Thread Maciej Bielski
Introduces memory hotplug functionality (hot-add) for arm64.

Changes v1->v2:
- swapper pgtable updated in place on hot add, avoiding unnecessary copy:
  all changes are additive and non-destructive.

- stop_machine used to update swapper on hot add, avoiding races

- checking if pagealloc is under debug to stay coherent with mem_map

Signed-off-by: Maciej Bielski 
Signed-off-by: Andrea Reale 
---
 arch/arm64/Kconfig   | 12 ++
 arch/arm64/configs/defconfig |  1 +
 arch/arm64/include/asm/mmu.h |  3 ++
 arch/arm64/mm/init.c | 87 
 arch/arm64/mm/mmu.c  | 39 
 5 files changed, 142 insertions(+)
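
For readers following the add_pages() comment in arch/arm64/mm/init.c, here
is a minimal userspace sketch of the first-pfn trick and of the range check
in the last hunk. It is a toy model, not kernel code: every toy_* name is
hypothetical, the constants (4K pages, 48 physical address bits) are
assumptions rather than values taken from this patch, and the flags are plain
booleans standing in for memblock and struct page state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants (4K pages, 48-bit physical addresses); not from the patch. */
#define TOY_PAGE_SHIFT       12
#define TOY_MAX_PHYSMEM_BITS 48

/* Hypothetical stand-in for the state attached to a section's first pfn. */
struct toy_section {
	bool in_memblock;        /* the region is known to memblock */
	bool first_pfn_nomap;    /* models the nomap flag on the first page */
	bool first_pfn_reserved; /* models the reserved flag on the first page */
	bool present;            /* the section has been added */
};

/* Models the pfn_valid() behaviour the comment describes: a nomap check. */
static bool toy_pfn_valid(const struct toy_section *s)
{
	return s->in_memblock && !s->first_pfn_nomap;
}

/* Models __add_section(): it refuses a section whose first pfn is valid. */
static int toy_add_section(struct toy_section *s)
{
	if (toy_pfn_valid(s))
		return -EEXIST;
	s->present = true;
	return 0;
}

/* Models the sequence in add_pages(): hide, add, unhide, fix up the flag. */
static int toy_add_pages(struct toy_section *s)
{
	int ret;

	assert(s != NULL);
	s->first_pfn_nomap = true;    /* first pfn now looks invalid */
	ret = toy_add_section(s);
	s->first_pfn_nomap = false;   /* make the first pfn valid again */
	s->first_pfn_reserved = true; /* the add step skipped this one page */
	return ret;
}

/* Models the end_pfn check against the sparsemem limit. */
static bool toy_range_ok(uint64_t start, uint64_t size)
{
	uint64_t end_pfn = (start >> TOY_PAGE_SHIFT) + (size >> TOY_PAGE_SHIFT);
	uint64_t max_pfn = UINT64_C(1) << (TOY_MAX_PHYSMEM_BITS - TOY_PAGE_SHIFT);

	return end_pfn <= max_pfn;
}
```

Calling toy_add_section() directly on a section that is already in memblock
is rejected, while toy_add_pages() succeeds and leaves the first pfn valid
and reserved. Only one pfn per section is touched, which is the point of the
trick described in the comment.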

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6..c736bba 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -641,6 +641,14 @@ config HOTPLUG_CPU
  Say Y here to experiment with turning CPUs off and on.  CPUs
  can be controlled through /sys/devices/system/cpu.
 
+config ARCH_HAS_ADD_PAGES
+   def_bool y
+   depends on ARCH_ENABLE_MEMORY_HOTPLUG
+
+config ARCH_ENABLE_MEMORY_HOTPLUG
+   def_bool y
+   depends on !NUMA
+
 # Common NUMA Features
 config NUMA
bool "Numa Memory Allocation and Scheduler Support"
@@ -715,6 +723,10 @@ config ARCH_HAS_CACHE_LINE_SIZE
 
 source "mm/Kconfig"
 
+config ARCH_MEMORY_PROBE
+   def_bool y
+   depends on MEMORY_HOTPLUG
+
 config SECCOMP
bool "Enable seccomp to safely compute untrusted bytecode"
---help---
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 34480e9..5fc5656 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -80,6 +80,7 @@ CONFIG_ARM64_VA_BITS_48=y
 CONFIG_SCHED_MC=y
 CONFIG_NUMA=y
 CONFIG_PREEMPT=y
+CONFIG_MEMORY_HOTPLUG=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
 CONFIG_CMA=y
diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 0d34bf0..2b3fa4d 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -40,5 +40,8 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
   pgprot_t prot, bool page_mappings_only);
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 extern void mark_linear_text_alias_ro(void);
+#ifdef CONFIG_MEMORY_HOTPLUG
+extern void hotplug_paging(phys_addr_t start, phys_addr_t size);
+#endif
 
 #endif
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 5960bef..e96e7d3 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -722,3 +722,90 @@ static int __init register_mem_limit_dumper(void)
return 0;
 }
 __initcall(register_mem_limit_dumper);
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+int add_pages(int nid, unsigned long start_pfn,
+   unsigned long nr_pages, bool want_memblock)
+{
+   int ret;
+   u64 start_addr = start_pfn << PAGE_SHIFT;
+   /*
+* Mark the first page in the range as unusable. This is needed
+* because __add_section (within __add_pages) wants pfn_valid
+* of it to be false, and in arm64 pfn_valid is implemented by
+* just checking the nomap flag for existing blocks.
+*
+* A small trick here is that __add_section() requires only
+* phys_start_pfn (that is the first pfn of a section) to be
+* invalid. Regardless of whether it was assumed (by the function
+* author) that all pfns within a section are either all valid
+* or all invalid, this avoids looping twice (once here,
+* second when memblock_clear_nomap() is called) through all
+* pfns of the section and modify only one pfn. Thanks to that,
+* further, in __add_zone() only this very first pfn is skipped
+* and corresponding page is not flagged reserved. Therefore it
+* is enough to correct this setup only for it.
+*
+* When arch_add_memory() returns, the walk_memory_range() function
+* is called with the online_memory_block() callback,
+* whose execution finally reaches the memory_block_action()
+* function, where also only the first pfn of a memory block is
+* checked to be reserved. Above, it was first pfn of a section,
+* here it is a block but
+* (drivers/base/memory.c):
+* sections_per_block = block_sz / MIN_MEMORY_BLOCK_SIZE;
+* (include/linux/memory.h):
+* #define MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS)
+* so we can consider block and section equivalently
+*/
+   memblock_mark_nomap(start_addr, 1<<PAGE_SHIFT);
[...]
+   unsigned long start_pfn = start >> PAGE_SHIFT;
+   unsigned long nr_pages = size >> PAGE_SHIFT;
+   unsigned long end_pfn = start_pfn + nr_pages;
+   unsigned long max_sparsemem_pfn = 1UL << (MAX_PHYSMEM_BITS-PAGE_SHIFT);
+
+   if (end_pfn > max_sparsemem_pfn) {
+   pr_err("end_pfn too big");
+   return -1;
+   }
+