Re: [PATCH v2 3/5] memblock: allow to specify flags with memblock_add_node()

2021-10-04 Thread Shahab Vahedi
On 10/4/21 11:36 AM, David Hildenbrand wrote:
> We want to specify flags when hotplugging memory. Let's prepare to pass
> flags to memblock_add_node() by adjusting all existing users.
> 
> Note that when hotplugging memory the system is already up and running
> and we might have concurrent memblock users: for example, while we're
> hotplugging memory, kexec_file code might search for suitable memory
> regions to place kexec images. It's important to add the memory directly
> to memblock via a single call with the right flags, instead of adding the
> memory first and applying flags later: otherwise, concurrent memblock users
> might temporarily stumble over memblocks with the wrong flags. This will
> become important in a follow-up patch that introduces a new flag to
> properly handle add_memory_driver_managed().
> 
> Acked-by: Geert Uytterhoeven 
> Acked-by: Heiko Carstens 
> Signed-off-by: David Hildenbrand 
> ---
>  arch/arc/mm/init.c               | 4 ++--
>  arch/ia64/mm/contig.c            | 2 +-
>  arch/ia64/mm/init.c              | 2 +-
>  arch/m68k/mm/mcfmmu.c            | 3 ++-
>  arch/m68k/mm/motorola.c          | 6 ++++--
>  arch/mips/loongson64/init.c      | 4 +++-
>  arch/mips/sgi-ip27/ip27-memory.c | 3 ++-
>  arch/s390/kernel/setup.c         | 3 ++-
>  include/linux/memblock.h         | 3 ++-
>  include/linux/mm.h               | 2 +-
>  mm/memblock.c                    | 9 +++++----
>  mm/memory_hotplug.c              | 2 +-
>  12 files changed, 26 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
> index 699ecf119641..110eb69e9bee 100644
> --- a/arch/arc/mm/init.c
> +++ b/arch/arc/mm/init.c
> @@ -59,13 +59,13 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
>  
>  		low_mem_sz = size;
>  		in_use = 1;
> -		memblock_add_node(base, size, 0);
> +		memblock_add_node(base, size, 0, MEMBLOCK_NONE);
>  	} else {
>  #ifdef CONFIG_HIGHMEM
>  		high_mem_start = base;
>  		high_mem_sz = size;
>  		in_use = 1;
> -		memblock_add_node(base, size, 1);
> +		memblock_add_node(base, size, 1, MEMBLOCK_NONE);
>  		memblock_reserve(base, size);
>  #endif

arch/arc part: Acked-by: Shahab Vahedi 

-- 
Shahab
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2 5/5] mm/memory_hotplug: indicate MEMBLOCK_DRIVER_MANAGED with IORESOURCE_SYSRAM_DRIVER_MANAGED

2021-10-04 Thread David Hildenbrand
Let's communicate driver-managed regions to memblock, so that kexec_file
with CONFIG_ARCH_KEEP_MEMBLOCK properly avoids placing images on these
memory regions.

Signed-off-by: David Hildenbrand 
---
 mm/memory_hotplug.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5f873e7f5b29..6d90818d4ce8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1357,6 +1357,7 @@ bool mhp_supports_memmap_on_memory(unsigned long size)
 int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 {
 	struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+	enum memblock_flags memblock_flags = MEMBLOCK_NONE;
 	struct vmem_altmap mhp_altmap = {};
 	struct memory_group *group = NULL;
 	u64 start, size;
@@ -1385,7 +1386,9 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 	mem_hotplug_begin();
 
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
-		ret = memblock_add_node(start, size, nid, MEMBLOCK_NONE);
+		if (res->flags & IORESOURCE_SYSRAM_DRIVER_MANAGED)
+			memblock_flags = MEMBLOCK_DRIVER_MANAGED;
+		ret = memblock_add_node(start, size, nid, memblock_flags);
 		if (ret)
 			goto error_mem_hotplug_end;
 	}
-- 
2.31.1




[PATCH v2 4/5] memblock: add MEMBLOCK_DRIVER_MANAGED to mimic IORESOURCE_SYSRAM_DRIVER_MANAGED

2021-10-04 Thread David Hildenbrand
Let's add a flag that corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED,
indicating that we're dealing with a memory region that is never
indicated in the firmware-provided memory map, but always detected and
added by a driver.

Similar to MEMBLOCK_HOTPLUG, most infrastructure has to treat such memory
regions like ordinary MEMBLOCK_NONE memory regions -- for example, when
selecting memory regions to add to the vmcore for dumping in the
crashkernel via for_each_mem_range().

However, kexec_file in particular is not supposed to select such memblocks via
for_each_free_mem_range() / for_each_free_mem_range_reverse() to place
kexec images, similar to how we handle IORESOURCE_SYSRAM_DRIVER_MANAGED
without CONFIG_ARCH_KEEP_MEMBLOCK.
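
The filtering described above can be modeled in a few lines. This is a
userspace sketch, not the kernel implementation: toy_should_skip() is a
hypothetical helper, and because the mm/memblock.c hunk of this patch is cut
off in this archive, we assume it adds a "skip unless explicitly requested"
check for MEMBLOCK_DRIVER_MANAGED analogous to the existing movable_node
check for MEMBLOCK_HOTPLUG. We also assume kexec_file walks free ranges with
MEMBLOCK_NONE.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values from enum memblock_flags as extended by this patch. */
#define MB_NONE			0x0u
#define MB_HOTPLUG		0x1u
#define MB_DRIVER_MANAGED	0x8u

/*
 * Hypothetical, simplified model of two per-region skip checks applied
 * while walking memblock ranges.
 *   region_flags: flags stored on the memblock region
 *   requested:    flags argument passed to the iterator
 *   movable_node: whether "movable_node" is on the kernel command line
 */
static bool toy_should_skip(unsigned int region_flags, unsigned int requested,
			    bool movable_node)
{
	/* this model deliberately ignores MEMBLOCK_MIRROR and MEMBLOCK_NOMAP */
	assert((region_flags & ~(MB_HOTPLUG | MB_DRIVER_MANAGED)) == 0);

	/* hotpluggable memory is only avoided when movable_node is set */
	if (movable_node && (region_flags & MB_HOTPLUG) &&
	    !(requested & MB_HOTPLUG))
		return true;

	/* driver-managed memory is skipped unless explicitly requested */
	if ((region_flags & MB_DRIVER_MANAGED) &&
	    !(requested & MB_DRIVER_MANAGED))
		return true;

	return false;
}
```

With requested = MB_HOTPLUG | MB_DRIVER_MANAGED (what for_each_mem_range()
passes after this patch), nothing is skipped, so vmcore dumping still sees
driver-managed memory; with requested = MB_NONE, driver-managed regions are
skipped regardless of movable_node.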

In the next patch, we'll make the memory hotplug code set the flag where
applicable (IORESOURCE_SYSRAM_DRIVER_MANAGED). This prepares architectures
that need CONFIG_ARCH_KEEP_MEMBLOCK, such as arm64, for virtio-mem
support.

Note that kexec *must not* indicate this memory to the second kernel
and *must not* place kexec-images on this memory. Let's add a comment to
kexec_walk_memblock(), documenting that we now handle
MEMBLOCK_DRIVER_MANAGED just like IORESOURCE_SYSRAM_DRIVER_MANAGED is
handled in locate_mem_hole_callback() for kexec_walk_resources().

Also note that MEMBLOCK_HOTPLUG cannot be reused due to different
semantics:
MEMBLOCK_HOTPLUG: memory is indicated as "System RAM" in the
firmware-provided memory map and added to the system early during
boot; kexec *has to* indicate this memory to the second kernel and
can place kexec-images on this memory. After memory hotunplug,
kexec has to be re-armed. We mostly ignore this flag when
"movable_node" is not set on the kernel command line, because
then we're told to not care about hotunpluggability of such
memory regions.

MEMBLOCK_DRIVER_MANAGED: memory is not indicated as "System RAM" in
the firmware-provided memory map; this memory is always detected
and added to the system by a driver; memory might not actually be
physically hotunpluggable. kexec *must not* indicate this memory to
the second kernel and *must not* place kexec-images on this memory.
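
The two semantics above can be summarized as a small policy function. This
is a hypothetical sketch (struct toy_kexec_policy and toy_kexec_policy_for()
are illustrative names, not kernel API); the movable_node nuance for
MEMBLOCK_HOTPLUG follows the note in the MEMBLOCK_HOTPLUG documentation
patch of this series.

```c
#include <assert.h>
#include <stdbool.h>

#define MB_NONE			0x0u
#define MB_HOTPLUG		0x1u
#define MB_DRIVER_MANAGED	0x8u

/* Hypothetical summary of what kexec may do with a memory region. */
struct toy_kexec_policy {
	bool report_to_second_kernel;	/* include in the second kernel's memory map */
	bool may_place_images;		/* kexec images may be loaded here */
};

static struct toy_kexec_policy toy_kexec_policy_for(unsigned int flags,
						    bool movable_node)
{
	/* this model only distinguishes the two flags discussed above */
	assert((flags & ~(MB_HOTPLUG | MB_DRIVER_MANAGED)) == 0);

	if (flags & MB_DRIVER_MANAGED)
		/* never in the firmware map: must not be reported or used */
		return (struct toy_kexec_policy){ false, false };
	if (flags & MB_HOTPLUG)
		/* firmware "System RAM": always reported; only avoided for
		 * images when movable_node asks to keep it hotunpluggable */
		return (struct toy_kexec_policy){ true, !movable_node };
	return (struct toy_kexec_policy){ true, true };
}
```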

Signed-off-by: David Hildenbrand 
---
 include/linux/memblock.h | 16 ++++++++++++++--
 kernel/kexec_file.c      |  5 +++++
 mm/memblock.c            |  4 ++++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 2bc726e43a1b..b3b29ccf91f3 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -37,12 +37,17 @@ extern unsigned long long max_possible_pfn;
  * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
  * reserved in the memory map; refer to memblock_mark_nomap() description
  * for further details
+ * @MEMBLOCK_DRIVER_MANAGED: memory region that is always detected and added
+ * via a driver, and never indicated in the firmware-provided memory map as
+ * system RAM. This corresponds to IORESOURCE_SYSRAM_DRIVER_MANAGED in the
+ * kernel resource tree.
  */
 enum memblock_flags {
 	MEMBLOCK_NONE		= 0x0,	/* No special request */
 	MEMBLOCK_HOTPLUG	= 0x1,	/* hotpluggable region */
 	MEMBLOCK_MIRROR		= 0x2,	/* mirrored region */
 	MEMBLOCK_NOMAP		= 0x4,	/* don't add to kernel direct mapping */
+	MEMBLOCK_DRIVER_MANAGED = 0x8,	/* always detected via a driver */
 };
 
 /**
@@ -213,7 +218,8 @@ static inline void __next_physmem_range(u64 *idx, struct memblock_type *type,
  */
 #define for_each_mem_range(i, p_start, p_end) \
 	__for_each_mem_range(i, &memblock.memory, NULL, NUMA_NO_NODE,	\
-			     MEMBLOCK_HOTPLUG, p_start, p_end, NULL)
+			     MEMBLOCK_HOTPLUG | MEMBLOCK_DRIVER_MANAGED, \
+			     p_start, p_end, NULL)
 
 /**
  * for_each_mem_range_rev - reverse iterate through memblock areas from
@@ -224,7 +230,8 @@ static inline void __next_physmem_range(u64 *idx, struct memblock_type *type,
  */
 #define for_each_mem_range_rev(i, p_start, p_end)			\
 	__for_each_mem_range_rev(i, &memblock.memory, NULL, NUMA_NO_NODE, \
-				 MEMBLOCK_HOTPLUG, p_start, p_end, NULL)
+				 MEMBLOCK_HOTPLUG | MEMBLOCK_DRIVER_MANAGED,\
+				 p_start, p_end, NULL)
 
 /**
  * for_each_reserved_mem_range - iterate over all reserved memblock areas
@@ -254,6 +261,11 @@ static inline bool memblock_is_nomap(struct memblock_region *m)
 	return m->flags & MEMBLOCK_NOMAP;
 }
 
+static inline bool memblock_is_driver_managed(struct memblock_region *m)
+{
+	return m->flags & MEMBLOCK_DRIVER_MANAGED;
+}
+
 int memblock_search_pfn_nid(unsigned long pfn, unsigned long *start_pfn,
 			    unsigned long  *end_pfn);
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
diff --git 

[PATCH v2 3/5] memblock: allow to specify flags with memblock_add_node()

2021-10-04 Thread David Hildenbrand
We want to specify flags when hotplugging memory. Let's prepare to pass
flags to memblock_add_node() by adjusting all existing users.

Note that when hotplugging memory the system is already up and running
and we might have concurrent memblock users: for example, while we're
hotplugging memory, kexec_file code might search for suitable memory
regions to place kexec images. It's important to add the memory directly
to memblock via a single call with the right flags, instead of adding the
memory first and applying flags later: otherwise, concurrent memblock users
might temporarily stumble over memblocks with the wrong flags. This will
become important in a follow-up patch that introduces a new flag to
properly handle add_memory_driver_managed().
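
To illustrate the ordering problem, here is a minimal userspace sketch (not
kernel code). It assumes a toy region table and hypothetical helpers
toy_add_node(), toy_set_flags() and toy_observe(); toy_observe() stands in
for a concurrent reader such as kexec_file. Flag values mirror enum
memblock_flags, with 0x8 being the MEMBLOCK_DRIVER_MANAGED value introduced
later in this series.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NONE		0x0u	/* mirrors MEMBLOCK_NONE */
#define TOY_DRIVER_MANAGED	0x8u	/* mirrors MEMBLOCK_DRIVER_MANAGED */
#define TOY_MAX_REGIONS		8

struct toy_region {
	uint64_t base;
	uint64_t size;
	int nid;
	unsigned int flags;
};

/* Hypothetical stand-in for memblock.memory. */
static struct toy_region toy_regions[TOY_MAX_REGIONS];
static int toy_nr_regions;

/* Hypothetical: add a region with its final flags in a single step. */
static int toy_add_node(uint64_t base, uint64_t size, int nid,
			unsigned int flags)
{
	if (toy_nr_regions >= TOY_MAX_REGIONS)
		return -1;	/* table full: fail, like memblock_add_node() can */
	toy_regions[toy_nr_regions++] = (struct toy_region){
		.base = base, .size = size, .nid = nid, .flags = flags,
	};
	return 0;
}

/* Hypothetical: apply flags to an already-visible region ("apply later"). */
static void toy_set_flags(uint64_t base, unsigned int flags)
{
	for (int i = 0; i < toy_nr_regions; i++)
		if (toy_regions[i].base == base)
			toy_regions[i].flags |= flags;
}

/* What a concurrent reader would observe right now for an existing region. */
static unsigned int toy_observe(uint64_t base)
{
	for (int i = 0; i < toy_nr_regions; i++)
		if (toy_regions[i].base == base)
			return toy_regions[i].flags;
	assert(!"region not found");
	return TOY_NONE;
}
```

Calling toy_add_node(base, size, 0, TOY_NONE) and then toy_observe(base)
before toy_set_flags() yields TOY_NONE, i.e., the reader sees the range as
ordinary memory; toy_add_node(base, size, 0, TOY_DRIVER_MANAGED) never
exposes that state. The sketch ignores locking and memory ordering; it only
shows which intermediate states become visible.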

Acked-by: Geert Uytterhoeven 
Acked-by: Heiko Carstens 
Signed-off-by: David Hildenbrand 
---
 arch/arc/mm/init.c               | 4 ++--
 arch/ia64/mm/contig.c            | 2 +-
 arch/ia64/mm/init.c              | 2 +-
 arch/m68k/mm/mcfmmu.c            | 3 ++-
 arch/m68k/mm/motorola.c          | 6 ++++--
 arch/mips/loongson64/init.c      | 4 +++-
 arch/mips/sgi-ip27/ip27-memory.c | 3 ++-
 arch/s390/kernel/setup.c         | 3 ++-
 include/linux/memblock.h         | 3 ++-
 include/linux/mm.h               | 2 +-
 mm/memblock.c                    | 9 +++++----
 mm/memory_hotplug.c              | 2 +-
 12 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/arch/arc/mm/init.c b/arch/arc/mm/init.c
index 699ecf119641..110eb69e9bee 100644
--- a/arch/arc/mm/init.c
+++ b/arch/arc/mm/init.c
@@ -59,13 +59,13 @@ void __init early_init_dt_add_memory_arch(u64 base, u64 size)
 
 		low_mem_sz = size;
 		in_use = 1;
-		memblock_add_node(base, size, 0);
+		memblock_add_node(base, size, 0, MEMBLOCK_NONE);
 	} else {
 #ifdef CONFIG_HIGHMEM
 		high_mem_start = base;
 		high_mem_sz = size;
 		in_use = 1;
-		memblock_add_node(base, size, 1);
+		memblock_add_node(base, size, 1, MEMBLOCK_NONE);
 		memblock_reserve(base, size);
 #endif
 	}
diff --git a/arch/ia64/mm/contig.c b/arch/ia64/mm/contig.c
index 42e025cfbd08..24901d809301 100644
--- a/arch/ia64/mm/contig.c
+++ b/arch/ia64/mm/contig.c
@@ -153,7 +153,7 @@ find_memory (void)
 	efi_memmap_walk(find_max_min_low_pfn, NULL);
 	max_pfn = max_low_pfn;
 
-	memblock_add_node(0, PFN_PHYS(max_low_pfn), 0);
+	memblock_add_node(0, PFN_PHYS(max_low_pfn), 0, MEMBLOCK_NONE);
 
 	find_initrd();
 
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index 5c6da8d83c1a..5d165607bf35 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -378,7 +378,7 @@ int __init register_active_ranges(u64 start, u64 len, int nid)
 #endif
 
 	if (start < end)
-		memblock_add_node(__pa(start), end - start, nid);
+		memblock_add_node(__pa(start), end - start, nid, MEMBLOCK_NONE);
 	return 0;
 }
 
diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
index eac9dde65193..6f1f25125294 100644
--- a/arch/m68k/mm/mcfmmu.c
+++ b/arch/m68k/mm/mcfmmu.c
@@ -174,7 +174,8 @@ void __init cf_bootmem_alloc(void)
 	m68k_memory[0].addr = _rambase;
 	m68k_memory[0].size = _ramend - _rambase;
 
-	memblock_add_node(m68k_memory[0].addr, m68k_memory[0].size, 0);
+	memblock_add_node(m68k_memory[0].addr, m68k_memory[0].size, 0,
+			  MEMBLOCK_NONE);
 
 	/* compute total pages in system */
 	num_pages = PFN_DOWN(_ramend - _rambase);
diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
index 9f3f77785aa7..2b05bb2bac00 100644
--- a/arch/m68k/mm/motorola.c
+++ b/arch/m68k/mm/motorola.c
@@ -410,7 +410,8 @@ void __init paging_init(void)
 
 	min_addr = m68k_memory[0].addr;
 	max_addr = min_addr + m68k_memory[0].size;
-	memblock_add_node(m68k_memory[0].addr, m68k_memory[0].size, 0);
+	memblock_add_node(m68k_memory[0].addr, m68k_memory[0].size, 0,
+			  MEMBLOCK_NONE);
 	for (i = 1; i < m68k_num_memory;) {
 		if (m68k_memory[i].addr < min_addr) {
 			printk("Ignoring memory chunk at 0x%lx:0x%lx before the first chunk\n",
@@ -421,7 +422,8 @@ void __init paging_init(void)
 				(m68k_num_memory - i) * sizeof(struct m68k_mem_info));
 			continue;
 		}
-		memblock_add_node(m68k_memory[i].addr, m68k_memory[i].size, i);
+		memblock_add_node(m68k_memory[i].addr, m68k_memory[i].size, i,
+				  MEMBLOCK_NONE);
 		addr = m68k_memory[i].addr + m68k_memory[i].size;
 		if (addr > max_addr)
 			max_addr = addr;
diff --git a/arch/mips/loongson64/init.c b/arch/mips/loongson64/init.c
index 76e0a9636a0e..4ac5ba80bbf6 100644
--- a/arch/mips/loongson64/init.c
+++ b/arch/mips/loongson64/init.c
@@ 

[PATCH v2 0/5] mm/memory_hotplug: full support for add_memory_driver_managed() with CONFIG_ARCH_KEEP_MEMBLOCK

2021-10-04 Thread David Hildenbrand
Architectures that require CONFIG_ARCH_KEEP_MEMBLOCK=y, such as arm64,
don't cleanly support add_memory_driver_managed() yet. Most prominently,
kexec_file can still end up placing kexec images on such driver-managed
memory, resulting in undesired behavior, for example, having kexec images
located on memory not part of the firmware-provided memory map.

Teaching kexec to not place images on driver-managed memory is especially
relevant for virtio-mem. Details can be found in commit 7b7b27214bba
("mm/memory_hotplug: introduce add_memory_driver_managed()").

Extend memblock with a new flag and set it from memory hotplug code
when applicable. This is required to fully support virtio-mem on
arm64, and it also makes kexec_file behave like it does on x86-64.

v1 -> v2:
- "memblock: improve MEMBLOCK_HOTPLUG documentation"
-- Added
- "memblock: add MEMBLOCK_DRIVER_MANAGED to mimic
   IORESOURCE_SYSRAM_DRIVER_MANAGED"
-- Improve documentation of MEMBLOCK_DRIVER_MANAGED
- Refine patch descriptions

Cc: Andrew Morton 
Cc: Mike Rapoport 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Jianyong Wu 
Cc: Aneesh Kumar K.V 
Cc: Vineet Gupta 
Cc: Geert Uytterhoeven 
Cc: Huacai Chen 
Cc: Jiaxun Yang 
Cc: Thomas Bogendoerfer 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Eric Biederman 
Cc: Arnd Bergmann 
Cc: linux-snps-...@lists.infradead.org
Cc: linux-i...@vger.kernel.org
Cc: linux-m...@lists.linux-m68k.org
Cc: linux-m...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@kvack.org
Cc: kexec@lists.infradead.org

David Hildenbrand (5):
  mm/memory_hotplug: handle memblock_add_node() failures in
    add_memory_resource()
  memblock: improve MEMBLOCK_HOTPLUG documentation
  memblock: allow to specify flags with memblock_add_node()
  memblock: add MEMBLOCK_DRIVER_MANAGED to mimic
    IORESOURCE_SYSRAM_DRIVER_MANAGED
  mm/memory_hotplug: indicate MEMBLOCK_DRIVER_MANAGED with
    IORESOURCE_SYSRAM_DRIVER_MANAGED

 arch/arc/mm/init.c               |  4 ++--
 arch/ia64/mm/contig.c            |  2 +-
 arch/ia64/mm/init.c              |  2 +-
 arch/m68k/mm/mcfmmu.c            |  3 ++-
 arch/m68k/mm/motorola.c          |  6 ++++--
 arch/mips/loongson64/init.c      |  4 +++-
 arch/mips/sgi-ip27/ip27-memory.c |  3 ++-
 arch/s390/kernel/setup.c         |  3 ++-
 include/linux/memblock.h         | 25 +++++++++++++++++++++----
 include/linux/mm.h               |  2 +-
 kernel/kexec_file.c              |  5 +++++
 mm/memblock.c                    | 13 +++++++++----
 mm/memory_hotplug.c              | 11 +++++++++--
 13 files changed, 62 insertions(+), 21 deletions(-)


base-commit: 9e1ff307c779ce1f0f810c7ecce3d95bbae40896
-- 
2.31.1




[PATCH v2 2/5] memblock: improve MEMBLOCK_HOTPLUG documentation

2021-10-04 Thread David Hildenbrand
The description of MEMBLOCK_HOTPLUG is currently short and consequently
misleading: we're actually dealing with a memory region that might get
hotunplugged later (i.e., the platform+firmware supports it), yet it is
indicated in the firmware-provided memory map as System RAM that will just
get used by the system for any purpose when not taking special care. The
firmware marked this memory region as hot(un)plugged (e.g., hotplugged
before reboot), implying that it might get hotunplugged again later.

Whether we consider this information depends on the "movable_node" kernel
command line parameter: only with "movable_node" set will we try to keep
this memory hotunpluggable, for example, by not serving early allocations
from this memory region and by letting the buddy allocator manage it via
ZONE_MOVABLE.

Let's make this clearer by extending the documentation.

Note: kexec *has to* indicate this memory to the second kernel. With
"movable_node" set, we don't want to place kexec-images on this memory.
Without "movable_node" set, we don't care and can place kexec-images on
this memory. In both cases, after successful memory hotunplug, kexec has to
be re-armed to update the memory map for the second kernel and to place the
kexec-images somewhere else.

Signed-off-by: David Hildenbrand 
---
 include/linux/memblock.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 34de69b3b8ba..4ee8dd2d63a7 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -28,7 +28,11 @@ extern unsigned long long max_possible_pfn;
 /**
  * enum memblock_flags - definition of memory region attributes
  * @MEMBLOCK_NONE: no special request
- * @MEMBLOCK_HOTPLUG: hotpluggable region
+ * @MEMBLOCK_HOTPLUG: memory region indicated in the firmware-provided memory
+ * map during early boot as hot(un)pluggable system RAM (e.g., memory range
+ * that might get hotunplugged later). With "movable_node" set on the kernel
+ * commandline, try keeping this memory region hotunpluggable. Does not apply
+ * to memblocks added ("hotplugged") after early boot.
  * @MEMBLOCK_MIRROR: mirrored region
  * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
  * reserved in the memory map; refer to memblock_mark_nomap() description
-- 
2.31.1




[PATCH v2 1/5] mm/memory_hotplug: handle memblock_add_node() failures in add_memory_resource()

2021-10-04 Thread David Hildenbrand
If memblock_add_node() fails, we're most probably running out of memory.
While this is unlikely, it can happen, and having memory added without a
memblock can be problematic for architectures that use memblock to detect
valid memory. Let's fail in a nice way instead of silently ignoring the
error.

Signed-off-by: David Hildenbrand 
---
 mm/memory_hotplug.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9fd0be32a281..917b3528636d 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1384,8 +1384,11 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 
 	mem_hotplug_begin();
 
-	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
-		memblock_add_node(start, size, nid);
+	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) {
+		ret = memblock_add_node(start, size, nid);
+		if (ret)
+			goto error_mem_hotplug_end;
+	}
 
 	ret = __try_online_node(nid, false);
 	if (ret < 0)
@@ -1458,6 +1461,7 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
 		rollback_node_hotadd(nid);
 	if (IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
 		memblock_remove(start, size);
+error_mem_hotplug_end:
 	mem_hotplug_done();
 	return ret;
 }
-- 
2.31.1

