Re: [PATCH 1/1] arm64: reduce section size for sparsemem

2021-01-20 Thread Sudarshan Rajagopalan

On 2021-01-11 03:09, Anshuman Khandual wrote:

+ Catalin

Hello Sudarshan,

Could you please change the subject line above as follows for
better classification and clarity.

arm64/sparsemem: Reduce SECTION_SIZE_BITS

On 1/9/21 4:46 AM, Sudarshan Rajagopalan wrote:

Reducing the section size helps reduce wastage of reserved memory
for huge memory holes in the sparsemem model. But having a much smaller


There are two distinct benefits of reducing SECTION_SIZE_BITS.

- Improve memory hotplug granularity
- Reduce reserved memory wastage for vmemmap mappings for sections
  with large memory holes


section size could break PMD mappings for vmemmap and wouldn't
accommodate the highest-order page for certain page size granule
configs.


There are constraints in reducing SECTION_SIZE_BITS, such as:

- Should accommodate the highest-order page for a given config
- Should not break PMD mapping in vmemmap for 4K pages
- Should not consume too many page->flags bits, reducing space for other
  info


Both benefits and constraints should be described in the commit message
for folks to understand the rationale clearly at a later point in time.


It is determined that a SECTION_SIZE_BITS of 27 (128MB) could be an ideal


Probably needs some description of how we arrived here.

default value for 4K_PAGES that gives the least section size without breaking
PMD-based vmemmap mappings. For simplicity, 16K_PAGES could follow the
same as 4K_PAGES. And the least SECTION_SIZE_BITS for 64K_PAGES is 29,
which can accommodate MAX_ORDER.


Did not see this patch earlier and hence ended up writing yet another one.
Here is the draft commit message from that patch; please feel free to use it
in part or in full. But please do include the benefits, the constraints and
the rationale for arriving at these figures.

-
memory_block_size_bytes() determines the memory hotplug granularity, i.e. the
amount of memory which can be hot added to or hot removed from the kernel. The
generic value here is MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS) for
memory_block_size_bytes() on platforms like arm64 that do not override it.

The current SECTION_SIZE_BITS is 30, i.e. 1GB, which is large, and a reduction
here increases memory hotplug granularity, thus improving its agility. A
reduced section size also reduces memory wastage in vmemmap mappings for
sections with large memory holes. A section size bits selection must follow:

(MAX_ORDER - 1 + PAGE_SHIFT) <= SECTION_SIZE_BITS

CONFIG_FORCE_MAX_ZONEORDER is always defined on arm64 and just following it
would help achieve the smallest section size.

SECTION_SIZE_BITS = (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)

SECTION_SIZE_BITS = 22 (11 - 1 + 12) i.e. 4MB   for 4K pages
SECTION_SIZE_BITS = 24 (11 - 1 + 14) i.e. 16MB  for 16K pages without THP
SECTION_SIZE_BITS = 25 (12 - 1 + 14) i.e. 32MB  for 16K pages with THP
SECTION_SIZE_BITS = 26 (11 - 1 + 16) i.e. 64MB  for 64K pages without THP
SECTION_SIZE_BITS = 29 (14 - 1 + 16) i.e. 512MB for 64K pages with THP


But there are other problems. Reducing the section size too much would
over-populate /sys/devices/system/memory/ and also consume too many
page->flags bits in the !vmemmap case. Also, the section size needs to be a
multiple of 128MB to have PMD-based vmemmap mapping with CONFIG_ARM64_4K_PAGES.

Given these constraints, let's just reduce the section size to 128MB for the
4K and 16K base page size configs and to 512MB for the 64K base page size
config.

-



Signed-off-by: Sudarshan Rajagopalan 
Suggested-by: David Hildenbrand 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mike Rapoport 
Cc: Mark Rutland 
Cc: Suren Baghdasaryan 

A nit. Please add all relevant mailing lists like LAKML, MM along
with other developers here in the CC list, so that it would never
be missed.


---
 arch/arm64/include/asm/sparsemem.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h

index 1f43fcc79738..ff08ff6b677c 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -7,7 +7,13 @@

 #ifdef CONFIG_SPARSEMEM
 #define MAX_PHYSMEM_BITS   CONFIG_ARM64_PA_BITS
-#define SECTION_SIZE_BITS  30
-#endif
+
+#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)


Please add a comment, something like

/*
 * Section size must be at least 128MB for 4K base
 * page size config. Otherwise PMD based huge page
 * entries could not be created for vmemmap mappings.
 * 16K follows 4K for simplicity.
 */


+#define SECTION_SIZE_BITS 27
+#else


Please add a comment, something like

/*
 * Section size must be at least 512MB for 64K base
 * page size config. Otherwise it will be less than
 * (MAX_ORDER - 1) and the build process will fail.
 */


+#define SECTION_SIZE_BITS 29
+#endif /* CONFIG_ARM64_4K_PAGES || CONFIG_

[PATCH 1/1] arm64/sparsemem: reduce SECTION_SIZE_BITS

2021-01-20 Thread Sudarshan Rajagopalan
memory_block_size_bytes() determines the memory hotplug granularity, i.e. the
amount of memory which can be hot added to or hot removed from the kernel. The
generic value here is MIN_MEMORY_BLOCK_SIZE (1UL << SECTION_SIZE_BITS) for
memory_block_size_bytes() on platforms like arm64 that do not override it.
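
For reference, the generic definitions referred to above look roughly like
this (paraphrased from include/linux/memory.h and drivers/base/memory.c; the
exact code may differ between kernel versions):

#define MIN_MEMORY_BLOCK_SIZE	(1UL << SECTION_SIZE_BITS)

/* Weak default, used when the architecture does not provide an override. */
unsigned long __weak memory_block_size_bytes(void)
{
	return MIN_MEMORY_BLOCK_SIZE;
}

Since arm64 provides no override, its hotplug block size tracks
SECTION_SIZE_BITS directly.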

The current SECTION_SIZE_BITS is 30, i.e. 1GB, which is large, and a reduction
here increases memory hotplug granularity, thus improving its agility. A reduced
section size also reduces memory wastage in vmemmap mappings for sections
with large memory holes. So we try to set the smallest section size possible.

A section size bits selection must follow:
(MAX_ORDER - 1 + PAGE_SHIFT) <= SECTION_SIZE_BITS

CONFIG_FORCE_MAX_ZONEORDER is always defined on arm64 and so just following it
would help achieve the smallest section size.

SECTION_SIZE_BITS = (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)

SECTION_SIZE_BITS = 22 (11 - 1 + 12) i.e. 4MB   for 4K pages
SECTION_SIZE_BITS = 24 (11 - 1 + 14) i.e. 16MB  for 16K pages without THP
SECTION_SIZE_BITS = 25 (12 - 1 + 14) i.e. 32MB  for 16K pages with THP
SECTION_SIZE_BITS = 26 (11 - 1 + 16) i.e. 64MB  for 64K pages without THP
SECTION_SIZE_BITS = 29 (14 - 1 + 16) i.e. 512MB for 64K pages with THP
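
The relation above is enforced at build time by the sparsemem code; a minimal
sketch of such a check (the mainline check lives in include/linux/mmzone.h and
its exact wording may differ):

#if (MAX_ORDER - 1 + PAGE_SHIFT) > SECTION_SIZE_BITS
#error Allocator MAX_ORDER exceeds SECTION_SIZE
#endif

so picking a SECTION_SIZE_BITS below the per-config minimum fails the build
rather than producing a broken kernel.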

But there are other problems in reducing SECTION_SIZE_BITS. Reducing it by too
much would over-populate /sys/devices/system/memory/ and also consume too many
page->flags bits in the !vmemmap case. Also, the section size needs to be a
multiple of 128MB to have PMD-based vmemmap mapping with CONFIG_ARM64_4K_PAGES.

Given these constraints, let's just reduce the section size to 128MB for the 4K
and 16K base page size configs, and to 512MB for the 64K base page size config.

Signed-off-by: Sudarshan Rajagopalan 
Suggested-by: Anshuman Khandual 
Suggested-by: David Hildenbrand 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: David Hildenbrand 
Cc: Mike Rapoport 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: Andrew Morton 
Cc: Steven Price 
Cc: Suren Baghdasaryan 
---
 arch/arm64/include/asm/sparsemem.h | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 1f43fcc79738..eb4a75d720ed 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -7,7 +7,26 @@
 
 #ifdef CONFIG_SPARSEMEM
 #define MAX_PHYSMEM_BITS   CONFIG_ARM64_PA_BITS
-#define SECTION_SIZE_BITS  30
-#endif
+
+/*
+ * Section size must be at least 512MB for 64K base
+ * page size config. Otherwise it will be less than
+ * (MAX_ORDER - 1) and the build process will fail.
+ */
+#ifdef CONFIG_ARM64_64K_PAGES
+#define SECTION_SIZE_BITS 29
+
+#else
+
+/*
+ * Section size must be at least 128MB for 4K base
+ * page size config. Otherwise PMD based huge page
+ * entries could not be created for vmemmap mappings.
+ * 16K follows 4K for simplicity.
+ */
+#define SECTION_SIZE_BITS 27
+#endif /* CONFIG_ARM64_64K_PAGES */
+
+#endif /* CONFIG_SPARSEMEM*/
 
 #endif
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 0/1] arm64/sparsemem: reduce SECTION_SIZE_BITS

2021-01-20 Thread Sudarshan Rajagopalan
This patch is the follow-up from the discussions in the thread [1].
Reducing the section size has the merit of reducing wastage of reserved memory
for vmemmap mappings for sections with large memory holes. Also, a smaller
section size gives more granularity and agility for memory hot(un)plugging.

But there are also constraints in reducing SECTION_SIZE_BITS:

- Should accommodate the highest-order page for a given config
- Should not break PMD mapping in vmemmap for 4K pages
- Should not consume too many page->flags bits, reducing space for other info

This patch uses the suggestions from Anshuman Khandual and David Hildenbrand
in thread [1] to set the least possible section size: 128MB for the 4K and 16K
base page size configs (16K follows 4K for simplicity), and 512MB for the 64K
base page size config.

[1] 
https://lore.kernel.org/lkml/cover.1609895500.git.sudar...@codeaurora.org/T/#m8ee60ae69db5e9eb06ca7999c43828d49ccb9626


Sudarshan Rajagopalan (1):
  arm64/sparsemem: reduce SECTION_SIZE_BITS

 arch/arm64/include/asm/sparsemem.h | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 1/1] arm64: reduce section size for sparsemem

2021-01-20 Thread Sudarshan Rajagopalan

On 2021-01-20 09:49, Will Deacon wrote:

On Fri, Jan 08, 2021 at 03:16:00PM -0800, Sudarshan Rajagopalan wrote:

Reducing the section size helps reduce wastage of reserved memory
for huge memory holes in the sparsemem model. But having a much smaller
section size could break PMD mappings for vmemmap and wouldn't
accommodate the highest-order page for certain page size granule
configs.

It is determined that a SECTION_SIZE_BITS of 27 (128MB) could be an ideal
default value for 4K_PAGES that gives the least section size without
breaking PMD-based vmemmap mappings. For simplicity, 16K_PAGES could follow
the same as 4K_PAGES. And the least SECTION_SIZE_BITS for 64K_PAGES is 29,
which can accommodate MAX_ORDER.

Signed-off-by: Sudarshan Rajagopalan 
Suggested-by: David Hildenbrand 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mike Rapoport 
Cc: Mark Rutland 
Cc: Suren Baghdasaryan 
---
 arch/arm64/include/asm/sparsemem.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h

index 1f43fcc79738..ff08ff6b677c 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -7,7 +7,13 @@

 #ifdef CONFIG_SPARSEMEM
 #define MAX_PHYSMEM_BITS   CONFIG_ARM64_PA_BITS
-#define SECTION_SIZE_BITS  30
-#endif
+
+#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
+#define SECTION_SIZE_BITS 27
+#else
+#define SECTION_SIZE_BITS 29
+#endif /* CONFIG_ARM64_4K_PAGES || CONFIG_ARM64_16K_PAGES */
+
+#endif /* CONFIG_SPARSEMEM*/


Please can you repost this in light of the comments from Anshuman?

Thanks,

Will


Sure Will. We were held up with some other critical tasks.. will repost 
the patch by EOD after addressing Anshuman's comments.


--
Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


[PATCH] mm: vmscan: support equal reclaim for anon and file pages

2021-01-11 Thread Sudarshan Rajagopalan
When performing memory reclaim, support treating anonymous and
file-backed pages equally.
Swapping anonymous pages out to memory can be efficient enough
to justify treating anonymous and file-backed pages equally.
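
The behaviour is opt-in at boot via the early parameter added below; a minimal
usage sketch (the parameter name is taken from the early_param() in the diff):

	# append to the kernel command line to enable balanced anon/file reclaim
	... balance_reclaim

Without it, balance_anon_file_reclaim stays false and the existing
cache_trim_mode behaviour is unchanged.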

Signed-off-by: Sudarshan Rajagopalan 
Cc: Andrew Morton 
---
 mm/vmscan.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 257cba79a96d..ec7585e0d5f5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -169,6 +169,8 @@ struct scan_control {
  */
 int vm_swappiness = 60;
 
+bool balance_anon_file_reclaim = false;
+
 static void set_task_reclaim_state(struct task_struct *task,
   struct reclaim_state *rs)
 {
@@ -201,6 +203,13 @@ static DECLARE_RWSEM(shrinker_rwsem);
 static DEFINE_IDR(shrinker_idr);
 static int shrinker_nr_max;
 
+static int __init cmdline_parse_balance_reclaim(char *p)
+{
+   balance_anon_file_reclaim = true;
+   return 0;
+}
+early_param("balance_reclaim", cmdline_parse_balance_reclaim);
+
 static int prealloc_memcg_shrinker(struct shrinker *shrinker)
 {
int id, ret = -ENOMEM;
@@ -2291,9 +2300,11 @@ static void get_scan_count(struct lruvec *lruvec, struct 
scan_control *sc,
 
/*
 * If there is enough inactive page cache, we do not reclaim
-* anything from the anonymous working right now.
+* anything from the anonymous working right now. But when balancing
+* anon and page cache files for reclaim, allow swapping of anon pages
+* even if there are a number of inactive file cache pages.
 */
-   if (sc->cache_trim_mode) {
+   if (!balance_anon_file_reclaim && sc->cache_trim_mode) {
scan_balance = SCAN_FILE;
goto out;
}
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 1/1] arm64: reduce section size for sparsemem

2021-01-08 Thread Sudarshan Rajagopalan
Reducing the section size helps reduce wastage of reserved memory
for huge memory holes in the sparsemem model. But having a much smaller
section size could break PMD mappings for vmemmap and wouldn't
accommodate the highest-order page for certain page size granule configs.
It is determined that a SECTION_SIZE_BITS of 27 (128MB) could be an ideal
default value for 4K_PAGES that gives the least section size without breaking
PMD-based vmemmap mappings. For simplicity, 16K_PAGES could follow the
same as 4K_PAGES. And the least SECTION_SIZE_BITS for 64K_PAGES is 29,
which can accommodate MAX_ORDER.

Signed-off-by: Sudarshan Rajagopalan 
Suggested-by: David Hildenbrand 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mike Rapoport 
Cc: Mark Rutland 
Cc: Suren Baghdasaryan 
---
 arch/arm64/include/asm/sparsemem.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 1f43fcc79738..ff08ff6b677c 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -7,7 +7,13 @@
 
 #ifdef CONFIG_SPARSEMEM
 #define MAX_PHYSMEM_BITS   CONFIG_ARM64_PA_BITS
-#define SECTION_SIZE_BITS  30
-#endif
+
+#if defined(CONFIG_ARM64_4K_PAGES) || defined(CONFIG_ARM64_16K_PAGES)
+#define SECTION_SIZE_BITS 27
+#else
+#define SECTION_SIZE_BITS 29
+#endif /* CONFIG_ARM64_4K_PAGES || CONFIG_ARM64_16K_PAGES */
+
+#endif /* CONFIG_SPARSEMEM*/
 
 #endif
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 0/1] arm64: reduce section size for sparsemem

2021-01-08 Thread Sudarshan Rajagopalan
This patch is the follow-up from the discussions in the thread [1].
Reducing the section size has the merit of reducing wastage of reserved memory
for huge memory holes in the sparsemem model. Also, a smaller section size gives
more granularity and agility for memory hot(un)plugging.

This patch uses the suggestion from David Hildenbrand in thread [1]
to set the least possible SECTION_SIZE_BITS for 4K, 16K and 64K page granule.
That is 27 (128MB) for 4K/16K and 29 (512MB) for 64K page granule.

[1] 
https://lore.kernel.org/lkml/cover.1609895500.git.sudar...@codeaurora.org/T/#m8ee60ae69db5e9eb06ca7999c43828d49ccb9626

Sudarshan Rajagopalan (1):
  arm64: reduce section size for sparsemem

 arch/arm64/include/asm/sparsemem.h | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 1/1] arm64: make section size configurable for memory hotplug

2021-01-07 Thread Sudarshan Rajagopalan

On 2021-01-05 22:11, Anshuman Khandual wrote:

Hello Anshuman, thanks for your response.


(+ Will)


Hi Sudarshan,

This patch (and the cover letter) does not copy LAKML even though the
entire change here is arm64 specific. Please do copy all applicable
mailing lists for a given patch.


I used ./scripts/get_maintainer.pl patch.patch to get the maintainers 
list. It somehow didn't mention LAKML. I've added the mailing list to 
this thread.




On 1/6/21 6:58 AM, Sudarshan Rajagopalan wrote:

Currently on arm64, memory section size is hard-coded to 1GB.
Make this configurable if memory-hotplug is enabled, to support
finer granularity for hotplug-able memory.


Section size has always been decided by the platform. It cannot be a
configurable option because the user would not know the constraints
for memory representation on the platform and besides it also cannot
be trusted.



Signed-off-by: Sudarshan Rajagopalan 
---
 arch/arm64/Kconfig | 11 +++
 arch/arm64/include/asm/sparsemem.h |  4 
 2 files changed, 15 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d232837cbee..34124eee65da 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -294,6 +294,17 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
 config SMP
def_bool y

+config HOTPLUG_SIZE_BITS
+   int "Memory hotplug block size(29 => 512MB 30 => 1GB)"
+   depends on SPARSEMEM
+   depends on MEMORY_HOTPLUG
+   range 28 30


28 would not work for 64K pages.


+   default 30
+   help
+Selects granularity of hotplug memory. Block size for
+memory hotplug is represent as a power of 2.
+If unsure, stick with default value.
+
 config KERNEL_MODE_NEON
def_bool y

diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h

index 1f43fcc79738..3d5310f3aad5 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -7,7 +7,11 @@

 #ifdef CONFIG_SPARSEMEM
 #define MAX_PHYSMEM_BITS   CONFIG_ARM64_PA_BITS
+#ifndef CONFIG_MEMORY_HOTPLUG
 #define SECTION_SIZE_BITS  30
+#else
+#define SECTION_SIZE_BITS  CONFIG_HOTPLUG_SIZE_BITS
+#endif
 #endif

 #endif



There was an inconclusive discussion regarding this last month.

https://lore.kernel.org/linux-arm-kernel/20201204014443.43329-1-liwei...@huawei.com/


Thanks for pointing out this thread. Looking into all the comments, the
major concern with reducing the section size seems to be the risk of running
out of bits in the page flags. And while SECTION_SIZE must be greater than or
equal to the highest-order page in the buddy allocator, it must also satisfy
the 4K page size case where it doesn't break PMD mapping for vmemmap - and
hence a SECTION_SIZE_BITS of 27 could be set for 4K page size, which allows
2MB PMD mappings for each 128MB (2^27) block.
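
As a quick sanity check of that number (assuming the usual 64-byte struct
page):

  section size        = 1 << 27          = 128 MiB
  pages per section   = 128 MiB / 4 KiB  = 32768
  vmemmap per section = 32768 * 64 bytes = 2 MiB  (exactly one PMD with 4K pages)

so 27 is the smallest SECTION_SIZE_BITS that keeps a section's vmemmap
PMD-mappable.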


While this is the least value that can be set (27 for 4K_PAGE,
MAX_ZONEORDER - 1 + PAGE_SHIFT for 16K or 64K_PAGE), are there any
concerns with setting higher values (but <= 30 bits)? It seems like any
arbitrary number within this range could be applied without breaking
vmemmap. That's why we were thinking of letting the user configure it,
since this directly impacts the memory hotplug granularity, i.e. the least
size that can be hot(un)plugged. The current setting of 1GB for arm64 does
pose a lot of challenges in utilizing memory hotplug via a driver, esp. for
low-RAM targets. I agree it's sub-optimal in some sense but wanted to know
the maintainers' opinion on this.


Also, the patch introduced in that thread does seem to help reduce 
vmemmap memory if there are large holes. So there is some merit in 
reducing the section size along with memory hotplug leveraging it.




I have been wondering if this would solve the problem for 4K page size
config which requires PMD mapping for the vmemmap mapping while making
section size bits dependent on max order. But this has not been tested
properly.

diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 1f43fcc79738..fe4353cb1dce 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -7,7 +7,18 @@

 #ifdef CONFIG_SPARSEMEM
 #define MAX_PHYSMEM_BITS   CONFIG_ARM64_PA_BITS
-#define SECTION_SIZE_BITS  30
-#endif
+
+#ifdef CONFIG_ARM64_4K_PAGES
+#define SECTION_SIZE_BITS 27
+#else
+#ifdef CONFIG_FORCE_MAX_ZONEORDER
+#define SECTION_SIZE_BITS (CONFIG_FORCE_MAX_ZONEORDER - 1 + PAGE_SHIFT)

+#else
+#define SECTION_SIZE_BITS 30
+#endif /* CONFIG_FORCE_MAX_ZONEORDER */
+
+#endif /* CONFIG_ARM64_4K_PAGES */
+
+#endif /* CONFIG_SPARSEMEM*/

 #endif


A SECTION_SIZE_BITS of 27 for 4K_PAGES should be fine for us. Would you
know if there's a possibility of the patch above being applied upstream
anytime soon? This is in regard to the Generic Kernel Image (GKI) work
that we are doing with Google. If this patch would positively
end up in upstream, we could appl

[PATCH 1/1] arm64: make section size configurable for memory hotplug

2021-01-05 Thread Sudarshan Rajagopalan
Currently on arm64, memory section size is hard-coded to 1GB.
Make this configurable if memory-hotplug is enabled, to support
finer granularity for hotplug-able memory.

Signed-off-by: Sudarshan Rajagopalan 
---
 arch/arm64/Kconfig | 11 +++
 arch/arm64/include/asm/sparsemem.h |  4 
 2 files changed, 15 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6d232837cbee..34124eee65da 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -294,6 +294,17 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
 config SMP
def_bool y
 
+config HOTPLUG_SIZE_BITS
+   int "Memory hotplug block size(29 => 512MB 30 => 1GB)"
+   depends on SPARSEMEM
+   depends on MEMORY_HOTPLUG
+   range 28 30
+   default 30
+   help
+Selects granularity of hotplug memory. Block size for
+memory hotplug is represent as a power of 2.
+If unsure, stick with default value.
+
 config KERNEL_MODE_NEON
def_bool y
 
diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 1f43fcc79738..3d5310f3aad5 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -7,7 +7,11 @@
 
 #ifdef CONFIG_SPARSEMEM
 #define MAX_PHYSMEM_BITS   CONFIG_ARM64_PA_BITS
+#ifndef CONFIG_MEMORY_HOTPLUG
 #define SECTION_SIZE_BITS  30
+#else
+#define SECTION_SIZE_BITS  CONFIG_HOTPLUG_SIZE_BITS
+#endif
 #endif
 
 #endif
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 0/1] arm64: make section size configurable for memory hotplug

2021-01-05 Thread Sudarshan Rajagopalan


The section size defines the granularity of memory hotplug. This is currently
hard-coded to 1GB on arm64 Linux, which means that the least size of memblock
that can be hotplugged out is 1GB. Some DDR configurations (especially low-RAM
and dual-rank DDRs) may have section sizes that are less than 1GB (e.g. 512MB,
256MB, etc.).
Having an option to reduce the memblock size to the section size or lower gives
more granularity for memory hotplug. For example, on a system with a DDR section
size of 512MB and a kernel memblock size of 1GB, we would have to remove two
DDR sections in order to hotplug out at least one memblock from the kernel's
point of view.

Section sizes of DDRs vary based on specs (number of ranks, channels, regions,
etc.). Making this section size configurable helps users assign it based on the
DDR being used. The default is set to 1GB, which is the current memblock size.

Sudarshan Rajagopalan (1):
  arm64: Make section size configurable for memory hotplug

 arch/arm64/Kconfig | 11 +++
 arch/arm64/include/asm/sparsemem.h |  4 
 2 files changed, 15 insertions(+)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] mm: vmscan: support complete shrinker reclaim

2021-01-05 Thread Sudarshan Rajagopalan
Ensure that shrinkers are given the option to completely drop
their caches even when their caches are smaller than the batch size.
This change helps improve memory headroom by ensuring that under
significant memory pressure shrinkers can drop all of their caches.
This change only attempts to more aggressively call the shrinkers
during background memory reclaim, in order to avoid hurting the
performance of direct memory reclaim.

Signed-off-by: Sudarshan Rajagopalan 
Cc: Andrew Morton 
---
 mm/vmscan.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9727dd8e2581..35973665ae64 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -424,6 +424,10 @@ static unsigned long do_shrink_slab(struct shrink_control 
*shrinkctl,
long batch_size = shrinker->batch ? shrinker->batch
  : SHRINK_BATCH;
long scanned = 0, next_deferred;
+   long min_cache_size = batch_size;
+
+   if (current_is_kswapd())
+   min_cache_size = 0;
 
if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
nid = 0;
@@ -503,7 +507,7 @@ static unsigned long do_shrink_slab(struct shrink_control 
*shrinkctl,
 * scanning at high prio and therefore should try to reclaim as much as
 * possible.
 */
-   while (total_scan >= batch_size ||
+   while (total_scan > min_cache_size ||
   total_scan >= freeable) {
unsigned long ret;
unsigned long nr_to_scan = min(batch_size, total_scan);
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[RFC] depopulate_range_driver_managed() for removing page-table mappings for hot-added memory blocks

2020-11-13 Thread Sudarshan Rajagopalan



Hello,

When memory blocks are removed, along with removing the memmap entries,
memory resources and memory block devices, the arch-specific
arch_remove_memory() is called, which takes care of tearing down the
page tables.


Suppose there's a use case where the removed memory blocks will be added
back into the system at a later point. We can remove/offline the block in
a way that all entries such as memmaps, memory resources and block
devices are kept intact, so that they won't need to be created
again when the blocks are added back. Now, this can be done by doing offline
alone. But if there's a special use case where the page-table entries need
to be torn down when blocks are offlined, in order to avoid speculative
accesses on the offlined memory region, while also keeping the memmap
entries and block devices intact, I was thinking we could implement
something like {populate|depopulate}_range_driver_managed() that can be
called after online/offline and which can create/tear down the page table
mappings for that range. This would avoid the need to do remove_memory()
entirely just for the sake of page-table entries being removed. We could
then just offline the block and call depopulate_range_driver_managed().


This basically isolates arch_{add/remove}_memory outside of the
add/remove_memory routines, so that drivers can choose whether they need to
just offline and remove page-table mappings or hot-remove memory
entirely. This gives drivers the flexibility to retain memmap entries,
memory resources and block devices so that re-creating them can be
skipped when blocks are added back - this helps reduce the latencies
of removing and adding memory blocks.
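
A rough sketch of the interface being proposed (hypothetical prototypes only -
these functions do not exist in the kernel, and the final signatures would be
decided by the actual patch):

/* Tear down only the arch page-table mappings for an offlined,
 * driver-managed range; memmap, resource and memory block devices
 * are left intact. Hypothetical prototype per the RFC above. */
int depopulate_range_driver_managed(u64 start, u64 size, int nid);

/* Re-create the arch mappings before the range is onlined again. */
int populate_range_driver_managed(u64 start, u64 size, int nid);

A driver would then call depopulate_range_driver_managed() right after
offlining a block, and populate_range_driver_managed() just before onlining
it again.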


I'm still in the process of creating the patch that implements this,
which would give a clearer view of this RFC, but am just putting out the
thought here to see if it makes sense or not.



Sudarshan
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: [PATCH v4] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-11-04 Thread Sudarshan Rajagopalan

On 2020-10-16 11:56, Sudarshan Rajagopalan wrote:

Hello Will, Catalin,

Did you have a chance to review this patch? It has been reviewed by others
and I haven't seen any NAKs. This patch will be useful to have so that memory
hot-add doesn't fail when such PMD_SIZE pages aren't available, which
is usually the case on low-RAM devices.



When section mappings are enabled, we allocate vmemmap pages from
physically contiguous memory of size PMD_SIZE using
vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
pressure. But when the system is highly fragmented and memory blocks are
being hot-added at runtime, it's possible that such physically contiguous
memory allocations can fail. Rather than failing the memory hot-add
procedure, add a fallback option to allocate vmemmap pages from
discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Reviewed-by: Gavin Shan 
Reviewed-by: Anshuman Khandual 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..44486fd0e883 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1121,8 +1121,11 @@ int __meminit vmemmap_populate(unsigned long
start, unsigned long end, int node,
void *p = NULL;

p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+   if (vmemmap_populate_basepages(addr, next, node, altmap))
+   return -ENOMEM;
+   continue;
+   }

pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
} else


--
Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-11-03 Thread Sudarshan Rajagopalan

On 2020-10-30 01:38, Mike Rapoport wrote:

On Thu, Oct 29, 2020 at 02:29:27PM -0700, Sudarshan Rajagopalan wrote:

Hello all,

We have a usecase where a module driver adds certain memory blocks 
using

add_memory_driver_managed(), so that it can perform memory hotplug
operations on these blocks. In general, these memory blocks aren’t 
something
that gets physically added later, but is part of actual RAM that 
system
booted up with. Meaning – we set the ‘mem=’ cmdline parameter to limit 
the

memory and later add the remaining ones using add_memory*() variants.

The basic idea is to have driver have ownership and manage certain 
memory

blocks for hotplug operations.

For the driver be able to know how much memory was limited and how 
much
actually present, we take the delta of ‘bootmem physical end address’ 
and
‘memblock_end_of_DRAM’. The 'bootmem physical end address' is obtained 
by

scanning the reg values in ‘memory’ DT node and determining the max
{addr,size}. Since our driver is getting modularized, we won’t have 
access
to memblock_end_of_DRAM (i.e. end address of all memory blocks after 
‘mem=’

is applied).

So checking if memblock_{start/end}_of_DRAM() symbols can be exported? 
Also,
this information can be obtained by userspace by doing ‘cat 
/proc/iomem’ and
greping for ‘System RAM’. So wondering if userspace can have access to 
such

info, can we allow kernel module drivers have access by exporting
memblock_{start/end}_of_DRAM().


These functions cannot be exported not because we want to hide this
information from the modules but because it is unsafe to use them.
On most architectures these functions are __init so they are discarded
after boot anyway. Besides, the memory configuration known to memblock
might not be accurate in many cases, as David explained in his reply.



I don't see how information contained in memblock_{start/end}_of_DRAM() 
is considered hidden if the information can be obtained using 'cat 
/proc/iomem'. The memory resource manager adds these blocks either in 
"System RAM", "reserved", "Kernel data/code" etc. Inspecting this, one 
could determine what the start and end of the memblocks are.


I agree on the part that it's __init-annotated and could be removed after
boot. This is something that the driver can be wary of too.


Or are there any other ways where a module driver can get the end 
address of

system memory block?


What do you mean by "system memory block"? There could be a lot of
interpretations if you take into account memory hotplug, "mem=" option,
reserved and firmware memory.


I meant the physical end address of memblock. The equivalent of 
memblock_end_of_DRAM.




I'd suggest you to describe the entire use case in more detail. Having
the complete picture would help finding a proper solution.


The use case in general is to have a way to add/remove and online/offline
certain memory blocks which are part of boot memory. We do this by limiting
the memory using "mem=" and later adding the remaining blocks using
add_memory_driver_managed().





Sudarshan



--
Sincerely yours,
Mike.



Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: mm/memblock: export memblock_{start/end}_of_DRAM

2020-11-03 Thread Sudarshan Rajagopalan

On 2020-10-29 23:41, David Hildenbrand wrote:

On 29.10.20 22:29, Sudarshan Rajagopalan wrote:

Hello all,



Hi!



Hi David.. thanks for the response as always.

We have a usecase where a module driver adds certain memory blocks 
using

add_memory_driver_managed(), so that it can perform memory hotplug
operations on these blocks. In general, these memory blocks aren’t
something that gets physically added later, but is part of actual RAM
that system booted up with. Meaning – we set the ‘mem=’ cmdline
parameter to limit the memory and later add the remaining ones using
add_memory*() variants.

The basic idea is to have driver have ownership and manage certain
memory blocks for hotplug operations.


So, in summary, you're still abusing the memory hot(un)plug
infrastructure from your driver - just not in a severe way as before.
And I'll tell you why, so you might understand why exposing this API
is not really a good idea and why your driver wouldn't - for example -
be upstream material.

Don't get me wrong, what you are doing might be ok in your context,
but it's simply not universally applicable in our current model.

Ordinary system RAM works different than many other devices (like PCI
devices) whereby *something* senses the device and exposes it to the
system, and some available driver binds to it and owns the memory.

Memory is detected by a driver and added to the system via e.g.,
add_memory_driver_managed(). Memory devices are created and the memory
is directly handed off to the system, to be used as system RAM as soon
as memory devices are onlined. There is no driver that "binds" memory
like other devices - it's rather the core (buddy) that uses/owns that
memory immediately after device creation.



I see... and I agree that drivers are meant to *sense* that something
changed or was newly added, so that the driver can check if it's the one
responsible or compatible for handling this entity and bind to it. So I
guess what it boils down to is: a driver that uses memory hotplug
_cannot_ add/remove or have ownership of memblock boot memory, but only of
the newly added RAM blocks later on.


I was trying to mimic the detecting and adding of extra RAM by limiting
the System RAM with "mem=XGB", as though the system booted with XGB of boot
memory, and later adding the remaining blocks (forcing detection and adding)
using add_memory_driver_managed(). The remaining blocks are calculated as
'physical end addr of boot memory' - 'memblock_end_of_DRAM'. The
"physical end addr of boot memory", i.e. the actual RAM that the bootloader
informs the kernel about, can be obtained by scanning the 'memory' DT node.
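
A minimal sketch of that flow from a module (hypothetical example code, not an
existing driver; it assumes the physical end of DDR was already parsed from
the 'memory' DT node, and uses the add_memory_driver_managed() signature from
around v5.9 - later kernels add an mhp_t flags argument):

#include <linux/memory_hotplug.h>
#include <linux/module.h>
#include <linux/numa.h>

static phys_addr_t ddr_phys_end;	/* parsed from the 'memory' DT node */
static phys_addr_t ram_limit_end;	/* end of usable RAM after mem=XGB */

static int __init example_hotplug_init(void)
{
	u64 start = ram_limit_end;
	u64 size  = ddr_phys_end - ram_limit_end;

	/*
	 * Hand the stripped-off tail of DDR to the hotplug core as
	 * driver-managed memory, so it can be onlined/offlined later.
	 * NUMA_NO_NODE is used here only for illustration.
	 */
	return add_memory_driver_managed(NUMA_NO_NODE, start, size,
					 "System RAM (example)");
}
module_init(example_hotplug_init);
MODULE_LICENSE("GPL");

The open question in this thread is how such a module learns ram_limit_end
(i.e. the equivalent of memblock_end_of_DRAM()) without that symbol being
exported.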




For the driver be able to know how much memory was limited and how 
much

actually present, we take the delta of ‘bootmem physical end address’
and ‘memblock_end_of_DRAM’. The 'bootmem physical end address' is
obtained by scanning the reg values in ‘memory’ DT node and 
determining

the max {addr,size}. Since our driver is getting modularized, we won’t
have access to memblock_end_of_DRAM (i.e. end address of all memory
blocks after ‘mem=’ is applied).


What you do with "mem=" is force memory detection to ignore some of
it's detected memory.



So checking if memblock_{start/end}_of_DRAM() symbols can be exported?
Also, this information can be obtained by userspace by doing ‘cat
/proc/iomem’ and greping for ‘System RAM’. So wondering if userspace 
can


Not correct: with "mem=", cat /proc/iomem only shows *detected* +
added system RAM, not the unmodified detection.



That's correct - I meant 'memblock_end_of_DRAM' along with "mem=" can be 
calculated using 'cat /proc/iomem' which shows "detected plus added" 
System RAM, and not the remaining undetected one which got stripped off 
due to "mem=XGB". Basically, 'memblock_end_of_DRAM' address with 
'mem=XGB' is {end addr of boot RAM - XGB}.. which would be same as end 
address of "System RAM" showed in /proc/iomem.


The reasoning for this is - if userspace can have access to such info 
and calculate the memblock end address, why not let drivers have this 
info using memblock_end_of_DRAM()?


have access to such info, can we allow kernel module drivers have 
access

by exporting memblock_{start/end}_of_DRAM().

Or are there any other ways where a module driver can get the end
address of system memory block?


And here is our problem: You disabled *detection* of that memory by
the responsible driver (here: core). Now your driver wants to know
what would have been detected. Assume you have memory hole in that
region - it would not work by simply looking at start/end. You're
driver is not the one doing the detection.



Regarding the memory hole - the driver can inspect the 'memory' DT node
that the kernel gets from the ABL (from the RAM partition table) to see
whether any such holes exist or not. I agree that if such holes exist,
hot-adding will fail since a whole block size needs to be added.
The same issue will arise 

mm/memblock: export memblock_{start/end}_of_DRAM

2020-10-29 Thread Sudarshan Rajagopalan

Hello all,

We have a use case where a module driver adds certain memory blocks using
add_memory_driver_managed(), so that it can perform memory hotplug
operations on these blocks. In general, these memory blocks aren't
something that gets physically added later, but are part of the actual RAM
that the system booted up with. Meaning - we set the 'mem=' cmdline
parameter to limit the memory and later add the remaining ones using
add_memory*() variants.


The basic idea is to have driver have ownership and manage certain 
memory blocks for hotplug operations.


For the driver to be able to know how much memory was limited and how much
is actually present, we take the delta of 'bootmem physical end address'
and 'memblock_end_of_DRAM'. The 'bootmem physical end address' is
obtained by scanning the reg values in the 'memory' DT node and determining
the max {addr,size}. Since our driver is getting modularized, we won't
have access to memblock_end_of_DRAM() (i.e. the end address of all memory
blocks after 'mem=' is applied).


So, checking whether the memblock_{start/end}_of_DRAM() symbols can be
exported? Also, this information can be obtained by userspace by doing 'cat
/proc/iomem' and grepping for 'System RAM'. So, wondering: if userspace can
have access to such info, can we allow kernel module drivers to have access
by exporting memblock_{start/end}_of_DRAM()?


Or are there any other ways where a module driver can get the end 
address of system memory block?



Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: arm64: dropping prevent_bootmem_remove_notifier

2020-10-29 Thread Sudarshan Rajagopalan




Hi Anshuman, David,

Thanks for all the detailed explanations for the reasoning to have 
bootmem protected from being removed. Also, I do agree drivers being 
able to mark memory sections isn't the right thing to do.


We went ahead with the approach of using "mem=" as you suggested to 
limit the bootmem and add remaining blocks using 
add_memory_driver_managed() so that driver has ownership of these 
blocks.


We do have some follow-up questions regarding this - will initiate a 
discussion soon.



On 2020-10-18 22:37, Anshuman Khandual wrote:

Hello Sudarshan,

On 10/17/2020 04:41 AM, Sudarshan Rajagopalan wrote:


Hello Anshuman,

In the patch that enables memory hot-remove (commit bbd6ec605c0f 
("arm64/mm: Enable memory hot remove")) for arm64, there’s a notifier 
put in place that prevents boot memory from being offlined and 
removed. Also commit text mentions that boot memory on arm64 cannot be 
removed. We wanted to understand more about the reasoning for this. 
X86 and other archs doesn’t seem to do this prevention. There’s also 
comment in the code that this notifier could be dropped in future if 
and when boot memory can be removed.


Right, and till then the notifier cannot be dropped. There were a lot of
discussions around this topic during multiple iterations of the memory
hot-remove series. Hence, I would just request you to please go through them
first. This list here is from one such series
(https://lwn.net/Articles/809179/) but might not be exhaustive.


-
On arm64 platform, it is essential to ensure that the boot time 
discovered

memory couldn't be hot-removed so that,

1. FW data structures used across kexec are idempotent
   e.g. the EFI memory map.

2. linear map or vmemmap would not have to be dynamically split, and 
can

   map boot memory at a large granularity

3. Avoid penalizing paths that have to walk page tables, where we can 
be

   certain that the memory is not hot-removable
-

The primary reason being kexec which would need substantial rework 
otherwise.




The current logic is that only “new” memory blocks which are hot-added 
can later be offlined and removed. The memory that system booted up 
with cannot be offlined and removed. But there could be many usercases 
such as inter-VM memory sharing where a primary VM could offline and 
hot-remove a block/section of memory and lend it to secondary VM where 
it could hot-add it. And after usecase is done, the reverse happens 
where secondary VM hot-removes and gives it back to primary which can 
hot-add it back. In such cases, the present logic for arm64 doesn’t 
allow this hot-remove in primary to happen.


That is not true. Each VM could just boot with a minimum boot memory which
cannot be offlined or removed, but then a possibly larger portion of memory
can be hot-added during the boot process itself, making it available for any
future inter-VM sharing purpose. Hence this problem could easily be solved in
user space itself.



Also, on systems with movable zone that sort of guarantees pages to be 
migrated and isolated so that blocks can be offlined, this logic also 
defeats the purpose of having a movable zone which system can rely on 
memory hot-plugging, which say virt-io mem also relies on for fully 
plugged memory blocks.
ZONE_MOVABLE does not really guarantee migration, isolation and removal. There
are reasons an offline request might just fail. I agree that those reasons are
normally not platform related, but core memory gives the platform an
opportunity to decline an offlining request via a notifier. Hence a
ZONE_MOVABLE offline can be denied. Semantics-wise we are still okay.

This might look a bit inconsistent: with movablecore/kernelcore/movable_node,
and with firmware sending in 'hot pluggable' memory (IIRC arm64 does not
really support this yet), the system might end up with ZONE_MOVABLE-marked
boot memory which cannot be offlined or removed. But an offline notifier
action is orthogonal. Hence I did not block those kernel command line paths
that create ZONE_MOVABLE during boot, to preserve existing behavior.



I understand that some region of boot RAM shouldn’t be allowed to be 
removed, but such regions won’t be allowed to be offlined in first 
place since pages cannot be migrated and isolated, example reserved 
pages.


So we’re trying to understand the reasoning for such a prevention put 
in place for arm64 arch alone.


Primary reason being kexec. During kexec on arm64, the next kernel's memory
map is derived from firmware and not from the currently running kernel. So
the next kernel will crash if it accesses memory that might have been removed
in the running kernel. Until kexec on arm64 changes substantially and takes
into account the real available memory on the current kernel, boot memory
cannot be removed.




One possible way to solve this is by marking the required sections as 
“non-early” by removing the SECTION_IS_EARLY bit in its 
section_me

[PATCH 2/2] arm64: allow hotpluggable sections to be offlined

2020-10-16 Thread Sudarshan Rajagopalan
On receiving the MEM_GOING_OFFLINE notification, we disallow offlining of
any boot memory by checking whether the section is an early section or not.
With the introduction of SECTION_MARK_HOTPLUGGABLE, allow boot memory
sections that are marked as hotpluggable (with this bit set) to be offlined
and removed. This now allows certain boot memory sections to be offlined.

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Gavin Shan 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
Cc: Suren Baghdasaryan 
---
 arch/arm64/mm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..fb8878698672 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1487,7 +1487,7 @@ static int prevent_bootmem_remove_notifier(struct 
notifier_block *nb,
 
for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
ms = __pfn_to_section(pfn);
-   if (early_section(ms))
+   if (early_section(ms) && !removable_section(ms))
return NOTIFY_BAD;
}
return NOTIFY_OK;
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH 1/2] mm/memory_hotplug: allow marking of memory sections as hotpluggable

2020-10-16 Thread Sudarshan Rajagopalan
Certain architectures such as arm64 don't allow boot memory to be
offlined and removed. Distinguish certain memory sections as
"hotpluggable", which can be marked by module drivers stating to the memory
hotplug layer that these sections can be offlined and then removed.
This is done by using a separate section mem map bit and setting it,
rather than clearing the existing SECTION_IS_EARLY bit.
This patch introduces the SECTION_MARK_HOTPLUGGABLE bit into the section mem
map. Only sections which are in the movable zone and have no unmovable
pages are allowed to be set with this new bit.

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Mike Rapoport 
Cc: Anshuman Khandual 
Cc: David Hildenbrand 
Cc: Mark Rutland 
Cc: Steven Price 
Cc: Logan Gunthorpe 
Cc: Suren Baghdasaryan 
---
 include/linux/memory_hotplug.h |  1 +
 include/linux/mmzone.h |  9 -
 mm/memory_hotplug.c| 20 
 mm/sparse.c| 31 +++
 4 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 375515803cd8..81df45b582c8 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -319,6 +319,7 @@ extern int offline_pages(unsigned long start_pfn, unsigned 
long nr_pages);
 extern int remove_memory(int nid, u64 start, u64 size);
 extern void __remove_memory(int nid, u64 start, u64 size);
 extern int offline_and_remove_memory(int nid, u64 start, u64 size);
+extern int mark_memory_hotpluggable(unsigned long start, unsigned long end);
 
 #else
 static inline void try_offline_node(int nid) {}
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 8379432f4f2f..3df3a4975236 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1247,7 +1247,8 @@ extern size_t mem_section_usage_size(void);
 #define SECTION_HAS_MEM_MAP(1UL<<1)
 #define SECTION_IS_ONLINE  (1UL<<2)
 #define SECTION_IS_EARLY   (1UL<<3)
-#define SECTION_MAP_LAST_BIT   (1UL<<4)
+#define SECTION_MARK_HOTPLUGGABLE  (1UL<<4)
+#define SECTION_MAP_LAST_BIT   (1UL<<5)
 #define SECTION_MAP_MASK   (~(SECTION_MAP_LAST_BIT-1))
 #define SECTION_NID_SHIFT  3
 
@@ -1278,6 +1279,11 @@ static inline int early_section(struct mem_section 
*section)
return (section && (section->section_mem_map & SECTION_IS_EARLY));
 }
 
+static inline int removable_section(struct mem_section *section)
+{
+   return (section && (section->section_mem_map & SECTION_MARK_HOTPLUGGABLE));
+}
+
 static inline int valid_section_nr(unsigned long nr)
 {
return valid_section(__nr_to_section(nr));
@@ -1297,6 +1303,7 @@ static inline int online_section_nr(unsigned long nr)
 void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
 #ifdef CONFIG_MEMORY_HOTREMOVE
 void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn);
+int section_mark_hotpluggable(struct mem_section *ms);
 #endif
 #endif
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e9d5ab5d3ca0..503b0de489a0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1860,4 +1860,24 @@ int offline_and_remove_memory(int nid, u64 start, u64 
size)
return rc;
 }
 EXPORT_SYMBOL_GPL(offline_and_remove_memory);
+
+int mark_memory_hotpluggable(unsigned long start_pfn, unsigned long end_pfn)
+{
+   struct mem_section *ms;
+   unsigned long nr;
+   int rc = -EINVAL;
+
+   if (end_pfn < start_pfn)
+   return rc;
+
+   for (nr = start_pfn; nr <= end_pfn; nr++) {
+   ms = __pfn_to_section(nr);
+   rc = section_mark_hotpluggable(ms);
+   if (!rc)
+   break;
+   }
+
+   return rc;
+}
+EXPORT_SYMBOL_GPL(mark_memory_hotpluggable);
 #endif /* CONFIG_MEMORY_HOTREMOVE */
diff --git a/mm/sparse.c b/mm/sparse.c
index fcc3d176f1ea..cc21c23e2f1d 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 #include 
@@ -644,6 +645,36 @@ void offline_mem_sections(unsigned long start_pfn, 
unsigned long end_pfn)
ms->section_mem_map &= ~SECTION_IS_ONLINE;
}
 }
+
+int section_mark_hotpluggable(struct mem_section *ms)
+{
+   unsigned long section_nr, pfn;
+   bool unmovable;
+   struct page *page;
+
+   /* section needs to be both valid and present to be marked */
+   if (WARN_ON(!valid_section(ms)) || !present_section(ms))
+   return -EINVAL;
+
+   /*
+* now check if this section is removable. This can be done by checking
+* if section has unmovable pages or not.
+*/
+   section_nr = __section_nr(ms);
+   pfn = section_nr_to_pfn(section_nr);
+   page = pfn_to_page(pfn);
+   unmovable = has_unmovable_p

[PATCH 0/2] mm/memory_hotplug, arm64: allow certain bootmem sections to be offlinable

2020-10-16 Thread Sudarshan Rajagopalan
In the patch that enables memory hot-remove (commit bbd6ec605c0f ("arm64/mm: 
Enable memory hot remove")) for arm64, there’s a notifier put in place that 
prevents boot memory from being offlined and removed. The commit text mentions 
that boot memory on arm64 cannot be removed. But x86 and other archs don't
seem to do this prevention.

The current logic is that only "new" memory blocks which are hot-added can
later be offlined and removed. The memory that the system booted up with
cannot be offlined and removed. But there could be many use cases, such as
inter-VM memory sharing, where a primary VM could offline and hot-remove a
block/section of memory and lend it to a secondary VM, where it could be
hot-added. And after the use case is done, the reverse happens: the secondary
VM hot-removes it and gives it back to the primary, which can hot-add it
back. In such cases, the present logic for arm64 doesn't allow this
hot-remove in the primary to happen.

Also, on systems with a movable zone, which more or less guarantees that
pages can be migrated and isolated so that blocks can be offlined, this logic
defeats the purpose of having a movable zone that the system can rely on for
memory hot-plugging - which, say, virtio-mem also relies on for fully plugged
memory blocks.

This patch tries to solve this by introducing a new section mem map bit,
'SECTION_MARK_HOTPLUGGABLE', which allows the concerned module drivers
to mark required sections as "hotpluggable" by setting this bit. Also, this
marking is only allowed for sections which are in the movable zone and have
no unmovable pages. In the arm64 mmu code, on receiving the MEM_GOING_OFFLINE
notification, we disallow offlining of any boot memory by checking whether a
section is an early section or not. With the introduction of
SECTION_MARK_HOTPLUGGABLE, we allow boot memory sections that are marked as
hotpluggable (with this bit set) to be offlined and removed, thereby allowing
the required bootmem sections to be offlinable.

Sudarshan Rajagopalan (2):
  mm/memory_hotplug: allow marking of memory sections as hotpluggable
  arm64: allow hotpluggable sections to be offlined

 arch/arm64/mm/mmu.c|  2 +-
 include/linux/memory_hotplug.h |  1 +
 include/linux/mmzone.h |  9 -
 mm/memory_hotplug.c| 20 
 mm/sparse.c| 31 +++
 5 files changed, 61 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



arm64: dropping prevent_bootmem_remove_notifier

2020-10-16 Thread Sudarshan Rajagopalan



Hello Anshuman,

In the patch that enables memory hot-remove (commit bbd6ec605c0f 
("arm64/mm: Enable memory hot remove")) for arm64, there’s a notifier 
put in place that prevents boot memory from being offlined and removed. 
Also commit text mentions that boot memory on arm64 cannot be removed. 
We wanted to understand more about the reasoning for this. x86 and other
archs don't seem to do this prevention. There's also a comment in the
code that this notifier could be dropped in the future if and when boot
memory can be removed.


The current logic is that only "new" memory blocks which are hot-added
can later be offlined and removed. The memory that the system booted up with
cannot be offlined and removed. But there could be many use cases, such
as inter-VM memory sharing, where a primary VM could offline and
hot-remove a block/section of memory and lend it to a secondary VM, where
it could be hot-added. And after the use case is done, the reverse happens:
the secondary VM hot-removes it and gives it back to the primary, which can
hot-add it back. In such cases, the present logic for arm64 doesn't
allow this hot-remove in the primary to happen.


Also, on systems with a movable zone, which more or less guarantees that
pages can be migrated and isolated so that blocks can be offlined, this
logic defeats the purpose of having a movable zone that the system can rely
on for memory hot-plugging - which, say, virtio-mem also relies on for fully
plugged memory blocks.


I understand that some regions of boot RAM shouldn't be allowed to be
removed, but such regions won't be allowed to be offlined in the first place
since their pages cannot be migrated and isolated - for example, reserved
pages.


So we’re trying to understand the reasoning for such a prevention put in 
place for arm64 arch alone.


One possible way to solve this is by marking the required sections as
"non-early" by removing the SECTION_IS_EARLY bit in their section_mem_map.
This puts these sections in the context of "memory hotpluggable", which can
be offlined/removed and added/onlined while still being part of boot RAM
itself, without needing any extra blocks to be hot-added. This way of
marking certain sections as "non-early" could be exported so that module
drivers can set the required number of sections as "memory hotpluggable".
This could have certain checks put in place to decide which sections are
allowed - for example, only movable zone sections can be marked as
"non-early".


Your thoughts on this? We are also looking for different ways to solve
the problem without having to completely drop this notifier, but are
just putting out the concern here about the notifier logic that is
breaking our use case, which is a generic memory sharing use case using
the memory hotplug feature.



Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: [PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-16 Thread Sudarshan Rajagopalan

On 2020-10-15 01:36, Will Deacon wrote:

On Wed, Oct 14, 2020 at 05:51:23PM -0700, Sudarshan Rajagopalan wrote:

When section mappings are enabled, we allocate vmemmap pages from
physically contiguous memory of size PMD_SIZE using
vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
pressure. But when the system is highly fragmented and memory blocks are
being hot-added at runtime, it's possible that such physically contiguous
memory allocations can fail. Rather than failing the memory hot-add
procedure, add a fallback option to allocate vmemmap pages from
discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Reviewed-by: Gavin Shan 
Reviewed-by: Anshuman Khandual 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)


Please can you fix the subject? I have three copies of "PATCH v3" from
different days in my inbox. I know it sounds trivial, but getting these
little things right really helps with review, especially when it's
sitting amongst a sea of other patches.


Yes sure, sorry about that - will change it to "PATCH v4" to make it 
stand out from other patches.




Thanks,

Will



Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


Re: [PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-14 Thread Sudarshan Rajagopalan

On 2020-10-13 04:38, Anshuman Khandual wrote:

On 10/13/2020 04:35 AM, Sudarshan Rajagopalan wrote:
When section mappings are enabled, we allocate vmemmap pages from
physically contiguous memory of size PMD_SIZE using
vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
pressure. But when the system is highly fragmented and memory blocks are
being hot-added at runtime, it's possible that such physically contiguous
memory allocations can fail. Rather than failing the memory hot-add
procedure, add a fallback option to allocate vmemmap pages from
discontinuous pages using vmemmap_populate_basepages().


There is a checkpatch warning here, which could be fixed while merging?


WARNING: Possible unwrapped commit description (prefer a maximum 75
chars per line)
#7:
When section mappings are enabled, we allocate vmemmap pages from 
physically


total: 0 errors, 1 warnings, 13 lines checked



Thanks Anshuman for the review. I sent out an updated patch fixing the 
checkpatch warning.




Signed-off-by: Sudarshan Rajagopalan 
Reviewed-by: Gavin Shan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 


Nonetheless, this looks fine. Did not see any particular problem
while creating an experimental vmemmap with interleaving section
and base page mapping.

Reviewed-by: Anshuman Khandual 


---
 arch/arm64/mm/mmu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..44486fd0e883 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1121,8 +1121,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,

void *p = NULL;

p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+   if (vmemmap_populate_basepages(addr, next, node, altmap))
+   return -ENOMEM;
+   continue;
+   }

pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
} else




Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


[PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-14 Thread Sudarshan Rajagopalan
When section mappings are enabled, we allocate vmemmap pages from
physically contiguous memory of size PMD_SIZE using
vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
pressure. But when the system is highly fragmented and memory blocks are
being hot-added at runtime, it's possible that such physically contiguous
memory allocations can fail. Rather than failing the memory hot-add
procedure, add a fallback option to allocate vmemmap pages from
discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Reviewed-by: Gavin Shan 
Reviewed-by: Anshuman Khandual 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..44486fd0e883 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1121,8 +1121,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void *p = NULL;
 
p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+   if (vmemmap_populate_basepages(addr, next, node, altmap))
+   return -ENOMEM;
+   continue;
+   }
 
pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
} else
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-14 Thread Sudarshan Rajagopalan
V1: The initial patch aborted at the first instance of PMD_SIZE allocation
failure, unmapped all previously mapped sections using vmemmap_free and
mapped the entire request with vmemmap_populate_basepages to allocate
virtually contiguous memory.
https://lkml.org/lkml/2020/9/10/66

V2: Allocates virtually contiguous memory only for sections that failed
PMD_SIZE allocation, and continues to allocate physically contiguous
memory for other sections.
https://lkml.org/lkml/2020/9/30/1489

V3: Addressed trivial review comments. Pass in altmap to
vmemmap_populate_basepages.

Sudarshan Rajagopalan (1):
  arm64/mm: add fallback option to allocate virtually contiguous memory

 arch/arm64/mm/mmu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-12 Thread Sudarshan Rajagopalan
When section mappings are enabled, we allocate vmemmap pages from physically
contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
mappings are good to reduce TLB pressure. But when the system is highly
fragmented and memory blocks are being hot-added at runtime, it's possible
that such physically contiguous memory allocations can fail. Rather than
failing the memory hot-add procedure, add a fallback option to allocate
vmemmap pages from discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Reviewed-by: Gavin Shan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62fea1b6..44486fd0e883 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1121,8 +1121,11 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void *p = NULL;
 
p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+   if (vmemmap_populate_basepages(addr, next, node, altmap))
+   return -ENOMEM;
+   continue;
+   }
 
pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
} else
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-12 Thread Sudarshan Rajagopalan
V1: The initial patch aborted at the first instance of PMD_SIZE allocation
failure, unmapped all previously mapped sections using vmemmap_free and
mapped the entire request with vmemmap_populate_basepages to allocate
virtually contiguous memory.
https://lkml.org/lkml/2020/9/10/66

V2: Allocates virtually contiguous memory only for sections that failed
PMD_SIZE allocation, and continues to allocate physically contiguous
memory for other sections.
https://lkml.org/lkml/2020/9/30/1489

V3: Addressed trivial review comments. Pass in altmap to
vmemmap_populate_basepages.

Sudarshan Rajagopalan (1):
  arm64/mm: add fallback option to allocate virtually contiguous memory

 arch/arm64/mm/mmu.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-01 Thread Sudarshan Rajagopalan
When section mappings are enabled, we allocate vmemmap pages from physically
contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
mappings are good to reduce TLB pressure. But when the system is highly
fragmented and memory blocks are being hot-added at runtime, it's possible
that such physically contiguous memory allocations can fail. Rather than
failing the memory hot-add procedure, add a fallback option to allocate
vmemmap pages from discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62f..11f8639 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1121,8 +1121,15 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void *p = NULL;
 
p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+   /*
+* fallback allocating with virtually
+* contiguous memory for this section
+*/
+   if (vmemmap_populate_basepages(addr, next, node, NULL))
+   return -ENOMEM;
+   continue;
+   }
 
pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
} else
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v3] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-01 Thread Sudarshan Rajagopalan
V1: The initial patch aborted at the first instance of PMD_SIZE allocation
failure, unmapped all previously mapped sections using vmemmap_free and
mapped the entire request with vmemmap_populate_basepages to allocate
virtually contiguous memory.
https://lkml.org/lkml/2020/9/10/66

V2: Allocates virtually contiguous memory only for sections that failed
PMD_SIZE allocation, and continues to allocate physically contiguous
memory for other sections.
https://lkml.org/lkml/2020/9/30/1489

V3: Addresses Anshuman's comment to allow fallback to altmap base pages
as well if and when required.

Sudarshan Rajagopalan (1):
  arm64/mm: add fallback option to allocate virtually contiguous memory

 arch/arm64/mm/mmu.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH v2] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-10-01 Thread Sudarshan Rajagopalan

On 2020-09-30 17:30, Anshuman Khandual wrote:

On 10/01/2020 04:43 AM, Sudarshan Rajagopalan wrote:
When section mappings are enabled, we allocate vmemmap pages from
physically contiguous memory of size PMD_SIZE using
vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
pressure. But when the system is highly fragmented and memory blocks are
being hot-added at runtime, it's possible that such physically contiguous
memory allocations can fail. Rather than failing the memory hot-add
procedure, add a fallback option to allocate vmemmap pages from
discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62f..9edbbb8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1121,8 +1121,18 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,

void *p = NULL;

p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+   if (altmap)
+   return -ENOMEM; /* no fallback */


Why? If huge pages inside a vmemmap section might have been allocated
from altmap, the base pages could also fall back on altmap. If this patch
has just followed the existing x86 semantics, that was written [1] long
before vmemmap_populate_basepages() supported altmap allocation.
While adding that support [2] recently, it was deliberate not to change
the x86 semantics, as that was a platform decision. Nonetheless, it makes
sense to fall back on altmap base pages if and when required.

[1] 4b94ffdc4163 (x86, mm: introduce vmem_altmap to augment vmemmap_populate())

[2] 1d9cfee7535c (mm/sparsemem: enable vmem_altmap support in
vmemmap_populate_basepages())


Yes agreed. We can allow fallback on altmap as well. I did indeed follow 
x86 semantics. Will send the updated patch.
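
Concretely, the fallback call will just pass the caller's altmap through
instead of NULL, which is what the v3 patch later in this thread does:

   if (!p) {
   /* fall back to base pages, honouring altmap if one was given */
   if (vmemmap_populate_basepages(addr, next, node, altmap))
   return -ENOMEM;
   continue;
   }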


Sudarshan

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


[PATCH v2] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-09-30 Thread Sudarshan Rajagopalan
When section mappings are enabled, we allocate vmemmap pages from physically
contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
mappings are good to reduce TLB pressure. But when the system is highly
fragmented and memory blocks are being hot-added at runtime, it's possible
that such physically contiguous memory allocations can fail. Rather than
failing the memory hot-add procedure, add a fallback option to allocate
vmemmap pages from discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62f..9edbbb8 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1121,8 +1121,18 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void *p = NULL;
 
p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+   if (altmap)
+   return -ENOMEM; /* no fallback */
+
+   /*
+* fallback allocating with virtually
+* contiguous memory for this section
+*/
+   if (vmemmap_populate_basepages(addr, next, node, NULL))
+   return -ENOMEM;
+   continue;
+   }
 
pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
} else
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v2] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-09-30 Thread Sudarshan Rajagopalan
V1: The initial patch aborted at the first instance of PMD_SIZE allocation
failure, unmapped all previously mapped sections using vmemmap_free and
mapped the entire request with vmemmap_populate_basepages to allocate
virtually contiguous memory.
https://lkml.org/lkml/2020/9/10/66

V2: Allocates virtually contiguous memory only for sections that failed
PMD_SIZE allocation, and continues to allocate physically contiguous
memory for other sections.

Sudarshan Rajagopalan (1):
  arm64/mm: add fallback option to allocate virtually contiguous memory

 arch/arm64/mm/mmu.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] arm64/mm: add fallback option to allocate virtually contiguous memory

2020-09-10 Thread Sudarshan Rajagopalan
When section mappings are enabled, we allocate vmemmap pages from physically
contiguous memory of size PMD_SIZE using vmemmap_alloc_block_buf(). Section
mappings are good to reduce TLB pressure. But when the system is highly
fragmented and memory blocks are being hot-added at runtime, it's possible
that such physically contiguous memory allocations can fail. Rather than
failing the memory hot-add procedure, add a fallback option to allocate
vmemmap pages from discontinuous pages using vmemmap_populate_basepages().

Signed-off-by: Sudarshan Rajagopalan 
Cc: Catalin Marinas 
Cc: Will Deacon 
Cc: Anshuman Khandual 
Cc: Mark Rutland 
Cc: Logan Gunthorpe 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Steven Price 
---
 arch/arm64/mm/mmu.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 75df62f..a46c7d4 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
p4d_t *p4dp;
pud_t *pudp;
pmd_t *pmdp;
+   int ret = 0;
 
do {
next = pmd_addr_end(addr, end);
@@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
void *p = NULL;
 
p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
-   if (!p)
-   return -ENOMEM;
+   if (!p) {
+#ifdef CONFIG_MEMORY_HOTPLUG
+   vmemmap_free(start, end, altmap);
+#endif
+   ret = -ENOMEM;
+   break;
+   }
 
pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
} else
vmemmap_verify((pte_t *)pmdp, node, addr, next);
} while (addr = next, addr != end);
 
-   return 0;
+   if (ret)
+   return vmemmap_populate_basepages(start, end, node, altmap);
+   else
+   return ret;
 }
 #endif /* !ARM64_SWAPPER_USES_SECTION_MAPS */
 void vmemmap_free(unsigned long start, unsigned long end,
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project