On 09/10/2020 01:57 PM, sudar...@codeaurora.org wrote:
> Hello Anshuman,
>
>> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from
>>> physically contiguous memory of size PMD_SIZE using
>>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
>>> pressure. But when the system is highly fragmented and memory blocks
>>> are being hot-added at runtime, it's possible that such physically
>>> contiguous memory allocations can fail. Rather than failing the
>>
>> Did you really see this happen on a system ?
>
> Thanks for the response.
There seems to be a text alignment problem in your response on this
thread; please have a look.
>
> Yes, this happened on a system with very low RAM (size ~120MB) where no free
> order-9 pages were present. Pasting a few kernel logs below. On systems with
> low RAM, there is a high probability that memory is fragmented and no
> higher-order pages are free. In such scenarios, vmemmap alloc would fail for
> PMD_SIZE of contiguous memory.
>
> We have a usecase for memory sharing between VMs where one of the VM uses
> add_memory() to add the memory that was donated by the other VM. This uses
> something similar to VirtIO-Mem. And this requires memory to be _guaranteed_
> to be added in the VM so that the usecase can run without any failure.
>
> vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL),
> nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 1 PID: 294 Comm: -------- Tainted: G S 5.4.50 #1
> Call trace:
> dump_stack+0xa4/0xdc
> warn_alloc+0x104/0x160
> vmemmap_alloc_block+0xe4/0xf4
> vmemmap_alloc_block_buf+0x34/0x38
> vmemmap_populate+0xc8/0x224
> __populate_section_memmap+0x34/0x54
> sparse_add_section+0x16c/0x254
> __add_pages+0xd0/0x138
> arch_add_memory+0x114/0x1a8
>
> DMA32: 2627*4kB (UMC) 23*8kB (UME) 6*16kB (UM) 8*32kB (UME) 2*64kB (ME)
> 2*128kB (UE) 1*256kB (M) 2*512kB (ME) 1*1024kB (M) 0*2048kB 0*4096kB = 13732kB
> 30455 pages RAM
>
> But keeping this usecase aside, won't this be problematic on any system with
> low RAM where an order-9 alloc would fail due to fragmentation, and so any
> memory hot-add would fail? The same applies to other users of VirtIO-Mem
> which use arch_add_memory.
>
>>
>>> memory hot-add procedure, add a fallback option to allocate vmemmap
>>> pages from discontinuous pages using vmemmap_populate_basepages().
>>
>> Which could lead to a mixed page size mapping in the VMEMMAP area.
>
> Would this be problematic? We would only lose one section mapping per failure
> and incur slightly higher TLB pressure. Also, we would anyway do
> discontiguous page allocation for systems having non-4K pages
> (ARM64_SWAPPER_USES_SECTION_MAPS will be 0). I only see a small performance
> cost due to the slight TLB pressure.
>
>> Allocation failure in vmemmap_populate() should just cleanly fail the memory
>> hot add operation, which can then be retried. Why the retry has to be
>> offloaded to kernel ?
>
> While a retry can be attempted, it won't help in cases where there are no
> order-9 pages available, and any retry would just not succeed again until an
> order-9 page gets freed. Here we are just falling back to discontiguous page
> allocation to help memory hot-add succeed as best as possible.
Understood, it seems like there are enough potential use cases and scenarios
right now to consider this fallback mechanism and a possible mixed page
size vmemmap. But I would let others weigh in on the performance impact.
>
> Thanks and Regards,
> Sudarshan
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project
>
> -----Original Message-----
> From: Anshuman Khandual <anshuman.khand...@arm.com>
> Sent: Wednesday, September 9, 2020 11:45 PM
> To: Sudarshan Rajagopalan <sudar...@codeaurora.org>;
> linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org
> Cc: Catalin Marinas <catalin.mari...@arm.com>; Will Deacon <w...@kernel.org>;
> Mark Rutland <mark.rutl...@arm.com>; Logan Gunthorpe <log...@deltatee.com>;
> David Hildenbrand <da...@redhat.com>; Andrew Morton
> <a...@linux-foundation.org>; Steven Price <steven.pr...@arm.com>
> Subject: Re: [PATCH] arm64/mm: add fallback option to allocate virtually
> contiguous memory
>
> Hello Sudarshan,
>
> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from
>> physically contiguous memory of size PMD_SIZE using
>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
>> pressure. But when the system is highly fragmented and memory blocks
>> are being hot-added at runtime, it's possible that such physically
>> contiguous memory allocations can fail. Rather than failing the
>
> Did you really see this happen on a system ?
>
>> memory hot-add procedure, add a fallback option to allocate vmemmap
>> pages from discontinuous pages using vmemmap_populate_basepages().
>
> Which could lead to a mixed page size mapping in the VMEMMAP area.
> Allocation failure in vmemmap_populate() should just cleanly fail the memory
> hot add operation, which can then be retried. Why the retry has to be
> offloaded to kernel ?
>
>>
>> Signed-off-by: Sudarshan Rajagopalan <sudar...@codeaurora.org>
>> Cc: Catalin Marinas <catalin.mari...@arm.com>
>> Cc: Will Deacon <w...@kernel.org>
>> Cc: Anshuman Khandual <anshuman.khand...@arm.com>
>> Cc: Mark Rutland <mark.rutl...@arm.com>
>> Cc: Logan Gunthorpe <log...@deltatee.com>
>> Cc: David Hildenbrand <da...@redhat.com>
>> Cc: Andrew Morton <a...@linux-foundation.org>
>> Cc: Steven Price <steven.pr...@arm.com>
>> ---
>> arch/arm64/mm/mmu.c | 15 ++++++++++++---
>> 1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 75df62f..a46c7d4 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>> p4d_t *p4dp;
>> pud_t *pudp;
>> pmd_t *pmdp;
>> + int ret = 0;
>>
>> do {
>> next = pmd_addr_end(addr, end);
>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>> void *p = NULL;
>>
>> p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>> - if (!p)
>> - return -ENOMEM;
>> + if (!p) {
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> + vmemmap_free(start, end, altmap);
>> +#endif
>
> The mapping was never created in the first place, as the allocation failed.
> vmemmap_free() here will free an unmapped area !
>
>> + ret = -ENOMEM;
>> + break;
>> + }
>>
>> pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>> } else
>> vmemmap_verify((pte_t *)pmdp, node, addr, next);
>> } while (addr = next, addr != end);
>>
>> - return 0;
>> + if (ret)
>> + return vmemmap_populate_basepages(start, end, node, altmap);
>> + else
>> + return ret;
>> }
>> #endif /* !ARM64_SWAPPER_USES_SECTION_MAPS */
>> void vmemmap_free(unsigned long start, unsigned long end,
>>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>