On 09/10/2020 01:57 PM, sudar...@codeaurora.org wrote:
> Hello Anshuman,
> 
>> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>>> When section mappings are enabled, we allocate vmemmap pages from
>>> physically contiguous memory of size PMD_SIZE using
>>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
>>> pressure. But when the system is highly fragmented and memory blocks are
>>> being hot-added at runtime, it's possible that such physically
>>> contiguous memory allocations can fail. Rather than failing the
>>
>> Did you really see this happen on a system ?
> 
> Thanks for the response.

There seems to be some text alignment problem in your response on this
thread. Please have a look.

> 
> Yes, this happened on a system with very low RAM (~120MB) where no free
> order-9 pages were present; a few kernel logs are pasted below. On systems
> with low RAM, there is a high probability that memory is fragmented and no
> higher-order pages are free. In such scenarios, the vmemmap allocation would
> fail for PMD_SIZE of contiguous memory.
> 
> We have a usecase for memory sharing between VMs where one VM uses
> add_memory() to add memory donated by the other VM, using something similar
> to VirtIO-Mem. This requires the memory to be _guaranteed_ to be added in
> the VM so that the usecase can run without any failure.
> 
> vmemmap alloc failure: order:9, mode:0x4cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL), 
> nodemask=(null),cpuset=/,mems_allowed=0
> CPU: 1 PID: 294 Comm: -------- Tainted: G S                5.4.50 #1
> Call trace:
>  dump_stack+0xa4/0xdc
>  warn_alloc+0x104/0x160
>  vmemmap_alloc_block+0xe4/0xf4
>  vmemmap_alloc_block_buf+0x34/0x38
>  vmemmap_populate+0xc8/0x224
>  __populate_section_memmap+0x34/0x54
>  sparse_add_section+0x16c/0x254
>  __add_pages+0xd0/0x138
>  arch_add_memory+0x114/0x1a8
> 
> DMA32: 2627*4kB (UMC) 23*8kB (UME) 6*16kB (UM) 8*32kB (UME) 2*64kB (ME) 
> 2*128kB (UE) 1*256kB (M) 2*512kB (ME) 1*1024kB (M) 0*2048kB 0*4096kB = 13732kB
> 30455 pages RAM
> 
> But keeping this usecase aside, won't this be problematic on any low-RAM
> system where an order-9 allocation fails due to fragmentation, causing any
> memory hot-add to fail? The same applies to other users like VirtIO-Mem,
> which also go through arch_add_memory().
> 
>>
>>> memory hot-add procedure, add a fallback option to allocate vmemmap 
>>> pages from discontinuous pages using vmemmap_populate_basepages().
>>
>> Which could lead to a mixed page size mapping in the VMEMMAP area.
> 
> Would this be problematic? We would only lose one section mapping per
> failure, which slightly increases TLB pressure. Also, we already fall back
> to discontiguous base-page allocation on configurations with non-4K pages
> (where ARM64_SWAPPER_USES_SECTION_MAPS is 0). I only see a small
> performance cost due to the slight TLB pressure.
> 
>> Allocation failure in vmemmap_populate() should just cleanly fail the memory
>> hot add operation, which can then be retried. Why does the retry have to be
>> offloaded to the kernel?
> 
> A retry can be attempted, but it won't help when no order-9 pages are
> available; any retry would keep failing until an order-9 page gets freed.
> Here we are just falling back to discontiguous page allocation to let
> memory hot-add succeed as best as possible.

Understood, it seems there are enough potential use cases and scenarios
right now to consider this fallback mechanism and a possible mixed page
size vmemmap. But I would let others weigh in on the performance impact.

> 
> Thanks and Regards,
> Sudarshan
> 
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
> Foundation Collaborative Project
> 
> -----Original Message-----
> From: Anshuman Khandual <anshuman.khand...@arm.com> 
> Sent: Wednesday, September 9, 2020 11:45 PM
> To: Sudarshan Rajagopalan <sudar...@codeaurora.org>; 
> linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org
> Cc: Catalin Marinas <catalin.mari...@arm.com>; Will Deacon <w...@kernel.org>; 
> Mark Rutland <mark.rutl...@arm.com>; Logan Gunthorpe <log...@deltatee.com>; 
> David Hildenbrand <da...@redhat.com>; Andrew Morton 
> <a...@linux-foundation.org>; Steven Price <steven.pr...@arm.com>
> Subject: Re: [PATCH] arm64/mm: add fallback option to allocate virtually 
> contiguous memory
> 
> Hello Sudarshan,
> 
> On 09/10/2020 11:35 AM, Sudarshan Rajagopalan wrote:
>> When section mappings are enabled, we allocate vmemmap pages from
>> physically contiguous memory of size PMD_SIZE using
>> vmemmap_alloc_block_buf(). Section mappings are good to reduce TLB
>> pressure. But when the system is highly fragmented and memory blocks are
>> being hot-added at runtime, it's possible that such physically
>> contiguous memory allocations can fail. Rather than failing the
> 
> Did you really see this happen on a system ?
> 
>> memory hot-add procedure, add a fallback option to allocate vmemmap 
>> pages from discontinuous pages using vmemmap_populate_basepages().
> 
> Which could lead to a mixed page size mapping in the VMEMMAP area.
> Allocation failure in vmemmap_populate() should just cleanly fail the memory
> hot add operation, which can then be retried. Why does the retry have to be
> offloaded to the kernel?
> 
>>
>> Signed-off-by: Sudarshan Rajagopalan <sudar...@codeaurora.org>
>> Cc: Catalin Marinas <catalin.mari...@arm.com>
>> Cc: Will Deacon <w...@kernel.org>
>> Cc: Anshuman Khandual <anshuman.khand...@arm.com>
>> Cc: Mark Rutland <mark.rutl...@arm.com>
>> Cc: Logan Gunthorpe <log...@deltatee.com>
>> Cc: David Hildenbrand <da...@redhat.com>
>> Cc: Andrew Morton <a...@linux-foundation.org>
>> Cc: Steven Price <steven.pr...@arm.com>
>> ---
>>  arch/arm64/mm/mmu.c | 15 ++++++++++++---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 75df62f..a46c7d4 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1100,6 +1100,7 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>      p4d_t *p4dp;
>>      pud_t *pudp;
>>      pmd_t *pmdp;
>> +    int ret = 0;
>>  
>>      do {
>>              next = pmd_addr_end(addr, end);
>> @@ -1121,15 +1122,23 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>>                      void *p = NULL;
>>  
>>                      p = vmemmap_alloc_block_buf(PMD_SIZE, node, altmap);
>> -                    if (!p)
>> -                            return -ENOMEM;
>> +                    if (!p) {
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +                            vmemmap_free(start, end, altmap);
>> +#endif
> 
> The mapping was never created in the first place, as the allocation failed. 
> vmemmap_free() here will free an unmapped area !
> 
>> +                            ret = -ENOMEM;
>> +                            break;
>> +                    }
>>  
>>                      pmd_set_huge(pmdp, __pa(p), __pgprot(PROT_SECT_NORMAL));
>>              } else
>>                      vmemmap_verify((pte_t *)pmdp, node, addr, next);
>>      } while (addr = next, addr != end);
>>  
>> -    return 0;
>> +    if (ret)
>> +            return vmemmap_populate_basepages(start, end, node, altmap);
>> +    else
>> +            return ret;
>>  }
>>  #endif      /* !ARM64_SWAPPER_USES_SECTION_MAPS */
>>  void vmemmap_free(unsigned long start, unsigned long end,
>>
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 
