David Hildenbrand <da...@redhat.com> writes:

> On 14.05.24 16:04, Björn Töpel wrote:
>> From: Björn Töpel <bj...@rivosinc.com>
>> 
>> For an architecture to support memory hotplugging, a couple of
>> callbacks need to be implemented:
>> 
>>   arch_add_memory()
>>    This callback is responsible for adding the physical memory into the
>>    direct map, and for calling into the generic memory hotplugging code
>>    via __add_pages(), which adds the corresponding struct page entries
>>    and updates the vmemmap mapping.
>> 
>>   arch_remove_memory()
>>    This is the inverse of the callback above.
>> 
>>   vmemmap_free()
>>    This function tears down the vmemmap mappings (if
>>    CONFIG_SPARSEMEM_VMEMMAP is enabled), and also deallocates the
>>    backing vmemmap pages. Note that for persistent memory, an
>>    alternative allocator for the backing pages can be used: the
>>    vmem_altmap. This means that when the backing pages are cleared,
>>    extra care is needed so that the correct deallocation method is
>>    used.
>> 
>>   arch_get_mappable_range()
>>    This function returns the PA range that the direct map can map.
>>    Used by the MHP internals for sanity checks.
>> 
>> The page table unmap/teardown functions are heavily based on code from
>> the x86 tree. The same remove_pgd_mapping() function is used in both
>> vmemmap_free() and arch_remove_memory(), but in the latter function
>> the backing pages are not removed.
>> 
>> Signed-off-by: Björn Töpel <bj...@rivosinc.com>
>> ---
>>   arch/riscv/mm/init.c | 242 +++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 242 insertions(+)
>> 
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 6f72b0b2b854..7f0b921a3d3a 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -1493,3 +1493,245 @@ void __init pgtable_cache_init(void)
>>      }
>>   }
>>   #endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTPLUG
>> +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
>> +{
>> +    pte_t *pte;
>> +    int i;
>> +
>> +    for (i = 0; i < PTRS_PER_PTE; i++) {
>> +            pte = pte_start + i;
>> +            if (!pte_none(*pte))
>> +                    return;
>> +    }
>> +
>> +    free_pages((unsigned long)page_address(pmd_page(*pmd)), 0);
>> +    pmd_clear(pmd);
>> +}
>> +
>> +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
>> +{
>> +    pmd_t *pmd;
>> +    int i;
>> +
>> +    for (i = 0; i < PTRS_PER_PMD; i++) {
>> +            pmd = pmd_start + i;
>> +            if (!pmd_none(*pmd))
>> +                    return;
>> +    }
>> +
>> +    free_pages((unsigned long)page_address(pud_page(*pud)), 0);
>> +    pud_clear(pud);
>> +}
>> +
>> +static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
>> +{
>> +    pud_t *pud;
>> +    int i;
>> +
>> +    for (i = 0; i < PTRS_PER_PUD; i++) {
>> +            pud = pud_start + i;
>> +            if (!pud_none(*pud))
>> +                    return;
>> +    }
>> +
>> +    free_pages((unsigned long)page_address(p4d_page(*p4d)), 0);
>> +    p4d_clear(p4d);
>> +}
>> +
>> +static void __meminit free_vmemmap_storage(struct page *page, size_t size,
>> +                                       struct vmem_altmap *altmap)
>> +{
>> +    if (altmap)
>> +            vmem_altmap_free(altmap, size >> PAGE_SHIFT);
>> +    else
>> +            free_pages((unsigned long)page_address(page), get_order(size));
>
> If you unplug a DIMM that was added during boot (can happen on x86-64, 
> can it happen on riscv?), free_pages() would not be sufficient. You'd be 
> freeing a PG_reserved page that has to be freed differently.

I'd say if it can happen on x86-64, it probably can on RISC-V. I'll look
into this for the next spin!
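
To make the concern concrete, here is a rough user-space sketch of the
three-way decision free_vmemmap_storage() would probably need. It is only
a model, not the actual fix: the types and the pick_free_method() helper
are hypothetical stand-ins, and the mapping to kernel calls in the
comments (vmem_altmap_free(), a reserved-page path such as
free_reserved_page(), free_pages()) is my assumption about what the next
spin might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical user-space model. None of these types are the kernel's;
 * they only mirror the decision discussed in the review.
 */
enum free_method {
	FREE_ALTMAP,	/* backing pages came from a vmem_altmap */
	FREE_RESERVED,	/* boot-time page, PG_reserved is set */
	FREE_BUDDY,	/* ordinary page from the page allocator */
};

/* Stand-in for struct page; 'reserved' models the PG_reserved flag. */
struct model_page {
	bool reserved;
};

/* Stand-in for struct vmem_altmap; only its presence matters here. */
struct model_altmap {
	unsigned long nr_pages;
};

/*
 * Pick the deallocation path for a vmemmap backing page.
 * Assumed kernel counterparts: vmem_altmap_free() for FREE_ALTMAP,
 * a reserved-page path (e.g. free_reserved_page()) for FREE_RESERVED,
 * and free_pages() for FREE_BUDDY.
 */
static enum free_method pick_free_method(const struct model_page *page,
					 const struct model_altmap *altmap)
{
	assert(page != NULL);

	if (altmap)
		return FREE_ALTMAP;
	if (page->reserved)
		return FREE_RESERVED;
	return FREE_BUDDY;
}
```

The altmap check stays first, as in the current patch; the only new
branch is the one for memory that was already present at boot.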

Thanks for spending time on the series!


Cheers,
Björn
