David Hildenbrand <da...@redhat.com> writes:

> On 24.05.24 15:14, Daniel Henrique Barboza wrote:
>>
>>
>> On 5/21/24 07:56, Björn Töpel wrote:
>>> From: Björn Töpel <bj...@rivosinc.com>
>>>
>>> Virtio-based memory devices (virtio-mem/virtio-pmem) allow for
>>> dynamic resizing of virtual machine memory and require proper
>>> hotplugging (add/remove) support to work.
>>>
>>> Add device memory support for the RISC-V "virt" machine, and enable
>>> virtio-md-pci with the corresponding missing hotplugging callbacks.
>>>
>>> Signed-off-by: Björn Töpel <bj...@rivosinc.com>
>>> ---
>>>  hw/riscv/Kconfig       |  2 +
>>>  hw/riscv/virt.c        | 83 +++++++++++++++++++++++++++++++++++++++++-
>>>  hw/virtio/virtio-mem.c |  5 ++-
>>>  3 files changed, 87 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
>>> index a2030e3a6ff0..08f82dbb681a 100644
>>> --- a/hw/riscv/Kconfig
>>> +++ b/hw/riscv/Kconfig
>>> @@ -56,6 +56,8 @@ config RISCV_VIRT
>>>      select PLATFORM_BUS
>>>      select ACPI
>>>      select ACPI_PCI
>>> +    select VIRTIO_MEM_SUPPORTED
>>> +    select VIRTIO_PMEM_SUPPORTED
>>>
>>>  config SHAKTI_C
>>>      bool
>>> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
>>> index 4fdb66052587..443902f919d2 100644
>>> --- a/hw/riscv/virt.c
>>> +++ b/hw/riscv/virt.c
>>> @@ -53,6 +53,8 @@
>>>  #include "hw/pci-host/gpex.h"
>>>  #include "hw/display/ramfb.h"
>>>  #include "hw/acpi/aml-build.h"
>>> +#include "hw/mem/memory-device.h"
>>> +#include "hw/virtio/virtio-mem-pci.h"
>>>  #include "qapi/qapi-visit-common.h"
>>>  #include "hw/virtio/virtio-iommu.h"
>>>
>>> @@ -1407,6 +1409,7 @@ static void virt_machine_init(MachineState *machine)
>>>      DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
>>>      int i, base_hartid, hart_count;
>>>      int socket_count = riscv_socket_count(machine);
>>> +    hwaddr device_memory_base, device_memory_size;
>>>
>>>      /* Check socket count limit */
>>>      if (VIRT_SOCKETS_MAX < socket_count) {
>>> @@ -1420,6 +1423,12 @@ static void virt_machine_init(MachineState
*machine)
>>>              exit(1);
>>>      }
>>>
>>> +    if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
>>> +        error_report("unsupported amount of memory slots: %"PRIu64,
>>> +                     machine->ram_slots);
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>>      /* Initialize sockets */
>>>      mmio_irqchip = virtio_irqchip = pcie_irqchip = NULL;
>>>      for (i = 0; i < socket_count; i++) {
>>> @@ -1553,6 +1562,37 @@ static void virt_machine_init(MachineState *machine)
>>>      memory_region_add_subregion(system_memory, memmap[VIRT_MROM].base,
>>>                                  mask_rom);
>>>
>>> +    /* device memory */
>>> +    device_memory_base = ROUND_UP(s->memmap[VIRT_DRAM].base + machine->ram_size,
>>> +                                  GiB);
>>> +    device_memory_size = machine->maxram_size - machine->ram_size;
>>> +    if (device_memory_size > 0) {
>>> +        /*
>>> +         * Each DIMM is aligned based on the backend's alignment value.
>>> +         * Assume max 1G hugepage alignment per slot.
>>> +         */
>>> +        device_memory_size += machine->ram_slots * GiB;
>>
>> We don't need to align to 1 GiB. This calculation can use 2 MiB instead
>> (or 4 MiB if we're running 32-bit).
>>
>>> +
>>> +        if (riscv_is_32bit(&s->soc[0])) {
>>> +            hwaddr memtop = device_memory_base + ROUND_UP(device_memory_size,
>>> +                                                          GiB);
>>
>> Same here - the alignment is 2/4 MiB.
>>
>>> +
>>> +            if (memtop > UINT32_MAX) {
>>> +                error_report("memory exceeds 32-bit limit by %lu bytes",
>>> +                             memtop - UINT32_MAX);
>>> +                exit(EXIT_FAILURE);
>>> +            }
>>> +        }
>>> +
>>> +        if (device_memory_base + device_memory_size < device_memory_size) {
>>> +            error_report("unsupported amount of device memory");
>>> +            exit(EXIT_FAILURE);
>>> +        }
>>
>> Took another look and found this a bit strange. These are all unsigned
>> vars, so if (unsigned a + unsigned b < unsigned b) will always be
>> 'false'. The compiler is probably cropping this out.
>
> No. Unsigned integer overflow is defined behavior, and this is a common
> check to detect such overflow. It's consistent with what we do for other
> architectures.
>
>>
>> The calculation we need to do is to ensure that the extra
>> ram_slots * alignment will fit into the VIRT_DRAM block, i.e.
>> maxram_size + (ram_slots * alignment) < memmap[VIRT_DRAM].size.
>>
>> TBH I'm starting to have second thoughts about letting users hotplug
>> whatever they want. It seems cleaner to just force the 2/4 MiB
>> alignment in pre_plug() and be done with it; no need to allocate
>> ram_slots * alignment and do all these extra checks.
>
> It's worth noting that if user space decides to specify addresses
> manually, it can mess up everything already. There are other events that
> can result in fragmentation of the memory device area (repeated
> hot(un)plug of differing DIMMs).
>
> Assume you have a 1 GiB range and hotplug a 512 MiB DIMM at offset
> 256 MiB. You won't be able to hotplug another 512 MiB DIMM even though
> we reserved space.
>
> My take so far is: if the user wants to do such stuff, they should size
> the area (maxmem) much larger or deal with the consequences (not being
> able to hotplug memory).
>
> It usually does not happen in practice ...
Daniel/David, again thanks for spending time on the patches.

The reason I picked 1 GiB per slot was that the alignment is also
dependent on the backend, AFAIU. Rationale in commit 085f8e88ba73 ("pc:
count in 1Gb hugepage alignment when sizing hotplug-memory container").

What I'm reading from you guys is: just depend on whatever maxmem says
(modulo 2/4 MiB alignment), and leave it at that. I agree that that's
much easier to reason about. Correct?

>> As I sent in an earlier email, users must already comply with the
>> alignment of the host memory when plugging pc-dimms, so I'm not sure
>> the value proposition of all this extra code is worth it - the
>> alignment will most likely be forced by the host memory backend, so we
>> might as well force it ourselves in pre_plug().
>
> At least on x86-64, the 2 MiB alignment requirement is handled
> automatically. QEMU_VMALLOC_ALIGN implicitly enforces that.
>
> Maybe RISCV also wants to set QEMU_VMALLOC_ALIGN?

I'll look into this for the next rev!


Björn