On 8/1/2023 6:46 AM, Daniel Henrique Barboza wrote: > > > On 7/30/23 22:53, Fei Wu wrote: >> riscv virt platform's memory started at 0x80000000 and >> straddled the 4GiB boundary. Curiously enough, this choice >> of a memory layout will prevent from launching a VM with >> a bit more than 2000MiB and PCIe pass-thru on an x86 host, due >> to identity mapping requirements for the MSI doorbell on x86, >> and these (APIC/IOAPIC) live right below 4GiB. >> >> So just split the RAM range into two portions: >> - 1 GiB range from 0x80000000 to 0xc0000000. >> - The remainder at 0x100000000 >> >> ...leaving a hole between the ranges. > > I am afraid this breaks some existing distro setups, like Ubuntu. After > this patch > this emulation stopped working: > > ~/work/qemu/build/qemu-system-riscv64 \ > -machine virt -nographic -m 8G -smp 8 \ > -kernel ./uboot-ubuntu/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf \ > -drive file=snapshot.img,format=qcow2,if=virtio \ > -netdev bridge,id=bridge1,br=virbr0 -device > virtio-net-pci,netdev=bridge1 > > > This is basically a guest created via the official Canonical tutorial: > > https://wiki.ubuntu.com/RISC-V/QEMU > > The error being thrown: > > ================= > > Boot HART ID : 4 > Boot HART Domain : root > Boot HART Priv Version : v1.12 > Boot HART Base ISA : rv64imafdch > Boot HART ISA Extensions : time,sstc > Boot HART PMP Count : 16 > Boot HART PMP Granularity : 4 > Boot HART PMP Address Bits: 54 > Boot HART MHPM Count : 16 > Boot HART MIDELEG : 0x0000000000001666 > Boot HART MEDELEG : 0x0000000000f0b509 > > > U-Boot 2022.07+dfsg-1ubuntu4.2 (Nov 24 2022 - 18:47:41 +0000) > > CPU: > rv64imafdch_zicbom_zicboz_zicsr_zifencei_zihintpause_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu > Model: riscv-virtio,qemu > DRAM: Unhandled exception: Store/AMO access fault > EPC: 00000000802018b8 RA: 00000000802126a0 TVAL: 00000000ff733f90 > > Code: b823 06b2 bc23 06b2 b023 08b2 b423 08b2 (b823 08b2) > > > resetting ... > System reset not supported on this platform > ### ERROR ### Please RESET the board ### > QEMU 8.0.90 monitor - type 'help' for more infor > ================= > > > Based on the change made I can make an educated guess on what is going > wrong. > We have another board with a similar memory topology you're making here, > the > Microchip Polarfire (microchip_pfsoc.c). We were having some problems > with this > board while trying to consolidate the boot process between all boards in > hw/riscv/boot.c because of its non-continuous RAM bank. The full story > can be > read in the commit message of 4b402886ac89 ("hw/riscv: change > riscv_compute_fdt_addr() > semantics") but the short version can be seen in riscv_compute_fdt_addr() > from boot.c: > > - if ram_start is less than 3072MiB, the FDT will be put at the lowest > value > between 3072 MiB and the end of that RAM bank; > > - if ram_start is higher than 3072 MiB the FDT will be put at the end of > the > RAM bank. > > So, after this patch, since riscv_compute_fdt_addr() is being used with > the now > lower RAM bank, the fdt is being put in LOW_MEM - fdt_size for any setup > that has > more than 1Gb RAM, and this breaks assumptions made by uboot and Ubuntu > and possibly > others that are trying to retrieve the FDT from the gap that you created > between > low and hi mem in this patch. > > In fact, this same Ubuntu guest I mentioned above will boot if I put > only 1 Gb of RAM > (-m 1Gb). If I try with -m 1.1Gb I reproduce this error. This can be a > validation of > the guess I'm making here: Ubuntu is trying to fetch stuff (probably the > fdt) from > the gap between the memory areas. > > This change on top of this patch doesn't work either: > > $ git diff > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c > index 8fbdc7220c..dfff48d849 100644 > --- a/hw/riscv/virt.c > +++ b/hw/riscv/virt.c > @@ -1335,9 +1335,16 @@ static void virt_machine_done(Notifier *notifier, > void *data) > kernel_start_addr, true, NULL); > } > > - fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base, > + if (machine->ram_size < memmap[VIRT_DRAM].size) { > + fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base, > memmap[VIRT_DRAM].size, > machine); > + } else { > + fdt_load_addr = > riscv_compute_fdt_addr(memmap[VIRT_DRAM_HIGH].base, > + memmap[VIRT_DRAM_HIGH].size, > + machine); > + } > + > > This would put the fdt at the end of the HI RAM for guests with more > than 1Gb of RAM. > This change in fact makes the situation even worse, breaking setups that > were working > before with this patch. > > There's a chance that reducing the gap between the RAM banks can make > Ubuntu work > reliably again but now I'm a little cold feet with this change. > > > I think we'll need some kernel/Opensbi folks to weight in here to see if > there's a > failsafe memory setup that won't break distros out there and allow your > passthrough > to work. > Hi Daniel,
Do you know who we should talk to? I think it's not uncommon to have seperated multi-range memory, the stack should support this configuration. Thanks, Fei. > > > Thanks, > > Daniel > >> >> Signed-off-by: Andrei Warkentin <andrei.warken...@intel.com> >> Signed-off-by: Fei Wu <fei2...@intel.com> >> --- >> hw/riscv/virt.c | 74 ++++++++++++++++++++++++++++++++++++----- >> include/hw/riscv/virt.h | 1 + >> 2 files changed, 66 insertions(+), 9 deletions(-) >> >> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c >> index d90286dc46..8fbdc7220c 100644 >> --- a/hw/riscv/virt.c >> +++ b/hw/riscv/virt.c >> @@ -75,7 +75,9 @@ >> #error "Can't accomodate all IMSIC groups in address space" >> #endif >> -static const MemMapEntry virt_memmap[] = { >> +#define LOW_MEM (1 * GiB) >> + >> +static MemMapEntry virt_memmap[] = { >> [VIRT_DEBUG] = { 0x0, 0x100 }, >> [VIRT_MROM] = { 0x1000, 0xf000 }, >> [VIRT_TEST] = { 0x100000, 0x1000 }, >> @@ -96,6 +98,7 @@ static const MemMapEntry virt_memmap[] = { >> [VIRT_PCIE_ECAM] = { 0x30000000, 0x10000000 }, >> [VIRT_PCIE_MMIO] = { 0x40000000, 0x40000000 }, >> [VIRT_DRAM] = { 0x80000000, 0x0 }, >> + [VIRT_DRAM_HIGH] = { 0x100000000, 0x0 }, >> }; >> /* PCIe high mmio is fixed for RV32 */ >> @@ -295,15 +298,12 @@ static void >> create_fdt_socket_cpus(RISCVVirtState *s, int socket, >> } >> } >> -static void create_fdt_socket_memory(RISCVVirtState *s, >> - const MemMapEntry *memmap, int >> socket) >> +static void create_fdt_socket_mem_range(RISCVVirtState *s, uint64_t >> addr, >> + uint64_t size, int socket) >> { >> char *mem_name; >> - uint64_t addr, size; >> MachineState *ms = MACHINE(s); >> - addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(ms, >> socket); >> - size = riscv_socket_mem_size(ms, socket); >> mem_name = g_strdup_printf("/memory@%lx", (long)addr); >> qemu_fdt_add_subnode(ms->fdt, mem_name); >> qemu_fdt_setprop_cells(ms->fdt, mem_name, "reg", >> @@ -313,6 +313,34 @@ static void >> create_fdt_socket_memory(RISCVVirtState *s, >> g_free(mem_name); >> } >> +static void create_fdt_socket_memory(RISCVVirtState *s, >> + const MemMapEntry *memmap, int >> socket) >> +{ >> + uint64_t addr, size; >> + MachineState *mc = MACHINE(s); >> + uint64_t sock_offset = riscv_socket_mem_offset(mc, socket); >> + uint64_t sock_size = riscv_socket_mem_size(mc, socket); >> + >> + if (sock_offset < memmap[VIRT_DRAM].size) { >> + uint64_t low_mem_end = memmap[VIRT_DRAM].base + >> memmap[VIRT_DRAM].size; >> + >> + addr = memmap[VIRT_DRAM].base + sock_offset; >> + size = MIN(low_mem_end - addr, sock_size); >> + create_fdt_socket_mem_range(s, addr, size, socket); >> + >> + size = sock_size - size; >> + if (size > 0) { >> + create_fdt_socket_mem_range(s, memmap[VIRT_DRAM_HIGH].base, >> + size, socket); >> + } >> + } else { >> + addr = memmap[VIRT_DRAM_HIGH].base + >> + sock_offset - memmap[VIRT_DRAM].size; >> + >> + create_fdt_socket_mem_range(s, addr, sock_size, socket); >> + } >> +} >> + >> static void create_fdt_socket_clint(RISCVVirtState *s, >> const MemMapEntry *memmap, int >> socket, >> uint32_t *intc_phandles) >> @@ -1334,10 +1362,12 @@ static void virt_machine_done(Notifier >> *notifier, void *data) >> static void virt_machine_init(MachineState *machine) >> { >> - const MemMapEntry *memmap = virt_memmap; >> + MemMapEntry *memmap = virt_memmap; >> RISCVVirtState *s = RISCV_VIRT_MACHINE(machine); >> MemoryRegion *system_memory = get_system_memory(); >> MemoryRegion *mask_rom = g_new(MemoryRegion, 1); >> + MemoryRegion *ram_below_4g, *ram_above_4g; >> + uint64_t ram_size_low, ram_size_high; >> char *soc_name; >> DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip; >> int i, base_hartid, hart_count; >> @@ -1448,6 +1478,17 @@ static void virt_machine_init(MachineState >> *machine) >> } >> } >> + if (machine->ram_size > LOW_MEM) { >> + ram_size_high = machine->ram_size - LOW_MEM; >> + ram_size_low = LOW_MEM; >> + } else { >> + ram_size_high = 0; >> + ram_size_low = machine->ram_size; >> + } >> + >> + memmap[VIRT_DRAM].size = ram_size_low; >> + memmap[VIRT_DRAM_HIGH].size = ram_size_high; >> + >> if (riscv_is_32bit(&s->soc[0])) { >> #if HOST_LONG_BITS == 64 >> /* limit RAM size in a 32-bit system */ >> @@ -1460,7 +1501,8 @@ static void virt_machine_init(MachineState >> *machine) >> virt_high_pcie_memmap.size = VIRT32_HIGH_PCIE_MMIO_SIZE; >> } else { >> virt_high_pcie_memmap.size = VIRT64_HIGH_PCIE_MMIO_SIZE; >> - virt_high_pcie_memmap.base = memmap[VIRT_DRAM].base + >> machine->ram_size; >> + virt_high_pcie_memmap.base = memmap[VIRT_DRAM_HIGH].base + >> + memmap[VIRT_DRAM_HIGH].size; >> virt_high_pcie_memmap.base = >> ROUND_UP(virt_high_pcie_memmap.base, >> virt_high_pcie_memmap.size); >> } >> @@ -1468,8 +1510,22 @@ static void virt_machine_init(MachineState >> *machine) >> s->memmap = virt_memmap; >> /* register system main memory (actual RAM) */ >> + ram_below_4g = g_malloc(sizeof(*ram_below_4g)); >> + memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", >> machine->ram, >> + 0, memmap[VIRT_DRAM].size); >> memory_region_add_subregion(system_memory, memmap[VIRT_DRAM].base, >> - machine->ram); >> + ram_below_4g); >> + >> + if (memmap[VIRT_DRAM_HIGH].size) { >> + ram_above_4g = g_malloc(sizeof(*ram_above_4g)); >> + memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", >> + machine->ram, >> + memmap[VIRT_DRAM].size, >> + memmap[VIRT_DRAM_HIGH].size); >> + memory_region_add_subregion(system_memory, >> + memmap[VIRT_DRAM_HIGH].base, >> + ram_above_4g); >> + } >> /* boot rom */ >> memory_region_init_rom(mask_rom, NULL, "riscv_virt_board.mrom", >> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h >> index e5c474b26e..36004eb6ef 100644 >> --- a/include/hw/riscv/virt.h >> +++ b/include/hw/riscv/virt.h >> @@ -79,6 +79,7 @@ enum { >> VIRT_IMSIC_S, >> VIRT_FLASH, >> VIRT_DRAM, >> + VIRT_DRAM_HIGH, >> VIRT_PCIE_MMIO, >> VIRT_PCIE_PIO, >> VIRT_PLATFORM_BUS,