Re: [PATCH v2 0/9] ppc: nested KVM HV for spapr virtual hypervisor
On 2/16/22 11:25, Nicholas Piggin wrote: This should account for AFAIKS all comments, except maybe some about naming. Changes since v1: - Per-CPU spapr nested state moved to SpaprCpuState from PowerPCCPU. - address_space_map ops are used, small rearrangement to make any given access region store-only or load-only. - Some style, naming, etc cleanups and fixes. Hopefully I didn't miss anything. Thanks, Nick We can address migration aspects with followups. Applied for ppc-7.0 Thanks, C.
Re: [PATCH v7 0/3] spapr: nvdimm: Introduce spapr-nvdimm device
On 2/4/22 09:15, Shivaprasad G Bhat wrote:

If the device backend for the nvdimm is not persistent memory, explicit IO flushes are needed to ensure persistence. On SPAPR, the issue is addressed by adding a new hcall that lets the guest request an explicit flush when the backend is not pmem. So, the approach here is to convey in a device tree property when the hcall flush is required. Once the guest knows the device needs explicit flushes, it makes the hcall as and when required.

It was suggested to create a new device type to address the explicit flush for such backends on PPC instead of extending the generic nvdimm device with a new property. So, the patch introduces the spapr-nvdimm device. The new device inherits from the nvdimm device, with the new behaviour that if the backend has pmem=no, the device tree property is set by default.

The demonstration below shows the map_sync behavior for non-pmem backends. (https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c)

pmem0 is from a spapr-nvdimm with backend pmem=on, and pmem1 is from a spapr-nvdimm with pmem=off, mounted as

/dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
/dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)

[root@atest-guest ~]# ./mapsync /mnt1/newfile > When pmem=on
[root@atest-guest ~]# ./mapsync /mnt2/newfile > when pmem=off
Failed to mmap with Operation not supported

The first patch adds the realize/unrealize callbacks to the generic device for the new device's vmstate registration. The second patch implements the hcall and adds the necessary vmstate properties to the spapr machine structure for carrying the hcall status during save-restore. Because the hcall is asynchronous, the patch uses aio utilities to offload the flush. The third patch introduces the spapr-nvdimm device and adds the device tree property for the guest when spapr-nvdimm is used with pmem=no on the backend. It also adds a new property, pmem-override (?, suggest if you have a better name), to the spapr-nvdimm, which hints at forcing the hcall based flushes even on pmem backed devices.

The kernel changes to exploit this hcall are at https://github.com/linuxppc/linux/commit/75b7c05ebf9026.patch

Applied for ppc-7.0

Thanks,

C.
Re: [PATCH v2 00/27] target/ppc: SPR registration cleanups
On 2/16/22 17:23, Fabiano Rosas wrote: The goal of this series is to do some untangling of SPR registration code in cpu_init.c and prepare for moving the CPU initialization into separate files for each CPU family. After this series we'll have only cpu-specific SPR code in cpu_init.c, i.e. code that can be split and moved as a unit into other files. Common/generic SPR code will be in helper_regs.c, exposed via spr_common.h. Changes from v1: - Some commit message improvements suggested by David; - Removed the soft_tlb rename patch. Kept the old name; - Left the specific check_pow functions behind, they can be dealt with in the next series; - Added a new patch to rename spr_tcg to spr_common. Patches 23 and 26 still need review. This series is based on legoater/ppc7.0. v1: https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00313.html Fabiano Rosas (27): target/ppc: cpu_init: Remove not implemented comments target/ppc: cpu_init: Remove G2LE init code target/ppc: cpu_init: Group registration of generic SPRs target/ppc: cpu_init: Move Timebase registration into the common function target/ppc: cpu_init: Avoid nested SPR register functions target/ppc: cpu_init: Move 405 SPRs into register_405_sprs target/ppc: cpu_init: Move G2 SPRs into register_G2_sprs target/ppc: cpu_init: Decouple G2 SPR registration from 755 target/ppc: cpu_init: Decouple 74xx SPR registration from 7xx target/ppc: cpu_init: Deduplicate 440 SPR registration target/ppc: cpu_init: Deduplicate 603 SPR registration target/ppc: cpu_init: Deduplicate 604 SPR registration target/ppc: cpu_init: Deduplicate 745/755 SPR registration target/ppc: cpu_init: Deduplicate 7xx SPR registration target/ppc: cpu_init: Move 755 L2 cache SPRs into a function target/ppc: cpu_init: Move e300 SPR registration into a function target/ppc: cpu_init: Move 604e SPR registration into a function target/ppc: cpu_init: Reuse init_proc_603 for the e300 target/ppc: cpu_init: Reuse init_proc_604 for the 604e target/ppc: cpu_init: 
Reuse init_proc_745 for the 755 target/ppc: cpu_init: Rename register_ne_601_sprs target/ppc: cpu_init: Remove register_usprg3_sprs target/ppc: Rename spr_tcg.h to spr_common.h target/ppc: cpu_init: Expose some SPR registration helpers target/ppc: cpu_init: Move SPR registration macros to a header target/ppc: cpu_init: Move check_pow and QOM macros to a header target/ppc: Move common SPR functions out of cpu_init target/ppc/cpu.h | 39 + target/ppc/cpu_init.c | 1879 target/ppc/helper_regs.c | 402 + target/ppc/{spr_tcg.h => spr_common.h} | 69 +- target/ppc/translate.c |2 +- 5 files changed, 1098 insertions(+), 1293 deletions(-) rename target/ppc/{spr_tcg.h => spr_common.h} (72%) Applied for ppc-7.0 Thanks, C.
Re: [PATCH v5 05/15] hw/nvme: Add support for SR-IOV
On Feb 17 18:44, Lukasz Maniak wrote:
> This patch implements initial support for Single Root I/O Virtualization
> on an NVMe device.
>
> Essentially, it allows to define the maximum number of virtual functions
> supported by the NVMe controller via sriov_max_vfs parameter.
>
> Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
> capability by a physical controller and ARI capability by both the
> physical and virtual function devices.
>
> NVMe controllers created via virtual functions mirror functionally
> the physical controller, which may not entirely be the case, thus
> consideration would be needed on the way to limit the capabilities of
> the VF.
>
> NVMe subsystem is required for the use of SR-IOV.
>
> Signed-off-by: Lukasz Maniak
> ---
> hw/nvme/ctrl.c | 85 ++--
> hw/nvme/nvme.h | 3 +-
> include/hw/pci/pci_ids.h | 1 +
> 3 files changed, 85 insertions(+), 4 deletions(-)
>

LGTM.

Reviewed-by: Klaus Jensen
Re: [PATCH v5 15/15] hw/nvme: Update the initialization place for the AER queue
On Feb 17 18:45, Lukasz Maniak wrote:
> From: Łukasz Gieryk
>
> This patch updates the initialization place for the AER queue, so it’s
> initialized once, at controller initialization, and not every time
> controller is enabled.
>
> While the original version works for a non-SR-IOV device, as it’s hard
> to interact with the controller if it’s not enabled, the multiple
> reinitialization is not necessarily correct.
>
> With the SR/IOV feature enabled a segfault can happen: a VF can have its
> controller disabled, while a namespace can still be attached to the
> controller through the parent PF. An event generated in such case ends
> up on an uninitialized queue.
>
> While it’s an interesting question whether a VF should support AER in
> the first place, I don’t think it must be answered today.
>
> Signed-off-by: Łukasz Gieryk

Looks good.

Reviewed-by: Klaus Jensen
[PATCH v3] migration/rdma: set the REUSEADDR option for destination
We hit the following error while testing the RDMA transport: in case of a
migration error, the mgmt daemon picks one migration port,

incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr

It then tries another, -incoming rdma:[::]:8103. Sometimes that works,
sometimes it needs another try with other port numbers.

Set the REUSEADDR option for the destination. This allows the address to be
reused, so that rdma_bind_addr does not error out.

Signed-off-by: Jack Wang
Reviewed-by: Pankaj Gupta
---
v3: add reviewed-by tags from David and Pankaj.
v2: extend commit message as discussed with Pankaj and David
---
 migration/rdma.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index c7c7a384875b..663e1fbb096d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
     char ip[40] = "unknown";
     struct rdma_addrinfo *res, *e;
     char port_str[16];
+    int reuse = 1;
 
     for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
         rdma->wr_data[idx].control_len = 0;
@@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
         goto err_dest_init_bind_addr;
     }
 
+    ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
+                          &reuse, sizeof reuse);
+    if (ret) {
+        ERROR(errp, "Error: could not set REUSEADDR option");
+        goto err_dest_init_bind_addr;
+    }
     for (e = res; e != NULL; e = e->ai_next) {
         inet_ntop(e->ai_family,
             &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
-- 
2.25.1
[PATCH v2] hw: riscv: opentitan: fixup SPI addresses
From: Wilfred Mallawa This patch updates the SPI_DEVICE, SPI_HOST0, SPI_HOST1 base addresses. Also adds these as unimplemented devices. The address references can be found [1]. [1] https://github.com/lowRISC/opentitan/blob/6c317992fbd646818b34f2a2dbf44bc850e461e4/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h#L107 Signed-off-by: Wilfred Mallawa Reviewed-by: Alistair Francis --- v2: arranged base addrs in sorted order hw/riscv/opentitan.c | 12 +--- include/hw/riscv/opentitan.h | 4 +++- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c index aec7cfa33f..833624d66c 100644 --- a/hw/riscv/opentitan.c +++ b/hw/riscv/opentitan.c @@ -34,13 +34,15 @@ static const MemMapEntry ibex_memmap[] = { [IBEX_DEV_FLASH] = { 0x2000, 0x8 }, [IBEX_DEV_UART] = { 0x4000, 0x1000 }, [IBEX_DEV_GPIO] = { 0x4004, 0x1000 }, -[IBEX_DEV_SPI] ={ 0x4005, 0x1000 }, +[IBEX_DEV_SPI_DEVICE] = { 0x4005, 0x1000 }, [IBEX_DEV_I2C] ={ 0x4008, 0x1000 }, [IBEX_DEV_PATTGEN] ={ 0x400e, 0x1000 }, [IBEX_DEV_TIMER] = { 0x4010, 0x1000 }, [IBEX_DEV_SENSOR_CTRL] ={ 0x4011, 0x1000 }, [IBEX_DEV_OTP_CTRL] = { 0x4013, 0x4000 }, [IBEX_DEV_USBDEV] = { 0x4015, 0x1000 }, +[IBEX_DEV_SPI_HOST0] = { 0x4030, 0x1000 }, +[IBEX_DEV_SPI_HOST1] = { 0x4031, 0x1000 }, [IBEX_DEV_PWRMGR] = { 0x4040, 0x1000 }, [IBEX_DEV_RSTMGR] = { 0x4041, 0x1000 }, [IBEX_DEV_CLKMGR] = { 0x4042, 0x1000 }, @@ -209,8 +211,12 @@ static void lowrisc_ibex_soc_realize(DeviceState *dev_soc, Error **errp) create_unimplemented_device("riscv.lowrisc.ibex.gpio", memmap[IBEX_DEV_GPIO].base, memmap[IBEX_DEV_GPIO].size); -create_unimplemented_device("riscv.lowrisc.ibex.spi", -memmap[IBEX_DEV_SPI].base, memmap[IBEX_DEV_SPI].size); +create_unimplemented_device("riscv.lowrisc.ibex.spi_device", +memmap[IBEX_DEV_SPI_DEVICE].base, memmap[IBEX_DEV_SPI_DEVICE].size); +create_unimplemented_device("riscv.lowrisc.ibex.spi_host0", +memmap[IBEX_DEV_SPI_HOST0].base, memmap[IBEX_DEV_SPI_HOST0].size); 
+create_unimplemented_device("riscv.lowrisc.ibex.spi_host1", +memmap[IBEX_DEV_SPI_HOST1].base, memmap[IBEX_DEV_SPI_HOST1].size); create_unimplemented_device("riscv.lowrisc.ibex.i2c", memmap[IBEX_DEV_I2C].base, memmap[IBEX_DEV_I2C].size); create_unimplemented_device("riscv.lowrisc.ibex.pattgen", diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h index eac35ef590..00da9ded43 100644 --- a/include/hw/riscv/opentitan.h +++ b/include/hw/riscv/opentitan.h @@ -57,8 +57,10 @@ enum { IBEX_DEV_FLASH, IBEX_DEV_FLASH_VIRTUAL, IBEX_DEV_UART, +IBEX_DEV_SPI_DEVICE, +IBEX_DEV_SPI_HOST0, +IBEX_DEV_SPI_HOST1, IBEX_DEV_GPIO, -IBEX_DEV_SPI, IBEX_DEV_I2C, IBEX_DEV_PATTGEN, IBEX_DEV_TIMER, -- 2.35.1
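Editorial note: the patch can list the memory map in address order while adding enum values in a different position because the table uses C designated initializers. A minimal sketch of that pattern follows. The DEMO_ names and the addresses are placeholders invented for illustration; they are not the OpenTitan values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device indices; the enum order is deliberately
 * different from the address order used in the table below. */
enum {
    DEMO_DEV_UART,
    DEMO_DEV_SPI_DEVICE,
    DEMO_DEV_SPI_HOST0,
    DEMO_DEV_GPIO,
    DEMO_DEV_COUNT
};

typedef struct DemoMemMapEntry {
    uint64_t base;
    uint64_t size;
} DemoMemMapEntry;

/* Designated initializers bind each entry to its enum index, so the
 * rows can be sorted by base address. Addresses are placeholders. */
static const DemoMemMapEntry demo_memmap[DEMO_DEV_COUNT] = {
    [DEMO_DEV_UART]       = { 0x1000, 0x100 },
    [DEMO_DEV_GPIO]       = { 0x2000, 0x100 },
    [DEMO_DEV_SPI_DEVICE] = { 0x3000, 0x100 },
    [DEMO_DEV_SPI_HOST0]  = { 0x4000, 0x100 },
};

/* Returns 1 if no two regions in the map overlap, 0 otherwise. */
int demo_memmap_is_disjoint(const DemoMemMapEntry *map, size_t n)
{
    assert(map != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            uint64_t i_end = map[i].base + map[i].size;
            uint64_t j_end = map[j].base + map[j].size;
            if (map[i].base < j_end && map[j].base < i_end) {
                return 0;
            }
        }
    }
    return 1;
}
```

A check like demo_memmap_is_disjoint() is one cheap way to catch a mistyped base address when regions are added by hand.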
Re: [PATCH v2 2/3] hw/smbios: fix table memory corruption with large memory vms
On Mon, Feb 14, 2022 at 6:21 PM Igor Mammedov wrote:
>
> On Mon, 7 Feb 2022 17:01:28 +0530
> Ani Sinha wrote:
>
> > With the current smbios table assignment code, we can have only 512 DIMM
> > slots
> it's a bit confusing, since it's not DIMM slots in QEMU sense (we do not
> expose
> DIMM devices via SMBIOS/E820). So maybe clarify here that initial RAM is split
> into 16GB (with 'DIMM' type ) chunks/entries when it's described in SMBIOS
> table 17.
>
> > (each DIMM of 16 GiB in size) before tables 17 and 19 conflict with their
> > addresses.
>
> Are you sure it's addresses that are wrong?

I don't know why I had this preconception of memory corruption and overlapping addresses! Even the BZ says table handles overlap. Grr ... doing too much multiplexing these days :(
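Editorial note: the 512-entry limit discussed above is handle arithmetic. A minimal sketch follows, assuming type 17 handles start at 0x1100 and type 19 handles at 0x1300; those two bases are assumptions chosen to reproduce the 512-entry figure from the thread, and all DEMO_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_GIB        (1ULL << 30)
#define DEMO_DIMM_CHUNK (16 * DEMO_GIB) /* per-entry size, from the thread */
#define DEMO_T17_BASE   0x1100u         /* assumed first type 17 handle */
#define DEMO_T19_BASE   0x1300u         /* assumed first type 19 handle */

/* Number of type 17 entries needed to describe ram_size bytes. */
uint64_t demo_t17_entries(uint64_t ram_size)
{
    /* Guard the round-up addition against wrap-around. */
    assert(ram_size <= UINT64_MAX - DEMO_DIMM_CHUNK);
    return (ram_size + DEMO_DIMM_CHUNK - 1) / DEMO_DIMM_CHUNK;
}

/* Returns 1 if the type 17 handles would run into the type 19 range. */
int demo_t17_overlaps_t19(uint64_t ram_size)
{
    /* Handles used are BASE .. BASE + n - 1. */
    return DEMO_T17_BASE + demo_t17_entries(ram_size) > DEMO_T19_BASE;
}
```

Under these assumptions, 8 TiB of initial RAM needs exactly 512 entries and still fits, while one byte more needs a 513th entry whose handle collides with the first type 19 handle.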
Re: [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features
On 2/18/22 04:37, Alex Bennée wrote: Peter Maydell writes: On Thu, 10 Feb 2022 at 04:04, Richard Henderson wrote: Changes for v2: * Introduce FIELD_SEX64, instead of open-coding w/ sextract64. * Set TCR_EL1 more completely for user-only. * Continue to bound tsz within aa64_va_parameters; provide an out-of-bound indicator for raising AddressSize fault. * Split IPS patch. * Fix debug registers for LVA. * Fix long-format fsc for LPA2. * Fix TLBI page shift. * Validate TLBI granule vs TCR granule. Not done: * Validate translation levels which accept blocks. There is still no upstream kernel support for FEAT_LPA2, so that is essentially untested. This series seems to break 'make check-acceptance': (01/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '01-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2', 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j... (900.74 s) (02/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3: INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: Timeout reached\nOriginal status: ERROR\n{'name': '02-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3', 'logdir': '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j... (900.71 s) UEFI runs in the guest and seems to launch the kernel, but there's no output from the kernel itself in the logfile. Last thing it prints is: EFI stub: Booting Linux Kernel... EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied EFI stub: Using DTB from configuration table EFI stub: Exiting boot services and installing virtual address map... 
SetUefiImageMemoryAttributes - 0x7F50 - 0x0004 (0x0008) SetUefiImageMemoryAttributes - 0x7C19 - 0x0004 (0x0008) SetUefiImageMemoryAttributes - 0x7C14 - 0x0004 (0x0008) SetUefiImageMemoryAttributes - 0x7F4C - 0x0003 (0x0008) SetUefiImageMemoryAttributes - 0x7C0F - 0x0004 (0x0008) SetUefiImageMemoryAttributes - 0x7BFB - 0x0004 (0x0008) SetUefiImageMemoryAttributes - 0x7BE0 - 0x0003 (0x0008) SetUefiImageMemoryAttributes - 0x7BDC - 0x0003 (0x0008) This ought to be followed by the usual kernel boot log [0.00] Booting Linux on physical CPU 0x00 [0x000f0510] etc but it isn't. Probably the kernel is crashing in early bootup before it gets round to printing anything. As this test runs under -cpu max it is likely exercising the new features (and failing). I would have thought so too. However... I've bisected this to the final LPA2 patch. I have not tracked down what exactly is going on with this, but it's definitely not the guest exercising the new feature -- there is no upstream support for LPA2. I'll keep looking. r~
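Editorial note: the v2 change list mentions replacing open-coded sextract64 calls with FIELD_SEX64. For readers unfamiliar with the operation, here is a minimal sketch of sign-extending bitfield extraction. It is an illustration written for this note, not QEMU's implementation, and demo_sextract64 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `length` bits starting at bit `start` of `value` and
 * sign-extend the result to 64 bits. */
int64_t demo_sextract64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    /* Move the field to the top of the word, then shift it back down.
     * This relies on >> of a negative signed value being an arithmetic
     * shift, which is implementation-defined in C but holds on the
     * mainstream compilers. */
    return (int64_t)(value << (64 - length - start)) >> (64 - length);
}
```

For example, the 4-bit field at bits 7:4 of 0xF0 reads back as -1, while the same field of 0x70 reads back as 7.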
Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
> On Feb 17, 2022, at 7:09 AM, Peter Maydell wrote: > > On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman wrote: >> >> The compiler path that cmake gets from meson is corrupted. It results in >> the following error: >> | -- The C compiler identification is unknown >> | CMake Error at CMakeLists.txt:35 (project): >> | The CMAKE_C_COMPILER: >> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16 >> | is not a full path to an existing compiler tool. >> >> Explicitly specify the C compiler for cmake to avoid this error > > This sounds like a bug in Meson. Is there a Meson bug report > we can reference in the commit message here ? Hi Peter, This issue reproduces with the latest meson [1] also. I noticed the following about the “binaries” section [2]. The manual says meson could pass the values in this section to find_program [3]. As such I’m wondering if it’s OK to set compiler flags in this section because find_program doesn’t seem to accept any compiler flags. The compiler flags could be set in the “built-in options” section using options such as “c_args”, “cpp_args” and “objc_args” [4]. When I moved CPU_CFLAGS from the binaries section to the built-in-options section in “configure", I don’t see the issue anymore. [1]: https://github.com/mesonbuild/meson.git [2]: https://mesonbuild.com/Machine-files.html#binaries [3]: https://cmake.org/cmake/help/latest/command/find_program.html [4]: https://github.com/mesonbuild/meson/blob/master/docs/markdown/Reference-tables.md (section “Language arguments parameter names") Thank you! -- Jag > > thanks > -- PMM
Re: [PATCH] migration: NULL transport_data after freeing
On Thu, Feb 17, 2022 at 06:04:07PM +0100, Hanna Reitz wrote: > migration_incoming_state_destroy() NULLs all objects it frees after they > are freed, presumably so that a subsequent call to the same function > will not free them again, unless new objects have been created in the > meantime. > > transport_data is the exception, and it shows exactly this problem: When > an incoming migration uses transport_cleanup() and transport_data, and a > subsequent incoming migration (e.g. loadvm) occurs that does not, then > when this second one is done, it will call transport_cleanup() on the > old transport_data again -- which has already been freed. This is > sometimes visible in the iotest 201, though for some reason I can only > reproduce it with -m32. > > To fix this, call transport_cleanup() only when transport_data is not > NULL (otherwise there is nothing to clean up), and set transport_data to > NULL when it has been cleaned up (i.e. freed). > > (transport_cleanup() is used only by migration/socket.c, where > socket_start_incoming_migration_internal() sets both it and > transport_data to non-NULL values.) > > Signed-off-by: Hanna Reitz I had a similar fix here: https://lore.kernel.org/qemu-devel/20220216062809.57179-15-pet...@redhat.com/ Though there it was because I need migration_incoming_transport_cleanup() for other purposes, so the fix came along. My guess is this small fix will land earlier, if so I'll rebase. :) Thanks, -- Peter Xu
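Editorial note: the fix described in the quoted commit message follows a common pattern, namely to guard the cleanup on the pointer and NULL it afterwards. A minimal sketch follows; the Demo names are hypothetical and the struct is a stand-in, not QEMU's MigrationIncomingState.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoIncomingState {
    void *transport_data;
    void (*transport_cleanup)(void *data);
} DemoIncomingState;

static int demo_cleanup_calls;

static void demo_free_transport(void *data)
{
    demo_cleanup_calls++;
    free(data);
}

/* Clean up only when there is something to clean up, and forget the
 * pointer afterwards so a later call cannot free it again. */
void demo_incoming_state_destroy(DemoIncomingState *s)
{
    assert(s != NULL);
    if (s->transport_data) {
        s->transport_cleanup(s->transport_data);
        s->transport_data = NULL;
    }
}

/* Simulates two incoming migrations; returns the number of cleanup
 * calls, or -1 if the allocation failed. */
int demo_run_two_migrations(void)
{
    DemoIncomingState s = { 0 };

    demo_cleanup_calls = 0;

    /* First migration: sets both the data and the cleanup hook. */
    s.transport_data = malloc(16);
    if (!s.transport_data) {
        return -1;
    }
    s.transport_cleanup = demo_free_transport;
    demo_incoming_state_destroy(&s);

    /* Second migration: does not touch transport_data. Without the
     * NULL assignment above, this call would free the buffer twice. */
    demo_incoming_state_destroy(&s);

    return demo_cleanup_calls;
}
```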
Re: [PATCH v2 4/8] configure: Disable out-of-line atomic operations on Aarch64
On 2/17/22 04:18, Philippe Mathieu-Daudé wrote: On 16/2/22 17:42, Akihiko Odaki wrote: On 2022/02/17 0:08, Philippe Mathieu-Daudé wrote: On 16/2/22 11:19, Richard Henderson wrote: These should have been supplied by libgcc.a, which we're supposed to be linking against. Something is wrong with your installation. I don't have gobjc/g++ installed, so ./configure defaulted to Clang to compile these languages, but compiled C files using GCC. At the end the Clang linker is used (the default c++ symlink). This is another form of compiler mis-configuration. If you don't have g++ to go with gcc, use --cxx=false to avoid picking up a different compiler. Could there be a mismatch between Clang (-mno-outline-atomics) and GCC (-moutline-atomics)? I have no idea if those options do the same thing. I think you have to instruct Clang to use libgcc instead of compiler-rt and link the objects with GCC. Here is the documentation of Clang about the runtime I could find: https://clang.llvm.org/docs/Toolchain.html#libgcc-s-gnu Thanks for the pointer. And the next section is https://clang.llvm.org/docs/Toolchain.html#atomics-library :) Clang does not currently automatically link against libatomic when using libgcc_s. You may need to manually add -latomic to support this configuration when using non-native atomic operations (if you see link errors referring to __atomic_* functions). I'll try that. -moutline-atomics is *not* the same as libatomic. You should not need libatomic at all. r~
Re: [PATCH] configure: Support empty prefixes
On 2/17/22 19:42, Joshua Seaton wrote: At least as of v5 (before the meson build), empty `--prefix` values were supported; this seems to have fallen out along the way. This change reintroduces support. What is the usecase exactly? QEMU supports relocatable installation so if you want you can use --prefix=/nonexistent and then move the resulting tree wherever you want. Paolo Tested locally with empty and non-empty values of `--prefix`. Signed-off-by: Joshua Seaton --- configure | 33 - 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/configure b/configure index 3a29eff5cc..87a32e52e4 100755 --- a/configure +++ b/configure @@ -1229,20 +1229,30 @@ case $git_submodules_action in ;; esac -libdir="${libdir:-$prefix/lib}" -libexecdir="${libexecdir:-$prefix/libexec}" -includedir="${includedir:-$prefix/include}" +# Emits a relative path in the case of an empty prefix. +prefix_subdir() { +dir="$1" +if test -z "$prefix" ; then +echo "$dir" +else +echo "$prefix/$dir" +fi +} + +libdir="${libdir:-$(prefix_subdir lib)}" +libexecdir="${libexecdir:-$(prefix_subdir libexec)}" +includedir="${includedir:-$(prefix_subdir include)}" if test "$mingw32" = "yes" ; then bindir="${bindir:-$prefix}" else -bindir="${bindir:-$prefix/bin}" +bindir="${bindir:-$(prefix_subdir bin)}" fi -mandir="${mandir:-$prefix/share/man}" -datadir="${datadir:-$prefix/share}" -docdir="${docdir:-$prefix/share/doc}" -sysconfdir="${sysconfdir:-$prefix/etc}" -local_statedir="${local_statedir:-$prefix/var}" +mandir="${mandir:-$(prefix_subdir share/man)}" +datadir="${datadir:-$(prefix_subdir share)}" +docdir="${docdir:-$(prefix_subdir share/doc)}" +sysconfdir="${sysconfdir:-$(prefix_subdir etc)}" +local_statedir="${local_statedir:-$(prefix_subdir var)}" firmwarepath="${firmwarepath:-$datadir/qemu-firmware}" localedir="${localedir:-$datadir/locale}" @@ -3763,6 +3773,11 @@ if test "$skip_meson" = no; then mv $cross config-meson.cross rm -rf meson-private meson-info meson-logs + + # Workaround for a meson bug 
preventing empty prefixes: + # see https://github.com/mesonbuild/meson/issues/6946. + prefix="${prefix:-/}" + run_meson() { NINJA=$ninja $meson setup \ --prefix "$prefix" \ -- 2.35.1.265.g69c8d7142f-goog
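Editorial note: the logic of the prefix_subdir() shell helper in the patch, restated in C purely for illustration. The function name and signature are hypothetical; configure itself remains a shell script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "prefix/dir" into out, or just "dir" when prefix is empty,
 * mirroring the shell helper. Returns 0 on success and -1 if the
 * result did not fit into out_len bytes. */
int demo_prefix_subdir(const char *prefix, const char *dir,
                       char *out, size_t out_len)
{
    int n;

    assert(prefix && dir && out && out_len > 0);
    if (prefix[0] == '\0') {
        /* Empty prefix: emit a relative path. */
        n = snprintf(out, out_len, "%s", dir);
    } else {
        n = snprintf(out, out_len, "%s/%s", prefix, dir);
    }
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}
```

With an empty prefix, "lib" stays "lib"; with "/usr", "share/man" becomes "/usr/share/man".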
[PATCH v6 4/4] tests/tcg/s390x: changed to using .insn for tests requiring z15
Signed-off-by: David Miller --- tests/tcg/s390x/mie3-compl.c | 21 +++-- tests/tcg/s390x/mie3-mvcrl.c | 2 +- tests/tcg/s390x/mie3-sel.c | 6 +++--- 3 files changed, 15 insertions(+), 14 deletions(-) diff --git a/tests/tcg/s390x/mie3-compl.c b/tests/tcg/s390x/mie3-compl.c index 98281ee683..31820e4a2a 100644 --- a/tests/tcg/s390x/mie3-compl.c +++ b/tests/tcg/s390x/mie3-compl.c @@ -14,25 +14,26 @@ #define FbinOp(S, ASM) uint64_t S(uint64_t a, uint64_t b) \ { uint64_t res = 0; F_PRO; ASM; return res; } + /* AND WITH COMPLEMENT */ -FbinOp(_ncrk, asm("ncrk %%r0, %%r3, %%r2\n" F_EPI)) -FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_ncrk, asm(".insn rrf, 0xB9F5, %%r0, %%r3, %%r2, 0\n" F_EPI)) +FbinOp(_ncgrk, asm(".insn rrf, 0xB9E5, %%r0, %%r3, %%r2, 0\n" F_EPI)) /* NAND */ -FbinOp(_nnrk, asm("nnrk %%r0, %%r3, %%r2\n" F_EPI)) -FbinOp(_nngrk, asm("nngrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_nnrk, asm(".insn rrf, 0xB974, %%r0, %%r3, %%r2, 0\n" F_EPI)) +FbinOp(_nngrk, asm(".insn rrf, 0xB964, %%r0, %%r3, %%r2, 0\n" F_EPI)) /* NOT XOR */ -FbinOp(_nxrk, asm("nxrk %%r0, %%r3, %%r2\n" F_EPI)) -FbinOp(_nxgrk, asm("nxgrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_nxrk, asm(".insn rrf, 0xB977, %%r0, %%r3, %%r2, 0\n" F_EPI)) +FbinOp(_nxgrk, asm(".insn rrf, 0xB967, %%r0, %%r3, %%r2, 0\n" F_EPI)) /* NOR */ -FbinOp(_nork, asm("nork %%r0, %%r3, %%r2\n" F_EPI)) -FbinOp(_nogrk, asm("nogrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_nork, asm(".insn rrf, 0xB976, %%r0, %%r3, %%r2, 0\n" F_EPI)) +FbinOp(_nogrk, asm(".insn rrf, 0xB966, %%r0, %%r3, %%r2, 0\n" F_EPI)) /* OR WITH COMPLEMENT */ -FbinOp(_ocrk, asm("ocrk %%r0, %%r3, %%r2\n" F_EPI)) -FbinOp(_ocgrk, asm("ocgrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_ocrk, asm(".insn rrf, 0xB975, %%r0, %%r3, %%r2, 0\n" F_EPI)) +FbinOp(_ocgrk, asm(".insn rrf, 0xB965, %%r0, %%r3, %%r2, 0\n" F_EPI)) int main(int argc, char *argv[]) diff --git a/tests/tcg/s390x/mie3-mvcrl.c b/tests/tcg/s390x/mie3-mvcrl.c index 81cf3ad702..f0be83b197 100644 --- 
a/tests/tcg/s390x/mie3-mvcrl.c +++ b/tests/tcg/s390x/mie3-mvcrl.c @@ -6,7 +6,7 @@ static inline void mvcrl_8(const char *dst, const char *src) { asm volatile ( "llill %%r0, 8\n" -"mvcrl 0(%[dst]), 0(%[src])\n" +".insn sse, 0xE50A, 0(%[dst]), 0(%[src])" : : [dst] "d" (dst), [src] "d" (src) : "memory"); } diff --git a/tests/tcg/s390x/mie3-sel.c b/tests/tcg/s390x/mie3-sel.c index d6b7b0933b..32d434b01a 100644 --- a/tests/tcg/s390x/mie3-sel.c +++ b/tests/tcg/s390x/mie3-sel.c @@ -19,9 +19,9 @@ { uint64_t res = 0; F_PRO ; ASM ; return res; } -Fi3 (_selre, asm("selre%%r0, %%r3, %%r2\n" F_EPI)) -Fi3 (_selgrz,asm("selgrz %%r0, %%r3, %%r2\n" F_EPI)) -Fi3 (_selfhrnz, asm("selfhrnz %%r0, %%r3, %%r2\n" F_EPI)) +Fi3 (_selre, asm(".insn rrf, 0xB9F0, %%r0, %%r3, %%r2, 8\n" F_EPI)) +Fi3 (_selgrz,asm(".insn rrf, 0xB9E3, %%r0, %%r3, %%r2, 8\n" F_EPI)) +Fi3 (_selfhrnz, asm(".insn rrf, 0xB9C0, %%r0, %%r3, %%r2, 7\n" F_EPI)) int main(int argc, char *argv[]) -- 2.32.0
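Editorial note: the .insn directives above let the assembler emit the raw instruction word without knowing the z15 mnemonics. A minimal sketch of how the operands of an RRF-a instruction pack into that word follows, assuming the RRF-a layout from the z/Architecture Principles of Operation (opcode in bits 0-15, R3 in 16-19, M4 in 20-23, R1 in 24-27, R2 in 28-31, big-endian bit numbering). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an RRF-a format instruction into its 32-bit encoding. The
 * operand order matches the assembler syntax: R1, R2, R3, M4. */
uint32_t demo_encode_rrf_a(uint16_t opcode, unsigned r1, unsigned r2,
                           unsigned r3, unsigned m4)
{
    assert(r1 < 16 && r2 < 16 && r3 < 16 && m4 < 16);
    return ((uint32_t)opcode << 16) | (r3 << 12) | (m4 << 8) |
           (r1 << 4) | r2;
}
```

Under that assumption, the replacement for "ncrk %r0, %r3, %r2" (opcode 0xB9F5, R1=0, R2=3, R3=2, M4=0) encodes to 0xB9F52003.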
[PATCH v6 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3
tests/tcg/s390x/mie3-compl.c: [N]*K instructions tests/tcg/s390x/mie3-mvcrl.c: MVCRL instruction tests/tcg/s390x/mie3-sel.c: SELECT instruction Signed-off-by: David Miller --- tests/tcg/s390x/Makefile.target | 5 ++- tests/tcg/s390x/mie3-compl.c| 55 + tests/tcg/s390x/mie3-mvcrl.c| 31 +++ tests/tcg/s390x/mie3-sel.c | 42 + 4 files changed, 132 insertions(+), 1 deletion(-) create mode 100644 tests/tcg/s390x/mie3-compl.c create mode 100644 tests/tcg/s390x/mie3-mvcrl.c create mode 100644 tests/tcg/s390x/mie3-sel.c diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target index 1a7238b4eb..54e67446aa 100644 --- a/tests/tcg/s390x/Makefile.target +++ b/tests/tcg/s390x/Makefile.target @@ -1,12 +1,15 @@ S390X_SRC=$(SRC_PATH)/tests/tcg/s390x VPATH+=$(S390X_SRC) -CFLAGS+=-march=zEC12 -m64 +CFLAGS+=-march=z15 -m64 TESTS+=hello-s390x TESTS+=csst TESTS+=ipm TESTS+=exrl-trt TESTS+=exrl-trtr TESTS+=pack +TESTS+=mie3-compl +TESTS+=mie3-mvcrl +TESTS+=mie3-sel TESTS+=mvo TESTS+=mvc TESTS+=shift diff --git a/tests/tcg/s390x/mie3-compl.c b/tests/tcg/s390x/mie3-compl.c new file mode 100644 index 00..98281ee683 --- /dev/null +++ b/tests/tcg/s390x/mie3-compl.c @@ -0,0 +1,55 @@ +#include + + +#define F_EPI "stg %%r0, %[res] " : [res] "+m" (res) : : "r0", "r2", "r3" + +#define F_PROasm ( \ +"llihf %%r0,801\n" \ +"lg %%r2, %[a]\n" \ +"lg %%r3, %[b] " \ +: : [a] "m" (a), \ +[b] "m" (b)\ +: "r2", "r3") + +#define FbinOp(S, ASM) uint64_t S(uint64_t a, uint64_t b) \ +{ uint64_t res = 0; F_PRO; ASM; return res; } + +/* AND WITH COMPLEMENT */ +FbinOp(_ncrk, asm("ncrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2\n" F_EPI)) + +/* NAND */ +FbinOp(_nnrk, asm("nnrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_nngrk, asm("nngrk %%r0, %%r3, %%r2\n" F_EPI)) + +/* NOT XOR */ +FbinOp(_nxrk, asm("nxrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_nxgrk, asm("nxgrk %%r0, %%r3, %%r2\n" F_EPI)) + +/* NOR */ +FbinOp(_nork, asm("nork %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_nogrk, 
asm("nogrk %%r0, %%r3, %%r2\n" F_EPI)) + +/* OR WITH COMPLEMENT */ +FbinOp(_ocrk, asm("ocrk %%r0, %%r3, %%r2\n" F_EPI)) +FbinOp(_ocgrk, asm("ocgrk %%r0, %%r3, %%r2\n" F_EPI)) + + +int main(int argc, char *argv[]) +{ +if (_ncrk(0xFF88, 0xAA11) != 0x03210011ull || +_nnrk(0xFF88, 0xAA11) != 0x032155FFull || +_nork(0xFF88, 0xAA11) != 0x03210066ull || +_nxrk(0xFF88, 0xAA11) != 0x0321AA66ull || +_ocrk(0xFF88, 0xAA11) != 0x0321AA77ull || +_ncgrk(0xFF88, 0xAA11) != 0x0011ull || +_nngrk(0xFF88, 0xAA11) != 0x55FFull || +_nogrk(0xFF88, 0xAA11) != 0x0066ull || +_nxgrk(0xFF88, 0xAA11) != 0xAA66ull || +_ocgrk(0xFF88, 0xAA11) != 0xAA77ull) +{ +return 1; +} + +return 0; +} diff --git a/tests/tcg/s390x/mie3-mvcrl.c b/tests/tcg/s390x/mie3-mvcrl.c new file mode 100644 index 00..81cf3ad702 --- /dev/null +++ b/tests/tcg/s390x/mie3-mvcrl.c @@ -0,0 +1,31 @@ +#include +#include + + +static inline void mvcrl_8(const char *dst, const char *src) +{ +asm volatile ( +"llill %%r0, 8\n" +"mvcrl 0(%[dst]), 0(%[src])\n" +: : [dst] "d" (dst), [src] "d" (src) +: "memory"); +} + + +int main(int argc, char *argv[]) +{ +const char *alpha = "abcdefghijklmnop"; + +/* array missing 'i' */ +char tstr[17] = "abcdefghjklmnop\0" ; + +/* mvcrl reference use: 'open a hole in an array' */ +mvcrl_8(tstr + 9, tstr + 8); + +/* place missing 'i' */ +tstr[8] = 'i'; + +return strncmp(alpha, tstr, 16ul); +} + + diff --git a/tests/tcg/s390x/mie3-sel.c b/tests/tcg/s390x/mie3-sel.c new file mode 100644 index 00..d6b7b0933b --- /dev/null +++ b/tests/tcg/s390x/mie3-sel.c @@ -0,0 +1,42 @@ +#include + + +#define F_EPI "stg %%r0, %[res] " : [res] "+m" (res) : : "r0", "r2", "r3" + +#define F_PROasm ( \ +"lg %%r2, %[a]\n" \ +"lg %%r3, %[b]\n" \ +"lg %%r0, %[c]\n" \ +"ltgr %%r0, %%r0" \ +: : [a] "m" (a), \ +[b] "m" (b), \ +[c] "m" (c)\ +: "r0", "r2", "r3", "r4") + + + +#define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c) \ +{ uint64_t res = 0; F_PRO ; ASM ; return res; } + + +Fi3 (_selre, asm("selre%%r0, %%r3, 
%%r2\n" F_EPI)) +Fi3 (_selgrz,asm("selgrz %%r0, %%r3, %%r2\n" F_EPI)) +Fi3 (_selfhrnz, asm("selfhrnz %%r0, %%r3, %%r2\n" F_EPI)) + + +int main(int argc, char *argv[]) +{ +uint64_t a = ~0, b = ~0, c = ~0; +a =_selre(0x06660066ull, 0x06660006ull, a); +b = _selgrz(0xF00D0005ull, 0xF00D0055ull, b); +c = _selfhrnz(0x00440044ull, 0x00040004ull, c); + +if ((0x0066ull != a) || +(0xF00D0005ull != b) || +
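Editorial note: a portable reference model of what the tested instructions compute can help when reading the expected values. The sketch below follows my reading of the instruction definitions (result = second operand combined with the third; SELECT picks the second operand when the condition code matches the mask). It is an illustration written for this note, not the QEMU translation code, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit forms: op2 and op3 are the second and third operands. */
uint64_t demo_ncgrk(uint64_t op2, uint64_t op3) { return op2 & ~op3; }   /* AND WITH COMPLEMENT */
uint64_t demo_nngrk(uint64_t op2, uint64_t op3) { return ~(op2 & op3); } /* NAND */
uint64_t demo_nogrk(uint64_t op2, uint64_t op3) { return ~(op2 | op3); } /* NOR */
uint64_t demo_nxgrk(uint64_t op2, uint64_t op3) { return ~(op2 ^ op3); } /* NOT EXCLUSIVE OR */
uint64_t demo_ocgrk(uint64_t op2, uint64_t op3) { return op2 | ~op3; }   /* OR WITH COMPLEMENT */

/* The 32-bit forms (NCRK, NNRK, ...) replace only the low half of the
 * first operand register and leave the high half unchanged. */
uint64_t demo_merge_low32(uint64_t old_r1, uint64_t result)
{
    return (old_r1 & 0xFFFFFFFF00000000ull) | (result & 0xFFFFFFFFull);
}

/* SELECT: m4 is a 4-bit mask with one bit per condition code; the bit
 * for condition code cc is 8 >> cc. */
uint64_t demo_select(unsigned cc, unsigned m4, uint64_t op2, uint64_t op3)
{
    assert(cc < 4 && m4 < 16);
    return (m4 & (8u >> cc)) ? op2 : op3;
}
```

In the tests above, the mask 8 corresponds to the "equal/zero" suffix and the mask 7 to "not zero".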
[PATCH v6 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x
resolves: https://gitlab.com/qemu-project/qemu/-/issues/737 implements: AND WITH COMPLEMENT (NCRK, NCGRK) NAND (NNRK, NNGRK) NOT EXCLUSIVE OR (NXRK, NXGRK) NOR (NORK, NOGRK) OR WITH COMPLEMENT(OCRK, OCGRK) SELECT(SELR, SELGR) SELECT HIGH (SELFHR) MOVE RIGHT TO LEFT(MVCRL) POPULATION COUNT (POPCNT) Signed-off-by: David Miller --- target/s390x/gen-features.c| 1 + target/s390x/helper.h | 1 + target/s390x/tcg/insn-data.def | 30 +-- target/s390x/tcg/mem_helper.c | 20 + target/s390x/tcg/translate.c | 53 -- 5 files changed, 100 insertions(+), 5 deletions(-) diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c index 7cb1a6ec10..a3f30f69d9 100644 --- a/target/s390x/gen-features.c +++ b/target/s390x/gen-features.c @@ -740,6 +740,7 @@ static uint16_t qemu_LATEST[] = { /* add all new definitions before this point */ static uint16_t qemu_MAX[] = { +S390_FEAT_MISC_INSTRUCTION_EXT3, /* generates a dependency warning, leave it out for now */ S390_FEAT_MSA_EXT_5, }; diff --git a/target/s390x/helper.h b/target/s390x/helper.h index 271b081e8c..69f69cf718 100644 --- a/target/s390x/helper.h +++ b/target/s390x/helper.h @@ -4,6 +4,7 @@ DEF_HELPER_FLAGS_4(nc, TCG_CALL_NO_WG, i32, env, i32, i64, i64) DEF_HELPER_FLAGS_4(oc, TCG_CALL_NO_WG, i32, env, i32, i64, i64) DEF_HELPER_FLAGS_4(xc, TCG_CALL_NO_WG, i32, env, i32, i64, i64) DEF_HELPER_FLAGS_4(mvc, TCG_CALL_NO_WG, void, env, i32, i64, i64) +DEF_HELPER_FLAGS_4(mvcrl, TCG_CALL_NO_WG, void, env, i64, i64, i64) DEF_HELPER_FLAGS_4(mvcin, TCG_CALL_NO_WG, void, env, i32, i64, i64) DEF_HELPER_FLAGS_4(clc, TCG_CALL_NO_WG, i32, env, i32, i64, i64) DEF_HELPER_3(mvcl, i32, env, i32, i32) diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def index 1c3e115712..3e51cd7c6d 100644 --- a/target/s390x/tcg/insn-data.def +++ b/target/s390x/tcg/insn-data.def @@ -105,6 +105,9 @@ D(0xa507, NILL,RI_a, Z, r1_o, i2_16u, r1, 0, andi, 0, 0x1000) D(0x9400, NI, SI,Z, la1, i2_8u, new, 0, ni, nz64, MO_UB) D(0xeb54, NIY, SIY, LD, 
la1, i2_8u, new, 0, ni, nz64, MO_UB) +/* AND WITH COMPLEMENT */ +C(0xb9f5, NCRK,RRF_a, MIE3, r2, r3, new, r1_32, andc, nz32) +C(0xb9e5, NCGRK, RRF_a, MIE3, r2, r3, r1, 0, andc, nz64) /* BRANCH AND LINK */ C(0x0500, BALR,RR_a, Z, 0, r2_nz, r1, 0, bal, 0) @@ -640,6 +643,8 @@ C(0xeb8e, MVCLU, RSY_a, E2, 0, a2, 0, 0, mvclu, 0) /* MOVE NUMERICS */ C(0xd100, MVN, SS_a, Z, la1, a2, 0, 0, mvn, 0) +/* MOVE RIGHT TO LEFT */ +C(0xe50a, MVCRL, SSE, MIE3, la1, a2, 0, 0, mvcrl, 0) /* MOVE PAGE */ C(0xb254, MVPG,RRE, Z, 0, 0, 0, 0, mvpg, 0) /* MOVE STRING */ @@ -707,6 +712,16 @@ F(0xed0f, MSEB,RXF, Z, e1, m2_32u, new, e1, mseb, 0, IF_BFP) F(0xed1f, MSDB,RXF, Z, f1, m2_64, new, f1, msdb, 0, IF_BFP) +/* NAND */ +C(0xb974, NNRK,RRF_a, MIE3, r2, r3, new, r1_32, nand, nz32) +C(0xb964, NNGRK, RRF_a, MIE3, r2, r3, r1, 0, nand, nz64) +/* NOR */ +C(0xb976, NORK,RRF_a, MIE3, r2, r3, new, r1_32, nor, nz32) +C(0xb966, NOGRK, RRF_a, MIE3, r2, r3, r1, 0, nor, nz64) +/* NOT EXCLUSIVE OR */ +C(0xb977, NXRK,RRF_a, MIE3, r2, r3, new, r1_32, nxor, nz32) +C(0xb967, NXGRK, RRF_a, MIE3, r2, r3, r1, 0, nxor, nz64) + /* OR */ C(0x1600, OR, RR_a, Z, r1, r2, new, r1_32, or, nz32) C(0xb9f6, ORK, RRF_a, DO, r2, r3, new, r1_32, or, nz32) @@ -725,6 +740,9 @@ D(0xa50b, OILL,RI_a, Z, r1_o, i2_16u, r1, 0, ori, 0, 0x1000) D(0x9600, OI, SI,Z, la1, i2_8u, new, 0, oi, nz64, MO_UB) D(0xeb56, OIY, SIY, LD, la1, i2_8u, new, 0, oi, nz64, MO_UB) +/* OR WITH COMPLEMENT */ +C(0xb975, OCRK,RRF_a, MIE3, r2, r3, new, r1_32, orc, nz32) +C(0xb965, OCGRK, RRF_a, MIE3, r2, r3, r1, 0, orc, nz64) /* PACK */ /* Really format SS_b, but we pack both lengths into one argument @@ -735,6 +753,9 @@ /* PACK UNICODE */ C(0xe100, PKU, SS_f, E2, la1, a2, 0, 0, pku, 0) +/* POPULATION COUNT */ +C(0xb9e1, POPCNT, RRF_c, PC, 0, r2_o, r1, 0, popcnt, nz64) + /* PREFETCH */ /* Implemented as nops of course. */ C(0xe336, PFD, RXY_b, GIE, 0, 0, 0, 0, 0, 0) @@ -743,9 +764,6 @@ /* Implemented as nop of course. 
*/ C(0xb2e8, PPA, RRF_c, PPA, 0, 0, 0, 0, 0, 0) -/* POPULATION COUNT */ -C(0xb9e1, POPCNT, RRE, PC, 0, r2_o, r1, 0, popcnt, nz64) - /* ROTATE LEFT SINGLE LOGICAL */ C(0xeb1d, RLL, RSY_a, Z, r3_o, sh, new, r1_32, rll32, 0) C(0xeb1c, RLLG,RSY_a, Z, r3_o, sh, r1, 0, rll64, 0) @@ -765,6 +783,12 @@ /* SEARCH STRING UNICODE */ C(0xb9be, SRSTU, RRE, ETF3, 0, 0, 0, 0, srstu, 0) +/* SELECT */ +C(0xb9f0, SELR,RRF_a, MIE3, r3, r2, new, r1_32, loc, 0) +C(0xb9e3, SELGR, RRF_a,
[PATCH v6 2/4] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z15 GA1
TCG implements everything we need to run basic z15 OS+software Signed-off-by: David Miller --- hw/s390x/s390-virtio-ccw.c | 3 +++ target/s390x/cpu_models.c | 6 +++--- target/s390x/gen-features.c | 7 +-- 3 files changed, 11 insertions(+), 5 deletions(-) diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c index 84e3e63c43..90480e7cf9 100644 --- a/hw/s390x/s390-virtio-ccw.c +++ b/hw/s390x/s390-virtio-ccw.c @@ -802,7 +802,10 @@ DEFINE_CCW_MACHINE(7_0, "7.0", true); static void ccw_machine_6_2_instance_options(MachineState *machine) { +static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 }; + ccw_machine_7_0_instance_options(machine); +s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat); } static void ccw_machine_6_2_class_options(MachineClass *mc) diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c index 11e06cc51f..89f83e81d5 100644 --- a/target/s390x/cpu_models.c +++ b/target/s390x/cpu_models.c @@ -85,9 +85,9 @@ static S390CPUDef s390_cpu_defs[] = { CPUDEF_INIT(0x3932, 16, 1, 47, 0x0800U, "gen16b", "IBM 3932 GA1"), }; -#define QEMU_MAX_CPU_TYPE 0x3906 -#define QEMU_MAX_CPU_GEN 14 -#define QEMU_MAX_CPU_EC_GA 2 +#define QEMU_MAX_CPU_TYPE 0x8561 +#define QEMU_MAX_CPU_GEN 15 +#define QEMU_MAX_CPU_EC_GA 1 static const S390FeatInit qemu_max_cpu_feat_init = { S390_FEAT_LIST_QEMU_MAX }; static S390FeatBitmap qemu_max_cpu_feat; diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c index a3f30f69d9..22846121c4 100644 --- a/target/s390x/gen-features.c +++ b/target/s390x/gen-features.c @@ -731,16 +731,18 @@ static uint16_t qemu_V6_0[] = { S390_FEAT_ESOP, }; -static uint16_t qemu_LATEST[] = { +static uint16_t qemu_V6_2[] = { S390_FEAT_INSTRUCTION_EXEC_PROT, S390_FEAT_MISC_INSTRUCTION_EXT2, S390_FEAT_MSA_EXT_8, S390_FEAT_VECTOR_ENH, }; +static uint16_t qemu_LATEST[] = { +S390_FEAT_MISC_INSTRUCTION_EXT3, +}; /* add all new definitions before this point */ static uint16_t qemu_MAX[] = { 
-S390_FEAT_MISC_INSTRUCTION_EXT3, /* generates a dependency warning, leave it out for now */ S390_FEAT_MSA_EXT_5, }; @@ -863,6 +865,7 @@ static FeatGroupDefSpec QemuFeatDef[] = { QEMU_FEAT_INITIALIZER(V4_0), QEMU_FEAT_INITIALIZER(V4_1), QEMU_FEAT_INITIALIZER(V6_0), +QEMU_FEAT_INITIALIZER(V6_2), QEMU_FEAT_INITIALIZER(LATEST), QEMU_FEAT_INITIALIZER(MAX), }; -- 2.32.0
[PATCH v6 0/4] s390x: Add partial z15 support and tests
Add partial support for s390x z15 ga1 and specific tests for mie3 v5 -> v6: * Swap operands for sel* instructions * Use .insn in tests for z15 arch instructions v4 -> v5: * Readd missing tests/tcg/s390x/mie3-*.c to patch v3 -> v4: * Change popcnt encoding RRE -> RRF_c * Remove redundant code op_sel -> op_loc * Cleanup for checkpatch.pl * Readded mie3-* to Makefile.target v2 -> v3: * Moved tests to separate patch. * Combined patches into series. David Miller (4): s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z15 GA1 tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3 tests/tcg/s390x: changed to using .insn for tests requiring z15 hw/s390x/s390-virtio-ccw.c | 3 ++ target/s390x/cpu_models.c | 6 ++-- target/s390x/gen-features.c | 6 +++- target/s390x/helper.h | 1 + target/s390x/tcg/insn-data.def | 30 -- target/s390x/tcg/mem_helper.c | 20 target/s390x/tcg/translate.c| 53 +-- tests/tcg/s390x/Makefile.target | 5 ++- tests/tcg/s390x/mie3-compl.c| 56 + tests/tcg/s390x/mie3-mvcrl.c| 31 ++ tests/tcg/s390x/mie3-sel.c | 42 + 11 files changed, 243 insertions(+), 10 deletions(-) create mode 100644 tests/tcg/s390x/mie3-compl.c create mode 100644 tests/tcg/s390x/mie3-mvcrl.c create mode 100644 tests/tcg/s390x/mie3-sel.c -- 2.32.0
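For readers following the instruction list in patch 1, the five logical operations it adds reduce to ordinary bitwise expressions. Below is a minimal C sketch of the 64-bit forms. The `mie3_*` helper names are made up for illustration and are not part of the series, which implements these through the andc/orc/nand/nor/nxor entries in insn-data.def. The operand roles (a = second operand r2, b = third operand r3) are taken from the in1/in2 columns of those table entries.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helpers, not QEMU code.
 * a = second operand (r2), b = third operand (r3). */
static inline uint64_t mie3_andc(uint64_t a, uint64_t b) { return a & ~b; }   /* NCGRK */
static inline uint64_t mie3_orc(uint64_t a, uint64_t b)  { return a | ~b; }   /* OCGRK */
static inline uint64_t mie3_nand(uint64_t a, uint64_t b) { return ~(a & b); } /* NNGRK */
static inline uint64_t mie3_nor(uint64_t a, uint64_t b)  { return ~(a | b); } /* NOGRK */
static inline uint64_t mie3_nxor(uint64_t a, uint64_t b) { return ~(a ^ b); } /* NXGRK */

/* Condition code in the style of the nz64 column: 0 for a zero result,
 * 1 otherwise. */
static inline int mie3_cc_nz(uint64_t result) { return result != 0; }

/* Small self-check exercising each helper once. */
void mie3_self_check(void)
{
    assert(mie3_andc(0xF0, 0x30) == 0xC0);
    assert(mie3_orc(0x0F, ~0xF0ULL) == 0xFF);
    assert(mie3_nand(~0ULL, ~0ULL) == 0);
    assert(mie3_nor(0, 0) == ~0ULL);
    assert(mie3_nxor(0xAA, 0xAA) == ~0ULL);
    assert(mie3_cc_nz(mie3_nand(~0ULL, ~0ULL)) == 0);
}
```

The 32-bit variants (NCRK, OCRK, NNRK, NORK, NXRK) would apply the same expressions to the low 32 bits only.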
[PATCH] virtio/virtio-balloon: Prefer Object* over void* parameter
*opaque is an alias to *obj. Using the latter makes the code consistent
with other devices, e.g. accel/kvm/kvm-all and accel/tcg/tcg-all. It
also makes the cast more typesafe.

Signed-off-by: Bernhard Beschow
---
 hw/virtio/virtio-balloon.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 9a4f491b54..38732d4118 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -241,7 +241,7 @@ static void balloon_stats_get_all(Object *obj, Visitor *v, const char *name,
                                   void *opaque, Error **errp)
 {
     Error *err = NULL;
-    VirtIOBalloon *s = opaque;
+    VirtIOBalloon *s = VIRTIO_BALLOON(obj);
     int i;

     if (!visit_start_struct(v, name, NULL, 0, &err)) {
@@ -276,7 +276,7 @@ static void balloon_stats_get_poll_interval(Object *obj, Visitor *v,
                                             const char *name, void *opaque,
                                             Error **errp)
 {
-    VirtIOBalloon *s = opaque;
+    VirtIOBalloon *s = VIRTIO_BALLOON(obj);
     visit_type_int(v, name, &s->stats_poll_interval, errp);
 }

@@ -284,7 +284,7 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
                                             const char *name, void *opaque,
                                             Error **errp)
 {
-    VirtIOBalloon *s = opaque;
+    VirtIOBalloon *s = VIRTIO_BALLOON(obj);
     int64_t value;

     if (!visit_type_int(v, name, &value, errp)) {
@@ -1014,12 +1014,12 @@ static void virtio_balloon_instance_init(Object *obj)
     s->free_page_hint_notify.notify = virtio_balloon_free_page_hint_notify;

     object_property_add(obj, "guest-stats", "guest statistics",
-                        balloon_stats_get_all, NULL, NULL, s);
+                        balloon_stats_get_all, NULL, NULL, NULL);

     object_property_add(obj, "guest-stats-polling-interval", "int",
                         balloon_stats_get_poll_interval,
                         balloon_stats_set_poll_interval,
-                        NULL, s);
+                        NULL, NULL);
 }

 static const VMStateDescription vmstate_virtio_balloon = {
--
2.35.1
[PATCH 1/2] hw/vfio/pci-quirks: Resolve redundant property getters
The QOM API already provides getters for uint64 and uint32 values, so reuse
them.

Signed-off-by: Bernhard Beschow
---
 hw/vfio/pci-quirks.c | 34 +-
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 0cf69a8c6d..f0147a050a 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1565,22 +1565,6 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice *vdev, Error **errp)
     return 0;
 }

-static void vfio_pci_nvlink2_get_tgt(Object *obj, Visitor *v,
-                                     const char *name,
-                                     void *opaque, Error **errp)
-{
-    uint64_t tgt = (uintptr_t) opaque;
-    visit_type_uint64(v, name, &tgt, errp);
-}
-
-static void vfio_pci_nvlink2_get_link_speed(Object *obj, Visitor *v,
-                                            const char *name,
-                                            void *opaque, Error **errp)
-{
-    uint32_t link_speed = (uint32_t)(uintptr_t) opaque;
-    visit_type_uint32(v, name, &link_speed, errp);
-}
-
 int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
 {
     int ret;
@@ -1618,9 +1602,9 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
                                nv2reg->size, p);
     QLIST_INSERT_HEAD(&vdev->bars[0].quirks, quirk, next);

-    object_property_add(OBJECT(vdev), "nvlink2-tgt", "uint64",
-                        vfio_pci_nvlink2_get_tgt, NULL, NULL,
-                        (void *) (uintptr_t) cap->tgt);
+    object_property_add_uint64_ptr(OBJECT(vdev), "nvlink2-tgt",
+                                   (uint64_t *) &cap->tgt,
+                                   OBJ_PROP_FLAG_READ);
     trace_vfio_pci_nvidia_gpu_setup_quirk(vdev->vbasedev.name, cap->tgt,
                                           nv2reg->size);
 free_exit:
@@ -1679,15 +1663,15 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
         QLIST_INSERT_HEAD(&vdev->bars[0].quirks, quirk, next);
     }

-    object_property_add(OBJECT(vdev), "nvlink2-tgt", "uint64",
-                        vfio_pci_nvlink2_get_tgt, NULL, NULL,
-                        (void *) (uintptr_t) captgt->tgt);
+    object_property_add_uint64_ptr(OBJECT(vdev), "nvlink2-tgt",
+                                   (uint64_t *) &captgt->tgt,
+                                   OBJ_PROP_FLAG_READ);
     trace_vfio_pci_nvlink2_setup_quirk_ssatgt(vdev->vbasedev.name, captgt->tgt,
                                               atsdreg->size);

-    object_property_add(OBJECT(vdev), "nvlink2-link-speed", "uint32",
-                        vfio_pci_nvlink2_get_link_speed, NULL, NULL,
-                        (void *) (uintptr_t) capspeed->link_speed);
+    object_property_add_uint32_ptr(OBJECT(vdev), "nvlink2-link-speed",
+                                   &capspeed->link_speed,
+                                   OBJ_PROP_FLAG_READ);
     trace_vfio_pci_nvlink2_setup_quirk_lnkspd(vdev->vbasedev.name,
                                               capspeed->link_speed);
 free_exit:
--
2.35.1
[PATCH 0/2] Resolve some redundant property accessors
The QOM API already provides appropriate accessors, so reuse them.

Testing done:

  $ make check
  Ok:                 569
  Expected Fail:      0
  Fail:               0
  Unexpected Pass:    0
  Skipped:            178
  Timeout:            0

Bernhard Beschow (2):
  hw/vfio/pci-quirks: Resolve redundant property getters
  hw/riscv/sifive_u: Resolve redundant property accessors

 hw/riscv/sifive_u.c  | 24
 hw/vfio/pci-quirks.c | 34 +-
 2 files changed, 13 insertions(+), 45 deletions(-)

--
2.35.1
[PATCH 2/2] hw/riscv/sifive_u: Resolve redundant property accessors
The QOM API already provides accessors for uint32 values, so reuse them.

Signed-off-by: Bernhard Beschow
---
 hw/riscv/sifive_u.c | 24
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 7fbc7dea42..747eb4ee89 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -713,36 +713,20 @@ static void sifive_u_machine_set_start_in_flash(Object *obj, bool value, Error *
     s->start_in_flash = value;
 }

-static void sifive_u_machine_get_uint32_prop(Object *obj, Visitor *v,
-                                             const char *name, void *opaque,
-                                             Error **errp)
-{
-    visit_type_uint32(v, name, (uint32_t *)opaque, errp);
-}
-
-static void sifive_u_machine_set_uint32_prop(Object *obj, Visitor *v,
-                                             const char *name, void *opaque,
-                                             Error **errp)
-{
-    visit_type_uint32(v, name, (uint32_t *)opaque, errp);
-}
-
 static void sifive_u_machine_instance_init(Object *obj)
 {
     SiFiveUState *s = RISCV_U_MACHINE(obj);

     s->start_in_flash = false;
     s->msel = 0;
-    object_property_add(obj, "msel", "uint32",
-                        sifive_u_machine_get_uint32_prop,
-                        sifive_u_machine_set_uint32_prop, NULL, &s->msel);
+    object_property_add_uint32_ptr(obj, "msel", &s->msel,
+                                   OBJ_PROP_FLAG_READWRITE);
     object_property_set_description(obj, "msel",
                                     "Mode Select (MSEL[3:0]) pin state");

     s->serial = OTP_SERIAL;
-    object_property_add(obj, "serial", "uint32",
-                        sifive_u_machine_get_uint32_prop,
-                        sifive_u_machine_set_uint32_prop, NULL, &s->serial);
+    object_property_add_uint32_ptr(obj, "serial", &s->serial,
+                                   OBJ_PROP_FLAG_READWRITE);
     object_property_set_description(obj, "serial", "Board serial number");
 }

--
2.35.1
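To illustrate why the hand-written getter/setter pairs in this series are redundant, here is a rough standalone model of a pointer-backed property with access flags. It is only a sketch: every `Demo*`/`demo_*` name is hypothetical and this is not how QOM is implemented internally. It only mirrors the idea behind `object_property_add_uint32_ptr()` taking a pointer to the field plus a READ or READWRITE flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access flags, loosely modelled on OBJ_PROP_FLAG_*. */
enum {
    DEMO_PROP_FLAG_READ      = 1 << 0,
    DEMO_PROP_FLAG_WRITE     = 1 << 1,
    DEMO_PROP_FLAG_READWRITE = DEMO_PROP_FLAG_READ | DEMO_PROP_FLAG_WRITE,
};

/* A property that simply aliases a uint32_t field of its owner. */
typedef struct DemoUint32Prop {
    const char *name;
    uint32_t *ptr;      /* points at the field inside the owning object */
    unsigned flags;
} DemoUint32Prop;

/* Generic getter: works for every pointer-backed uint32 property. */
bool demo_prop_get(const DemoUint32Prop *p, uint32_t *out)
{
    assert(p != NULL && out != NULL);
    if (!(p->flags & DEMO_PROP_FLAG_READ)) {
        return false;
    }
    *out = *p->ptr;
    return true;
}

/* Generic setter: refuses to write to read-only properties. */
bool demo_prop_set(const DemoUint32Prop *p, uint32_t value)
{
    assert(p != NULL);
    if (!(p->flags & DEMO_PROP_FLAG_WRITE)) {
        return false;
    }
    *p->ptr = value;
    return true;
}
```

With one generic pair like this, a device only has to state which field backs the property and whether it is writable, which is what the converted call sites in the two patches do.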
Re: [PATCH v12 2/5] target/ppc: make power8-pmu.c CONFIG_TCG only
On 2/16/22 21:10, Daniel Henrique Barboza wrote: static void init_tcg_pmu_power8(CPUPPCState *env) { -#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY) +#if defined(CONFIG_TCG) /* Init PMU overflow timers */ if (!kvm_enabled()) { cpu_ppc_pmu_init(env); @@ -7872,10 +7872,9 @@ static void ppc_cpu_reset(DeviceState *dev) if (env->mmu_model != POWERPC_MMU_REAL) { ppc_tlb_invalidate_all(env); } +pmu_update_summaries(env); #endif /* CONFIG_TCG */ #endif - -pmu_update_summaries(env); It looks like you could remove all of the ifdefs if you simply use tcg_enabled() rather than !kvm_enabled(). If !defined(CONFIG_TCG), tcg_enabled() will be constant false, and the block will be optimized away. r~
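The suggestion above can be sketched in isolation as follows. This is a standalone illustration, not QEMU code: `DEMO_CONFIG_TCG`, `demo_tcg_enabled()` and the other `demo_*` names are hypothetical stand-ins for `CONFIG_TCG`, `tcg_enabled()` and the PMU init call. The point is that when the predicate expands to a constant 0, the guarded call is dead code and the call site needs no #ifdef of its own.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the build-time switch and the predicate. */
#ifdef DEMO_CONFIG_TCG
static bool demo_tcg_allowed = true;
#define demo_tcg_enabled() (demo_tcg_allowed)
#else
#define demo_tcg_enabled() 0    /* constant false when TCG is not built in */
#endif

/* Counts how often the (hypothetical) PMU init ran. */
int demo_pmu_inits;

static void demo_pmu_init(void)
{
    demo_pmu_inits++;
}

void demo_cpu_init(void)
{
    /*
     * No #ifdef here: without DEMO_CONFIG_TCG the condition is the
     * constant 0, so the compiler can drop the whole block.
     */
    if (demo_tcg_enabled()) {
        demo_pmu_init();
    }
}
```

Note that the guarded function still has to be declared in every configuration for this to compile; only the call disappears.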
Re: [PATCH v4 1/7] hw/mips/gt64xxx_pci: Fix PCI IRQ levels to be preserved during migration
On 17 February 2022 10:19:18 UTC, Bernhard Beschow wrote:
>Based on commit e735b55a8c11dd455e31ccd4420e6c9485191d0c:
>
>  piix_pci: eliminate PIIX3State::pci_irq_levels
>
>  PIIX3State::pci_irq_levels are redundant which is already tracked by
>  PCIBus layer. So eliminate them.
>
>The IRQ levels in the PCIBus layer are already preserved during
>migration. By reusing them rather than having a redundant implementation,
>the bug is avoided in the first place.
>
>Suggested-by: Peter Maydell
>Signed-off-by: Bernhard Beschow

Copy from v3:
Reviewed-by: Peter Maydell

>---
> hw/mips/gt64xxx_pci.c | 7 ++-
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
>diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
>index c7480bd019..4cbd0911f5 100644
>--- a/hw/mips/gt64xxx_pci.c
>+++ b/hw/mips/gt64xxx_pci.c
>@@ -1006,14 +1006,11 @@ static int gt64120_pci_map_irq(PCIDevice *pci_dev, int irq_num)
>     }
> }
>
>-static int pci_irq_levels[4];
>-
> static void gt64120_pci_set_irq(void *opaque, int irq_num, int level)
> {
>     int i, pic_irq, pic_level;
>     qemu_irq *pic = opaque;
>-
>-    pci_irq_levels[irq_num] = level;
>+    PCIBus *bus = pci_get_bus(piix4_dev);
>
>     /* now we change the pic irq level according to the piix irq mappings */
>     /* XXX: optimize */
>@@ -1023,7 +1020,7 @@ static void gt64120_pci_set_irq(void *opaque, int irq_num, int level)
>         pic_level = 0;
>         for (i = 0; i < 4; i++) {
>             if (pic_irq == piix4_dev->config[PIIX_PIRQCA + i]) {
>-                pic_level |= pci_irq_levels[i];
>+                pic_level |= pci_bus_get_irq_level(bus, i);
>             }
>         }
>         qemu_set_irq(pic[pic_irq], pic_level);
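For anyone skimming the hunk, the loop being touched computes the level of one PIC input as the OR of all PCI interrupt pins routed to it. A minimal standalone sketch of just that computation follows. The `demo_*` names and the plain arrays are assumptions made for illustration: in the patch the routing comes from the PIIX4 config space and the per-pin level from pci_bus_get_irq_level().

```c
#include <stdint.h>

#define DEMO_NUM_PIRQS 4

/*
 * OR together the levels of every PCI interrupt pin that is routed to
 * pic_irq. route[i] is the PIC input pin i is mapped to, level[i] is the
 * current level of pin i (a stand-in for the state the PCIBus tracks).
 */
int demo_pic_level(const uint8_t route[DEMO_NUM_PIRQS],
                   const int level[DEMO_NUM_PIRQS], int pic_irq)
{
    int pic_level = 0;

    for (int i = 0; i < DEMO_NUM_PIRQS; i++) {
        if (route[i] == pic_irq) {
            pic_level |= level[i];
        }
    }
    return pic_level;
}
```

Because the result depends on the levels of all pins sharing the PIC input, those levels have to survive migration, which is why reading them from the bus instead of a private static array matters.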
Re: [PATCH 3/4] hw/openrisc/openrisc_sim: Add support for loading a device tree
On Thu, Feb 17, 2022 at 06:18:58PM +, Peter Maydell wrote: > On Thu, 10 Feb 2022 at 06:46, Stafford Horne wrote: > > > > Using the device tree means that qemu can now directly tell > > the kernel what hardware is configured rather than use having > > to maintain and update a separate device tree file. > > > > This patch adds device tree support for the OpenRISC simulator. > > A device tree is built up based on the state of the configure > > openrisc simulator. > > This sounds like it's support for creating a device > tree? Support for loading a device tree would be "the > user passes us a filename of a dtb file". (This is mostly a > quibble about commit message wording.) Ah, yes I will fix this to say, "adds automatic device tree generation support" > > -static void openrisc_load_kernel(ram_addr_t ram_size, > > +static hwaddr openrisc_load_kernel(ram_addr_t ram_size, > > const char *kernel_filename) > > Indentation looks off now ? Fixed now. > > { > > long kernel_size; > > uint64_t elf_entry; > > +uint64_t high_addr; > > hwaddr entry; > > > > if (kernel_filename && !qtest_enabled()) { > > kernel_size = load_elf(kernel_filename, NULL, NULL, NULL, > > - _entry, NULL, NULL, NULL, 1, > > EM_OPENRISC, > > - 1, 0); > > + _entry, NULL, _addr, NULL, 1, > > + EM_OPENRISC, 1, 0); > > entry = elf_entry; > > if (kernel_size < 0) { > > kernel_size = load_uimage(kernel_filename, > >, NULL, NULL, NULL, NULL); > > +high_addr = entry + kernel_size; > > } > > if (kernel_size < 0) { > > kernel_size = load_image_targphys(kernel_filename, > >KERNEL_LOAD_ADDR, > >ram_size - KERNEL_LOAD_ADDR); > > +high_addr = KERNEL_LOAD_ADDR + kernel_size; > > } > > > > if (entry <= 0) { > > @@ -168,7 +181,139 @@ static void openrisc_load_kernel(ram_addr_t ram_size, > > exit(1); > > } > > boot_info.bootstrap_pc = entry; > > + > > +return high_addr; > > +} > > +return 0; > > +} > > + > > +static uint32_t openrisc_load_fdt(Or1ksimState *s, hwaddr load_start, > > +uint64_t mem_size) > > Indentation again. 
Fixed.

> > +{
> > +    uint32_t fdt_addr;
> > +    int fdtsize = fdt_totalsize(s->fdt);
> > +
> > +    if (fdtsize <= 0) {
> > +        error_report("invalid device-tree");
> > +        exit(1);
> > +    }
> > +
> > +    /* We should put fdt right after the kernel */
>
> You change this comment in patch 4 -- I think you might as well
> just use that text in this patch to start with.

OK, I had that at first, but I did this to be more technically correct. I will
simplify as you suggest.

> > +    fdt_addr = ROUND_UP(load_start, 4);
> > +
> > +    fdt_pack(s->fdt);
>
> fdt_pack() returns an error code -- you should check it.

OK.

> > +    /* copy in the device tree */
> > +    qemu_fdt_dumpdtb(s->fdt, fdtsize);
> > +
> > +    rom_add_blob_fixed_as("fdt", s->fdt, fdtsize, fdt_addr,
> > +                          &address_space_memory);
> > +
> > +    return fdt_addr;
> > +}
> > +
> > +static void openrisc_create_fdt(Or1ksimState *s,
> > +    const struct MemmapEntry *memmap, int num_cpus, uint64_t mem_size,
> > +    const char *cmdline)
>
> Indentation.

Right, fixed.

> > +{
> > +    void *fdt;
> > +    int cpu;
> > +    char *nodename;
> > +    int pic_ph;
> > +
> > +    fdt = s->fdt = create_device_tree(&s->fdt_size);
> > +    if (!fdt) {
> > +        error_report("create_device_tree() failed");
> > +        exit(1);
> > +    }
> > +
> > +    qemu_fdt_setprop_string(fdt, "/", "compatible", "opencores,or1ksim");
> > +    qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x1);
> > +    qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x1);
> > +
> > +    nodename = g_strdup_printf("/memory@%lx",
> > +                               (long)memmap[OR1KSIM_DRAM].base);
>
> Use the appropriate format string macro for the type, rather than
> casting to long (here and below).

Right, good point.
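As a small illustration of the format-macro point: `long` is only 32 bits wide on some hosts, so casting a 64-bit address to `long` can silently truncate it. A minimal standalone sketch using the standard `PRIx64` macro is below. The function name `demo_memory_nodename` is made up for the example, and plain snprintf() stands in for g_strdup_printf(); in QEMU itself the matching macro for `hwaddr` values is `HWADDR_PRIx`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a "/memory@<base>" node name without casting the address to
 * long. PRIx64 expands to the right conversion for uint64_t on every
 * host, so no bits are lost.
 */
int demo_memory_nodename(char *buf, size_t len, uint64_t base)
{
    return snprintf(buf, len, "/memory@%" PRIx64, base);
}
```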
> > +qemu_fdt_add_subnode(fdt, nodename); > > +qemu_fdt_setprop_cells(fdt, nodename, "reg", > > + memmap[OR1KSIM_DRAM].base, mem_size); > > +qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory"); > > +g_free(nodename); > > + > > +qemu_fdt_add_subnode(fdt, "/cpus"); > > +qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0); > > +qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1); > > + > > +for (cpu = 0; cpu < num_cpus; cpu++) { > > +nodename = g_strdup_printf("/cpus/cpu@%d", cpu); > > +qemu_fdt_add_subnode(fdt, nodename); > > +qemu_fdt_setprop_string(fdt, nodename, "compatible", >
Re: [PATCH v2 1/2] Mark remaining global TypeInfo instances as const
Am 17. Januar 2022 14:58:04 UTC schrieb Bernhard Beschow : >More than 1k of TypeInfo instances are already marked as const. Mark the >remaining ones, too. > >This commit was created with: > git grep -z -l 'static TypeInfo' -- '*.c' | \ > xargs -0 sed -i 's/static TypeInfo/static const TypeInfo/' > >Signed-off-by: Bernhard Beschow >--- > hw/core/generic-loader.c | 2 +- > hw/core/guest-loader.c | 2 +- > hw/display/bcm2835_fb.c| 2 +- > hw/display/i2c-ddc.c | 2 +- > hw/display/macfb.c | 4 ++-- > hw/display/virtio-vga.c| 2 +- > hw/dma/bcm2835_dma.c | 2 +- > hw/i386/pc_piix.c | 2 +- > hw/i386/sgx-epc.c | 2 +- > hw/intc/bcm2835_ic.c | 2 +- > hw/intc/bcm2836_control.c | 2 +- > hw/ipmi/ipmi.c | 4 ++-- > hw/mem/nvdimm.c| 2 +- > hw/mem/pc-dimm.c | 2 +- > hw/misc/bcm2835_mbox.c | 2 +- > hw/misc/bcm2835_powermgt.c | 2 +- > hw/misc/bcm2835_property.c | 2 +- > hw/misc/bcm2835_rng.c | 2 +- > hw/misc/pvpanic-isa.c | 2 +- > hw/misc/pvpanic-pci.c | 2 +- > hw/net/fsl_etsec/etsec.c | 2 +- > hw/ppc/prep_systemio.c | 2 +- > hw/ppc/spapr_iommu.c | 2 +- > hw/s390x/s390-pci-bus.c| 2 +- > hw/s390x/sclp.c| 2 +- > hw/s390x/tod-kvm.c | 2 +- > hw/s390x/tod-tcg.c | 2 +- > hw/s390x/tod.c | 2 +- > hw/scsi/lsi53c895a.c | 2 +- > hw/sd/allwinner-sdhost.c | 2 +- > hw/sd/aspeed_sdhci.c | 2 +- > hw/sd/bcm2835_sdhost.c | 2 +- > hw/sd/cadence_sdhci.c | 2 +- > hw/sd/npcm7xx_sdhci.c | 2 +- > hw/usb/dev-mtp.c | 2 +- > hw/usb/host-libusb.c | 2 +- > hw/vfio/igd.c | 2 +- > hw/virtio/virtio-pmem.c| 2 +- > qom/object.c | 4 ++-- > 39 files changed, 42 insertions(+), 42 deletions(-) > >diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c >index 9a24ffb880..eaafc416f4 100644 >--- a/hw/core/generic-loader.c >+++ b/hw/core/generic-loader.c >@@ -207,7 +207,7 @@ static void generic_loader_class_init(ObjectClass *klass, >void *data) > set_bit(DEVICE_CATEGORY_MISC, dc->categories); > } > >-static TypeInfo generic_loader_info = { >+static const TypeInfo generic_loader_info = { > .name = TYPE_GENERIC_LOADER, > 
.parent = TYPE_DEVICE, > .instance_size = sizeof(GenericLoaderState), >diff --git a/hw/core/guest-loader.c b/hw/core/guest-loader.c >index d3f9d1a06e..391c875a29 100644 >--- a/hw/core/guest-loader.c >+++ b/hw/core/guest-loader.c >@@ -129,7 +129,7 @@ static void guest_loader_class_init(ObjectClass *klass, >void *data) > set_bit(DEVICE_CATEGORY_MISC, dc->categories); > } > >-static TypeInfo guest_loader_info = { >+static const TypeInfo guest_loader_info = { > .name = TYPE_GUEST_LOADER, > .parent = TYPE_DEVICE, > .instance_size = sizeof(GuestLoaderState), >diff --git a/hw/display/bcm2835_fb.c b/hw/display/bcm2835_fb.c >index 2be77bdd3a..088fc3d51c 100644 >--- a/hw/display/bcm2835_fb.c >+++ b/hw/display/bcm2835_fb.c >@@ -454,7 +454,7 @@ static void bcm2835_fb_class_init(ObjectClass *klass, void >*data) > dc->vmsd = _bcm2835_fb; > } > >-static TypeInfo bcm2835_fb_info = { >+static const TypeInfo bcm2835_fb_info = { > .name = TYPE_BCM2835_FB, > .parent= TYPE_SYS_BUS_DEVICE, > .instance_size = sizeof(BCM2835FBState), >diff --git a/hw/display/i2c-ddc.c b/hw/display/i2c-ddc.c >index 13eb529fc1..146489518c 100644 >--- a/hw/display/i2c-ddc.c >+++ b/hw/display/i2c-ddc.c >@@ -113,7 +113,7 @@ static void i2c_ddc_class_init(ObjectClass *oc, void *data) > isc->send = i2c_ddc_tx; > } > >-static TypeInfo i2c_ddc_info = { >+static const TypeInfo i2c_ddc_info = { > .name = TYPE_I2CDDC, > .parent = TYPE_I2C_SLAVE, > .instance_size = sizeof(I2CDDCState), >diff --git a/hw/display/macfb.c b/hw/display/macfb.c >index 4bd7c3ad6a..69c2ea2b6e 100644 >--- a/hw/display/macfb.c >+++ b/hw/display/macfb.c >@@ -783,14 +783,14 @@ static void macfb_nubus_class_init(ObjectClass *klass, >void *data) > device_class_set_props(dc, macfb_nubus_properties); > } > >-static TypeInfo macfb_sysbus_info = { >+static const TypeInfo macfb_sysbus_info = { > .name = TYPE_MACFB, > .parent= TYPE_SYS_BUS_DEVICE, > .instance_size = sizeof(MacfbSysBusState), > .class_init= macfb_sysbus_class_init, > }; > >-static 
TypeInfo macfb_nubus_info = { >+static const TypeInfo macfb_nubus_info = { > .name = TYPE_NUBUS_MACFB, > .parent= TYPE_NUBUS_DEVICE, > .instance_size = sizeof(MacfbNubusState), >diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c >index b23a75a04b..5a2f7a4540 100644 >--- a/hw/display/virtio-vga.c >+++ b/hw/display/virtio-vga.c >@@ -220,7 +220,7 @@ static void virtio_vga_base_class_init(ObjectClass *klass, >void *data) >virtio_vga_set_big_endian_fb); > } > >-static TypeInfo virtio_vga_base_info = { >+static const TypeInfo virtio_vga_base_info = { >
[PATCH] configure: Support empty prefixes
At least as of v5 (before the meson build), empty `--prefix` values were supported; this seems to have fallen out along the way. This change reintroduces support. Tested locally with empty and non-empty values of `--prefix`. Signed-off-by: Joshua Seaton --- configure | 33 - 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/configure b/configure index 3a29eff5cc..87a32e52e4 100755 --- a/configure +++ b/configure @@ -1229,20 +1229,30 @@ case $git_submodules_action in ;; esac -libdir="${libdir:-$prefix/lib}" -libexecdir="${libexecdir:-$prefix/libexec}" -includedir="${includedir:-$prefix/include}" +# Emits a relative path in the case of an empty prefix. +prefix_subdir() { +dir="$1" +if test -z "$prefix" ; then +echo "$dir" +else +echo "$prefix/$dir" +fi +} + +libdir="${libdir:-$(prefix_subdir lib)}" +libexecdir="${libexecdir:-$(prefix_subdir libexec)}" +includedir="${includedir:-$(prefix_subdir include)}" if test "$mingw32" = "yes" ; then bindir="${bindir:-$prefix}" else -bindir="${bindir:-$prefix/bin}" +bindir="${bindir:-$(prefix_subdir bin)}" fi -mandir="${mandir:-$prefix/share/man}" -datadir="${datadir:-$prefix/share}" -docdir="${docdir:-$prefix/share/doc}" -sysconfdir="${sysconfdir:-$prefix/etc}" -local_statedir="${local_statedir:-$prefix/var}" +mandir="${mandir:-$(prefix_subdir share/man)}" +datadir="${datadir:-$(prefix_subdir share)}" +docdir="${docdir:-$(prefix_subdir share/doc)}" +sysconfdir="${sysconfdir:-$(prefix_subdir etc)}" +local_statedir="${local_statedir:-$(prefix_subdir var)}" firmwarepath="${firmwarepath:-$datadir/qemu-firmware}" localedir="${localedir:-$datadir/locale}" @@ -3763,6 +3773,11 @@ if test "$skip_meson" = no; then mv $cross config-meson.cross rm -rf meson-private meson-info meson-logs + + # Workaround for a meson bug preventing empty prefixes: + # see https://github.com/mesonbuild/meson/issues/6946. + prefix="${prefix:-/}" + run_meson() { NINJA=$ninja $meson setup \ --prefix "$prefix" \ -- 2.35.1.265.g69c8d7142f-goog
I have a Question? Can you build a version with German and English and Spain and France and Russia and Chinese Language Pack and Handbook Installation Guide in PDF and Paper book with all Function in
I have a question: can you build a version with German, English, Spanish, French, Russian and Chinese language packs, plus a handbook and installation guide as a PDF and as a paper book, covering all functions on Linux, Unix and Windows?

The book should explain how to configure:
- Raspberry Pi emulation, Android emulation, and iOS, Mac and SPARC Unix emulation
- IBM PC DOS/Windows/Linux x86/amd64 emulation
- Sound Blaster Pro, 16, AWE32, AWE64 and Live, AdLib and PC speaker emulation

It should also include an installation guide for all operating systems that covers:
- mounting and unmounting VHD, IMG and VHDX hard disk images
- mounting and unmounting IMA, VFD and IMG floppy images
- mounting and unmounting ISO CD images
- copy tools from a hard disk to VHD, VHDX and IMG, with format conversion
- a copy tool from a real floppy to IMG, IMA and VFD
- a copy tool from a real CD to ISO, and a conversion tool from all CD image files (NRG, BIN and others) to ISO
- a tool to create hard disk images, a partitioning and formatting tool, and a bootable install CD/floppy
- a backup and restore tool from hard disk images and partitions to a backup file

A guide for this in German, please.

Best regards
Daniel Frank Nommensen
A Virtual Bios in Qemu with GUI by Boot QEMU with SHORTCUT
Hello,

can you build a virtual BIOS with hardware simulation/emulation in which the hardware can be changed?

- CPU option: x86, x64, SPARC, ARM, ARM64, Android, iOS, Mac.
- CPU standard config name: the original CPU of the hardware, or a simulated/emulated one: all Intel Celeron, Centrino, Atom, Pentium (MMX) 1/2/3/4, i3, i5, i7, i9; all Mac processors; all AMD processors; all mobile/smartphone processors; all single-board CPUs and hardware (Raspberry Pi Zero, 1, 2, 3, 4, CM4, 400, Rock Pi and all others), or an optional config; and retro computers (Amiga, Commodore, Spectrum, Sinclair, IBM PC DOS, all CPUs from the 8086 up to the 486).
- CPU cores, maximum to use, manually selectable: 1 up to 256 cores.
- CPU core speed: from min. 1 MHz up to the max. CPU speed of the hardware.
- CPU architecture simulation: 8-bit, 16-bit, 32-bit, 64-bit (future: quantum processor).
- GPU / graphics card simulation/emulation: the original GPU of the hardware, or a simulated/emulated graphics card: all retro cards (Matrox, S3, NVIDIA, ATI for DOS/Windows) and all others from other hardware.
- Graphics card memory: from 128 kilobytes up to the maximum of the GPU.
- Graphics card cores: 1 up to the maximum of the graphics card.
- Autodetection of 1 to 4 graphics cards.
- Graphics card architecture: from 8-bit up to 64-bit.
- Graphics card simulation: Hercules, CGA, EGA, VGA, SVGA and greater; 4:3, 16:9, 16:10.
- Graphics resolution, min to max: all DOS modes, 320x240, 640x480, 1024x768, 1280x1024, 1600x1200, HD Ready, Full HD, 2K, 4K, 8K.
- RAM: from 0.5 MB up to the maximum of the hardware.
- Mount 26 drives: floppy, hard disk, SD card, CD/DVD/Blu-ray; real SATA/USB, or emulated hardware controllers (master/slave IDE/ATA, FDD), or an image file.
- Sound card emulation: the real card, or Creative Sound Blaster, Terratec, Roland, AdLib, and/or PC speaker.
- Boot option 1/2/3/4: 1 = first drive or USB boot, 2 = second drive (image) or USB boot, 3 = third drive boot, 4 = fourth drive boot.

I would also like to be able to change the CD/DVD/Blu-ray/floppy image, by file, folder or image file, via shortcuts in the system.

Best regards
Daniel Frank Nommensen
Re: [PATCH v2 4/4] hw/openrisc/openrisc_sim: Add support for initrd loading
On Thu, Feb 17, 2022 at 06:04:32PM +, Peter Maydell wrote:
> On Thu, 10 Feb 2022 at 13:13, Stafford Horne wrote:
> >
> > The initrd passed via the command line is loaded into memory. Its
> > location and size are then added to the device tree so the kernel knows
> > where to find it.
> >
> > Signed-off-by: Stafford Horne
> > ---
> >  hw/openrisc/openrisc_sim.c | 32 +++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
> > index d7c26af82c..5354797e20 100644
> > --- a/hw/openrisc/openrisc_sim.c
> > +++ b/hw/openrisc/openrisc_sim.c
> > @@ -187,6 +187,32 @@ static hwaddr openrisc_load_kernel(ram_addr_t ram_size,
> >      return 0;
> >  }
> >
> > +static hwaddr openrisc_load_initrd(Or1ksimState *s, const char *filename,
> > +    hwaddr load_start, uint64_t mem_size)
>
> Indentation here is off.

Ah, I was going off the indentation already in the file. I will fix this,
i.e. it should be:

static hwaddr openrisc_load_initrd(Or1ksimState *s, const char *filename,
                                   hwaddr load_start, uint64_t mem_size)

That's why it's like that everywhere. I might as well add another patch to fix
up the indentation.

> Otherwise
> Reviewed-by: Peter Maydell

Thanks!
Re: [PATCH] hw/ide: implement ich6 ide controller support
On Mon, Feb 14, 2022 at 1:48 PM Liav Albani wrote: > > Hello BALATON, > > Thank you for helping keeping this patch noticeable to everyone :) > > I tried to reach out to John via a private email last Saturday (two days > ago) so I don't "spam" the mailing list for no good reason. > It might be that I should actually refrain from doing so and talk to the > maintainer directly on the mailing list once the patch > has been submitted to the mailing list. > I've not yet seen any response from John so I assume it's a matter of > days before he can take care of this. > > Best regards, > Liav > > On 2/14/22 14:26, BALATON Zoltan wrote: > > On Sat, 5 Feb 2022, BALATON Zoltan wrote: > >> Hello, > > > > Ping? John, do you agree with my comments? Should Liav proceed to send > > a v2? > > > > Thanks, > > BALATON Zoltan > > > >> On Sat, 5 Feb 2022, Liav Albani wrote: > >>> On 2/5/22 17:48, BALATON Zoltan wrote: > On Sat, 5 Feb 2022, Liav Albani wrote: > > This type of IDE controller has support for relocating the IO > > ports and > > doesn't use IRQ 14 and 15 but one allocated PCI IRQ for the > > controller. > > I haven't looked at in detail so only a few comments I've got while > reading it. What machine needs this? In QEMU I think we only have > piix and ich9 emulated for pc and q35 machines but maybe ich6 is > also used by some machine I don't know about. Otherwise it looks > odd to have ide part of ich6 but not the other parts of this chip. > > >>> Hi BALATON, > >>> > >>> This is my first patch to QEMU and the first time I send patches > >>> over the mail. I sent my github tree to John Snow (the maintainer of > >>> the IDE code in QEMU) for advice if I should send them here and I > >>> was encouraged to do that. > >> > >> Welcome and thanks a lot for taking time to contribute and share your > >> results. 
In case you're not yet aware, these docs should explain how > >> patches are handled on the list: > >> > >> https://www.qemu.org/docs/master/devel/submitting-a-patch.html > >> > >>> For the next patch I'll make a note to write a descriptive > >>> cover letter, as it could have given more valuable details on why I > >>> sent this patch. > >>> > >>> There's no such machine type emulating the ICH6 chipset in QEMU. > >>> However, I wrote this emulation component as a test for the > >>> SerenityOS kernel because I have a machine from 2009 which has > >>> an ICH7 southbridge, so I wanted to emulate such a device with QEMU > >>> to ease development on it. > >>> > >>> I found out that Linux with libata was using the controller without > >>> any noticeable problems, but the SerenityOS kernel struggled to use > >>> this device, so I decided that > >>> I should send this patch to get it merged and then I can use it > >>> locally and maybe other people will benefit from it. > >>> > >>> In regard to other components of the ICH6 chipset - I don't think > >>> it's worth anybody's time to actually implement them as the ICH9 > >>> chipset is quite close to what the ICH6 chipset offers as far as I > >>> can tell. > >>> The idea of implementing the ich6-ide controller was to let > >>> people like me and other OS developers ensure their > >>> kernels operate correctly on this type of device, > >>> which is a legacy-free device with respect to PCI bus resource > >>> management but is still a legacy device which belongs to chipsets of > >>> the late 2000s. > >> > >> That's OK, maybe a short mention (just one sentence) in the commit > >> message explaining this would help to understand why this device > >> model was added.
> >> > > Signed-off-by: Liav Albani > > --- > > hw/i386/Kconfig | 2 + > > hw/ide/Kconfig | 5 + > > hw/ide/bmdma.c | 83 +++ > > hw/ide/ich6.c| 211 > > +++ > > hw/ide/meson.build | 3 +- > > hw/ide/piix.c| 50 +- > > include/hw/ide/pci.h | 5 + > > include/hw/pci/pci_ids.h | 1 + > > 8 files changed, 311 insertions(+), 49 deletions(-) > > create mode 100644 hw/ide/bmdma.c > > create mode 100644 hw/ide/ich6.c > > > > diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig > > index d22ac4a4b9..a18de2d962 100644 > > --- a/hw/i386/Kconfig > > +++ b/hw/i386/Kconfig > > @@ -75,6 +75,7 @@ config I440FX > > select PCI_I440FX > > select PIIX3 > > select IDE_PIIX > > +select IDE_ICH6 > > select DIMM > > select SMBIOS > > select FW_CFG_DMA > > @@ -101,6 +102,7 @@ config Q35 > > select PCI_EXPRESS_Q35 > > select LPC_ICH9 > > select AHCI_ICH9 > > +select IDE_ICH6 > > select DIMM > > select SMBIOS > > select FW_CFG_DMA > > diff --git a/hw/ide/Kconfig b/hw/ide/Kconfig > > index dd85fa3619..63304325a5 100644 > > --- a/hw/ide/Kconfig > > +++
Re: [PATCH] ppc/spapr: Advertise StoreEOI for POWER10 compat guests
On 2/17/22 10:23, Cédric Le Goater wrote: On 2/17/22 12:28, Daniel Henrique Barboza wrote: On 2/14/22 11:11, Cédric Le Goater wrote: When an interrupt has been handled, the OS notifies the interrupt controller with an EOI sequence. On POWER9 and POWER10 systems using the XIVE interrupt controller, this can be done with a load or a store operation on the ESB interrupt management page of the interrupt. The StoreEOI operation has less latency and improves interrupt handling performance but it was deactivated during the POWER9 DD2.0 timeframe because of ordering issues. POWER9 systems use the LoadEOI instead. POWER10 compat guests should have fixed the issue with Load-after-Store ordering and StoreEOI can be activated for them again. To maintain performance, this ordering is only enforced for the XIVE_ESB_SET_PQ_10 load operation. This operation can be used to temporarily disable an interrupt source. If StoreEOI is active, a source could be left enabled if the load and store operations come out of order. Add a check in our XIVE emulation model for Load-after-Store when StoreEOI is active. It should catch unreliable sequences. Other load operations should be fine without it. Signed-off-by: Cédric Le Goater --- Reviewed-by: Daniel Henrique Barboza Unfortunately, this patch breaks migration under TCG because the XIVE source flag is not updated on the target side. KVM is not impacted because the emulated sources are not used. This needs to be addressed in a v2. That said, even without this patch, TCG migration is broken. Some CPUs on the receive side are stalled on CPU Hard LOCKUPs. QEMU 6.2 is impacted. So it has been a while :/ I've done a few tests and I can see Hard Lockups with TCG pseries migration, when using multiple CPUs (I used -smp 4 like you suggested in private), since at least QEMU v6.0.0. This is hardly surprising since TCG migration isn't something that we ever supported in a product or even in the community*.
It would be good to understand why and get it fixed, but for now we can take a bit of comfort in knowing that: - it has been broken for a while (if it ever worked). If this was a recent 7.0 regression we would need to solve it for this upcoming release; - single CPU TCG migration seems to be working fine, so we can count on this TCG migration scenario for testing. * I'm hoping David and Greg can push back on this if my assumption is wrong. Thanks, Daniel See below. C. [ 24.113608] watchdog: CPU 0 detected hard LOCKUP on other CPUs 1,3 [ 24.116534] watchdog: CPU 0 TB:15585461459, last SMP heartbeat TB:7394335409 (15998ms ago) [ 24.117840] watchdog: CPU 1 Hard LOCKUP [ 24.117956] watchdog: CPU 1 TB:15587843000, last heartbeat TB:5355690415 (19984ms ago) [ 24.117999] Modules linked in: [ 24.118387] irq event stamp: 341399 [ 24.118399] hardirqs last enabled at (341399): [] snooze_loop+0x9c/0x290 [ 24.118900] hardirqs last disabled at (341398): [] do_idle+0x12c/0x450 [ 24.118943] softirqs last enabled at (9798): [] __do_softirq+0x60c/0x678 [ 24.118971] softirqs last disabled at (9789): [] __irq_exit_rcu+0x158/0x1c0 [ 24.119127] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc4-dirty #984 [ 24.119293] NIP: c0caea78 LR: c0caea38 CTR: c0cae990 [ 24.119315] REGS: c000fff43d60 TRAP: 0100 Not tainted (5.17.0-rc4-dirty) [ 24.119352] MSR: 8280b033 CR: 28000228 XER: 0006 [ 24.119554] CFAR: c0caea98 IRQMASK: 0 [ 24.119554] GPR00: c0caea2c c2bbbd80 c1c30b00 [ 24.119554] GPR04: 0006 c800 c1c7dc38 [ 24.119554] GPR08: c2b5d500 0003a115ef39 36d551ed [ 24.119554] GPR12: c0cae990 c000f300 [ 24.119554] GPR16: [ 24.119554] GPR20: c1b3a660 [ 24.119554] GPR24: c000ffa4fb48 00059d7c5070 c1c78e48 [ 24.119554] GPR28: c1b3a660 c15422e0 c15422e8 [ 24.119845] NIP [c0caea78] snooze_loop+0xe8/0x290 [ 24.119866] LR [c0caea38] snooze_loop+0xa8/0x290 [ 24.119998] Call Trace: [ 24.120029] [c2bbbd80] [c0caea2c] snooze_loop+0x9c/0x290 (unreliable) [ 24.120097] [c2bbbdc0] [c0cab730]
cpuidle_enter_state+0x300/0x730 [ 24.120119] [c2bbbe30] [c0cabbfc] cpuidle_enter+0x4c/0x70 [ 24.120131] [c2bbbe70] [c0208d98] do_idle+0x328/0x450 [ 24.120141] [c2bbbf00] [c020926c] cpu_startup_entry+0x3c/0x40 [ 24.120150] [c2bbbf30] [c005e144] start_secondary+0x2a4/0x2b0 [ 24.120161] [c2bbbf90] [c000d054] start_secondary_prolog+0x10/0x14 [ 24.120238] Instruction
Re: [PATCH v2] ide: Increment BB in-flight counter for TRIM BH
On Tue, Feb 15, 2022 at 12:14 PM Hanna Reitz wrote: > > Ping > > (I can take it too, if you’d like, John, but you’re listed as the only > maintainer for hw/ide, so... Just say the word, though!) > Sorry, I sent you a mail off-list at the time where I said you were free to take it whenever you like. Why'd I send it off-list? I don't know Please feel free to send this with your next block PR. --js > On 20.01.22 15:22, Hanna Reitz wrote: > > When we still have an AIOCB registered for DMA operations, we try to > > settle the respective operation by draining the BlockBackend associated > > with the IDE device. > > > > However, this assumes that every DMA operation is associated with an > > increment of the BlockBackend’s in-flight counter (e.g. through some > > ongoing I/O operation), so that draining the BB until its in-flight > > counter reaches 0 will settle all DMA operations. That is not the case: > > For TRIM, the guest can issue a zero-length operation that will not > > result in any I/O operation forwarded to the BlockBackend, and also not > > increment the in-flight counter in any other way. In such a case, > > blk_drain() will be a no-op if no other operations are in flight. > > > > It is clear that if blk_drain() is a no-op, the value of > > s->bus->dma->aiocb will not change between checking it in the `if` > > condition and asserting that it is NULL after blk_drain(). > > > > The particular problem is that ide_issue_trim() creates a BH > > (ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is > > ide_dma_cb(), which will either create a new request, or find the > > transfer to be done and call ide_set_inactive(), which clears > > s->bus->dma->aiocb. Therefore, the blk_drain() must wait for > > ide_trim_bh_cb() to run, which currently it will not always do. 
> > > > To fix this issue, we increment the BlockBackend's in-flight counter > > when the TRIM operation begins (in ide_issue_trim(), when the > > ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb() > > is done. > > > > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980 > > Suggested-by: Paolo Bonzini > > Signed-off-by: Hanna Reitz > > --- > > v1: > > https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html > > > > v2: > > - Increment BB’s in-flight counter while the BH is active so that > >blk_drain() will poll until the BH is done, as suggested by Paolo > > > > (No git-backport-diff, because this patch was basically completely > > rewritten, so it wouldn’t be worth it.) > > --- > > hw/ide/core.c | 7 +++ > > 1 file changed, 7 insertions(+) >
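The reasoning in the commit message above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `ToyBackend`, `toy_issue_trim()`, `toy_trim_bh()` and `toy_drain()` are hypothetical stand-ins invented for this illustration, not the real BlockBackend or IDE code, and the "bottom half" is reduced to a single pending flag.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a block backend; NOT QEMU's BlockBackend. */
typedef struct ToyBackend {
    int in_flight;    /* operations a drain has to wait for */
    bool bh_pending;  /* a deferred completion callback is scheduled */
    bool bh_counted;  /* the pending callback holds an in-flight reference */
    bool dma_active;  /* models "s->bus->dma->aiocb is not NULL" */
} ToyBackend;

/* Deferred completion: settles the request, then drops its reference. */
static void toy_trim_bh(ToyBackend *b)
{
    b->bh_pending = false;
    b->dma_active = false;
    if (b->bh_counted) {
        b->bh_counted = false;
        b->in_flight--;
    }
}

/* Zero-length TRIM: no I/O is submitted, only the callback is scheduled. */
static void toy_issue_trim(ToyBackend *b, bool count_bh)
{
    b->dma_active = true;
    b->bh_pending = true;
    b->bh_counted = count_bh;
    if (count_bh) {
        b->in_flight++;  /* the idea of the fix: make drain see the callback */
    }
}

/* Drain polls (and thereby runs callbacks) only while work is in flight. */
static void toy_drain(ToyBackend *b)
{
    while (b->in_flight > 0) {
        if (!b->bh_pending) {
            break;  /* nothing left in this model that could make progress */
        }
        toy_trim_bh(b);
    }
}

/* Returns whether the DMA request is still outstanding after a drain. */
bool toy_dma_active_after_drain(bool count_bh)
{
    ToyBackend b = { 0, false, false, false };

    toy_issue_trim(&b, count_bh);
    toy_drain(&b);
    return b.dma_active;
}
```

In this model, `toy_dma_active_after_drain(false)` leaves the request outstanding because the drain is a no-op, while `toy_dma_active_after_drain(true)` settles it, which mirrors the difference between v1 behaviour and the approach taken in v2.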
Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE
On Thu, Feb 17, 2022, at 5:06 AM, Chao Peng wrote: > On Fri, Feb 11, 2022 at 03:33:35PM -0800, Andy Lutomirski wrote: >> On 1/18/22 05:21, Chao Peng wrote: >> > From: "Kirill A. Shutemov" >> > >> > Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of >> > the file is inaccessible from userspace through ordinary MMU access >> > (e.g., read/write/mmap). However, the file content can be accessed >> > via a different mechanism (e.g. KVM MMU) indirectly. >> > >> > It provides semantics required for KVM guest private memory support >> > that a file descriptor with this seal set is going to be used as the >> > source of guest memory in confidential computing environments such >> > as Intel TDX/AMD SEV but may not be accessible from host userspace. >> > >> > At this time only shmem implements this seal. >> > >> >> I don't dislike this *that* much, but I do dislike this. F_SEAL_INACCESSIBLE >> essentially transmutes a memfd into a different type of object. While this >> can apparently be done successfully and without races (as in this code), >> it's at least awkward. I think that either creating a special inaccessible >> memfd should be a single operation that creates the correct type of object or >> there should be a clear justification for why it's a two-step process. > > Now one justification may be from Stever's comment to patch-00: for ARM > usage it can be used with creating a normal memfd, (partially) populate > it with initial guest memory content (e.g. firmware), and then > F_SEAL_INACCESSIBLE it just before the first launch of the guest in > KVM (definitely the current code needs to be changed to support that). Except we don't allow F_SEAL_INACCESSIBLE on a non-empty file, right? So this won't work. In any case, the whole confidential VM initialization story is a bit muddy. From the earlier emails, it sounds like ARM expects the host to fill in guest memory and measure it.
From my recollection of Intel's scheme (which may well be wrong, and I could easily be confusing it with SGX), TDX instead measures what is essentially a transcript of the series of operations that initializes the VM. These are fundamentally not the same thing even if they accomplish the same end goal. For TDX, we unavoidably need an operation (ioctl or similar) that initializes things according to the VM's instructions, and ARM ought to be able to use roughly the same mechanism. Also, if we ever get fancy and teach the page allocator about memory with reduced directmap permissions, it may well be more efficient for userspace to shove data into a memfd via ioctl than it is to mmap it and write the data.
Re: [PATCH v5 03/16] tests/fp/berkeley-testfloat-3: Ignore ignored #pragma directives
On Mon, 14 Feb 2022 at 18:56, Philippe Mathieu-Daudé wrote: > > Since we already use -Wno-unknown-pragmas, we can also use > -Wno-ignored-pragmas. This silences hundreds of warnings using > clang 13 on macOS Monterey: > > [409/771] Compiling C object > tests/fp/libtestfloat.a.p/berkeley-testfloat-3_source_test_az_f128_rx.c.o > ../tests/fp/berkeley-testfloat-3/source/test_az_f128_rx.c:49:14: warning: > '#pragma FENV_ACCESS' is not supported on this target - ignored > [-Wignored-pragmas] > #pragma STDC FENV_ACCESS ON >^ > 1 warning generated. > GCC doesn't know about -Wignored-pragmas, so this change is relying on the GCC "ignore a -Wno-something that this gcc doesn't recognize if we wouldn't otherwise be complaining about something" behaviour. I forget which GCC version that was introduced in... (This is why configure has the cc_has_warning_flag() test before it tries to use a warning/warning-suppression option.) -- PMM
Re: qemu crash 100% CPU with Ubuntu10.04 guest (solved)
On Thu, Feb 17, 2022 at 12:07:15PM +1100, Ben Smith wrote: > Hi All, > > I'm cross-posting this from Reddit qemu_kvm, in case it helps in some > way. I know my setup is ancient and unique; let me know if you would > like more info. > > Symptoms: > 1. Ubuntu10.04 32-bit guest locks up randomly between 0 and 30 days. > 2. The console shows a CPU trace dump, nothing else logged on the guest or > host. > 3. Host system (Ubuntu20.04) 100% CPU for qemu process. > > Solution: > When using virt-install, always use the "--os-variant" parameter! > e.g. --os-variant ubuntu10.04 FWIW, the --os-variant / --osinfo argument is going to be mandatory starting with the upcoming virt-manager release. https://listman.redhat.com/archives/virt-tools-list/2022-February/msg00021.html -- Andrea Bolognani / Red Hat / Virtualization
Re: [PATCH v5 02/16] configure: Allow passing extra Objective C compiler flags
On Mon, 14 Feb 2022 at 18:56, Philippe Mathieu-Daudé wrote: > > We can pass C/CPP/LD flags via CFLAGS/CXXFLAGS/LDFLAGS environment > variables, or via configure --extra-cflags / --extra-cxxflags / > --extra-ldflags options. Provide similar behavior for Objective C: > use existing flags from $OBJCFLAGS, or passed via --extra-objcflags. > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Peter Maydell thanks -- PMM
Re: [PATCH v5 01/16] MAINTAINERS: Add Akihiko Odaki to macOS-relateds
On Mon, 14 Feb 2022 at 18:56, Philippe Mathieu-Daudé wrote: > > From: Akihiko Odaki > > Signed-off-by: Akihiko Odaki > Reviewed-by: Christian Schoenebeck > Reviewed-by: Philippe Mathieu-Daudé > Message-Id: <20220213021215.1974-1-akihiko.od...@gmail.com> > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Peter Maydell thanks -- PMM
Re: Call for GSoC and Outreachy project ideas for summer 2022
On 28/01/2022 16.47, Stefan Hajnoczi wrote: Dear QEMU, KVM, and rust-vmm communities, QEMU will apply for Google Summer of Code 2022 (https://summerofcode.withgoogle.com/) and has been accepted into Outreachy May-August 2022 (https://www.outreachy.org/). You can now submit internship project ideas for QEMU, KVM, and rust-vmm! If you have experience contributing to QEMU, KVM, or rust-vmm you can be a mentor. It's a great way to give back and you get to work with people who are just starting out in open source. Please reply to this email by February 21st with your project ideas. I'd like to suggest an idea (shamelessly "inspired" by Philippe's suggestion last year): === Improve s390x (IBM Z) emulation with RISU === Summary: Adapt RISU to s390x and fix CPU emulation along the way. RISU (Random Instruction Sequence generator for Userspace testing) is a tool for testing CPU instructions with randomly generated opcodes. The goal of this project is to adapt the RISU framework for the IBM Z architecture (a.k.a. s390x), so that it could be used to test the s390x emulation of QEMU for correctness. This will certainly help to spot some instruction emulation deficiencies in QEMU which should be addressed during this internship, too. '''Links:''' * [https://git.linaro.org/people/peter.maydell/risu.git/tree/ Peter Maydell's RISU repository] * [https://www.linux-kvm.org/images/6/63/01x03-ValidatingTCG.pdf KVM Forum 2014 presentation by Alex Bennée] * [http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf z/Architecture Principles of Operation] (the description of the CPU instructions) '''Details:''' * Skill level: intermediate (a good basic understanding of CPU instructions is required) * Language: C, Perl * Mentor: Thomas Huth (th...@redhat.com) (+1 TBD) What do you think about that idea? Thanks, Thomas
Re: [PATCH 3/4] hw/openrisc/openrisc_sim: Add support for loading a device tree
On Thu, 10 Feb 2022 at 06:46, Stafford Horne wrote: > > Using the device tree means that qemu can now directly tell > the kernel what hardware is configured rather than us having > to maintain and update a separate device tree file. > > This patch adds device tree support for the OpenRISC simulator. > A device tree is built up based on the state of the configured > openrisc simulator. This sounds like it's support for creating a device tree? Support for loading a device tree would be "the user passes us a filename of a dtb file". (This is mostly a quibble about commit message wording.) > -static void openrisc_load_kernel(ram_addr_t ram_size, > +static hwaddr openrisc_load_kernel(ram_addr_t ram_size, > const char *kernel_filename) Indentation looks off now ? > { > long kernel_size; > uint64_t elf_entry; > +uint64_t high_addr; > hwaddr entry; > > if (kernel_filename && !qtest_enabled()) { > kernel_size = load_elf(kernel_filename, NULL, NULL, NULL, > - &elf_entry, NULL, NULL, NULL, 1, EM_OPENRISC, > - 1, 0); > + &elf_entry, NULL, &high_addr, NULL, 1, > + EM_OPENRISC, 1, 0); > entry = elf_entry; > if (kernel_size < 0) { > kernel_size = load_uimage(kernel_filename, > &entry, NULL, NULL, NULL, NULL); > +high_addr = entry + kernel_size; > } > if (kernel_size < 0) { > kernel_size = load_image_targphys(kernel_filename, >KERNEL_LOAD_ADDR, >ram_size - KERNEL_LOAD_ADDR); > +high_addr = KERNEL_LOAD_ADDR + kernel_size; > } > > if (entry <= 0) { > @@ -168,7 +181,139 @@ static void openrisc_load_kernel(ram_addr_t ram_size, > exit(1); > } > boot_info.bootstrap_pc = entry; > + > +return high_addr; > +} > +return 0; > +} > + > +static uint32_t openrisc_load_fdt(Or1ksimState *s, hwaddr load_start, > +uint64_t mem_size) Indentation again.
> +{ > +uint32_t fdt_addr; > +int fdtsize = fdt_totalsize(s->fdt); > + > +if (fdtsize <= 0) { > +error_report("invalid device-tree"); > +exit(1); > +} > + > +/* We should put fdt right after the kernel */ You change this comment in patch 4 -- I think you might as well just use that text in this patch to start with. > +fdt_addr = ROUND_UP(load_start, 4); > + > +fdt_pack(s->fdt); fdt_pack() returns an error code -- you should check it. > +/* copy in the device tree */ > +qemu_fdt_dumpdtb(s->fdt, fdtsize); > + > +rom_add_blob_fixed_as("fdt", s->fdt, fdtsize, fdt_addr, > + &address_space_memory); > + > +return fdt_addr; > +} > + > +static void openrisc_create_fdt(Or1ksimState *s, > +const struct MemmapEntry *memmap, int num_cpus, uint64_t mem_size, > +const char *cmdline) Indentation. > +{ > +void *fdt; > +int cpu; > +char *nodename; > +int pic_ph; > + > +fdt = s->fdt = create_device_tree(&s->fdt_size); > +if (!fdt) { > +error_report("create_device_tree() failed"); > +exit(1); > +} > + > +qemu_fdt_setprop_string(fdt, "/", "compatible", "opencores,or1ksim"); > +qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x1); > +qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x1); > + > +nodename = g_strdup_printf("/memory@%lx", > + (long)memmap[OR1KSIM_DRAM].base); Use the appropriate format string macro for the type, rather than casting to long (here and below).
> +qemu_fdt_add_subnode(fdt, nodename); > +qemu_fdt_setprop_cells(fdt, nodename, "reg", > + memmap[OR1KSIM_DRAM].base, mem_size); > +qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory"); > +g_free(nodename); > + > +qemu_fdt_add_subnode(fdt, "/cpus"); > +qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0); > +qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1); > + > +for (cpu = 0; cpu < num_cpus; cpu++) { > +nodename = g_strdup_printf("/cpus/cpu@%d", cpu); > +qemu_fdt_add_subnode(fdt, nodename); > +qemu_fdt_setprop_string(fdt, nodename, "compatible", > +"opencores,or1200-rtlsvn481"); > +qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu); > +qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency", > + OR1KSIM_CLK_MHZ); > +g_free(nodename); > +} > + > +if (num_cpus > 0) { > +nodename = g_strdup_printf("/ompic@%lx", > + (long)memmap[OR1KSIM_OMPIC].base); > +qemu_fdt_add_subnode(fdt, nodename); > +qemu_fdt_setprop_string(fdt, nodename, "compatible", >
Re: Call for GSoC and Outreachy project ideas for summer 2022
On 2/14/22 15:01, Stefan Hajnoczi wrote: Thanks for this idea. As a stretch goal we could add implementing the packed virtqueue layout in Linux vhost, QEMU's libvhost-user, and/or QEMU's virtio qtest code. Why not have a separate project for packed virtqueue layout? Paolo
Re: [PATCH v2 4/4] hw/openrisc/openrisc_sim: Add support for initrd loading
On Thu, 10 Feb 2022 at 13:13, Stafford Horne wrote: > > The initrd passed via the command line is loaded into memory. Its > location and size are then added to the device tree so the kernel knows > where to find it. > > Signed-off-by: Stafford Horne > --- > hw/openrisc/openrisc_sim.c | 32 +++- > 1 file changed, 31 insertions(+), 1 deletion(-) > > diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c > index d7c26af82c..5354797e20 100644 > --- a/hw/openrisc/openrisc_sim.c > +++ b/hw/openrisc/openrisc_sim.c > @@ -187,6 +187,32 @@ static hwaddr openrisc_load_kernel(ram_addr_t ram_size, > return 0; > } > > +static hwaddr openrisc_load_initrd(Or1ksimState *s, const char *filename, > +hwaddr load_start, uint64_t mem_size) Indentation here is off. Otherwise Reviewed-by: Peter Maydell thanks -- PMM
[PATCH v5 14/15] docs: Add documentation for SR-IOV and Virtualization Enhancements
Signed-off-by: Lukasz Maniak --- docs/system/devices/nvme.rst | 82 1 file changed, 82 insertions(+) diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst index b5acb2a9c19..aba253304e4 100644 --- a/docs/system/devices/nvme.rst +++ b/docs/system/devices/nvme.rst @@ -239,3 +239,85 @@ The virtual namespace device supports DIF- and DIX-based protection information to ``1`` to transfer protection information as the first eight bytes of metadata. Otherwise, the protection information is transferred as the last eight bytes. + +Virtualization Enhancements and SR-IOV (Experimental Support) +- + +The ``nvme`` device supports Single Root I/O Virtualization and Sharing +along with Virtualization Enhancements. The controller has to be linked to +an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV. + +A number of parameters are present (**please note, that they may be +subject to change**): + +``sriov_max_vfs`` (default: ``0``) + Indicates the maximum number of PCIe virtual functions supported + by the controller. Specifying a non-zero value enables reporting of both + SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities + by the NVMe device. Virtual function controllers will not report SR-IOV. + +``sriov_vq_flexible`` + Indicates the total number of flexible queue resources assignable to all + the secondary controllers. Implicitly sets the number of primary + controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``. + +``sriov_vi_flexible`` + Indicates the total number of flexible interrupt resources assignable to + all the secondary controllers. Implicitly sets the number of primary + controller's private resources to ``(msix_qsize - sriov_vi_flexible)``. + +``sriov_max_vi_per_vf`` (default: ``0``) + Indicates the maximum number of virtual interrupt resources assignable + to a secondary controller. 
The default ``0`` resolves to + ``(sriov_vi_flexible / sriov_max_vfs)`` + +``sriov_max_vq_per_vf`` (default: ``0``) + Indicates the maximum number of virtual queue resources assignable to + a secondary controller. The default ``0`` resolves to + ``(sriov_vq_flexible / sriov_max_vfs)`` + +The simplest possible invocation enables the capability to set up one VF +controller and assign an admin queue, an IO queue, and a MSI-X interrupt. + +.. code-block:: console + + -device nvme-subsys,id=subsys0 + -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1, +sriov_vq_flexible=2,sriov_vi_flexible=1 + +The minimum steps required to configure a functional NVMe secondary +controller are: + + * unbind flexible resources from the primary controller + +.. code-block:: console + + nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0 + nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0 + + * perform a Function Level Reset on the primary controller to actually +release the resources + +.. code-block:: console + + echo 1 > /sys/bus/pci/devices/:01:00.0/reset + + * enable VF + +.. code-block:: console + + echo 1 > /sys/bus/pci/devices/:01:00.0/sriov_numvfs + + * assign the flexible resources to the VF and set it ONLINE + +.. code-block:: console + + nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1 + nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2 + nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0 + + * bind the NVMe driver to the VF + +.. code-block:: console + + echo :01:00.1 > /sys/bus/pci/drivers/nvme/bind \ No newline at end of file -- 2.25.1
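The resource arithmetic described in the documentation patch above can be sketched as follows. This is an illustration only: `SriovParams` and the helper functions are hypothetical names invented here, not part of the NVMe device model, and the `max_ioqpairs` and `msix_qsize` values used in the note below are assumed purely for the example.

```c
#include <stdint.h>

/* Hypothetical container for the documented parameters (illustration only). */
typedef struct SriovParams {
    uint32_t max_ioqpairs;
    uint32_t msix_qsize;
    uint32_t sriov_max_vfs;
    uint32_t sriov_vq_flexible;
    uint32_t sriov_vi_flexible;
    uint32_t sriov_max_vq_per_vf;  /* 0 means "derive from the totals" */
    uint32_t sriov_max_vi_per_vf;  /* 0 means "derive from the totals" */
} SriovParams;

/* Queue resources that stay private to the primary controller. */
uint32_t sriov_primary_private_vq(const SriovParams *p)
{
    return p->max_ioqpairs - p->sriov_vq_flexible;
}

/* Interrupt resources that stay private to the primary controller. */
uint32_t sriov_primary_private_vi(const SriovParams *p)
{
    return p->msix_qsize - p->sriov_vi_flexible;
}

/* Per-VF queue limit: explicit value, else flexible total / number of VFs. */
uint32_t sriov_effective_max_vq_per_vf(const SriovParams *p)
{
    if (p->sriov_max_vq_per_vf != 0) {
        return p->sriov_max_vq_per_vf;
    }
    /* Guard against division by zero when SR-IOV is disabled. */
    return p->sriov_max_vfs ? p->sriov_vq_flexible / p->sriov_max_vfs : 0;
}

/* Per-VF interrupt limit, resolved the same way. */
uint32_t sriov_effective_max_vi_per_vf(const SriovParams *p)
{
    if (p->sriov_max_vi_per_vf != 0) {
        return p->sriov_max_vi_per_vf;
    }
    return p->sriov_max_vfs ? p->sriov_vi_flexible / p->sriov_max_vfs : 0;
}
```

With the invocation from the documentation (`sriov_max_vfs=1`, `sriov_vq_flexible=2`, `sriov_vi_flexible=1`) and assumed values of 64 I/O queue pairs and 65 MSI-X vectors, the primary controller would keep 62 queue and 64 interrupt resources, and the single VF could be given up to 2 queues and 1 interrupt.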
[RFC PATCH] virtio-net: Unlimit tx queue size if peer is vdpa
The code has limited the maximum tx queue size for backends other than vhost_user since the introduction of the configurable tx queue size in 9b02e1618cf2 ("virtio-net: enable configurable tx queue size"). Like vhost_user, vhost_vdpa devices should already deal with memory region crossings, so let's use the full tx size. Signed-off-by: Eugenio Pérez --- hw/net/virtio-net.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 49cd13314a..b1769bfee0 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -629,17 +629,20 @@ static int virtio_net_max_tx_queue_size(VirtIONet *n) NetClientState *peer = n->nic_conf.peers.ncs[0]; /* - * Backends other than vhost-user don't support max queue size. + * Backends other than vhost-user or vhost-vdpa don't support max queue + * size. */ if (!peer) { return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE; } -if (peer->info->type != NET_CLIENT_DRIVER_VHOST_USER) { +switch(peer->info->type) { +case NET_CLIENT_DRIVER_VHOST_USER: +case NET_CLIENT_DRIVER_VHOST_VDPA: +return VIRTQUEUE_MAX_SIZE; +default: return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE; -} - -return VIRTQUEUE_MAX_SIZE; +}; } static int peer_attach(VirtIONet *n, int index) -- 2.27.0
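The selection logic in the patch above can be modelled in isolation. This is a minimal sketch, not the QEMU function: `ToyPeerType`, `toy_max_tx_queue_size()` and the two size constants are hypothetical stand-ins chosen for illustration, and the numeric values are assumptions rather than quotations from the QEMU headers.

```c
/* Illustrative stand-in values; the real constants live in QEMU's headers. */
enum {
    TOY_TX_QUEUE_DEFAULT_SIZE = 256,
    TOY_VIRTQUEUE_MAX_SIZE = 1024
};

/* Hypothetical peer types, loosely mirroring the net client driver kinds. */
typedef enum ToyPeerType {
    TOY_PEER_NONE,        /* no peer attached */
    TOY_PEER_TAP,         /* any backend without large-queue support */
    TOY_PEER_VHOST_USER,
    TOY_PEER_VHOST_VDPA
} ToyPeerType;

/* Only vhost-user and vhost-vdpa peers get the full virtqueue size. */
int toy_max_tx_queue_size(ToyPeerType peer)
{
    switch (peer) {
    case TOY_PEER_VHOST_USER:
    case TOY_PEER_VHOST_VDPA:
        return TOY_VIRTQUEUE_MAX_SIZE;
    default:
        /* Covers the "no peer" case as well as every other backend. */
        return TOY_TX_QUEUE_DEFAULT_SIZE;
    }
}
```

Using a `switch` with a shared case body, as the patch does, keeps the list of backends that support the maximum size in one place, so adding another backend later is a one-line change.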
[PATCH v5 09/15] hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime
From: Łukasz Gieryk The NVMe device defines two properties: max_ioqpairs, msix_qsize. Having them as constants is problematic for SR-IOV support. SR-IOV introduces virtual resources (queues, interrupts) that can be assigned to PF and its dependent VFs. Each device, following a reset, should work with the configured number of queues. A single constant is no longer sufficient to hold the whole state. This patch tries to solve the problem by introducing additional variables in NvmeCtrl’s state. The variables for, e.g., managing queues are therefore organized as: - n->params.max_ioqpairs – no changes, constant set by the user - n->(mutable_state) – (not a part of this patch) user-configurable, specifies number of queues available _after_ reset - n->conf_ioqpairs - (new) used in all the places instead of the ‘old’ n->params.max_ioqpairs; initialized in realize() and updated during reset() to reflect user’s changes to the mutable state Since the number of available i/o queues and interrupts can change in runtime, buffers for sq/cqs and the MSIX-related structures are allocated big enough to handle the limits, to completely avoid the complicated reallocation. A helper function (nvme_update_msixcap_ts) updates the corresponding capability register, to signal configuration changes. Signed-off-by: Łukasz Gieryk --- hw/nvme/ctrl.c | 52 ++ hw/nvme/nvme.h | 2 ++ 2 files changed, 38 insertions(+), 16 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 7c1dd80f21d..f1b4026e4f8 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -445,12 +445,12 @@ static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid) static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid) { -return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1; +return sqid < n->conf_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1; } static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid) { -return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 
0 : -1; +return cqid < n->conf_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1; } static void nvme_inc_cq_tail(NvmeCQueue *cq) @@ -4188,8 +4188,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_err_invalid_create_sq_cqid(cqid); return NVME_INVALID_CQID | NVME_DNR; } -if (unlikely(!sqid || sqid > n->params.max_ioqpairs || -n->sq[sqid] != NULL)) { +if (unlikely(!sqid || sqid > n->conf_ioqpairs || n->sq[sqid] != NULL)) { trace_pci_nvme_err_invalid_create_sq_sqid(sqid); return NVME_INVALID_QID | NVME_DNR; } @@ -4541,8 +4540,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_create_cq(prp1, cqid, vector, qsize, qflags, NVME_CQ_FLAGS_IEN(qflags) != 0); -if (unlikely(!cqid || cqid > n->params.max_ioqpairs || -n->cq[cqid] != NULL)) { +if (unlikely(!cqid || cqid > n->conf_ioqpairs || n->cq[cqid] != NULL)) { trace_pci_nvme_err_invalid_create_cq_cqid(cqid); return NVME_INVALID_QID | NVME_DNR; } @@ -4558,7 +4556,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_err_invalid_create_cq_vector(vector); return NVME_INVALID_IRQ_VECTOR | NVME_DNR; } -if (unlikely(vector >= n->params.msix_qsize)) { +if (unlikely(vector >= n->conf_msix_qsize)) { trace_pci_nvme_err_invalid_create_cq_vector(vector); return NVME_INVALID_IRQ_VECTOR | NVME_DNR; } @@ -5155,13 +5153,12 @@ defaults: break; case NVME_NUMBER_OF_QUEUES: -result = (n->params.max_ioqpairs - 1) | -((n->params.max_ioqpairs - 1) << 16); +result = (n->conf_ioqpairs - 1) | ((n->conf_ioqpairs - 1) << 16); trace_pci_nvme_getfeat_numq(result); break; case NVME_INTERRUPT_VECTOR_CONF: iv = dw11 & 0x; -if (iv >= n->params.max_ioqpairs + 1) { +if (iv >= n->conf_ioqpairs + 1) { return NVME_INVALID_FIELD | NVME_DNR; } @@ -5316,10 +5313,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req) trace_pci_nvme_setfeat_numq((dw11 & 0x) + 1, ((dw11 >> 16) & 0x) + 1, -n->params.max_ioqpairs, -n->params.max_ioqpairs); -req->cqe.result = 
cpu_to_le32((n->params.max_ioqpairs - 1) | - ((n->params.max_ioqpairs - 1) << 16)); +n->conf_ioqpairs, +n->conf_ioqpairs); +req->cqe.result = cpu_to_le32((n->conf_ioqpairs - 1) | + ((n->conf_ioqpairs - 1) << 16)); break; case
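The split between the user-set constant and the currently configured limit, as described in the commit message above, can be sketched with a reduced model. `ToyCtrl` and both helpers are hypothetical names for this illustration only; the real `NvmeCtrl` has many more fields, and the sketch assumes `conf_ioqpairs` is at least 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced controller state (illustration only). */
typedef struct ToyCtrl {
    uint32_t max_ioqpairs;   /* constant set by the user; sizes the buffers */
    uint32_t conf_ioqpairs;  /* limit in effect after the last reset */
} ToyCtrl;

/*
 * Range check used when creating an I/O queue: id 0 is rejected and the
 * upper bound is the configured limit, not the user-set maximum.
 */
bool toy_io_qid_in_range(const ToyCtrl *n, uint16_t qid)
{
    return qid != 0 && qid <= n->conf_ioqpairs;
}

/*
 * Number of Queues feature value as built in the patch: the same
 * zero-based count is placed in the low and the high 16 bits.
 */
uint32_t toy_numq_result(const ToyCtrl *n)
{
    uint32_t zero_based = n->conf_ioqpairs - 1;

    return zero_based | (zero_based << 16);
}
```

For example, a controller created with `max_ioqpairs` of 64 but currently configured for 4 queue pairs would accept queue ids 1 through 4 and report `0x00030003`, even though its buffers remain sized for 64.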
[PATCH v5 13/15] hw/nvme: Add support for the Virtualization Management command
From: Łukasz Gieryk With the new command one can: - assign flexible resources (queues, interrupts) to primary and secondary controllers, - toggle the online/offline state of given controller. Signed-off-by: Łukasz Gieryk --- hw/nvme/ctrl.c | 257 ++- hw/nvme/nvme.h | 20 hw/nvme/trace-events | 3 + include/block/nvme.h | 17 +++ 4 files changed, 295 insertions(+), 2 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 2a6a36e733d..a9742cf5051 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -188,6 +188,7 @@ #include "qemu/error-report.h" #include "qemu/log.h" #include "qemu/units.h" +#include "qemu/range.h" #include "qapi/error.h" #include "qapi/visitor.h" #include "sysemu/sysemu.h" @@ -259,6 +260,7 @@ static const uint32_t nvme_cse_acs[256] = { [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC, +[NVME_ADM_CMD_VIRT_MNGMT] = NVME_CMD_EFF_CSUPP, [NVME_ADM_CMD_FORMAT_NVM] = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC, }; @@ -290,6 +292,7 @@ static const uint32_t nvme_cse_iocs_zoned[256] = { }; static void nvme_process_sq(void *opaque); +static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst); static uint16_t nvme_sqid(NvmeRequest *req) { @@ -5694,6 +5697,167 @@ out: return status; } +static void nvme_get_virt_res_num(NvmeCtrl *n, uint8_t rt, int *num_total, + int *num_prim, int *num_sec) +{ +*num_total = le32_to_cpu(rt ? + n->pri_ctrl_cap.vifrt : n->pri_ctrl_cap.vqfrt); +*num_prim = le16_to_cpu(rt ? +n->pri_ctrl_cap.virfap : n->pri_ctrl_cap.vqrfap); +*num_sec = le16_to_cpu(rt ? 
n->pri_ctrl_cap.virfa : n->pri_ctrl_cap.vqrfa); +} + +static uint16_t nvme_assign_virt_res_to_prim(NvmeCtrl *n, NvmeRequest *req, + uint16_t cntlid, uint8_t rt, + int nr) +{ +int num_total, num_prim, num_sec; + +if (cntlid != n->cntlid) { +return NVME_INVALID_CTRL_ID | NVME_DNR; +} + +nvme_get_virt_res_num(n, rt, _total, _prim, _sec); + +if (nr > num_total) { +return NVME_INVALID_NUM_RESOURCES | NVME_DNR; +} + +if (nr > num_total - num_sec) { +return NVME_INVALID_RESOURCE_ID | NVME_DNR; +} + +if (rt) { +n->next_pri_ctrl_cap.virfap = cpu_to_le16(nr); +} else { +n->next_pri_ctrl_cap.vqrfap = cpu_to_le16(nr); +} + +req->cqe.result = cpu_to_le32(nr); +return req->status; +} + +static void nvme_update_virt_res(NvmeCtrl *n, NvmeSecCtrlEntry *sctrl, + uint8_t rt, int nr) +{ +int prev_nr, prev_total; + +if (rt) { +prev_nr = le16_to_cpu(sctrl->nvi); +prev_total = le32_to_cpu(n->pri_ctrl_cap.virfa); +sctrl->nvi = cpu_to_le16(nr); +n->pri_ctrl_cap.virfa = cpu_to_le32(prev_total + nr - prev_nr); +} else { +prev_nr = le16_to_cpu(sctrl->nvq); +prev_total = le32_to_cpu(n->pri_ctrl_cap.vqrfa); +sctrl->nvq = cpu_to_le16(nr); +n->pri_ctrl_cap.vqrfa = cpu_to_le32(prev_total + nr - prev_nr); +} +} + +static uint16_t nvme_assign_virt_res_to_sec(NvmeCtrl *n, NvmeRequest *req, +uint16_t cntlid, uint8_t rt, int nr) +{ +int num_total, num_prim, num_sec, num_free, diff, limit; +NvmeSecCtrlEntry *sctrl; + +sctrl = nvme_sctrl_for_cntlid(n, cntlid); +if (!sctrl) { +return NVME_INVALID_CTRL_ID | NVME_DNR; +} + +if (sctrl->scs) { +return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR; +} + +limit = le16_to_cpu(rt ? n->pri_ctrl_cap.vifrsm : n->pri_ctrl_cap.vqfrsm); +if (nr > limit) { +return NVME_INVALID_NUM_RESOURCES | NVME_DNR; +} + +nvme_get_virt_res_num(n, rt, _total, _prim, _sec); +num_free = num_total - num_prim - num_sec; +diff = nr - le16_to_cpu(rt ? 
sctrl->nvi : sctrl->nvq); + +if (diff > num_free) { +return NVME_INVALID_RESOURCE_ID | NVME_DNR; +} + +nvme_update_virt_res(n, sctrl, rt, nr); +req->cqe.result = cpu_to_le32(nr); + +return req->status; +} + +static uint16_t nvme_virt_set_state(NvmeCtrl *n, uint16_t cntlid, bool online) +{ +NvmeCtrl *sn = NULL; +NvmeSecCtrlEntry *sctrl; +int vf_index; + +sctrl = nvme_sctrl_for_cntlid(n, cntlid); +if (!sctrl) { +return NVME_INVALID_CTRL_ID | NVME_DNR; +} + +if (!pci_is_vf(>parent_obj)) { +vf_index = le16_to_cpu(sctrl->vfn) - 1; +sn = NVME(pcie_sriov_get_vf_at_index(>parent_obj, vf_index)); +} + +if (online) { +if (!sctrl->nvi || (le16_to_cpu(sctrl->nvq) < 2) || !sn) { +return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR; +} + +if
[PATCH v5 04/15] pcie: Add 1.2 version token for the Power Management Capability
From: Łukasz Gieryk

Signed-off-by: Łukasz Gieryk
---
 include/hw/pci/pci_regs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index 77ba64b9314..a5901409622 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -4,5 +4,6 @@
 #include "standard-headers/linux/pci_regs.h"

 #define PCI_PM_CAP_VER_1_1 0x0002 /* PCI PM spec ver. 1.1 */
+#define PCI_PM_CAP_VER_1_2 0x0003 /* PCI PM spec ver. 1.2 */

 #endif
-- 
2.25.1
[PATCH v5 15/15] hw/nvme: Update the initalization place for the AER queue
From: Łukasz Gieryk

This patch updates the initialization place for the AER queue, so it’s
initialized once, at controller initialization, and not every time the
controller is enabled.

While the original version works for a non-SR-IOV device, as it’s hard
to interact with the controller if it’s not enabled, the multiple
reinitialization is not necessarily correct.

With the SR/IOV feature enabled, a segfault can happen: a VF can have its
controller disabled, while a namespace can still be attached to the
controller through the parent PF. An event generated in such a case ends
up on an uninitialized queue.

While it’s an interesting question whether a VF should support AER in
the first place, I don’t think it must be answered today.

Signed-off-by: Łukasz Gieryk
---
 hw/nvme/ctrl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index a9742cf5051..ae41fced596 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6182,8 +6182,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)

     nvme_set_timestamp(n, 0ULL);

-    QTAILQ_INIT(&n->aer_queue);
-
     nvme_select_iocs(n);

     return 0;
@@ -6844,6 +6842,7 @@ static void nvme_init_state(NvmeCtrl *n)
     n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
     n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
     n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+    QTAILQ_INIT(&n->aer_queue);

     list->numcntl = cpu_to_le16(max_vfs);
     for (i = 0; i < max_vfs; i++) {
-- 
2.25.1
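The failure mode described in this commit message can be illustrated outside QEMU. The sketch below uses the BSD `sys/queue.h` TAILQ macros as a stand-in for QEMU's QTAILQ (an assumption; the two implementations differ in detail), and the `ctrl_*` names are hypothetical. With these macros a zero-filled head has a NULL tail pointer, so inserting before the head was ever initialized would dereference NULL; initializing once in the init-state step makes enqueueing safe even if the controller is never started.

```c
#include <stdlib.h>
#include <sys/queue.h>

struct event {
    int type;
    TAILQ_ENTRY(event) entry;
};

struct ctrl {
    TAILQ_HEAD(, event) aer_queue;
    int aer_queued;
};

/* Runs once per controller lifetime, mirroring nvme_init_state(). */
static void ctrl_init_state(struct ctrl *c)
{
    TAILQ_INIT(&c->aer_queue);
    c->aer_queued = 0;
}

/* May run while the controller is still disabled, e.g. on a VF. */
static int ctrl_enqueue_event(struct ctrl *c, int type)
{
    struct event *ev = calloc(1, sizeof(*ev));

    if (!ev) {
        return -1;
    }
    ev->type = type;
    TAILQ_INSERT_TAIL(&c->aer_queue, ev, entry);
    c->aer_queued++;
    return 0;
}

/* Free all pending events and report how many were queued. */
static int ctrl_drain_events(struct ctrl *c)
{
    struct event *ev;
    int n = 0;

    while ((ev = TAILQ_FIRST(&c->aer_queue)) != NULL) {
        TAILQ_REMOVE(&c->aer_queue, ev, entry);
        free(ev);
        n++;
    }
    c->aer_queued = 0;
    return n;
}
```

Note that nothing in this sketch resets the queue on enable, which matches the intent of the patch: pending events survive an enable/disable cycle instead of being dropped by a reinitialization.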
Re: Call for GSoC and Outreachy project ideas for summer 2022
On 1/28/22 16:47, Stefan Hajnoczi wrote: Dear QEMU, KVM, and rust-vmm communities, QEMU will apply for Google Summer of Code 2022 (https://summerofcode.withgoogle.com/) and has been accepted into Outreachy May-August 2022 (https://www.outreachy.org/). You can now submit internship project ideas for QEMU, KVM, and rust-vmm! If you have experience contributing to QEMU, KVM, or rust-vmm you can be a mentor. It's a great way to give back and you get to work with people who are just starting out in open source. Please reply to this email by February 21st with your project ideas. I would like to co-mentor one or more projects about adding more statistics to Mark Kanda's newly-born introspectable statistics subsystem in QEMU (https://patchew.org/QEMU/20220215150433.2310711-1-mark.ka...@oracle.com/), for example integrating "info blockstats"; and/or, to add matching functionality to libvirt. However, I will only be available for co-mentoring unfortunately. Paolo Good project ideas are suitable for remote work by a competent programmer who is not yet familiar with the codebase. In addition, they are: - Well-defined - the scope is clear - Self-contained - there are few dependencies - Uncontroversial - they are acceptable to the community - Incremental - they produce deliverables along the way Feel free to post ideas even if you are unable to mentor the project. It doesn't hurt to share the idea! I will review project ideas and keep you up-to-date on QEMU's acceptance into GSoC. Internship program details: - Paid, remote work open source internships - GSoC projects are 175 or 350 hours, Outreachy projects are 30 hrs/week for 12 weeks - Mentored by volunteers from QEMU, KVM, and rust-vmm - Mentors typically spend at least 5 hours per week during the coding period Changes since last year: GSoC now has 175 or 350 hour project sizes instead of 12 week full-time projects. GSoC will accept applicants who are not students, before it was limited to students. 
For more background on QEMU internships, check out this video: https://www.youtube.com/watch?v=xNVCX7YMUL8 Please let me know if you have any questions! Stefan
[PATCH v5 07/15] hw/nvme: Add support for Secondary Controller List
Introduce handling for Secondary Controller List (Identify command with CNS value of 15h). Secondary controller ids are unique in the subsystem, hence they are reserved by it upon initialization of the primary controller to the number of sriov_max_vfs. ID reservation requires the addition of an intermediate controller slot state, so the reserved controller has the address 0x. A secondary controller is in the reserved state when it has no virtual function assigned, but its primary controller is realized. Secondary controller reservations are released to NULL when its primary controller is unregistered. Signed-off-by: Lukasz Maniak --- hw/nvme/ctrl.c | 35 + hw/nvme/ns.c | 2 +- hw/nvme/nvme.h | 18 +++ hw/nvme/subsys.c | 75 ++-- hw/nvme/trace-events | 1 + include/block/nvme.h | 20 6 files changed, 141 insertions(+), 10 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 0bd55948ce1..05acd681656 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -4705,6 +4705,29 @@ static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, NvmeRequest *req) sizeof(NvmePriCtrlCap), req); } +static uint16_t nvme_identify_sec_ctrl_list(NvmeCtrl *n, NvmeRequest *req) +{ +NvmeIdentify *c = (NvmeIdentify *)>cmd; +uint16_t pri_ctrl_id = le16_to_cpu(n->pri_ctrl_cap.cntlid); +uint16_t min_id = le16_to_cpu(c->ctrlid); +uint8_t num_sec_ctrl = n->sec_ctrl_list.numcntl; +NvmeSecCtrlList list = {0}; +uint8_t i; + +for (i = 0; i < num_sec_ctrl; i++) { +if (n->sec_ctrl_list.sec[i].scid >= min_id) { +list.numcntl = num_sec_ctrl - i; +memcpy(, n->sec_ctrl_list.sec + i, + list.numcntl * sizeof(NvmeSecCtrlEntry)); +break; +} +} + +trace_pci_nvme_identify_sec_ctrl_list(pri_ctrl_id, list.numcntl); + +return nvme_c2h(n, (uint8_t *), sizeof(list), req); +} + static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, bool active) { @@ -4925,6 +4948,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) return nvme_identify_ctrl_list(n, req, false); case 
NVME_ID_CNS_PRIMARY_CTRL_CAP: return nvme_identify_pri_ctrl_cap(n, req); +case NVME_ID_CNS_SECONDARY_CTRL_LIST: +return nvme_identify_sec_ctrl_list(n, req); case NVME_ID_CNS_CS_NS: return nvme_identify_ns_csi(n, req, true); case NVME_ID_CNS_CS_NS_PRESENT: @@ -6476,6 +6501,9 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp) static void nvme_init_state(NvmeCtrl *n) { NvmePriCtrlCap *cap = >pri_ctrl_cap; +NvmeSecCtrlList *list = >sec_ctrl_list; +NvmeSecCtrlEntry *sctrl; +int i; /* add one to max_ioqpairs to account for the admin queue pair */ n->reg_size = pow2ceil(sizeof(NvmeBar) + @@ -6487,6 +6515,13 @@ static void nvme_init_state(NvmeCtrl *n) n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL); n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1); +list->numcntl = cpu_to_le16(n->params.sriov_max_vfs); +for (i = 0; i < n->params.sriov_max_vfs; i++) { +sctrl = >sec[i]; +sctrl->pcid = cpu_to_le16(n->cntlid); +sctrl->vfn = cpu_to_le16(i + 1); +} + cap->cntlid = cpu_to_le16(n->cntlid); } diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c index ee673f1a5be..d42fba117f1 100644 --- a/hw/nvme/ns.c +++ b/hw/nvme/ns.c @@ -567,7 +567,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp) for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) { NvmeCtrl *ctrl = subsys->ctrls[i]; -if (ctrl) { +if (ctrl && ctrl != SUBSYS_SLOT_RSVD) { nvme_attach_ns(ctrl, ns); } } diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 2db48eb25c9..f4494e5236f 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -43,6 +43,7 @@ typedef struct NvmeBus { #define TYPE_NVME_SUBSYS "nvme-subsys" #define NVME_SUBSYS(obj) \ OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS) +#define SUBSYS_SLOT_RSVD (void *)0x typedef struct NvmeSubsystem { DeviceState parent_obj; @@ -67,6 +68,10 @@ static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys, return NULL; } +if (subsys->ctrls[cntlid] == SUBSYS_SLOT_RSVD) { +return NULL; +} + return subsys->ctrls[cntlid]; } @@ -473,6 +478,7 @@ typedef struct 
NvmeCtrl { } features; NvmePriCtrlCap pri_ctrl_cap; +NvmeSecCtrlList sec_ctrl_list; } NvmeCtrl; static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid) @@ -507,6 +513,18 @@ static inline uint16_t nvme_cid(NvmeRequest *req) return le16_to_cpu(req->cqe.cid); } +static inline NvmeSecCtrlEntry *nvme_sctrl(NvmeCtrl *n) +{ +PCIDevice *pci_dev = >parent_obj; +NvmeCtrl
[PATCH v5 11/15] hw/nvme: Calculate BAR attributes in a function
From: Łukasz Gieryk

An NVMe device with SR-IOV capability calculates the BAR size
differently for PF and VF, so it makes sense to extract the common code
to a separate function.

Signed-off-by: Łukasz Gieryk
Reviewed-by: Klaus Jensen
---
 hw/nvme/ctrl.c | 45 +++++++++++++++++++++++++++++++--------------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6abec8e4369..73707565345 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6584,6 +6584,34 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
     memory_region_set_enabled(&n->pmr.dev->mr, false);
 }

+static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
+                              unsigned *msix_table_offset,
+                              unsigned *msix_pba_offset)
+{
+    uint64_t bar_size, msix_table_size, msix_pba_size;
+
+    bar_size = sizeof(NvmeBar) + 2 * total_queues * NVME_DB_SIZE;
+    bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+    if (msix_table_offset) {
+        *msix_table_offset = bar_size;
+    }
+
+    msix_table_size = PCI_MSIX_ENTRY_SIZE * total_irqs;
+    bar_size += msix_table_size;
+    bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+    if (msix_pba_offset) {
+        *msix_pba_offset = bar_size;
+    }
+
+    msix_pba_size = QEMU_ALIGN_UP(total_irqs, 64) / 8;
+    bar_size += msix_pba_size;
+
+    bar_size = pow2ceil(bar_size);
+    return bar_size;
+}
+
 static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
                             uint64_t bar_size)
 {
@@ -6623,7 +6651,7 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
     uint8_t *pci_conf = pci_dev->config;
-    uint64_t bar_size, msix_table_size, msix_pba_size;
+    uint64_t bar_size;
     unsigned msix_table_offset, msix_pba_offset;
     int ret;

@@ -6649,19 +6677,8 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
     }

     /* add one to max_ioqpairs to account for the admin queue pair */
-    bar_size = sizeof(NvmeBar) +
-               2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
-    bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-    msix_table_offset = bar_size;
-    msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;
-
-    bar_size += msix_table_size;
-    bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-    msix_pba_offset = bar_size;
-    msix_pba_size = QEMU_ALIGN_UP(n->params.msix_qsize, 64) / 8;
-
-    bar_size += msix_pba_size;
-    bar_size = pow2ceil(bar_size);
+    bar_size = nvme_bar_size(n->params.max_ioqpairs + 1, n->params.msix_qsize,
+                             &msix_table_offset, &msix_pba_offset);

     memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
     memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-- 
2.25.1
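To make the arithmetic in nvme_bar_size() easier to check by hand, here is a standalone sketch of the same sequence of steps. The constants are assumptions chosen for illustration (a 4096-byte register block standing in for sizeof(NvmeBar), 4-byte doorbells, 16-byte MSI-X table entries), and the local `align_up`, `pow2ceil_u64` and `bar_size_sketch` helpers are hypothetical stand-ins for QEMU's own macros and functions.

```c
#include <stddef.h>
#include <stdint.h>

#define KIB                  1024u
#define BAR_REGS_SIZE        4096u /* assumed stand-in for sizeof(NvmeBar) */
#define DB_SIZE              4u    /* assumed doorbell register size */
#define MSIX_ENTRY_SIZE      16u   /* assumed MSI-X table entry size */

/* Round v up to the next multiple of a. */
static uint64_t align_up(uint64_t v, uint64_t a)
{
    return (v + a - 1) / a * a;
}

/* Smallest power of two that is >= v. */
static uint64_t pow2ceil_u64(uint64_t v)
{
    uint64_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

/*
 * Registers and doorbells first, then the MSI-X table and the PBA, each
 * starting on a 4 KiB boundary; the total is rounded up to a power of two.
 */
static uint64_t bar_size_sketch(unsigned total_queues, unsigned total_irqs,
                                unsigned *table_offset, unsigned *pba_offset)
{
    uint64_t size = BAR_REGS_SIZE + 2 * total_queues * DB_SIZE;

    size = align_up(size, 4 * KIB);
    if (table_offset) {
        *table_offset = (unsigned)size;
    }

    size += (uint64_t)MSIX_ENTRY_SIZE * total_irqs;
    size = align_up(size, 4 * KIB);
    if (pba_offset) {
        *pba_offset = (unsigned)size;
    }

    size += align_up(total_irqs, 64) / 8; /* one pending bit per vector */
    return pow2ceil_u64(size);
}
```

Under these assumptions, 65 queues and 65 vectors put the MSI-X table at offset 8192 and the PBA at 12288, for a 16 KiB BAR. Passing NULL for either offset pointer only computes the total, which is convenient when sizing a VF BAR where the offsets are not needed.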
[PATCH v5 02/15] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt
From: Knut Omang

Add a small intro + minimal documentation for how to
implement SR/IOV support for an emulated device.

Signed-off-by: Knut Omang
---
 docs/pcie_sriov.txt | 115
 1 file changed, 115 insertions(+)
 create mode 100644 docs/pcie_sriov.txt

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
new file mode 100644
index 00000000000..f5e891e1d45
--- /dev/null
+++ b/docs/pcie_sriov.txt
@@ -0,0 +1,115 @@
+PCI SR/IOV EMULATION SUPPORT
+============================
+
+Description
+===========
+SR/IOV (Single Root I/O Virtualization) is an optional extended capability
+of a PCI Express device. It allows a single physical function (PF) to appear as multiple
+virtual functions (VFs) for the main purpose of eliminating software
+overhead in I/O from virtual machines.
+
+Qemu now implements the basic common functionality to enable an emulated device
+to support SR/IOV. Yet no fully implemented devices exists in Qemu, but a
+proof-of-concept hack of the Intel igb can be found here:
+
+git://github.com/knuto/qemu.git sriov_patches_v5
+
+Implementation
+==============
+Implementing emulation of an SR/IOV capable device typically consists of
+implementing support for two types of device classes; the "normal" physical device
+(PF) and the virtual device (VF). From Qemu's perspective, the VFs are just
+like other devices, except that some of their properties are derived from
+the PF.
+
+A virtual function is different from a physical function in that the BAR
+space for all VFs are defined by the BAR registers in the PFs SR/IOV
+capability. All VFs have the same BARs and BAR sizes.
+
+Accesses to these virtual BARs then is computed as
+
+   <VF BAR start> + <VF number> * <VF BAR size> + <offset>
+
+From our emulation perspective this means that there is a separate call for
+setting up a BAR for a VF.
+
+1) To enable SR/IOV support in the PF, it must be a PCI Express device so
+   you would need to add a PCI Express capability in the normal PCI
+   capability list. You might also want to add an ARI (Alternative
+   Routing-ID Interpretation) capability to indicate that your device
+   supports functions beyond it's "own" function space (0-7),
+   which is necessary to support more than 7 functions, or
+   if functions extends beyond offset 7 because they are placed at an
+   offset > 1 or have stride > 1.
+
+   ...
+   #include "hw/pci/pcie.h"
+   #include "hw/pci/pcie_sriov.h"
+
+   pci_your_pf_dev_realize( ... )
+   {
+      ...
+      int ret = pcie_endpoint_cap_init(d, 0x70);
+      ...
+      pcie_ari_init(d, 0x100, 1);
+      ...
+
+      /* Add and initialize the SR/IOV capability */
+      pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+                       vf_devid, initial_vfs, total_vfs,
+                       fun_offset, stride);
+
+      /* Set up individual VF BARs (parameters as for normal BARs) */
+      pcie_sriov_pf_init_vf_bar( ... )
+      ...
+   }
+
+   For cleanup, you simply call:
+
+      pcie_sriov_pf_exit(device);
+
+   which will delete all the virtual functions and associated resources.
+
+2) Similarly in the implementation of the virtual function, you need to
+   make it a PCI Express device and add a similar set of capabilities
+   except for the SR/IOV capability. Then you need to set up the VF BARs as
+   subregions of the PFs SR/IOV VF BARs by calling
+   pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call:
+
+   pci_your_vf_dev_realize( ... )
+   {
+      ...
+      int ret = pcie_endpoint_cap_init(d, 0x60);
+      ...
+      pcie_ari_init(d, 0x100, 1);
+      ...
+      memory_region_init(mr, ... )
+      pcie_sriov_vf_register_bar(d, bar_nr, mr);
+      ...
+   }
+
+Testing on Linux guest
+======================
+The easiest is if your device driver supports sysfs based SR/IOV
+enabling. Support for this was added in kernel v.3.8, so not all drivers
+support it yet.
+
+To enable 4 VFs for a device at 01:00.0:
+
+        modprobe yourdriver
+        echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
+
+You should now see 4 VFs with lspci.
+To turn SR/IOV off again - the standard requires you to turn it off before you can enable
+another VF count, and the emulation enforces this:
+
+        echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
+
+Older drivers typically provide a max_vfs module parameter
+to enable it at load time:
+
+        modprobe yourdriver max_vfs=4
+
+To disable the VFs again then, you simply have to unload the driver:
+
+        rmmod yourdriver
-- 
2.25.1
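The documentation above states that all VFs share the same BAR sizes and that their BAR space is defined by the VF BAR registers in the PF's SR/IOV capability. A small sketch of the resulting address arithmetic follows, assuming the per-VF slices are laid out back to back and that VF numbers start at 0; `vf_bar_access_addr` and `vf_for_addr` are hypothetical names, not QEMU functions.

```c
#include <stdint.h>

/* Address of byte `offset` inside the BAR slice that belongs to one VF. */
static uint64_t vf_bar_access_addr(uint64_t vf_bar_start, unsigned vf_number,
                                   uint64_t vf_bar_size, uint64_t offset)
{
    return vf_bar_start + (uint64_t)vf_number * vf_bar_size + offset;
}

/* Inverse lookup: which VF slice does an address inside the region hit? */
static unsigned vf_for_addr(uint64_t vf_bar_start, uint64_t vf_bar_size,
                            uint64_t addr)
{
    return (unsigned)((addr - vf_bar_start) / vf_bar_size);
}
```

For example, with the VF BAR region starting at 0x100000 and 16 KiB per VF, offset 0x10 in the third VF (number 2) lands at 0x108010. This is also why the emulation registers each VF BAR as a subregion of the PF's VF BAR rather than as an independent BAR.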
[PATCH v5 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers
From: Łukasz Gieryk With four new properties: - sriov_v{i,q}_flexible, - sriov_max_v{i,q}_per_vf, one can configure the number of available flexible resources, as well as the limits. The primary and secondary controller capability structures are initialized accordingly. Since the number of available queues (interrupts) now varies between VF/PF, BAR size calculation is also adjusted. Signed-off-by: Łukasz Gieryk --- hw/nvme/ctrl.c | 142 --- hw/nvme/nvme.h | 4 ++ include/block/nvme.h | 5 ++ 3 files changed, 144 insertions(+), 7 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 73707565345..2a6a36e733d 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -36,6 +36,10 @@ * zoned.zasl=, \ * zoned.auto_transition=, \ * sriov_max_vfs= \ + * sriov_vq_flexible= \ + * sriov_vi_flexible= \ + * sriov_max_vi_per_vf= \ + * sriov_max_vq_per_vf= \ * subsys= * -device nvme-ns,drive=,bus=,nsid=,\ * zoned=, \ @@ -113,6 +117,29 @@ * enables reporting of both SR-IOV and ARI capabilities by the NVMe device. * Virtual function controllers will not report SR-IOV capability. * + * NOTE: Single Root I/O Virtualization support is experimental. + * All the related parameters may be subject to change. + * + * - `sriov_vq_flexible` + * Indicates the total number of flexible queue resources assignable to all + * the secondary controllers. Implicitly sets the number of primary + * controller's private resources to `(max_ioqpairs - sriov_vq_flexible)`. + * + * - `sriov_vi_flexible` + * Indicates the total number of flexible interrupt resources assignable to + * all the secondary controllers. Implicitly sets the number of primary + * controller's private resources to `(msix_qsize - sriov_vi_flexible)`. + * + * - `sriov_max_vi_per_vf` + * Indicates the maximum number of virtual interrupt resources assignable + * to a secondary controller. The default 0 resolves to + * `(sriov_vi_flexible / sriov_max_vfs)`. 
+ * + * - `sriov_max_vq_per_vf` + * Indicates the maximum number of virtual queue resources assignable to + * a secondary controller. The default 0 resolves to + * `(sriov_vq_flexible / sriov_max_vfs)`. + * * nvme namespace device parameters * * - `shared` @@ -184,6 +211,7 @@ #define NVME_NUM_FW_SLOTS 1 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB) #define NVME_MAX_VFS 127 +#define NVME_VF_RES_GRANULARITY 1 #define NVME_VF_OFFSET 0x1 #define NVME_VF_STRIDE 1 @@ -6512,6 +6540,54 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp) error_setg(errp, "PMR is not supported with SR-IOV"); return; } + +if (!params->sriov_vq_flexible || !params->sriov_vi_flexible) { +error_setg(errp, "both sriov_vq_flexible and sriov_vi_flexible" + " must be set for the use of SR-IOV"); +return; +} + +if (params->sriov_vq_flexible < params->sriov_max_vfs * 2) { +error_setg(errp, "sriov_vq_flexible must be greater than or equal" + " to %d (sriov_max_vfs * 2)", params->sriov_max_vfs * 2); +return; +} + +if (params->max_ioqpairs < params->sriov_vq_flexible + 2) { +error_setg(errp, "sriov_vq_flexible - max_ioqpairs (PF-private" + " queue resources) must be greater than or equal to 2"); +return; +} + +if (params->sriov_vi_flexible < params->sriov_max_vfs) { +error_setg(errp, "sriov_vi_flexible must be greater than or equal" + " to %d (sriov_max_vfs)", params->sriov_max_vfs); +return; +} + +if (params->msix_qsize < params->sriov_vi_flexible + 1) { +error_setg(errp, "sriov_vi_flexible - msix_qsize (PF-private" + " interrupt resources) must be greater than or equal" + " to 1"); +return; +} + +if (params->sriov_max_vi_per_vf && +(params->sriov_max_vi_per_vf - 1) % NVME_VF_RES_GRANULARITY) { +error_setg(errp, "sriov_max_vi_per_vf must meet:" + " (X - 1) %% %d == 0 and X >= 1", + NVME_VF_RES_GRANULARITY); +return; +} + +if (params->sriov_max_vq_per_vf && +(params->sriov_max_vq_per_vf < 2 || + (params->sriov_max_vq_per_vf - 1) % NVME_VF_RES_GRANULARITY)) { +error_setg(errp, 
"sriov_max_vq_per_vf must meet:" + " (X - 1) %% %d == 0 and X >= 2", + NVME_VF_RES_GRANULARITY); +return; +} } } @@ -6520,10 +6596,19 @@ static void nvme_init_state(NvmeCtrl *n) NvmePriCtrlCap *cap = >pri_ctrl_cap;
[PATCH v5 03/15] pcie: Add a helper to the SR/IOV API
From: Łukasz Gieryk

Convenience function for retrieving the PCIDevice object of the N-th VF.

Signed-off-by: Łukasz Gieryk
Reviewed-by: Knut Omang
---
 hw/pci/pcie_sriov.c         | 10 +++++++++-
 include/hw/pci/pcie_sriov.h |  6 ++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 3f256d483fa..87abad6ac86 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -287,8 +287,16 @@ uint16_t pcie_sriov_vf_number(PCIDevice *dev)
     return dev->exp.sriov_vf.vf_number;
 }

-
 PCIDevice *pcie_sriov_get_pf(PCIDevice *dev)
 {
     return dev->exp.sriov_vf.pf;
 }
+
+PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
+{
+    assert(!pci_is_vf(dev));
+    if (n < dev->exp.sriov_pf.num_vfs) {
+        return dev->exp.sriov_pf.vf[n];
+    }
+    return NULL;
+}
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 990cff0a1c6..80f5c84e75c 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -68,4 +68,10 @@ uint16_t pcie_sriov_vf_number(PCIDevice *dev);
  */
 PCIDevice *pcie_sriov_get_pf(PCIDevice *dev);

+/*
+ * Get the n-th VF of this physical function - only valid for PF.
+ * Returns NULL if index is invalid
+ */
+PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n);
+
 #endif /* QEMU_PCIE_SRIOV_H */
-- 
2.25.1
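As a reading aid for the helper above, here is a standalone model of the lookup. It is a sketch only: `struct pf_dev`, `struct vf_dev` and `pf_get_vf_at_index` are hypothetical stand-ins for the QEMU types and function, and the extra `n >= 0` test is an addition of this sketch, not something the patch does; with a signed index it may be worth considering so that a negative value also yields NULL.

```c
#include <stddef.h>
#include <stdint.h>

struct vf_dev {
    uint16_t vf_number;
};

struct pf_dev {
    struct vf_dev **vf;   /* array of num_vfs enabled VFs */
    uint16_t num_vfs;
};

/*
 * Return the n-th VF of a PF, or NULL when the index is out of range.
 * Unlike the patch, this sketch also rejects negative indices.
 */
static struct vf_dev *pf_get_vf_at_index(const struct pf_dev *pf, int n)
{
    if (n >= 0 && n < pf->num_vfs) {
        return pf->vf[n];
    }
    return NULL;
}
```

A caller that starts from a 1-based virtual function number, as the Virtualization Management patch in this series does with `vfn`, would pass `vfn - 1` and must be prepared for a NULL result when the VF is not currently enabled.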
[PATCH v5 08/15] hw/nvme: Implement the Function Level Reset
From: Łukasz Gieryk

This patch implements the Function Level Reset, a feature currently not
implemented for the Nvme device, although the 1.4 spec lists it as
mandatory ("shall").

The implementation reuses FLR-related building blocks defined for the
pci-bridge module, and follows the same logic:
    - FLR capability is advertised in the PCIE config,
    - custom pci_write_config callback detects a write to the trigger
      register and performs the PCI reset,
    - which, eventually, calls the custom dc->reset handler.

Depending on reset type, parts of the state should (or should not) be
cleared. To distinguish the type of reset, an additional parameter is
passed to the reset function.

This patch also enables advertisement of the Power Management PCI
capability. The main reason behind it is to announce the no_soft_reset=1
bit, to signal SR-IOV support where each VF can be reset individually.

The implementation purposely ignores writes to the PMCS.PS register, as
even such naïve behavior is enough to correctly handle the D3->D0
transition.

It’s worth noting that the power state transition back to D3, with
all the corresponding side effects, wasn't and still isn't handled
properly.

Signed-off-by: Łukasz Gieryk Reviewed-by: Klaus Jensen --- hw/nvme/ctrl.c | 52 hw/nvme/nvme.h | 5 + hw/nvme/trace-events | 1 + 3 files changed, 54 insertions(+), 4 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 05acd681656..7c1dd80f21d 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -5757,7 +5757,7 @@ static void nvme_process_sq(void *opaque) } } -static void nvme_ctrl_reset(NvmeCtrl *n) +static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst) { NvmeNamespace *ns; int i; @@ -5789,7 +5789,9 @@ static void nvme_ctrl_reset(NvmeCtrl *n) } if (!pci_is_vf(>parent_obj) && n->params.sriov_max_vfs) { -pcie_sriov_pf_disable_vfs(>parent_obj); +if (rst != NVME_RESET_CONTROLLER) { +pcie_sriov_pf_disable_vfs(>parent_obj); +} } n->aer_queued = 0; @@ -6023,7 +6025,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data, } } else if (!NVME_CC_EN(data) && NVME_CC_EN(cc)) { trace_pci_nvme_mmio_stopped(); -nvme_ctrl_reset(n); +nvme_ctrl_reset(n, NVME_RESET_CONTROLLER); cc = 0; csts &= ~NVME_CSTS_READY; } @@ -6581,6 +6583,28 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset, PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size); } +static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset) +{ +Error *err = NULL; +int ret; + +ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset, + PCI_PM_SIZEOF, ); +if (err) { +error_report_err(err); +return ret; +} + +pci_set_word(pci_dev->config + offset + PCI_PM_PMC, + PCI_PM_CAP_VER_1_2); +pci_set_word(pci_dev->config + offset + PCI_PM_CTRL, + PCI_PM_CTRL_NO_SOFT_RESET); +pci_set_word(pci_dev->wmask + offset + PCI_PM_CTRL, + PCI_PM_CTRL_STATE_MASK); + +return 0; +} + static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) { uint8_t *pci_conf = pci_dev->config; @@ -6602,7 +6626,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) } pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS); +nvme_add_pm_capability(pci_dev, 0x60); 
pcie_endpoint_cap_init(pci_dev, 0x80); +pcie_cap_flr_init(pci_dev); if (n->params.sriov_max_vfs) { pcie_ari_init(pci_dev, 0x100, 1); } @@ -6852,7 +6878,7 @@ static void nvme_exit(PCIDevice *pci_dev) NvmeNamespace *ns; int i; -nvme_ctrl_reset(n); +nvme_ctrl_reset(n, NVME_RESET_FUNCTION); if (n->subsys) { for (i = 1; i <= NVME_MAX_NAMESPACES; i++) { @@ -6951,6 +6977,22 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name, } } +static void nvme_pci_reset(DeviceState *qdev) +{ +PCIDevice *pci_dev = PCI_DEVICE(qdev); +NvmeCtrl *n = NVME(pci_dev); + +trace_pci_nvme_pci_reset(); +nvme_ctrl_reset(n, NVME_RESET_FUNCTION); +} + +static void nvme_pci_write_config(PCIDevice *dev, uint32_t address, + uint32_t val, int len) +{ +pci_default_write_config(dev, address, val, len); +pcie_cap_flr_write_config(dev, address, val, len); +} + static const VMStateDescription nvme_vmstate = { .name = "nvme", .unmigratable = 1, @@ -6962,6 +7004,7 @@ static void nvme_class_init(ObjectClass *oc, void *data) PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc); pc->realize = nvme_realize; +pc->config_write = nvme_pci_write_config; pc->exit = nvme_exit; pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
[PATCH v5 10/15] hw/nvme: Remove reg_size variable and update BAR0 size calculation
From: Łukasz Gieryk

The n->reg_size parameter unnecessarily splits the BAR0 size calculation
in two phases; removed to simplify the code.

With all the calculations done in one place, it seems the pow2ceil,
applied originally to reg_size, is unnecessary. The rounding should
happen as the last step, when BAR size includes Nvme registers, queue
registers, and MSIX-related space.

Finally, the size of the mmio memory region is extended to cover the 1st
4KiB padding (see the map below). Access to this range is handled as
interaction with a non-existing queue and generates an error trace, so
actually nothing changes, while the reg_size variable is no longer needed.

    |                 BAR0                  |
    [Nvme Registers    ]
    [Queues            ]
    [power-of-2 padding] - removed in this patch
    [4KiB padding (1)  ]
    [MSIX TABLE        ]
    [4KiB padding (2)  ]
    [MSIX PBA          ]
    [power-of-2 padding]

Signed-off-by: Łukasz Gieryk
Reviewed-by: Klaus Jensen
---
 hw/nvme/ctrl.c | 10 +++++-----
 hw/nvme/nvme.h |  1 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f1b4026e4f8..6abec8e4369 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6525,9 +6525,6 @@ static void nvme_init_state(NvmeCtrl *n)
     n->conf_ioqpairs = n->params.max_ioqpairs;
     n->conf_msix_qsize = n->params.msix_qsize;

-    /* add one to max_ioqpairs to account for the admin queue pair */
-    n->reg_size = pow2ceil(sizeof(NvmeBar) +
-                           2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
     n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
     n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
     n->temperature = NVME_TEMPERATURE;
@@ -6651,7 +6648,10 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
         pcie_ari_init(pci_dev, 0x100, 1);
     }

-    bar_size = QEMU_ALIGN_UP(n->reg_size, 4 * KiB);
+    /* add one to max_ioqpairs to account for the admin queue pair */
+    bar_size = sizeof(NvmeBar) +
+               2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
+    bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
     msix_table_offset = bar_size;
     msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;

@@ -6665,7 +6665,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)

     memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
     memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-                          n->reg_size);
+                          msix_table_offset);
     memory_region_add_subregion(&n->bar0, 0, &n->iomem);

     if (pci_is_vf(pci_dev)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 314a2894759..86b5b321331 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -424,7 +424,6 @@ typedef struct NvmeCtrl {
     uint16_t    max_prp_ents;
     uint16_t    cqe_size;
     uint16_t    sqe_size;
-    uint32_t    reg_size;
     uint32_t    max_q_ents;
     uint8_t     outstanding_aers;
     uint32_t    irq_status;
-- 
2.25.1
[PATCH v5 05/15] hw/nvme: Add support for SR-IOV
This patch implements initial support for Single Root I/O Virtualization
on an NVMe device.

Essentially, it allows defining the maximum number of virtual functions
supported by the NVMe controller via the sriov_max_vfs parameter.

Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
capability by a physical controller and ARI capability by both the
physical and virtual function devices.

NVMe controllers created via virtual functions currently mirror the
physical controller functionally, which may not be entirely appropriate,
so the way to limit the capabilities of the VF still needs consideration.

An NVMe subsystem is required for the use of SR-IOV.

Signed-off-by: Lukasz Maniak
--- hw/nvme/ctrl.c | 85 ++-- hw/nvme/nvme.h | 3 +- include/hw/pci/pci_ids.h | 1 + 3 files changed, 85 insertions(+), 4 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index 98aac98bef5..adeba0b2b6d 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -35,6 +35,7 @@ * mdts=,vsl=, \ * zoned.zasl=, \ * zoned.auto_transition=, \ + * sriov_max_vfs= \ * subsys= * -device nvme-ns,drive=,bus=,nsid=,\ * zoned=, \ @@ -106,6 +107,12 @@ * transitioned to zone state closed for resource management purposes. * Defaults to 'on'. * + * - `sriov_max_vfs` + * Indicates the maximum number of PCIe virtual functions supported + * by the controller. The default value is 0. Specifying a non-zero value + * enables reporting of both SR-IOV and ARI capabilities by the NVMe device. + * Virtual function controllers will not report SR-IOV capability. 
+ * * nvme namespace device parameters * * - `shared` @@ -160,6 +167,7 @@ #include "sysemu/block-backend.h" #include "sysemu/hostmem.h" #include "hw/pci/msix.h" +#include "hw/pci/pcie_sriov.h" #include "migration/vmstate.h" #include "nvme.h" @@ -175,6 +183,9 @@ #define NVME_TEMPERATURE_CRITICAL 0x175 #define NVME_NUM_FW_SLOTS 1 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB) +#define NVME_MAX_VFS 127 +#define NVME_VF_OFFSET 0x1 +#define NVME_VF_STRIDE 1 #define NVME_GUEST_ERR(trace, fmt, ...) \ do { \ @@ -5742,6 +5753,10 @@ static void nvme_ctrl_reset(NvmeCtrl *n) g_free(event); } +if (!pci_is_vf(>parent_obj) && n->params.sriov_max_vfs) { +pcie_sriov_pf_disable_vfs(>parent_obj); +} + n->aer_queued = 0; n->outstanding_aers = 0; n->qs_created = false; @@ -6423,6 +6438,29 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp) error_setg(errp, "vsl must be non-zero"); return; } + +if (params->sriov_max_vfs) { +if (!n->subsys) { +error_setg(errp, "subsystem is required for the use of SR-IOV"); +return; +} + +if (params->sriov_max_vfs > NVME_MAX_VFS) { +error_setg(errp, "sriov_max_vfs must be between 0 and %d", + NVME_MAX_VFS); +return; +} + +if (params->cmb_size_mb) { +error_setg(errp, "CMB is not supported with SR-IOV"); +return; +} + +if (n->pmr.dev) { +error_setg(errp, "PMR is not supported with SR-IOV"); +return; +} +} } static void nvme_init_state(NvmeCtrl *n) @@ -6480,6 +6518,20 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev) memory_region_set_enabled(>pmr.dev->mr, false); } +static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset, +uint64_t bar_size) +{ +uint16_t vf_dev_id = n->params.use_intel_id ? 
+ PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME; + +pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id, + n->params.sriov_max_vfs, n->params.sriov_max_vfs, + NVME_VF_OFFSET, NVME_VF_STRIDE); + +pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | + PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size); +} + static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) { uint8_t *pci_conf = pci_dev->config; @@ -6494,7 +6546,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) if (n->params.use_intel_id) { pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL); -pci_config_set_device_id(pci_conf, 0x5845); +pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_NVME); } else { pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT); pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REDHAT_NVME); @@ -6502,6 +6554,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS); pcie_endpoint_cap_init(pci_dev, 0x80); +if
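As a side note on NVME_VF_OFFSET and NVME_VF_STRIDE in the patch above: with SR-IOV, the position of each VF on the bus is derived from the PF's devfn, the First VF Offset and the VF Stride. The sketch below shows that arithmetic for a 0-based VF index. It assumes the VFs live on the same bus as the PF, and the ex_* names are made up for the example; only the values 1 and 1 come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define EX_VF_OFFSET 1u  /* mirrors NVME_VF_OFFSET (0x1) in the patch */
#define EX_VF_STRIDE 1u  /* mirrors NVME_VF_STRIDE in the patch */

/*
 * devfn of the VF with the given 0-based index, for a PF at pf_devfn.
 * Assumes all VFs stay on the PF's bus, so the result must fit in 8 bits.
 */
static uint8_t ex_vf_devfn(uint8_t pf_devfn, unsigned vf_index)
{
    unsigned devfn = pf_devfn + EX_VF_OFFSET + vf_index * EX_VF_STRIDE;

    assert(devfn <= 0xff);
    return (uint8_t)devfn;
}
```

For a PF at devfn 0, the first VF lands at devfn 1 and the 127th at devfn 127. Any VF past index 6 has a function number above 7, which cannot be expressed in the traditional 3-bit function field; under ARI the whole 8-bit devfn is treated as the function number, which fits with the patch reporting ARI whenever sriov_max_vfs is non-zero.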
Re: [PATCH v2 00/12] GL & D-Bus display related fixes
On Fri, Feb 18, 2022 at 2:36 AM Marc-André Lureau wrote: > > Hi > > On Thu, Feb 17, 2022 at 9:25 PM Akihiko Odaki wrote: >> >> On Fri, Feb 18, 2022 at 2:07 AM Marc-André Lureau >> wrote: >> > >> > Hi >> > >> > On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki >> > wrote: >> >> >> >> On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau >> >> wrote: >> >> > >> >> > Hi >> >> > >> >> > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki >> >> > wrote: >> >> >> >> >> >> On Thu, Feb 17, 2022 at 8:58 PM wrote: >> >> >> > >> >> >> > From: Marc-André Lureau >> >> >> > >> >> >> > Hi, >> >> >> > >> >> >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a >> >> >> > console", Akihiko >> >> >> > Odaki reported a number of issues with the GL and D-Bus display. His >> >> >> > series >> >> >> > propose a different design, and reverting some of my previous >> >> >> > generic console >> >> >> > changes to fix those issues. >> >> >> > >> >> >> > However, as I work through the issue so far, they can be solved by >> >> >> > reasonable >> >> >> > simple fixes while keeping the console changes generic (not specific >> >> >> > to the >> >> >> > D-Bus display backend). I belive a shared infrastructure is more >> >> >> > beneficial long >> >> >> > term than having GL-specific code in the DBus code (in particular, >> >> >> > the >> >> >> > egl-headless & VNC combination could potentially use it) >> >> >> > >> >> >> > Thanks a lot Akihiko for reporting the issues proposing a different >> >> >> > approach! >> >> >> > Please test this alternative series and let me know of any further >> >> >> > issues. My >> >> >> > understanding is that you are mainly concerned with the Cocoa >> >> >> > backend, and I >> >> >> > don't have a way to test it, so please check it. If necessary, we >> >> >> > may well have >> >> >> > to revert my earlier changes and go your way, eventually. 
>> >> >> > >> >> >> > Marc-André Lureau (12): >> >> >> > ui/console: fix crash when using gl context with non-gl listeners >> >> >> > ui/console: fix texture leak when calling >> >> >> > surface_gl_create_texture() >> >> >> > ui: do not create a surface when resizing a GL scanout >> >> >> > ui/console: move check for compatible GL context >> >> >> > ui/console: move dcl compatiblity check to a callback >> >> >> > ui/console: egl-headless is compatible with non-gl listeners >> >> >> > ui/dbus: associate the DBusDisplayConsole listener with the given >> >> >> > console >> >> >> > ui/console: move console compatibility check to >> >> >> > dcl_display_console() >> >> >> > ui/shader: fix potential leak of shader on error >> >> >> > ui/shader: free associated programs >> >> >> > ui/console: add a dpy_gfx_switch callback helper >> >> >> > ui/dbus: fix texture sharing >> >> >> > >> >> >> > include/ui/console.h | 19 --- >> >> >> > ui/dbus.h| 3 ++ >> >> >> > ui/console-gl.c | 4 ++ >> >> >> > ui/console.c | 119 >> >> >> > ++- >> >> >> > ui/dbus-console.c| 27 +- >> >> >> > ui/dbus-listener.c | 11 >> >> >> > ui/dbus.c| 33 +++- >> >> >> > ui/egl-headless.c| 17 ++- >> >> >> > ui/gtk.c | 18 ++- >> >> >> > ui/sdl2.c| 9 +++- >> >> >> > ui/shader.c | 9 +++- >> >> >> > ui/spice-display.c | 9 +++- >> >> >> > 12 files changed, 192 insertions(+), 86 deletions(-) >> >> >> > >> >> >> > -- >> >> >> > 2.34.1.428.gdcc0cd074f0c >> >> >> > >> >> >> > >> >> >> >> >> >> You missed only one thing: >> >> >> >- that console_select and register_displaychangelistener may not call >> >> >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is >> >> >> > incompatible with non-OpenGL displays >> >> >> >> >> >> displaychangelistener_display_console always has to call >> >> >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't. >> >> > >> >> > >> >> > Ok, would that be what you have in mind? 
>> >> > >> >> > --- a/ui/console.c >> >> > +++ b/ui/console.c >> >> > @@ -1122,6 +1122,10 @@ static void >> >> > displaychangelistener_display_console(DisplayChangeListener *dcl, >> >> > } else if (con->scanout.kind == SCANOUT_SURFACE) { >> >> > dpy_gfx_create_texture(con, con->surface); >> >> > displaychangelistener_gfx_switch(dcl, con->surface); >> >> > +} else { >> >> > +/* use the fallback surface, egl-headless keeps it updated */ >> >> > +assert(con->surface); >> >> > +displaychangelistener_gfx_switch(dcl, con->surface); >> >> > } >> >> >> >> It should call displaychangelistener_gfx_switch even when e.g. >> >> con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content >> >> to the last DisplaySurface it received while con->scanout.kind == >> >> SCANOUT_TEXTURE. >> > >> > >> > I see, egl-headless is really not a "listener". >> > >> >> >> >> > >> >> > I wish such egl-headless specific code would be
[PATCH v5 01/15] pcie: Add support for Single Root I/O Virtualization (SR/IOV)
From: Knut Omang This patch provides the building blocks for creating an SR/IOV PCIe Extended Capability header and register/unregister SR/IOV Virtual Functions. Signed-off-by: Knut Omang --- hw/pci/meson.build | 1 + hw/pci/pci.c| 100 +--- hw/pci/pcie.c | 5 + hw/pci/pcie_sriov.c | 294 hw/pci/trace-events | 5 + include/hw/pci/pci.h| 12 +- include/hw/pci/pcie.h | 6 + include/hw/pci/pcie_sriov.h | 71 + include/qemu/typedefs.h | 2 + 9 files changed, 470 insertions(+), 26 deletions(-) create mode 100644 hw/pci/pcie_sriov.c create mode 100644 include/hw/pci/pcie_sriov.h diff --git a/hw/pci/meson.build b/hw/pci/meson.build index 5c4bbac8171..bcc9c75919f 100644 --- a/hw/pci/meson.build +++ b/hw/pci/meson.build @@ -5,6 +5,7 @@ pci_ss.add(files( 'pci.c', 'pci_bridge.c', 'pci_host.c', + 'pcie_sriov.c', 'shpc.c', 'slotid_cap.c' )) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 5d30f9ca60e..ba8fb92efc6 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -239,6 +239,9 @@ int pci_bar(PCIDevice *d, int reg) { uint8_t type; +/* PCIe virtual functions do not have their own BARs */ +assert(!pci_is_vf(d)); + if (reg != PCI_ROM_SLOT) return PCI_BASE_ADDRESS_0 + reg * 4; @@ -304,10 +307,30 @@ void pci_device_deassert_intx(PCIDevice *dev) } } -static void pci_do_device_reset(PCIDevice *dev) +static void pci_reset_regions(PCIDevice *dev) { int r; +if (pci_is_vf(dev)) { +return; +} + +for (r = 0; r < PCI_NUM_REGIONS; ++r) { +PCIIORegion *region = >io_regions[r]; +if (!region->size) { +continue; +} +if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) && +region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) { +pci_set_quad(dev->config + pci_bar(dev, r), region->type); +} else { +pci_set_long(dev->config + pci_bar(dev, r), region->type); +} +} +} + +static void pci_do_device_reset(PCIDevice *dev) +{ pci_device_deassert_intx(dev); assert(dev->irq_state == 0); @@ -323,19 +346,7 @@ static void pci_do_device_reset(PCIDevice *dev) pci_get_word(dev->wmask + PCI_INTERRUPT_LINE) | pci_get_word(dev->w1cmask + 
PCI_INTERRUPT_LINE)); dev->config[PCI_CACHE_LINE_SIZE] = 0x0; -for (r = 0; r < PCI_NUM_REGIONS; ++r) { -PCIIORegion *region = >io_regions[r]; -if (!region->size) { -continue; -} - -if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) && -region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) { -pci_set_quad(dev->config + pci_bar(dev, r), region->type); -} else { -pci_set_long(dev->config + pci_bar(dev, r), region->type); -} -} +pci_reset_regions(dev); pci_update_mappings(dev); msi_reset(dev); @@ -884,6 +895,16 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice *dev, Error **errp) dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION; } +/* + * With SR/IOV and ARI, a device at function 0 need not be a multifunction + * device, as it may just be a VF that ended up with function 0 in + * the legacy PCI interpretation. Avoid failing in such cases: + */ +if (pci_is_vf(dev) && +dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) { +return; +} + /* * multifunction bit is interpreted in two ways as follows. * - all functions must set the bit to 1. 
@@ -1083,6 +1104,7 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, bus->devices[devfn]->name); return NULL; } else if (dev->hotplugged && + !pci_is_vf(pci_dev) && pci_get_function_0(pci_dev)) { error_setg(errp, "PCI: slot %d function 0 already occupied by %s," " new func %s cannot be exposed to guest.", @@ -1191,6 +1213,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num, pcibus_t size = memory_region_size(memory); uint8_t hdr_type; +assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */ assert(region_num >= 0); assert(region_num < PCI_NUM_REGIONS); assert(is_power_of_2(size)); @@ -1294,11 +1317,45 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int region_num) return pci_dev->io_regions[region_num].addr; } -static pcibus_t pci_bar_address(PCIDevice *d, -int reg, uint8_t type, pcibus_t size) +static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg, +uint8_t type, pcibus_t size) +{ +pcibus_t new_addr; +if (!pci_is_vf(d)) { +int bar = pci_bar(d, reg); +if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) { +new_addr =
[PATCH v5 06/15] hw/nvme: Add support for Primary Controller Capabilities
Implementation of Primary Controller Capabilities data structure (Identify command with CNS value of 14h). Currently, the command returns only ID of a primary controller. Handling of remaining fields are added in subsequent patches implementing virtualization enhancements. Signed-off-by: Lukasz Maniak Reviewed-by: Klaus Jensen --- hw/nvme/ctrl.c | 23 ++- hw/nvme/nvme.h | 2 ++ hw/nvme/trace-events | 1 + include/block/nvme.h | 23 +++ 4 files changed, 44 insertions(+), 5 deletions(-) diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index adeba0b2b6d..0bd55948ce1 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -4697,6 +4697,14 @@ static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, NvmeRequest *req, return nvme_c2h(n, (uint8_t *)list, sizeof(list), req); } +static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, NvmeRequest *req) +{ +trace_pci_nvme_identify_pri_ctrl_cap(le16_to_cpu(n->pri_ctrl_cap.cntlid)); + +return nvme_c2h(n, (uint8_t *)>pri_ctrl_cap, +sizeof(NvmePriCtrlCap), req); +} + static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req, bool active) { @@ -4915,6 +4923,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req) return nvme_identify_ctrl_list(n, req, true); case NVME_ID_CNS_CTRL_LIST: return nvme_identify_ctrl_list(n, req, false); +case NVME_ID_CNS_PRIMARY_CTRL_CAP: +return nvme_identify_pri_ctrl_cap(n, req); case NVME_ID_CNS_CS_NS: return nvme_identify_ns_csi(n, req, true); case NVME_ID_CNS_CS_NS_PRESENT: @@ -6465,6 +6475,8 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp) static void nvme_init_state(NvmeCtrl *n) { +NvmePriCtrlCap *cap = >pri_ctrl_cap; + /* add one to max_ioqpairs to account for the admin queue pair */ n->reg_size = pow2ceil(sizeof(NvmeBar) + 2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE); @@ -6474,6 +6486,8 @@ static void nvme_init_state(NvmeCtrl *n) n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING; n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL); n->aer_reqs = g_new0(NvmeRequest 
*, n->params.aerl + 1); + +cap->cntlid = cpu_to_le16(n->cntlid); } static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev) @@ -6774,15 +6788,14 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) qbus_init(>bus, sizeof(NvmeBus), TYPE_NVME_BUS, _dev->qdev, n->parent_obj.qdev.id); -nvme_init_state(n); -if (nvme_init_pci(n, pci_dev, errp)) { -return; -} - if (nvme_init_subsys(n, errp)) { error_propagate(errp, local_err); return; } +nvme_init_state(n); +if (nvme_init_pci(n, pci_dev, errp)) { +return; +} nvme_init_ctrl(n, pci_dev); /* setup a namespace if the controller drive property was given */ diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h index 17245db96b5..2db48eb25c9 100644 --- a/hw/nvme/nvme.h +++ b/hw/nvme/nvme.h @@ -471,6 +471,8 @@ typedef struct NvmeCtrl { }; uint32_tasync_config; } features; + +NvmePriCtrlCap pri_ctrl_cap; } NvmeCtrl; static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid) diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events index 90730d802fe..bfc09dddc62 100644 --- a/hw/nvme/trace-events +++ b/hw/nvme/trace-events @@ -52,6 +52,7 @@ pci_nvme_identify_ctrl(void) "identify controller" pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8"" pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32"" pci_nvme_identify_ctrl_list(uint8_t cns, uint16_t cntid) "cns 0x%"PRIx8" cntid %"PRIu16"" +pci_nvme_identify_pri_ctrl_cap(uint16_t cntlid) "identify primary controller capabilities cntlid=%"PRIu16"" pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", csi=0x%"PRIx8"" pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32"" pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", csi=0x%"PRIx8"" diff --git a/include/block/nvme.h b/include/block/nvme.h index cd068ac8914..73666cc900a 100644 --- a/include/block/nvme.h +++ b/include/block/nvme.h @@ -1019,6 +1019,7 @@ enum NvmeIdCns { NVME_ID_CNS_NS_PRESENT= 0x11, NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12, NVME_ID_CNS_CTRL_LIST = 0x13, 
+NVME_ID_CNS_PRIMARY_CTRL_CAP = 0x14, NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a, NVME_ID_CNS_CS_NS_PRESENT = 0x1b, NVME_ID_CNS_IO_COMMAND_SET= 0x1c, @@ -1503,6 +1504,27 @@ typedef enum NvmeZoneState { NVME_ZONE_STATE_OFFLINE = 0x0f, } NvmeZoneState; +typedef struct QEMU_PACKED NvmePriCtrlCap { +uint16_tcntlid; +uint16_tportid; +uint8_t crt; +uint8_t rsvd5[27]; +uint32_tvqfrt; +uint32_tvqrfa; +uint16_tvqrfap; +uint16_tvqprt; +
Re: [PATCH] ppc/spapr: Advertise StoreEOI for POWER10 compat guests
Unfortunately, this patch breaks migration under TCG because the XIVE
source flag is not updated on the target side. KVM is not impacted
because the emulated sources are not used. This needs to be addressed
in a v2.

That said, even without this patch, TCG migration is broken. Some CPUs
on the receive side are stalled on CPU Hard LOCKUPs. QEMU 6.2 is
impacted. So it has been a while :/ Ouch. Guess we need to add TCG
migration tests in the test workflow ...

Regarding the first issue with the new XIVE source flag, this routine
changes an object property after realize, which is a no-no for
migration:

    void spapr_xive_enable_store_eoi(SpaprXive *xive, bool enable)
    {
        if (enable) {
            xive->source.esb_flags |= XIVE_SRC_STORE_EOI;
        } else {
            xive->source.esb_flags &= ~XIVE_SRC_STORE_EOI;
        }
    }

I think we need a new SpaprXive state to represent the characteristic
of the source indirectly negotiated by CAS when the CPU is a POWER10.
We would use it to update xive->source.esb_flags at post_load time
after migration.

Or simply mimic CAS:

    @@ -531,6 +531,14 @@ static int spapr_xive_post_load(SpaprInt
             return kvmppc_xive_post_load(xive, version_id);
         }

    +    PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
    +    bool enable = ppc_check_compat(first_ppc_cpu, CPU_POWERPC_LOGICAL_3_10, 0,
    +                                   first_ppc_cpu->compat_pvr);
    +    spapr_xive_enable_store_eoi(xive, enable);
    +
         return 0;
     }

which has the benefit of being stateless.

Ideas?

Thanks,

C.
[PATCH v5 00/15] hw/nvme: SR-IOV with Virtualization Enhancements
Changes since v4:
- Added hello world example for SR-IOV to the docs
- Moved AER initialization from nvme_init_ctrl to nvme_init_state
- Fixed division by zero issue in calculation of vqfrt and vifrt
  capabilities

Knut Omang (2):
  pcie: Add support for Single Root I/O Virtualization (SR/IOV)
  pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt

Lukasz Maniak (4):
  hw/nvme: Add support for SR-IOV
  hw/nvme: Add support for Primary Controller Capabilities
  hw/nvme: Add support for Secondary Controller List
  docs: Add documentation for SR-IOV and Virtualization Enhancements

Łukasz Gieryk (9):
  pcie: Add a helper to the SR/IOV API
  pcie: Add 1.2 version token for the Power Management Capability
  hw/nvme: Implement the Function Level Reset
  hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime
  hw/nvme: Remove reg_size variable and update BAR0 size calculation
  hw/nvme: Calculate BAR attributes in a function
  hw/nvme: Initialize capability structures for primary/secondary
    controllers
  hw/nvme: Add support for the Virtualization Management command
  hw/nvme: Update the initalization place for the AER queue

 docs/pcie_sriov.txt          | 115 ++
 docs/system/devices/nvme.rst |  82 +
 hw/nvme/ctrl.c               | 674 ---
 hw/nvme/ns.c                 |   2 +-
 hw/nvme/nvme.h               |  55 ++-
 hw/nvme/subsys.c             |  75 +++-
 hw/nvme/trace-events         |   6 +
 hw/pci/meson.build           |   1 +
 hw/pci/pci.c                 | 100 --
 hw/pci/pcie.c                |   5 +
 hw/pci/pcie_sriov.c          | 302
 hw/pci/trace-events          |   5 +
 include/block/nvme.h         |  65
 include/hw/pci/pci.h         |  12 +-
 include/hw/pci/pci_ids.h     |   1 +
 include/hw/pci/pci_regs.h    |   1 +
 include/hw/pci/pcie.h        |   6 +
 include/hw/pci/pcie_sriov.h  |  77
 include/qemu/typedefs.h      |   2 +
 19 files changed, 1505 insertions(+), 81 deletions(-)
 create mode 100644 docs/pcie_sriov.txt
 create mode 100644 hw/pci/pcie_sriov.c
 create mode 100644 include/hw/pci/pcie_sriov.h

--
2.25.1
Re: [PATCH v2 00/12] GL & D-Bus display related fixes
Hi On Thu, Feb 17, 2022 at 9:25 PM Akihiko Odaki wrote: > On Fri, Feb 18, 2022 at 2:07 AM Marc-André Lureau > wrote: > > > > Hi > > > > On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki > wrote: > >> > >> On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau > >> wrote: > >> > > >> > Hi > >> > > >> > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki < > akihiko.od...@gmail.com> wrote: > >> >> > >> >> On Thu, Feb 17, 2022 at 8:58 PM wrote: > >> >> > > >> >> > From: Marc-André Lureau > >> >> > > >> >> > Hi, > >> >> > > >> >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a > console", Akihiko > >> >> > Odaki reported a number of issues with the GL and D-Bus display. > His series > >> >> > propose a different design, and reverting some of my previous > generic console > >> >> > changes to fix those issues. > >> >> > > >> >> > However, as I work through the issue so far, they can be solved by > reasonable > >> >> > simple fixes while keeping the console changes generic (not > specific to the > >> >> > D-Bus display backend). I belive a shared infrastructure is more > beneficial long > >> >> > term than having GL-specific code in the DBus code (in particular, > the > >> >> > egl-headless & VNC combination could potentially use it) > >> >> > > >> >> > Thanks a lot Akihiko for reporting the issues proposing a > different approach! > >> >> > Please test this alternative series and let me know of any further > issues. My > >> >> > understanding is that you are mainly concerned with the Cocoa > backend, and I > >> >> > don't have a way to test it, so please check it. If necessary, we > may well have > >> >> > to revert my earlier changes and go your way, eventually. 
> >> >> > > >> >> > Marc-André Lureau (12): > >> >> > ui/console: fix crash when using gl context with non-gl listeners > >> >> > ui/console: fix texture leak when calling > surface_gl_create_texture() > >> >> > ui: do not create a surface when resizing a GL scanout > >> >> > ui/console: move check for compatible GL context > >> >> > ui/console: move dcl compatiblity check to a callback > >> >> > ui/console: egl-headless is compatible with non-gl listeners > >> >> > ui/dbus: associate the DBusDisplayConsole listener with the given > >> >> > console > >> >> > ui/console: move console compatibility check to > dcl_display_console() > >> >> > ui/shader: fix potential leak of shader on error > >> >> > ui/shader: free associated programs > >> >> > ui/console: add a dpy_gfx_switch callback helper > >> >> > ui/dbus: fix texture sharing > >> >> > > >> >> > include/ui/console.h | 19 --- > >> >> > ui/dbus.h| 3 ++ > >> >> > ui/console-gl.c | 4 ++ > >> >> > ui/console.c | 119 > ++- > >> >> > ui/dbus-console.c| 27 +- > >> >> > ui/dbus-listener.c | 11 > >> >> > ui/dbus.c| 33 +++- > >> >> > ui/egl-headless.c| 17 ++- > >> >> > ui/gtk.c | 18 ++- > >> >> > ui/sdl2.c| 9 +++- > >> >> > ui/shader.c | 9 +++- > >> >> > ui/spice-display.c | 9 +++- > >> >> > 12 files changed, 192 insertions(+), 86 deletions(-) > >> >> > > >> >> > -- > >> >> > 2.34.1.428.gdcc0cd074f0c > >> >> > > >> >> > > >> >> > >> >> You missed only one thing: > >> >> >- that console_select and register_displaychangelistener may not > call > >> >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is > >> >> > incompatible with non-OpenGL displays > >> >> > >> >> displaychangelistener_display_console always has to call > >> >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't. > >> > > >> > > >> > Ok, would that be what you have in mind? 
> >> > > >> > --- a/ui/console.c > >> > +++ b/ui/console.c > >> > @@ -1122,6 +1122,10 @@ static void > displaychangelistener_display_console(DisplayChangeListener *dcl, > >> > } else if (con->scanout.kind == SCANOUT_SURFACE) { > >> > dpy_gfx_create_texture(con, con->surface); > >> > displaychangelistener_gfx_switch(dcl, con->surface); > >> > +} else { > >> > +/* use the fallback surface, egl-headless keeps it updated */ > >> > +assert(con->surface); > >> > +displaychangelistener_gfx_switch(dcl, con->surface); > >> > } > >> > >> It should call displaychangelistener_gfx_switch even when e.g. > >> con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content > >> to the last DisplaySurface it received while con->scanout.kind == > >> SCANOUT_TEXTURE. > > > > > > I see, egl-headless is really not a "listener". > > > >> > >> > > >> > I wish such egl-headless specific code would be there, but we would > need more refactoring. > >> > > >> > I think I would rather have a backend split for GL context, like > "-object egl-context". egl-headless-specific copy code would be handled by > common/util code for anything that wants a pixman surface (VNC, screen > capture, non-GL display
Re: [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features
Peter Maydell writes: > On Thu, 10 Feb 2022 at 04:04, Richard Henderson > wrote: >> >> Changes for v2: >> * Introduce FIELD_SEX64, instead of open-coding w/ sextract64. >> * Set TCR_EL1 more completely for user-only. >> * Continue to bound tsz within aa64_va_parameters; >> provide an out-of-bound indicator for raising AddressSize fault. >> * Split IPS patch. >> * Fix debug registers for LVA. >> * Fix long-format fsc for LPA2. >> * Fix TLBI page shift. >> * Validate TLBI granule vs TCR granule. >> >> Not done: >> * Validate translation levels which accept blocks. >> >> There is still no upstream kernel support for FEAT_LPA2, >> so that is essentially untested. > > This series seems to break 'make check-acceptance': > > (01/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2: > INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: > Timeout reached\nOriginal status: ERROR\n{'name': > '01-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2', > 'logdir': > '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j... > (900.74 s) > (02/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3: > INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred: > Timeout reached\nOriginal status: ERROR\n{'name': > '02-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3', > 'logdir': > '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j... > (900.71 s) > > UEFI runs in the guest and seems to launch the kernel, but there's > no output from the kernel itself in the logfile. Last thing it > prints is: > > EFI stub: Booting Linux Kernel... > EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied > EFI stub: Using DTB from configuration table > EFI stub: Exiting boot services and installing virtual address map... 
> SetUefiImageMemoryAttributes - 0x7F50 - 0x0004 > (0x0008) > SetUefiImageMemoryAttributes - 0x7C19 - 0x0004 > (0x0008) > SetUefiImageMemoryAttributes - 0x7C14 - 0x0004 > (0x0008) > SetUefiImageMemoryAttributes - 0x7F4C - 0x0003 > (0x0008) > SetUefiImageMemoryAttributes - 0x7C0F - 0x0004 > (0x0008) > SetUefiImageMemoryAttributes - 0x7BFB - 0x0004 > (0x0008) > SetUefiImageMemoryAttributes - 0x7BE0 - 0x0003 > (0x0008) > SetUefiImageMemoryAttributes - 0x7BDC - 0x0003 > (0x0008) > > This ought to be followed by the usual kernel boot log > [0.00] Booting Linux on physical CPU 0x00 [0x000f0510] > etc but it isn't. Probably the kernel is crashing in early bootup > before it gets round to printing anything. As this test runs under -cpu max it is likely exercising the new features (and failing). -- Alex Bennée
[PULL 10/12] virtiofsd: Create new file using O_TMPFILE and set security context
From: Vivek Goyal

If guest and host policies can't work with each other, then the guest
security context (SELinux label) needs to be set into an xattr. For
example, the guest security.selinux xattr may be remapped to
trusted.virtiofs.security.selinux.

That means setting "fscreate" is not going to help, as that's only
useful for the security.selinux xattr on the host.

So we need another method which is atomic. Use O_TMPFILE to create the
new file, set the xattr and then linkat() it into its proper place.

But this works only for regular files, so directories and symlinks will
continue to be created non-atomically.

Also, if the host filesystem does not support O_TMPFILE, we fall back
to the non-atomic behavior.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Vivek Goyal
Message-Id: <20220208204813.682906-10-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert
---
 tools/virtiofsd/passthrough_ll.c | 80 1 file changed, 72 insertions(+), 8 deletions(-) diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index e1c45bb420..f5d584e18a 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -2153,14 +2153,29 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode, static int do_create_nosecctx(fuse_req_t req, struct lo_inode *parent_inode, const char *name, mode_t mode, - struct fuse_file_info *fi, int *open_fd) + struct fuse_file_info *fi, int *open_fd, + bool tmpfile) { int err, fd; struct lo_cred old = {}; struct lo_data *lo = lo_data(req); int flags; -flags = fi->flags | O_CREAT | O_EXCL; +if (tmpfile) { +flags = fi->flags | O_TMPFILE; +/* + * Don't use O_EXCL as we want to link file later. Also reset O_CREAT + * otherwise openat() returns -EINVAL. 
+ */ +flags &= ~(O_CREAT | O_EXCL); + +/* O_TMPFILE needs either O_RDWR or O_WRONLY */ +if ((flags & O_ACCMODE) == O_RDONLY) { +flags |= O_RDWR; +} +} else { +flags = fi->flags | O_CREAT | O_EXCL; +} err = lo_change_cred(req, , lo->change_umask); if (err) { @@ -2191,7 +2206,7 @@ static int do_create_secctx_fscreate(fuse_req_t req, return err; } -err = do_create_nosecctx(req, parent_inode, name, mode, fi, ); +err = do_create_nosecctx(req, parent_inode, name, mode, fi, , false); close_reset_proc_fscreate(fscreate_fd); if (!err) { @@ -2200,6 +2215,44 @@ static int do_create_secctx_fscreate(fuse_req_t req, return err; } +static int do_create_secctx_tmpfile(fuse_req_t req, +struct lo_inode *parent_inode, +const char *name, mode_t mode, +struct fuse_file_info *fi, +const char *secctx_name, int *open_fd) +{ +int err, fd = -1; +struct lo_data *lo = lo_data(req); +char procname[64]; + +err = do_create_nosecctx(req, parent_inode, ".", mode, fi, , true); +if (err) { +return err; +} + +err = fsetxattr(fd, secctx_name, req->secctx.ctx, req->secctx.ctxlen, 0); +if (err) { +err = errno; +goto out; +} + +/* Security context set on file. Link it in place */ +sprintf(procname, "%d", fd); +FCHDIR_NOFAIL(lo->proc_self_fd); +err = linkat(AT_FDCWD, procname, parent_inode->fd, name, + AT_SYMLINK_FOLLOW); +err = err == -1 ? 
errno : 0; +FCHDIR_NOFAIL(lo->root.fd); + +out: +if (!err) { +*open_fd = fd; +} else if (fd != -1) { +close(fd); +} +return err; +} + static int do_create_secctx_noatomic(fuse_req_t req, struct lo_inode *parent_inode, const char *name, mode_t mode, @@ -2208,7 +2261,7 @@ static int do_create_secctx_noatomic(fuse_req_t req, { int err = 0, fd = -1; -err = do_create_nosecctx(req, parent_inode, name, mode, fi, ); +err = do_create_nosecctx(req, parent_inode, name, mode, fi, , false); if (err) { goto out; } @@ -2250,20 +2303,31 @@ static int do_lo_create(fuse_req_t req, struct lo_inode *parent_inode, if (secctx_enabled) { /* * If security.selinux has not been remapped and selinux is enabled, - * use fscreate to set context before file creation. - * Otherwise fallback to non-atomic method of file creation - * and xattr settting. + * use fscreate to set context before file creation. If not, use + * tmpfile method for regular files. Otherwise fallback to + * non-atomic method of file creation and xattr settting. */ if (!mapped_name && lo->use_fscreate) { err = do_create_secctx_fscreate(req, parent_inode, name, mode, fi,
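The O_TMPFILE sequence described in the commit message of this patch (create an unnamed file, set the xattr, then link it into place) can be tried outside virtiofsd with a small Linux-only sketch like the one below. It is an illustration under assumptions rather than the patch's code: ex_create_labeled() is a made-up helper, and it links through a "/proc/self/fd/N" path as shown in the open(2) man page, whereas the patch changes directory to its own proc_self_fd and uses a relative name. It needs /proc mounted and a filesystem that supports both O_TMPFILE and the chosen xattr.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for O_TMPFILE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Create dirfd/name so that the xattr is already set when the name
 * becomes visible. Returns an open fd (>= 0) or a negative errno.
 */
static int ex_create_labeled(int dirfd, const char *name, mode_t mode,
                             const char *xattr_name,
                             const void *value, size_t len)
{
    char path[64];
    int fd, saved;

    assert(name && xattr_name && value);

    /*
     * Unnamed file in the target directory. O_EXCL is deliberately not
     * set, otherwise the file could not be linked into place later.
     */
    fd = openat(dirfd, ".", O_TMPFILE | O_RDWR, mode);
    if (fd < 0) {
        return -errno;
    }

    /* Label the file while nobody else can look it up by name. */
    if (fsetxattr(fd, xattr_name, value, len, 0) < 0) {
        goto fail;
    }

    /* Give the file its name; fails with EEXIST if the name is taken. */
    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, path, dirfd, name, AT_SYMLINK_FOLLOW) < 0) {
        goto fail;
    }
    return fd;

fail:
    saved = errno;
    close(fd);
    return -saved;
}
```

A caller would fall back to a plain create followed by an xattr set when this returns an error such as -EOPNOTSUPP, which corresponds to the non-atomic fallback the commit message mentions.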
[PULL 12/12] virtiofsd: Add basic support for FUSE_SYNCFS request
From: Greg Kurz

Honor the expected behavior of syncfs() to synchronously flush all data
and metadata to disk on Linux systems.

If virtiofsd is started with '-o announce_submounts', the client is
expected to send a FUSE_SYNCFS request for each individual submount.
In this case, we just create a new file descriptor on the submount
inode with lo_inode_open(), call syncfs() on it and close it. The
intermediary file is needed because O_PATH descriptors aren't backed
by an actual file and syncfs() would fail with EBADF.

If virtiofsd is started without '-o announce_submounts' or if the
client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client only
sends a single FUSE_SYNCFS request for the root inode. The server would
thus need to track submounts internally and call syncfs() on each of
them. This will be implemented later.

Note that syncfs() might suffer from a time penalty if the submounts
are being hammered by some unrelated workload on the host. The only
solution to prevent that is to avoid shared mounts.

Signed-off-by: Greg Kurz
Message-Id: <20220215181529.164070-2-gr...@kaod.org>
Reviewed-by: Vivek Goyal
Signed-off-by: Dr.
David Alan Gilbert --- tools/virtiofsd/fuse_lowlevel.c | 11 +++ tools/virtiofsd/fuse_lowlevel.h | 13 tools/virtiofsd/passthrough_ll.c | 44 +++ tools/virtiofsd/passthrough_seccomp.c | 1 + 4 files changed, 69 insertions(+) diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c index f681d5e3b3..752928741d 100644 --- a/tools/virtiofsd/fuse_lowlevel.c +++ b/tools/virtiofsd/fuse_lowlevel.c @@ -1967,6 +1967,16 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid, } } +static void do_syncfs(fuse_req_t req, fuse_ino_t nodeid, + struct fuse_mbuf_iter *iter) +{ +if (req->se->op.syncfs) { +req->se->op.syncfs(req, nodeid); +} else { +fuse_reply_err(req, ENOSYS); +} +} + static void do_init(fuse_req_t req, fuse_ino_t nodeid, struct fuse_mbuf_iter *iter) { @@ -2399,6 +2409,7 @@ static struct { [FUSE_RENAME2] = { do_rename2, "RENAME2" }, [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" }, [FUSE_LSEEK] = { do_lseek, "LSEEK" }, +[FUSE_SYNCFS] = { do_syncfs, "SYNCFS" }, }; #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0])) diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h index c55c0ca2fc..b889dae4de 100644 --- a/tools/virtiofsd/fuse_lowlevel.h +++ b/tools/virtiofsd/fuse_lowlevel.h @@ -1226,6 +1226,19 @@ struct fuse_lowlevel_ops { */ void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence, struct fuse_file_info *fi); + +/** + * Synchronize file system content + * + * If this request is answered with an error code of ENOSYS, + * this is treated as success and future calls to syncfs() will + * succeed automatically without being sent to the filesystem + * process. 
+ * + * @param req request handle + * @param ino the inode number + */ +void (*syncfs)(fuse_req_t req, fuse_ino_t ino); }; /** diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index 4742be1d1e..dfa2fc250d 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -3699,6 +3699,49 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence, } } +static int lo_do_syncfs(struct lo_data *lo, struct lo_inode *inode) +{ +int fd, ret = 0; + +fuse_log(FUSE_LOG_DEBUG, "lo_do_syncfs(ino=%" PRIu64 ")\n", + inode->fuse_ino); + +fd = lo_inode_open(lo, inode, O_RDONLY); +if (fd < 0) { +return -fd; +} + +if (syncfs(fd) < 0) { +ret = errno; +} + +close(fd); +return ret; +} + +static void lo_syncfs(fuse_req_t req, fuse_ino_t ino) +{ +struct lo_data *lo = lo_data(req); +struct lo_inode *inode = lo_inode(req, ino); +int err; + +if (!inode) { +fuse_reply_err(req, EBADF); +return; +} + +err = lo_do_syncfs(lo, inode); +lo_inode_put(lo, ); + +/* + * If submounts aren't announced, the client only sends a request to + * sync the root inode. TODO: Track submounts internally and iterate + * over them as well. + */ + +fuse_reply_err(req, err); +} + static void lo_destroy(void *userdata) { struct lo_data *lo = (struct lo_data *)userdata; @@ -3759,6 +3802,7 @@ static struct fuse_lowlevel_ops lo_oper = { .copy_file_range = lo_copy_file_range, #endif .lseek = lo_lseek, +.syncfs = lo_syncfs, .destroy = lo_destroy, }; diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c index 2bc0127b69..888295c073 100644 --- a/tools/virtiofsd/passthrough_seccomp.c +++ b/tools/virtiofsd/passthrough_seccomp.c @@ -111,6 +111,7 @@ static const int
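A minimal standalone sketch of the per-submount path described in the commit message above: open a regular descriptor on the mount and call syncfs() on it. The helper name sync_fs_of_path() is hypothetical, and it takes a path instead of a struct lo_inode, so this is an illustration of the idea and not the virtiofsd code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* syncfs() is a GNU/Linux extension */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the filesystem that contains `path`.
 * Returns 0 on success or a positive errno value on failure.
 */
static int sync_fs_of_path(const char *path)
{
    int fd, ret = 0;

    /*
     * Open with O_RDONLY rather than O_PATH: per the commit message,
     * syncfs() on an O_PATH descriptor would fail with EBADF.
     */
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return errno;
    }

    if (syncfs(fd) < 0) {
        ret = errno; /* saved before close() can overwrite errno */
    }

    close(fd);
    return ret;
}
```

Calling sync_fs_of_path() once per announced submount mirrors the one-request-per-submount behavior; the untracked-submount case mentioned in the TODO is not covered by this sketch.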
[PULL 09/12] virtiofsd: Create new file with security context
From: Vivek Goyal This patch adds support for creating new file with security context as sent by client. It basically takes three paths. - If no security context enabled, then it continues to create files without security context. - If security context is enabled and but security.selinux has not been remapped, then it uses /proc/thread-self/attr/fscreate knob to set security context and then create the file. This will make sure that newly created file gets the security context as set in "fscreate" and this is atomic w.r.t file creation. This is useful and host and guest SELinux policies don't conflict and can work with each other. In that case, guest security.selinux xattr is not remapped and it is passthrough as "security.selinux" xattr on host. - If security context is enabled but security.selinux xattr has been remapped to something else, then it first creates the file and then uses setxattr() to set the remapped xattr with the security context. This is a non-atomic operation w.r.t file creation. This mode will be most versatile and allow host and guest to have their own separate SELinux xattrs and have their own separate SELinux policies. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Vivek Goyal Message-Id: <20220208204813.682906-9-vgo...@redhat.com> Signed-off-by: Dr. David Alan Gilbert --- tools/virtiofsd/passthrough_ll.c | 229 +++ 1 file changed, 200 insertions(+), 29 deletions(-) diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index e694980a53..e1c45bb420 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -234,6 +234,11 @@ static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st, static int xattr_map_client(const struct lo_data *lo, const char *client_name, char **out_name); +#define FCHDIR_NOFAIL(fd) do { \ +int fchdir_res = fchdir(fd); \ +assert(fchdir_res == 0); \ +} while (0) + static bool is_dot_or_dotdot(const char *name) { return name[0] == '.' 
&& @@ -288,7 +293,6 @@ static bool is_fscreate_usable(struct lo_data *lo) } /* Helpers to set/reset fscreate */ -__attribute__((unused)) static int open_set_proc_fscreate(struct lo_data *lo, const void *ctx, size_t ctxlen, int *fd) { @@ -316,7 +320,6 @@ out: return err; } -__attribute__((unused)) static void close_reset_proc_fscreate(int fd) { if ((write(fd, NULL, 0)) == -1) { @@ -1354,16 +1357,103 @@ static void lo_restore_cred_gain_cap(struct lo_cred *old, bool restore_umask, } } +static int do_mknod_symlink_secctx(fuse_req_t req, struct lo_inode *dir, + const char *name, const char *secctx_name) +{ +int path_fd, err; +char procname[64]; +struct lo_data *lo = lo_data(req); + +if (!req->secctx.ctxlen) { +return 0; +} + +/* Open newly created element with O_PATH */ +path_fd = openat(dir->fd, name, O_PATH | O_NOFOLLOW); +err = path_fd == -1 ? errno : 0; +if (err) { +return err; +} +sprintf(procname, "%i", path_fd); +FCHDIR_NOFAIL(lo->proc_self_fd); +/* Set security context. This is not atomic w.r.t file creation */ +err = setxattr(procname, secctx_name, req->secctx.ctx, req->secctx.ctxlen, + 0); +if (err) { +err = errno; +} +FCHDIR_NOFAIL(lo->root.fd); +close(path_fd); +return err; +} + +static int do_mknod_symlink(fuse_req_t req, struct lo_inode *dir, +const char *name, mode_t mode, dev_t rdev, +const char *link) +{ +int err, fscreate_fd = -1; +const char *secctx_name = req->secctx.name; +struct lo_cred old = {}; +struct lo_data *lo = lo_data(req); +char *mapped_name = NULL; +bool secctx_enabled = req->secctx.ctxlen; +bool do_fscreate = false; + +if (secctx_enabled && lo->xattrmap) { +err = xattr_map_client(lo, req->secctx.name, _name); +if (err < 0) { +return -err; +} +secctx_name = mapped_name; +} + +/* + * If security xattr has not been remapped and selinux is enabled on + * host, set fscreate and no need to do a setxattr() after file creation + */ +if (secctx_enabled && !mapped_name && lo->use_fscreate) { +do_fscreate = true; +err = open_set_proc_fscreate(lo, 
req->secctx.ctx, req->secctx.ctxlen, + _fd); +if (err) { +goto out; +} +} + +err = lo_change_cred(req, , lo->change_umask && !S_ISLNK(mode)); +if (err) { +goto out; +} + +err = mknod_wrapper(dir->fd, name, link, mode, rdev); +err = err == -1 ? errno : 0; +lo_restore_cred(, lo->change_umask && !S_ISLNK(mode)); +if (err) { +
[PULL 11/12] virtiofsd: Add an option to enable/disable security label
From: Vivek Goyal Provide an option "-o security_label/no_security_label" to enable/disable security label functionality. By default these are turned off. If enabled, server will indicate to client that it is capable of handling one security label during file creation. Typically this is expected to be a SELinux label. File server will set this label on the file. It will try to set it atomically wherever possible. But its not possible in all the cases. Signed-off-by: Vivek Goyal Message-Id: <20220208204813.682906-11-vgo...@redhat.com> Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Dr. David Alan Gilbert --- docs/tools/virtiofsd.rst | 32 tools/virtiofsd/helper.c | 1 + tools/virtiofsd/passthrough_ll.c | 15 +++ 3 files changed, 48 insertions(+) diff --git a/docs/tools/virtiofsd.rst b/docs/tools/virtiofsd.rst index 07ac0be551..0c0560203c 100644 --- a/docs/tools/virtiofsd.rst +++ b/docs/tools/virtiofsd.rst @@ -104,6 +104,13 @@ Options * posix_acl|no_posix_acl - Enable/disable posix acl support. Posix ACLs are disabled by default. + * security_label|no_security_label - +Enable/disable security label support. Security labels are disabled by +default. This will allow client to send a MAC label of file during +file creation. Typically this is expected to be SELinux security +label. Server will try to set that label on newly created file +atomically wherever possible. + .. option:: --socket-path=PATH Listen on vhost-user UNIX domain socket at PATH. @@ -348,6 +355,31 @@ client arguments or lists returned from the host. This stops the client seeing any 'security.' attributes on the server and stops it setting any. +SELinux support +--- +One can enable support for SELinux by running virtiofsd with option +"-o security_label". But this will try to save guest's security context +in xattr security.selinux on host and it might fail if host's SELinux +policy does not permit virtiofsd to do this operation. 
+ +Hence, it is preferred to remap guest's "security.selinux" xattr to say +"trusted.virtiofs.security.selinux" on host. + +"-o xattrmap=:map:security.selinux:trusted.virtiofs.:" + +This will make sure that guest and host's SELinux xattrs on same file +remain separate and not interfere with each other. And will allow both +host and guest to implement their own separate SELinux policies. + +Setting trusted xattr on host requires CAP_SYS_ADMIN. So one will need +add this capability to daemon. + +"-o modcaps=+sys_admin" + +Giving CAP_SYS_ADMIN increases the risk on system. Now virtiofsd is more +powerful and if gets compromised, it can do lot of damage to host system. +So keep this trade-off in my mind while making a decision. + Examples diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c index a8295d975a..e226fc590f 100644 --- a/tools/virtiofsd/helper.c +++ b/tools/virtiofsd/helper.c @@ -187,6 +187,7 @@ void fuse_cmdline_help(void) " default: no_allow_direct_io\n" "-o announce_submounts Announce sub-mount points to the guest\n" "-o posix_acl/no_posix_acl Enable/Disable posix_acl. (default: disabled)\n" + "-o security_label/no_security_label Enable/Disable security label. 
(default: disabled)\n" ); } diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index f5d584e18a..4742be1d1e 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -181,6 +181,7 @@ struct lo_data { int user_posix_acl, posix_acl; /* Keeps track if /proc//attr/fscreate should be used or not */ bool use_fscreate; +int user_security_label; }; static const struct fuse_opt lo_opts[] = { @@ -215,6 +216,8 @@ static const struct fuse_opt lo_opts[] = { { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 }, { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 }, { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 }, +{ "security_label", offsetof(struct lo_data, user_security_label), 1 }, +{ "no_security_label", offsetof(struct lo_data, user_security_label), 0 }, FUSE_OPT_END }; static bool use_syslog = false; @@ -808,6 +811,17 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn) fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling posix_acl\n"); conn->want &= ~FUSE_CAP_POSIX_ACL; } + +if (lo->user_security_label == 1) { +if (!(conn->capable & FUSE_CAP_SECURITY_CTX)) { +fuse_log(FUSE_LOG_ERR, "lo_init: Can not enable security label." + " kernel does not support FUSE_SECURITY_CTX capability.\n"); +} +conn->want |= FUSE_CAP_SECURITY_CTX; +} else { +fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling security label\n"); +conn->want &= ~FUSE_CAP_SECURITY_CTX; +
[PULL 08/12] virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate
From: Vivek Goyal Soon we will be able to create and also set security context on the file atomically using /proc/self/task/tid/attr/fscreate knob. If this knob is available on the system, first set the knob with the desired context and then create the file. It will be created with the context set in fscreate. This works basically for SELinux and its per thread. This patch just introduces the helper functions. Subsequent patches will make use of these helpers. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Vivek Goyal Message-Id: <20220208204813.682906-8-vgo...@redhat.com> Signed-off-by: Dr. David Alan Gilbert dgilbert: Manually merged gettid syscall number fixup from Vivek --- tools/virtiofsd/passthrough_ll.c | 92 1 file changed, 92 insertions(+) diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index e27479f1c9..e694980a53 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -173,10 +173,14 @@ struct lo_data { /* An O_PATH file descriptor to /proc/self/fd/ */ int proc_self_fd; +/* An O_PATH file descriptor to /proc/self/task/ */ +int proc_self_task; int user_killpriv_v2, killpriv_v2; /* If set, virtiofsd is responsible for setting umask during creation */ bool change_umask; int user_posix_acl, posix_acl; +/* Keeps track if /proc//attr/fscreate should be used or not */ +bool use_fscreate; }; static const struct fuse_opt lo_opts[] = { @@ -256,6 +260,72 @@ static struct lo_data *lo_data(fuse_req_t req) return (struct lo_data *)fuse_req_userdata(req); } +/* + * Tries to figure out if /proc//attr/fscreate is usable or not. With + * selinux=0, read from fscreate returns -EINVAL. + * + * TODO: Link with libselinux and use is_selinux_enabled() instead down + * the line. It probably will be more reliable indicator. 
+ */ +static bool is_fscreate_usable(struct lo_data *lo) +{ +char procname[64]; +int fscreate_fd; +size_t bytes_read; + +sprintf(procname, "%ld/attr/fscreate", syscall(SYS_gettid)); +fscreate_fd = openat(lo->proc_self_task, procname, O_RDWR); +if (fscreate_fd == -1) { +return false; +} + +bytes_read = read(fscreate_fd, procname, 64); +close(fscreate_fd); +if (bytes_read == -1) { +return false; +} +return true; +} + +/* Helpers to set/reset fscreate */ +__attribute__((unused)) +static int open_set_proc_fscreate(struct lo_data *lo, const void *ctx, + size_t ctxlen, int *fd) +{ +char procname[64]; +int fscreate_fd, err = 0; +size_t written; + +sprintf(procname, "%ld/attr/fscreate", syscall(SYS_gettid)); +fscreate_fd = openat(lo->proc_self_task, procname, O_WRONLY); +err = fscreate_fd == -1 ? errno : 0; +if (err) { +return err; +} + +written = write(fscreate_fd, ctx, ctxlen); +err = written == -1 ? errno : 0; +if (err) { +goto out; +} + +*fd = fscreate_fd; +return 0; +out: +close(fscreate_fd); +return err; +} + +__attribute__((unused)) +static void close_reset_proc_fscreate(int fd) +{ +if ((write(fd, NULL, 0)) == -1) { +fuse_log(FUSE_LOG_WARNING, "Failed to reset fscreate. err=%d\n", errno); +} +close(fd); +return; +} + /* * Load capng's state from our saved state if the current thread * hadn't previously been loaded. @@ -3531,6 +3601,15 @@ static void setup_namespaces(struct lo_data *lo, struct fuse_session *se) exit(1); } +/* Get the /proc/self/task descriptor */ +lo->proc_self_task = open("/proc/self/task/", O_PATH); +if (lo->proc_self_task == -1) { +fuse_log(FUSE_LOG_ERR, "open(/proc/self/task, O_PATH): %m\n"); +exit(1); +} + +lo->use_fscreate = is_fscreate_usable(lo); + /* * We only need /proc/self/fd. Prevent ".." from accessing parent * directories of /proc/self/fd by bind-mounting it over /proc. 
Since / was @@ -3747,6 +3826,14 @@ static void setup_chroot(struct lo_data *lo) exit(1); } +lo->proc_self_task = open("/proc/self/task", O_PATH); +if (lo->proc_self_fd == -1) { +fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/task\", O_PATH): %m\n"); +exit(1); +} + +lo->use_fscreate = is_fscreate_usable(lo); + /* * Make the shared directory the file system root so that FUSE_OPEN * (lo_open()) cannot escape the shared directory by opening a symlink. @@ -3932,6 +4019,10 @@ static void fuse_lo_data_cleanup(struct lo_data *lo) close(lo->proc_self_fd); } +if (lo->proc_self_task >= 0) { +close(lo->proc_self_task); +} + if (lo->root.fd >= 0) { close(lo->root.fd); } @@ -3959,6 +4050,7 @@ int main(int argc, char *argv[]) .posix_lock = 0, .allow_direct_io = 0, .proc_self_fd = -1, +.proc_self_task = -1, .user_killpriv_v2 =
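The usability probe added by this patch can be sketched on its own roughly as follows. This version assumes /proc/thread-self is available, instead of opening the per-thread attr/fscreate file relative to a /proc/self/task descriptor as the patch does, and it keeps the read() result in an ssize_t so the comparison against failure is explicit. The name fscreate_usable() is hypothetical.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns true if the calling thread's fscreate
 * attribute can be opened and read. The patch comment notes that with
 * selinux=0 the read fails with EINVAL, which this reports as false.
 */
static bool fscreate_usable(void)
{
    char buf[64];
    ssize_t bytes_read;
    int fd;

    /* Assumption: /proc is mounted and exposes /proc/thread-self. */
    fd = open("/proc/thread-self/attr/fscreate", O_RDWR);
    if (fd == -1) {
        return false;
    }

    bytes_read = read(fd, buf, sizeof(buf));
    close(fd);

    /* A failed read() returns -1; zero bytes just means no context is set. */
    return bytes_read >= 0;
}
```

The result depends on the host: it is expected to be false where SELinux is disabled or the attribute is not writable by the caller.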
[PULL 04/12] virtiofsd: Parse extended "struct fuse_init_in"
From: Vivek Goyal Add some code to parse extended "struct fuse_init_in". And use a local variable "flag" to represent 64 bit flags. This will make it easier to add more features without having to worry about two 32bit flags (->flags and ->flags2) in "fuse_struct_in". Signed-off-by: Vivek Goyal Message-Id: <20220208204813.682906-4-vgo...@redhat.com> Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Dr. David Alan Gilbert dgilbert: Fixed up long line --- tools/virtiofsd/fuse_lowlevel.c | 61 + 1 file changed, 39 insertions(+), 22 deletions(-) diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c index 5d431a7038..03d60f462a 100644 --- a/tools/virtiofsd/fuse_lowlevel.c +++ b/tools/virtiofsd/fuse_lowlevel.c @@ -1882,11 +1882,14 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, size_t compat_size = offsetof(struct fuse_init_in, max_readahead); size_t compat2_size = offsetof(struct fuse_init_in, flags) + sizeof(uint32_t); +/* Fuse structure extended with minor version 36 */ +size_t compat3_size = endof(struct fuse_init_in, unused); struct fuse_init_in *arg; struct fuse_init_out outarg; struct fuse_session *se = req->se; size_t bufsize = se->bufsize; size_t outargsize = sizeof(outarg); +uint64_t flags = 0; (void)nodeid; @@ -1903,11 +1906,25 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, fuse_reply_err(req, EINVAL); return; } +flags |= arg->flags; +} + +/* + * fuse_init_in was extended again with minor version 36. Just read + * current known size of fuse_init so that future extension and + * header rebase does not cause breakage. 
+ */ +if (sizeof(*arg) > compat2_size && (arg->flags & FUSE_INIT_EXT)) { +if (!fuse_mbuf_iter_advance(iter, compat3_size - compat2_size)) { +fuse_reply_err(req, EINVAL); +return; +} +flags |= (uint64_t) arg->flags2 << 32; } fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor); if (arg->major == 7 && arg->minor >= 6) { -fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags); +fuse_log(FUSE_LOG_DEBUG, "flags=0x%016llx\n", flags); fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n", arg->max_readahead); } se->conn.proto_major = arg->major; @@ -1935,68 +1952,68 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, if (arg->max_readahead < se->conn.max_readahead) { se->conn.max_readahead = arg->max_readahead; } -if (arg->flags & FUSE_ASYNC_READ) { +if (flags & FUSE_ASYNC_READ) { se->conn.capable |= FUSE_CAP_ASYNC_READ; } -if (arg->flags & FUSE_POSIX_LOCKS) { +if (flags & FUSE_POSIX_LOCKS) { se->conn.capable |= FUSE_CAP_POSIX_LOCKS; } -if (arg->flags & FUSE_ATOMIC_O_TRUNC) { +if (flags & FUSE_ATOMIC_O_TRUNC) { se->conn.capable |= FUSE_CAP_ATOMIC_O_TRUNC; } -if (arg->flags & FUSE_EXPORT_SUPPORT) { +if (flags & FUSE_EXPORT_SUPPORT) { se->conn.capable |= FUSE_CAP_EXPORT_SUPPORT; } -if (arg->flags & FUSE_DONT_MASK) { +if (flags & FUSE_DONT_MASK) { se->conn.capable |= FUSE_CAP_DONT_MASK; } -if (arg->flags & FUSE_FLOCK_LOCKS) { +if (flags & FUSE_FLOCK_LOCKS) { se->conn.capable |= FUSE_CAP_FLOCK_LOCKS; } -if (arg->flags & FUSE_AUTO_INVAL_DATA) { +if (flags & FUSE_AUTO_INVAL_DATA) { se->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA; } -if (arg->flags & FUSE_DO_READDIRPLUS) { +if (flags & FUSE_DO_READDIRPLUS) { se->conn.capable |= FUSE_CAP_READDIRPLUS; } -if (arg->flags & FUSE_READDIRPLUS_AUTO) { +if (flags & FUSE_READDIRPLUS_AUTO) { se->conn.capable |= FUSE_CAP_READDIRPLUS_AUTO; } -if (arg->flags & FUSE_ASYNC_DIO) { +if (flags & FUSE_ASYNC_DIO) { se->conn.capable |= FUSE_CAP_ASYNC_DIO; } -if (arg->flags & FUSE_WRITEBACK_CACHE) { +if (flags & FUSE_WRITEBACK_CACHE) { 
se->conn.capable |= FUSE_CAP_WRITEBACK_CACHE; } -if (arg->flags & FUSE_NO_OPEN_SUPPORT) { +if (flags & FUSE_NO_OPEN_SUPPORT) { se->conn.capable |= FUSE_CAP_NO_OPEN_SUPPORT; } -if (arg->flags & FUSE_PARALLEL_DIROPS) { +if (flags & FUSE_PARALLEL_DIROPS) { se->conn.capable |= FUSE_CAP_PARALLEL_DIROPS; } -if (arg->flags & FUSE_POSIX_ACL) { +if (flags & FUSE_POSIX_ACL) { se->conn.capable |= FUSE_CAP_POSIX_ACL; } -if (arg->flags & FUSE_HANDLE_KILLPRIV) { +if (flags & FUSE_HANDLE_KILLPRIV) { se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV; } -if (arg->flags & FUSE_NO_OPENDIR_SUPPORT) { +if (flags & FUSE_NO_OPENDIR_SUPPORT) { se->conn.capable |= FUSE_CAP_NO_OPENDIR_SUPPORT; } -if (!(arg->flags & FUSE_MAX_PAGES)) { +if (!(flags &
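The 64-bit "flags" variable introduced by this patch can be illustrated with a small standalone sketch. The struct and function names below are hypothetical stand-ins for the relevant part of struct fuse_init_in, and EXAMPLE_INIT_EXT is assumed to match the FUSE_INIT_EXT bit; the size checks on the incoming buffer are left out.

```c
#include <stdint.h>

/* Assumed value of the "extended init" marker bit (FUSE_INIT_EXT). */
#define EXAMPLE_INIT_EXT (1u << 30)

/* Hypothetical stand-in for the two 32-bit flag words of fuse_init_in. */
struct example_init_in {
    uint32_t flags;  /* low 32 bits of the feature flags */
    uint32_t flags2; /* high 32 bits, valid only with EXAMPLE_INIT_EXT */
};

/*
 * Fold both words into one 64-bit value so that later feature tests
 * do not need to care which word a bit lives in.
 */
static uint64_t example_flags64(const struct example_init_in *arg)
{
    uint64_t flags = arg->flags;

    if (arg->flags & EXAMPLE_INIT_EXT) {
        /* Cast before shifting: a 32-bit shift by 32 would be undefined. */
        flags |= (uint64_t)arg->flags2 << 32;
    }
    return flags;
}
```

With this in place a capability such as one defined as (1ULL << 32) can be tested with the same "flags & BIT" form as the older 32-bit ones.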
Re: [PATCH] migration: NULL transport_data after freeing
* Hanna Reitz (hre...@redhat.com) wrote:
> migration_incoming_state_destroy() NULLs all objects it frees after they
> are freed, presumably so that a subsequent call to the same function
> will not free them again, unless new objects have been created in the
> meantime.
>
> transport_data is the exception, and it shows exactly this problem: When
> an incoming migration uses transport_cleanup() and transport_data, and a
> subsequent incoming migration (e.g. loadvm) occurs that does not, then
> when this second one is done, it will call transport_cleanup() on the
> old transport_data again -- which has already been freed. This is
> sometimes visible in the iotest 201, though for some reason I can only
> reproduce it with -m32.
>
> To fix this, call transport_cleanup() only when transport_data is not
> NULL (otherwise there is nothing to clean up), and set transport_data to
> NULL when it has been cleaned up (i.e. freed).
>
> (transport_cleanup() is used only by migration/socket.c, where
> socket_start_incoming_migration_internal() sets both it and
> transport_data to non-NULL values.)
>
> Signed-off-by: Hanna Reitz

That probably deserves a fixes: a59136f

Reviewed-by: Dr. David Alan Gilbert

> ---
>  migration/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index bcc385b94b..cdb2e76d02 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -287,8 +287,9 @@ void migration_incoming_state_destroy(void)
>          g_array_free(mis->postcopy_remote_fds, TRUE);
>          mis->postcopy_remote_fds = NULL;
>      }
> -    if (mis->transport_cleanup) {
> +    if (mis->transport_cleanup && mis->transport_data) {
>          mis->transport_cleanup(mis->transport_data);
> +        mis->transport_data = NULL;
>      }
>
>      qemu_event_reset(&mis->main_thread_load_event);
> --
> 2.34.1
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
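The double-cleanup problem and its fix can be reproduced in isolation with a sketch like the one below. The struct, the counter and the function names are hypothetical stand-ins, not QEMU's MigrationIncomingState; only the guard-and-NULL pattern is taken from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of the state that owns the transport data. */
struct example_incoming_state {
    void (*transport_cleanup)(void *data);
    void *transport_data;
};

/* Counts invocations so that a repeated cleanup would be observable. */
static int example_cleanup_calls;

static void example_cleanup(void *data)
{
    example_cleanup_calls++;
    free(data);
}

/*
 * Safe to call more than once: cleanup runs only while there is data
 * to clean up, and the pointer is cleared once it has been freed.
 */
static void example_state_destroy(struct example_incoming_state *s)
{
    if (s->transport_cleanup && s->transport_data) {
        s->transport_cleanup(s->transport_data);
        s->transport_data = NULL;
    }
}
```

Without the second condition and the assignment to NULL, a second call to example_state_destroy() would pass the already freed pointer to the callback again, which is the failure described in the commit message.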
[PULL 07/12] virtiofsd: Move core file creation code in separate function
From: Vivek Goyal

Move the core file creation bits into a separate function. Soon this is
going to get more complex, as file creation will also need to set the
security context, and there will be multiple modes of file creation in
the next patch.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Vivek Goyal
Message-Id: <20220208204813.682906-7-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert
---
 tools/virtiofsd/passthrough_ll.c | 36 ++--
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 3e56d1cd95..e27479f1c9 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2001,6 +2001,30 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
     return 0;
 }
 
+static int do_lo_create(fuse_req_t req, struct lo_inode *parent_inode,
+                        const char *name, mode_t mode,
+                        struct fuse_file_info *fi, int* open_fd)
+{
+    int err = 0, fd;
+    struct lo_cred old = {};
+    struct lo_data *lo = lo_data(req);
+
+    err = lo_change_cred(req, &old, lo->change_umask);
+    if (err) {
+        return err;
+    }
+
+    /* Try to create a new file but don't open existing files */
+    fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
+    if (fd == -1) {
+        err = errno;
+    } else {
+        *open_fd = fd;
+    }
+    lo_restore_cred(&old, lo->change_umask);
+    return err;
+}
+
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
                       mode_t mode, struct fuse_file_info *fi)
 {
@@ -2010,7 +2034,6 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
     struct lo_inode *inode = NULL;
     struct fuse_entry_param e;
     int err;
-    struct lo_cred old = {};
 
     fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)"
              " kill_priv=%d\n", parent, name, fi->kill_priv);
@@ -2026,18 +2049,9 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
         return;
     }
 
-    err = lo_change_cred(req, &old, lo->change_umask);
-    if (err) {
-        goto out;
-    }
-
     update_open_flags(lo->writeback, lo->allow_direct_io, fi);
 
-    /* Try to create a new file but don't open existing files */
-    fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
-    err = fd == -1 ? errno : 0;
-
-    lo_restore_cred(&old, lo->change_umask);
+    err = do_lo_create(req, parent_inode, name, mode, fi, &fd);
 
     /* Ignore the error if file exists and O_EXCL was not given */
     if (err && (err != EEXIST || (fi->flags & O_EXCL))) {
-- 
2.35.1
[PULL 06/12] virtiofsd, fuse_lowlevel.c: Add capability to parse security context
From: Vivek Goyal Add capability to enable and parse security context as sent by client and put into fuse_req. Filesystems now can get security context from request and set it on files during creation. Signed-off-by: Vivek Goyal Message-Id: <20220208204813.682906-6-vgo...@redhat.com> Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Dr. David Alan Gilbert --- tools/virtiofsd/fuse_common.h | 5 ++ tools/virtiofsd/fuse_i.h| 7 +++ tools/virtiofsd/fuse_lowlevel.c | 102 +++- 3 files changed, 113 insertions(+), 1 deletion(-) diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h index 6f8a988202..bf46954dab 100644 --- a/tools/virtiofsd/fuse_common.h +++ b/tools/virtiofsd/fuse_common.h @@ -377,6 +377,11 @@ struct fuse_file_info { */ #define FUSE_CAP_SETXATTR_EXT (1 << 29) +/** + * Indicates that file server supports creating file security context + */ +#define FUSE_CAP_SECURITY_CTX (1ULL << 32) + /** * Ioctl flags * diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h index 492e002181..a5572fa4ae 100644 --- a/tools/virtiofsd/fuse_i.h +++ b/tools/virtiofsd/fuse_i.h @@ -15,6 +15,12 @@ struct fv_VuDev; struct fv_QueueInfo; +struct fuse_security_context { +const char *name; +uint32_t ctxlen; +const void *ctx; +}; + struct fuse_req { struct fuse_session *se; uint64_t unique; @@ -35,6 +41,7 @@ struct fuse_req { } u; struct fuse_req *next; struct fuse_req *prev; +struct fuse_security_context secctx; }; struct fuse_notify_req { diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c index 794185fb33..f681d5e3b3 100644 --- a/tools/virtiofsd/fuse_lowlevel.c +++ b/tools/virtiofsd/fuse_lowlevel.c @@ -886,11 +886,63 @@ static void do_readlink(fuse_req_t req, fuse_ino_t nodeid, } } +static int parse_secctx_fill_req(fuse_req_t req, struct fuse_mbuf_iter *iter) +{ +struct fuse_secctx_header *fsecctx_header; +struct fuse_secctx *fsecctx; +const void *secctx; +const char *name; + +fsecctx_header = fuse_mbuf_iter_advance(iter, 
sizeof(*fsecctx_header)); +if (!fsecctx_header) { +return -EINVAL; +} + +/* + * As of now maximum of one security context is supported. It can + * change in future though. + */ +if (fsecctx_header->nr_secctx > 1) { +return -EINVAL; +} + +/* No security context sent. Maybe no LSM supports it */ +if (!fsecctx_header->nr_secctx) { +return 0; +} + +fsecctx = fuse_mbuf_iter_advance(iter, sizeof(*fsecctx)); +if (!fsecctx) { +return -EINVAL; +} + +/* struct fsecctx with zero sized context is not expected */ +if (!fsecctx->size) { +return -EINVAL; +} +name = fuse_mbuf_iter_advance_str(iter); +if (!name) { +return -EINVAL; +} + +secctx = fuse_mbuf_iter_advance(iter, fsecctx->size); +if (!secctx) { +return -EINVAL; +} + +req->secctx.name = name; +req->secctx.ctx = secctx; +req->secctx.ctxlen = fsecctx->size; +return 0; +} + static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, struct fuse_mbuf_iter *iter) { struct fuse_mknod_in *arg; const char *name; +bool secctx_enabled = req->se->conn.want & FUSE_CAP_SECURITY_CTX; +int err; arg = fuse_mbuf_iter_advance(iter, sizeof(*arg)); name = fuse_mbuf_iter_advance_str(iter); @@ -901,6 +953,14 @@ static void do_mknod(fuse_req_t req, fuse_ino_t nodeid, req->ctx.umask = arg->umask; +if (secctx_enabled) { +err = parse_secctx_fill_req(req, iter); +if (err) { +fuse_reply_err(req, -err); +return; +} +} + if (req->se->op.mknod) { req->se->op.mknod(req, nodeid, name, arg->mode, arg->rdev); } else { @@ -913,6 +973,8 @@ static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid, { struct fuse_mkdir_in *arg; const char *name; +bool secctx_enabled = req->se->conn.want & FUSE_CAP_SECURITY_CTX; +int err; arg = fuse_mbuf_iter_advance(iter, sizeof(*arg)); name = fuse_mbuf_iter_advance_str(iter); @@ -923,6 +985,14 @@ static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid, req->ctx.umask = arg->umask; +if (secctx_enabled) { +err = parse_secctx_fill_req(req, iter); +if (err) { +fuse_reply_err(req, err); +return; +} +} + if (req->se->op.mkdir) { 
req->se->op.mkdir(req, nodeid, name, arg->mode); } else { @@ -969,12 +1039,22 @@ static void do_symlink(fuse_req_t req, fuse_ino_t nodeid, { const char *name = fuse_mbuf_iter_advance_str(iter); const char *linkname = fuse_mbuf_iter_advance_str(iter); +bool secctx_enabled = req->se->conn.want & FUSE_CAP_SECURITY_CTX; +int err; if (!name || !linkname) { fuse_reply_err(req, EINVAL); return; } +
[PULL 05/12] virtiofsd: Extend size of fuse_conn_info->capable and ->want fields
From: Vivek Goyal

->capable keeps track of what capabilities the kernel supports and ->want
keeps track of what capabilities the filesystem wants.

Right now these fields are 32 bits in size. But fuse has now run out of
bits, and capabilities can have bit numbers higher than 31. That means
32-bit fields are not sufficient anymore. Increase the size to 64 bits so
that we can add newer capabilities and still be able to use existing code
to check and set the capabilities.

Reviewed-by: Dr. David Alan Gilbert
Signed-off-by: Vivek Goyal
Message-Id: <20220208204813.682906-5-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert
---
 tools/virtiofsd/fuse_common.h   | 4 ++--
 tools/virtiofsd/fuse_lowlevel.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 0c2665b977..6f8a988202 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -439,7 +439,7 @@ struct fuse_conn_info {
     /**
      * Capability flags that the kernel supports (read-only)
      */
-    unsigned capable;
+    uint64_t capable;
 
     /**
      * Capability flags that the filesystem wants to enable.
@@ -447,7 +447,7 @@ struct fuse_conn_info {
      * libfuse attempts to initialize this field with
      * reasonable default values before calling the init() handler.
      */
-    unsigned want;
+    uint64_t want;
 
     /**
      * Maximum number of pending "background" requests. A
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 03d60f462a..794185fb33 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2070,7 +2070,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
     if (se->conn.want & (~se->conn.capable)) {
         fuse_log(FUSE_LOG_ERR,
                  "fuse: error: filesystem requested capabilities "
-                 "0x%x that are not supported by kernel, aborting.\n",
+                 "0x%llx that are not supported by kernel, aborting.\n",
                  se->conn.want & (~se->conn.capable));
         fuse_reply_err(req, EPROTO);
         se->error = -EPROTO;
-- 
2.35.1
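Why the wider fields matter can be shown with a short sketch. The function names are hypothetical, the capability value (1ULL << 32) is used only as an example of a bit above 31, and the formatting uses PRIx64 from <inttypes.h> as one portable way to print a uint64_t; the patch itself uses "%llx".

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Example capability living above bit 31, as the newer ones do. */
#define EXAMPLE_CAP_HIGH (1ULL << 32)

/* Capabilities the filesystem wants but the kernel did not offer. */
static uint64_t example_unsupported_caps(uint64_t want, uint64_t capable)
{
    return want & ~capable;
}

/* What a 32-bit field would have kept of the same value. */
static uint32_t example_truncate32(uint64_t caps)
{
    return (uint32_t)caps;
}

/* Format the mask for a log message without assuming sizeof(long long). */
static int example_format_caps(char *buf, size_t len, uint64_t caps)
{
    return snprintf(buf, len, "0x%" PRIx64, caps);
}
```

With 32-bit fields a request for EXAMPLE_CAP_HIGH would be truncated to zero and the "not supported by kernel" check could never fire for it.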
Re: [Virtio-fs] [PULL 00/12] virtiofs queue
* Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote: > From: "Dr. David Alan Gilbert" > > The following changes since commit c13b8e9973635f34f3ce4356af27a311c993729c: > > Merge remote-tracking branch > 'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging (2022-02-16 > 09:57:11 +) > > are available in the Git repository at: > > https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20220217 > > for you to fetch changes up to e138ec4ac86ea71d10ecd032edc433290776a5f2: > > virtiofsd: Add basic support for FUSE_SYNCFS request (2022-02-17 13:35:55 > +) NAK again Some checkpatch fixes slipped through; v3 in flight > > V2: virtiofs pull 2022-02-17 > > Security label improvements from Vivek > - includes a fix for building against new kernel headers > [V2: Fix building on old Linux] > Blocking flock disable from Sebastian > SYNCFS support from Greg > > Signed-off-by: Dr. David Alan Gilbert > > > Greg Kurz (1): > virtiofsd: Add basic support for FUSE_SYNCFS request > > Sebastian Hasler (1): > virtiofsd: Do not support blocking flock > > Vivek Goyal (10): > virtiofsd: Fix breakage due to fuse_init_in size change > linux-headers: Update headers to v5.17-rc1 > virtiofsd: Parse extended "struct fuse_init_in" > virtiofsd: Extend size of fuse_conn_info->capable and ->want fields > virtiofsd, fuse_lowlevel.c: Add capability to parse security context > virtiofsd: Move core file creation code in separate function > virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate > virtiofsd: Create new file with security context > virtiofsd: Create new file using O_TMPFILE and set security context > virtiofsd: Add an option to enable/disable security label > > docs/tools/virtiofsd.rst | 32 ++ > include/standard-headers/asm-x86/kvm_para.h| 1 + > include/standard-headers/drm/drm_fourcc.h | 11 + > include/standard-headers/linux/ethtool.h | 1 + > include/standard-headers/linux/fuse.h | 60 +++- > include/standard-headers/linux/pci_regs.h | 142 > 
include/standard-headers/linux/virtio_gpio.h | 72 > include/standard-headers/linux/virtio_i2c.h| 47 +++ > include/standard-headers/linux/virtio_iommu.h | 8 +- > include/standard-headers/linux/virtio_pcidev.h | 65 > include/standard-headers/linux/virtio_scmi.h | 24 ++ > linux-headers/asm-generic/unistd.h | 5 +- > linux-headers/asm-mips/unistd_n32.h| 2 + > linux-headers/asm-mips/unistd_n64.h| 2 + > linux-headers/asm-mips/unistd_o32.h| 2 + > linux-headers/asm-powerpc/unistd_32.h | 2 + > linux-headers/asm-powerpc/unistd_64.h | 2 + > linux-headers/asm-riscv/bitsperlong.h | 14 + > linux-headers/asm-riscv/mman.h | 1 + > linux-headers/asm-riscv/unistd.h | 44 +++ > linux-headers/asm-s390/unistd_32.h | 2 + > linux-headers/asm-s390/unistd_64.h | 2 + > linux-headers/asm-x86/kvm.h| 16 +- > linux-headers/asm-x86/unistd_32.h | 1 + > linux-headers/asm-x86/unistd_64.h | 1 + > linux-headers/asm-x86/unistd_x32.h | 1 + > linux-headers/linux/kvm.h | 17 + > tools/virtiofsd/fuse_common.h | 9 +- > tools/virtiofsd/fuse_i.h | 7 + > tools/virtiofsd/fuse_lowlevel.c| 179 -- > tools/virtiofsd/fuse_lowlevel.h| 13 + > tools/virtiofsd/helper.c | 1 + > tools/virtiofsd/passthrough_ll.c | 467 > +++-- > tools/virtiofsd/passthrough_seccomp.c | 1 + > 34 files changed, 1122 insertions(+), 132 deletions(-) > create mode 100644 include/standard-headers/linux/virtio_gpio.h > create mode 100644 include/standard-headers/linux/virtio_i2c.h > create mode 100644 include/standard-headers/linux/virtio_pcidev.h > create mode 100644 include/standard-headers/linux/virtio_scmi.h > create mode 100644 linux-headers/asm-riscv/bitsperlong.h > create mode 100644 linux-headers/asm-riscv/mman.h > create mode 100644 linux-headers/asm-riscv/unistd.h > > ___ > Virtio-fs mailing list > virtio...@redhat.com > https://listman.redhat.com/mailman/listinfo/virtio-fs > -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [PATCH v2 00/12] GL & D-Bus display related fixes
On Fri, Feb 18, 2022 at 2:07 AM Marc-André Lureau wrote: > > Hi > > On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki wrote: >> >> On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau >> wrote: >> > >> > Hi >> > >> > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki >> > wrote: >> >> >> >> On Thu, Feb 17, 2022 at 8:58 PM wrote: >> >> > >> >> > From: Marc-André Lureau >> >> > >> >> > Hi, >> >> > >> >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a console", >> >> > Akihiko >> >> > Odaki reported a number of issues with the GL and D-Bus display. His >> >> > series >> >> > propose a different design, and reverting some of my previous generic >> >> > console >> >> > changes to fix those issues. >> >> > >> >> > However, as I work through the issue so far, they can be solved by >> >> > reasonable >> >> > simple fixes while keeping the console changes generic (not specific to >> >> > the >> >> > D-Bus display backend). I belive a shared infrastructure is more >> >> > beneficial long >> >> > term than having GL-specific code in the DBus code (in particular, the >> >> > egl-headless & VNC combination could potentially use it) >> >> > >> >> > Thanks a lot Akihiko for reporting the issues proposing a different >> >> > approach! >> >> > Please test this alternative series and let me know of any further >> >> > issues. My >> >> > understanding is that you are mainly concerned with the Cocoa backend, >> >> > and I >> >> > don't have a way to test it, so please check it. If necessary, we may >> >> > well have >> >> > to revert my earlier changes and go your way, eventually. 
>> >> > >> >> > Marc-André Lureau (12): >> >> > ui/console: fix crash when using gl context with non-gl listeners >> >> > ui/console: fix texture leak when calling surface_gl_create_texture() >> >> > ui: do not create a surface when resizing a GL scanout >> >> > ui/console: move check for compatible GL context >> >> > ui/console: move dcl compatiblity check to a callback >> >> > ui/console: egl-headless is compatible with non-gl listeners >> >> > ui/dbus: associate the DBusDisplayConsole listener with the given >> >> > console >> >> > ui/console: move console compatibility check to dcl_display_console() >> >> > ui/shader: fix potential leak of shader on error >> >> > ui/shader: free associated programs >> >> > ui/console: add a dpy_gfx_switch callback helper >> >> > ui/dbus: fix texture sharing >> >> > >> >> > include/ui/console.h | 19 --- >> >> > ui/dbus.h| 3 ++ >> >> > ui/console-gl.c | 4 ++ >> >> > ui/console.c | 119 ++- >> >> > ui/dbus-console.c| 27 +- >> >> > ui/dbus-listener.c | 11 >> >> > ui/dbus.c| 33 +++- >> >> > ui/egl-headless.c| 17 ++- >> >> > ui/gtk.c | 18 ++- >> >> > ui/sdl2.c| 9 +++- >> >> > ui/shader.c | 9 +++- >> >> > ui/spice-display.c | 9 +++- >> >> > 12 files changed, 192 insertions(+), 86 deletions(-) >> >> > >> >> > -- >> >> > 2.34.1.428.gdcc0cd074f0c >> >> > >> >> > >> >> >> >> You missed only one thing: >> >> >- that console_select and register_displaychangelistener may not call >> >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is >> >> > incompatible with non-OpenGL displays >> >> >> >> displaychangelistener_display_console always has to call >> >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't. >> > >> > >> > Ok, would that be what you have in mind? 
>> > >> > --- a/ui/console.c >> > +++ b/ui/console.c >> > @@ -1122,6 +1122,10 @@ static void >> > displaychangelistener_display_console(DisplayChangeListener *dcl, >> > } else if (con->scanout.kind == SCANOUT_SURFACE) { >> > dpy_gfx_create_texture(con, con->surface); >> > displaychangelistener_gfx_switch(dcl, con->surface); >> > +} else { >> > +/* use the fallback surface, egl-headless keeps it updated */ >> > +assert(con->surface); >> > +displaychangelistener_gfx_switch(dcl, con->surface); >> > } >> >> It should call displaychangelistener_gfx_switch even when e.g. >> con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content >> to the last DisplaySurface it received while con->scanout.kind == >> SCANOUT_TEXTURE. > > > I see, egl-headless is really not a "listener". > >> >> > >> > I wish such egl-headless specific code would be there, but we would need >> > more refactoring. >> > >> > I think I would rather have a backend split for GL context, like "-object >> > egl-context". egl-headless-specific copy code would be handled by >> > common/util code for anything that wants a pixman surface (VNC, screen >> > capture, non-GL display etc). >> > >> > This split would allow sharing the context code, and introduce other >> > system specific GL initialization, such as WGL etc. Right now, I doubt the >> > EGL code works on anything but Linux. >> >>
[PULL 03/12] linux-headers: Update headers to v5.17-rc1
From: Vivek Goyal Update headers to 5.17-rc1. I need latest fuse changes. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Vivek Goyal Message-Id: <20220208204813.682906-3-vgo...@redhat.com> Signed-off-by: Dr. David Alan Gilbert --- include/standard-headers/asm-x86/kvm_para.h | 1 + include/standard-headers/drm/drm_fourcc.h | 11 ++ include/standard-headers/linux/ethtool.h | 1 + include/standard-headers/linux/fuse.h | 60 +++- include/standard-headers/linux/pci_regs.h | 142 +- include/standard-headers/linux/virtio_gpio.h | 72 + include/standard-headers/linux/virtio_i2c.h | 47 ++ include/standard-headers/linux/virtio_iommu.h | 8 +- .../standard-headers/linux/virtio_pcidev.h| 65 include/standard-headers/linux/virtio_scmi.h | 24 +++ linux-headers/asm-generic/unistd.h| 5 +- linux-headers/asm-mips/unistd_n32.h | 2 + linux-headers/asm-mips/unistd_n64.h | 2 + linux-headers/asm-mips/unistd_o32.h | 2 + linux-headers/asm-powerpc/unistd_32.h | 2 + linux-headers/asm-powerpc/unistd_64.h | 2 + linux-headers/asm-riscv/bitsperlong.h | 14 ++ linux-headers/asm-riscv/mman.h| 1 + linux-headers/asm-riscv/unistd.h | 44 ++ linux-headers/asm-s390/unistd_32.h| 2 + linux-headers/asm-s390/unistd_64.h| 2 + linux-headers/asm-x86/kvm.h | 16 +- linux-headers/asm-x86/unistd_32.h | 1 + linux-headers/asm-x86/unistd_64.h | 1 + linux-headers/asm-x86/unistd_x32.h| 1 + linux-headers/linux/kvm.h | 17 +++ 26 files changed, 469 insertions(+), 76 deletions(-) create mode 100644 include/standard-headers/linux/virtio_gpio.h create mode 100644 include/standard-headers/linux/virtio_i2c.h create mode 100644 include/standard-headers/linux/virtio_pcidev.h create mode 100644 include/standard-headers/linux/virtio_scmi.h create mode 100644 linux-headers/asm-riscv/bitsperlong.h create mode 100644 linux-headers/asm-riscv/mman.h create mode 100644 linux-headers/asm-riscv/unistd.h diff --git a/include/standard-headers/asm-x86/kvm_para.h b/include/standard-headers/asm-x86/kvm_para.h index 204cfb8640..f0235e58a1 100644 --- 
a/include/standard-headers/asm-x86/kvm_para.h +++ b/include/standard-headers/asm-x86/kvm_para.h @@ -8,6 +8,7 @@ * should be used to determine that a VM is running under KVM. */ #define KVM_CPUID_SIGNATURE0x4000 +#define KVM_SIGNATURE "KVMKVMKVM\0\0\0" /* This CPUID returns two feature bitmaps in eax, edx. Before enabling * a particular paravirtualization, the appropriate feature bit should diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h index 2c025cb4fe..4888f85f69 100644 --- a/include/standard-headers/drm/drm_fourcc.h +++ b/include/standard-headers/drm/drm_fourcc.h @@ -313,6 +313,13 @@ extern "C" { */ #define DRM_FORMAT_P016fourcc_code('P', '0', '1', '6') /* 2x2 subsampled Cr:Cb plane 16 bits per channel */ +/* 2 plane YCbCr420. + * 3 10 bit components and 2 padding bits packed into 4 bytes. + * index 0 = Y plane, [31:0] x:Y2:Y1:Y0 2:10:10:10 little endian + * index 1 = Cr:Cb plane, [63:0] x:Cr2:Cb2:Cr1:x:Cb1:Cr0:Cb0 [2:10:10:10:2:10:10:10] little endian + */ +#define DRM_FORMAT_P030fourcc_code('P', '0', '3', '0') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */ + /* 3 plane non-subsampled (444) YCbCr * 16 bits per component, but only 10 bits are used and 6 bits are padded * index 0: Y plane, [15:0] Y:x [10:6] little endian @@ -853,6 +860,10 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t modifier) * and UV. Some SAND-using hardware stores UV in a separate tiled * image from Y to reduce the column height, which is not supported * with these modifiers. + * + * The DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier is also + * supported for DRM_FORMAT_P030 where the columns remain as 128 bytes + * wide, but as this is a 10 bpp format that translates to 96 pixels. 
*/ #define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \ diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h index 688eb8dc39..38d5a4cd6e 100644 --- a/include/standard-headers/linux/ethtool.h +++ b/include/standard-headers/linux/ethtool.h @@ -231,6 +231,7 @@ enum tunable_id { ETHTOOL_RX_COPYBREAK, ETHTOOL_TX_COPYBREAK, ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */ + ETHTOOL_TX_COPYBREAK_BUF_SIZE, /* * Add your fresh new tunable attribute above and remember to update * tunable_strings[] in net/ethtool/common.c diff --git a/include/standard-headers/linux/fuse.h b/include/standard-headers/linux/fuse.h index 23ea31708b..bda06258be 100644 --- a/include/standard-headers/linux/fuse.h +++
[PULL 02/12] virtiofsd: Fix breakage due to fuse_init_in size change
From: Vivek Goyal Kernel version 5.17 has increased the size of "struct fuse_init_in" struct. Previously this struct was 16 bytes and now it has been extended to 64 bytes in size. Once qemu headers are updated to latest, it will expect to receive 64 byte size struct (for protocol version major 7 and minor > 6). But if guest is booting older kernel (older than 5.17), then it still sends older fuse_init_in of size 16 bytes. And do_init() fails. It is expecting 64 byte struct. And this results in mount of virtiofs failing. Fix this by parsing 16 bytes only for now. Separate patches will be posted which will parse rest of the bytes and enable new functionality. Right now we don't support any of the new functionality, so we don't lose anything by not parsing bytes beyond 16. Reviewed-by: Dr. David Alan Gilbert Signed-off-by: Vivek Goyal Message-Id: <20220208204813.682906-2-vgo...@redhat.com> Signed-off-by: Dr. David Alan Gilbert --- tools/virtiofsd/fuse_lowlevel.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c index e4679c73ab..5d431a7038 100644 --- a/tools/virtiofsd/fuse_lowlevel.c +++ b/tools/virtiofsd/fuse_lowlevel.c @@ -1880,6 +1880,8 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, struct fuse_mbuf_iter *iter) { size_t compat_size = offsetof(struct fuse_init_in, max_readahead); +size_t compat2_size = offsetof(struct fuse_init_in, flags) + + sizeof(uint32_t); struct fuse_init_in *arg; struct fuse_init_out outarg; struct fuse_session *se = req->se; @@ -1897,7 +1899,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid, /* ...and now consume the new fields. */ if (arg->major == 7 && arg->minor >= 6) { -if (!fuse_mbuf_iter_advance(iter, sizeof(*arg) - compat_size)) { +if (!fuse_mbuf_iter_advance(iter, compat2_size - compat_size)) { fuse_reply_err(req, EINVAL); return; } -- 2.35.1
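The size arithmetic in this patch can be checked with a small standalone sketch. struct example_init_in below is an illustrative stand-in laid out to match the 16-byte and 64-byte sizes given in the commit message; the flags2 and unused field names, and the helper extra_bytes_to_consume(), are assumptions made for this sketch rather than code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the extended init request (64 bytes). */
struct example_init_in {
    uint32_t major;
    uint32_t minor;
    uint32_t max_readahead;
    uint32_t flags;
    uint32_t flags2;      /* assumed name for the first new field */
    uint32_t unused[11];  /* assumed padding up to 64 bytes */
};

/* Size of the oldest layout: just major and minor. */
static size_t compat_size(void)
{
    return offsetof(struct example_init_in, max_readahead);
}

/* Size of the layout sent by kernels older than 5.17: up to and including flags. */
static size_t compat2_size(void)
{
    return offsetof(struct example_init_in, flags) + sizeof(uint32_t);
}

/* Bytes still to consume after the first compat_size() bytes were read. */
static size_t extra_bytes_to_consume(uint32_t major, uint32_t minor)
{
    assert(compat2_size() >= compat_size());
    if (major == 7 && minor >= 6) {
        return compat2_size() - compat_size();
    }
    return 0;
}
```

Advancing by sizeof(struct) - compat_size would demand 56 more bytes, which an older guest never sends; advancing by compat2_size - compat_size asks for only 8.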
[PULL 01/12] virtiofsd: Do not support blocking flock
From: Sebastian Hasler With the current implementation, blocking flock can lead to deadlock. Thus, it's better to return EOPNOTSUPP if a user attempts to perform a blocking flock request. Signed-off-by: Sebastian Hasler Message-Id: <20220113153249.710216-1-sebastian.has...@stuvus.uni-stuttgart.de> Signed-off-by: Dr. David Alan Gilbert Reviewed-by: Vivek Goyal Reviewed-by: Greg Kurz --- tools/virtiofsd/passthrough_ll.c | 9 + 1 file changed, 9 insertions(+) diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c index b3d0674f6d..3e56d1cd95 100644 --- a/tools/virtiofsd/passthrough_ll.c +++ b/tools/virtiofsd/passthrough_ll.c @@ -2467,6 +2467,15 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi, int res; (void)ino; +if (!(op & LOCK_NB)) { +/* + * Blocking flock can deadlock as there is only one thread + * serving the queue. + */ +fuse_reply_err(req, EOPNOTSUPP); +return; +} + res = flock(lo_fi_fd(req, fi), op); fuse_reply_err(req, res == -1 ? errno : 0); -- 2.35.1
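The check added here is small enough to sketch in isolation. check_flock_op() and example_flock() are hypothetical helpers written for this illustration and are not virtiofsd functions; the only thing taken from the patch is the rule that a request without LOCK_NB is answered with EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <sys/file.h>

/* Returns 0 if the request may proceed, or the errno value to reply with. */
static int check_flock_op(int op)
{
    if (!(op & LOCK_NB)) {
        /* A blocking request could stall the only thread serving the queue. */
        return EOPNOTSUPP;
    }
    return 0;
}

/* Applies the lock only when it cannot block; returns 0 or an errno value. */
static int example_flock(int fd, int op)
{
    assert(fd >= 0);
    int err = check_flock_op(op);
    if (err) {
        return err;
    }
    return flock(fd, op) == -1 ? errno : 0;
}
```

A caller that needs blocking semantics would have to retry the non-blocking request itself.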
[PULL 00/12] virtiofs queue
From: "Dr. David Alan Gilbert" The following changes since commit c13b8e9973635f34f3ce4356af27a311c993729c: Merge remote-tracking branch 'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging (2022-02-16 09:57:11 +) are available in the Git repository at: https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20220217b for you to fetch changes up to 45b04ef48dbbeb18d93c2631bf5584ac493de749: virtiofsd: Add basic support for FUSE_SYNCFS request (2022-02-17 17:22:26 +) V3: virtiofs pull 2022-02-17 Security label improvements from Vivek - includes a fix for building against new kernel headers [V3: checkpatch style fixes] [V2: Fix building on old Linux] Blocking flock disable from Sebastian SYNCFS support from Greg Signed-off-by: Dr. David Alan Gilbert Greg Kurz (1): virtiofsd: Add basic support for FUSE_SYNCFS request Sebastian Hasler (1): virtiofsd: Do not support blocking flock Vivek Goyal (10): virtiofsd: Fix breakage due to fuse_init_in size change linux-headers: Update headers to v5.17-rc1 virtiofsd: Parse extended "struct fuse_init_in" virtiofsd: Extend size of fuse_conn_info->capable and ->want fields virtiofsd, fuse_lowlevel.c: Add capability to parse security context virtiofsd: Move core file creation code in separate function virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate virtiofsd: Create new file with security context virtiofsd: Create new file using O_TMPFILE and set security context virtiofsd: Add an option to enable/disable security label docs/tools/virtiofsd.rst | 32 ++ include/standard-headers/asm-x86/kvm_para.h| 1 + include/standard-headers/drm/drm_fourcc.h | 11 + include/standard-headers/linux/ethtool.h | 1 + include/standard-headers/linux/fuse.h | 60 +++- include/standard-headers/linux/pci_regs.h | 142 include/standard-headers/linux/virtio_gpio.h | 72 include/standard-headers/linux/virtio_i2c.h| 47 +++ include/standard-headers/linux/virtio_iommu.h | 8 +- include/standard-headers/linux/virtio_pcidev.h | 65 
include/standard-headers/linux/virtio_scmi.h | 24 ++ linux-headers/asm-generic/unistd.h | 5 +- linux-headers/asm-mips/unistd_n32.h| 2 + linux-headers/asm-mips/unistd_n64.h| 2 + linux-headers/asm-mips/unistd_o32.h| 2 + linux-headers/asm-powerpc/unistd_32.h | 2 + linux-headers/asm-powerpc/unistd_64.h | 2 + linux-headers/asm-riscv/bitsperlong.h | 14 + linux-headers/asm-riscv/mman.h | 1 + linux-headers/asm-riscv/unistd.h | 44 +++ linux-headers/asm-s390/unistd_32.h | 2 + linux-headers/asm-s390/unistd_64.h | 2 + linux-headers/asm-x86/kvm.h| 16 +- linux-headers/asm-x86/unistd_32.h | 1 + linux-headers/asm-x86/unistd_64.h | 1 + linux-headers/asm-x86/unistd_x32.h | 1 + linux-headers/linux/kvm.h | 17 + tools/virtiofsd/fuse_common.h | 9 +- tools/virtiofsd/fuse_i.h | 7 + tools/virtiofsd/fuse_lowlevel.c| 180 -- tools/virtiofsd/fuse_lowlevel.h| 13 + tools/virtiofsd/helper.c | 1 + tools/virtiofsd/passthrough_ll.c | 467 +++-- tools/virtiofsd/passthrough_seccomp.c | 1 + 34 files changed, 1123 insertions(+), 132 deletions(-) create mode 100644 include/standard-headers/linux/virtio_gpio.h create mode 100644 include/standard-headers/linux/virtio_i2c.h create mode 100644 include/standard-headers/linux/virtio_pcidev.h create mode 100644 include/standard-headers/linux/virtio_scmi.h create mode 100644 linux-headers/asm-riscv/bitsperlong.h create mode 100644 linux-headers/asm-riscv/mman.h create mode 100644 linux-headers/asm-riscv/unistd.h
[PULL v2 0/5] 9p queue (previous 2022-02-10)
The following changes since commit c13b8e9973635f34f3ce4356af27a311c993729c: Merge remote-tracking branch 'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging (2022-02-16 09:57:11 +) are available in the Git repository at: https://github.com/cschoenebeck/qemu.git tags/pull-9p-20220217 for you to fetch changes up to e64e27d5cb103b7764f1a05b6eda7e7fedd517c5: 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread (2022-02-17 16:57:58 +0100) 9pfs: fixes and cleanup * Fifth patch fixes a 9pfs server crash that happened on some systems due to incorrect (system-dependent) handling of struct dirent size. * Tests: Second patch fixes a test error that happened on some systems due to mkdir() being called twice for creating the test directory for the 9p 'local' tests. * Tests: Third patch fixes a memory leak. * Tests: The remaining two patches are code cleanup. Christian Schoenebeck (2): tests/9pfs: use g_autofree where possible tests/9pfs: fix mkdir() being called twice Greg Kurz (2): tests/9pfs: Fix leak of local_test_path tests/9pfs: Use g_autofree and g_autoptr where possible Vitaly Chikunov (1): 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread hw/9pfs/9p-synth.c | 18 +++-- hw/9pfs/9p-synth.h | 5 +++ hw/9pfs/codir.c| 3 +- include/qemu/osdep.h | 13 ++ tests/qtest/libqos/virtio-9p.c | 38 +++--- tests/qtest/virtio-9p-test.c | 90 +- util/osdep.c | 21 ++ 7 files changed, 96 insertions(+), 92 deletions(-)
[PULL v2 1/5] tests/9pfs: use g_autofree where possible
Signed-off-by: Christian Schoenebeck Reviewed-by: Greg Kurz Message-Id: --- tests/qtest/virtio-9p-test.c | 90 +++- 1 file changed, 27 insertions(+), 63 deletions(-) diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c index 41fed41de1..502e5ad0c7 100644 --- a/tests/qtest/virtio-9p-test.c +++ b/tests/qtest/virtio-9p-test.c @@ -84,7 +84,7 @@ static void pci_config(void *obj, void *data, QGuestAllocator *t_alloc) QVirtio9P *v9p = obj; alloc = t_alloc; size_t tag_len = qvirtio_config_readw(v9p->vdev, 0); -char *tag; +g_autofree char *tag = NULL; int i; g_assert_cmpint(tag_len, ==, strlen(MOUNT_TAG)); @@ -94,7 +94,6 @@ static void pci_config(void *obj, void *data, QGuestAllocator *t_alloc) tag[i] = qvirtio_config_readb(v9p->vdev, i + 2); } g_assert_cmpmem(tag, tag_len, MOUNT_TAG, tag_len); -g_free(tag); } #define P9_MAX_SIZE 4096 /* Max size of a T-message or R-message */ @@ -580,7 +579,7 @@ static void do_version(QVirtio9P *v9p) { const char *version = "9P2000.L"; uint16_t server_len; -char *server_version; +g_autofree char *server_version = NULL; P9Req *req; req = v9fs_tversion(v9p, P9_MAX_SIZE, version, P9_NOTAG); @@ -588,8 +587,6 @@ static void do_version(QVirtio9P *v9p) v9fs_rversion(req, &server_len, &server_version); g_assert_cmpmem(server_version, server_len, version, strlen(version)); - -g_free(server_version); } /* utility function: walk to requested dir and return fid for that dir */ @@ -637,7 +634,7 @@ static void fs_walk(void *obj, void *data, QGuestAllocator *t_alloc) alloc = t_alloc; char *wnames[P9_MAXWELEM]; uint16_t nwqid; -v9fs_qid *wqid; +g_autofree v9fs_qid *wqid = NULL; int i; P9Req *req; @@ -655,8 +652,6 @@ static void fs_walk(void *obj, void *data, QGuestAllocator *t_alloc) for (i = 0; i < P9_MAXWELEM; i++) { g_free(wnames[i]); } - -g_free(wqid); } static bool fs_dirents_contain_name(struct V9fsDirent *e, const char* name) @@ -872,9 +867,9 @@ static void fs_readdir(void *obj, void *data, QGuestAllocator *t_alloc) 
g_assert_cmpint(fs_dirents_contain_name(entries, "."), ==, true); g_assert_cmpint(fs_dirents_contain_name(entries, ".."), ==, true); for (int i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) { -char *name = g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i); +g_autofree char *name = +g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i); g_assert_cmpint(fs_dirents_contain_name(entries, name), ==, true); -g_free(name); } v9fs_free_dirents(entries); @@ -984,7 +979,8 @@ static void fs_walk_dotdot(void *obj, void *data, QGuestAllocator *t_alloc) QVirtio9P *v9p = obj; alloc = t_alloc; char *const wnames[] = { g_strdup("..") }; -v9fs_qid root_qid, *wqid; +v9fs_qid root_qid; +g_autofree v9fs_qid *wqid = NULL; P9Req *req; do_version(v9p); @@ -998,7 +994,6 @@ static void fs_walk_dotdot(void *obj, void *data, QGuestAllocator *t_alloc) g_assert_cmpmem(&root_qid, 13, wqid[0], 13); -g_free(wqid); g_free(wnames[0]); } @@ -1027,7 +1022,7 @@ static void fs_write(void *obj, void *data, QGuestAllocator *t_alloc) alloc = t_alloc; static const uint32_t write_count = P9_MAX_SIZE / 2; char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_WRITE_FILE) }; -char *buf = g_malloc0(write_count); +g_autofree char *buf = g_malloc0(write_count); uint32_t count; P9Req *req; @@ -1045,7 +1040,6 @@ static void fs_write(void *obj, void *data, QGuestAllocator *t_alloc) v9fs_rwrite(req, &count); g_assert_cmpint(count, ==, write_count); -g_free(buf); g_free(wnames[0]); } @@ -1125,7 +1119,7 @@ static void fs_flush_ignored(void *obj, void *data, QGuestAllocator *t_alloc) static void do_mkdir(QVirtio9P *v9p, const char *path, const char *cname) { -char *const name = g_strdup(cname); +g_autofree char *name = g_strdup(cname); uint32_t fid; P9Req *req; @@ -1134,15 +1128,13 @@ static void do_mkdir(QVirtio9P *v9p, const char *path, const char *cname) req = v9fs_tmkdir(v9p, fid, name, 0750, 0, 0); v9fs_req_wait_for_reply(req, NULL); v9fs_rmkdir(req, NULL); - -g_free(name); } /* create a regular file with Tlcreate and return file's 
fid */ static uint32_t do_lcreate(QVirtio9P *v9p, const char *path, const char *cname) { -char *const name = g_strdup(cname); +g_autofree char *name = g_strdup(cname); uint32_t fid; P9Req *req; @@ -1152,7 +1144,6 @@ static uint32_t do_lcreate(QVirtio9P *v9p, const char *path, v9fs_req_wait_for_reply(req, NULL); v9fs_rlcreate(req, NULL, NULL); -g_free(name); return fid; } @@ -1160,8 +1151,8 @@ static uint32_t do_lcreate(QVirtio9P *v9p, const char *path, static void
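For anyone unfamiliar with the pattern used throughout this patch: g_autofree builds on the GCC/Clang cleanup variable attribute, which runs a function when the variable goes out of scope. The sketch below shows that mechanism without GLib. EXAMPLE_AUTOFREE, example_autofree_cleanup(), frees_seen and copy_len() are hypothetical names for this illustration only, and the code assumes a compiler that supports __attribute__((cleanup)).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int frees_seen; /* counts cleanup calls, for demonstration only */

/* Receives the address of the annotated pointer variable. */
static void example_autofree_cleanup(void *p)
{
    void **pp = p;
    free(*pp); /* free(NULL) is a no-op, so early returns are safe */
    frees_seen++;
}

#define EXAMPLE_AUTOFREE __attribute__((cleanup(example_autofree_cleanup)))

/* Copies s into a heap buffer and returns its length; no explicit free(). */
static size_t copy_len(const char *s)
{
    assert(s != NULL);
    EXAMPLE_AUTOFREE char *copy = malloc(strlen(s) + 1);
    if (!copy) {
        return 0;
    }
    strcpy(copy, s);
    return strlen(copy); /* evaluated before the cleanup runs */
}
```

This is why the patch initialises each g_autofree pointer to NULL: the cleanup runs on every exit path, including ones taken before the pointer is assigned.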
Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq
On Tue, Feb 8, 2022 at 4:58 AM Jason Wang wrote: > > > On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote: > > On Sun, Jan 30, 2022 at 5:03 AM Jason Wang wrote: > >> > >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote: > >>> First half of the buffers forwarding part, preparing vhost-vdpa > >>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so > >>> this is effectively dead code at the moment, but it helps to reduce > >>> patch size. > >>> > >>> Signed-off-by: Eugenio Pérez > >>> --- > >>>hw/virtio/vhost-shadow-virtqueue.h | 2 +- > >>>hw/virtio/vhost-shadow-virtqueue.c | 21 - > >>>hw/virtio/vhost-vdpa.c | 133 ++--- > >>>3 files changed, 143 insertions(+), 13 deletions(-) > >>> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h > >>> b/hw/virtio/vhost-shadow-virtqueue.h > >>> index 035207a469..39aef5ffdf 100644 > >>> --- a/hw/virtio/vhost-shadow-virtqueue.h > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h > >>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const > >>> VhostShadowVirtqueue *svq); > >>> > >>>void vhost_svq_stop(VhostShadowVirtqueue *svq); > >>> > >>> -VhostShadowVirtqueue *vhost_svq_new(void); > >>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize); > >>> > >>>void vhost_svq_free(VhostShadowVirtqueue *vq); > >>> > >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c > >>> b/hw/virtio/vhost-shadow-virtqueue.c > >>> index f129ec8395..7c168075d7 100644 > >>> --- a/hw/virtio/vhost-shadow-virtqueue.c > >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c > >>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq) > >>>/** > >>> * Creates vhost shadow virtqueue, and instruct vhost device to use > >>> the shadow > >>> * methods and file descriptors. > >>> + * > >>> + * @qsize Shadow VirtQueue size > >>> + * > >>> + * Returns the new virtqueue or NULL. > >>> + * > >>> + * In case of error, reason is reported through error_report. 
> >>> */ > >>> -VhostShadowVirtqueue *vhost_svq_new(void) > >>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize) > >>>{ > >>> +size_t desc_size = sizeof(vring_desc_t) * qsize; > >>> +size_t device_size, driver_size; > >>>g_autofree VhostShadowVirtqueue *svq = > >>> g_new0(VhostShadowVirtqueue, 1); > >>>int r; > >>> > >>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void) > >>>/* Placeholder descriptor, it should be deleted at set_kick_fd */ > >>>event_notifier_init_fd(>svq_kick, INVALID_SVQ_KICK_FD); > >>> > >>> +svq->vring.num = qsize; > >> > >> I wonder if this is the best. E.g some hardware can support up to 32K > >> queue size. So this will probably end up with: > >> > >> 1) SVQ use 32K queue size > >> 2) hardware queue uses 256 > >> > > In that case SVQ vring queue size will be 32K and guest's vring can > > negotiate any number with SVQ equal or less than 32K, > > > Sorry for being unclear what I meant is actually > > 1) SVQ uses 32K queue size > > 2) guest vq uses 256 > > This looks like a burden that needs extra logic and may damage the > performance. > Still not getting this point. An available guest buffer, although contiguous in GPA/GVA, can expand in multiple buffers if it's not contiguous in qemu's VA (by the while loop in virtqueue_map_desc [1]). In that scenario it is better to have "plenty" of SVQ buffers. I'm ok if we decide to put an upper limit though, or if we decide not to handle this situation. But we would leave out valid virtio drivers. Maybe to set a fixed upper limit (1024?)? To add another parameter (x-svq-size-n=N)? If you mean we lose performance because memory gets more sparse I think the only possibility is to limit that way. > And this can lead other interesting situation: > > 1) SVQ uses 256 > > 2) guest vq uses 1024 > > Where a lot of more SVQ logic is needed. > If we agree that a guest descriptor can expand in multiple SVQ descriptors, this should be already handled by the previous logic too. 
But this should only happen in case that qemu is launched with a "bad" cmdline, isn't it? If I run that example with vp_vdpa, L0 qemu will happily accept 1024 as a queue size [2]. But if the vdpa device maximum queue size is effectively 256, this will result in an error: We're not exposing it to the guest at any moment but with qemu's cmdline. > > > including 256. > > Is that what you mean? > > > I mean, it looks to me the logic will be much more simplified if we just > allocate the shadow virtqueue with the size what guest can see (guest > vring). > > Then we don't need to think if the difference of the queue size can have > any side effects. > I think that we cannot avoid that extra logic unless we force GPA to be contiguous in IOVA. If we are sure the guest's buffers cannot be at more than one descriptor in SVQ, then yes, we can simplify things. If not, I think we are forced to carry all of it. But if we prove it I'm not opposed to simplifying things and making head at
[PULL v2 3/5] tests/9pfs: Fix leak of local_test_path
From: Greg Kurz local_test_path is allocated in virtio_9p_create_local_test_dir() to hold the path of the temporary directory. It should be freed in virtio_9p_remove_local_test_dir() when the temporary directory is removed. Clarify the lifecycle of local_test_path while here. Based-on: Signed-off-by: Greg Kurz Message-Id: <20220201151508.190035-2-gr...@kaod.org> Reviewed-by: Christian Schoenebeck Signed-off-by: Christian Schoenebeck --- tests/qtest/libqos/virtio-9p.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c index ef96ef006a..5d18e5eae5 100644 --- a/tests/qtest/libqos/virtio-9p.c +++ b/tests/qtest/libqos/virtio-9p.c @@ -39,8 +39,13 @@ static char *concat_path(const char* a, const char* b) void virtio_9p_create_local_test_dir(void) { +g_assert(local_test_path == NULL); struct stat st; char *pwd = g_get_current_dir(); +/* + * template gets cached into local_test_path and freed in + * virtio_9p_remove_local_test_dir(). + */ char *template = concat_path(pwd, "qtest-9p-local-XX"); local_test_path = mkdtemp(template); @@ -66,6 +71,8 @@ void virtio_9p_remove_local_test_dir(void) /* ignore error, dummy check to prevent compiler error */ } g_free(cmd); +g_free(local_test_path); +local_test_path = NULL; } char *virtio_9p_test_path(const char *path) -- 2.20.1
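The ownership rule this patch documents (the buffer handed to mkdtemp() is the same pointer later stored in local_test_path, so it is freed when the directory is removed) can be sketched without GLib. The names test_dir_path, create_test_dir() and remove_test_dir(), and the /tmp template, are assumptions made for this illustration; they are not the libqos functions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Owned from create_test_dir() until remove_test_dir(). */
static char *test_dir_path;

static void create_test_dir(void)
{
    assert(test_dir_path == NULL); /* no directory may be live already */
    char *tmpl = strdup("/tmp/example-9p-local-XXXXXX");
    assert(tmpl != NULL);
    /* On success mkdtemp() returns tmpl itself, so ownership moves over. */
    test_dir_path = mkdtemp(tmpl);
    if (test_dir_path == NULL) {
        free(tmpl);
    }
}

static void remove_test_dir(void)
{
    if (test_dir_path != NULL) {
        rmdir(test_dir_path);
        free(test_dir_path);
        test_dir_path = NULL; /* allows a later create_test_dir() call */
    }
}
```

Resetting the pointer to NULL after the free is what makes the assertion at the top of the create function meaningful on repeated runs.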
Re: [PATCH v2 9/9] spapr: implement nested-hv capability for the virtual hypervisor
On 2/16/22 13:30, Nicholas Piggin wrote: Excerpts from Nicholas Piggin's message of February 16, 2022 9:38 pm: Excerpts from Cédric Le Goater's message of February 16, 2022 8:52 pm: On 2/16/22 11:25, Nicholas Piggin wrote: This implements the Nested KVM HV hcall API for spapr under TCG. The L2 is switched in when the H_ENTER_NESTED hcall is made, and the L1 is switched back in returned from the hcall when a HV exception is sent to the vhyp. Register state is copied in and out according to the nested KVM HV hcall API specification. The hdecr timer is started when the L2 is switched in, and it provides the HDEC / 0x980 return to L1. The MMU re-uses the bare metal radix 2-level page table walker by using the get_pate method to point the MMU to the nested partition table entry. MMU faults due to partition scope errors raise HV exceptions and accordingly are routed back to the L1. The MMU does not tag translations for the L1 (direct) vs L2 (nested) guests, so the TLB is flushed on any L1<->L2 transition (hcall entry and exit).> Reviewed-by: Fabiano Rosas Signed-off-by: Nicholas Piggin Reviewed-by: Cédric Le Goater Some last comments below, [...] diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index edbf3eeed0..852fe61b36 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -199,6 +199,9 @@ struct SpaprMachineState { bool has_graphics; uint32_t vsmt; /* Virtual SMT mode (KVM's "core stride") */ +/* Nested HV support (TCG only) */ +uint64_t nested_ptcr; + this is new state to migrate. [...] 
+/* Linux 64-bit powerpc pt_regs struct, used by nested HV */ +struct kvmppc_pt_regs { +uint64_t gpr[32]; +uint64_t nip; +uint64_t msr; +uint64_t orig_gpr3;/* Used for restarting system calls */ +uint64_t ctr; +uint64_t link; +uint64_t xer; +uint64_t ccr; +uint64_t softe;/* Soft enabled/disabled */ +uint64_t trap; /* Reason for being here */ +uint64_t dar; /* Fault registers */ +uint64_t dsisr;/* on 4xx/Book-E used for ESR */ +uint64_t result; /* Result of a system call */ +}; I think we need to start moving all the spapr hcall definitions under spapr_hcall.h. It can come later. Sure. [...] diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h index dab3dfc76c..b560514560 100644 --- a/include/hw/ppc/spapr_cpu_core.h +++ b/include/hw/ppc/spapr_cpu_core.h @@ -48,6 +48,11 @@ typedef struct SpaprCpuState { bool prod; /* not migrated, only used to improve dispatch latencies */ struct ICPState *icp; struct XiveTCTX *tctx; + +/* Fields for nested-HV support */ +bool in_nested; /* true while the L2 is executing */ +CPUPPCState *nested_host_state; /* holds the L1 state while L2 executes */ +int64_t nested_tb_offset; /* L1->L2 TB offset */ This needs a new vmstate. How about instead of the vmstate (we would need all the L1 state in nested_host_state as well), we just add a migration blocker in the L2 entry path. We could limit the max hdecr to say 1 second to ensure it unblocks before long. I know migration blockers are not preferred but in this case it gives us some iterations to debug and optimise first, which might change the data to migrate. This should be roughly the incremental patch to do this. I think we can merge without it. Adding support shouldn't be too complex and TCG migration of an L1 running L2 is not the most important feature today. 
It would be better to have something clean (a blocker if support is incomplete, or decent support) before 7.0 is released, though. However, there is an issue with TCG migration and it has been there for a while: https://lore.kernel.org/qemu-devel/fb2e56cc-15d1-65ee-9d9c-64223483e...@kaod.org/ Thanks, C.
[PULL v2 4/5] tests/9pfs: Use g_autofree and g_autoptr where possible
From: Greg Kurz

It is recommended to use g_autofree or g_autoptr as it reduces the odds of introducing memory leaks in future changes.

Signed-off-by: Greg Kurz
Message-Id: <20220201151508.190035-3-gr...@kaod.org>
Reviewed-by: Christian Schoenebeck
Signed-off-by: Christian Schoenebeck
---
 tests/qtest/libqos/virtio-9p.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index 5d18e5eae5..f51f0635cc 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -41,7 +41,7 @@ void virtio_9p_create_local_test_dir(void)
 {
     g_assert(local_test_path == NULL);
     struct stat st;
-    char *pwd = g_get_current_dir();
+    g_autofree char *pwd = g_get_current_dir();
     /*
      * template gets cached into local_test_path and freed in
      * virtio_9p_remove_local_test_dir().
@@ -52,7 +52,6 @@ void virtio_9p_create_local_test_dir(void)
     if (!local_test_path) {
         g_test_message("mkdtemp('%s') failed: %s", template, strerror(errno));
     }
-    g_free(pwd);
 
     g_assert(local_test_path != NULL);
 
@@ -65,12 +64,11 @@ void virtio_9p_create_local_test_dir(void)
 void virtio_9p_remove_local_test_dir(void)
 {
     g_assert(local_test_path != NULL);
-    char *cmd = g_strdup_printf("rm -fr '%s'\n", local_test_path);
+    g_autofree char *cmd = g_strdup_printf("rm -fr '%s'\n", local_test_path);
     int res = system(cmd);
     if (res < 0) {
         /* ignore error, dummy check to prevent compiler error */
     }
-    g_free(cmd);
     g_free(local_test_path);
     local_test_path = NULL;
 }
@@ -216,8 +214,8 @@ static void *virtio_9p_pci_create(void *pci_bus, QGuestAllocator *t_alloc,
 static void regex_replace(GString *haystack, const char *pattern,
                           const char *replace_fmt, ...)
 {
-    GRegex *regex;
-    char *replace, *s;
+    g_autoptr(GRegex) regex = NULL;
+    g_autofree char *replace = NULL, *s = NULL;
     va_list argp;
 
     va_start(argp, replace_fmt);
@@ -227,9 +225,6 @@ static void regex_replace(GString *haystack, const char *pattern,
     regex = g_regex_new(pattern, 0, 0, NULL);
     s = g_regex_replace(regex, haystack->str, -1, 0, replace, 0, NULL);
     g_string_assign(haystack, s);
-    g_free(s);
-    g_regex_unref(regex);
-    g_free(replace);
 }
 
 void virtio_9p_assign_local_driver(GString *cmd_line, const char *args)
-- 
2.20.1
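For readers unfamiliar with the mechanism behind g_autofree: GLib builds it on the GCC/Clang `cleanup` variable attribute. The following is a minimal sketch of that idea without GLib, assuming a GCC-compatible compiler. The names `auto_free_sketch`, `AUTOFREE_SKETCH`, `frees_seen` and `joined_length` are hypothetical and exist only for this illustration; they are not GLib or QEMU identifiers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the effect of the cleanup handler is observable. */
static int frees_seen;

/*
 * Cleanup handler: the compiler passes a pointer to the annotated
 * variable (here a char **), not the variable's value.
 */
static void auto_free_sketch(void *p)
{
    void **slot = (void **)p;

    frees_seen++;
    free(*slot); /* free(NULL) is a no-op, so NULL-initialised variables are fine */
}

/* Hypothetical stand-in for g_autofree; relies on a GCC/Clang extension. */
#define AUTOFREE_SKETCH __attribute__((cleanup(auto_free_sketch)))

/* Builds "a" + "b" in a temporary buffer and returns its length. */
static size_t joined_length(const char *a, const char *b)
{
    AUTOFREE_SKETCH char *buf = malloc(strlen(a) + strlen(b) + 1);

    if (!buf) {
        return 0; /* the handler still runs on this early return */
    }
    strcpy(buf, a);
    strcat(buf, b);
    return strlen(buf); /* buf is released when it goes out of scope */
}
```

The point of the pattern is that every exit path of the function releases the buffer, so later changes that add an early return cannot introduce a leak.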
Re: [PATCH] tcg: Remove dh_alias indirection for dh_typecode
Richard Henderson writes: > Reported-by: Keith Packard > Signed-off-by: Richard Henderson Looks good to me, and it passes my very simple test when run on s390. Tested-by: Keith Packard -- -keith
[PULL v2 2/5] tests/9pfs: fix mkdir() being called twice
The 9p test cases use mkdtemp() to create a temporary directory for running the 'local' 9p tests with real files/dirs. Unlike mktemp() which only generates a unique file name, mkdtemp() also creates the directory, therefore the subsequent mkdir() was wrong and caused errors on some systems.

Signed-off-by: Christian Schoenebeck
Fixes: 136b7af2 (tests/9pfs: fix test dir for parallel tests)
Reported-by: Daniel P. Berrangé
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/832
Reviewed-by: Daniel P. Berrangé
Reviewed-by: Greg Kurz
Message-Id:
---
 tests/qtest/libqos/virtio-9p.c | 18 +++---------------
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index b4e1143288..ef96ef006a 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -37,31 +37,19 @@ static char *concat_path(const char* a, const char* b)
     return g_build_filename(a, b, NULL);
 }
 
-static void init_local_test_path(void)
+void virtio_9p_create_local_test_dir(void)
 {
+    struct stat st;
     char *pwd = g_get_current_dir();
     char *template = concat_path(pwd, "qtest-9p-local-XX");
+
     local_test_path = mkdtemp(template);
     if (!local_test_path) {
         g_test_message("mkdtemp('%s') failed: %s", template, strerror(errno));
     }
-    g_assert(local_test_path);
     g_free(pwd);
-}
-
-void virtio_9p_create_local_test_dir(void)
-{
-    struct stat st;
-    int res;
-
-    init_local_test_path();
 
     g_assert(local_test_path != NULL);
-    res = mkdir(local_test_path, 0777);
-    if (res < 0) {
-        g_test_message("mkdir('%s') failed: %s", local_test_path,
-                       strerror(errno));
-    }
 
     /* ensure test directory exists now ... */
     g_assert(stat(local_test_path, &st) == 0);
-- 
2.20.1
Re: [PATCH v2 00/12] GL & D-Bus display related fixes
Hi On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki wrote: > On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau > wrote: > > > > Hi > > > > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki > wrote: > >> > >> On Thu, Feb 17, 2022 at 8:58 PM wrote: > >> > > >> > From: Marc-André Lureau > >> > > >> > Hi, > >> > > >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a > console", Akihiko > >> > Odaki reported a number of issues with the GL and D-Bus display. His > series > >> > propose a different design, and reverting some of my previous generic > console > >> > changes to fix those issues. > >> > > >> > However, as I work through the issue so far, they can be solved by > reasonable > >> > simple fixes while keeping the console changes generic (not specific > to the > >> > D-Bus display backend). I belive a shared infrastructure is more > beneficial long > >> > term than having GL-specific code in the DBus code (in particular, the > >> > egl-headless & VNC combination could potentially use it) > >> > > >> > Thanks a lot Akihiko for reporting the issues proposing a different > approach! > >> > Please test this alternative series and let me know of any further > issues. My > >> > understanding is that you are mainly concerned with the Cocoa > backend, and I > >> > don't have a way to test it, so please check it. If necessary, we may > well have > >> > to revert my earlier changes and go your way, eventually. 
> >> > > >> > Marc-André Lureau (12): > >> > ui/console: fix crash when using gl context with non-gl listeners > >> > ui/console: fix texture leak when calling > surface_gl_create_texture() > >> > ui: do not create a surface when resizing a GL scanout > >> > ui/console: move check for compatible GL context > >> > ui/console: move dcl compatiblity check to a callback > >> > ui/console: egl-headless is compatible with non-gl listeners > >> > ui/dbus: associate the DBusDisplayConsole listener with the given > >> > console > >> > ui/console: move console compatibility check to > dcl_display_console() > >> > ui/shader: fix potential leak of shader on error > >> > ui/shader: free associated programs > >> > ui/console: add a dpy_gfx_switch callback helper > >> > ui/dbus: fix texture sharing > >> > > >> > include/ui/console.h | 19 --- > >> > ui/dbus.h| 3 ++ > >> > ui/console-gl.c | 4 ++ > >> > ui/console.c | 119 > ++- > >> > ui/dbus-console.c| 27 +- > >> > ui/dbus-listener.c | 11 > >> > ui/dbus.c| 33 +++- > >> > ui/egl-headless.c| 17 ++- > >> > ui/gtk.c | 18 ++- > >> > ui/sdl2.c| 9 +++- > >> > ui/shader.c | 9 +++- > >> > ui/spice-display.c | 9 +++- > >> > 12 files changed, 192 insertions(+), 86 deletions(-) > >> > > >> > -- > >> > 2.34.1.428.gdcc0cd074f0c > >> > > >> > > >> > >> You missed only one thing: > >> >- that console_select and register_displaychangelistener may not call > >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is > >> > incompatible with non-OpenGL displays > >> > >> displaychangelistener_display_console always has to call > >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't. > > > > > > Ok, would that be what you have in mind? 
> > > > --- a/ui/console.c > > +++ b/ui/console.c > > @@ -1122,6 +1122,10 @@ static void > displaychangelistener_display_console(DisplayChangeListener *dcl, > > } else if (con->scanout.kind == SCANOUT_SURFACE) { > > dpy_gfx_create_texture(con, con->surface); > > displaychangelistener_gfx_switch(dcl, con->surface); > > +} else { > > +/* use the fallback surface, egl-headless keeps it updated */ > > +assert(con->surface); > > +displaychangelistener_gfx_switch(dcl, con->surface); > > } > > It should call displaychangelistener_gfx_switch even when e.g. > con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content > to the last DisplaySurface it received while con->scanout.kind == > SCANOUT_TEXTURE. > I see, egl-headless is really not a "listener". > > > > I wish such egl-headless specific code would be there, but we would need > more refactoring. > > > > I think I would rather have a backend split for GL context, like > "-object egl-context". egl-headless-specific copy code would be handled by > common/util code for anything that wants a pixman surface (VNC, screen > capture, non-GL display etc). > > > > This split would allow sharing the context code, and introduce other > system specific GL initialization, such as WGL etc. Right now, I doubt the > EGL code works on anything but Linux. > > Sharing the context code is unlikely to happen. Usually the toolkit > (GTK, SDL, or Cocoa in my fork) knows what graphics accelerator should > be used and how the context should be created for a particular window. > The context sharing can be achieved only for headless
[PATCH] migration: NULL transport_data after freeing
migration_incoming_state_destroy() NULLs all objects it frees after they are freed, presumably so that a subsequent call to the same function will not free them again, unless new objects have been created in the meantime.

transport_data is the exception, and it shows exactly this problem: When an incoming migration uses transport_cleanup() and transport_data, and a subsequent incoming migration (e.g. loadvm) occurs that does not, then when this second one is done, it will call transport_cleanup() on the old transport_data again -- which has already been freed. This is sometimes visible in the iotest 201, though for some reason I can only reproduce it with -m32.

To fix this, call transport_cleanup() only when transport_data is not NULL (otherwise there is nothing to clean up), and set transport_data to NULL when it has been cleaned up (i.e. freed).

(transport_cleanup() is used only by migration/socket.c, where socket_start_incoming_migration_internal() sets both it and transport_data to non-NULL values.)

Signed-off-by: Hanna Reitz
---
 migration/migration.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index bcc385b94b..cdb2e76d02 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -287,8 +287,9 @@ void migration_incoming_state_destroy(void)
         g_array_free(mis->postcopy_remote_fds, TRUE);
         mis->postcopy_remote_fds = NULL;
     }
-    if (mis->transport_cleanup) {
+    if (mis->transport_cleanup && mis->transport_data) {
         mis->transport_cleanup(mis->transport_data);
+        mis->transport_data = NULL;
     }
 
     qemu_event_reset(&mis->main_thread_load_event);
-- 
2.34.1
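The "check, clean up, then NULL" pattern from this patch can be sketched outside QEMU as follows. This is an illustration under stated assumptions only: `IncomingStateSketch`, `free_transport`, `cleanup_calls` and `state_destroy` are hypothetical names, and plain malloc()/free() stand in for whatever the real transport allocates.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the incoming-migration state; not QEMU's struct. */
typedef struct IncomingStateSketch {
    void (*transport_cleanup)(void *data);
    void *transport_data;
} IncomingStateSketch;

/* Counts how often the cleanup callback actually ran. */
static int cleanup_calls;

static void free_transport(void *data)
{
    cleanup_calls++;
    free(data);
}

/*
 * Safe to call repeatedly: the callback runs only while there is data
 * to release, and the pointer is cleared so a later call sees NULL.
 */
static void state_destroy(IncomingStateSketch *s)
{
    if (s->transport_cleanup && s->transport_data) {
        s->transport_cleanup(s->transport_data);
        s->transport_data = NULL;
    }
}
```

Calling state_destroy() twice on the same state runs the callback once; without the NULL assignment, the second call would pass a dangling pointer to free().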
Re: [PATCH v5 11/20] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU
On Tue, Feb 08, 2022 at 09:35:04AM -0500, Emanuele Giuseppe Esposito wrote:
> Once job lock is used and aiocontext is removed, mirror has
> to perform job operations under the same critical section,
> using the helpers prepared in previous commit.
>
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
>
> Signed-off-by: Emanuele Giuseppe Esposito
> ---
> block/mirror.c | 19 ++-
> 1 file changed, 14 insertions(+), 5 deletions(-)

My understanding is that MirrorBlockJob itself does not need a lock because it's only accessed from the coroutines, and they run in only one thread.

Reviewed-by: Stefan Hajnoczi
[PATCH v1] aio-posix: fix build failure io_uring 2.2
liburing changed the type of the second argument of io_uring_prep_poll_remove() in the commit "Don't truncate addr fields to 32-bit on 32-bit": https://git.kernel.dk/cgit/liburing/commit/?id=d84c29b19ed0b13619cff40141bb1fc3615b

This leads to a build failure:

../util/fdmon-io_uring.c: In function ‘add_poll_remove_sqe’:
../util/fdmon-io_uring.c:182:36: error: passing argument 2 of ‘io_uring_prep_poll_remove’ makes integer from pointer without a cast [-Werror=int-conversion]
  182 |     io_uring_prep_poll_remove(sqe, node);
      |                                    ^~~~
      |                                    |
      |                                    AioHandler *
In file included from /root/io/qemu/include/block/aio.h:18,
                 from ../util/aio-posix.h:20,
                 from ../util/fdmon-io_uring.c:49:
/usr/include/liburing.h:415:17: note: expected ‘__u64’ {aka ‘long long unsigned int’} but argument is of type ‘AioHandler *’
  415 |                 __u64 user_data)
      |                 ~~^
cc1: all warnings being treated as errors

So convert the parameter to the right type according to the io_uring definition.

Signed-off-by: Haiyue Wang
---
 util/fdmon-io_uring.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 1461dfa407..e7febbf11f 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -179,7 +179,11 @@ static void add_poll_remove_sqe(AioContext *ctx, AioHandler *node)
 {
     struct io_uring_sqe *sqe = get_sqe(ctx);
 
+#ifdef LIBURING_HAVE_DATA64
+    io_uring_prep_poll_remove(sqe, (__u64)node);
+#else
     io_uring_prep_poll_remove(sqe, node);
+#endif
 }
 
 /* Add a timeout that self-cancels when another cqe becomes ready */
-- 
2.35.1
[PULL v2 5/5] 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread
From: Vitaly Chikunov `struct dirent' returned from readdir(3) could be shorter (or longer) than `sizeof(struct dirent)', thus memcpy of sizeof length will overread into unallocated page causing SIGSEGV. Example stack trace: #0 0x559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 0x497eed) #1 0x559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9) #2 0x55eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 0x963983) #3 0x773e0be0 n/a (n/a + 0x0) While fixing this, provide a helper for any future `struct dirent' cloning. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841 Cc: qemu-sta...@nongnu.org Co-authored-by: Christian Schoenebeck Reviewed-by: Dmitry V. Levin Signed-off-by: Vitaly Chikunov Tested-by: Christian Schoenebeck Reviewed-by: Christian Schoenebeck Acked-by: Greg Kurz Tested-by: Vitaly Chikunov Message-Id: <20220216181821.3481527-1...@altlinux.org> [C.S. - Fix typo in source comment. ] Signed-off-by: Christian Schoenebeck --- hw/9pfs/9p-synth.c | 18 +++--- hw/9pfs/9p-synth.h | 5 + hw/9pfs/codir.c | 3 +-- include/qemu/osdep.h | 13 + util/osdep.c | 21 + 5 files changed, 55 insertions(+), 5 deletions(-) diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c index b38088e066..7a7cd5c5ba 100644 --- a/hw/9pfs/9p-synth.c +++ b/hw/9pfs/9p-synth.c @@ -182,7 +182,12 @@ static int synth_opendir(FsContext *ctx, V9fsSynthOpenState *synth_open; V9fsSynthNode *node = *(V9fsSynthNode **)fs_path->data; -synth_open = g_malloc(sizeof(*synth_open)); +/* + * V9fsSynthOpenState contains 'struct dirent' which have OS-specific + * properties, thus it's zero cleared on allocation here and below + * in synth_open. 
+ */ +synth_open = g_new0(V9fsSynthOpenState, 1); synth_open->node = node; node->open_count++; fs->private = synth_open; @@ -220,7 +225,14 @@ static void synth_rewinddir(FsContext *ctx, V9fsFidOpenState *fs) static void synth_direntry(V9fsSynthNode *node, struct dirent *entry, off_t off) { -strcpy(entry->d_name, node->name); +size_t sz = strlen(node->name) + 1; +/* + * 'entry' is always inside of V9fsSynthOpenState which have NAME_MAX + * back padding. Ensure we do not overflow it. + */ +g_assert(sizeof(struct dirent) + NAME_MAX >= + offsetof(struct dirent, d_name) + sz); +memcpy(entry->d_name, node->name, sz); entry->d_ino = node->attr->inode; entry->d_off = off + 1; } @@ -266,7 +278,7 @@ static int synth_open(FsContext *ctx, V9fsPath *fs_path, V9fsSynthOpenState *synth_open; V9fsSynthNode *node = *(V9fsSynthNode **)fs_path->data; -synth_open = g_malloc(sizeof(*synth_open)); +synth_open = g_new0(V9fsSynthOpenState, 1); synth_open->node = node; node->open_count++; fs->private = synth_open; diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h index 036d7e4a5b..eeb246f377 100644 --- a/hw/9pfs/9p-synth.h +++ b/hw/9pfs/9p-synth.h @@ -41,6 +41,11 @@ typedef struct V9fsSynthOpenState { off_t offset; V9fsSynthNode *node; struct dirent dent; +/* + * Ensure there is enough space for 'dent' above, some systems have a + * d_name size of just 1, which would cause a buffer overrun. 
+ */ +char dent_trailing_space[NAME_MAX]; } V9fsSynthOpenState; int qemu_v9fs_synth_mkdir(V9fsSynthNode *parent, int mode, diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c index 032cce04c4..c0873bde16 100644 --- a/hw/9pfs/codir.c +++ b/hw/9pfs/codir.c @@ -143,8 +143,7 @@ static int do_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp, } else { e = e->next = g_malloc0(sizeof(V9fsDirEnt)); } -e->dent = g_malloc0(sizeof(struct dirent)); -memcpy(e->dent, dent, sizeof(struct dirent)); +e->dent = qemu_dirent_dup(dent); /* perform a full stat() for directory entry if requested by caller */ if (dostat) { diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h index d1660d67fa..ce12f64853 100644 --- a/include/qemu/osdep.h +++ b/include/qemu/osdep.h @@ -805,6 +805,19 @@ static inline int platform_does_not_support_system(const char *command) } #endif /* !HAVE_SYSTEM_FUNCTION */ +/** + * Duplicate directory entry @dent. + * + * It is highly recommended to use this function instead of open coding + * duplication of @c dirent objects, because the actual @c struct @c dirent + * size may be bigger or shorter than @c sizeof(struct dirent) and correct + * handling is platform specific (see gitlab issue #841). + * + * @dent - original directory entry to be duplicated + * @returns duplicated directory entry which should be freed with g_free() + */ +struct dirent *qemu_dirent_dup(struct dirent *dent); + #ifdef __cplusplus } #endif diff --git a/util/osdep.c b/util/osdep.c index 42a0a4986a..67fbf22778 100644 ---
Re: [PATCH v2 00/12] GL & D-Bus display related fixes
On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau wrote: > > Hi > > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki wrote: >> >> On Thu, Feb 17, 2022 at 8:58 PM wrote: >> > >> > From: Marc-André Lureau >> > >> > Hi, >> > >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a console", >> > Akihiko >> > Odaki reported a number of issues with the GL and D-Bus display. His series >> > propose a different design, and reverting some of my previous generic >> > console >> > changes to fix those issues. >> > >> > However, as I work through the issue so far, they can be solved by >> > reasonable >> > simple fixes while keeping the console changes generic (not specific to the >> > D-Bus display backend). I belive a shared infrastructure is more >> > beneficial long >> > term than having GL-specific code in the DBus code (in particular, the >> > egl-headless & VNC combination could potentially use it) >> > >> > Thanks a lot Akihiko for reporting the issues proposing a different >> > approach! >> > Please test this alternative series and let me know of any further issues. >> > My >> > understanding is that you are mainly concerned with the Cocoa backend, and >> > I >> > don't have a way to test it, so please check it. If necessary, we may well >> > have >> > to revert my earlier changes and go your way, eventually. 
>> > >> > Marc-André Lureau (12): >> > ui/console: fix crash when using gl context with non-gl listeners >> > ui/console: fix texture leak when calling surface_gl_create_texture() >> > ui: do not create a surface when resizing a GL scanout >> > ui/console: move check for compatible GL context >> > ui/console: move dcl compatiblity check to a callback >> > ui/console: egl-headless is compatible with non-gl listeners >> > ui/dbus: associate the DBusDisplayConsole listener with the given >> > console >> > ui/console: move console compatibility check to dcl_display_console() >> > ui/shader: fix potential leak of shader on error >> > ui/shader: free associated programs >> > ui/console: add a dpy_gfx_switch callback helper >> > ui/dbus: fix texture sharing >> > >> > include/ui/console.h | 19 --- >> > ui/dbus.h| 3 ++ >> > ui/console-gl.c | 4 ++ >> > ui/console.c | 119 ++- >> > ui/dbus-console.c| 27 +- >> > ui/dbus-listener.c | 11 >> > ui/dbus.c| 33 +++- >> > ui/egl-headless.c| 17 ++- >> > ui/gtk.c | 18 ++- >> > ui/sdl2.c| 9 +++- >> > ui/shader.c | 9 +++- >> > ui/spice-display.c | 9 +++- >> > 12 files changed, 192 insertions(+), 86 deletions(-) >> > >> > -- >> > 2.34.1.428.gdcc0cd074f0c >> > >> > >> >> You missed only one thing: >> >- that console_select and register_displaychangelistener may not call >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is >> > incompatible with non-OpenGL displays >> >> displaychangelistener_display_console always has to call >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't. > > > Ok, would that be what you have in mind? 
> > --- a/ui/console.c > +++ b/ui/console.c > @@ -1122,6 +1122,10 @@ static void > displaychangelistener_display_console(DisplayChangeListener *dcl, > } else if (con->scanout.kind == SCANOUT_SURFACE) { > dpy_gfx_create_texture(con, con->surface); > displaychangelistener_gfx_switch(dcl, con->surface); > +} else { > +/* use the fallback surface, egl-headless keeps it updated */ > +assert(con->surface); > +displaychangelistener_gfx_switch(dcl, con->surface); > } It should call displaychangelistener_gfx_switch even when e.g. con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content to the last DisplaySurface it received while con->scanout.kind == SCANOUT_TEXTURE. > > I wish such egl-headless specific code would be there, but we would need more > refactoring. > > I think I would rather have a backend split for GL context, like "-object > egl-context". egl-headless-specific copy code would be handled by common/util > code for anything that wants a pixman surface (VNC, screen capture, non-GL > display etc). > > This split would allow sharing the context code, and introduce other system > specific GL initialization, such as WGL etc. Right now, I doubt the EGL code > works on anything but Linux. Sharing the context code is unlikely to happen. Usually the toolkit (GTK, SDL, or Cocoa in my fork) knows what graphics accelerator should be used and how the context should be created for a particular window. The context sharing can be achieved only for headless displays, namely dbus, egl-headless and spice. Few people would want to use them in combination. > >> >> Anything else should be addressed with this patch series. (And it also >> has nice fixes for shader leaks.) > > > thanks > >> >> >> cocoa doesn't have OpenGL output and egl-headless,
Re: [PATCH v5 1/3] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x
Will submit patch later today, thanks
Re: Call for GSoC and Outreachy project ideas for summer 2022
On Thu, 17 Feb 2022 at 14:12, Stefano Garzarella wrote: > > On Mon, Feb 14, 2022 at 02:01:52PM +, Stefan Hajnoczi wrote: > >On Mon, 14 Feb 2022 at 07:11, Jason Wang wrote: > >> > >> On Fri, Jan 28, 2022 at 11:47 PM Stefan Hajnoczi > >> wrote: > >> > > >> > Dear QEMU, KVM, and rust-vmm communities, > >> > QEMU will apply for Google Summer of Code 2022 > >> > (https://summerofcode.withgoogle.com/) and has been accepted into > >> > Outreachy May-August 2022 (https://www.outreachy.org/). You can now > >> > submit internship project ideas for QEMU, KVM, and rust-vmm! > >> > > >> > If you have experience contributing to QEMU, KVM, or rust-vmm you can > >> > be a mentor. It's a great way to give back and you get to work with > >> > people who are just starting out in open source. > >> > > >> > Please reply to this email by February 21st with your project ideas. > >> > > >> > Good project ideas are suitable for remote work by a competent > >> > programmer who is not yet familiar with the codebase. In > >> > addition, they are: > >> > - Well-defined - the scope is clear > >> > - Self-contained - there are few dependencies > >> > - Uncontroversial - they are acceptable to the community > >> > - Incremental - they produce deliverables along the way > >> > > >> > Feel free to post ideas even if you are unable to mentor the project. > >> > It doesn't hurt to share the idea! > >> > >> Implementing the VIRTIO_F_IN_ORDER feature for both Qemu and kernel > >> (vhost/virtio drivers) would be an interesting idea. > >> > >> It satisfies all the points above since it's supported by virtio spec. > >> > >> (Unfortunately, I won't have time in the mentoring) > > > >Thanks for this idea. As a stretch goal we could add implementing the > >packed virtqueue layout in Linux vhost, QEMU's libvhost-user, and/or > >QEMU's virtio qtest code. > > > >Stefano: Thank you for volunteering to mentor the project. 
Please > >write a project description (see template below) and I will add this > >idea: > > > > I wrote a description of the project below. Let me know if there is > anything to change. Thanks, I have added the idea: https://wiki.qemu.org/Google_Summer_of_Code_2022#VIRTIO_F_IN_ORDER_support_for_virtio_devices Stefan
Re: [PATCH v5] 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread
On Wednesday, 16 February 2022 19:18:21 CET Vitaly Chikunov wrote:
> `struct dirent' returned from readdir(3) could be shorter (or longer)
> than `sizeof(struct dirent)', thus memcpy of sizeof length will overread
> into unallocated page causing SIGSEGV. Example stack trace:
>
> #0 0x559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 0x497eed)
> #1 0x559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9)
> #2 0x55eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 0x963983)
> #3 0x773e0be0 n/a (n/a + 0x0)
>
> While fixing this, provide a helper for any future `struct dirent' cloning.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841
> Cc: qemu-sta...@nongnu.org
> Co-authored-by: Christian Schoenebeck
> Reviewed-by: Dmitry V. Levin
> Signed-off-by: Vitaly Chikunov
> ---

Queued on 9p.next: https://github.com/cschoenebeck/qemu/commits/9p.next

Thanks! I am preparing a new PR now.

Best regards,
Christian Schoenebeck
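To illustrate the overread described in the quoted commit message: a copy must be sized from the entry itself, not from sizeof(struct dirent). The following is a minimal sketch under that assumption. `dirent_dup_sketch` is a hypothetical name and is not the qemu_dirent_dup() helper added by the patch; it sizes the copy from the offset of d_name plus the length of the NUL-terminated name.

```c
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: duplicate a directory entry by copying only the
 * bytes known to be valid, i.e. everything up to d_name plus the
 * NUL-terminated name. Returns NULL on allocation failure; the caller
 * releases the copy with free().
 */
static struct dirent *dirent_dup_sketch(const struct dirent *dent)
{
    size_t sz = offsetof(struct dirent, d_name) + strlen(dent->d_name) + 1;
    struct dirent *dup = calloc(1, sz);

    if (dup) {
        memcpy(dup, dent, sz);
    }
    return dup;
}
```

A copy made this way may be smaller than sizeof(struct dirent), so code that consumes it must not copy it again by sizeof either.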
Re: Call for GSoC and Outreachy project ideas for summer 2022
On Thu, 17 Feb 2022 at 07:08, Alice Frosi wrote: > > On Fri, Jan 28, 2022 at 6:04 PM Stefan Hajnoczi wrote: > > > > Dear QEMU, KVM, and rust-vmm communities, > > QEMU will apply for Google Summer of Code 2022 > > (https://summerofcode.withgoogle.com/) and has been accepted into > > Outreachy May-August 2022 (https://www.outreachy.org/). You can now > > submit internship project ideas for QEMU, KVM, and rust-vmm! > > > > If you have experience contributing to QEMU, KVM, or rust-vmm you can > > be a mentor. It's a great way to give back and you get to work with > > people who are just starting out in open source. > > > > Please reply to this email by February 21st with your project ideas. > > > > Good project ideas are suitable for remote work by a competent > > programmer who is not yet familiar with the codebase. In > > addition, they are: > > - Well-defined - the scope is clear > > - Self-contained - there are few dependencies > > - Uncontroversial - they are acceptable to the community > > - Incremental - they produce deliverables along the way > > > > Feel free to post ideas even if you are unable to mentor the project. > > It doesn't hurt to share the idea! > > > > I'd like to propose this idea: > > Title: Create encrypted storage using VM-based container runtimes > > Cryptsetup requires root privileges in order to be able to encrypt > storage with luks. However, privileged containers are generally > discouraged for security reasons. A possible solution to avoid extra > privileges is using VM-based container runtimes (e.g crun with libkrun > or kata-containers) and running inside the Virtual Machine the tools > for the storage encryption. > > This internship focus on a PoC for integrating and extending crun with > libkrun in order to be able to create encrypted storage. The initial > step will focus on creating encrypted images to demonstrate the > feasibility and the necessary changes in the stack. 
If the timeframe > allows it, an interesting follow-up of the first step is the > encryption of persistent storage using block-based PVCs. > > Language: C, rust, golang > Skills: containers and virtualization would be a big plus > I won't put a level but the intern needs to be willing to dig into > different source codes like crun (written in C), libkrun (written in > Rust) and possibly podman or other kubernetes/containers projects > (written in go) > Mentor: Alice Frosi, Co-mentor: Sergio Lopez Pascual > > Let me know if the idea sounds feasible to you! Thanks, I have added the idea: https://wiki.qemu.org/Google_Summer_of_Code_2022#Create_encrypted_storage_using_VM-based_container_runtimes Stefan
Re: qemu crash 100% CPU with Ubuntu10.04 guest (solved)
On Thu, Feb 17, 2022 at 12:07:15PM +1100, Ben Smith wrote:
> Hi All,

Hi,

> I'm cross-posting this from Reddit qemu_kvm, in case it helps in some
> way. I know my setup is ancient and unique; let me know if you would
> like more info.
>
> Symptoms:
> 1. Ubuntu10.04 32-bit guest locks up randomly between 0 and 30 days.
> 2. The console shows a CPU trace dump, nothing else logged on the guest or
> host.
> 3. Host system (Ubuntu20.04) 100% CPU for qemu process.
>
> Solution:
> When using virt-install, always use the "--os-variant" parameter!
> e.g. --os-variant ubuntu10.04
>
> From the man page "--os-variant... Optimize the guest configuration
> for a specific operating system".
> In this case, "optimize" apparently means "stop the crashing".

The "--os-variant" option will use virtio devices where applicable,
the recommended machine type, guest resources (e.g. CPU, memory, disk
size) and other things that'll improve performance.

> I was deliberately avoiding the option because the VM was already
> performing much better than expected and I didn't want to complicate
> the configuration.

Using it is always recommended when using `virt-install`. The command
`osinfo-query os` will list all the OSes that you can use with
"--os-variant". Note: even if you don't find the latest version of $OS
in `osinfo-query`, using the most recent version listed still
suffices.

> This was very, very painful to troubleshoot; Involving spinning up 60
> VMs simultaneously, waiting for a failure, changing one parameter,
> repeat. :(

Yikes! Kudos for having such a high threshold for frustration.

I think providing a clear reproducer can still be useful. E.g. your
full guest QEMU command-line and your QEMU version. (The
libvirt-generated QEMU log contains the version info.)

-- 
/kashyap
Re: [PATCH v2 00/12] GL & D-Bus display related fixes
Hi

On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki wrote:

> On Thu, Feb 17, 2022 at 8:58 PM wrote:
> >
> > From: Marc-André Lureau
> >
> > Hi,
> >
> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a console", Akihiko
> > Odaki reported a number of issues with the GL and D-Bus display. His series
> > proposes a different design, and reverts some of my previous generic console
> > changes to fix those issues.
> >
> > However, as far as I have worked through the issues, they can be solved by
> > reasonably simple fixes while keeping the console changes generic (not specific
> > to the D-Bus display backend). I believe a shared infrastructure is more
> > beneficial long term than having GL-specific code in the DBus code (in
> > particular, the egl-headless & VNC combination could potentially use it).
> >
> > Thanks a lot Akihiko for reporting the issues and proposing a different approach!
> > Please test this alternative series and let me know of any further issues. My
> > understanding is that you are mainly concerned with the Cocoa backend, and I
> > don't have a way to test it, so please check it. If necessary, we may well have
> > to revert my earlier changes and go your way, eventually.
> >
> > Marc-André Lureau (12):
> >   ui/console: fix crash when using gl context with non-gl listeners
> >   ui/console: fix texture leak when calling surface_gl_create_texture()
> >   ui: do not create a surface when resizing a GL scanout
> >   ui/console: move check for compatible GL context
> >   ui/console: move dcl compatiblity check to a callback
> >   ui/console: egl-headless is compatible with non-gl listeners
> >   ui/dbus: associate the DBusDisplayConsole listener with the given
> >     console
> >   ui/console: move console compatibility check to dcl_display_console()
> >   ui/shader: fix potential leak of shader on error
> >   ui/shader: free associated programs
> >   ui/console: add a dpy_gfx_switch callback helper
> >   ui/dbus: fix texture sharing
> >
> >  include/ui/console.h |  19 ---
> >  ui/dbus.h            |   3 ++
> >  ui/console-gl.c      |   4 ++
> >  ui/console.c         | 119 ++-
> >  ui/dbus-console.c    |  27 +-
> >  ui/dbus-listener.c   |  11
> >  ui/dbus.c            |  33 +++-
> >  ui/egl-headless.c    |  17 ++-
> >  ui/gtk.c             |  18 ++-
> >  ui/sdl2.c            |   9 +++-
> >  ui/shader.c          |   9 +++-
> >  ui/spice-display.c   |   9 +++-
> >  12 files changed, 192 insertions(+), 86 deletions(-)
> >
> > --
> > 2.34.1.428.gdcc0cd074f0c
> >
> >
>
> You missed only one thing:
> >- that console_select and register_displaychangelistener may not call
> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is
> > incompatible with non-OpenGL displays
>
> displaychangelistener_display_console always has to call
> dpy_gfx_switch for non-OpenGL displays, but it still doesn't.
>

Ok, would that be what you have in mind?

--- a/ui/console.c
+++ b/ui/console.c
@@ -1122,6 +1122,10 @@ static void displaychangelistener_display_console(DisplayChangeListener *dcl,
     } else if (con->scanout.kind == SCANOUT_SURFACE) {
         dpy_gfx_create_texture(con, con->surface);
         displaychangelistener_gfx_switch(dcl, con->surface);
+    } else {
+        /* use the fallback surface, egl-headless keeps it updated */
+        assert(con->surface);
+        displaychangelistener_gfx_switch(dcl, con->surface);
     }

I wish such egl-headless specific code would be there, but we would
need more refactoring. I think I would rather have a backend split for
GL context, like "-object egl-context". egl-headless-specific copy code
would be handled by common/util code for anything that wants a pixman
surface (VNC, screen capture, non-GL display etc).

This split would allow sharing the context code, and introducing other
system-specific GL initialization, such as WGL etc. Right now, I doubt
the EGL code works on anything but Linux.

> Anything else should be addressed with this patch series. (And it also
> has nice fixes for shader leaks.)
>

thanks

>
> cocoa doesn't have OpenGL output and egl-headless, the cause of many
> pains addressed here, does not work on macOS so you need little
> attention. I have an out-of-tree OpenGL support for cocoa but it is
> out-of-tree anyway and I can fix it anytime.
>

Great!

btw, I suppose you checked your DBus changes against the WIP
"qemu-display" project. What was your experience? I don't think many
people have tried it yet. Do you think this could be made to work on
macOS? At least the non-dmabuf support should work, as long as Gtk4 has
good macOS support. I don't know if dmabuf or something similar exists
there, any idea?

-- 
Marc-André Lureau
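
For readers following the thread, the dispatch rule discussed above (a GL scanout goes to listeners that can consume it, everything else falls back to the pixel surface) can be modelled in isolation. This is only a rough sketch under stated assumptions, not QEMU code: `struct console`, `struct listener`, the `has_gl` flag and `display_console()` are hypothetical, simplified stand-ins for QemuConsole, DisplayChangeListener and displaychangelistener_display_console(), and the real function checks per-callback ops rather than a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's console types (illustration only). */
enum scanout_kind { SCANOUT_NONE, SCANOUT_SURFACE, SCANOUT_TEXTURE, SCANOUT_DMABUF };

struct surface { int width, height; };

struct console {
    enum scanout_kind kind;
    struct surface *surface;      /* fallback pixel surface */
};

struct listener {
    const char *name;
    bool has_gl;                  /* assumption: one flag instead of per-op checks */
    const struct surface *shown;  /* last surface handed to gfx_switch */
    int gl_scanouts;              /* how many GL scanouts were delivered */
    int gfx_switches;             /* how many surface switches were delivered */
};

/* Models a dpy_gfx_switch-style callback: hand the listener a pixel surface. */
static void listener_gfx_switch(struct listener *l, const struct surface *s)
{
    l->shown = s;
    l->gfx_switches++;
}

/* Models a dpy_gl_scanout_*-style callback: hand the listener a GL scanout. */
static void listener_gl_scanout(struct listener *l)
{
    l->gl_scanouts++;
}

/*
 * GL scanouts go to GL-capable listeners. Every other combination,
 * including a GL scanout shown on a non-GL listener, uses the fallback
 * surface, which must therefore exist.
 */
void display_console(struct listener *l, const struct console *con)
{
    bool gl = con->kind == SCANOUT_TEXTURE || con->kind == SCANOUT_DMABUF;

    if (gl && l->has_gl) {
        listener_gl_scanout(l);
    } else {
        assert(con->surface);
        listener_gfx_switch(l, con->surface);
    }
}
```

Calling `display_console()` with a texture scanout and a listener whose `has_gl` is false ends in `listener_gfx_switch()`, which is the case the proposed `else` branch is meant to cover.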