Re: [PATCH v2 0/9] ppc: nested KVM HV for spapr virtual hypervisor

2022-02-17 Thread Cédric Le Goater

On 2/16/22 11:25, Nicholas Piggin wrote:

This should account for AFAIKS all comments, except maybe some
about naming.

Changes since v1:
- Per-CPU spapr nested state moved to SpaprCpuState from PowerPCCPU.
- address_space_map ops are used, small rearrangement to make any
   given access region store-only or load-only.
- Some style, naming, etc cleanups and fixes.

Hopefully I didn't miss anything.

Thanks,
Nick


We can address migration aspects with followups.

Applied for ppc-7.0

Thanks,

C.







Re: [PATCH v7 0/3] spapr: nvdimm: Introduce spapr-nvdimm device

2022-02-17 Thread Cédric Le Goater

On 2/4/22 09:15, Shivaprasad G Bhat wrote:

If the device backend is not persistent memory for the nvdimm, there
is a need for explicit IO flushes to ensure persistence.

On SPAPR, the issue is addressed by adding a new hcall to request
an explicit flush from the guest when the backend is not pmem.
So, the approach here is to convey when the hcall flush is required
in a device tree property. Once the guest knows the device needs
explicit flushes, it makes the hcall as and when required.

It was suggested to create a new device type to address the
explicit flush for such backends on PPC instead of extending the
generic nvdimm device with a new property. So, the patch introduces
the spapr-nvdimm device. The new device inherits from the nvdimm device
with the new behaviour such that if the backend has pmem=no, the
device tree property is set by default.

The below demonstration shows the map_sync behavior for non-pmem
backends.
(https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c)

The pmem0 is from spapr-nvdimm with backend pmem=on, and pmem1 is
from spapr-nvdimm with pmem=off, mounted as
/dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
/dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)

[root@atest-guest ~]# ./mapsync /mnt1/newfile > When pmem=on
[root@atest-guest ~]# ./mapsync /mnt2/newfile > when pmem=off
Failed to mmap  with Operation not supported

The first patch adds the realize/unrealize callbacks to the generic device
for the new device's vmstate registration. The second patch implements
the hcall and adds the necessary vmstate properties to the spapr machine
structure for carrying the hcall status during save-restore. Because the
hcall is asynchronous, the patch uses aio utilities to offload the flush. The
third patch introduces the spapr-nvdimm device and adds the device tree
property for the guest when spapr-nvdimm is used with pmem=no on the
backend. It also adds a new property, pmem-override (?, suggest if you have a
better name), to spapr-nvdimm, which hints at forcing the hcall-based flushes
even on pmem-backed devices.

The kernel changes to exploit this hcall are at
https://github.com/linuxppc/linux/commit/75b7c05ebf9026.patch




Applied for ppc-7.0

Thanks,

C.



Re: [PATCH v2 00/27] target/ppc: SPR registration cleanups

2022-02-17 Thread Cédric Le Goater

On 2/16/22 17:23, Fabiano Rosas wrote:

The goal of this series is to do some untangling of SPR registration
code in cpu_init.c and prepare for moving the CPU initialization into
separate files for each CPU family.

After this series we'll have only cpu-specific SPR code in cpu_init.c,
i.e. code that can be split and moved as a unit into other
files. Common/generic SPR code will be in helper_regs.c, exposed via
spr_common.h.

Changes from v1:

- Some commit message improvements suggested by David;

- Removed the soft_tlb rename patch. Kept the old name;

- Left the specific check_pow functions behind, they can be dealt with
   in the next series;

- Added a new patch to rename spr_tcg to spr_common.

Patches 23 and 26 still need review.

This series is based on legoater/ppc7.0.

v1:
https://lists.nongnu.org/archive/html/qemu-ppc/2022-02/msg00313.html

Fabiano Rosas (27):
   target/ppc: cpu_init: Remove not implemented comments
   target/ppc: cpu_init: Remove G2LE init code
   target/ppc: cpu_init: Group registration of generic SPRs
   target/ppc: cpu_init: Move Timebase registration into the common
 function
   target/ppc: cpu_init: Avoid nested SPR register functions
   target/ppc: cpu_init: Move 405 SPRs into register_405_sprs
   target/ppc: cpu_init: Move G2 SPRs into register_G2_sprs
   target/ppc: cpu_init: Decouple G2 SPR registration from 755
   target/ppc: cpu_init: Decouple 74xx SPR registration from 7xx
   target/ppc: cpu_init: Deduplicate 440 SPR registration
   target/ppc: cpu_init: Deduplicate 603 SPR registration
   target/ppc: cpu_init: Deduplicate 604 SPR registration
   target/ppc: cpu_init: Deduplicate 745/755 SPR registration
   target/ppc: cpu_init: Deduplicate 7xx SPR registration
   target/ppc: cpu_init: Move 755 L2 cache SPRs into a function
   target/ppc: cpu_init: Move e300 SPR registration into a function
   target/ppc: cpu_init: Move 604e SPR registration into a function
   target/ppc: cpu_init: Reuse init_proc_603 for the e300
   target/ppc: cpu_init: Reuse init_proc_604 for the 604e
   target/ppc: cpu_init: Reuse init_proc_745 for the 755
   target/ppc: cpu_init: Rename register_ne_601_sprs
   target/ppc: cpu_init: Remove register_usprg3_sprs
   target/ppc: Rename spr_tcg.h to spr_common.h
   target/ppc: cpu_init: Expose some SPR registration helpers
   target/ppc: cpu_init: Move SPR registration macros to a header
   target/ppc: cpu_init: Move check_pow and QOM macros to a header
   target/ppc: Move common SPR functions out of cpu_init

  target/ppc/cpu.h   |   39 +
  target/ppc/cpu_init.c  | 1879 
  target/ppc/helper_regs.c   |  402 +
  target/ppc/{spr_tcg.h => spr_common.h} |   69 +-
  target/ppc/translate.c |2 +-
  5 files changed, 1098 insertions(+), 1293 deletions(-)
  rename target/ppc/{spr_tcg.h => spr_common.h} (72%)




Applied for ppc-7.0

Thanks,

C.



Re: [PATCH v5 05/15] hw/nvme: Add support for SR-IOV

2022-02-17 Thread Klaus Jensen
On Feb 17 18:44, Lukasz Maniak wrote:
> This patch implements initial support for Single Root I/O Virtualization
> on an NVMe device.
> 
> Essentially, it allows defining the maximum number of virtual functions
> supported by the NVMe controller via the sriov_max_vfs parameter.
> 
> Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
> capability by a physical controller and ARI capability by both the
> physical and virtual function devices.
> 
> NVMe controllers created via virtual functions mirror functionally
> the physical controller, which may not entirely be the case, thus
> consideration would be needed on the way to limit the capabilities of
> the VF.
> 
> NVMe subsystem is required for the use of SR-IOV.
> 
> Signed-off-by: Lukasz Maniak 
> ---
>  hw/nvme/ctrl.c   | 85 ++--
>  hw/nvme/nvme.h   |  3 +-
>  include/hw/pci/pci_ids.h |  1 +
>  3 files changed, 85 insertions(+), 4 deletions(-)
> 

LGTM.

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature


Re: [PATCH v5 15/15] hw/nvme: Update the initalization place for the AER queue

2022-02-17 Thread Klaus Jensen
On Feb 17 18:45, Lukasz Maniak wrote:
> From: Łukasz Gieryk 
> 
> This patch updates the initialization place for the AER queue, so it’s
> initialized once, at controller initialization, and not every time
> controller is enabled.
> 
> While the original version works for a non-SR-IOV device, as it’s hard
> to interact with the controller if it’s not enabled, the multiple
> reinitialization is not necessarily correct.
> 
> With the SR-IOV feature enabled a segfault can happen: a VF can have its
> controller disabled, while a namespace can still be attached to the
> controller through the parent PF. An event generated in such case ends
> up on an uninitialized queue.
> 
> While it’s an interesting question whether a VF should support AER in
> the first place, I don’t think it must be answered today.
> 
> Signed-off-by: Łukasz Gieryk 

Looks good.

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature


[PATCH v3] migration/rdma: set the REUSEADDR option for destination

2022-02-17 Thread Jack Wang
We hit the following error while testing the RDMA transport: after a
migration error, the mgmt daemon picks a migration port and the destination
fails with
incoming rdma:[::]:8089: RDMA ERROR: Error: could not rdma_bind_addr

Retrying with another port (-incoming rdma:[::]:8103) sometimes works and
sometimes needs yet another attempt with a different port number.

Set the REUSEADDR option on the destination. This allows the address to
be reused and avoids the rdma_bind_addr error.

Signed-off-by: Jack Wang 
Reviewed-by: Pankaj Gupta 

---
v3: add reviewed-by tags from David and Pankaj.
v2: extend commit message as discussed with Pankaj and David
---
 migration/rdma.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/migration/rdma.c b/migration/rdma.c
index c7c7a384875b..663e1fbb096d 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2705,6 +2705,7 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 char ip[40] = "unknown";
 struct rdma_addrinfo *res, *e;
 char port_str[16];
+int reuse = 1;
 
 for (idx = 0; idx < RDMA_WRID_MAX; idx++) {
 rdma->wr_data[idx].control_len = 0;
@@ -2740,6 +2741,12 @@ static int qemu_rdma_dest_init(RDMAContext *rdma, Error **errp)
 goto err_dest_init_bind_addr;
 }
 
+ret = rdma_set_option(listen_id, RDMA_OPTION_ID, RDMA_OPTION_ID_REUSEADDR,
+ &reuse, sizeof reuse);
+if (ret) {
+ERROR(errp, "Error: could not set REUSEADDR option");
+goto err_dest_init_bind_addr;
+}
 for (e = res; e != NULL; e = e->ai_next) {
 inet_ntop(e->ai_family,
 &((struct sockaddr_in *) e->ai_dst_addr)->sin_addr, ip, sizeof ip);
-- 
2.25.1
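
For readers less familiar with address reuse, here is a minimal sketch of the
same idea using plain TCP sockets rather than librdmacm. It is only an analogy:
SO_REUSEADDR stands in for RDMA_OPTION_ID_REUSEADDR, and the helper name
listen_with_reuseaddr is hypothetical. As in the patch, the option is set
before the bind call.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): create a TCP listener on the
 * loopback address with SO_REUSEADDR set before bind().
 * Returns the socket fd, or -1 on error.
 */
int listen_with_reuseaddr(unsigned short port)
{
    int reuse = 1;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }

    /* Must happen before bind(), mirroring the placement in the patch. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof reuse) < 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Passing port 0 lets the kernel pick a free port, which is convenient for a
quick experiment.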




[PATCH v2] hw: riscv: opentitan: fixup SPI addresses

2022-02-17 Thread Alistair Francis
From: Wilfred Mallawa 

This patch updates the SPI_DEVICE, SPI_HOST0, SPI_HOST1
base addresses. Also adds these as unimplemented devices.

The address references can be found [1].

[1] 
https://github.com/lowRISC/opentitan/blob/6c317992fbd646818b34f2a2dbf44bc850e461e4/hw/top_earlgrey/sw/autogen/top_earlgrey_memory.h#L107

Signed-off-by: Wilfred Mallawa 
Reviewed-by: Alistair Francis 
---
v2: arranged base addrs in sorted order

 hw/riscv/opentitan.c | 12 +---
 include/hw/riscv/opentitan.h |  4 +++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/opentitan.c b/hw/riscv/opentitan.c
index aec7cfa33f..833624d66c 100644
--- a/hw/riscv/opentitan.c
+++ b/hw/riscv/opentitan.c
@@ -34,13 +34,15 @@ static const MemMapEntry ibex_memmap[] = {
 [IBEX_DEV_FLASH] =  {  0x2000,  0x8 },
 [IBEX_DEV_UART] =   {  0x4000,  0x1000  },
 [IBEX_DEV_GPIO] =   {  0x4004,  0x1000  },
-[IBEX_DEV_SPI] ={  0x4005,  0x1000  },
+[IBEX_DEV_SPI_DEVICE] = {  0x4005,  0x1000  },
 [IBEX_DEV_I2C] ={  0x4008,  0x1000  },
 [IBEX_DEV_PATTGEN] ={  0x400e,  0x1000  },
 [IBEX_DEV_TIMER] =  {  0x4010,  0x1000  },
 [IBEX_DEV_SENSOR_CTRL] ={  0x4011,  0x1000  },
 [IBEX_DEV_OTP_CTRL] =   {  0x4013,  0x4000  },
 [IBEX_DEV_USBDEV] = {  0x4015,  0x1000  },
+[IBEX_DEV_SPI_HOST0] =  {  0x4030,  0x1000  },
+[IBEX_DEV_SPI_HOST1] =  {  0x4031,  0x1000  },
 [IBEX_DEV_PWRMGR] = {  0x4040,  0x1000  },
 [IBEX_DEV_RSTMGR] = {  0x4041,  0x1000  },
 [IBEX_DEV_CLKMGR] = {  0x4042,  0x1000  },
@@ -209,8 +211,12 @@ static void lowrisc_ibex_soc_realize(DeviceState *dev_soc, Error **errp)
 
 create_unimplemented_device("riscv.lowrisc.ibex.gpio",
 memmap[IBEX_DEV_GPIO].base, memmap[IBEX_DEV_GPIO].size);
-create_unimplemented_device("riscv.lowrisc.ibex.spi",
-memmap[IBEX_DEV_SPI].base, memmap[IBEX_DEV_SPI].size);
+create_unimplemented_device("riscv.lowrisc.ibex.spi_device",
+memmap[IBEX_DEV_SPI_DEVICE].base, memmap[IBEX_DEV_SPI_DEVICE].size);
+create_unimplemented_device("riscv.lowrisc.ibex.spi_host0",
+memmap[IBEX_DEV_SPI_HOST0].base, memmap[IBEX_DEV_SPI_HOST0].size);
+create_unimplemented_device("riscv.lowrisc.ibex.spi_host1",
+memmap[IBEX_DEV_SPI_HOST1].base, memmap[IBEX_DEV_SPI_HOST1].size);
 create_unimplemented_device("riscv.lowrisc.ibex.i2c",
 memmap[IBEX_DEV_I2C].base, memmap[IBEX_DEV_I2C].size);
 create_unimplemented_device("riscv.lowrisc.ibex.pattgen",
diff --git a/include/hw/riscv/opentitan.h b/include/hw/riscv/opentitan.h
index eac35ef590..00da9ded43 100644
--- a/include/hw/riscv/opentitan.h
+++ b/include/hw/riscv/opentitan.h
@@ -57,8 +57,10 @@ enum {
 IBEX_DEV_FLASH,
 IBEX_DEV_FLASH_VIRTUAL,
 IBEX_DEV_UART,
+IBEX_DEV_SPI_DEVICE,
+IBEX_DEV_SPI_HOST0,
+IBEX_DEV_SPI_HOST1,
 IBEX_DEV_GPIO,
-IBEX_DEV_SPI,
 IBEX_DEV_I2C,
 IBEX_DEV_PATTGEN,
 IBEX_DEV_TIMER,
-- 
2.35.1




Re: [PATCH v2 2/3] hw/smbios: fix table memory corruption with large memory vms

2022-02-17 Thread Ani Sinha
On Mon, Feb 14, 2022 at 6:21 PM Igor Mammedov  wrote:
>
> On Mon,  7 Feb 2022 17:01:28 +0530
> Ani Sinha  wrote:
>
> > With the current smbios table assignment code, we can have only 512 DIMM
> > slots
> it's a bit confusing, since it's not DIMM slots in QEMU sense (we do not
> expose DIMM devices via SMBIOS/E820). So maybe clarify here that initial RAM
> is split into 16GB (with 'DIMM' type ) chunks/entries when it's described in
> SMBIOS table 17.
>
> > (each DIMM of 16 GiB in size) before tables 17 and 19 conflict with their
> > addresses.
>
> Are you sure it's addresses that are wrong?

I don't know why I had this preconception of memory corruption and
overlapping addresses! Even the BZ says table handles overlap. Grr ...
doing too much multiplexing these days :(
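
The arithmetic behind the 512 figure can be sketched as follows. The 16 GiB
chunk size comes from the discussion above; the handle bases 0x1100 and 0x1300
for table types 17 and 19 are an assumption about the layout under discussion,
and the function names are hypothetical.

```c
#include <stdint.h>

#define CHUNK_BYTES (16ULL << 30) /* 16 GiB per type 17 entry (from the thread) */
#define T17_BASE    0x1100u       /* assumed handle base for type 17 */
#define T19_BASE    0x1300u       /* assumed handle base for type 19 */

/* Number of type 17 entries needed to describe ram_bytes of initial RAM. */
uint64_t type17_entries(uint64_t ram_bytes)
{
    return (ram_bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;
}

/* Nonzero when the type 17 handles would run into the type 19 range. */
int type17_handles_overlap(uint64_t entries)
{
    return entries > (uint64_t)(T19_BASE - T17_BASE);
}
```

Under these assumptions, 8 TiB of initial RAM gives exactly 512 entries, and
anything larger produces a handle collision rather than an address collision.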



Re: [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features

2022-02-17 Thread Richard Henderson

On 2/18/22 04:37, Alex Bennée wrote:


Peter Maydell  writes:


On Thu, 10 Feb 2022 at 04:04, Richard Henderson
 wrote:


Changes for v2:
   * Introduce FIELD_SEX64, instead of open-coding w/ sextract64.
   * Set TCR_EL1 more completely for user-only.
   * Continue to bound tsz within aa64_va_parameters;
 provide an out-of-bound indicator for raising AddressSize fault.
   * Split IPS patch.
   * Fix debug registers for LVA.
   * Fix long-format fsc for LPA2.
   * Fix TLBI page shift.
   * Validate TLBI granule vs TCR granule.

Not done:
   * Validate translation levels which accept blocks.

There is still no upstream kernel support for FEAT_LPA2,
so that is essentially untested.


This series seems to break 'make check-acceptance':

  (01/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'01-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j...
(900.74 s)
  (02/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3:
INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
Timeout reached\nOriginal status: ERROR\n{'name':
'02-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3',
'logdir': 
'/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j...
(900.71 s)

UEFI runs in the guest and seems to launch the kernel, but there's
no output from the kernel itself in the logfile. Last thing it
prints is:

EFI stub: Booting Linux Kernel...
EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
SetUefiImageMemoryAttributes - 0x7F50 - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7C19 - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7C14 - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7F4C - 0x0003
(0x0008)
SetUefiImageMemoryAttributes - 0x7C0F - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7BFB - 0x0004
(0x0008)
SetUefiImageMemoryAttributes - 0x7BE0 - 0x0003
(0x0008)
SetUefiImageMemoryAttributes - 0x7BDC - 0x0003
(0x0008)

This ought to be followed by the usual kernel boot log
[0.00] Booting Linux on physical CPU 0x00 [0x000f0510]
etc but it isn't. Probably the kernel is crashing in early bootup
before it gets round to printing anything.


As this test runs under -cpu max it is likely exercising the new
features (and failing).


I would have thought so too.  However...

I've bisected this to the final LPA2 patch.  I have not tracked down what
exactly is going on with this, but it's definitely not the guest exercising
the new feature -- there is no upstream support for LPA2.


I'll keep looking.


r~




Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake

2022-02-17 Thread Jag Raman


> On Feb 17, 2022, at 7:09 AM, Peter Maydell  wrote:
> 
> On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman  wrote:
>> 
>> The compiler path that cmake gets from meson is corrupted. It results in
>> the following error:
>> | -- The C compiler identification is unknown
>> | CMake Error at CMakeLists.txt:35 (project):
>> | The CMAKE_C_COMPILER:
>> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
>> | is not a full path to an existing compiler tool.
>> 
>> Explicitly specify the C compiler for cmake to avoid this error
> 
> This sounds like a bug in Meson. Is there a Meson bug report
> we can reference in the commit message here ?

Hi Peter,

This issue reproduces with the latest meson [1] also.

I noticed the following about the “binaries” section [2]. The manual
says meson could pass the values in this section to find_program [3].
As such I’m wondering if it’s OK to set compiler flags in this section
because find_program doesn’t seem to accept any compiler flags.

The compiler flags could be set in the “built-in options” section using
options such as “c_args”, “cpp_args” and “objc_args” [4]. When I
moved CPU_CFLAGS from the binaries section to the built-in options
section in “configure”, I no longer saw the issue.

[1]: https://github.com/mesonbuild/meson.git
[2]: https://mesonbuild.com/Machine-files.html#binaries
[3]: https://cmake.org/cmake/help/latest/command/find_program.html
[4]: https://github.com/mesonbuild/meson/blob/master/docs/markdown/Reference-tables.md
     (section “Language arguments parameter names”)

Thank you!
--
Jag

> 
> thanks
> -- PMM



Re: [PATCH] migration: NULL transport_data after freeing

2022-02-17 Thread Peter Xu
On Thu, Feb 17, 2022 at 06:04:07PM +0100, Hanna Reitz wrote:
> migration_incoming_state_destroy() NULLs all objects it frees after they
> are freed, presumably so that a subsequent call to the same function
> will not free them again, unless new objects have been created in the
> meantime.
> 
> transport_data is the exception, and it shows exactly this problem: When
> an incoming migration uses transport_cleanup() and transport_data, and a
> subsequent incoming migration (e.g. loadvm) occurs that does not, then
> when this second one is done, it will call transport_cleanup() on the
> old transport_data again -- which has already been freed.  This is
> sometimes visible in the iotest 201, though for some reason I can only
> reproduce it with -m32.
> 
> To fix this, call transport_cleanup() only when transport_data is not
> NULL (otherwise there is nothing to clean up), and set transport_data to
> NULL when it has been cleaned up (i.e. freed).
> 
> (transport_cleanup() is used only by migration/socket.c, where
> socket_start_incoming_migration_internal() sets both it and
> transport_data to non-NULL values.)
> 
> Signed-off-by: Hanna Reitz 

I had a similar fix here:

https://lore.kernel.org/qemu-devel/20220216062809.57179-15-pet...@redhat.com/

Though there it was because I need migration_incoming_transport_cleanup()
for other purposes, so the fix came along.

My guess is this small fix will land earlier, if so I'll rebase. :)

Thanks,

-- 
Peter Xu
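
The pattern described in the commit message (clean up only when the pointer is
non-NULL, then set it to NULL) can be sketched in isolation. This is not the
QEMU code: the struct, the function names and the call counter below are
hypothetical and exist only to show why a second call becomes harmless.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the incoming migration state. */
struct incoming_state {
    void *transport_data;
    void (*transport_cleanup)(void *data);
};

/* Counts cleanup calls so the idempotence can be observed. */
int cleanup_calls;

void counting_free(void *data)
{
    cleanup_calls++;
    free(data);
}

/* Safe to call repeatedly: the second call sees NULL and does nothing. */
void incoming_transport_cleanup(struct incoming_state *s)
{
    if (s->transport_data != NULL) {
        s->transport_cleanup(s->transport_data);
        s->transport_data = NULL;
    }
}
```

Without the NULL assignment, a second call would hand the already freed
pointer to the cleanup callback again.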




Re: [PATCH v2 4/8] configure: Disable out-of-line atomic operations on Aarch64

2022-02-17 Thread Richard Henderson

On 2/17/22 04:18, Philippe Mathieu-Daudé wrote:

On 16/2/22 17:42, Akihiko Odaki wrote:

On 2022/02/17 0:08, Philippe Mathieu-Daudé wrote:

On 16/2/22 11:19, Richard Henderson wrote:
These should have been supplied by libgcc.a, which we're supposed to be
linking against. Something is wrong with your installation.


I don't have gobjc/g++ installed, so ./configure defaulted to Clang to
compile these languages, but compiled C files using GCC. At the end the
Clang linker is used (the default c++ symlink).


This is another form of compiler mis-configuration.
If you don't have g++ to go with gcc, use --cxx=false to avoid picking up a
different compiler.



Could there be a mismatch between Clang (-mno-outline-atomics) and GCC
(-moutline-atomics)?


I have no idea if those options do the same thing.

I think you have to instruct Clang to use libgcc instead of compiler-rt and
link the objects with GCC. Here is the documentation of Clang about the
runtime I could find:

https://clang.llvm.org/docs/Toolchain.html#libgcc-s-gnu


Thanks for the pointer. And the next section is
https://clang.llvm.org/docs/Toolchain.html#atomics-library :)

   Clang does not currently automatically link against libatomic when
    using libgcc_s. You may need to manually add -latomic to support
   this configuration when using non-native atomic operations (if you
   see link errors referring to __atomic_* functions).

I'll try that.


-moutline-atomics is *not* the same as libatomic.
You should not need libatomic at all.


r~
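
As a rough illustration of the distinction, consider a plain C11 atomic
increment. This is a sketch under assumptions, not a statement about any
particular toolchain: the understanding here is that GCC on AArch64 with
-moutline-atomics turns such an operation into a call to a small helper
shipped in libgcc, while libatomic only comes into play for operations the
target cannot do natively. The function name bump is hypothetical.

```c
#include <stdatomic.h>

/*
 * A natively supported 32-bit atomic read-modify-write. No libatomic call is
 * expected for this; with outline atomics the helper is assumed to come from
 * libgcc instead.
 */
int bump(atomic_int *counter)
{
    /* Returns the value held before the increment. */
    return atomic_fetch_add_explicit(counter, 1, memory_order_acq_rel);
}
```

If linking an object like this fails with an undefined helper symbol, that
points at the libgcc/compiler-rt mismatch discussed above rather than at a
missing -latomic.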



Re: [PATCH] configure: Support empty prefixes

2022-02-17 Thread Paolo Bonzini

On 2/17/22 19:42, Joshua Seaton wrote:

At least as of v5 (before the meson build), empty `--prefix` values were
supported; this seems to have fallen out along the way. This change
reintroduces support.


What is the use case exactly?  QEMU supports relocatable installation, so
if you want you can use --prefix=/nonexistent and then move the
resulting tree wherever you want.


Paolo


Tested locally with empty and non-empty values of `--prefix`.

Signed-off-by: Joshua Seaton 
---
  configure | 33 -
  1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 3a29eff5cc..87a32e52e4 100755
--- a/configure
+++ b/configure
@@ -1229,20 +1229,30 @@ case $git_submodules_action in
  ;;
  esac

-libdir="${libdir:-$prefix/lib}"
-libexecdir="${libexecdir:-$prefix/libexec}"
-includedir="${includedir:-$prefix/include}"
+# Emits a relative path in the case of an empty prefix.
+prefix_subdir() {
+dir="$1"
+if test -z "$prefix" ; then
+echo "$dir"
+else
+echo "$prefix/$dir"
+fi
+}
+
+libdir="${libdir:-$(prefix_subdir lib)}"
+libexecdir="${libexecdir:-$(prefix_subdir libexec)}"
+includedir="${includedir:-$(prefix_subdir include)}"

  if test "$mingw32" = "yes" ; then
  bindir="${bindir:-$prefix}"
  else
-bindir="${bindir:-$prefix/bin}"
+bindir="${bindir:-$(prefix_subdir bin)}"
  fi
-mandir="${mandir:-$prefix/share/man}"
-datadir="${datadir:-$prefix/share}"
-docdir="${docdir:-$prefix/share/doc}"
-sysconfdir="${sysconfdir:-$prefix/etc}"
-local_statedir="${local_statedir:-$prefix/var}"
+mandir="${mandir:-$(prefix_subdir share/man)}"
+datadir="${datadir:-$(prefix_subdir share)}"
+docdir="${docdir:-$(prefix_subdir share/doc)}"
+sysconfdir="${sysconfdir:-$(prefix_subdir etc)}"
+local_statedir="${local_statedir:-$(prefix_subdir var)}"
  firmwarepath="${firmwarepath:-$datadir/qemu-firmware}"
  localedir="${localedir:-$datadir/locale}"

@@ -3763,6 +3773,11 @@ if test "$skip_meson" = no; then
mv $cross config-meson.cross

rm -rf meson-private meson-info meson-logs
+
+  # Workaround for a meson bug preventing empty prefixes:
+  # see https://github.com/mesonbuild/meson/issues/6946.
+  prefix="${prefix:-/}"
+
run_meson() {
  NINJA=$ninja $meson setup \
  --prefix "$prefix" \
--
2.35.1.265.g69c8d7142f-goog






[PATCH v6 4/4] tests/tcg/s390x: changed to using .insn for tests requiring z15

2022-02-17 Thread David Miller
Signed-off-by: David Miller 
---
 tests/tcg/s390x/mie3-compl.c | 21 +++--
 tests/tcg/s390x/mie3-mvcrl.c |  2 +-
 tests/tcg/s390x/mie3-sel.c   |  6 +++---
 3 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/tests/tcg/s390x/mie3-compl.c b/tests/tcg/s390x/mie3-compl.c
index 98281ee683..31820e4a2a 100644
--- a/tests/tcg/s390x/mie3-compl.c
+++ b/tests/tcg/s390x/mie3-compl.c
@@ -14,25 +14,26 @@
 #define FbinOp(S, ASM) uint64_t S(uint64_t a, uint64_t b) \
 { uint64_t res = 0; F_PRO; ASM; return res; }
 
+
 /* AND WITH COMPLEMENT */
-FbinOp(_ncrk,  asm("ncrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ncrk,  asm(".insn rrf, 0xB9F5, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_ncgrk, asm(".insn rrf, 0xB9E5, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* NAND */
-FbinOp(_nnrk,  asm("nnrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_nngrk, asm("nngrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nnrk,  asm(".insn rrf, 0xB974, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_nngrk, asm(".insn rrf, 0xB964, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* NOT XOR */
-FbinOp(_nxrk,  asm("nxrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_nxgrk, asm("nxgrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nxrk,  asm(".insn rrf, 0xB977, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_nxgrk, asm(".insn rrf, 0xB967, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* NOR */
-FbinOp(_nork,  asm("nork  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_nogrk, asm("nogrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nork,  asm(".insn rrf, 0xB976, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_nogrk, asm(".insn rrf, 0xB966, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 /* OR WITH COMPLEMENT */
-FbinOp(_ocrk,  asm("ocrk  %%r0, %%r3, %%r2\n" F_EPI))
-FbinOp(_ocgrk, asm("ocgrk %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ocrk,  asm(".insn rrf, 0xB975, %%r0, %%r3, %%r2, 0\n" F_EPI))
+FbinOp(_ocgrk, asm(".insn rrf, 0xB965, %%r0, %%r3, %%r2, 0\n" F_EPI))
 
 
 int main(int argc, char *argv[])
diff --git a/tests/tcg/s390x/mie3-mvcrl.c b/tests/tcg/s390x/mie3-mvcrl.c
index 81cf3ad702..f0be83b197 100644
--- a/tests/tcg/s390x/mie3-mvcrl.c
+++ b/tests/tcg/s390x/mie3-mvcrl.c
@@ -6,7 +6,7 @@ static inline void mvcrl_8(const char *dst, const char *src)
 {
 asm volatile (
 "llill %%r0, 8\n"
-"mvcrl 0(%[dst]), 0(%[src])\n"
+".insn sse, 0xE50A, 0(%[dst]), 0(%[src])"
 : : [dst] "d" (dst), [src] "d" (src)
 : "memory");
 }
diff --git a/tests/tcg/s390x/mie3-sel.c b/tests/tcg/s390x/mie3-sel.c
index d6b7b0933b..32d434b01a 100644
--- a/tests/tcg/s390x/mie3-sel.c
+++ b/tests/tcg/s390x/mie3-sel.c
@@ -19,9 +19,9 @@
 { uint64_t res = 0; F_PRO ; ASM ; return res; }
 
 
-Fi3 (_selre, asm("selre    %%r0, %%r3, %%r2\n" F_EPI))
-Fi3 (_selgrz,asm("selgrz   %%r0, %%r3, %%r2\n" F_EPI))
-Fi3 (_selfhrnz,  asm("selfhrnz %%r0, %%r3, %%r2\n" F_EPI))
+Fi3 (_selre, asm(".insn rrf, 0xB9F0, %%r0, %%r3, %%r2, 8\n" F_EPI))
+Fi3 (_selgrz,asm(".insn rrf, 0xB9E3, %%r0, %%r3, %%r2, 8\n" F_EPI))
+Fi3 (_selfhrnz,  asm(".insn rrf, 0xB9C0, %%r0, %%r3, %%r2, 7\n" F_EPI))
 
 
 int main(int argc, char *argv[])
-- 
2.32.0




[PATCH v6 3/4] tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions Facility 3

2022-02-17 Thread David Miller
tests/tcg/s390x/mie3-compl.c: [N]*K instructions
tests/tcg/s390x/mie3-mvcrl.c: MVCRL instruction
tests/tcg/s390x/mie3-sel.c:  SELECT instruction
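
As a rough guide to what the MVCRL and SELECT tests exercise, here is a small
C model of the two operations. It is a sketch under stated assumptions, not the
QEMU implementation: the function names are hypothetical, len is a plain byte
count (how the instruction derives its length from r0 is not modeled), and the
mask-to-condition-code mapping reflects the usual s390x convention as
understood here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst, starting with the rightmost (highest
 * address) byte. For dst > src with overlapping ranges this keeps the source
 * data intact, which is what makes "open a hole in an array" work.
 */
void move_right_to_left(char *dst, const char *src, size_t len)
{
    while (len > 0) {
        len--;
        dst[len] = src[len];
    }
}

/*
 * SELECT model: cc is the current condition code (0..3) and mask is the
 * 4-bit mask, where bit value 8 selects cc 0, 4 selects cc 1, 2 selects
 * cc 2 and 1 selects cc 3 (assumed convention). On a match the second
 * operand is chosen, otherwise the third.
 */
uint64_t select_model(unsigned cc, unsigned mask,
                      uint64_t second, uint64_t third)
{
    return (mask & (8u >> cc)) ? second : third;
}
```

For example, moving 8 bytes from offset 8 to offset 9 of a 17-byte buffer
holding "abcdefghjklmnop" leaves room to store the missing 'i' at offset 8.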

Signed-off-by: David Miller 
---
 tests/tcg/s390x/Makefile.target |  5 ++-
 tests/tcg/s390x/mie3-compl.c| 55 +
 tests/tcg/s390x/mie3-mvcrl.c| 31 +++
 tests/tcg/s390x/mie3-sel.c  | 42 +
 4 files changed, 132 insertions(+), 1 deletion(-)
 create mode 100644 tests/tcg/s390x/mie3-compl.c
 create mode 100644 tests/tcg/s390x/mie3-mvcrl.c
 create mode 100644 tests/tcg/s390x/mie3-sel.c

diff --git a/tests/tcg/s390x/Makefile.target b/tests/tcg/s390x/Makefile.target
index 1a7238b4eb..54e67446aa 100644
--- a/tests/tcg/s390x/Makefile.target
+++ b/tests/tcg/s390x/Makefile.target
@@ -1,12 +1,15 @@
 S390X_SRC=$(SRC_PATH)/tests/tcg/s390x
 VPATH+=$(S390X_SRC)
-CFLAGS+=-march=zEC12 -m64
+CFLAGS+=-march=z15 -m64
 TESTS+=hello-s390x
 TESTS+=csst
 TESTS+=ipm
 TESTS+=exrl-trt
 TESTS+=exrl-trtr
 TESTS+=pack
+TESTS+=mie3-compl
+TESTS+=mie3-mvcrl
+TESTS+=mie3-sel
 TESTS+=mvo
 TESTS+=mvc
 TESTS+=shift
diff --git a/tests/tcg/s390x/mie3-compl.c b/tests/tcg/s390x/mie3-compl.c
new file mode 100644
index 00..98281ee683
--- /dev/null
+++ b/tests/tcg/s390x/mie3-compl.c
@@ -0,0 +1,55 @@
+#include <stdint.h>
+
+
+#define F_EPI "stg %%r0, %[res] " : [res] "+m" (res) : : "r0", "r2", "r3"
+
+#define F_PRO    asm ( \
+"llihf %%r0,801\n" \
+"lg %%r2, %[a]\n"  \
+"lg %%r3, %[b] "   \
+: : [a] "m" (a),   \
+[b] "m" (b)\
+: "r2", "r3")
+
+#define FbinOp(S, ASM) uint64_t S(uint64_t a, uint64_t b) \
+{ uint64_t res = 0; F_PRO; ASM; return res; }
+
+/* AND WITH COMPLEMENT */
+FbinOp(_ncrk,  asm("ncrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ncgrk, asm("ncgrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NAND */
+FbinOp(_nnrk,  asm("nnrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nngrk, asm("nngrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NOT XOR */
+FbinOp(_nxrk,  asm("nxrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nxgrk, asm("nxgrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* NOR */
+FbinOp(_nork,  asm("nork  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_nogrk, asm("nogrk %%r0, %%r3, %%r2\n" F_EPI))
+
+/* OR WITH COMPLEMENT */
+FbinOp(_ocrk,  asm("ocrk  %%r0, %%r3, %%r2\n" F_EPI))
+FbinOp(_ocgrk, asm("ocgrk %%r0, %%r3, %%r2\n" F_EPI))
+
+
+int main(int argc, char *argv[])
+{
+if (_ncrk(0xFF88, 0xAA11)  != 0x03210011ull ||
+_nnrk(0xFF88, 0xAA11)  != 0x032155FFull ||
+_nork(0xFF88, 0xAA11)  != 0x03210066ull ||
+_nxrk(0xFF88, 0xAA11)  != 0x0321AA66ull ||
+_ocrk(0xFF88, 0xAA11)  != 0x0321AA77ull ||
+_ncgrk(0xFF88, 0xAA11) != 0x0011ull ||
+_nngrk(0xFF88, 0xAA11) != 0x55FFull ||
+_nogrk(0xFF88, 0xAA11) != 0x0066ull ||
+_nxgrk(0xFF88, 0xAA11) != 0xAA66ull ||
+_ocgrk(0xFF88, 0xAA11) != 0xAA77ull)
+{
+return 1;
+}
+
+return 0;
+}
diff --git a/tests/tcg/s390x/mie3-mvcrl.c b/tests/tcg/s390x/mie3-mvcrl.c
new file mode 100644
index 00..81cf3ad702
--- /dev/null
+++ b/tests/tcg/s390x/mie3-mvcrl.c
@@ -0,0 +1,31 @@
+#include <stdint.h>
+#include <string.h>
+
+
+static inline void mvcrl_8(const char *dst, const char *src)
+{
+asm volatile (
+"llill %%r0, 8\n"
+"mvcrl 0(%[dst]), 0(%[src])\n"
+: : [dst] "d" (dst), [src] "d" (src)
+: "memory");
+}
+
+
+int main(int argc, char *argv[])
+{
+const char *alpha = "abcdefghijklmnop";
+
+/* array missing 'i' */
+char tstr[17] = "abcdefghjklmnop\0" ;
+
+/* mvcrl reference use: 'open a hole in an array' */
+mvcrl_8(tstr + 9, tstr + 8);
+
+/* place missing 'i' */
+tstr[8] = 'i';
+
+return strncmp(alpha, tstr, 16ul);
+}
+
+
diff --git a/tests/tcg/s390x/mie3-sel.c b/tests/tcg/s390x/mie3-sel.c
new file mode 100644
index 00..d6b7b0933b
--- /dev/null
+++ b/tests/tcg/s390x/mie3-sel.c
@@ -0,0 +1,42 @@
+#include <stdint.h>
+
+
+#define F_EPI "stg %%r0, %[res] " : [res] "+m" (res) : : "r0", "r2", "r3"
+
+#define F_PRO    asm ( \
+"lg %%r2, %[a]\n"  \
+"lg %%r3, %[b]\n"  \
+"lg %%r0, %[c]\n"  \
+"ltgr %%r0, %%r0"  \
+: : [a] "m" (a),   \
+[b] "m" (b),   \
+[c] "m" (c)\
+: "r0", "r2", "r3", "r4")
+
+
+
+#define Fi3(S, ASM) uint64_t S(uint64_t a, uint64_t b, uint64_t c) \
+{ uint64_t res = 0; F_PRO ; ASM ; return res; }
+
+
+Fi3 (_selre, asm("selre    %%r0, %%r3, %%r2\n" F_EPI))
+Fi3 (_selgrz,asm("selgrz   %%r0, %%r3, %%r2\n" F_EPI))
+Fi3 (_selfhrnz,  asm("selfhrnz %%r0, %%r3, %%r2\n" F_EPI))
+
+
+int main(int argc, char *argv[])
+{
+uint64_t a = ~0, b = ~0, c = ~0;
+a =_selre(0x06660066ull, 0x06660006ull, a);
+b =   _selgrz(0xF00D0005ull, 0xF00D0055ull, b);
+c = _selfhrnz(0x00440044ull, 0x00040004ull, c);
+
+if ((0x0066ull != a) ||
+(0xF00D0005ull != b) ||
+

[PATCH v6 1/4] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-17 Thread David Miller
resolves: https://gitlab.com/qemu-project/qemu/-/issues/737
implements:
AND WITH COMPLEMENT   (NCRK, NCGRK)
NAND                  (NNRK, NNGRK)
NOT EXCLUSIVE OR      (NXRK, NXGRK)
NOR                   (NORK, NOGRK)
OR WITH COMPLEMENT    (OCRK, OCGRK)
SELECT                (SELR, SELGR)
SELECT HIGH           (SELFHR)
MOVE RIGHT TO LEFT    (MVCRL)
POPULATION COUNT      (POPCNT)
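
For reference, the five logical operations in the list above can be modeled in
portable C. This is only a sketch: the function names are hypothetical, the
choice of the second input as the complemented one is an assumption based on
the andc op named in the insn-data.def hunk, and the 32-bit helper assumes
that the short forms replace only the low word of the destination.

```c
#include <stdint.h>

/* 64-bit forms (NCGRK, NNGRK, NXGRK, NOGRK, OCGRK): whole-register result. */
uint64_t and_with_complement64(uint64_t a, uint64_t b) { return a & ~b; }
uint64_t nand64(uint64_t a, uint64_t b)                { return ~(a & b); }
uint64_t not_xor64(uint64_t a, uint64_t b)             { return ~(a ^ b); }
uint64_t nor64(uint64_t a, uint64_t b)                 { return ~(a | b); }
uint64_t or_with_complement64(uint64_t a, uint64_t b)  { return a | ~b; }

/*
 * 32-bit forms (NCRK and friends): only the low word of the destination
 * changes; the high word of the old value is kept (assumed behaviour).
 */
uint64_t insert_low32(uint64_t old_dest, uint64_t result)
{
    return (old_dest & 0xFFFFFFFF00000000ull) | (result & 0xFFFFFFFFull);
}

uint64_t and_with_complement32(uint64_t old_dest, uint64_t a, uint64_t b)
{
    return insert_low32(old_dest, and_with_complement64(a, b));
}
```

The other 32-bit forms follow the same shape: compute the 64-bit result and
pass it through insert_low32.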

Signed-off-by: David Miller 
---
 target/s390x/gen-features.c|  1 +
 target/s390x/helper.h  |  1 +
 target/s390x/tcg/insn-data.def | 30 +--
 target/s390x/tcg/mem_helper.c  | 20 +
 target/s390x/tcg/translate.c   | 53 --
 5 files changed, 100 insertions(+), 5 deletions(-)

diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index 7cb1a6ec10..a3f30f69d9 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -740,6 +740,7 @@ static uint16_t qemu_LATEST[] = {
 
 /* add all new definitions before this point */
 static uint16_t qemu_MAX[] = {
+S390_FEAT_MISC_INSTRUCTION_EXT3,
 /* generates a dependency warning, leave it out for now */
 S390_FEAT_MSA_EXT_5,
 };
diff --git a/target/s390x/helper.h b/target/s390x/helper.h
index 271b081e8c..69f69cf718 100644
--- a/target/s390x/helper.h
+++ b/target/s390x/helper.h
@@ -4,6 +4,7 @@ DEF_HELPER_FLAGS_4(nc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(oc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(xc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(mvc, TCG_CALL_NO_WG, void, env, i32, i64, i64)
+DEF_HELPER_FLAGS_4(mvcrl, TCG_CALL_NO_WG, void, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(mvcin, TCG_CALL_NO_WG, void, env, i32, i64, i64)
 DEF_HELPER_FLAGS_4(clc, TCG_CALL_NO_WG, i32, env, i32, i64, i64)
 DEF_HELPER_3(mvcl, i32, env, i32, i32)
diff --git a/target/s390x/tcg/insn-data.def b/target/s390x/tcg/insn-data.def
index 1c3e115712..3e51cd7c6d 100644
--- a/target/s390x/tcg/insn-data.def
+++ b/target/s390x/tcg/insn-data.def
@@ -105,6 +105,9 @@
 D(0xa507, NILL,RI_a,  Z,   r1_o, i2_16u, r1, 0, andi, 0, 0x1000)
 D(0x9400, NI,  SI,Z,   la1, i2_8u, new, 0, ni, nz64, MO_UB)
 D(0xeb54, NIY, SIY,   LD,  la1, i2_8u, new, 0, ni, nz64, MO_UB)
+/* AND WITH COMPLEMENT */
+C(0xb9f5, NCRK,RRF_a, MIE3, r2, r3, new, r1_32, andc, nz32)
+C(0xb9e5, NCGRK,   RRF_a, MIE3, r2, r3, r1, 0, andc, nz64)
 
 /* BRANCH AND LINK */
 C(0x0500, BALR,RR_a,  Z,   0, r2_nz, r1, 0, bal, 0)
@@ -640,6 +643,8 @@
 C(0xeb8e, MVCLU,   RSY_a, E2,  0, a2, 0, 0, mvclu, 0)
 /* MOVE NUMERICS */
 C(0xd100, MVN, SS_a,  Z,   la1, a2, 0, 0, mvn, 0)
+/* MOVE RIGHT TO LEFT */
+C(0xe50a, MVCRL,   SSE,  MIE3, la1, a2, 0, 0, mvcrl, 0)
 /* MOVE PAGE */
 C(0xb254, MVPG,RRE,   Z,   0, 0, 0, 0, mvpg, 0)
 /* MOVE STRING */
@@ -707,6 +712,16 @@
 F(0xed0f, MSEB,RXF,   Z,   e1, m2_32u, new, e1, mseb, 0, IF_BFP)
 F(0xed1f, MSDB,RXF,   Z,   f1, m2_64, new, f1, msdb, 0, IF_BFP)
 
+/* NAND */
+C(0xb974, NNRK,RRF_a, MIE3, r2, r3, new, r1_32, nand, nz32)
+C(0xb964, NNGRK,   RRF_a, MIE3, r2, r3, r1, 0, nand, nz64)
+/* NOR */
+C(0xb976, NORK,RRF_a, MIE3, r2, r3, new, r1_32, nor, nz32)
+C(0xb966, NOGRK,   RRF_a, MIE3, r2, r3, r1, 0, nor, nz64)
+/* NOT EXCLUSIVE OR */
+C(0xb977, NXRK,RRF_a, MIE3, r2, r3, new, r1_32, nxor, nz32)
+C(0xb967, NXGRK,   RRF_a, MIE3, r2, r3, r1, 0, nxor, nz64)
+
 /* OR */
 C(0x1600, OR,  RR_a,  Z,   r1, r2, new, r1_32, or, nz32)
 C(0xb9f6, ORK, RRF_a, DO,  r2, r3, new, r1_32, or, nz32)
@@ -725,6 +740,9 @@
 D(0xa50b, OILL,RI_a,  Z,   r1_o, i2_16u, r1, 0, ori, 0, 0x1000)
 D(0x9600, OI,  SI,Z,   la1, i2_8u, new, 0, oi, nz64, MO_UB)
 D(0xeb56, OIY, SIY,   LD,  la1, i2_8u, new, 0, oi, nz64, MO_UB)
+/* OR WITH COMPLEMENT */
+C(0xb975, OCRK,RRF_a, MIE3, r2, r3, new, r1_32, orc, nz32)
+C(0xb965, OCGRK,   RRF_a, MIE3, r2, r3, r1, 0, orc, nz64)
 
 /* PACK */
 /* Really format SS_b, but we pack both lengths into one argument
@@ -735,6 +753,9 @@
 /* PACK UNICODE */
 C(0xe100, PKU, SS_f,  E2,  la1, a2, 0, 0, pku, 0)
 
+/* POPULATION COUNT */
+C(0xb9e1, POPCNT,  RRF_c, PC,  0, r2_o, r1, 0, popcnt, nz64)
+
 /* PREFETCH */
 /* Implemented as nops of course.  */
 C(0xe336, PFD, RXY_b, GIE, 0, 0, 0, 0, 0, 0)
@@ -743,9 +764,6 @@
 /* Implemented as nop of course.  */
 C(0xb2e8, PPA, RRF_c, PPA, 0, 0, 0, 0, 0, 0)
 
-/* POPULATION COUNT */
-C(0xb9e1, POPCNT,  RRE,   PC,  0, r2_o, r1, 0, popcnt, nz64)
-
 /* ROTATE LEFT SINGLE LOGICAL */
 C(0xeb1d, RLL, RSY_a, Z,   r3_o, sh, new, r1_32, rll32, 0)
 C(0xeb1c, RLLG,RSY_a, Z,   r3_o, sh, r1, 0, rll64, 0)
@@ -765,6 +783,12 @@
 /* SEARCH STRING UNICODE */
 C(0xb9be, SRSTU,   RRE,   ETF3, 0, 0, 0, 0, srstu, 0)
 
+/* SELECT */
+C(0xb9f0, SELR,RRF_a, MIE3, r3, r2, new, r1_32, loc, 0)
+C(0xb9e3, SELGR,   RRF_a, 

[PATCH v6 2/4] s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z15 GA1

2022-02-17 Thread David Miller
TCG implements everything we need to run basic z15 OS+software

Signed-off-by: David Miller 
---
 hw/s390x/s390-virtio-ccw.c  | 3 +++
 target/s390x/cpu_models.c   | 6 +++---
 target/s390x/gen-features.c | 7 +--
 3 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 84e3e63c43..90480e7cf9 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -802,7 +802,10 @@ DEFINE_CCW_MACHINE(7_0, "7.0", true);
 
 static void ccw_machine_6_2_instance_options(MachineState *machine)
 {
+static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V6_2 };
+
 ccw_machine_7_0_instance_options(machine);
+s390_set_qemu_cpu_model(0x3906, 14, 2, qemu_cpu_feat);
 }
 
 static void ccw_machine_6_2_class_options(MachineClass *mc)
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 11e06cc51f..89f83e81d5 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -85,9 +85,9 @@ static S390CPUDef s390_cpu_defs[] = {
 CPUDEF_INIT(0x3932, 16, 1, 47, 0x0800U, "gen16b", "IBM 3932 GA1"),
 };
 
-#define QEMU_MAX_CPU_TYPE 0x3906
-#define QEMU_MAX_CPU_GEN 14
-#define QEMU_MAX_CPU_EC_GA 2
+#define QEMU_MAX_CPU_TYPE 0x8561
+#define QEMU_MAX_CPU_GEN 15
+#define QEMU_MAX_CPU_EC_GA 1
 static const S390FeatInit qemu_max_cpu_feat_init = { S390_FEAT_LIST_QEMU_MAX };
 static S390FeatBitmap qemu_max_cpu_feat;
 
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index a3f30f69d9..22846121c4 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -731,16 +731,18 @@ static uint16_t qemu_V6_0[] = {
 S390_FEAT_ESOP,
 };
 
-static uint16_t qemu_LATEST[] = {
+static uint16_t qemu_V6_2[] = {
 S390_FEAT_INSTRUCTION_EXEC_PROT,
 S390_FEAT_MISC_INSTRUCTION_EXT2,
 S390_FEAT_MSA_EXT_8,
 S390_FEAT_VECTOR_ENH,
 };
 
+static uint16_t qemu_LATEST[] = {
+S390_FEAT_MISC_INSTRUCTION_EXT3,
+};
 /* add all new definitions before this point */
 static uint16_t qemu_MAX[] = {
-S390_FEAT_MISC_INSTRUCTION_EXT3,
 /* generates a dependency warning, leave it out for now */
 S390_FEAT_MSA_EXT_5,
 };
@@ -863,6 +865,7 @@ static FeatGroupDefSpec QemuFeatDef[] = {
 QEMU_FEAT_INITIALIZER(V4_0),
 QEMU_FEAT_INITIALIZER(V4_1),
 QEMU_FEAT_INITIALIZER(V6_0),
+QEMU_FEAT_INITIALIZER(V6_2),
 QEMU_FEAT_INITIALIZER(LATEST),
 QEMU_FEAT_INITIALIZER(MAX),
 };
-- 
2.32.0




[PATCH v6 0/4] s390x: Add partial z15 support and tests

2022-02-17 Thread David Miller
Add partial support for s390x z15 ga1 and specific tests for mie3 

v5 -> v6:
* Swap operands for sel* instructions 
* Use .insn in tests for z15 arch instructions

v4 -> v5:
* Readd missing tests/tcg/s390x/mie3-*.c to patch

v3 -> v4:
* Change popcnt encoding RRE -> RRF_c
* Remove redundant code op_sel -> op_loc
* Cleanup for checkpatch.pl
* Readded mie3-* to Makefile.target

v2 -> v3:
* Moved tests to separate patch.
* Combined patches into series.


David Miller (4):
  s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3
for the s390x
  s390x/cpumodel: Bump up QEMU model to a stripped-down IBM z15 GA1
  tests/tcg/s390x: Tests for Miscellaneous-Instruction-Extensions
Facility 3
  tests/tcg/s390x: changed to using .insn for tests requiring z15

 hw/s390x/s390-virtio-ccw.c  |  3 ++
 target/s390x/cpu_models.c   |  6 ++--
 target/s390x/gen-features.c |  6 +++-
 target/s390x/helper.h   |  1 +
 target/s390x/tcg/insn-data.def  | 30 --
 target/s390x/tcg/mem_helper.c   | 20 
 target/s390x/tcg/translate.c| 53 +--
 tests/tcg/s390x/Makefile.target |  5 ++-
 tests/tcg/s390x/mie3-compl.c| 56 +
 tests/tcg/s390x/mie3-mvcrl.c| 31 ++
 tests/tcg/s390x/mie3-sel.c  | 42 +
 11 files changed, 243 insertions(+), 10 deletions(-)
 create mode 100644 tests/tcg/s390x/mie3-compl.c
 create mode 100644 tests/tcg/s390x/mie3-mvcrl.c
 create mode 100644 tests/tcg/s390x/mie3-sel.c

-- 
2.32.0




[PATCH] virtio/virtio-balloon: Prefer Object* over void* parameter

2022-02-17 Thread Bernhard Beschow
*opaque is an alias to *obj. Using the latter makes the code consistent
with other devices, e.g. accel/kvm/kvm-all and accel/tcg/tcg-all. It also
makes the cast more typesafe.

Signed-off-by: Bernhard Beschow 
---
 hw/virtio/virtio-balloon.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 9a4f491b54..38732d4118 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -241,7 +241,7 @@ static void balloon_stats_get_all(Object *obj, Visitor *v, 
const char *name,
   void *opaque, Error **errp)
 {
 Error *err = NULL;
-VirtIOBalloon *s = opaque;
+VirtIOBalloon *s = VIRTIO_BALLOON(obj);
 int i;
 
 if (!visit_start_struct(v, name, NULL, 0, &err)) {
@@ -276,7 +276,7 @@ static void balloon_stats_get_poll_interval(Object *obj, 
Visitor *v,
 const char *name, void *opaque,
 Error **errp)
 {
-VirtIOBalloon *s = opaque;
+VirtIOBalloon *s = VIRTIO_BALLOON(obj);
 visit_type_int(v, name, &s->stats_poll_interval, errp);
 }
 
@@ -284,7 +284,7 @@ static void balloon_stats_set_poll_interval(Object *obj, 
Visitor *v,
 const char *name, void *opaque,
 Error **errp)
 {
-VirtIOBalloon *s = opaque;
+VirtIOBalloon *s = VIRTIO_BALLOON(obj);
 int64_t value;
 
 if (!visit_type_int(v, name, &value, errp)) {
@@ -1014,12 +1014,12 @@ static void virtio_balloon_instance_init(Object *obj)
 s->free_page_hint_notify.notify = virtio_balloon_free_page_hint_notify;
 
 object_property_add(obj, "guest-stats", "guest statistics",
-balloon_stats_get_all, NULL, NULL, s);
+balloon_stats_get_all, NULL, NULL, NULL);
 
 object_property_add(obj, "guest-stats-polling-interval", "int",
 balloon_stats_get_poll_interval,
 balloon_stats_set_poll_interval,
-NULL, s);
+NULL, NULL);
 }
 
 static const VMStateDescription vmstate_virtio_balloon = {
-- 
2.35.1
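
The point about the cast being "more typesafe" can be illustrated outside of QEMU. The following is only a minimal sketch: `Object`, `Balloon`, `balloon_cast` and `get_poll_interval` are hypothetical stand-ins, not the QOM API, and the real `VIRTIO_BALLOON()` macro differs in detail. It shows why deriving the state from `obj` through a checked cast is safer than trusting an untyped `opaque` pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for an object base type and one device type. */
typedef struct Object {
    const char *type_name;
} Object;

typedef struct Balloon {
    Object parent;              /* must stay the first member */
    int stats_poll_interval;
} Balloon;

/*
 * Checked downcast. Returns NULL on a type mismatch; this sketch does not
 * claim to mirror how the real cast macros report such a failure.
 */
static Balloon *balloon_cast(Object *obj)
{
    if (obj == NULL || strcmp(obj->type_name, "balloon") != 0) {
        return NULL;
    }
    return (Balloon *)obj;      /* valid because parent is the first member */
}

/* Property getter that ignores opaque and derives its state from obj. */
static int get_poll_interval(Object *obj, void *opaque)
{
    Balloon *s = balloon_cast(obj);

    (void)opaque;               /* no longer needed, callers may pass NULL */
    return s ? s->stats_poll_interval : -1;
}
```

With a `void *opaque`, passing the wrong pointer compiles and silently misbehaves; with the checked cast, a mismatch is detected at the point of use.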




[PATCH 1/2] hw/vfio/pci-quirks: Resolve redundant property getters

2022-02-17 Thread Bernhard Beschow
The QOM API already provides getters for uint64 and uint32 values, so reuse
them.

Signed-off-by: Bernhard Beschow 
---
 hw/vfio/pci-quirks.c | 34 +-
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index 0cf69a8c6d..f0147a050a 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1565,22 +1565,6 @@ static int vfio_add_nv_gpudirect_cap(VFIOPCIDevice 
*vdev, Error **errp)
 return 0;
 }
 
-static void vfio_pci_nvlink2_get_tgt(Object *obj, Visitor *v,
- const char *name,
- void *opaque, Error **errp)
-{
-uint64_t tgt = (uintptr_t) opaque;
-visit_type_uint64(v, name, &tgt, errp);
-}
-
-static void vfio_pci_nvlink2_get_link_speed(Object *obj, Visitor *v,
- const char *name,
- void *opaque, Error **errp)
-{
-uint32_t link_speed = (uint32_t)(uintptr_t) opaque;
-visit_type_uint32(v, name, &link_speed, errp);
-}
-
 int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
 {
 int ret;
@@ -1618,9 +1602,9 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, 
Error **errp)
nv2reg->size, p);
 QLIST_INSERT_HEAD(&vdev->bars[0].quirks, quirk, next);
 
-object_property_add(OBJECT(vdev), "nvlink2-tgt", "uint64",
-vfio_pci_nvlink2_get_tgt, NULL, NULL,
-(void *) (uintptr_t) cap->tgt);
+object_property_add_uint64_ptr(OBJECT(vdev), "nvlink2-tgt",
+   (uint64_t *) &cap->tgt,
+   OBJ_PROP_FLAG_READ);
 trace_vfio_pci_nvidia_gpu_setup_quirk(vdev->vbasedev.name, cap->tgt,
   nv2reg->size);
 free_exit:
@@ -1679,15 +1663,15 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error 
**errp)
 QLIST_INSERT_HEAD(&vdev->bars[0].quirks, quirk, next);
 }
 
-object_property_add(OBJECT(vdev), "nvlink2-tgt", "uint64",
-vfio_pci_nvlink2_get_tgt, NULL, NULL,
-(void *) (uintptr_t) captgt->tgt);
+object_property_add_uint64_ptr(OBJECT(vdev), "nvlink2-tgt",
+   (uint64_t *) &captgt->tgt,
+   OBJ_PROP_FLAG_READ);
 trace_vfio_pci_nvlink2_setup_quirk_ssatgt(vdev->vbasedev.name, captgt->tgt,
   atsdreg->size);
 
-object_property_add(OBJECT(vdev), "nvlink2-link-speed", "uint32",
-vfio_pci_nvlink2_get_link_speed, NULL, NULL,
-(void *) (uintptr_t) capspeed->link_speed);
+object_property_add_uint32_ptr(OBJECT(vdev), "nvlink2-link-speed",
+   &capspeed->link_speed,
+   OBJ_PROP_FLAG_READ);
 trace_vfio_pci_nvlink2_setup_quirk_lnkspd(vdev->vbasedev.name,
   capspeed->link_speed);
 free_exit:
-- 
2.35.1
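
As a rough illustration of what a pointer-backed property with access flags looks like, here is a minimal sketch. `Uint32Prop`, `PROP_READ`, `PROP_WRITE`, `prop_get` and `prop_set` are hypothetical names invented for this example; they are not the QOM implementation behind `object_property_add_uint32_ptr()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical access flags, loosely modelled on read/readwrite flags. */
enum {
    PROP_READ  = 1 << 0,
    PROP_WRITE = 1 << 1,
};

/* A property that refers to storage owned by someone else. */
typedef struct Uint32Prop {
    const char *name;
    uint32_t *ptr;      /* must outlive the property */
    unsigned flags;
} Uint32Prop;

/* Read the current value through the pointer; -1 if reads are not allowed. */
static int prop_get(const Uint32Prop *p, uint32_t *out)
{
    if (!(p->flags & PROP_READ)) {
        return -1;
    }
    *out = *p->ptr;
    return 0;
}

/* Write through the pointer; -1 if the property is read-only. */
static int prop_set(const Uint32Prop *p, uint32_t value)
{
    if (!(p->flags & PROP_WRITE)) {
        return -1;
    }
    *p->ptr = value;
    return 0;
}
```

One design difference is worth keeping in mind when reading the diff: a getter that receives the value packed into `opaque` returns a snapshot taken at registration time, whereas a pointer-backed property reads whatever the pointed-to storage holds at access time, so that storage has to stay valid for as long as the property exists.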




[PATCH 0/2] Resolve some redundant property accessors

2022-02-17 Thread Bernhard Beschow
The QOM API already provides appropriate accessors, so reuse them.

Testing done:

  :$ make check
  Ok: 569
  Expected Fail:  0
  Fail:   0
  Unexpected Pass:0
  Skipped:178
  Timeout:0

Bernhard Beschow (2):
  hw/vfio/pci-quirks: Resolve redundant property getters
  hw/riscv/sifive_u: Resolve redundant property accessors

 hw/riscv/sifive_u.c  | 24 
 hw/vfio/pci-quirks.c | 34 +-
 2 files changed, 13 insertions(+), 45 deletions(-)

-- 
2.35.1




[PATCH 2/2] hw/riscv/sifive_u: Resolve redundant property accessors

2022-02-17 Thread Bernhard Beschow
The QOM API already provides accessors for uint32 values, so reuse them.

Signed-off-by: Bernhard Beschow 
---
 hw/riscv/sifive_u.c | 24 
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index 7fbc7dea42..747eb4ee89 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -713,36 +713,20 @@ static void sifive_u_machine_set_start_in_flash(Object 
*obj, bool value, Error *
 s->start_in_flash = value;
 }
 
-static void sifive_u_machine_get_uint32_prop(Object *obj, Visitor *v,
- const char *name, void *opaque,
- Error **errp)
-{
-visit_type_uint32(v, name, (uint32_t *)opaque, errp);
-}
-
-static void sifive_u_machine_set_uint32_prop(Object *obj, Visitor *v,
- const char *name, void *opaque,
- Error **errp)
-{
-visit_type_uint32(v, name, (uint32_t *)opaque, errp);
-}
-
 static void sifive_u_machine_instance_init(Object *obj)
 {
 SiFiveUState *s = RISCV_U_MACHINE(obj);
 
 s->start_in_flash = false;
 s->msel = 0;
-object_property_add(obj, "msel", "uint32",
-sifive_u_machine_get_uint32_prop,
-sifive_u_machine_set_uint32_prop, NULL, &s->msel);
+object_property_add_uint32_ptr(obj, "msel", &s->msel,
+   OBJ_PROP_FLAG_READWRITE);
 object_property_set_description(obj, "msel",
 "Mode Select (MSEL[3:0]) pin state");
 
 s->serial = OTP_SERIAL;
-object_property_add(obj, "serial", "uint32",
-sifive_u_machine_get_uint32_prop,
-sifive_u_machine_set_uint32_prop, NULL, &s->serial);
+object_property_add_uint32_ptr(obj, "serial", &s->serial,
+   OBJ_PROP_FLAG_READWRITE);
 object_property_set_description(obj, "serial", "Board serial number");
 }
 
-- 
2.35.1




Re: [PATCH v12 2/5] target/ppc: make power8-pmu.c CONFIG_TCG only

2022-02-17 Thread Richard Henderson

On 2/16/22 21:10, Daniel Henrique Barboza wrote:

  static void init_tcg_pmu_power8(CPUPPCState *env)
  {
-#if defined(TARGET_PPC64) && !defined(CONFIG_USER_ONLY)
+#if defined(CONFIG_TCG)
  /* Init PMU overflow timers */
  if (!kvm_enabled()) {
  cpu_ppc_pmu_init(env);
@@ -7872,10 +7872,9 @@ static void ppc_cpu_reset(DeviceState *dev)
  if (env->mmu_model != POWERPC_MMU_REAL) {
  ppc_tlb_invalidate_all(env);
  }
+pmu_update_summaries(env);
  #endif /* CONFIG_TCG */
  #endif
-
-pmu_update_summaries(env);


It looks like you could remove all of the ifdefs if you simply use tcg_enabled() rather 
than !kvm_enabled().  If !defined(CONFIG_TCG), tcg_enabled() will be constant false, and 
the block will be optimized away.



r~
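
The idiom suggested above can be sketched in isolation. This is an assumption-laden illustration, not QEMU code: `HAVE_ACCEL_X`, `accel_x_enabled()`, `pmu_init` and `cpu_init` are hypothetical names standing in for `CONFIG_TCG`, `tcg_enabled()` and the PMU setup.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical build switch. When the feature is compiled out, the
 * predicate is a compile-time constant 0, so guarded blocks become dead
 * code without any #ifdef at the call site.
 */
#ifdef HAVE_ACCEL_X
static bool accel_x_allowed = true;
#define accel_x_enabled() (accel_x_allowed)
#else
#define accel_x_enabled() 0
#endif

static int init_calls;

static void pmu_init(void)
{
    init_calls++;
}

static void cpu_init(void)
{
    /* No #ifdef here: the compiler can drop the block when the test is 0. */
    if (accel_x_enabled()) {
        pmu_init();
    }
}
```

The guarded code must still compile in every configuration, so declarations of the called functions need to be visible even where the feature is disabled; only the call itself is optimized away.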



Re: [PATCH v4 1/7] hw/mips/gt64xxx_pci: Fix PCI IRQ levels to be preserved during migration

2022-02-17 Thread Bernhard Beschow
On 17 February 2022 10:19:18 UTC, Bernhard Beschow wrote:
>Based on commit e735b55a8c11dd455e31ccd4420e6c9485191d0c:
>
>  piix_pci: eliminate PIIX3State::pci_irq_levels
>
>  PIIX3State::pci_irq_levels are redundant which is already tracked by
>  PCIBus layer. So eliminate them.
>
>The IRQ levels in the PCIBus layer are already preserved during
>migration. By reusing them rather than keeping a redundant implementation,
>the bug is avoided in the first place.
>
>Suggested-by: Peter Maydell 
>Signed-off-by: Bernhard Beschow 

Copy from v3:

Reviewed-by: Peter Maydell 

>---
> hw/mips/gt64xxx_pci.c | 7 ++-
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
>diff --git a/hw/mips/gt64xxx_pci.c b/hw/mips/gt64xxx_pci.c
>index c7480bd019..4cbd0911f5 100644
>--- a/hw/mips/gt64xxx_pci.c
>+++ b/hw/mips/gt64xxx_pci.c
>@@ -1006,14 +1006,11 @@ static int gt64120_pci_map_irq(PCIDevice *pci_dev, int 
>irq_num)
> }
> }
> 
>-static int pci_irq_levels[4];
>-
> static void gt64120_pci_set_irq(void *opaque, int irq_num, int level)
> {
> int i, pic_irq, pic_level;
> qemu_irq *pic = opaque;
>-
>-pci_irq_levels[irq_num] = level;
>+PCIBus *bus = pci_get_bus(piix4_dev);
> 
> /* now we change the pic irq level according to the piix irq mappings */
> /* XXX: optimize */
>@@ -1023,7 +1020,7 @@ static void gt64120_pci_set_irq(void *opaque, int 
>irq_num, int level)
> pic_level = 0;
> for (i = 0; i < 4; i++) {
> if (pic_irq == piix4_dev->config[PIIX_PIRQCA + i]) {
>-pic_level |= pci_irq_levels[i];
>+pic_level |= pci_bus_get_irq_level(bus, i);
> }
> }
> qemu_set_irq(pic[pic_irq], pic_level);
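
The idea behind the patch, keeping one authoritative copy of the line levels and deriving the PIC level from it, can be sketched as follows. `Bus`, `bus_set_irq`, `bus_get_irq_level` and `pic_level_for` are hypothetical names for this illustration only; the routing table stands in for the PIIX PIRQ route registers.

```c
#include <assert.h>

#define NUM_PIRQS 4

/* Hypothetical bus that is the single owner of the PCI line levels. */
typedef struct Bus {
    int irq_level[NUM_PIRQS];
} Bus;

static void bus_set_irq(Bus *bus, int irq_num, int level)
{
    bus->irq_level[irq_num] = level;
}

static int bus_get_irq_level(const Bus *bus, int irq_num)
{
    return bus->irq_level[irq_num];
}

/* OR together the levels of every PCI line routed to the given PIC input. */
static int pic_level_for(const Bus *bus, const int route[NUM_PIRQS],
                         int pic_irq)
{
    int level = 0;

    for (int i = 0; i < NUM_PIRQS; i++) {
        if (route[i] == pic_irq) {
            level |= bus_get_irq_level(bus, i);
        }
    }
    return level;
}
```

Because nothing outside the bus caches the levels, saving and restoring the bus state is enough to restore the derived PIC level; a separate static shadow array would need its own migration handling.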




Re: [PATCH 3/4] hw/openrisc/openrisc_sim; Add support for loading a decice tree

2022-02-17 Thread Stafford Horne
On Thu, Feb 17, 2022 at 06:18:58PM +, Peter Maydell wrote:
> On Thu, 10 Feb 2022 at 06:46, Stafford Horne  wrote:
> >
> > Using the device tree means that qemu can now directly tell
> > the kernel what hardware is configured rather than us having
> > to maintain and update a separate device tree file.
> >
> > This patch adds device tree support for the OpenRISC simulator.
> > A device tree is built up based on the state of the configured
> > openrisc simulator.
> 
> This sounds like it's support for creating a device
> tree? Support for loading a device tree would be "the
> user passes us a filename of a dtb file". (This is mostly a
> quibble about commit message wording.)

Ah, yes I will fix this to say, "adds automatic device tree generation support"


> > -static void openrisc_load_kernel(ram_addr_t ram_size,
> > +static hwaddr openrisc_load_kernel(ram_addr_t ram_size,
> >   const char *kernel_filename)
> 
> Indentation looks off now ?

Fixed now.

> >  {
> >  long kernel_size;
> >  uint64_t elf_entry;
> > +uint64_t high_addr;
> >  hwaddr entry;
> >
> >  if (kernel_filename && !qtest_enabled()) {
> >  kernel_size = load_elf(kernel_filename, NULL, NULL, NULL,
> > -   &elf_entry, NULL, NULL, NULL, 1, 
> > EM_OPENRISC,
> > -   1, 0);
> > +   &elf_entry, NULL, &high_addr, NULL, 1,
> > +   EM_OPENRISC, 1, 0);
> >  entry = elf_entry;
> >  if (kernel_size < 0) {
> >  kernel_size = load_uimage(kernel_filename,
> >   &entry, NULL, NULL, NULL, NULL);
> > +high_addr = entry + kernel_size;
> >  }
> >  if (kernel_size < 0) {
> >  kernel_size = load_image_targphys(kernel_filename,
> >KERNEL_LOAD_ADDR,
> >ram_size - KERNEL_LOAD_ADDR);
> > +high_addr = KERNEL_LOAD_ADDR + kernel_size;
> >  }
> >
> >  if (entry <= 0) {
> > @@ -168,7 +181,139 @@ static void openrisc_load_kernel(ram_addr_t ram_size,
> >  exit(1);
> >  }
> >  boot_info.bootstrap_pc = entry;
> > +
> > +return high_addr;
> > +}
> > +return 0;
> > +}
> > +
> > +static uint32_t openrisc_load_fdt(Or1ksimState *s, hwaddr load_start,
> > +uint64_t mem_size)
> 
> Indentation again.

Fixed.

> > +{
> > +uint32_t fdt_addr;
> > +int fdtsize = fdt_totalsize(s->fdt);
> > +
> > +if (fdtsize <= 0) {
> > +error_report("invalid device-tree");
> > +exit(1);
> > +}
> > +
> > +/* We should put fdt right after the kernel */
> 
> You change this comment in patch 4 -- I think you might as well
> just use that text in this patch to start with.

OK, I had that at first but I did this to be more technically correct.  I will
simplify as you suggest.

> > +fdt_addr = ROUND_UP(load_start, 4);
> > +
> > +fdt_pack(s->fdt);
> 
> fdt_pack() returns an error code -- you should check it.

OK.
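
For reference, the alignment step in the quoted `fdt_addr = ROUND_UP(load_start, 4)` line can be sketched as below. `round_up_pow2` is a hypothetical helper written for this note, not QEMU's `ROUND_UP` macro, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round n up to the next multiple of align.
 * Assumption: align is a power of two, so the mask trick is valid.
 */
static uint32_t round_up_pow2(uint32_t n, uint32_t align)
{
    return (n + align - 1) & ~(align - 1);
}
```

An address that is already aligned is returned unchanged, so placing the blob at the rounded-up end of the kernel never overlaps the kernel image.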

> > +/* copy in the device tree */
> > +qemu_fdt_dumpdtb(s->fdt, fdtsize);
> > +
> > +rom_add_blob_fixed_as("fdt", s->fdt, fdtsize, fdt_addr,
> > +  &address_space_memory);
> > +
> > +return fdt_addr;
> > +}
> > +
> > +static void openrisc_create_fdt(Or1ksimState *s,
> > +const struct MemmapEntry *memmap, int num_cpus, uint64_t mem_size,
> > +const char *cmdline)
> 
> Indentation.

Right, fixed.

> > +{
> > +void *fdt;
> > +int cpu;
> > +char *nodename;
> > +int pic_ph;
> > +
> > +fdt = s->fdt = create_device_tree(&s->fdt_size);
> > +if (!fdt) {
> > +error_report("create_device_tree() failed");
> > +exit(1);
> > +}
> > +
> > +qemu_fdt_setprop_string(fdt, "/", "compatible", "opencores,or1ksim");
> > +qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x1);
> > +qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x1);
> > +
> > +nodename = g_strdup_printf("/memory@%lx",
> > +   (long)memmap[OR1KSIM_DRAM].base);
> 
> Use the appropriate format string macro for the type, rather than
> casting to long (here and below).

Right good point.
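
A minimal sketch of what the review comment asks for, using the standard `<inttypes.h>` macro for a 64-bit value. `format_memory_node` is a hypothetical helper; QEMU code would normally use the format macro that matches its own address type rather than `PRIx64`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a node name such as "/memory@<base>" without casting to long.
 * On hosts where long is 32 bits, a (long) cast would truncate the base.
 */
static int format_memory_node(char *buf, size_t len, uint64_t base)
{
    return snprintf(buf, len, "/memory@%" PRIx64, base);
}
```

The macro expands to the correct length modifier for the host, so the same source prints the full value on both 32-bit and 64-bit builds.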

> > +qemu_fdt_add_subnode(fdt, nodename);
> > +qemu_fdt_setprop_cells(fdt, nodename, "reg",
> > +   memmap[OR1KSIM_DRAM].base, mem_size);
> > +qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> > +g_free(nodename);
> > +
> > +qemu_fdt_add_subnode(fdt, "/cpus");
> > +qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
> > +qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> > +
> > +for (cpu = 0; cpu < num_cpus; cpu++) {
> > +nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> > +qemu_fdt_add_subnode(fdt, nodename);
> > +qemu_fdt_setprop_string(fdt, nodename, "compatible",
> 

Re: [PATCH v2 1/2] Mark remaining global TypeInfo instances as const

2022-02-17 Thread Bernhard Beschow
On 17 January 2022 14:58:04 UTC, Bernhard Beschow wrote:
>More than 1k of TypeInfo instances are already marked as const. Mark the
>remaining ones, too.
>
>This commit was created with:
>  git grep -z -l 'static TypeInfo' -- '*.c' | \
>  xargs -0 sed -i 's/static TypeInfo/static const TypeInfo/'
>
>Signed-off-by: Bernhard Beschow 
>---
> hw/core/generic-loader.c   | 2 +-
> hw/core/guest-loader.c | 2 +-
> hw/display/bcm2835_fb.c| 2 +-
> hw/display/i2c-ddc.c   | 2 +-
> hw/display/macfb.c | 4 ++--
> hw/display/virtio-vga.c| 2 +-
> hw/dma/bcm2835_dma.c   | 2 +-
> hw/i386/pc_piix.c  | 2 +-
> hw/i386/sgx-epc.c  | 2 +-
> hw/intc/bcm2835_ic.c   | 2 +-
> hw/intc/bcm2836_control.c  | 2 +-
> hw/ipmi/ipmi.c | 4 ++--
> hw/mem/nvdimm.c| 2 +-
> hw/mem/pc-dimm.c   | 2 +-
> hw/misc/bcm2835_mbox.c | 2 +-
> hw/misc/bcm2835_powermgt.c | 2 +-
> hw/misc/bcm2835_property.c | 2 +-
> hw/misc/bcm2835_rng.c  | 2 +-
> hw/misc/pvpanic-isa.c  | 2 +-
> hw/misc/pvpanic-pci.c  | 2 +-
> hw/net/fsl_etsec/etsec.c   | 2 +-
> hw/ppc/prep_systemio.c | 2 +-
> hw/ppc/spapr_iommu.c   | 2 +-
> hw/s390x/s390-pci-bus.c| 2 +-
> hw/s390x/sclp.c| 2 +-
> hw/s390x/tod-kvm.c | 2 +-
> hw/s390x/tod-tcg.c | 2 +-
> hw/s390x/tod.c | 2 +-
> hw/scsi/lsi53c895a.c   | 2 +-
> hw/sd/allwinner-sdhost.c   | 2 +-
> hw/sd/aspeed_sdhci.c   | 2 +-
> hw/sd/bcm2835_sdhost.c | 2 +-
> hw/sd/cadence_sdhci.c  | 2 +-
> hw/sd/npcm7xx_sdhci.c  | 2 +-
> hw/usb/dev-mtp.c   | 2 +-
> hw/usb/host-libusb.c   | 2 +-
> hw/vfio/igd.c  | 2 +-
> hw/virtio/virtio-pmem.c| 2 +-
> qom/object.c   | 4 ++--
> 39 files changed, 42 insertions(+), 42 deletions(-)
>
>diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c
>index 9a24ffb880..eaafc416f4 100644
>--- a/hw/core/generic-loader.c
>+++ b/hw/core/generic-loader.c
>@@ -207,7 +207,7 @@ static void generic_loader_class_init(ObjectClass *klass, 
>void *data)
> set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> }
> 
>-static TypeInfo generic_loader_info = {
>+static const TypeInfo generic_loader_info = {
> .name = TYPE_GENERIC_LOADER,
> .parent = TYPE_DEVICE,
> .instance_size = sizeof(GenericLoaderState),
>diff --git a/hw/core/guest-loader.c b/hw/core/guest-loader.c
>index d3f9d1a06e..391c875a29 100644
>--- a/hw/core/guest-loader.c
>+++ b/hw/core/guest-loader.c
>@@ -129,7 +129,7 @@ static void guest_loader_class_init(ObjectClass *klass, 
>void *data)
> set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> }
> 
>-static TypeInfo guest_loader_info = {
>+static const TypeInfo guest_loader_info = {
> .name = TYPE_GUEST_LOADER,
> .parent = TYPE_DEVICE,
> .instance_size = sizeof(GuestLoaderState),
>diff --git a/hw/display/bcm2835_fb.c b/hw/display/bcm2835_fb.c
>index 2be77bdd3a..088fc3d51c 100644
>--- a/hw/display/bcm2835_fb.c
>+++ b/hw/display/bcm2835_fb.c
>@@ -454,7 +454,7 @@ static void bcm2835_fb_class_init(ObjectClass *klass, void 
>*data)
> dc->vmsd = &vmstate_bcm2835_fb;
> }
> 
>-static TypeInfo bcm2835_fb_info = {
>+static const TypeInfo bcm2835_fb_info = {
> .name  = TYPE_BCM2835_FB,
> .parent= TYPE_SYS_BUS_DEVICE,
> .instance_size = sizeof(BCM2835FBState),
>diff --git a/hw/display/i2c-ddc.c b/hw/display/i2c-ddc.c
>index 13eb529fc1..146489518c 100644
>--- a/hw/display/i2c-ddc.c
>+++ b/hw/display/i2c-ddc.c
>@@ -113,7 +113,7 @@ static void i2c_ddc_class_init(ObjectClass *oc, void *data)
> isc->send = i2c_ddc_tx;
> }
> 
>-static TypeInfo i2c_ddc_info = {
>+static const TypeInfo i2c_ddc_info = {
> .name = TYPE_I2CDDC,
> .parent = TYPE_I2C_SLAVE,
> .instance_size = sizeof(I2CDDCState),
>diff --git a/hw/display/macfb.c b/hw/display/macfb.c
>index 4bd7c3ad6a..69c2ea2b6e 100644
>--- a/hw/display/macfb.c
>+++ b/hw/display/macfb.c
>@@ -783,14 +783,14 @@ static void macfb_nubus_class_init(ObjectClass *klass, 
>void *data)
> device_class_set_props(dc, macfb_nubus_properties);
> }
> 
>-static TypeInfo macfb_sysbus_info = {
>+static const TypeInfo macfb_sysbus_info = {
> .name  = TYPE_MACFB,
> .parent= TYPE_SYS_BUS_DEVICE,
> .instance_size = sizeof(MacfbSysBusState),
> .class_init= macfb_sysbus_class_init,
> };
> 
>-static TypeInfo macfb_nubus_info = {
>+static const TypeInfo macfb_nubus_info = {
> .name  = TYPE_NUBUS_MACFB,
> .parent= TYPE_NUBUS_DEVICE,
> .instance_size = sizeof(MacfbNubusState),
>diff --git a/hw/display/virtio-vga.c b/hw/display/virtio-vga.c
>index b23a75a04b..5a2f7a4540 100644
>--- a/hw/display/virtio-vga.c
>+++ b/hw/display/virtio-vga.c
>@@ -220,7 +220,7 @@ static void virtio_vga_base_class_init(ObjectClass *klass, 
>void *data)
>virtio_vga_set_big_endian_fb);
> }
> 
>-static TypeInfo virtio_vga_base_info = {
>+static const TypeInfo virtio_vga_base_info = {
> 

[PATCH] configure: Support empty prefixes

2022-02-17 Thread Joshua Seaton
At least as of v5 (before the meson build), empty `--prefix` values were
supported; this seems to have fallen out along the way. This change
reintroduces support.

Tested locally with empty and non-empty values of `--prefix`.

Signed-off-by: Joshua Seaton 
---
 configure | 33 -
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/configure b/configure
index 3a29eff5cc..87a32e52e4 100755
--- a/configure
+++ b/configure
@@ -1229,20 +1229,30 @@ case $git_submodules_action in
 ;;
 esac

-libdir="${libdir:-$prefix/lib}"
-libexecdir="${libexecdir:-$prefix/libexec}"
-includedir="${includedir:-$prefix/include}"
+# Emits a relative path in the case of an empty prefix.
+prefix_subdir() {
+dir="$1"
+if test -z "$prefix" ; then
+echo "$dir"
+else
+echo "$prefix/$dir"
+fi
+}
+
+libdir="${libdir:-$(prefix_subdir lib)}"
+libexecdir="${libexecdir:-$(prefix_subdir libexec)}"
+includedir="${includedir:-$(prefix_subdir include)}"

 if test "$mingw32" = "yes" ; then
 bindir="${bindir:-$prefix}"
 else
-bindir="${bindir:-$prefix/bin}"
+bindir="${bindir:-$(prefix_subdir bin)}"
 fi
-mandir="${mandir:-$prefix/share/man}"
-datadir="${datadir:-$prefix/share}"
-docdir="${docdir:-$prefix/share/doc}"
-sysconfdir="${sysconfdir:-$prefix/etc}"
-local_statedir="${local_statedir:-$prefix/var}"
+mandir="${mandir:-$(prefix_subdir share/man)}"
+datadir="${datadir:-$(prefix_subdir share)}"
+docdir="${docdir:-$(prefix_subdir share/doc)}"
+sysconfdir="${sysconfdir:-$(prefix_subdir etc)}"
+local_statedir="${local_statedir:-$(prefix_subdir var)}"
 firmwarepath="${firmwarepath:-$datadir/qemu-firmware}"
 localedir="${localedir:-$datadir/locale}"

@@ -3763,6 +3773,11 @@ if test "$skip_meson" = no; then
   mv $cross config-meson.cross

   rm -rf meson-private meson-info meson-logs
+
+  # Workaround for a meson bug preventing empty prefixes:
+  # see https://github.com/mesonbuild/meson/issues/6946.
+  prefix="${prefix:-/}"
+
   run_meson() {
 NINJA=$ninja $meson setup \
 --prefix "$prefix" \
--
2.35.1.265.g69c8d7142f-goog



I have a Question? Can you build a version with German and English and Spain and France and Russia and Chinese Language Pack and Handbook Installation Guide in PDF and Paper book with all Function in

2022-02-17 Thread Anon Anonymous

I have a Question?


Can you build a version with German, English, Spanish, French, Russian,
and Chinese language packs, plus a handbook and installation guide (as a
PDF and as a printed book) covering all functions on Linux, Unix, and
Windows?



A book covering how to configure Raspberry Pi emulation, Android
emulation, iOS, Mac, and SPARC Unix emulation, and IBM PC
DOS/Windows/Linux x86/amd64 emulation, including Sound Blaster Pro, 16,
AWE32, AWE64, Live, AdLib, and PC speaker emulation.



VHD, IMG, VHDX mount and unmount, as an installation guide

Installation Guide all OS

IMA, VFD, IMG Floppy mount unmount

ISO Image CD mount unmount

Copy tools from hard disk to VHD, VHDX, IMG, with format conversion

Copy Tool from real Floppy to IMG, IMA VFD

Copy Tool from real CD to ISO and convert Tool from all CD-Image Files 
to ISO (NRG,BIN and other)


Creation Tool Harddisk Image Partition Tool and Format and boot Install 
CD FLoppy


A Backup & Restore Tool from Harddisk Image and Partition to Backup File

A Guide for this in German, Please.


Best King Regards



Daniel Frank Nommensen






A Virtual Bios in Qemu with GUI by Boot QEMU with SHORTCUT

2022-02-17 Thread Anon Anonymous
Hello, can you build a virtual BIOS with hardware simulation/emulation
and the ability to change the hardware?


CPU Option = CPU x86/x64/SPARC/ARM/ARM64/ANDROID/IOS/MAC

CPU Standard Config Name = Original CPU in HArdware or Simulation 
Emulation All INTEL CELERON, CENTRINO, ATOM, PENTIUM (MMX) 1234, 
i3,i5,i7,i9 , ALL MAC Processors, ALL AMD Processors, ALL 
HANDY/SMARTPHONE Processor, ALL ONE PLATINE CPU and Hardware Raspberry 
Pi ZERO, 1, 2, 3, 4, CM4, 400, RockPI and all other or Optional Config 
and Retro Computers AMIGA, COMMODORE, SPEKTRUM, SINCLAIR, PC IBM DOS ALL 
8086 up to 486 CPU


CPU Core MAX USE MANUAL OPTIONAL = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
12, 13, 14, 15, 16, up to 256 Cores.


CPU Core Speed min. 1 MHz up to max. CPU Speed from Hardware

CPU Architecture Simulation 8-bit, 16-bit, 32-bit, 64-bit (Future: 
Quantum Processor)


GPU graphics card simulation/emulation = original GPU from the hardware,
or emulation of any graphics card from retro Matrox, S3, NVIDIA, ATI
(DOS/Windows) and any other hardware


GPU Grafic Card Space from 128 Kilobyte up to max Speed GPU.

GPU Grafic Card Core = 1 up to max from graficcard Autodetection

1 - 4 Graficcards

Architecture Graficcard from 8-Bit up to 64-Bit

Graficcard Simulation = Hercules, CGA, EGA, VGA, SVGA and greater 4:3, 
16:9, 16:10


Graficcard Min Size max Size = ALL DOS 320x240, 640x480, 1024x768, 
1280x1024, 1600x1200, HDREADY, FULLHD, 2K, 4K, 8K


RAM from 0,5 MB up to Max hardware.

mount 26 Drives Floppy, Harddisk, SD-Card, CD/DVD/BLURAY REAL SATA / USB 
or Emulation Hardware Controllers Master SLAVE IDE/ATA FDD or Image file


Sound card emulation: real, or Creative Sound Blaster, Terratec, Roland,
AdLib, and/or PC speaker


Boot Option 1234

1=First drive Boot or USB Boot

2=Second Drive (image) Boot or USB Boot

3= third drive boot

4=fourth Drive Boot


And i can change with File Folder Image File by Short Cuts in System 
CD/DVD/BLURAY/FLOPPY IMAGE



Best King Regards




Daniel Frank Nommensen




Re: [PATCH v2 4/4] hw/openrisc/openrisc_sim: Add support for initrd loading

2022-02-17 Thread Stafford Horne
On Thu, Feb 17, 2022 at 06:04:32PM +, Peter Maydell wrote:
> On Thu, 10 Feb 2022 at 13:13, Stafford Horne  wrote:
> >
> > The initrd passed via the command line is loaded into memory.  Its
> > location and size are then added to the device tree so the kernel knows
> > where to find it.
> >
> > Signed-off-by: Stafford Horne 
> > ---
> >  hw/openrisc/openrisc_sim.c | 32 +++-
> >  1 file changed, 31 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
> > index d7c26af82c..5354797e20 100644
> > --- a/hw/openrisc/openrisc_sim.c
> > +++ b/hw/openrisc/openrisc_sim.c
> > @@ -187,6 +187,32 @@ static hwaddr openrisc_load_kernel(ram_addr_t ram_size,
> >  return 0;
> >  }
> >
> > +static hwaddr openrisc_load_initrd(Or1ksimState *s, const char *filename,
> > +hwaddr load_start, uint64_t mem_size)
> 
> Indentation here is off.

Ah, I was going off the indentation already in the file.  I will fix this.

i.e. should be:

static hwaddr openrisc_load_initrd(Or1ksimState *s, const char *filename,
   hwaddr load_start, uint64_t mem_size)

That's why it's like that everywhere.  I might as well add another patch to fix
up indentation.

>
> Otherwise
> Reviewed-by: Peter Maydell 

Thanks!



Re: [PATCH] hw/ide: implement ich6 ide controller support

2022-02-17 Thread John Snow
On Mon, Feb 14, 2022 at 1:48 PM Liav Albani  wrote:
>
> Hello BALATON,
>
> Thank you for helping keeping this patch noticeable to everyone :)
>
> I tried to reach out to John via a private email last Saturday (two days
> ago) so I don't "spam" the mailing list for no good reason.
> It might be that I should actually refrain from doing so and talk to the
> maintainer directly on the mailing list once the patch
> has been submitted to the mailing list.
> I've not yet seen any response from John so I assume it's a matter of
> days before he can take care of this.
>
> Best regards,
> Liav
>
> On 2/14/22 14:26, BALATON Zoltan wrote:
> > On Sat, 5 Feb 2022, BALATON Zoltan wrote:
> >> Hello,
> >
> > Ping? John, do you agree with my comments? Should Liav proceed to send
> > a v2?
> >
> > Thanks,
> > BALATON Zoltan
> >
> >> On Sat, 5 Feb 2022, Liav Albani wrote:
> >>> On 2/5/22 17:48, BALATON Zoltan wrote:
>  On Sat, 5 Feb 2022, Liav Albani wrote:
> > This type of IDE controller has support for relocating the IO
> > ports and
> > doesn't use IRQ 14 and 15 but one allocated PCI IRQ for the
> > controller.
> 
>  I haven't looked at in detail so only a few comments I've got while
>  reading it. What machine needs this? In QEMU I think we only have
>  piix and ich9 emulated for pc and q35 machines but maybe ich6 is
>  also used by some machine I don't know about. Otherwise it looks
>  odd to have ide part of ich6 but not the other parts of this chip.
> 
> >>> Hi BALATON,
> >>>
> >>> This is my first patch to QEMU and the first time I send patches
> >>> over the mail. I sent my github tree to John Snow (the maintainer of
> >>> the IDE code in QEMU) for advice if I should send them here and I
> >>> was encouraged to do that.
> >>
> >> Welcome and thanks a lot for taking time to contribute and share your
> >> results. In case you're not yet aware, these docs should explain how
> >> patches are handled on the list:
> >>
> >> https://www.qemu.org/docs/master/devel/submitting-a-patch.html
> >>
> >>> For the next time patch I'll put a note on writing a descriptive
> >>> cover letter as it could have put more valuable details on why I
> >>> sent this patch.
> >>>
> >>> There's no such machine type emulating the ICH6 chipset in QEMU.
> >>> However, I wrote this emulation component as a test for the
> >>> SerenityOS kernel because I have a machine from 2009 which has
> >>> an ICH7 southbridge, so, I wanted to emulate such device with QEMU
> >>> to ease development on it.
> >>>
> >>> I found out that Linux with libata was using the controller without
> >>> any noticeable problems, but the SerenityOS kernel struggled to use
> >>> this device, so I decided that
> >>> I should send this patch to get it merged and then I can use it
> >>> locally and maybe other people will benefit from it.
> >>>
> >>> In regard to other components of the ICH6 chipset - I don't think
> >>> it's worth anybody's time to actually implement them as the ICH9
> >>> chipset is quite close to what the ICH6 chipset offers as far as I
> >>> can tell.
> >>> The idea of implementing ich6-ide controller was to enable the
> >>> option of people like me and other OS developers to ensure their
> >>> kernels operate correctly on such type of device,
> >>> which is legacy-free device in the aspect of PCI bus resource
> >>> management but still is a legacy device which belongs to chipsets of
> >>> late 2000s.
> >>
> >> That's OK, maybe a short mention (just one sentence) in the commit
> >> message explaining this would help to understand why this device
> >> model was added.
> >>
> > Signed-off-by: Liav Albani 
> > ---
> > hw/i386/Kconfig  |   2 +
> > hw/ide/Kconfig   |   5 +
> > hw/ide/bmdma.c   |  83 +++
> > hw/ide/ich6.c| 211
> > +++
> > hw/ide/meson.build   |   3 +-
> > hw/ide/piix.c|  50 +-
> > include/hw/ide/pci.h |   5 +
> > include/hw/pci/pci_ids.h |   1 +
> > 8 files changed, 311 insertions(+), 49 deletions(-)
> > create mode 100644 hw/ide/bmdma.c
> > create mode 100644 hw/ide/ich6.c
> >
> > diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> > index d22ac4a4b9..a18de2d962 100644
> > --- a/hw/i386/Kconfig
> > +++ b/hw/i386/Kconfig
> > @@ -75,6 +75,7 @@ config I440FX
> > select PCI_I440FX
> > select PIIX3
> > select IDE_PIIX
> > +select IDE_ICH6
> > select DIMM
> > select SMBIOS
> > select FW_CFG_DMA
> > @@ -101,6 +102,7 @@ config Q35
> > select PCI_EXPRESS_Q35
> > select LPC_ICH9
> > select AHCI_ICH9
> > +select IDE_ICH6
> > select DIMM
> > select SMBIOS
> > select FW_CFG_DMA
> > diff --git a/hw/ide/Kconfig b/hw/ide/Kconfig
> > index dd85fa3619..63304325a5 100644
> > --- a/hw/ide/Kconfig
> > +++ 

Re: [PATCH] ppc/spapr: Advertise StoreEOI for POWER10 compat guests

2022-02-17 Thread Daniel Henrique Barboza




On 2/17/22 10:23, Cédric Le Goater wrote:

On 2/17/22 12:28, Daniel Henrique Barboza wrote:



On 2/14/22 11:11, Cédric Le Goater wrote:

When an interrupt has been handled, the OS notifies the interrupt
controller with an EOI sequence. On POWER9 and POWER10 systems using
the XIVE interrupt controller, this can be done with a load or a store
operation on the ESB interrupt management page of the interrupt. The
StoreEOI operation has less latency and improves interrupt handling
performance but it was deactivated during the POWER9 DD2.0 timeframe
because of ordering issues. POWER9 systems use the LoadEOI instead.
POWER10 compat guests should have fixed the issue with
Load-after-Store ordering and StoreEOI can be activated for them
again.

To maintain performance, this ordering is only enforced for the
XIVE_ESB_SET_PQ_10 load operation. This operation can be used to
disable temporarily an interrupt source. If StoreEOI is active, a
source could be left enabled if the load and store operations come
out of order.

Add a check in our XIVE emulation model for Load-after-Store when
StoreEOI is active. It should catch unreliable sequences. Other load
operations should be fine without it.

Signed-off-by: Cédric Le Goater 
---


Reviewed-by: Daniel Henrique Barboza 



Unfortunately, this patch breaks migration under TCG because the XIVE
source flag is not updated on the target side. KVM is not impacted
because the emulated sources are not used. This needs to be addressed
in a v2.

That said, even without this patch, TCG migration is broken. Some CPUs
on the receive side are stalled on CPU Hard LOCKUPs. QEMU 6.2 is impacted.
So it has been a while :/




I've done a few tests and I can see Hard Lockups with TCG pseries migration
when using multiple CPUs (I used -smp 4 like you suggested in private), since
at least QEMU v6.0.0.

This is hardly surprising since TCG migration isn't something that we ever
supported in a product or even in the community*. It would be good to
understand why and get it fixed, but for now we can take a bit of comfort in
knowing that:

- it has been broken for a while (if it ever worked). If this was a recent 7.0
regression we would need to solve it for this upcoming release;

- single CPU TCG migration seems to be working fine, so we can count on this
TCG migration scenario for testing.



* I'm hoping David and Greg can push back on this if my assumption is wrong.



Thanks,



Daniel








See below.

C.



[   24.113608] watchdog: CPU 0 detected hard LOCKUP on other CPUs 1,3
[   24.116534] watchdog: CPU 0 TB:15585461459, last SMP heartbeat TB:7394335409 (15998ms ago)
[   24.117840] watchdog: CPU 1 Hard LOCKUP
[   24.117956] watchdog: CPU 1 TB:15587843000, last heartbeat TB:5355690415 (19984ms ago)
[   24.117999] Modules linked in:
[   24.118387] irq event stamp: 341399
[   24.118399] hardirqs last  enabled at (341399): [] snooze_loop+0x9c/0x290
[   24.118900] hardirqs last disabled at (341398): [] do_idle+0x12c/0x450
[   24.118943] softirqs last  enabled at (9798): [] __do_softirq+0x60c/0x678
[   24.118971] softirqs last disabled at (9789): [] __irq_exit_rcu+0x158/0x1c0
[   24.119127] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.17.0-rc4-dirty #984
[   24.119293] NIP:  c0caea78 LR: c0caea38 CTR: c0cae990
[   24.119315] REGS: c000fff43d60 TRAP: 0100   Not tainted  (5.17.0-rc4-dirty)
[   24.119352] MSR:  8280b033   CR: 28000228  XER: 0006
[   24.119554] CFAR: c0caea98 IRQMASK: 0
[   24.119554] GPR00: c0caea2c c2bbbd80 c1c30b00 

[   24.119554] GPR04: 0006  c800 
c1c7dc38
[   24.119554] GPR08: c2b5d500  0003a115ef39 
36d551ed
[   24.119554] GPR12: c0cae990 c000f300  

[   24.119554] GPR16:    

[   24.119554] GPR20:    
c1b3a660
[   24.119554] GPR24: c000ffa4fb48 00059d7c5070 c1c78e48 

[   24.119554] GPR28: c1b3a660 c15422e0 c15422e8 

[   24.119845] NIP [c0caea78] snooze_loop+0xe8/0x290
[   24.119866] LR [c0caea38] snooze_loop+0xa8/0x290
[   24.119998] Call Trace:
[   24.120029] [c2bbbd80] [c0caea2c] snooze_loop+0x9c/0x290 (unreliable)
[   24.120097] [c2bbbdc0] [c0cab730] cpuidle_enter_state+0x300/0x730
[   24.120119] [c2bbbe30] [c0cabbfc] cpuidle_enter+0x4c/0x70
[   24.120131] [c2bbbe70] [c0208d98] do_idle+0x328/0x450
[   24.120141] [c2bbbf00] [c020926c] cpu_startup_entry+0x3c/0x40
[   24.120150] [c2bbbf30] [c005e144] start_secondary+0x2a4/0x2b0
[   24.120161] [c2bbbf90] [c000d054] start_secondary_prolog+0x10/0x14
[   24.120238] Instruction 

Re: [PATCH v2] ide: Increment BB in-flight counter for TRIM BH

2022-02-17 Thread John Snow
On Tue, Feb 15, 2022 at 12:14 PM Hanna Reitz  wrote:
>
> Ping
>
> (I can take it too, if you’d like, John, but you’re listed as the only
> maintainer for hw/ide, so...  Just say the word, though!)
>

Sorry, I sent you a mail off-list at the time where I said you were
free to take it whenever you like. Why'd I send it off-list? I don't
know

Please feel free to send this with your next block PR.

--js

> On 20.01.22 15:22, Hanna Reitz wrote:
> > When we still have an AIOCB registered for DMA operations, we try to
> > settle the respective operation by draining the BlockBackend associated
> > with the IDE device.
> >
> > However, this assumes that every DMA operation is associated with an
> > increment of the BlockBackend’s in-flight counter (e.g. through some
> > ongoing I/O operation), so that draining the BB until its in-flight
> > counter reaches 0 will settle all DMA operations.  That is not the case:
> > For TRIM, the guest can issue a zero-length operation that will not
> > result in any I/O operation forwarded to the BlockBackend, and also not
> > increment the in-flight counter in any other way.  In such a case,
> > blk_drain() will be a no-op if no other operations are in flight.
> >
> > It is clear that if blk_drain() is a no-op, the value of
> > s->bus->dma->aiocb will not change between checking it in the `if`
> > condition and asserting that it is NULL after blk_drain().
> >
> > The particular problem is that ide_issue_trim() creates a BH
> > (ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
> > ide_dma_cb(), which will either create a new request, or find the
> > transfer to be done and call ide_set_inactive(), which clears
> > s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
> > ide_trim_bh_cb() to run, which currently it will not always do.
> >
> > To fix this issue, we increment the BlockBackend's in-flight counter
> > when the TRIM operation begins (in ide_issue_trim(), when the
> > ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
> > is done.
> >
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
> > Suggested-by: Paolo Bonzini 
> > Signed-off-by: Hanna Reitz 
> > ---
> > v1:
> > https://lists.nongnu.org/archive/html/qemu-block/2022-01/msg00024.html
> >
> > v2:
> > - Increment BB’s in-flight counter while the BH is active so that
> >blk_drain() will poll until the BH is done, as suggested by Paolo
> >
> > (No git-backport-diff, because this patch was basically completely
> > rewritten, so it wouldn’t be worth it.)
> > ---
> >   hw/ide/core.c | 7 +++
> >   1 file changed, 7 insertions(+)
>
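The reasoning in the quoted commit message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyBackend`, `toy_issue_trim()`, `toy_drain()` and the other `toy_*` names are hypothetical stand-ins, not QEMU's BlockBackend API, and the drain loop is simplified to "run deferred callbacks only while the in-flight counter is non-zero". It shows why a zero-length TRIM that only schedules a deferred callback is invisible to the drain unless the issuer also bumps the counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are QEMU types. */
typedef struct ToyBackend ToyBackend;
typedef void (*ToyBhFunc)(ToyBackend *blk);

struct ToyBackend {
    int in_flight;          /* what the drain loop waits on */
    ToyBhFunc pending_bh;   /* at most one deferred callback */
    bool bh_counted;        /* did the issuer bump in_flight? */
    bool aiocb_active;      /* models "s->bus->dma->aiocb != NULL" */
};

/* Deferred completion: settles the request and drops the count, if any. */
static void toy_trim_bh(ToyBackend *blk)
{
    blk->aiocb_active = false;
    if (blk->bh_counted) {
        blk->in_flight--;
    }
}

/* Zero-length TRIM: no I/O is submitted, only a deferred callback. */
static void toy_issue_trim(ToyBackend *blk, bool count_in_flight)
{
    blk->aiocb_active = true;
    blk->bh_counted = count_in_flight;
    if (count_in_flight) {
        blk->in_flight++;
    }
    blk->pending_bh = toy_trim_bh;
}

/* Runs deferred callbacks only while something is counted as in flight. */
static void toy_drain(ToyBackend *blk)
{
    while (blk->in_flight > 0 && blk->pending_bh) {
        ToyBhFunc bh = blk->pending_bh;
        blk->pending_bh = NULL;
        bh(blk);
    }
}

/* Returns true if the request is settled once the drain returns. */
static bool toy_trim_settled_after_drain(bool count_in_flight)
{
    ToyBackend blk = { 0 };

    toy_issue_trim(&blk, count_in_flight);
    toy_drain(&blk);
    return !blk.aiocb_active;
}
```

With `count_in_flight` set to false the drain returns immediately and the request is still pending, which is the situation the assertion in the bug report trips over; with it set to true the drain runs the callback first.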




Re: [PATCH v4 01/12] mm/shmem: Introduce F_SEAL_INACCESSIBLE

2022-02-17 Thread Andy Lutomirski
On Thu, Feb 17, 2022, at 5:06 AM, Chao Peng wrote:
> On Fri, Feb 11, 2022 at 03:33:35PM -0800, Andy Lutomirski wrote:
>> On 1/18/22 05:21, Chao Peng wrote:
>> > From: "Kirill A. Shutemov" 
>> > 
>> > Introduce a new seal F_SEAL_INACCESSIBLE indicating the content of
>> > the file is inaccessible from userspace through ordinary MMU access
>> > (e.g., read/write/mmap). However, the file content can be accessed
>> > via a different mechanism (e.g. KVM MMU) indirectly.
>> > 
>> > It provides semantics required for KVM guest private memory support
>> > that a file descriptor with this seal set is going to be used as the
>> > source of guest memory in confidential computing environments such
>> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
>> > 
>> > At this time only shmem implements this seal.
>> > 
>> 
>> I don't dislike this *that* much, but I do dislike this. F_SEAL_INACCESSIBLE
>> essentially transmutes a memfd into a different type of object.  While this
>> can apparently be done successfully and without races (as in this code),
>> it's at least awkward.  I think that either creating a special inaccessible
>> memfd should be a single operation that creates the correct type of object or
>> there should be a clear justification for why it's a two-step process.
>
> Now one justification may be from Stever's comment to patch-00: for ARM
> usage it can be used with creating a normal memfd, (partially)populate
> it with initial guest memory content (e.g. firmware), and then
> F_SEAL_INACCESSIBLE it just before the first launch of the guest in
> KVM (definitely the current code needs to be changed to support that).

Except we don't allow F_SEAL_INACCESSIBLE on a non-empty file, right?  So this
won't work.

In any case, the whole confidential VM initialization story is a bit muddy.
From the earlier emails, it sounds like ARM expects the host to fill in guest
memory and measure it.  From my recollection of Intel's scheme (which may well
be wrong, and I could easily be confusing it with SGX), TDX instead measures
what is essentially a transcript of the series of operations that initializes
the VM.  These are fundamentally not the same thing even if they accomplish
the same end goal.  For TDX, we unavoidably need an operation (ioctl or
similar) that initializes things according to the VM's instructions, and ARM
ought to be able to use roughly the same mechanism.

Also, if we ever get fancy and teach the page allocator about memory with
reduced directmap permissions, it may well be more efficient for userspace to
shove data into a memfd via ioctl than it is to mmap it and write the data.



Re: [PATCH v5 03/16] tests/fp/berkeley-testfloat-3: Ignore ignored #pragma directives

2022-02-17 Thread Peter Maydell
On Mon, 14 Feb 2022 at 18:56, Philippe Mathieu-Daudé  wrote:
>
> Since we already use -Wno-unknown-pragmas, we can also use
> -Wno-ignored-pragmas. This silences hundreds of warnings using
> clang 13 on macOS Monterey:
>
>   [409/771] Compiling C object tests/fp/libtestfloat.a.p/berkeley-testfloat-3_source_test_az_f128_rx.c.o
>   ../tests/fp/berkeley-testfloat-3/source/test_az_f128_rx.c:49:14: warning: '#pragma FENV_ACCESS' is not supported on this target - ignored [-Wignored-pragmas]
>   #pragma STDC FENV_ACCESS ON
>^
>   1 warning generated.
>

GCC doesn't know about -Wignored-pragmas, so this change is relying on
the GCC "ignore a -Wno-something that this gcc doesn't recognize
if we wouldn't otherwise be complaining about something" behaviour.
I forget which GCC version that was introduced in... (This is why
configure has the cc_has_warning_flag() test before it tries to
use a warning/warning-suppression option.)

-- PMM



Re: qemu crash 100% CPU with Ubuntu10.04 guest (solved)

2022-02-17 Thread Andrea Bolognani
On Thu, Feb 17, 2022 at 12:07:15PM +1100, Ben Smith wrote:
> Hi All,
>
> I'm cross-posting this from Reddit qemu_kvm, in case it helps in some
> way. I know my setup is ancient and unique; let me know if you would
> like more info.
>
> Symptoms:
> 1. Ubuntu10.04 32-bit guest locks up randomly between 0 and 30 days.
> 2. The console shows a CPU trace dump, nothing else logged on the guest or 
> host.
> 3. Host system (Ubuntu20.04) 100% CPU for qemu process.
>
> Solution:
> When using virt-install, always use the "--os-variant" parameter!
> e.g. --os-variant ubuntu10.04

FWIW, the --os-variant / --osinfo argument is going to be mandatory
starting with the upcoming virt-manager release.

https://listman.redhat.com/archives/virt-tools-list/2022-February/msg00021.html

-- 
Andrea Bolognani / Red Hat / Virtualization




Re: [PATCH v5 02/16] configure: Allow passing extra Objective C compiler flags

2022-02-17 Thread Peter Maydell
On Mon, 14 Feb 2022 at 18:56, Philippe Mathieu-Daudé  wrote:
>
> We can pass C/CPP/LD flags via CFLAGS/CXXFLAGS/LDFLAGS environment
> variables, or via configure --extra-cflags / --extra-cxxflags /
> --extra-ldflags options. Provide similar behavior for Objective C:
> use existing flags from $OBJCFLAGS, or passed via --extra-objcflags.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: [PATCH v5 01/16] MAINTAINERS: Add Akihiko Odaki to macOS-relateds

2022-02-17 Thread Peter Maydell
On Mon, 14 Feb 2022 at 18:56, Philippe Mathieu-Daudé  wrote:
>
> From: Akihiko Odaki 
>
> Signed-off-by: Akihiko Odaki 
> Reviewed-by: Christian Schoenebeck 
> Reviewed-by: Philippe Mathieu-Daudé 
> Message-Id: <20220213021215.1974-1-akihiko.od...@gmail.com>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Maydell 

thanks
-- PMM



Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-17 Thread Thomas Huth

On 28/01/2022 16.47, Stefan Hajnoczi wrote:

Dear QEMU, KVM, and rust-vmm communities,
QEMU will apply for Google Summer of Code 2022
(https://summerofcode.withgoogle.com/) and has been accepted into
Outreachy May-August 2022 (https://www.outreachy.org/). You can now
submit internship project ideas for QEMU, KVM, and rust-vmm!

If you have experience contributing to QEMU, KVM, or rust-vmm you can
be a mentor. It's a great way to give back and you get to work with
people who are just starting out in open source.

Please reply to this email by February 21st with your project ideas.


I'd like to suggest an idea (shamelessly "inspired" by Philippe's suggestion 
last year):


=== Improve s390x (IBM Z) emulation with RISU ===

Summary: Adapt RISU to s390x and fix CPU emulation along the way.

RISU (Random Instruction Sequence generator for Userspace testing) is a tool 
for testing CPU instructions with randomly generated opcodes. The goal of 
this project is to adapt the RISU framework for the IBM Z architecture 
(a.k.a. s390x), so that it could be used to test the s390x emulation of QEMU 
for correctness. This will certainly help to spot some instruction emulation 
deficiencies in QEMU which should be addressed during this internship, too.


'''Links:'''
* [https://git.linaro.org/people/peter.maydell/risu.git/tree/
   Peter Maydell's RISU repository]
* [https://www.linux-kvm.org/images/6/63/01x03-ValidatingTCG.pdf
   KVM Forum 2014 presentation by Alex Bennée]
* [http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
   z/Architecture Principles of Operation] (the description
   of the CPU instructions)

'''Details:'''
* Skill level: intermediate (a good basic understanding of CPU
  instructions is required)
* Language: C, Perl
* Mentor: Thomas Huth (th...@redhat.com) (+1 TBD)


What do you think about that idea?

 Thanks,
  Thomas




Re: [PATCH 3/4] hw/openrisc/openrisc_sim: Add support for loading a device tree

2022-02-17 Thread Peter Maydell
On Thu, 10 Feb 2022 at 06:46, Stafford Horne  wrote:
>
> Using the device tree means that qemu can now directly tell
> the kernel what hardware is configured rather than us having
> to maintain and update a separate device tree file.
>
> This patch adds device tree support for the OpenRISC simulator.
> A device tree is built up based on the state of the configured
> openrisc simulator.

This sounds like it's support for creating a device
tree? Support for loading a device tree would be "the
user passes us a filename of a dtb file". (This is mostly a
quibble about commit message wording.)

> -static void openrisc_load_kernel(ram_addr_t ram_size,
> +static hwaddr openrisc_load_kernel(ram_addr_t ram_size,
>   const char *kernel_filename)

Indentation looks off now ?

>  {
>  long kernel_size;
>  uint64_t elf_entry;
> +uint64_t high_addr;
>  hwaddr entry;
>
>  if (kernel_filename && !qtest_enabled()) {
>  kernel_size = load_elf(kernel_filename, NULL, NULL, NULL,
> -   &elf_entry, NULL, NULL, NULL, 1, EM_OPENRISC,
> -   1, 0);
> +   &elf_entry, NULL, &high_addr, NULL, 1,
> +   EM_OPENRISC, 1, 0);
>  entry = elf_entry;
>  if (kernel_size < 0) {
>  kernel_size = load_uimage(kernel_filename,
>  &entry, NULL, NULL, NULL, NULL);
> +high_addr = entry + kernel_size;
>  }
>  if (kernel_size < 0) {
>  kernel_size = load_image_targphys(kernel_filename,
>KERNEL_LOAD_ADDR,
>ram_size - KERNEL_LOAD_ADDR);
> +high_addr = KERNEL_LOAD_ADDR + kernel_size;
>  }
>
>  if (entry <= 0) {
> @@ -168,7 +181,139 @@ static void openrisc_load_kernel(ram_addr_t ram_size,
>  exit(1);
>  }
>  boot_info.bootstrap_pc = entry;
> +
> +return high_addr;
> +}
> +return 0;
> +}
> +
> +static uint32_t openrisc_load_fdt(Or1ksimState *s, hwaddr load_start,
> +uint64_t mem_size)

Indentation again.

> +{
> +uint32_t fdt_addr;
> +int fdtsize = fdt_totalsize(s->fdt);
> +
> +if (fdtsize <= 0) {
> +error_report("invalid device-tree");
> +exit(1);
> +}
> +
> +/* We should put fdt right after the kernel */

You change this comment in patch 4 -- I think you might as well
just use that text in this patch to start with.

> +fdt_addr = ROUND_UP(load_start, 4);
> +
> +fdt_pack(s->fdt);

fdt_pack() returns an error code -- you should check it.

> +/* copy in the device tree */
> +qemu_fdt_dumpdtb(s->fdt, fdtsize);
> +
> +rom_add_blob_fixed_as("fdt", s->fdt, fdtsize, fdt_addr,
> +  &address_space_memory);
> +
> +return fdt_addr;
> +}
> +
> +static void openrisc_create_fdt(Or1ksimState *s,
> +const struct MemmapEntry *memmap, int num_cpus, uint64_t mem_size,
> +const char *cmdline)

Indentation.

> +{
> +void *fdt;
> +int cpu;
> +char *nodename;
> +int pic_ph;
> +
> +fdt = s->fdt = create_device_tree(&s->fdt_size);
> +if (!fdt) {
> +error_report("create_device_tree() failed");
> +exit(1);
> +}
> +
> +qemu_fdt_setprop_string(fdt, "/", "compatible", "opencores,or1ksim");
> +qemu_fdt_setprop_cell(fdt, "/", "#address-cells", 0x1);
> +qemu_fdt_setprop_cell(fdt, "/", "#size-cells", 0x1);
> +
> +nodename = g_strdup_printf("/memory@%lx",
> +   (long)memmap[OR1KSIM_DRAM].base);

Use the appropriate format string macro for the type, rather than
casting to long (here and below).
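A minimal sketch of what this review comment asks for, using only standard C. `toy_node_name()` is a hypothetical helper, not code from the patch; it formats a 64-bit base address with `PRIx64` from `<inttypes.h>` rather than casting to `long`, which would truncate the value on hosts where `long` is 32 bits wide. In QEMU itself the address type is `hwaddr` and the matching macro is `HWADDR_PRIx`.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build a device-tree node name such as
 * "/memory@100000000" from a 64-bit base address without any cast.
 * Returns the snprintf() result (length that would have been written).
 */
static int toy_node_name(char *buf, size_t len, const char *prefix,
                         uint64_t base)
{
    return snprintf(buf, len, "/%s@%" PRIx64, prefix, base);
}
```

The address used in the check below is deliberately above 4 GiB, so a `(long)` cast on a 32-bit `long` host would have lost the top bits.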

> +qemu_fdt_add_subnode(fdt, nodename);
> +qemu_fdt_setprop_cells(fdt, nodename, "reg",
> +   memmap[OR1KSIM_DRAM].base, mem_size);
> +qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
> +g_free(nodename);
> +
> +qemu_fdt_add_subnode(fdt, "/cpus");
> +qemu_fdt_setprop_cell(fdt, "/cpus", "#size-cells", 0x0);
> +qemu_fdt_setprop_cell(fdt, "/cpus", "#address-cells", 0x1);
> +
> +for (cpu = 0; cpu < num_cpus; cpu++) {
> +nodename = g_strdup_printf("/cpus/cpu@%d", cpu);
> +qemu_fdt_add_subnode(fdt, nodename);
> +qemu_fdt_setprop_string(fdt, nodename, "compatible",
> +"opencores,or1200-rtlsvn481");
> +qemu_fdt_setprop_cell(fdt, nodename, "reg", cpu);
> +qemu_fdt_setprop_cell(fdt, nodename, "clock-frequency",
> +  OR1KSIM_CLK_MHZ);
> +g_free(nodename);
> +}
> +
> +if (num_cpus > 0) {
> +nodename = g_strdup_printf("/ompic@%lx",
> +   (long)memmap[OR1KSIM_OMPIC].base);
> +qemu_fdt_add_subnode(fdt, nodename);
> +qemu_fdt_setprop_string(fdt, nodename, "compatible", 
> 

Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-17 Thread Paolo Bonzini

On 2/14/22 15:01, Stefan Hajnoczi wrote:

Thanks for this idea. As a stretch goal we could add implementing the
packed virtqueue layout in Linux vhost, QEMU's libvhost-user, and/or
QEMU's virtio qtest code.


Why not have a separate project for packed virtqueue layout?

Paolo



Re: [PATCH v2 4/4] hw/openrisc/openrisc_sim: Add support for initrd loading

2022-02-17 Thread Peter Maydell
On Thu, 10 Feb 2022 at 13:13, Stafford Horne  wrote:
>
> The initrd passed via the command line is loaded into memory.  Its
> location and size are then added to the device tree so the kernel knows
> where to find it.
>
> Signed-off-by: Stafford Horne 
> ---
>  hw/openrisc/openrisc_sim.c | 32 +++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/hw/openrisc/openrisc_sim.c b/hw/openrisc/openrisc_sim.c
> index d7c26af82c..5354797e20 100644
> --- a/hw/openrisc/openrisc_sim.c
> +++ b/hw/openrisc/openrisc_sim.c
> @@ -187,6 +187,32 @@ static hwaddr openrisc_load_kernel(ram_addr_t ram_size,
>  return 0;
>  }
>
> +static hwaddr openrisc_load_initrd(Or1ksimState *s, const char *filename,
> +hwaddr load_start, uint64_t mem_size)

Indentation here is off.
Otherwise
Reviewed-by: Peter Maydell 

thanks
-- PMM



[PATCH v5 14/15] docs: Add documentation for SR-IOV and Virtualization Enhancements

2022-02-17 Thread Lukasz Maniak
Signed-off-by: Lukasz Maniak 
---
 docs/system/devices/nvme.rst | 82 
 1 file changed, 82 insertions(+)

diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index b5acb2a9c19..aba253304e4 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -239,3 +239,85 @@ The virtual namespace device supports DIF- and DIX-based 
protection information
   to ``1`` to transfer protection information as the first eight bytes of
   metadata. Otherwise, the protection information is transferred as the last
   eight bytes.
+
+Virtualization Enhancements and SR-IOV (Experimental Support)
+-------------------------------------------------------------
+
+The ``nvme`` device supports Single Root I/O Virtualization and Sharing
+along with Virtualization Enhancements. The controller has to be linked to
+an NVM Subsystem device (``nvme-subsys``) for use with SR-IOV.
+
+A number of parameters are present (**please note, that they may be
+subject to change**):
+
+``sriov_max_vfs`` (default: ``0``)
+  Indicates the maximum number of PCIe virtual functions supported
+  by the controller. Specifying a non-zero value enables reporting of both
+  SR-IOV and ARI (Alternative Routing-ID Interpretation) capabilities
+  by the NVMe device. Virtual function controllers will not report SR-IOV.
+
+``sriov_vq_flexible``
+  Indicates the total number of flexible queue resources assignable to all
+  the secondary controllers. Implicitly sets the number of primary
+  controller's private resources to ``(max_ioqpairs - sriov_vq_flexible)``.
+
+``sriov_vi_flexible``
+  Indicates the total number of flexible interrupt resources assignable to
+  all the secondary controllers. Implicitly sets the number of primary
+  controller's private resources to ``(msix_qsize - sriov_vi_flexible)``.
+
+``sriov_max_vi_per_vf`` (default: ``0``)
+  Indicates the maximum number of virtual interrupt resources assignable
+  to a secondary controller. The default ``0`` resolves to
+  ``(sriov_vi_flexible / sriov_max_vfs)``
+
+``sriov_max_vq_per_vf`` (default: ``0``)
+  Indicates the maximum number of virtual queue resources assignable to
+  a secondary controller. The default ``0`` resolves to
+  ``(sriov_vq_flexible / sriov_max_vfs)``
+
+The simplest possible invocation enables the capability to set up one VF
+controller and assign an admin queue, an IO queue, and an MSI-X interrupt.
+
+.. code-block:: console
+
+   -device nvme-subsys,id=subsys0
+   -device nvme,serial=deadbeef,subsys=subsys0,sriov_max_vfs=1,
+sriov_vq_flexible=2,sriov_vi_flexible=1
+
+The minimum steps required to configure a functional NVMe secondary
+controller are:
+
+  * unbind flexible resources from the primary controller
+
+.. code-block:: console
+
+   nvme virt-mgmt /dev/nvme0 -c 0 -r 1 -a 1 -n 0
+   nvme virt-mgmt /dev/nvme0 -c 0 -r 0 -a 1 -n 0
+
+  * perform a Function Level Reset on the primary controller to actually
+release the resources
+
+.. code-block:: console
+
+   echo 1 > /sys/bus/pci/devices/:01:00.0/reset
+
+  * enable VF
+
+.. code-block:: console
+
+   echo 1 > /sys/bus/pci/devices/:01:00.0/sriov_numvfs
+
+  * assign the flexible resources to the VF and set it ONLINE
+
+.. code-block:: console
+
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 1 -a 8 -n 1
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 8 -n 2
+   nvme virt-mgmt /dev/nvme0 -c 1 -r 0 -a 9 -n 0
+
+  * bind the NVMe driver to the VF
+
+.. code-block:: console
+
+   echo :01:00.1 > /sys/bus/pci/drivers/nvme/bind
\ No newline at end of file
-- 
2.25.1
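The resource arithmetic described in the documentation patch above can be sketched as follows. This is an illustration only: `ToySriovParams` and the `toy_*` functions are hypothetical names, the fields merely mirror the documented parameter names, integer division is assumed for the per-VF defaults, and the sketch assumes the flexible counts never exceed the totals (how the device validates that is not shown in the patch text).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the documented nvme parameters. */
typedef struct ToySriovParams {
    uint32_t max_ioqpairs;
    uint32_t msix_qsize;
    uint32_t sriov_max_vfs;
    uint32_t sriov_vq_flexible;
    uint32_t sriov_vi_flexible;
    uint32_t sriov_max_vq_per_vf;   /* 0 means "derive from flexible pool" */
    uint32_t sriov_max_vi_per_vf;   /* 0 means "derive from flexible pool" */
} ToySriovParams;

/* Queue resources left private to the primary controller. */
static uint32_t toy_private_vq(const ToySriovParams *p)
{
    return p->max_ioqpairs - p->sriov_vq_flexible;
}

/* Interrupt resources left private to the primary controller. */
static uint32_t toy_private_vi(const ToySriovParams *p)
{
    return p->msix_qsize - p->sriov_vi_flexible;
}

/* Per-VF limit: explicit value, or the flexible pool split across VFs. */
static uint32_t toy_per_vf(uint32_t explicit_max, uint32_t flexible,
                           uint32_t max_vfs)
{
    if (explicit_max != 0) {
        return explicit_max;
    }
    return max_vfs ? flexible / max_vfs : 0;
}

static uint32_t toy_max_vq_per_vf(const ToySriovParams *p)
{
    return toy_per_vf(p->sriov_max_vq_per_vf, p->sriov_vq_flexible,
                      p->sriov_max_vfs);
}

static uint32_t toy_max_vi_per_vf(const ToySriovParams *p)
{
    return toy_per_vf(p->sriov_max_vi_per_vf, p->sriov_vi_flexible,
                      p->sriov_max_vfs);
}
```

For the "simplest possible invocation" in the patch (`sriov_max_vfs=1`, `sriov_vq_flexible=2`, `sriov_vi_flexible=1`), the single VF can be given up to two queue resources and one interrupt resource; the totals of 64 and 65 used in the check are sample values chosen for the example.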




[RFC PATCH] virtio-net: Unlimit tx queue size if peer is vdpa

2022-02-17 Thread Eugenio Pérez
The code used to limit the maximum size of tx queue for backends other
than vhost_user since the introduction of configurable tx queue size in
9b02e1618cf2 ("virtio-net: enable configurable tx queue size").

Like vhost_user, vhost_vdpa devices should already deal with memory region
crossings, so let's use the full tx size.

Signed-off-by: Eugenio Pérez 
---
 hw/net/virtio-net.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 49cd13314a..b1769bfee0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -629,17 +629,20 @@ static int virtio_net_max_tx_queue_size(VirtIONet *n)
 NetClientState *peer = n->nic_conf.peers.ncs[0];
 
 /*
- * Backends other than vhost-user don't support max queue size.
+ * Backends other than vhost-user or vhost-vdpa don't support max queue
+ * size.
  */
 if (!peer) {
 return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE;
 }
 
-if (peer->info->type != NET_CLIENT_DRIVER_VHOST_USER) {
+switch(peer->info->type) {
+case NET_CLIENT_DRIVER_VHOST_USER:
+case NET_CLIENT_DRIVER_VHOST_VDPA:
+return VIRTQUEUE_MAX_SIZE;
+default:
 return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE;
-}
-
-return VIRTQUEUE_MAX_SIZE;
+};
 }
 
 static int peer_attach(VirtIONet *n, int index)
-- 
2.27.0
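The selection logic the RFC introduces can be restated as a stand-alone sketch. The enum, the function, and both constants here are hypothetical stand-ins (the numeric values are placeholders for the example, not taken from the patch); the point is only the shape of the decision: no peer or any other backend gets the default size, while the two vhost backends named in the patch get the maximum.

```c
#include <assert.h>

/* Hypothetical stand-ins for the backend types named in the patch. */
enum ToyBackendType {
    TOY_BACKEND_NONE,        /* no peer attached */
    TOY_BACKEND_TAP,         /* any other backend */
    TOY_BACKEND_VHOST_USER,
    TOY_BACKEND_VHOST_VDPA,
};

/* Placeholder values chosen for the example. */
#define TOY_TX_QUEUE_DEFAULT_SIZE 256
#define TOY_VIRTQUEUE_MAX_SIZE    1024

/* Largest tx queue size the device would accept for a given backend. */
static int toy_max_tx_queue_size(enum ToyBackendType type)
{
    switch (type) {
    case TOY_BACKEND_VHOST_USER:
    case TOY_BACKEND_VHOST_VDPA:
        return TOY_VIRTQUEUE_MAX_SIZE;
    default:
        return TOY_TX_QUEUE_DEFAULT_SIZE;
    }
}
```

Every path through the switch returns, so no statement is needed after it, and the closing brace of a switch takes no trailing semicolon.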




[PATCH v5 09/15] hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

The NVMe device defines two properties: max_ioqpairs, msix_qsize. Having
them as constants is problematic for SR-IOV support.

SR-IOV introduces virtual resources (queues, interrupts) that can be
assigned to PF and its dependent VFs. Each device, following a reset,
should work with the configured number of queues. A single constant is
no longer sufficient to hold the whole state.

This patch tries to solve the problem by introducing additional
variables in NvmeCtrl’s state. The variables for, e.g., managing queues
are therefore organized as:
 - n->params.max_ioqpairs – no changes, constant set by the user
 - n->(mutable_state) – (not a part of this patch) user-configurable,
                        specifies number of queues available _after_
                        reset
 - n->conf_ioqpairs - (new) used in all the places instead of the ‘old’
                      n->params.max_ioqpairs; initialized in realize()
                      and updated during reset() to reflect user’s
                      changes to the mutable state

Since the number of available i/o queues and interrupts can change at
runtime, buffers for sq/cqs and the MSIX-related structures are
allocated big enough to handle the limits, to completely avoid the
complicated reallocation. A helper function (nvme_update_msixcap_ts)
updates the corresponding capability register, to signal configuration
changes.
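
As a rough illustration of what the switch to conf_ioqpairs affects, the
sketch below models the queue id range check and the Number of Queues result
that the diff touches. The function names are hypothetical; only the
arithmetic mirrors the patch.

```c
#include <stdint.h>

/*
 * Number of Queues result: the low 16 bits carry the number of submission
 * queues and the high 16 bits the number of completion queues, both
 * zero-based.
 */
uint32_t sketch_numq_result(uint32_t conf_ioqpairs)
{
    uint32_t zero_based = conf_ioqpairs - 1;

    return zero_based | (zero_based << 16);
}

/* Queue id 0 is the admin queue, so ids 0..conf_ioqpairs are in range. */
int sketch_qid_in_range(uint16_t qid, uint32_t conf_ioqpairs)
{
    return qid < conf_ioqpairs + 1;
}
```

Because both helpers take the currently configured value as input, lowering
it after a reset shrinks the valid id range without touching the buffers,
which stay sized for the limit.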

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c | 52 ++
 hw/nvme/nvme.h |  2 ++
 2 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 7c1dd80f21d..f1b4026e4f8 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -445,12 +445,12 @@ static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
 {
-return sqid < n->params.max_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
+return sqid < n->conf_ioqpairs + 1 && n->sq[sqid] != NULL ? 0 : -1;
 }
 
 static int nvme_check_cqid(NvmeCtrl *n, uint16_t cqid)
 {
-return cqid < n->params.max_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
+return cqid < n->conf_ioqpairs + 1 && n->cq[cqid] != NULL ? 0 : -1;
 }
 
 static void nvme_inc_cq_tail(NvmeCQueue *cq)
@@ -4188,8 +4188,7 @@ static uint16_t nvme_create_sq(NvmeCtrl *n, NvmeRequest *req)
 trace_pci_nvme_err_invalid_create_sq_cqid(cqid);
 return NVME_INVALID_CQID | NVME_DNR;
 }
-if (unlikely(!sqid || sqid > n->params.max_ioqpairs ||
-n->sq[sqid] != NULL)) {
+if (unlikely(!sqid || sqid > n->conf_ioqpairs || n->sq[sqid] != NULL)) {
 trace_pci_nvme_err_invalid_create_sq_sqid(sqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
@@ -4541,8 +4540,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
 trace_pci_nvme_create_cq(prp1, cqid, vector, qsize, qflags,
  NVME_CQ_FLAGS_IEN(qflags) != 0);
 
-if (unlikely(!cqid || cqid > n->params.max_ioqpairs ||
-n->cq[cqid] != NULL)) {
+if (unlikely(!cqid || cqid > n->conf_ioqpairs || n->cq[cqid] != NULL)) {
 trace_pci_nvme_err_invalid_create_cq_cqid(cqid);
 return NVME_INVALID_QID | NVME_DNR;
 }
@@ -4558,7 +4556,7 @@ static uint16_t nvme_create_cq(NvmeCtrl *n, NvmeRequest *req)
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
-if (unlikely(vector >= n->params.msix_qsize)) {
+if (unlikely(vector >= n->conf_msix_qsize)) {
 trace_pci_nvme_err_invalid_create_cq_vector(vector);
 return NVME_INVALID_IRQ_VECTOR | NVME_DNR;
 }
@@ -5155,13 +5153,12 @@ defaults:
 
 break;
 case NVME_NUMBER_OF_QUEUES:
-result = (n->params.max_ioqpairs - 1) |
-((n->params.max_ioqpairs - 1) << 16);
+result = (n->conf_ioqpairs - 1) | ((n->conf_ioqpairs - 1) << 16);
 trace_pci_nvme_getfeat_numq(result);
 break;
 case NVME_INTERRUPT_VECTOR_CONF:
 iv = dw11 & 0x;
-if (iv >= n->params.max_ioqpairs + 1) {
+if (iv >= n->conf_ioqpairs + 1) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
@@ -5316,10 +5313,10 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
 
 trace_pci_nvme_setfeat_numq((dw11 & 0x) + 1,
 ((dw11 >> 16) & 0x) + 1,
-n->params.max_ioqpairs,
-n->params.max_ioqpairs);
-req->cqe.result = cpu_to_le32((n->params.max_ioqpairs - 1) |
-  ((n->params.max_ioqpairs - 1) << 16));
+n->conf_ioqpairs,
+n->conf_ioqpairs);
+req->cqe.result = cpu_to_le32((n->conf_ioqpairs - 1) |
+  ((n->conf_ioqpairs - 1) << 16));
 break;
 case 

[PATCH v5 13/15] hw/nvme: Add support for the Virtualization Management command

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

With the new command one can:
 - assign flexible resources (queues, interrupts) to primary and
   secondary controllers,
 - toggle the online/offline state of a given controller.
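
The bookkeeping behind assigning flexible resources to a secondary controller
can be sketched as follows. This is a simplified model, not the patch itself:
the struct, the status enum and the function name are hypothetical, and the
real code returns NVMe status values and works on little-endian fields.

```c
/* Hypothetical status codes for the sketch. */
enum assign_status {
    ASSIGN_OK,
    ASSIGN_TOO_MANY,      /* request exceeds the per-controller limit */
    ASSIGN_NOT_AVAILABLE, /* not enough unassigned flexible resources */
};

struct flex_pool {
    int total;       /* all flexible resources of one type */
    int prim;        /* currently assigned to the primary controller */
    int sec;         /* currently assigned to all secondary controllers */
    int per_sec_max; /* limit for a single secondary controller */
};

/* *current is the number of resources the secondary controller holds now. */
enum assign_status sketch_assign_to_sec(struct flex_pool *pool, int *current,
                                        int nr)
{
    int num_free, diff;

    if (nr > pool->per_sec_max) {
        return ASSIGN_TOO_MANY;
    }

    num_free = pool->total - pool->prim - pool->sec;
    diff = nr - *current;
    if (diff > num_free) {
        return ASSIGN_NOT_AVAILABLE;
    }

    /* Only the difference to the previous assignment changes the total. */
    pool->sec += diff;
    *current = nr;
    return ASSIGN_OK;
}
```

Working with the difference means a controller can also give resources back:
a smaller nr yields a negative diff, which always fits and frees the rest.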

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c   | 257 ++-
 hw/nvme/nvme.h   |  20 
 hw/nvme/trace-events |   3 +
 include/block/nvme.h |  17 +++
 4 files changed, 295 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2a6a36e733d..a9742cf5051 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -188,6 +188,7 @@
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/units.h"
+#include "qemu/range.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
 #include "sysemu/sysemu.h"
@@ -259,6 +260,7 @@ static const uint32_t nvme_cse_acs[256] = {
 [NVME_ADM_CMD_GET_FEATURES] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_ASYNC_EV_REQ] = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_NS_ATTACHMENT]= NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_NIC,
+[NVME_ADM_CMD_VIRT_MNGMT]   = NVME_CMD_EFF_CSUPP,
 [NVME_ADM_CMD_FORMAT_NVM]   = NVME_CMD_EFF_CSUPP | NVME_CMD_EFF_LBCC,
 };
 
@@ -290,6 +292,7 @@ static const uint32_t nvme_cse_iocs_zoned[256] = {
 };
 
 static void nvme_process_sq(void *opaque);
+static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst);
 
 static uint16_t nvme_sqid(NvmeRequest *req)
 {
@@ -5694,6 +5697,167 @@ out:
 return status;
 }
 
+static void nvme_get_virt_res_num(NvmeCtrl *n, uint8_t rt, int *num_total,
+  int *num_prim, int *num_sec)
+{
+*num_total = le32_to_cpu(rt ?
+ n->pri_ctrl_cap.vifrt : n->pri_ctrl_cap.vqfrt);
+*num_prim = le16_to_cpu(rt ?
+n->pri_ctrl_cap.virfap : n->pri_ctrl_cap.vqrfap);
+*num_sec = le16_to_cpu(rt ? n->pri_ctrl_cap.virfa : n->pri_ctrl_cap.vqrfa);
+}
+
+static uint16_t nvme_assign_virt_res_to_prim(NvmeCtrl *n, NvmeRequest *req,
+ uint16_t cntlid, uint8_t rt,
+ int nr)
+{
+int num_total, num_prim, num_sec;
+
+if (cntlid != n->cntlid) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+nvme_get_virt_res_num(n, rt, &num_total, &num_prim, &num_sec);
+
+if (nr > num_total) {
+return NVME_INVALID_NUM_RESOURCES | NVME_DNR;
+}
+
+if (nr > num_total - num_sec) {
+return NVME_INVALID_RESOURCE_ID | NVME_DNR;
+}
+
+if (rt) {
+n->next_pri_ctrl_cap.virfap = cpu_to_le16(nr);
+} else {
+n->next_pri_ctrl_cap.vqrfap = cpu_to_le16(nr);
+}
+
+req->cqe.result = cpu_to_le32(nr);
+return req->status;
+}
+
+static void nvme_update_virt_res(NvmeCtrl *n, NvmeSecCtrlEntry *sctrl,
+ uint8_t rt, int nr)
+{
+int prev_nr, prev_total;
+
+if (rt) {
+prev_nr = le16_to_cpu(sctrl->nvi);
+prev_total = le32_to_cpu(n->pri_ctrl_cap.virfa);
+sctrl->nvi = cpu_to_le16(nr);
+n->pri_ctrl_cap.virfa = cpu_to_le32(prev_total + nr - prev_nr);
+} else {
+prev_nr = le16_to_cpu(sctrl->nvq);
+prev_total = le32_to_cpu(n->pri_ctrl_cap.vqrfa);
+sctrl->nvq = cpu_to_le16(nr);
+n->pri_ctrl_cap.vqrfa = cpu_to_le32(prev_total + nr - prev_nr);
+}
+}
+
+static uint16_t nvme_assign_virt_res_to_sec(NvmeCtrl *n, NvmeRequest *req,
+uint16_t cntlid, uint8_t rt, int nr)
+{
+int num_total, num_prim, num_sec, num_free, diff, limit;
+NvmeSecCtrlEntry *sctrl;
+
+sctrl = nvme_sctrl_for_cntlid(n, cntlid);
+if (!sctrl) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+if (sctrl->scs) {
+return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR;
+}
+
+limit = le16_to_cpu(rt ? n->pri_ctrl_cap.vifrsm : n->pri_ctrl_cap.vqfrsm);
+if (nr > limit) {
+return NVME_INVALID_NUM_RESOURCES | NVME_DNR;
+}
+
+nvme_get_virt_res_num(n, rt, &num_total, &num_prim, &num_sec);
+num_free = num_total - num_prim - num_sec;
+diff = nr - le16_to_cpu(rt ? sctrl->nvi : sctrl->nvq);
+
+if (diff > num_free) {
+return NVME_INVALID_RESOURCE_ID | NVME_DNR;
+}
+
+nvme_update_virt_res(n, sctrl, rt, nr);
+req->cqe.result = cpu_to_le32(nr);
+
+return req->status;
+}
+
+static uint16_t nvme_virt_set_state(NvmeCtrl *n, uint16_t cntlid, bool online)
+{
+NvmeCtrl *sn = NULL;
+NvmeSecCtrlEntry *sctrl;
+int vf_index;
+
+sctrl = nvme_sctrl_for_cntlid(n, cntlid);
+if (!sctrl) {
+return NVME_INVALID_CTRL_ID | NVME_DNR;
+}
+
+if (!pci_is_vf(&n->parent_obj)) {
+vf_index = le16_to_cpu(sctrl->vfn) - 1;
+sn = NVME(pcie_sriov_get_vf_at_index(&n->parent_obj, vf_index));
+}
+
+if (online) {
+if (!sctrl->nvi || (le16_to_cpu(sctrl->nvq) < 2) || !sn) {
+return NVME_INVALID_SEC_CTRL_STATE | NVME_DNR;
+}
+
+if 

[PATCH v5 04/15] pcie: Add 1.2 version token for the Power Management Capability

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

Signed-off-by: Łukasz Gieryk 
---
 include/hw/pci/pci_regs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/hw/pci/pci_regs.h b/include/hw/pci/pci_regs.h
index 77ba64b9314..a5901409622 100644
--- a/include/hw/pci/pci_regs.h
+++ b/include/hw/pci/pci_regs.h
@@ -4,5 +4,6 @@
 #include "standard-headers/linux/pci_regs.h"
 
 #define  PCI_PM_CAP_VER_1_1 0x0002  /* PCI PM spec ver. 1.1 */
+#define  PCI_PM_CAP_VER_1_2 0x0003  /* PCI PM spec ver. 1.2 */
 
 #endif
-- 
2.25.1




[PATCH v5 15/15] hw/nvme: Update the initialization place for the AER queue

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

This patch updates the initialization place for the AER queue, so it’s
initialized once, at controller initialization, and not every time the
controller is enabled.

While the original version works for a non-SR-IOV device, as it’s hard
to interact with the controller if it’s not enabled, the multiple
reinitialization is not necessarily correct.

With the SR/IOV feature enabled, a segfault can happen: a VF can have its
controller disabled, while a namespace can still be attached to the
controller through the parent PF. An event generated in such a case ends
up on an uninitialized queue.

While it’s an interesting question whether a VF should support AER in
the first place, I don’t think it must be answered today.

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index a9742cf5051..ae41fced596 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6182,8 +6182,6 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
 nvme_set_timestamp(n, 0ULL);
 
-QTAILQ_INIT(&n->aer_queue);
-
 nvme_select_iocs(n);
 
 return 0;
@@ -6844,6 +6842,7 @@ static void nvme_init_state(NvmeCtrl *n)
 n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+QTAILQ_INIT(&n->aer_queue);
 
 list->numcntl = cpu_to_le16(max_vfs);
 for (i = 0; i < max_vfs; i++) {
-- 
2.25.1




Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-17 Thread Paolo Bonzini

On 1/28/22 16:47, Stefan Hajnoczi wrote:

Dear QEMU, KVM, and rust-vmm communities,
QEMU will apply for Google Summer of Code 2022
(https://summerofcode.withgoogle.com/) and has been accepted into
Outreachy May-August 2022 (https://www.outreachy.org/). You can now
submit internship project ideas for QEMU, KVM, and rust-vmm!

If you have experience contributing to QEMU, KVM, or rust-vmm you can
be a mentor. It's a great way to give back and you get to work with
people who are just starting out in open source.

Please reply to this email by February 21st with your project ideas.


I would like to co-mentor one or more projects about adding more 
statistics to Mark Kanda's newly-born introspectable statistics 
subsystem in QEMU 
(https://patchew.org/QEMU/20220215150433.2310711-1-mark.ka...@oracle.com/), 
for example integrating "info blockstats"; and/or, to add matching 
functionality to libvirt.


However, I will only be available for co-mentoring unfortunately.

Paolo


Good project ideas are suitable for remote work by a competent
programmer who is not yet familiar with the codebase. In
addition, they are:
- Well-defined - the scope is clear
- Self-contained - there are few dependencies
- Uncontroversial - they are acceptable to the community
- Incremental - they produce deliverables along the way

Feel free to post ideas even if you are unable to mentor the project.
It doesn't hurt to share the idea!

I will review project ideas and keep you up-to-date on QEMU's
acceptance into GSoC.

Internship program details:
- Paid, remote work open source internships
- GSoC projects are 175 or 350 hours, Outreachy projects are 30
hrs/week for 12 weeks
- Mentored by volunteers from QEMU, KVM, and rust-vmm
- Mentors typically spend at least 5 hours per week during the coding period

Changes since last year: GSoC now has 175 or 350 hour project sizes
instead of 12 week full-time projects. GSoC will accept applicants who
are not students; previously it was limited to students.

For more background on QEMU internships, check out this video:
https://www.youtube.com/watch?v=xNVCX7YMUL8

Please let me know if you have any questions!

Stefan






[PATCH v5 07/15] hw/nvme: Add support for Secondary Controller List

2022-02-17 Thread Lukasz Maniak
Introduce handling for Secondary Controller List (Identify command with
CNS value of 15h).

Secondary controller IDs are unique within the subsystem, so the subsystem
reserves sriov_max_vfs of them when the primary controller is
initialized.

ID reservation requires the addition of an intermediate controller slot
state, so the reserved controller has the address 0x.
A secondary controller is in the reserved state when it has no virtual
function assigned, but its primary controller is realized.
Secondary controller reservations are released to NULL when its primary
controller is unregistered.
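
The three slot states described above (free, reserved, occupied) can be
modelled with a sentinel pointer. The sketch below is an assumption-laden
illustration: the struct, the array size and the helper names are invented,
and it uses the address of a private object as the sentinel instead of the
fixed address used by the patch.

```c
#include <stddef.h>

#define SKETCH_MAX_CTRLS 32 /* assumed slot count, for the example only */

/* The address of this object marks a reserved slot; it is never a device. */
static char sketch_rsvd_marker;
#define SKETCH_SLOT_RSVD ((void *)&sketch_rsvd_marker)

struct sketch_subsys {
    void *ctrls[SKETCH_MAX_CTRLS]; /* NULL = free, RSVD = reserved */
};

/* Returns the controller in the slot, hiding reserved slots from callers. */
void *sketch_subsys_ctrl(const struct sketch_subsys *s, unsigned cntlid)
{
    if (cntlid >= SKETCH_MAX_CTRLS) {
        return NULL;
    }
    if (s->ctrls[cntlid] == SKETCH_SLOT_RSVD) {
        return NULL;
    }
    return s->ctrls[cntlid];
}

/* A reserved slot is not free, even though lookups return NULL for it. */
int sketch_slot_is_free(const struct sketch_subsys *s, unsigned cntlid)
{
    return cntlid < SKETCH_MAX_CTRLS && s->ctrls[cntlid] == NULL;
}
```

The point of the intermediate state is visible in the two helpers: a lookup
treats a reserved slot like an empty one, while ID allocation does not.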

Signed-off-by: Lukasz Maniak 
---
 hw/nvme/ctrl.c   | 35 +
 hw/nvme/ns.c |  2 +-
 hw/nvme/nvme.h   | 18 +++
 hw/nvme/subsys.c | 75 ++--
 hw/nvme/trace-events |  1 +
 include/block/nvme.h | 20 
 6 files changed, 141 insertions(+), 10 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 0bd55948ce1..05acd681656 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4705,6 +4705,29 @@ static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, NvmeRequest *req)
 sizeof(NvmePriCtrlCap), req);
 }
 
+static uint16_t nvme_identify_sec_ctrl_list(NvmeCtrl *n, NvmeRequest *req)
+{
+NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+uint16_t pri_ctrl_id = le16_to_cpu(n->pri_ctrl_cap.cntlid);
+uint16_t min_id = le16_to_cpu(c->ctrlid);
+uint8_t num_sec_ctrl = n->sec_ctrl_list.numcntl;
+NvmeSecCtrlList list = {0};
+uint8_t i;
+
+for (i = 0; i < num_sec_ctrl; i++) {
+if (n->sec_ctrl_list.sec[i].scid >= min_id) {
+list.numcntl = num_sec_ctrl - i;
+memcpy(&list.sec, n->sec_ctrl_list.sec + i,
+   list.numcntl * sizeof(NvmeSecCtrlEntry));
+break;
+}
+}
+
+trace_pci_nvme_identify_sec_ctrl_list(pri_ctrl_id, list.numcntl);
+
+return nvme_c2h(n, (uint8_t *)&list, sizeof(list), req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
  bool active)
 {
@@ -4925,6 +4948,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest *req)
 return nvme_identify_ctrl_list(n, req, false);
 case NVME_ID_CNS_PRIMARY_CTRL_CAP:
 return nvme_identify_pri_ctrl_cap(n, req);
+case NVME_ID_CNS_SECONDARY_CTRL_LIST:
+return nvme_identify_sec_ctrl_list(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
@@ -6476,6 +6501,9 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 static void nvme_init_state(NvmeCtrl *n)
 {
 NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
+NvmeSecCtrlList *list = &n->sec_ctrl_list;
+NvmeSecCtrlEntry *sctrl;
+int i;
 
 /* add one to max_ioqpairs to account for the admin queue pair */
 n->reg_size = pow2ceil(sizeof(NvmeBar) +
@@ -6487,6 +6515,13 @@ static void nvme_init_state(NvmeCtrl *n)
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
 
+list->numcntl = cpu_to_le16(n->params.sriov_max_vfs);
+for (i = 0; i < n->params.sriov_max_vfs; i++) {
+sctrl = &list->sec[i];
+sctrl->pcid = cpu_to_le16(n->cntlid);
+sctrl->vfn = cpu_to_le16(i + 1);
+}
+
 cap->cntlid = cpu_to_le16(n->cntlid);
 }
 
diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index ee673f1a5be..d42fba117f1 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -567,7 +567,7 @@ static void nvme_ns_realize(DeviceState *dev, Error **errp)
 for (i = 0; i < ARRAY_SIZE(subsys->ctrls); i++) {
 NvmeCtrl *ctrl = subsys->ctrls[i];
 
-if (ctrl) {
+if (ctrl && ctrl != SUBSYS_SLOT_RSVD) {
 nvme_attach_ns(ctrl, ns);
 }
 }
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 2db48eb25c9..f4494e5236f 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -43,6 +43,7 @@ typedef struct NvmeBus {
 #define TYPE_NVME_SUBSYS "nvme-subsys"
 #define NVME_SUBSYS(obj) \
 OBJECT_CHECK(NvmeSubsystem, (obj), TYPE_NVME_SUBSYS)
+#define SUBSYS_SLOT_RSVD (void *)0x
 
 typedef struct NvmeSubsystem {
 DeviceState parent_obj;
@@ -67,6 +68,10 @@ static inline NvmeCtrl *nvme_subsys_ctrl(NvmeSubsystem *subsys,
 return NULL;
 }
 
+if (subsys->ctrls[cntlid] == SUBSYS_SLOT_RSVD) {
+return NULL;
+}
+
 return subsys->ctrls[cntlid];
 }
 
@@ -473,6 +478,7 @@ typedef struct NvmeCtrl {
 } features;
 
 NvmePriCtrlCap  pri_ctrl_cap;
+NvmeSecCtrlList sec_ctrl_list;
 } NvmeCtrl;
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
@@ -507,6 +513,18 @@ static inline uint16_t nvme_cid(NvmeRequest *req)
 return le16_to_cpu(req->cqe.cid);
 }
 
+static inline NvmeSecCtrlEntry *nvme_sctrl(NvmeCtrl *n)
+{
+PCIDevice *pci_dev = &n->parent_obj;
+NvmeCtrl 

[PATCH v5 11/15] hw/nvme: Calculate BAR attributes in a function

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

An NVMe device with SR-IOV capability calculates the BAR size
differently for PF and VF, so it makes sense to extract the common code
to a separate function.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 45 +++--
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 6abec8e4369..73707565345 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6584,6 +6584,34 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
 memory_region_set_enabled(&n->pmr.dev->mr, false);
 }
 
+static uint64_t nvme_bar_size(unsigned total_queues, unsigned total_irqs,
+  unsigned *msix_table_offset,
+  unsigned *msix_pba_offset)
+{
+uint64_t bar_size, msix_table_size, msix_pba_size;
+
+bar_size = sizeof(NvmeBar) + 2 * total_queues * NVME_DB_SIZE;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+if (msix_table_offset) {
+*msix_table_offset = bar_size;
+}
+
+msix_table_size = PCI_MSIX_ENTRY_SIZE * total_irqs;
+bar_size += msix_table_size;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
+
+if (msix_pba_offset) {
+*msix_pba_offset = bar_size;
+}
+
+msix_pba_size = QEMU_ALIGN_UP(total_irqs, 64) / 8;
+bar_size += msix_pba_size;
+
+bar_size = pow2ceil(bar_size);
+return bar_size;
+}
+
 static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
 uint64_t bar_size)
 {
@@ -6623,7 +6651,7 @@ static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
-uint64_t bar_size, msix_table_size, msix_pba_size;
+uint64_t bar_size;
 unsigned msix_table_offset, msix_pba_offset;
 int ret;
 
@@ -6649,19 +6677,8 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
-bar_size = sizeof(NvmeBar) +
-   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
-bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-msix_table_offset = bar_size;
-msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;
-
-bar_size += msix_table_size;
-bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
-msix_pba_offset = bar_size;
-msix_pba_size = QEMU_ALIGN_UP(n->params.msix_qsize, 64) / 8;
-
-bar_size += msix_pba_size;
-bar_size = pow2ceil(bar_size);
+bar_size = nvme_bar_size(n->params.max_ioqpairs + 1, n->params.msix_qsize,
+ &msix_table_offset, &msix_pba_offset);
 
 memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-- 
2.25.1
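
To make the layout computed by the new helper easier to follow, here is a
standalone sketch of the same steps. The constants are assumptions standing in
for sizeof(NvmeBar), NVME_DB_SIZE and PCI_MSIX_ENTRY_SIZE, and the helper
names are invented for the example.

```c
#include <stdint.h>

/* Assumed values for illustration; QEMU takes them from its own headers. */
#define SKETCH_REGS_SIZE       4096u /* stands in for sizeof(NvmeBar) */
#define SKETCH_DB_SIZE         4u    /* stands in for NVME_DB_SIZE */
#define SKETCH_MSIX_ENTRY_SIZE 16u   /* stands in for PCI_MSIX_ENTRY_SIZE */
#define SKETCH_PAGE            4096u /* the 4 KiB alignment used above */

/* Round v up to the next multiple of a. */
uint64_t sketch_align_up(uint64_t v, uint64_t a)
{
    return (v + a - 1) / a * a;
}

/* Round v up to the next power of two. */
uint64_t sketch_pow2ceil(uint64_t v)
{
    uint64_t p = 1;

    while (p < v) {
        p <<= 1;
    }
    return p;
}

uint64_t sketch_bar_size(unsigned total_queues, unsigned total_irqs,
                         unsigned *msix_table_offset,
                         unsigned *msix_pba_offset)
{
    /* Registers, then one SQ and one CQ doorbell per queue pair. */
    uint64_t size = SKETCH_REGS_SIZE +
                    2 * (uint64_t)total_queues * SKETCH_DB_SIZE;

    size = sketch_align_up(size, SKETCH_PAGE);
    if (msix_table_offset) {
        *msix_table_offset = (unsigned)size;
    }

    size += (uint64_t)SKETCH_MSIX_ENTRY_SIZE * total_irqs;
    size = sketch_align_up(size, SKETCH_PAGE);
    if (msix_pba_offset) {
        *msix_pba_offset = (unsigned)size;
    }

    /* One pending bit per vector, rounded up to whole 64-bit words. */
    size += sketch_align_up(total_irqs, 64) / 8;

    return sketch_pow2ceil(size);
}
```

Under these assumed constants, 65 queues and 65 vectors place the MSI-X table
at 8 KiB and the PBA at 12 KiB, and the BAR rounds up to 16 KiB.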




[PATCH v5 02/15] pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt

2022-02-17 Thread Lukasz Maniak
From: Knut Omang 

Add a small intro + minimal documentation for how to
implement SR/IOV support for an emulated device.

Signed-off-by: Knut Omang 
---
 docs/pcie_sriov.txt | 115 
 1 file changed, 115 insertions(+)
 create mode 100644 docs/pcie_sriov.txt

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
new file mode 100644
index 000..f5e891e1d45
--- /dev/null
+++ b/docs/pcie_sriov.txt
@@ -0,0 +1,115 @@
+PCI SR/IOV EMULATION SUPPORT
+
+
+Description
+===========
+SR/IOV (Single Root I/O Virtualization) is an optional extended capability
+of a PCI Express device. It allows a single physical function (PF) to appear as multiple
+virtual functions (VFs) for the main purpose of eliminating software
+overhead in I/O from virtual machines.
+
+QEMU now implements the basic common functionality to enable an emulated device
+to support SR/IOV. No fully implemented device exists in QEMU yet, but a
+proof-of-concept hack of the Intel igb can be found here:
+
+git://github.com/knuto/qemu.git sriov_patches_v5
+
+Implementation
+==============
+Implementing emulation of an SR/IOV capable device typically consists of
+implementing support for two types of device classes: the "normal" physical device
+(PF) and the virtual device (VF). From Qemu's perspective, the VFs are just
+like other devices, except that some of their properties are derived from
+the PF.
+
+A virtual function is different from a physical function in that the BAR
+space for all VFs is defined by the BAR registers in the PF's SR/IOV
+capability. All VFs have the same BARs and BAR sizes.
+
+The address of an access to one of these virtual BARs is then computed as
+
+   <VF BAR start> + <VF number> * <BAR sz> + <offset>
+
+From our emulation perspective this means that there is a separate call for
+setting up a BAR for a VF.
+
+1) To enable SR/IOV support in the PF, it must be a PCI Express device so
+   you would need to add a PCI Express capability in the normal PCI
+   capability list. You might also want to add an ARI (Alternative
+   Routing-ID Interpretation) capability to indicate that your device
+   supports functions beyond its "own" function space (0-7),
+   which is necessary to support more than 7 functions, or
+   if functions extend beyond offset 7 because they are placed at an
+   offset > 1 or have stride > 1.
+
+   ...
+   #include "hw/pci/pcie.h"
+   #include "hw/pci/pcie_sriov.h"
+
+   pci_your_pf_dev_realize( ... )
+   {
+  ...
+  int ret = pcie_endpoint_cap_init(d, 0x70);
+  ...
+  pcie_ari_init(d, 0x100, 1);
+  ...
+
+  /* Add and initialize the SR/IOV capability */
+  pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+   vf_devid, initial_vfs, total_vfs,
+   fun_offset, stride);
+
+  /* Set up individual VF BARs (parameters as for normal BARs) */
+  pcie_sriov_pf_init_vf_bar( ... )
+  ...
+   }
+
+   For cleanup, you simply call:
+
+  pcie_sriov_pf_exit(device);
+
+   which will delete all the virtual functions and associated resources.
+
+2) Similarly in the implementation of the virtual function, you need to
+   make it a PCI Express device and add a similar set of capabilities
+   except for the SR/IOV capability. Then you need to set up the VF BARs as
+   subregions of the PF's SR/IOV VF BARs by calling
+   pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call:
+
+   pci_your_vf_dev_realize( ... )
+   {
+  ...
+  int ret = pcie_endpoint_cap_init(d, 0x60);
+  ...
+  pcie_ari_init(d, 0x100, 1);
+  ...
+  memory_region_init(mr, ... )
+  pcie_sriov_vf_register_bar(d, bar_nr, mr);
+  ...
+   }
+
+Testing on Linux guest
+======================
+The easiest case is when your device driver supports sysfs-based SR/IOV
+enabling. Support for this was added in kernel v3.8, so not all drivers
+support it yet.
+
+To enable 4 VFs for a device at 01:00.0:
+
+   modprobe yourdriver
+   echo 4 > /sys/bus/pci/devices/:01:00.0/sriov_numvfs
+
+You should now see 4 VFs with lspci.
+To turn SR/IOV off again - the standard requires you to turn it off before you can enable
+another VF count, and the emulation enforces this:
+
+   echo 0 > /sys/bus/pci/devices/:01:00.0/sriov_numvfs
+
+Older drivers typically provide a max_vfs module parameter
+to enable it at load time:
+
+   modprobe yourdriver max_vfs=4
+
+To disable the VFs again, simply unload the driver:
+
+   rmmod yourdriver
-- 
2.25.1
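
Since all VFs share one BAR size and their BARs are laid out back to back, the
guest address of an access inside a VF BAR follows from simple arithmetic. The
sketch below assumes zero-based VF numbering and uses hypothetical names; it
is not part of the QEMU API.

```c
#include <stdint.h>

/*
 * vf_bar_start: base programmed into the VF BAR of the PF's SR/IOV capability
 * vf_index:     zero-based VF number (an assumption of this sketch)
 * bar_size:     size of this BAR, identical for every VF
 * offset:       offset of the access within the VF's own BAR
 */
uint64_t sketch_vf_bar_addr(uint64_t vf_bar_start, unsigned vf_index,
                            uint64_t bar_size, uint64_t offset)
{
    return vf_bar_start + (uint64_t)vf_index * bar_size + offset;
}
```

For example, with a base of 0x10000000 and a 16 KiB BAR, offset 0x10 in the
BAR of the fourth VF (index 3) ends up at 0x1000c010.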




[PATCH v5 12/15] hw/nvme: Initialize capability structures for primary/secondary controllers

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

With four new properties:
 - sriov_v{i,q}_flexible,
 - sriov_max_v{i,q}_per_vf,
one can configure the number of available flexible resources, as well as
the limits. The primary and secondary controller capability structures
are initialized accordingly.

Since the number of available queues (interrupts) now varies between
VF/PF, BAR size calculation is also adjusted.
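
The constraints this patch places on the new properties can be summarized in a
small standalone check. This is a sketch only: the struct and function are
hypothetical, the messages are paraphrased, and the per-VF limit checks are
left out.

```c
#include <stddef.h>

struct sketch_sriov_params {
    unsigned max_ioqpairs;
    unsigned msix_qsize;
    unsigned sriov_max_vfs;
    unsigned sriov_vq_flexible;
    unsigned sriov_vi_flexible;
};

/* Returns NULL when the parameters are consistent, else a short reason. */
const char *sketch_check_sriov(const struct sketch_sriov_params *p)
{
    if (!p->sriov_vq_flexible || !p->sriov_vi_flexible) {
        return "both flexible resource counts must be set";
    }
    /* Every VF needs at least two queues: admin plus one I/O queue. */
    if (p->sriov_vq_flexible < p->sriov_max_vfs * 2) {
        return "sriov_vq_flexible < sriov_max_vfs * 2";
    }
    /* The PF keeps max_ioqpairs - sriov_vq_flexible private queues. */
    if (p->max_ioqpairs < p->sriov_vq_flexible + 2) {
        return "fewer than 2 PF-private queue resources";
    }
    /* Every VF needs at least one interrupt vector. */
    if (p->sriov_vi_flexible < p->sriov_max_vfs) {
        return "sriov_vi_flexible < sriov_max_vfs";
    }
    /* The PF keeps msix_qsize - sriov_vi_flexible private vectors. */
    if (p->msix_qsize < p->sriov_vi_flexible + 1) {
        return "fewer than 1 PF-private interrupt resource";
    }
    return NULL;
}
```

As an example, max_ioqpairs=64, msix_qsize=65, sriov_max_vfs=4,
sriov_vq_flexible=8 and sriov_vi_flexible=4 pass all five checks.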

Signed-off-by: Łukasz Gieryk 
---
 hw/nvme/ctrl.c   | 142 ---
 hw/nvme/nvme.h   |   4 ++
 include/block/nvme.h |   5 ++
 3 files changed, 144 insertions(+), 7 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 73707565345..2a6a36e733d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -36,6 +36,10 @@
  *  zoned.zasl=, \
  *  zoned.auto_transition=, \
  *  sriov_max_vfs= \
+ *  sriov_vq_flexible= \
+ *  sriov_vi_flexible= \
+ *  sriov_max_vi_per_vf= \
+ *  sriov_max_vq_per_vf= \
  *  subsys=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -113,6 +117,29 @@
  *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
  *   Virtual function controllers will not report SR-IOV capability.
  *
+ *   NOTE: Single Root I/O Virtualization support is experimental.
+ *   All the related parameters may be subject to change.
+ *
+ * - `sriov_vq_flexible`
+ *   Indicates the total number of flexible queue resources assignable to all
+ *   the secondary controllers. Implicitly sets the number of primary
+ *   controller's private resources to `(max_ioqpairs - sriov_vq_flexible)`.
+ *
+ * - `sriov_vi_flexible`
+ *   Indicates the total number of flexible interrupt resources assignable to
+ *   all the secondary controllers. Implicitly sets the number of primary
+ *   controller's private resources to `(msix_qsize - sriov_vi_flexible)`.
+ *
+ * - `sriov_max_vi_per_vf`
+ *   Indicates the maximum number of virtual interrupt resources assignable
+ *   to a secondary controller. The default 0 resolves to
+ *   `(sriov_vi_flexible / sriov_max_vfs)`.
+ *
+ * - `sriov_max_vq_per_vf`
+ *   Indicates the maximum number of virtual queue resources assignable to
+ *   a secondary controller. The default 0 resolves to
+ *   `(sriov_vq_flexible / sriov_max_vfs)`.
+ *
  * nvme namespace device parameters
  * 
  * - `shared`
@@ -184,6 +211,7 @@
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 #define NVME_MAX_VFS 127
+#define NVME_VF_RES_GRANULARITY 1
 #define NVME_VF_OFFSET 0x1
 #define NVME_VF_STRIDE 1
 
@@ -6512,6 +6540,54 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
 error_setg(errp, "PMR is not supported with SR-IOV");
 return;
 }
+
+if (!params->sriov_vq_flexible || !params->sriov_vi_flexible) {
+error_setg(errp, "both sriov_vq_flexible and sriov_vi_flexible"
+   " must be set for the use of SR-IOV");
+return;
+}
+
+if (params->sriov_vq_flexible < params->sriov_max_vfs * 2) {
+error_setg(errp, "sriov_vq_flexible must be greater than or equal"
+   " to %d (sriov_max_vfs * 2)", params->sriov_max_vfs * 2);
+return;
+}
+
+if (params->max_ioqpairs < params->sriov_vq_flexible + 2) {
+error_setg(errp, "max_ioqpairs - sriov_vq_flexible (PF-private"
+   " queue resources) must be greater than or equal to 2");
+return;
+}
+
+if (params->sriov_vi_flexible < params->sriov_max_vfs) {
+error_setg(errp, "sriov_vi_flexible must be greater than or equal"
+   " to %d (sriov_max_vfs)", params->sriov_max_vfs);
+return;
+}
+
+if (params->msix_qsize < params->sriov_vi_flexible + 1) {
+error_setg(errp, "msix_qsize - sriov_vi_flexible (PF-private"
+   " interrupt resources) must be greater than or equal"
+   " to 1");
+return;
+}
+
+if (params->sriov_max_vi_per_vf &&
+(params->sriov_max_vi_per_vf - 1) % NVME_VF_RES_GRANULARITY) {
+error_setg(errp, "sriov_max_vi_per_vf must meet:"
+   " (X - 1) %% %d == 0 and X >= 1",
+   NVME_VF_RES_GRANULARITY);
+return;
+}
+
+if (params->sriov_max_vq_per_vf &&
+(params->sriov_max_vq_per_vf < 2 ||
+ (params->sriov_max_vq_per_vf - 1) % NVME_VF_RES_GRANULARITY)) {
+error_setg(errp, "sriov_max_vq_per_vf must meet:"
+   " (X - 1) %% %d == 0 and X >= 2",
+   NVME_VF_RES_GRANULARITY);
+return;
+}
 }
 }
 
@@ -6520,10 +6596,19 @@ static void nvme_init_state(NvmeCtrl *n)
 NvmePriCtrlCap *cap = >pri_ctrl_cap;
 

[PATCH v5 03/15] pcie: Add a helper to the SR/IOV API

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

Convenience function for retrieving the PCIDevice object of the N-th VF.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Knut Omang 
---
 hw/pci/pcie_sriov.c | 10 +-
 include/hw/pci/pcie_sriov.h |  6 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 3f256d483fa..87abad6ac86 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -287,8 +287,16 @@ uint16_t pcie_sriov_vf_number(PCIDevice *dev)
 return dev->exp.sriov_vf.vf_number;
 }
 
-
 PCIDevice *pcie_sriov_get_pf(PCIDevice *dev)
 {
 return dev->exp.sriov_vf.pf;
 }
+
+PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
+{
+assert(!pci_is_vf(dev));
+if (n < dev->exp.sriov_pf.num_vfs) {
+return dev->exp.sriov_pf.vf[n];
+}
+return NULL;
+}
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 990cff0a1c6..80f5c84e75c 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -68,4 +68,10 @@ uint16_t pcie_sriov_vf_number(PCIDevice *dev);
  */
 PCIDevice *pcie_sriov_get_pf(PCIDevice *dev);
 
+/*
+ * Get the n-th VF of this physical function - only valid for PF.
+ * Returns NULL if index is invalid
+ */
+PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n);
+
 #endif /* QEMU_PCIE_SRIOV_H */
-- 
2.25.1
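
The behaviour of the new helper can be shown with a reduced model. The struct
and function below are hypothetical stand-ins for PCIDevice and the helper;
note that the extra check for a negative index is an addition of this sketch
and is not present in the patch.

```c
#include <stddef.h>

/* Minimal model of the PF-side state: a count and an array of VF objects. */
struct sketch_pf {
    int num_vfs;
    void **vf;
};

/* Returns the n-th VF, or NULL when the index is out of range. */
void *sketch_vf_at_index(const struct sketch_pf *pf, int n)
{
    if (n >= 0 && n < pf->num_vfs) {
        return pf->vf[n];
    }
    return NULL;
}
```

Returning NULL instead of asserting on a bad index lets callers treat "VF not
enabled" as a normal condition.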




[PATCH v5 08/15] hw/nvme: Implement the Function Level Reset

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

This patch implements the Function Level Reset, a feature currently not
implemented for the NVMe device, although it is listed as mandatory ("shall")
in the 1.4 spec.

The implementation reuses FLR-related building blocks defined for the
pci-bridge module, and follows the same logic:
- FLR capability is advertised in the PCIE config,
- custom pci_write_config callback detects a write to the trigger
  register and performs the PCI reset,
- which, eventually, calls the custom dc->reset handler.

Depending on reset type, parts of the state should (or should not) be
cleared. To distinguish the type of reset, an additional parameter is
passed to the reset function.
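
The effect of the extra parameter can be sketched with a reduced model of the
controller. The enum and struct below are invented for illustration; only the
rule that a controller reset leaves the VFs alone is taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical reset types, modelled on the two used in the diff. */
enum sketch_reset_type {
    SKETCH_RESET_FUNCTION,   /* FLR, PCI reset or device exit */
    SKETCH_RESET_CONTROLLER, /* guest cleared CC.EN */
};

struct sketch_ctrl {
    bool is_vf;
    unsigned sriov_max_vfs;
    bool vfs_enabled;
    bool ctrl_enabled;
};

void sketch_ctrl_reset(struct sketch_ctrl *c, enum sketch_reset_type rst)
{
    /* State that every reset type clears. */
    c->ctrl_enabled = false;

    /* Only a PF has VFs, and a plain controller reset must keep them. */
    if (!c->is_vf && c->sriov_max_vfs &&
        rst != SKETCH_RESET_CONTROLLER) {
        c->vfs_enabled = false;
    }
}
```

Passing the type explicitly keeps a single reset path while letting each
caller state which kind of reset it is performing.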

This patch also enables advertisement of the Power Management PCI
capability. The main reason behind it is to announce the no_soft_reset=1
bit, to signal SR-IOV support where each VF can be reset individually.

The implementation purposely ignores writes to the PMCS.PS register,
as even such naïve behavior is enough to correctly handle the D3->D0
transition.

It’s worth noting that the power state transition back to D3, with
all the corresponding side effects, wasn't and still isn't handled
properly.

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 52 
 hw/nvme/nvme.h   |  5 +
 hw/nvme/trace-events |  1 +
 3 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 05acd681656..7c1dd80f21d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -5757,7 +5757,7 @@ static void nvme_process_sq(void *opaque)
 }
 }
 
-static void nvme_ctrl_reset(NvmeCtrl *n)
+static void nvme_ctrl_reset(NvmeCtrl *n, NvmeResetType rst)
 {
 NvmeNamespace *ns;
 int i;
@@ -5789,7 +5789,9 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 }
 
 if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) {
-pcie_sriov_pf_disable_vfs(&n->parent_obj);
+if (rst != NVME_RESET_CONTROLLER) {
+pcie_sriov_pf_disable_vfs(&n->parent_obj);
+}
 }
 
 n->aer_queued = 0;
@@ -6023,7 +6025,7 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, 
uint64_t data,
 }
 } else if (!NVME_CC_EN(data) && NVME_CC_EN(cc)) {
 trace_pci_nvme_mmio_stopped();
-nvme_ctrl_reset(n);
+nvme_ctrl_reset(n, NVME_RESET_CONTROLLER);
 cc = 0;
 csts &= ~NVME_CSTS_READY;
 }
@@ -6581,6 +6583,28 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice 
*pci_dev, uint16_t offset,
   PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
 }
 
+static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset)
+{
+Error *err = NULL;
+int ret;
+
+ret = pci_add_capability(pci_dev, PCI_CAP_ID_PM, offset,
+ PCI_PM_SIZEOF, &err);
+if (err) {
+error_report_err(err);
+return ret;
+}
+
+pci_set_word(pci_dev->config + offset + PCI_PM_PMC,
+ PCI_PM_CAP_VER_1_2);
+pci_set_word(pci_dev->config + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_NO_SOFT_RESET);
+pci_set_word(pci_dev->wmask + offset + PCI_PM_CTRL,
+ PCI_PM_CTRL_STATE_MASK);
+
+return 0;
+}
+
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -6602,7 +6626,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, 
Error **errp)
 }
 
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
+nvme_add_pm_capability(pci_dev, 0x60);
 pcie_endpoint_cap_init(pci_dev, 0x80);
+pcie_cap_flr_init(pci_dev);
 if (n->params.sriov_max_vfs) {
 pcie_ari_init(pci_dev, 0x100, 1);
 }
@@ -6852,7 +6878,7 @@ static void nvme_exit(PCIDevice *pci_dev)
 NvmeNamespace *ns;
 int i;
 
-nvme_ctrl_reset(n);
+nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
 
 if (n->subsys) {
 for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
@@ -6951,6 +6977,22 @@ static void nvme_set_smart_warning(Object *obj, Visitor 
*v, const char *name,
 }
 }
 
+static void nvme_pci_reset(DeviceState *qdev)
+{
+PCIDevice *pci_dev = PCI_DEVICE(qdev);
+NvmeCtrl *n = NVME(pci_dev);
+
+trace_pci_nvme_pci_reset();
+nvme_ctrl_reset(n, NVME_RESET_FUNCTION);
+}
+
+static void nvme_pci_write_config(PCIDevice *dev, uint32_t address,
+  uint32_t val, int len)
+{
+pci_default_write_config(dev, address, val, len);
+pcie_cap_flr_write_config(dev, address, val, len);
+}
+
 static const VMStateDescription nvme_vmstate = {
 .name = "nvme",
 .unmigratable = 1,
@@ -6962,6 +7004,7 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 PCIDeviceClass *pc = PCI_DEVICE_CLASS(oc);
 
 pc->realize = nvme_realize;
+pc->config_write = nvme_pci_write_config;
 pc->exit = nvme_exit;
 pc->class_id = PCI_CLASS_STORAGE_EXPRESS;

[PATCH v5 10/15] hw/nvme: Remove reg_size variable and update BAR0 size calculation

2022-02-17 Thread Lukasz Maniak
From: Łukasz Gieryk 

The n->reg_size field unnecessarily splits the BAR0 size calculation
into two phases; remove it to simplify the code.

With all the calculations done in one place, the pow2ceil originally
applied to reg_size turns out to be unnecessary. The rounding should
happen as the last step, once the BAR size includes the NVMe registers,
queue registers, and MSIX-related space.

Finally, the size of the mmio memory region is extended to cover the 1st
4KiB padding (see the map below). Access to this range is handled as
interaction with a non-existing queue and generates an error trace, so
behavior is effectively unchanged, while the reg_size variable is no
longer needed.


|       BAR0       |

[Nvme Registers    ]
[Queues            ]
[power-of-2 padding] - removed in this patch
[4KiB padding (1)  ]
[MSIX TABLE        ]
[4KiB padding (2)  ]
[MSIX PBA          ]
[power-of-2 padding]
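
The layout in the map above can be worked through with a small standalone
calculation. This is only a sketch under assumptions: the register block
size (4096), doorbell size (4), MSI-X entry size (16) and the PBA sizing
(one bit per vector, rounded up to 64 bits) are illustrative values chosen
here, and bar0_layout() is a hypothetical helper, not the function in
hw/nvme/ctrl.c.

```c
#include <assert.h>
#include <stdint.h>

#define KIB             1024u
#define REG_BLOCK_SIZE  4096u /* assumed size of the register block */
#define DB_SIZE         4u    /* assumed size of one doorbell */
#define MSIX_ENTRY_SIZE 16u   /* assumed size of one MSI-X table entry */

typedef struct {
    uint64_t msix_table_offset;
    uint64_t msix_pba_offset;
    uint64_t bar_size;
} BarLayout;

uint64_t align_up(uint64_t v, uint64_t a)
{
    return (v + a - 1) / a * a;
}

uint64_t pow2_ceil(uint64_t v)
{
    uint64_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

BarLayout bar0_layout(uint32_t max_ioqpairs, uint32_t msix_qsize)
{
    BarLayout l;
    /* registers + SQ/CQ doorbells; +1 accounts for the admin queue pair */
    uint64_t size = REG_BLOCK_SIZE + 2ull * (max_ioqpairs + 1) * DB_SIZE;

    size = align_up(size, 4 * KIB);            /* 4KiB padding (1) */
    l.msix_table_offset = size;
    size += (uint64_t)MSIX_ENTRY_SIZE * msix_qsize;

    size = align_up(size, 4 * KIB);            /* 4KiB padding (2) */
    l.msix_pba_offset = size;
    size += align_up(msix_qsize, 64) / 8;      /* assumed PBA sizing */

    l.bar_size = pow2_ceil(size);              /* single final rounding */
    return l;
}
```

For example, with max_ioqpairs=64 and msix_qsize=65 the registers and
doorbells take 4616 bytes, the MSI-X table lands at 8192, the PBA at
12288, and the BAR is rounded up to 16384 in the single final step.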

Signed-off-by: Łukasz Gieryk 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 10 +-
 hw/nvme/nvme.h |  1 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index f1b4026e4f8..6abec8e4369 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6525,9 +6525,6 @@ static void nvme_init_state(NvmeCtrl *n)
 n->conf_ioqpairs = n->params.max_ioqpairs;
 n->conf_msix_qsize = n->params.msix_qsize;
 
-/* add one to max_ioqpairs to account for the admin queue pair */
-n->reg_size = pow2ceil(sizeof(NvmeBar) +
-   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
 n->sq = g_new0(NvmeSQueue *, n->params.max_ioqpairs + 1);
 n->cq = g_new0(NvmeCQueue *, n->params.max_ioqpairs + 1);
 n->temperature = NVME_TEMPERATURE;
@@ -6651,7 +6648,10 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pcie_ari_init(pci_dev, 0x100, 1);
 }
 
-bar_size = QEMU_ALIGN_UP(n->reg_size, 4 * KiB);
+/* add one to max_ioqpairs to account for the admin queue pair */
+bar_size = sizeof(NvmeBar) +
+   2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE;
+bar_size = QEMU_ALIGN_UP(bar_size, 4 * KiB);
 msix_table_offset = bar_size;
 msix_table_size = PCI_MSIX_ENTRY_SIZE * n->params.msix_qsize;
 
@@ -6665,7 +6665,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, 
Error **errp)
 
 memory_region_init(&n->bar0, OBJECT(n), "nvme-bar0", bar_size);
 memory_region_init_io(&n->iomem, OBJECT(n), &nvme_mmio_ops, n, "nvme",
-  n->reg_size);
+  msix_table_offset);
 memory_region_add_subregion(&n->bar0, 0, &n->iomem);
 
 if (pci_is_vf(pci_dev)) {
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 314a2894759..86b5b321331 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -424,7 +424,6 @@ typedef struct NvmeCtrl {
 uint16_t max_prp_ents;
 uint16_t cqe_size;
 uint16_t sqe_size;
-uint32_t reg_size;
 uint32_t max_q_ents;
 uint8_t outstanding_aers;
 uint32_t irq_status;
-- 
2.25.1




[PATCH v5 05/15] hw/nvme: Add support for SR-IOV

2022-02-17 Thread Lukasz Maniak
This patch implements initial support for Single Root I/O Virtualization
on an NVMe device.

Essentially, it allows defining the maximum number of virtual functions
supported by the NVMe controller via the sriov_max_vfs parameter.

Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
capability by a physical controller and ARI capability by both the
physical and virtual function devices.

NVMe controllers created via virtual functions currently mirror the
physical controller functionally. That may not be entirely appropriate,
so a way to limit the capabilities of the VF still needs to be
considered.

An NVMe subsystem is required for the use of SR-IOV.
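
For readers unfamiliar with how VFs are addressed: per the SR-IOV
specification, each VF's routing ID is derived from the PF's routing ID
plus the First VF Offset and a multiple of the VF Stride. The patch below
uses an offset of 0x1 and a stride of 1. A minimal sketch, where
vf_routing_id() is a hypothetical helper and the VF index is zero-based:

```c
#include <assert.h>
#include <stdint.h>

/* Routing ID of the VF with the given zero-based index (hypothetical
 * helper; 16-bit routing IDs wrap, as they do in config space). */
uint16_t vf_routing_id(uint16_t pf_rid, uint16_t first_vf_offset,
                       uint16_t vf_stride, uint16_t vf_index)
{
    return (uint16_t)(pf_rid + first_vf_offset + vf_index * vf_stride);
}
```

With a PF at routing ID 0x0100, offset 1 and stride 1, the first VF sits
at 0x0101 and the 127th (index 126) at 0x017f.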

Signed-off-by: Lukasz Maniak 
---
 hw/nvme/ctrl.c   | 85 ++--
 hw/nvme/nvme.h   |  3 +-
 include/hw/pci/pci_ids.h |  1 +
 3 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 98aac98bef5..adeba0b2b6d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -35,6 +35,7 @@
  *  mdts=,vsl=, \
  *  zoned.zasl=, \
  *  zoned.auto_transition=, \
+ *  sriov_max_vfs= \
  *  subsys=
  *  -device nvme-ns,drive=,bus=,nsid=,\
  *  zoned=, \
@@ -106,6 +107,12 @@
  *   transitioned to zone state closed for resource management purposes.
  *   Defaults to 'on'.
  *
+ * - `sriov_max_vfs`
+ *   Indicates the maximum number of PCIe virtual functions supported
+ *   by the controller. The default value is 0. Specifying a non-zero value
+ *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
+ *   Virtual function controllers will not report SR-IOV capability.
+ *
  * nvme namespace device parameters
  * 
  * - `shared`
@@ -160,6 +167,7 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/hostmem.h"
 #include "hw/pci/msix.h"
+#include "hw/pci/pcie_sriov.h"
 #include "migration/vmstate.h"
 
 #include "nvme.h"
@@ -175,6 +183,9 @@
 #define NVME_TEMPERATURE_CRITICAL 0x175
 #define NVME_NUM_FW_SLOTS 1
 #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
+#define NVME_MAX_VFS 127
+#define NVME_VF_OFFSET 0x1
+#define NVME_VF_STRIDE 1
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
@@ -5742,6 +5753,10 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 g_free(event);
 }
 
+if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) {
+pcie_sriov_pf_disable_vfs(&n->parent_obj);
+}
+
 n->aer_queued = 0;
 n->outstanding_aers = 0;
 n->qs_created = false;
@@ -6423,6 +6438,29 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 error_setg(errp, "vsl must be non-zero");
 return;
 }
+
+if (params->sriov_max_vfs) {
+if (!n->subsys) {
+error_setg(errp, "subsystem is required for the use of SR-IOV");
+return;
+}
+
+if (params->sriov_max_vfs > NVME_MAX_VFS) {
+error_setg(errp, "sriov_max_vfs must be between 0 and %d",
+   NVME_MAX_VFS);
+return;
+}
+
+if (params->cmb_size_mb) {
+error_setg(errp, "CMB is not supported with SR-IOV");
+return;
+}
+
+if (n->pmr.dev) {
+error_setg(errp, "PMR is not supported with SR-IOV");
+return;
+}
+}
 }
 
 static void nvme_init_state(NvmeCtrl *n)
@@ -6480,6 +6518,20 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice 
*pci_dev)
 memory_region_set_enabled(&n->pmr.dev->mr, false);
 }
 
+static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
+uint64_t bar_size)
+{
+uint16_t vf_dev_id = n->params.use_intel_id ?
+ PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
+
+pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
+   n->params.sriov_max_vfs, n->params.sriov_max_vfs,
+   NVME_VF_OFFSET, NVME_VF_STRIDE);
+
+pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY |
+  PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size);
+}
+
 static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp)
 {
 uint8_t *pci_conf = pci_dev->config;
@@ -6494,7 +6546,7 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, 
Error **errp)
 
 if (n->params.use_intel_id) {
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_INTEL);
-pci_config_set_device_id(pci_conf, 0x5845);
+pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_INTEL_NVME);
 } else {
 pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT);
 pci_config_set_device_id(pci_conf, PCI_DEVICE_ID_REDHAT_NVME);
@@ -6502,6 +6554,9 @@ static int nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, 
Error **errp)
 
 pci_config_set_class(pci_conf, PCI_CLASS_STORAGE_EXPRESS);
 pcie_endpoint_cap_init(pci_dev, 0x80);
+if 

Re: [PATCH v2 00/12] GL & D-Bus display related fixes

2022-02-17 Thread Akihiko Odaki
On Fri, Feb 18, 2022 at 2:36 AM Marc-André Lureau
 wrote:
>
> Hi
>
> On Thu, Feb 17, 2022 at 9:25 PM Akihiko Odaki  wrote:
>>
>> On Fri, Feb 18, 2022 at 2:07 AM Marc-André Lureau
>>  wrote:
>> >
>> > Hi
>> >
>> > On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki  
>> > wrote:
>> >>
>> >> On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau
>> >>  wrote:
>> >> >
>> >> > Hi
>> >> >
>> >> > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki  
>> >> > wrote:
>> >> >>
>> >> >> On Thu, Feb 17, 2022 at 8:58 PM  wrote:
>> >> >> >
>> >> >> > From: Marc-André Lureau 
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a 
>> >> >> > console", Akihiko
>> >> >> > Odaki reported a number of issues with the GL and D-Bus display. His 
>> >> >> > series
>> >> >> > propose a different design, and reverting some of my previous 
>> >> >> > generic console
>> >> >> > changes to fix those issues.
>> >> >> >
>> >> >> > However, as I work through the issue so far, they can be solved by 
>> >> >> > reasonable
>> >> >> > simple fixes while keeping the console changes generic (not specific 
>> >> >> > to the
> >> >> >> > D-Bus display backend). I believe a shared infrastructure is more 
>> >> >> > beneficial long
>> >> >> > term than having GL-specific code in the DBus code (in particular, 
>> >> >> > the
>> >> >> > egl-headless & VNC combination could potentially use it)
>> >> >> >
>> >> >> > Thanks a lot Akihiko for reporting the issues proposing a different 
>> >> >> > approach!
>> >> >> > Please test this alternative series and let me know of any further 
>> >> >> > issues. My
>> >> >> > understanding is that you are mainly concerned with the Cocoa 
>> >> >> > backend, and I
>> >> >> > don't have a way to test it, so please check it. If necessary, we 
>> >> >> > may well have
>> >> >> > to revert my earlier changes and go your way, eventually.
>> >> >> >
>> >> >> > Marc-André Lureau (12):
>> >> >> >   ui/console: fix crash when using gl context with non-gl listeners
>> >> >> >   ui/console: fix texture leak when calling 
>> >> >> > surface_gl_create_texture()
>> >> >> >   ui: do not create a surface when resizing a GL scanout
>> >> >> >   ui/console: move check for compatible GL context
>> >> >> >   ui/console: move dcl compatiblity check to a callback
>> >> >> >   ui/console: egl-headless is compatible with non-gl listeners
>> >> >> >   ui/dbus: associate the DBusDisplayConsole listener with the given
>> >> >> > console
>> >> >> >   ui/console: move console compatibility check to 
>> >> >> > dcl_display_console()
>> >> >> >   ui/shader: fix potential leak of shader on error
>> >> >> >   ui/shader: free associated programs
>> >> >> >   ui/console: add a dpy_gfx_switch callback helper
>> >> >> >   ui/dbus: fix texture sharing
>> >> >> >
>> >> >> >  include/ui/console.h |  19 ---
>> >> >> >  ui/dbus.h|   3 ++
>> >> >> >  ui/console-gl.c  |   4 ++
>> >> >> >  ui/console.c | 119 
>> >> >> > ++-
>> >> >> >  ui/dbus-console.c|  27 +-
>> >> >> >  ui/dbus-listener.c   |  11 
>> >> >> >  ui/dbus.c|  33 +++-
>> >> >> >  ui/egl-headless.c|  17 ++-
>> >> >> >  ui/gtk.c |  18 ++-
>> >> >> >  ui/sdl2.c|   9 +++-
>> >> >> >  ui/shader.c  |   9 +++-
>> >> >> >  ui/spice-display.c   |   9 +++-
>> >> >> >  12 files changed, 192 insertions(+), 86 deletions(-)
>> >> >> >
>> >> >> > --
>> >> >> > 2.34.1.428.gdcc0cd074f0c
>> >> >> >
>> >> >> >
>> >> >>
>> >> >> You missed only one thing:
>> >> >> >- that console_select and register_displaychangelistener may not call
>> >> >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is
>> >> >> > incompatible with non-OpenGL displays
>> >> >>
>> >> >> displaychangelistener_display_console always has to call
>> >> >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't.
>> >> >
>> >> >
>> >> > Ok, would that be what you have in mind?
>> >> >
>> >> >  --- a/ui/console.c
>> >> > +++ b/ui/console.c
>> >> > @@ -1122,6 +1122,10 @@ static void 
>> >> > displaychangelistener_display_console(DisplayChangeListener *dcl,
>> >> >  } else if (con->scanout.kind == SCANOUT_SURFACE) {
>> >> >  dpy_gfx_create_texture(con, con->surface);
>> >> >  displaychangelistener_gfx_switch(dcl, con->surface);
>> >> > +} else {
>> >> > +/* use the fallback surface, egl-headless keeps it updated */
>> >> > +assert(con->surface);
>> >> > +displaychangelistener_gfx_switch(dcl, con->surface);
>> >> >  }
>> >>
>> >> It should call displaychangelistener_gfx_switch even when e.g.
>> >> con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content
>> >> to the last DisplaySurface it received while con->scanout.kind ==
>> >> SCANOUT_TEXTURE.
>> >
>> >
>> > I see, egl-headless is really not a "listener".
>> >
>> >>
>> >> >
>> >> > I wish such egl-headless specific code would be 

[PATCH v5 01/15] pcie: Add support for Single Root I/O Virtualization (SR/IOV)

2022-02-17 Thread Lukasz Maniak
From: Knut Omang 

This patch provides the building blocks for creating an SR/IOV
PCIe Extended Capability header and register/unregister
SR/IOV Virtual Functions.

Signed-off-by: Knut Omang 
---
 hw/pci/meson.build  |   1 +
 hw/pci/pci.c| 100 +---
 hw/pci/pcie.c   |   5 +
 hw/pci/pcie_sriov.c | 294 
 hw/pci/trace-events |   5 +
 include/hw/pci/pci.h|  12 +-
 include/hw/pci/pcie.h   |   6 +
 include/hw/pci/pcie_sriov.h |  71 +
 include/qemu/typedefs.h |   2 +
 9 files changed, 470 insertions(+), 26 deletions(-)
 create mode 100644 hw/pci/pcie_sriov.c
 create mode 100644 include/hw/pci/pcie_sriov.h

diff --git a/hw/pci/meson.build b/hw/pci/meson.build
index 5c4bbac8171..bcc9c75919f 100644
--- a/hw/pci/meson.build
+++ b/hw/pci/meson.build
@@ -5,6 +5,7 @@ pci_ss.add(files(
   'pci.c',
   'pci_bridge.c',
   'pci_host.c',
+  'pcie_sriov.c',
   'shpc.c',
   'slotid_cap.c'
 ))
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 5d30f9ca60e..ba8fb92efc6 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -239,6 +239,9 @@ int pci_bar(PCIDevice *d, int reg)
 {
 uint8_t type;
 
+/* PCIe virtual functions do not have their own BARs */
+assert(!pci_is_vf(d));
+
 if (reg != PCI_ROM_SLOT)
 return PCI_BASE_ADDRESS_0 + reg * 4;
 
@@ -304,10 +307,30 @@ void pci_device_deassert_intx(PCIDevice *dev)
 }
 }
 
-static void pci_do_device_reset(PCIDevice *dev)
+static void pci_reset_regions(PCIDevice *dev)
 {
 int r;
+if (pci_is_vf(dev)) {
+return;
+}
+
+for (r = 0; r < PCI_NUM_REGIONS; ++r) {
+PCIIORegion *region = &dev->io_regions[r];
+if (!region->size) {
+continue;
+}
 
+if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
+region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+pci_set_quad(dev->config + pci_bar(dev, r), region->type);
+} else {
+pci_set_long(dev->config + pci_bar(dev, r), region->type);
+}
+}
+}
+
+static void pci_do_device_reset(PCIDevice *dev)
+{
 pci_device_deassert_intx(dev);
 assert(dev->irq_state == 0);
 
@@ -323,19 +346,7 @@ static void pci_do_device_reset(PCIDevice *dev)
   pci_get_word(dev->wmask + PCI_INTERRUPT_LINE) |
   pci_get_word(dev->w1cmask + PCI_INTERRUPT_LINE));
 dev->config[PCI_CACHE_LINE_SIZE] = 0x0;
-for (r = 0; r < PCI_NUM_REGIONS; ++r) {
-PCIIORegion *region = &dev->io_regions[r];
-if (!region->size) {
-continue;
-}
-
-if (!(region->type & PCI_BASE_ADDRESS_SPACE_IO) &&
-region->type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
-pci_set_quad(dev->config + pci_bar(dev, r), region->type);
-} else {
-pci_set_long(dev->config + pci_bar(dev, r), region->type);
-}
-}
+pci_reset_regions(dev);
 pci_update_mappings(dev);
 
 msi_reset(dev);
@@ -884,6 +895,16 @@ static void pci_init_multifunction(PCIBus *bus, PCIDevice 
*dev, Error **errp)
 dev->config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
 }
 
+/*
+ * With SR/IOV and ARI, a device at function 0 need not be a multifunction
+ * device, as it may just be a VF that ended up with function 0 in
+ * the legacy PCI interpretation. Avoid failing in such cases:
+ */
+if (pci_is_vf(dev) &&
+dev->exp.sriov_vf.pf->cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+return;
+}
+
 /*
  * multifunction bit is interpreted in two ways as follows.
  *   - all functions must set the bit to 1.
@@ -1083,6 +1104,7 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev,
bus->devices[devfn]->name);
 return NULL;
 } else if (dev->hotplugged &&
+   !pci_is_vf(pci_dev) &&
pci_get_function_0(pci_dev)) {
 error_setg(errp, "PCI: slot %d function 0 already occupied by %s,"
" new func %s cannot be exposed to guest.",
@@ -1191,6 +1213,7 @@ void pci_register_bar(PCIDevice *pci_dev, int region_num,
 pcibus_t size = memory_region_size(memory);
 uint8_t hdr_type;
 
+assert(!pci_is_vf(pci_dev)); /* VFs must use pcie_sriov_vf_register_bar */
 assert(region_num >= 0);
 assert(region_num < PCI_NUM_REGIONS);
 assert(is_power_of_2(size));
@@ -1294,11 +1317,45 @@ pcibus_t pci_get_bar_addr(PCIDevice *pci_dev, int 
region_num)
 return pci_dev->io_regions[region_num].addr;
 }
 
-static pcibus_t pci_bar_address(PCIDevice *d,
-int reg, uint8_t type, pcibus_t size)
+static pcibus_t pci_config_get_bar_addr(PCIDevice *d, int reg,
+uint8_t type, pcibus_t size)
+{
+pcibus_t new_addr;
+if (!pci_is_vf(d)) {
+int bar = pci_bar(d, reg);
+if (type & PCI_BASE_ADDRESS_MEM_TYPE_64) {
+new_addr = 

[PATCH v5 06/15] hw/nvme: Add support for Primary Controller Capabilities

2022-02-17 Thread Lukasz Maniak
Implementation of Primary Controller Capabilities data
structure (Identify command with CNS value of 14h).

Currently, the command returns only the ID of a primary controller.
Handling of the remaining fields is added in subsequent patches
implementing virtualization enhancements.
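
As a sanity check on the data structure layout, the sketch below
reproduces only the leading fields of the structure as they appear in
this patch and verifies their byte offsets. The type name
PriCtrlCapPrefix and the store_le16() helper are hypothetical; the real
structure continues past vqprt, and the packed attribute shown is the
GCC/Clang spelling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading fields only; the full structure is longer (assumption: the
 * remainder does not affect these offsets). */
typedef struct __attribute__((packed)) {
    uint16_t cntlid;    /* bytes 1:0 */
    uint16_t portid;    /* bytes 3:2 */
    uint8_t  crt;       /* byte 4 */
    uint8_t  rsvd5[27]; /* bytes 31:5 */
    uint32_t vqfrt;     /* bytes 35:32 */
    uint32_t vqrfa;     /* bytes 39:36 */
    uint16_t vqrfap;    /* bytes 41:40 */
    uint16_t vqprt;     /* bytes 43:42 */
} PriCtrlCapPrefix;

/* Store a 16-bit value in little-endian byte order regardless of the
 * host's endianness, which is what a cpu_to_le16-style conversion is for. */
void store_le16(uint8_t *dst, uint16_t v)
{
    dst[0] = (uint8_t)(v & 0xff);
    dst[1] = (uint8_t)(v >> 8);
}
```

The reserved array is what places vqfrt at byte offset 32; without the
packed attribute a compiler would be free to insert padding after crt.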

Signed-off-by: Lukasz Maniak 
Reviewed-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 23 ++-
 hw/nvme/nvme.h   |  2 ++
 hw/nvme/trace-events |  1 +
 include/block/nvme.h | 23 +++
 4 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index adeba0b2b6d..0bd55948ce1 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4697,6 +4697,14 @@ static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, 
NvmeRequest *req,
 return nvme_c2h(n, (uint8_t *)list, sizeof(list), req);
 }
 
+static uint16_t nvme_identify_pri_ctrl_cap(NvmeCtrl *n, NvmeRequest *req)
+{
+trace_pci_nvme_identify_pri_ctrl_cap(le16_to_cpu(n->pri_ctrl_cap.cntlid));
+
+return nvme_c2h(n, (uint8_t *)&n->pri_ctrl_cap,
+sizeof(NvmePriCtrlCap), req);
+}
+
 static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
  bool active)
 {
@@ -4915,6 +4923,8 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_identify_ctrl_list(n, req, true);
 case NVME_ID_CNS_CTRL_LIST:
 return nvme_identify_ctrl_list(n, req, false);
+case NVME_ID_CNS_PRIMARY_CTRL_CAP:
+return nvme_identify_pri_ctrl_cap(n, req);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
@@ -6465,6 +6475,8 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 
 static void nvme_init_state(NvmeCtrl *n)
 {
+NvmePriCtrlCap *cap = &n->pri_ctrl_cap;
+
 /* add one to max_ioqpairs to account for the admin queue pair */
 n->reg_size = pow2ceil(sizeof(NvmeBar) +
2 * (n->params.max_ioqpairs + 1) * NVME_DB_SIZE);
@@ -6474,6 +6486,8 @@ static void nvme_init_state(NvmeCtrl *n)
 n->features.temp_thresh_hi = NVME_TEMPERATURE_WARNING;
 n->starttime_ms = qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL);
 n->aer_reqs = g_new0(NvmeRequest *, n->params.aerl + 1);
+
+cap->cntlid = cpu_to_le16(n->cntlid);
 }
 
 static void nvme_init_cmb(NvmeCtrl *n, PCIDevice *pci_dev)
@@ -6774,15 +6788,14 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
**errp)
 qbus_init(&n->bus, sizeof(NvmeBus), TYPE_NVME_BUS,
           &pci_dev->qdev, n->parent_obj.qdev.id);
 
-nvme_init_state(n);
-if (nvme_init_pci(n, pci_dev, errp)) {
-return;
-}
-
 if (nvme_init_subsys(n, errp)) {
 error_propagate(errp, local_err);
 return;
 }
+nvme_init_state(n);
+if (nvme_init_pci(n, pci_dev, errp)) {
+return;
+}
 nvme_init_ctrl(n, pci_dev);
 
 /* setup a namespace if the controller drive property was given */
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 17245db96b5..2db48eb25c9 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -471,6 +471,8 @@ typedef struct NvmeCtrl {
 };
 uint32_t async_config;
 } features;
+
+NvmePriCtrlCap  pri_ctrl_cap;
 } NvmeCtrl;
 
 static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t nsid)
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index 90730d802fe..bfc09dddc62 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -52,6 +52,7 @@ pci_nvme_identify_ctrl(void) "identify controller"
 pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8""
 pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_ctrl_list(uint8_t cns, uint16_t cntid) "cns 0x%"PRIx8" cntid 
%"PRIu16""
+pci_nvme_identify_pri_ctrl_cap(uint16_t cntlid) "identify primary controller 
capabilities cntlid=%"PRIu16""
 pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", 
csi=0x%"PRIx8""
 pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", 
csi=0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index cd068ac8914..73666cc900a 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -1019,6 +1019,7 @@ enum NvmeIdCns {
 NVME_ID_CNS_NS_PRESENT= 0x11,
 NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12,
 NVME_ID_CNS_CTRL_LIST = 0x13,
+NVME_ID_CNS_PRIMARY_CTRL_CAP  = 0x14,
 NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a,
 NVME_ID_CNS_CS_NS_PRESENT = 0x1b,
 NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
@@ -1503,6 +1504,27 @@ typedef enum NvmeZoneState {
 NVME_ZONE_STATE_OFFLINE  = 0x0f,
 } NvmeZoneState;
 
+typedef struct QEMU_PACKED NvmePriCtrlCap {
+uint16_t cntlid;
+uint16_t portid;
+uint8_t crt;
+uint8_t rsvd5[27];
+uint32_t vqfrt;
+uint32_t vqrfa;
+uint16_t vqrfap;
+uint16_t vqprt;
+

Re: [PATCH] ppc/spapr: Advertise StoreEOI for POWER10 compat guests

2022-02-17 Thread Cédric Le Goater

Unfortunately, this patch breaks migration under TCG because the XIVE
source flag is not updated on the target side. KVM is not impacted
because the emulated sources are not used. This needs to be addressed
in a v2.

That said, even without this patch, TCG migration is broken. Some CPUs
on the receive side are stalled on CPU Hard LOCKUPs. QEMU 6.2 is impacted.
So it has been a while :/


Ouch. Guess we need to add TCG migration tests in the test workflow ...


Regarding the first issue with the new XIVE source flag, this routine
changes an object property after realize which is a no-no for migration :

void spapr_xive_enable_store_eoi(SpaprXive *xive, bool enable)
{
if (enable) {
xive->source.esb_flags |= XIVE_SRC_STORE_EOI;
} else {
xive->source.esb_flags &= ~XIVE_SRC_STORE_EOI;
}
}

I think we need a new SpaprXive state to represent the characteristic
of the source indirectly negotiated by CAS when the CPU is a POWER10.
We would use it to update xive->source.esb_flags at post_load time
after migration.

Or simply mimic CAS:

@@ -531,6 +531,14 @@ static int spapr_xive_post_load(SpaprInt
 return kvmppc_xive_post_load(xive, version_id);
 }
 
+PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);

+bool enable = ppc_check_compat(first_ppc_cpu, CPU_POWERPC_LOGICAL_3_10, 0,
+   first_ppc_cpu->compat_pvr);
+spapr_xive_enable_store_eoi(xive, enable);
+
 return 0;
 }
 
which has the benefit of being stateless.
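
To make the stateless option concrete, here is a small standalone model.
It is only a sketch: the Source type, the flag value and the
cpu_is_power10_compat argument are hypothetical stand-ins for the XIVE
source, XIVE_SRC_STORE_EOI and the ppc_check_compat() result. The point
is that the flag is recomputed on the destination instead of being
carried in the migration stream.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRC_STORE_EOI 0x2u /* hypothetical bit value */

typedef struct {
    uint64_t esb_flags;
} Source;

void source_enable_store_eoi(Source *s, bool enable)
{
    assert(s);
    if (enable) {
        s->esb_flags |= SRC_STORE_EOI;
    } else {
        s->esb_flags &= ~(uint64_t)SRC_STORE_EOI;
    }
}

/* Stand-in for a post_load hook: derive the flag from the (already
 * migrated) CPU compat level rather than from migrated device state. */
int source_post_load(Source *s, bool cpu_is_power10_compat)
{
    source_enable_store_eoi(s, cpu_is_power10_compat);
    return 0;
}
```

A destination that starts with the flag clear ends up with it set after
source_post_load(&s, true), and other flag bits are left untouched.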


Ideas ?

Thanks,

C.




[PATCH v5 00/15] hw/nvme: SR-IOV with Virtualization Enhancements

2022-02-17 Thread Lukasz Maniak
Changes since v4:
- Added hello world example for SR-IOV to the docs
- Moved AER initialization from nvme_init_ctrl to nvme_init_state
- Fixed division by zero issue in calculation of vqfrt and vifrt
  capabilities

Knut Omang (2):
  pcie: Add support for Single Root I/O Virtualization (SR/IOV)
  pcie: Add some SR/IOV API documentation in docs/pcie_sriov.txt

Lukasz Maniak (4):
  hw/nvme: Add support for SR-IOV
  hw/nvme: Add support for Primary Controller Capabilities
  hw/nvme: Add support for Secondary Controller List
  docs: Add documentation for SR-IOV and Virtualization Enhancements

Łukasz Gieryk (9):
  pcie: Add a helper to the SR/IOV API
  pcie: Add 1.2 version token for the Power Management Capability
  hw/nvme: Implement the Function Level Reset
  hw/nvme: Make max_ioqpairs and msix_qsize configurable in runtime
  hw/nvme: Remove reg_size variable and update BAR0 size calculation
  hw/nvme: Calculate BAR attributes in a function
  hw/nvme: Initialize capability structures for primary/secondary
controllers
  hw/nvme: Add support for the Virtualization Management command
  hw/nvme: Update the initalization place for the AER queue

 docs/pcie_sriov.txt  | 115 ++
 docs/system/devices/nvme.rst |  82 +
 hw/nvme/ctrl.c   | 674 ---
 hw/nvme/ns.c |   2 +-
 hw/nvme/nvme.h   |  55 ++-
 hw/nvme/subsys.c |  75 +++-
 hw/nvme/trace-events |   6 +
 hw/pci/meson.build   |   1 +
 hw/pci/pci.c | 100 --
 hw/pci/pcie.c|   5 +
 hw/pci/pcie_sriov.c  | 302 
 hw/pci/trace-events  |   5 +
 include/block/nvme.h |  65 
 include/hw/pci/pci.h |  12 +-
 include/hw/pci/pci_ids.h |   1 +
 include/hw/pci/pci_regs.h|   1 +
 include/hw/pci/pcie.h|   6 +
 include/hw/pci/pcie_sriov.h  |  77 
 include/qemu/typedefs.h  |   2 +
 19 files changed, 1505 insertions(+), 81 deletions(-)
 create mode 100644 docs/pcie_sriov.txt
 create mode 100644 hw/pci/pcie_sriov.c
 create mode 100644 include/hw/pci/pcie_sriov.h

-- 
2.25.1




Re: [PATCH v2 00/12] GL & D-Bus display related fixes

2022-02-17 Thread Marc-André Lureau
Hi

On Thu, Feb 17, 2022 at 9:25 PM Akihiko Odaki 
wrote:

> On Fri, Feb 18, 2022 at 2:07 AM Marc-André Lureau
>  wrote:
> >
> > Hi
> >
> > On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki 
> wrote:
> >>
> >> On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau
> >>  wrote:
> >> >
> >> > Hi
> >> >
> >> > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki <
> akihiko.od...@gmail.com> wrote:
> >> >>
> >> >> On Thu, Feb 17, 2022 at 8:58 PM  wrote:
> >> >> >
> >> >> > From: Marc-André Lureau 
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a
> console", Akihiko
> >> >> > Odaki reported a number of issues with the GL and D-Bus display.
> His series
> >> >> > propose a different design, and reverting some of my previous
> generic console
> >> >> > changes to fix those issues.
> >> >> >
> >> >> > However, as I work through the issue so far, they can be solved by
> reasonable
> >> >> > simple fixes while keeping the console changes generic (not
> specific to the
> >> >> > D-Bus display backend). I believe a shared infrastructure is more
> beneficial long
> >> >> > term than having GL-specific code in the DBus code (in particular,
> the
> >> >> > egl-headless & VNC combination could potentially use it)
> >> >> >
> >> >> > Thanks a lot Akihiko for reporting the issues proposing a
> different approach!
> >> >> > Please test this alternative series and let me know of any further
> issues. My
> >> >> > understanding is that you are mainly concerned with the Cocoa
> backend, and I
> >> >> > don't have a way to test it, so please check it. If necessary, we
> may well have
> >> >> > to revert my earlier changes and go your way, eventually.
> >> >> >
> >> >> > Marc-André Lureau (12):
> >> >> >   ui/console: fix crash when using gl context with non-gl listeners
> >> >> >   ui/console: fix texture leak when calling
> surface_gl_create_texture()
> >> >> >   ui: do not create a surface when resizing a GL scanout
> >> >> >   ui/console: move check for compatible GL context
> >> >> >   ui/console: move dcl compatiblity check to a callback
> >> >> >   ui/console: egl-headless is compatible with non-gl listeners
> >> >> >   ui/dbus: associate the DBusDisplayConsole listener with the given
> >> >> > console
> >> >> >   ui/console: move console compatibility check to
> dcl_display_console()
> >> >> >   ui/shader: fix potential leak of shader on error
> >> >> >   ui/shader: free associated programs
> >> >> >   ui/console: add a dpy_gfx_switch callback helper
> >> >> >   ui/dbus: fix texture sharing
> >> >> >
> >> >> >  include/ui/console.h |  19 ---
> >> >> >  ui/dbus.h|   3 ++
> >> >> >  ui/console-gl.c  |   4 ++
> >> >> >  ui/console.c | 119
> ++-
> >> >> >  ui/dbus-console.c|  27 +-
> >> >> >  ui/dbus-listener.c   |  11 
> >> >> >  ui/dbus.c|  33 +++-
> >> >> >  ui/egl-headless.c|  17 ++-
> >> >> >  ui/gtk.c |  18 ++-
> >> >> >  ui/sdl2.c|   9 +++-
> >> >> >  ui/shader.c  |   9 +++-
> >> >> >  ui/spice-display.c   |   9 +++-
> >> >> >  12 files changed, 192 insertions(+), 86 deletions(-)
> >> >> >
> >> >> > --
> >> >> > 2.34.1.428.gdcc0cd074f0c
> >> >> >
> >> >> >
> >> >>
> >> >> You missed only one thing:
> >> >> >- that console_select and register_displaychangelistener may not
> call
> >> >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is
> >> >> > incompatible with non-OpenGL displays
> >> >>
> >> >> displaychangelistener_display_console always has to call
> >> >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't.
> >> >
> >> >
> >> > Ok, would that be what you have in mind?
> >> >
> >> >  --- a/ui/console.c
> >> > +++ b/ui/console.c
> >> > @@ -1122,6 +1122,10 @@ static void
> displaychangelistener_display_console(DisplayChangeListener *dcl,
> >> >  } else if (con->scanout.kind == SCANOUT_SURFACE) {
> >> >  dpy_gfx_create_texture(con, con->surface);
> >> >  displaychangelistener_gfx_switch(dcl, con->surface);
> >> > +} else {
> >> > +/* use the fallback surface, egl-headless keeps it updated */
> >> > +assert(con->surface);
> >> > +displaychangelistener_gfx_switch(dcl, con->surface);
> >> >  }
> >>
> >> It should call displaychangelistener_gfx_switch even when e.g.
> >> con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content
> >> to the last DisplaySurface it received while con->scanout.kind ==
> >> SCANOUT_TEXTURE.
> >
> >
> > I see, egl-headless is really not a "listener".
> >
> >>
> >> >
> >> > I wish such egl-headless specific code would be there, but we would
> >> > need more refactoring.
> >> >
> >> > I think I would rather have a backend split for GL context, like
> >> > "-object egl-context". egl-headless-specific copy code would be handled by
> >> > common/util code for anything that wants a pixman surface (VNC, screen
> >> > capture, non-GL display 

Re: [PATCH v2 00/15] target/arm: Implement LVA, LPA, LPA2 features

2022-02-17 Thread Alex Bennée


Peter Maydell  writes:

> On Thu, 10 Feb 2022 at 04:04, Richard Henderson
>  wrote:
>>
>> Changes for v2:
>>   * Introduce FIELD_SEX64, instead of open-coding w/ sextract64.
>>   * Set TCR_EL1 more completely for user-only.
>>   * Continue to bound tsz within aa64_va_parameters;
>> provide an out-of-bound indicator for raising AddressSize fault.
>>   * Split IPS patch.
>>   * Fix debug registers for LVA.
>>   * Fix long-format fsc for LPA2.
>>   * Fix TLBI page shift.
>>   * Validate TLBI granule vs TCR granule.
>>
>> Not done:
>>   * Validate translation levels which accept blocks.
>>
>> There is still no upstream kernel support for FEAT_LPA2,
>> so that is essentially untested.
>
> This series seems to break 'make check-acceptance':
>
>  (01/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '01-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv2',
> 'logdir': 
> '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j...
> (900.74 s)
>  (02/59) tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3:
> INTERRUPTED: Test interrupted by SIGTERM\nRunner error occurred:
> Timeout reached\nOriginal status: ERROR\n{'name':
> '02-tests/avocado/boot_linux.py:BootLinuxAarch64.test_virt_tcg_gicv3',
> 'logdir': 
> '/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/arm-clang/tests/results/j...
> (900.71 s)
>
> UEFI runs in the guest and seems to launch the kernel, but there's
> no output from the kernel itself in the logfile. Last thing it
> prints is:
>
> EFI stub: Booting Linux Kernel...
> EFI stub: EFI_RNG_PROTOCOL unavailable, no randomness supplied
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services and installing virtual address map...
> SetUefiImageMemoryAttributes - 0x7F50 - 0x0004
> (0x0008)
> SetUefiImageMemoryAttributes - 0x7C19 - 0x0004
> (0x0008)
> SetUefiImageMemoryAttributes - 0x7C14 - 0x0004
> (0x0008)
> SetUefiImageMemoryAttributes - 0x7F4C - 0x0003
> (0x0008)
> SetUefiImageMemoryAttributes - 0x7C0F - 0x0004
> (0x0008)
> SetUefiImageMemoryAttributes - 0x7BFB - 0x0004
> (0x0008)
> SetUefiImageMemoryAttributes - 0x7BE0 - 0x0003
> (0x0008)
> SetUefiImageMemoryAttributes - 0x7BDC - 0x0003
> (0x0008)
>
> This ought to be followed by the usual kernel boot log
> [0.00] Booting Linux on physical CPU 0x00 [0x000f0510]
> etc but it isn't. Probably the kernel is crashing in early bootup
> before it gets round to printing anything.

As this test runs under -cpu max it is likely exercising the new
features (and failing).

-- 
Alex Bennée



[PULL 10/12] virtiofsd: Create new file using O_TMPFILE and set security context

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

If guest and host policies can't work with each other, then guest security
context (selinux label) needs to be set into an xattr. Say remap guest
security.selinux xattr to trusted.virtiofs.security.selinux.

That means setting "fscreate" is not going to help as that's only useful
for security.selinux xattr on host.

So we need another method which is atomic. Use O_TMPFILE to create new
file, set xattr and then linkat() to proper place.

But this works only for regular files. So dir, symlinks will continue
to be non-atomic.

Also if host filesystem does not support O_TMPFILE, we fallback to
non-atomic behavior.
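
The sequence described above (unnamed file, set xattr, then link into
place) can be sketched roughly as follows. This is a minimal
illustration, not the virtiofsd implementation: create_with_xattr() is a
hypothetical helper, it uses an absolute /proc/self/fd path instead of
fchdir() to a pre-opened proc fd, and it assumes a host filesystem that
supports both O_TMPFILE and the given xattr namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/*
 * Hypothetical helper (not virtiofsd code): create `name` under `dirfd`
 * so that it only becomes visible once `xattr_name` is already set.
 * Returns 0 on success or a positive errno value.
 */
static int create_with_xattr(int dirfd, const char *name, mode_t mode,
                             const char *xattr_name,
                             const void *value, size_t len)
{
    char procpath[64];
    int fd, err = 0;

    assert(name != NULL && xattr_name != NULL);

    /* Unnamed inode in the target directory; O_TMPFILE needs write access */
    fd = openat(dirfd, ".", O_TMPFILE | O_RDWR, mode);
    if (fd == -1) {
        return errno; /* e.g. EOPNOTSUPP: caller may fall back to non-atomic */
    }

    /* Label the inode while nobody else can look it up by name */
    if (fsetxattr(fd, xattr_name, value, len, 0) == -1) {
        err = errno;
        goto out;
    }

    /* Give the labelled inode its name; EEXIST if the name is taken */
    snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, procpath, dirfd, name, AT_SYMLINK_FOLLOW) == -1) {
        err = errno;
    }

out:
    close(fd);
    return err;
}
```

Because linkat() refuses to replace an existing name, the final step
keeps the O_EXCL-like semantics of a normal exclusive create.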

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-10-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/passthrough_ll.c | 80 
 1 file changed, 72 insertions(+), 8 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e1c45bb420..f5d584e18a 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2153,14 +2153,29 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
 
 static int do_create_nosecctx(fuse_req_t req, struct lo_inode *parent_inode,
const char *name, mode_t mode,
-   struct fuse_file_info *fi, int *open_fd)
+   struct fuse_file_info *fi, int *open_fd,
+  bool tmpfile)
 {
 int err, fd;
 struct lo_cred old = {};
 struct lo_data *lo = lo_data(req);
 int flags;
 
-flags = fi->flags | O_CREAT | O_EXCL;
+if (tmpfile) {
+flags = fi->flags | O_TMPFILE;
+/*
+ * Don't use O_EXCL as we want to link file later. Also reset O_CREAT
+ * otherwise openat() returns -EINVAL.
+ */
+flags &= ~(O_CREAT | O_EXCL);
+
+/* O_TMPFILE needs either O_RDWR or O_WRONLY */
+if ((flags & O_ACCMODE) == O_RDONLY) {
+flags |= O_RDWR;
+}
+} else {
+flags = fi->flags | O_CREAT | O_EXCL;
+}
 
 err = lo_change_cred(req, &old, lo->change_umask);
 if (err) {
@@ -2191,7 +2206,7 @@ static int do_create_secctx_fscreate(fuse_req_t req,
 return err;
 }
 
-err = do_create_nosecctx(req, parent_inode, name, mode, fi, &fd);
+err = do_create_nosecctx(req, parent_inode, name, mode, fi, &fd, false);
 
 close_reset_proc_fscreate(fscreate_fd);
 if (!err) {
@@ -2200,6 +2215,44 @@ static int do_create_secctx_fscreate(fuse_req_t req,
 return err;
 }
 
+static int do_create_secctx_tmpfile(fuse_req_t req,
+struct lo_inode *parent_inode,
+const char *name, mode_t mode,
+struct fuse_file_info *fi,
+const char *secctx_name, int *open_fd)
+{
+int err, fd = -1;
+struct lo_data *lo = lo_data(req);
+char procname[64];
+
+err = do_create_nosecctx(req, parent_inode, ".", mode, fi, &fd, true);
+if (err) {
+return err;
+}
+
+err = fsetxattr(fd, secctx_name, req->secctx.ctx, req->secctx.ctxlen, 0);
+if (err) {
+err = errno;
+goto out;
+}
+
+/* Security context set on file. Link it in place */
+sprintf(procname, "%d", fd);
+FCHDIR_NOFAIL(lo->proc_self_fd);
+err = linkat(AT_FDCWD, procname, parent_inode->fd, name,
+ AT_SYMLINK_FOLLOW);
+err = err == -1 ? errno : 0;
+FCHDIR_NOFAIL(lo->root.fd);
+
+out:
+if (!err) {
+*open_fd = fd;
+} else if (fd != -1) {
+close(fd);
+}
+return err;
+}
+
 static int do_create_secctx_noatomic(fuse_req_t req,
  struct lo_inode *parent_inode,
  const char *name, mode_t mode,
@@ -2208,7 +2261,7 @@ static int do_create_secctx_noatomic(fuse_req_t req,
 {
 int err = 0, fd = -1;
 
-err = do_create_nosecctx(req, parent_inode, name, mode, fi, &fd);
+err = do_create_nosecctx(req, parent_inode, name, mode, fi, &fd, false);
 if (err) {
 goto out;
 }
@@ -2250,20 +2303,31 @@ static int do_lo_create(fuse_req_t req, struct lo_inode *parent_inode,
 if (secctx_enabled) {
 /*
  * If security.selinux has not been remapped and selinux is enabled,
- * use fscreate to set context before file creation.
- * Otherwise fallback to non-atomic method of file creation
- * and xattr settting.
+ * use fscreate to set context before file creation. If not, use
+ * tmpfile method for regular files. Otherwise fallback to
+ * non-atomic method of file creation and xattr settting.
  */
 if (!mapped_name && lo->use_fscreate) {
 err = do_create_secctx_fscreate(req, parent_inode, name, mode, fi,
   

[PULL 12/12] virtiofsd: Add basic support for FUSE_SYNCFS request

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Greg Kurz 

Honor the expected behavior of syncfs() to synchronously flush all data
and metadata to disk on linux systems.

If virtiofsd is started with '-o announce_submounts', the client is
expected to send a FUSE_SYNCFS request for each individual submount.
In this case, we just create a new file descriptor on the submount
inode with lo_inode_open(), call syncfs() on it and close it. The
intermediary file is needed because O_PATH descriptors aren't
backed by an actual file and syncfs() would fail with EBADF.

If virtiofsd is started without '-o announce_submounts' or if the
client doesn't have the FUSE_CAP_SUBMOUNTS capability, the client
only sends a single FUSE_SYNCFS request for the root inode. The
server would thus need to track submounts internally and call
syncfs() on each of them. This will be implemented later.

Note that syncfs() might suffer from a time penalty if the submounts
are being hammered by some unrelated workload on the host. The only
solution to prevent that is to avoid shared mounts.
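
The "open a real descriptor, syncfs(), close" step for a submount can be
sketched as below. This is only an illustration: syncfs_via_path_fd() is
a hypothetical name, and it reopens the directory with openat(fd, ".")
rather than through virtiofsd's lo_inode_open(), which assumes the inode
is a directory (as a submount root would be).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the filesystem containing the directory
 * referred to by `path_fd`, an O_PATH descriptor.
 * Returns 0 on success or a positive errno value.
 */
static int syncfs_via_path_fd(int path_fd)
{
    int fd, ret = 0;

    assert(path_fd >= 0);

    /* Reopen the directory itself to get a descriptor syncfs() accepts */
    fd = openat(path_fd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd == -1) {
        return errno;
    }

    if (syncfs(fd) == -1) {
        ret = errno;
    }

    /* The intermediate descriptor is only needed for the syncfs() call */
    close(fd);
    return ret;
}
```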

Signed-off-by: Greg Kurz 
Message-Id: <20220215181529.164070-2-gr...@kaod.org>
Reviewed-by: Vivek Goyal 
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_lowlevel.c   | 11 +++
 tools/virtiofsd/fuse_lowlevel.h   | 13 
 tools/virtiofsd/passthrough_ll.c  | 44 +++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 4 files changed, 69 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index f681d5e3b3..752928741d 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1967,6 +1967,16 @@ static void do_lseek(fuse_req_t req, fuse_ino_t nodeid,
 }
 }
 
+static void do_syncfs(fuse_req_t req, fuse_ino_t nodeid,
+  struct fuse_mbuf_iter *iter)
+{
+if (req->se->op.syncfs) {
+req->se->op.syncfs(req, nodeid);
+} else {
+fuse_reply_err(req, ENOSYS);
+}
+}
+
 static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 struct fuse_mbuf_iter *iter)
 {
@@ -2399,6 +2409,7 @@ static struct {
 [FUSE_RENAME2] = { do_rename2, "RENAME2" },
 [FUSE_COPY_FILE_RANGE] = { do_copy_file_range, "COPY_FILE_RANGE" },
 [FUSE_LSEEK] = { do_lseek, "LSEEK" },
+[FUSE_SYNCFS] = { do_syncfs, "SYNCFS" },
 };
 
 #define FUSE_MAXOP (sizeof(fuse_ll_ops) / sizeof(fuse_ll_ops[0]))
diff --git a/tools/virtiofsd/fuse_lowlevel.h b/tools/virtiofsd/fuse_lowlevel.h
index c55c0ca2fc..b889dae4de 100644
--- a/tools/virtiofsd/fuse_lowlevel.h
+++ b/tools/virtiofsd/fuse_lowlevel.h
@@ -1226,6 +1226,19 @@ struct fuse_lowlevel_ops {
  */
 void (*lseek)(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
   struct fuse_file_info *fi);
+
+/**
+ * Synchronize file system content
+ *
+ * If this request is answered with an error code of ENOSYS,
+ * this is treated as success and future calls to syncfs() will
+ * succeed automatically without being sent to the filesystem
+ * process.
+ *
+ * @param req request handle
+ * @param ino the inode number
+ */
+void (*syncfs)(fuse_req_t req, fuse_ino_t ino);
 };
 
 /**
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 4742be1d1e..dfa2fc250d 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3699,6 +3699,49 @@ static void lo_lseek(fuse_req_t req, fuse_ino_t ino, off_t off, int whence,
 }
 }
 
+static int lo_do_syncfs(struct lo_data *lo, struct lo_inode *inode)
+{
+int fd, ret = 0;
+
+fuse_log(FUSE_LOG_DEBUG, "lo_do_syncfs(ino=%" PRIu64 ")\n",
+ inode->fuse_ino);
+
+fd = lo_inode_open(lo, inode, O_RDONLY);
+if (fd < 0) {
+return -fd;
+}
+
+if (syncfs(fd) < 0) {
+ret = errno;
+}
+
+close(fd);
+return ret;
+}
+
+static void lo_syncfs(fuse_req_t req, fuse_ino_t ino)
+{
+struct lo_data *lo = lo_data(req);
+struct lo_inode *inode = lo_inode(req, ino);
+int err;
+
+if (!inode) {
+fuse_reply_err(req, EBADF);
+return;
+}
+
+err = lo_do_syncfs(lo, inode);
+lo_inode_put(lo, &inode);
+
+/*
+ * If submounts aren't announced, the client only sends a request to
+ * sync the root inode. TODO: Track submounts internally and iterate
+ * over them as well.
+ */
+
+fuse_reply_err(req, err);
+}
+
 static void lo_destroy(void *userdata)
 {
 struct lo_data *lo = (struct lo_data *)userdata;
@@ -3759,6 +3802,7 @@ static struct fuse_lowlevel_ops lo_oper = {
 .copy_file_range = lo_copy_file_range,
 #endif
 .lseek = lo_lseek,
+.syncfs = lo_syncfs,
 .destroy = lo_destroy,
 };
 
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index 2bc0127b69..888295c073 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -111,6 +111,7 @@ static const int 

[PULL 09/12] virtiofsd: Create new file with security context

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

This patch adds support for creating new file with security context
as sent by client. It basically takes three paths.

- If no security context enabled, then it continues to create files without
  security context.

- If security context is enabled but security.selinux has not been
  remapped, then it uses /proc/thread-self/attr/fscreate knob to set
  security context and then create the file. This will make sure that
  newly created file gets the security context as set in "fscreate" and
  this is atomic w.r.t file creation.

  This is useful when host and guest SELinux policies don't conflict and
  can work with each other. In that case, guest security.selinux xattr
  is not remapped and it is passthrough as "security.selinux" xattr
  on host.

- If security context is enabled but security.selinux xattr has been
  remapped to something else, then it first creates the file and then
  uses setxattr() to set the remapped xattr with the security context.
  This is a non-atomic operation w.r.t file creation.

  This mode will be most versatile and allow host and guest to have their
  own separate SELinux xattrs and have their own separate SELinux policies.
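
The three paths listed above boil down to a small decision, sketched
here as a pure function. The enum and pick_create_method() are
hypothetical names used only for illustration; the inputs correspond to
"a security context was sent", "security.selinux was remapped" and
"fscreate is usable on the host".

```c
#include <assert.h>
#include <stdbool.h>

enum create_method {
    CREATE_PLAIN,         /* no security context sent by the client */
    CREATE_FSCREATE,      /* set fscreate first, then create: atomic */
    CREATE_THEN_SETXATTR  /* create, then setxattr(): not atomic */
};

/* Hypothetical helper mirroring the three cases in the commit message */
static enum create_method pick_create_method(bool secctx_enabled,
                                             bool xattr_remapped,
                                             bool fscreate_usable)
{
    if (!secctx_enabled) {
        return CREATE_PLAIN;
    }
    if (!xattr_remapped && fscreate_usable) {
        return CREATE_FSCREATE;
    }
    /* Remapped xattr, or fscreate unavailable on this host */
    return CREATE_THEN_SETXATTR;
}
```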

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-9-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/passthrough_ll.c | 229 +++
 1 file changed, 200 insertions(+), 29 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e694980a53..e1c45bb420 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -234,6 +234,11 @@ static struct lo_inode *lo_find(struct lo_data *lo, struct stat *st,
 static int xattr_map_client(const struct lo_data *lo, const char *client_name,
 char **out_name);
 
+#define FCHDIR_NOFAIL(fd) do { \
+int fchdir_res = fchdir(fd);   \
+assert(fchdir_res == 0);   \
+} while (0)
+
 static bool is_dot_or_dotdot(const char *name)
 {
 return name[0] == '.' &&
@@ -288,7 +293,6 @@ static bool is_fscreate_usable(struct lo_data *lo)
 }
 
 /* Helpers to set/reset fscreate */
-__attribute__((unused))
 static int open_set_proc_fscreate(struct lo_data *lo, const void *ctx,
   size_t ctxlen, int *fd)
 {
@@ -316,7 +320,6 @@ out:
 return err;
 }
 
-__attribute__((unused))
 static void close_reset_proc_fscreate(int fd)
 {
 if ((write(fd, NULL, 0)) == -1) {
@@ -1354,16 +1357,103 @@ static void lo_restore_cred_gain_cap(struct lo_cred *old, bool restore_umask,
 }
 }
 
+static int do_mknod_symlink_secctx(fuse_req_t req, struct lo_inode *dir,
+   const char *name, const char *secctx_name)
+{
+int path_fd, err;
+char procname[64];
+struct lo_data *lo = lo_data(req);
+
+if (!req->secctx.ctxlen) {
+return 0;
+}
+
+/* Open newly created element with O_PATH */
+path_fd = openat(dir->fd, name, O_PATH | O_NOFOLLOW);
+err = path_fd == -1 ? errno : 0;
+if (err) {
+return err;
+}
+sprintf(procname, "%i", path_fd);
+FCHDIR_NOFAIL(lo->proc_self_fd);
+/* Set security context. This is not atomic w.r.t file creation */
+err = setxattr(procname, secctx_name, req->secctx.ctx, req->secctx.ctxlen,
+   0);
+if (err) {
+err = errno;
+}
+FCHDIR_NOFAIL(lo->root.fd);
+close(path_fd);
+return err;
+}
+
+static int do_mknod_symlink(fuse_req_t req, struct lo_inode *dir,
+const char *name, mode_t mode, dev_t rdev,
+const char *link)
+{
+int err, fscreate_fd = -1;
+const char *secctx_name = req->secctx.name;
+struct lo_cred old = {};
+struct lo_data *lo = lo_data(req);
+char *mapped_name = NULL;
+bool secctx_enabled = req->secctx.ctxlen;
+bool do_fscreate = false;
+
+if (secctx_enabled && lo->xattrmap) {
+err = xattr_map_client(lo, req->secctx.name, &mapped_name);
+if (err < 0) {
+return -err;
+}
+secctx_name = mapped_name;
+}
+
+/*
+ * If security xattr has not been remapped and selinux is enabled on
+ * host, set fscreate and no need to do a setxattr() after file creation
+ */
+if (secctx_enabled && !mapped_name && lo->use_fscreate) {
+do_fscreate = true;
+err = open_set_proc_fscreate(lo, req->secctx.ctx, req->secctx.ctxlen,
+ &fscreate_fd);
+if (err) {
+goto out;
+}
+}
+
+err = lo_change_cred(req, &old, lo->change_umask && !S_ISLNK(mode));
+if (err) {
+goto out;
+}
+
+err = mknod_wrapper(dir->fd, name, link, mode, rdev);
+err = err == -1 ? errno : 0;
+lo_restore_cred(&old, lo->change_umask && !S_ISLNK(mode));
+if (err) {
+

[PULL 11/12] virtiofsd: Add an option to enable/disable security label

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

Provide an option "-o security_label/no_security_label" to enable/disable
security label functionality. By default these are turned off.

If enabled, server will indicate to client that it is capable of handling
one security label during file creation. Typically this is expected to
be a SELinux label. File server will set this label on the file. It will
try to set it atomically wherever possible. But it's not possible in
all the cases.

Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-11-vgo...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 docs/tools/virtiofsd.rst | 32 
 tools/virtiofsd/helper.c |  1 +
 tools/virtiofsd/passthrough_ll.c | 15 +++
 3 files changed, 48 insertions(+)

diff --git a/docs/tools/virtiofsd.rst b/docs/tools/virtiofsd.rst
index 07ac0be551..0c0560203c 100644
--- a/docs/tools/virtiofsd.rst
+++ b/docs/tools/virtiofsd.rst
@@ -104,6 +104,13 @@ Options
   * posix_acl|no_posix_acl -
 Enable/disable posix acl support.  Posix ACLs are disabled by default.
 
+  * security_label|no_security_label -
+Enable/disable security label support. Security labels are disabled by
+default. This will allow client to send a MAC label of file during
+file creation. Typically this is expected to be SELinux security
+label. Server will try to set that label on newly created file
+atomically wherever possible.
+
 .. option:: --socket-path=PATH
 
   Listen on vhost-user UNIX domain socket at PATH.
@@ -348,6 +355,31 @@ client arguments or lists returned from the host.  This stops
 the client seeing any 'security.' attributes on the server and
 stops it setting any.
 
+SELinux support
+---
+One can enable support for SELinux by running virtiofsd with option
+"-o security_label". But this will try to save guest's security context
+in xattr security.selinux on host and it might fail if host's SELinux
+policy does not permit virtiofsd to do this operation.
+
+Hence, it is preferred to remap guest's "security.selinux" xattr to say
+"trusted.virtiofs.security.selinux" on host.
+
+"-o xattrmap=:map:security.selinux:trusted.virtiofs.:"
+
+This will make sure that guest and host's SELinux xattrs on same file
+remain separate and not interfere with each other. And will allow both
+host and guest to implement their own separate SELinux policies.
+
+Setting trusted xattr on host requires CAP_SYS_ADMIN. So one will need
+add this capability to daemon.
+
+"-o modcaps=+sys_admin"
+
+Giving CAP_SYS_ADMIN increases the risk on system. Now virtiofsd is more
+powerful and if gets compromised, it can do lot of damage to host system.
+So keep this trade-off in my mind while making a decision.
+
 Examples
 
 
diff --git a/tools/virtiofsd/helper.c b/tools/virtiofsd/helper.c
index a8295d975a..e226fc590f 100644
--- a/tools/virtiofsd/helper.c
+++ b/tools/virtiofsd/helper.c
@@ -187,6 +187,7 @@ void fuse_cmdline_help(void)
"   default: no_allow_direct_io\n"
"-o announce_submounts  Announce sub-mount points to the 
guest\n"
"-o posix_acl/no_posix_acl  Enable/Disable posix_acl. (default: 
disabled)\n"
+   "-o security_label/no_security_label  Enable/Disable security label. (default: disabled)\n"
);
 }
 
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index f5d584e18a..4742be1d1e 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -181,6 +181,7 @@ struct lo_data {
 int user_posix_acl, posix_acl;
 /* Keeps track if /proc//attr/fscreate should be used or not */
 bool use_fscreate;
+int user_security_label;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -215,6 +216,8 @@ static const struct fuse_opt lo_opts[] = {
 { "no_killpriv_v2", offsetof(struct lo_data, user_killpriv_v2), 0 },
 { "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
 { "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
+{ "security_label", offsetof(struct lo_data, user_security_label), 1 },
+{ "no_security_label", offsetof(struct lo_data, user_security_label), 0 },
 FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -808,6 +811,17 @@ static void lo_init(void *userdata, struct fuse_conn_info *conn)
 fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling posix_acl\n");
 conn->want &= ~FUSE_CAP_POSIX_ACL;
 }
+
+if (lo->user_security_label == 1) {
+if (!(conn->capable & FUSE_CAP_SECURITY_CTX)) {
+fuse_log(FUSE_LOG_ERR, "lo_init: Can not enable security label."
+ " kernel does not support FUSE_SECURITY_CTX capability.\n");
+}
+conn->want |= FUSE_CAP_SECURITY_CTX;
+} else {
+fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling security label\n");
+conn->want &= ~FUSE_CAP_SECURITY_CTX;
+ 

[PULL 08/12] virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

Soon we will be able to create and also set security context on the file
atomically using /proc/self/task/tid/attr/fscreate knob. If this knob
is available on the system, first set the knob with the desired context
and then create the file. It will be created with the context set in
fscreate. This basically works for SELinux and it's per thread.

This patch just introduces the helper functions. Subsequent patches will
make use of these helpers.
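
As a rough illustration of how the per-thread knob is located and
probed, here is a minimal sketch. fscreate_path() and fscreate_usable()
are hypothetical names; unlike the patch, the sketch uses an absolute
/proc path instead of openat() on a pre-opened /proc/self/task
descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build the path of thread `tid`'s fscreate knob */
static int fscreate_path(char *buf, size_t size, long tid)
{
    return snprintf(buf, size, "/proc/self/task/%ld/attr/fscreate", tid);
}

/*
 * Probe for the calling thread: the knob counts as usable only if it
 * can be opened read-write and read without error.
 */
static bool fscreate_usable(void)
{
    char path[64], scratch[64];
    ssize_t nread;
    int len, fd;

    len = fscreate_path(path, sizeof(path), (long)syscall(SYS_gettid));
    assert(len > 0 && (size_t)len < sizeof(path));

    fd = open(path, O_RDWR);
    if (fd == -1) {
        return false;
    }

    nread = read(fd, scratch, sizeof(scratch));
    close(fd);
    return nread != -1;
}
```

Using a signed ssize_t for the read() result keeps the comparison with
-1 explicit.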

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-8-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Manually merged gettid syscall number fixup from Vivek
---
 tools/virtiofsd/passthrough_ll.c | 92 
 1 file changed, 92 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index e27479f1c9..e694980a53 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -173,10 +173,14 @@ struct lo_data {
 
 /* An O_PATH file descriptor to /proc/self/fd/ */
 int proc_self_fd;
+/* An O_PATH file descriptor to /proc/self/task/ */
+int proc_self_task;
 int user_killpriv_v2, killpriv_v2;
 /* If set, virtiofsd is responsible for setting umask during creation */
 bool change_umask;
 int user_posix_acl, posix_acl;
+/* Keeps track if /proc//attr/fscreate should be used or not */
+bool use_fscreate;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -256,6 +260,72 @@ static struct lo_data *lo_data(fuse_req_t req)
 return (struct lo_data *)fuse_req_userdata(req);
 }
 
+/*
+ * Tries to figure out if /proc//attr/fscreate is usable or not. With
+ * selinux=0, read from fscreate returns -EINVAL.
+ *
+ * TODO: Link with libselinux and use is_selinux_enabled() instead down
+ * the line. It probably will be more reliable indicator.
+ */
+static bool is_fscreate_usable(struct lo_data *lo)
+{
+char procname[64];
+int fscreate_fd;
+size_t bytes_read;
+
+sprintf(procname, "%ld/attr/fscreate", syscall(SYS_gettid));
+fscreate_fd = openat(lo->proc_self_task, procname, O_RDWR);
+if (fscreate_fd == -1) {
+return false;
+}
+
+bytes_read = read(fscreate_fd, procname, 64);
+close(fscreate_fd);
+if (bytes_read == -1) {
+return false;
+}
+return true;
+}
+
+/* Helpers to set/reset fscreate */
+__attribute__((unused))
+static int open_set_proc_fscreate(struct lo_data *lo, const void *ctx,
+  size_t ctxlen, int *fd)
+{
+char procname[64];
+int fscreate_fd, err = 0;
+size_t written;
+
+sprintf(procname, "%ld/attr/fscreate", syscall(SYS_gettid));
+fscreate_fd = openat(lo->proc_self_task, procname, O_WRONLY);
+err = fscreate_fd == -1 ? errno : 0;
+if (err) {
+return err;
+}
+
+written = write(fscreate_fd, ctx, ctxlen);
+err = written == -1 ? errno : 0;
+if (err) {
+goto out;
+}
+
+*fd = fscreate_fd;
+return 0;
+out:
+close(fscreate_fd);
+return err;
+}
+
+__attribute__((unused))
+static void close_reset_proc_fscreate(int fd)
+{
+if ((write(fd, NULL, 0)) == -1) {
+fuse_log(FUSE_LOG_WARNING, "Failed to reset fscreate. err=%d\n", errno);
+}
+close(fd);
+return;
+}
+
 /*
  * Load capng's state from our saved state if the current thread
  * hadn't previously been loaded.
@@ -3531,6 +3601,15 @@ static void setup_namespaces(struct lo_data *lo, struct fuse_session *se)
 exit(1);
 }
 
+/* Get the /proc/self/task descriptor */
+lo->proc_self_task = open("/proc/self/task/", O_PATH);
+if (lo->proc_self_task == -1) {
+fuse_log(FUSE_LOG_ERR, "open(/proc/self/task, O_PATH): %m\n");
+exit(1);
+}
+
+lo->use_fscreate = is_fscreate_usable(lo);
+
 /*
  * We only need /proc/self/fd. Prevent ".." from accessing parent
  * directories of /proc/self/fd by bind-mounting it over /proc. Since / was
@@ -3747,6 +3826,14 @@ static void setup_chroot(struct lo_data *lo)
 exit(1);
 }
 
+lo->proc_self_task = open("/proc/self/task", O_PATH);
+if (lo->proc_self_task == -1) {
+fuse_log(FUSE_LOG_ERR, "open(\"/proc/self/task\", O_PATH): %m\n");
+exit(1);
+}
+
+lo->use_fscreate = is_fscreate_usable(lo);
+
 /*
  * Make the shared directory the file system root so that FUSE_OPEN
  * (lo_open()) cannot escape the shared directory by opening a symlink.
@@ -3932,6 +4019,10 @@ static void fuse_lo_data_cleanup(struct lo_data *lo)
 close(lo->proc_self_fd);
 }
 
+if (lo->proc_self_task >= 0) {
+close(lo->proc_self_task);
+}
+
 if (lo->root.fd >= 0) {
 close(lo->root.fd);
 }
@@ -3959,6 +4050,7 @@ int main(int argc, char *argv[])
 .posix_lock = 0,
 .allow_direct_io = 0,
 .proc_self_fd = -1,
+.proc_self_task = -1,
 .user_killpriv_v2 = 

[PULL 04/12] virtiofsd: Parse extended "struct fuse_init_in"

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

Add some code to parse extended "struct fuse_init_in". And use a local
variable "flags" to represent 64 bit flags. This will make it easier
to add more features without having to worry about two 32bit flags (->flags
and ->flags2) in "struct fuse_init_in".
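
The idea of folding the two 32-bit fields into one 64-bit value can be
sketched as follows. merge_init_flags() and EXAMPLE_INIT_EXT are
hypothetical names for illustration; the constant is assumed to match
FUSE_INIT_EXT (bit 30) from the FUSE uapi header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror FUSE_INIT_EXT from the FUSE uapi header (bit 30) */
#define EXAMPLE_INIT_EXT (1u << 30)

/* Hypothetical helper: combine ->flags and ->flags2 into one 64-bit word */
static uint64_t merge_init_flags(uint32_t flags, uint32_t flags2)
{
    uint64_t merged = flags;

    /* flags2 is only meaningful when the sender set the EXT bit */
    if (flags & EXAMPLE_INIT_EXT) {
        merged |= (uint64_t)flags2 << 32;
    }
    return merged;
}
```

With this in place, a feature bit defined above bit 31 can be tested
with a single "merged & FLAG" expression, the same way as the older
bits.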

Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-4-vgo...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
  dgilbert: Fixed up long line
---
 tools/virtiofsd/fuse_lowlevel.c | 61 +
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 5d431a7038..03d60f462a 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1882,11 +1882,14 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 size_t compat_size = offsetof(struct fuse_init_in, max_readahead);
 size_t compat2_size = offsetof(struct fuse_init_in, flags) +
   sizeof(uint32_t);
+/* Fuse structure extended with minor version 36 */
+size_t compat3_size = endof(struct fuse_init_in, unused);
 struct fuse_init_in *arg;
 struct fuse_init_out outarg;
 struct fuse_session *se = req->se;
 size_t bufsize = se->bufsize;
 size_t outargsize = sizeof(outarg);
+uint64_t flags = 0;
 
 (void)nodeid;
 
@@ -1903,11 +1906,25 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 fuse_reply_err(req, EINVAL);
 return;
 }
+flags |= arg->flags;
+}
+
+/*
+ * fuse_init_in was extended again with minor version 36. Just read
+ * current known size of fuse_init so that future extension and
+ * header rebase does not cause breakage.
+ */
+if (sizeof(*arg) > compat2_size && (arg->flags & FUSE_INIT_EXT)) {
+if (!fuse_mbuf_iter_advance(iter, compat3_size - compat2_size)) {
+fuse_reply_err(req, EINVAL);
+return;
+}
+flags |= (uint64_t) arg->flags2 << 32;
 }
 
 fuse_log(FUSE_LOG_DEBUG, "INIT: %u.%u\n", arg->major, arg->minor);
 if (arg->major == 7 && arg->minor >= 6) {
-fuse_log(FUSE_LOG_DEBUG, "flags=0x%08x\n", arg->flags);
+fuse_log(FUSE_LOG_DEBUG, "flags=0x%016llx\n", flags);
 fuse_log(FUSE_LOG_DEBUG, "max_readahead=0x%08x\n", arg->max_readahead);
 }
 se->conn.proto_major = arg->major;
@@ -1935,68 +1952,68 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 if (arg->max_readahead < se->conn.max_readahead) {
 se->conn.max_readahead = arg->max_readahead;
 }
-if (arg->flags & FUSE_ASYNC_READ) {
+if (flags & FUSE_ASYNC_READ) {
 se->conn.capable |= FUSE_CAP_ASYNC_READ;
 }
-if (arg->flags & FUSE_POSIX_LOCKS) {
+if (flags & FUSE_POSIX_LOCKS) {
 se->conn.capable |= FUSE_CAP_POSIX_LOCKS;
 }
-if (arg->flags & FUSE_ATOMIC_O_TRUNC) {
+if (flags & FUSE_ATOMIC_O_TRUNC) {
 se->conn.capable |= FUSE_CAP_ATOMIC_O_TRUNC;
 }
-if (arg->flags & FUSE_EXPORT_SUPPORT) {
+if (flags & FUSE_EXPORT_SUPPORT) {
 se->conn.capable |= FUSE_CAP_EXPORT_SUPPORT;
 }
-if (arg->flags & FUSE_DONT_MASK) {
+if (flags & FUSE_DONT_MASK) {
 se->conn.capable |= FUSE_CAP_DONT_MASK;
 }
-if (arg->flags & FUSE_FLOCK_LOCKS) {
+if (flags & FUSE_FLOCK_LOCKS) {
 se->conn.capable |= FUSE_CAP_FLOCK_LOCKS;
 }
-if (arg->flags & FUSE_AUTO_INVAL_DATA) {
+if (flags & FUSE_AUTO_INVAL_DATA) {
 se->conn.capable |= FUSE_CAP_AUTO_INVAL_DATA;
 }
-if (arg->flags & FUSE_DO_READDIRPLUS) {
+if (flags & FUSE_DO_READDIRPLUS) {
 se->conn.capable |= FUSE_CAP_READDIRPLUS;
 }
-if (arg->flags & FUSE_READDIRPLUS_AUTO) {
+if (flags & FUSE_READDIRPLUS_AUTO) {
 se->conn.capable |= FUSE_CAP_READDIRPLUS_AUTO;
 }
-if (arg->flags & FUSE_ASYNC_DIO) {
+if (flags & FUSE_ASYNC_DIO) {
 se->conn.capable |= FUSE_CAP_ASYNC_DIO;
 }
-if (arg->flags & FUSE_WRITEBACK_CACHE) {
+if (flags & FUSE_WRITEBACK_CACHE) {
 se->conn.capable |= FUSE_CAP_WRITEBACK_CACHE;
 }
-if (arg->flags & FUSE_NO_OPEN_SUPPORT) {
+if (flags & FUSE_NO_OPEN_SUPPORT) {
 se->conn.capable |= FUSE_CAP_NO_OPEN_SUPPORT;
 }
-if (arg->flags & FUSE_PARALLEL_DIROPS) {
+if (flags & FUSE_PARALLEL_DIROPS) {
 se->conn.capable |= FUSE_CAP_PARALLEL_DIROPS;
 }
-if (arg->flags & FUSE_POSIX_ACL) {
+if (flags & FUSE_POSIX_ACL) {
 se->conn.capable |= FUSE_CAP_POSIX_ACL;
 }
-if (arg->flags & FUSE_HANDLE_KILLPRIV) {
+if (flags & FUSE_HANDLE_KILLPRIV) {
 se->conn.capable |= FUSE_CAP_HANDLE_KILLPRIV;
 }
-if (arg->flags & FUSE_NO_OPENDIR_SUPPORT) {
+if (flags & FUSE_NO_OPENDIR_SUPPORT) {
 se->conn.capable |= FUSE_CAP_NO_OPENDIR_SUPPORT;
 }
-if (!(arg->flags & FUSE_MAX_PAGES)) {
+if (!(flags & 

Re: [PATCH] migration: NULL transport_data after freeing

2022-02-17 Thread Dr. David Alan Gilbert
* Hanna Reitz (hre...@redhat.com) wrote:
> migration_incoming_state_destroy() NULLs all objects it frees after they
> are freed, presumably so that a subsequent call to the same function
> will not free them again, unless new objects have been created in the
> meantime.
> 
> transport_data is the exception, and it shows exactly this problem: When
> an incoming migration uses transport_cleanup() and transport_data, and a
> subsequent incoming migration (e.g. loadvm) occurs that does not, then
> when this second one is done, it will call transport_cleanup() on the
> old transport_data again -- which has already been freed.  This is
> sometimes visible in the iotest 201, though for some reason I can only
> reproduce it with -m32.
> 
> To fix this, call transport_cleanup() only when transport_data is not
> NULL (otherwise there is nothing to clean up), and set transport_data to
> NULL when it has been cleaned up (i.e. freed).
> 
> (transport_cleanup() is used only by migration/socket.c, where
> socket_start_incoming_migration_internal() sets both it and
> transport_data to non-NULL values.)
> 
> Signed-off-by: Hanna Reitz 

That probably deserves a fixes: a59136f

Reviewed-by: Dr. David Alan Gilbert 

> ---
>  migration/migration.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index bcc385b94b..cdb2e76d02 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -287,8 +287,9 @@ void migration_incoming_state_destroy(void)
>  g_array_free(mis->postcopy_remote_fds, TRUE);
>  mis->postcopy_remote_fds = NULL;
>  }
> -if (mis->transport_cleanup) {
> +if (mis->transport_cleanup && mis->transport_data) {
>  mis->transport_cleanup(mis->transport_data);
> +mis->transport_data = NULL;
>  }
>  
>  qemu_event_reset(&mis->main_thread_load_event);
> -- 
> 2.34.1
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
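
For anyone who wants to try the guard-and-reset pattern from the patch above outside QEMU, here is a minimal standalone sketch. The struct and function names (incoming_state, state_destroy, counting_cleanup) are hypothetical stand-ins and not QEMU's definitions; only the "call cleanup while data is non-NULL, then NULL it" logic mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the incoming migration state; not QEMU's struct. */
struct incoming_state {
    void (*transport_cleanup)(void *data);
    void *transport_data;
};

static int cleanup_calls; /* counts how often the callback really ran */

/* Example cleanup callback: frees the data and records the call. */
static void counting_cleanup(void *data)
{
    cleanup_calls++;
    free(data);
}

/*
 * Mirrors the patched logic: run the callback only while there is data
 * to clean up, then forget the pointer so a second call is a no-op.
 */
static void state_destroy(struct incoming_state *st)
{
    assert(st != NULL);
    if (st->transport_cleanup && st->transport_data) {
        st->transport_cleanup(st->transport_data);
        st->transport_data = NULL;
    }
}
```

Calling state_destroy() twice on the same state runs the callback once; without the reset to NULL, the second call would hand an already freed pointer to the callback.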




[PULL 07/12] virtiofsd: Move core file creation code in separate function

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

Move the core file creation bits into a separate function. This is soon
going to get more complex, as file creation will also need to set a
security context, and the next patch adds multiple modes of file creation.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-7-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/passthrough_ll.c | 36 ++--
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 3e56d1cd95..e27479f1c9 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2001,6 +2001,30 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
 return 0;
 }
 
+static int do_lo_create(fuse_req_t req, struct lo_inode *parent_inode,
+const char *name, mode_t mode,
+struct fuse_file_info *fi, int *open_fd)
+{
+int err = 0, fd;
+struct lo_cred old = {};
+struct lo_data *lo = lo_data(req);
+
+err = lo_change_cred(req, &old, lo->change_umask);
+if (err) {
+return err;
+}
+
+/* Try to create a new file but don't open existing files */
+fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
+if (fd == -1) {
+err = errno;
+} else {
+*open_fd = fd;
+}
+lo_restore_cred(&old, lo->change_umask);
+return err;
+}
+
 static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
   mode_t mode, struct fuse_file_info *fi)
 {
@@ -2010,7 +2034,6 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 struct lo_inode *inode = NULL;
 struct fuse_entry_param e;
 int err;
-struct lo_cred old = {};
 
 fuse_log(FUSE_LOG_DEBUG, "lo_create(parent=%" PRIu64 ", name=%s)"
  " kill_priv=%d\n", parent, name, fi->kill_priv);
@@ -2026,18 +2049,9 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
 return;
 }
 
-err = lo_change_cred(req, &old, lo->change_umask);
-if (err) {
-goto out;
-}
-
 update_open_flags(lo->writeback, lo->allow_direct_io, fi);
 
-/* Try to create a new file but don't open existing files */
-fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
-err = fd == -1 ? errno : 0;
-
-lo_restore_cred(&old, lo->change_umask);
+err = do_lo_create(req, parent_inode, name, mode, fi, &fd);
 
 /* Ignore the error if file exists and O_EXCL was not given */
 if (err && (err != EEXIST || (fi->flags & O_EXCL))) {
-- 
2.35.1




[PULL 06/12] virtiofsd, fuse_lowlevel.c: Add capability to parse security context

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

Add the capability to enable and parse the security context sent by the
client and store it in fuse_req. Filesystems can now get the security
context from the request and set it on files during creation.
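
As a rough illustration of the kind of parsing this involves, here is a standalone sketch that walks a buffer holding a header, at most one entry, a NUL-terminated name and the context bytes. The struct layouts and the cursor helper are simplified assumptions made for this example; they are not the definitions from linux/fuse.h or virtiofsd's struct fuse_mbuf_iter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the wire structs; layouts are assumed. */
struct secctx_header {
    uint32_t size;
    uint32_t nr_secctx;
};
struct secctx_entry {
    uint32_t size;
    uint32_t padding;
};

/* Result of a successful parse; pointers refer into the input buffer. */
struct parsed_secctx {
    const char *name;
    const void *ctx;
    uint32_t ctxlen;
};

/* Hypothetical cursor playing the role of an mbuf iterator. */
struct cursor {
    const unsigned char *buf;
    size_t len;
    size_t pos;
};

/* Return a pointer to the next n bytes and advance, or NULL if short. */
static const void *cursor_advance(struct cursor *c, size_t n)
{
    const void *p = c->buf + c->pos;

    if (n > c->len - c->pos) {
        return NULL;
    }
    c->pos += n;
    return p;
}

/* Return the next NUL-terminated string and advance past it, or NULL. */
static const char *cursor_advance_str(struct cursor *c)
{
    const unsigned char *start = c->buf + c->pos;
    const unsigned char *nul = memchr(start, '\0', c->len - c->pos);

    if (!nul) {
        return NULL;
    }
    c->pos += (size_t)(nul - start) + 1;
    return (const char *)start;
}

/* Returns 0 on success; out->name stays NULL if no context was sent. */
static int parse_secctx(struct cursor *c, struct parsed_secctx *out)
{
    struct secctx_header hdr;
    struct secctx_entry ent;
    const char *name;
    const void *p;

    assert(c && out);
    memset(out, 0, sizeof(*out));
    p = cursor_advance(c, sizeof(hdr));
    if (!p) {
        return -EINVAL;
    }
    memcpy(&hdr, p, sizeof(hdr));   /* memcpy avoids unaligned access */
    if (hdr.nr_secctx > 1) {        /* at most one context supported */
        return -EINVAL;
    }
    if (hdr.nr_secctx == 0) {       /* nothing sent: not an error */
        return 0;
    }
    p = cursor_advance(c, sizeof(ent));
    if (!p) {
        return -EINVAL;
    }
    memcpy(&ent, p, sizeof(ent));
    if (ent.size == 0) {            /* empty context is not expected */
        return -EINVAL;
    }
    name = cursor_advance_str(c);
    p = name ? cursor_advance(c, ent.size) : NULL;
    if (!p) {
        return -EINVAL;
    }
    out->name = name;
    out->ctx = p;
    out->ctxlen = ent.size;
    return 0;
}
```

Every step either consumes exactly the bytes it validated or fails with -EINVAL, so a truncated request can never make the parser read past the end of the buffer.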

Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-6-vgo...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_common.h   |   5 ++
 tools/virtiofsd/fuse_i.h|   7 +++
 tools/virtiofsd/fuse_lowlevel.c | 102 +++-
 3 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 6f8a988202..bf46954dab 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -377,6 +377,11 @@ struct fuse_file_info {
  */
 #define FUSE_CAP_SETXATTR_EXT (1 << 29)
 
+/**
+ * Indicates that file server supports creating file security context
+ */
+#define FUSE_CAP_SECURITY_CTX (1ULL << 32)
+
 /**
  * Ioctl flags
  *
diff --git a/tools/virtiofsd/fuse_i.h b/tools/virtiofsd/fuse_i.h
index 492e002181..a5572fa4ae 100644
--- a/tools/virtiofsd/fuse_i.h
+++ b/tools/virtiofsd/fuse_i.h
@@ -15,6 +15,12 @@
 struct fv_VuDev;
 struct fv_QueueInfo;
 
+struct fuse_security_context {
+const char *name;
+uint32_t ctxlen;
+const void *ctx;
+};
+
 struct fuse_req {
 struct fuse_session *se;
 uint64_t unique;
@@ -35,6 +41,7 @@ struct fuse_req {
 } u;
 struct fuse_req *next;
 struct fuse_req *prev;
+struct fuse_security_context secctx;
 };
 
 struct fuse_notify_req {
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 794185fb33..f681d5e3b3 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -886,11 +886,63 @@ static void do_readlink(fuse_req_t req, fuse_ino_t nodeid,
 }
 }
 
+static int parse_secctx_fill_req(fuse_req_t req, struct fuse_mbuf_iter *iter)
+{
+struct fuse_secctx_header *fsecctx_header;
+struct fuse_secctx *fsecctx;
+const void *secctx;
+const char *name;
+
+fsecctx_header = fuse_mbuf_iter_advance(iter, sizeof(*fsecctx_header));
+if (!fsecctx_header) {
+return -EINVAL;
+}
+
+/*
+ * As of now maximum of one security context is supported. It can
+ * change in future though.
+ */
+if (fsecctx_header->nr_secctx > 1) {
+return -EINVAL;
+}
+
+/* No security context sent. Maybe no LSM supports it */
+if (!fsecctx_header->nr_secctx) {
+return 0;
+}
+
+fsecctx = fuse_mbuf_iter_advance(iter, sizeof(*fsecctx));
+if (!fsecctx) {
+return -EINVAL;
+}
+
+/* struct fsecctx with zero sized context is not expected */
+if (!fsecctx->size) {
+return -EINVAL;
+}
+name = fuse_mbuf_iter_advance_str(iter);
+if (!name) {
+return -EINVAL;
+}
+
+secctx = fuse_mbuf_iter_advance(iter, fsecctx->size);
+if (!secctx) {
+return -EINVAL;
+}
+
+req->secctx.name = name;
+req->secctx.ctx = secctx;
+req->secctx.ctxlen = fsecctx->size;
+return 0;
+}
+
 static void do_mknod(fuse_req_t req, fuse_ino_t nodeid,
  struct fuse_mbuf_iter *iter)
 {
 struct fuse_mknod_in *arg;
 const char *name;
+bool secctx_enabled = req->se->conn.want & FUSE_CAP_SECURITY_CTX;
+int err;
 
 arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
 name = fuse_mbuf_iter_advance_str(iter);
@@ -901,6 +953,14 @@ static void do_mknod(fuse_req_t req, fuse_ino_t nodeid,
 
 req->ctx.umask = arg->umask;
 
+if (secctx_enabled) {
+err = parse_secctx_fill_req(req, iter);
+if (err) {
+fuse_reply_err(req, -err);
+return;
+}
+}
+
 if (req->se->op.mknod) {
 req->se->op.mknod(req, nodeid, name, arg->mode, arg->rdev);
 } else {
@@ -913,6 +973,8 @@ static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid,
 {
 struct fuse_mkdir_in *arg;
 const char *name;
+bool secctx_enabled = req->se->conn.want & FUSE_CAP_SECURITY_CTX;
+int err;
 
 arg = fuse_mbuf_iter_advance(iter, sizeof(*arg));
 name = fuse_mbuf_iter_advance_str(iter);
@@ -923,6 +985,14 @@ static void do_mkdir(fuse_req_t req, fuse_ino_t nodeid,
 
 req->ctx.umask = arg->umask;
 
+if (secctx_enabled) {
+err = parse_secctx_fill_req(req, iter);
+if (err) {
+fuse_reply_err(req, err);
+return;
+}
+}
+
 if (req->se->op.mkdir) {
 req->se->op.mkdir(req, nodeid, name, arg->mode);
 } else {
@@ -969,12 +1039,22 @@ static void do_symlink(fuse_req_t req, fuse_ino_t nodeid,
 {
 const char *name = fuse_mbuf_iter_advance_str(iter);
 const char *linkname = fuse_mbuf_iter_advance_str(iter);
+bool secctx_enabled = req->se->conn.want & FUSE_CAP_SECURITY_CTX;
+int err;
 
 if (!name || !linkname) {
 fuse_reply_err(req, EINVAL);
 return;
 }
 
+

[PULL 05/12] virtiofsd: Extend size of fuse_conn_info->capable and ->want fields

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

->capable keeps track of what capabilities the kernel supports and ->want
keeps track of what capabilities the filesystem wants.

Right now these fields are 32 bits in size. But fuse has now run out of
bits, and capabilities can have bit numbers higher than 31.

That means 32-bit fields are not sufficient anymore. Increase the size to
64 bits so that we can add newer capabilities and still be able to use the
existing code to check and set the capabilities.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-5-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_common.h   | 4 ++--
 tools/virtiofsd/fuse_lowlevel.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index 0c2665b977..6f8a988202 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -439,7 +439,7 @@ struct fuse_conn_info {
 /**
  * Capability flags that the kernel supports (read-only)
  */
-unsigned capable;
+uint64_t capable;
 
 /**
  * Capability flags that the filesystem wants to enable.
@@ -447,7 +447,7 @@ struct fuse_conn_info {
  * libfuse attempts to initialize this field with
  * reasonable default values before calling the init() handler.
  */
-unsigned want;
+uint64_t want;
 
 /**
  * Maximum number of pending "background" requests. A
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index 03d60f462a..794185fb33 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -2070,7 +2070,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 if (se->conn.want & (~se->conn.capable)) {
 fuse_log(FUSE_LOG_ERR,
  "fuse: error: filesystem requested capabilities "
- "0x%x that are not supported by kernel, aborting.\n",
+ "0x%llx that are not supported by kernel, aborting.\n",
  se->conn.want & (~se->conn.capable));
 fuse_reply_err(req, EPROTO);
 se->error = -EPROTO;
-- 
2.35.1
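
To see why the field width matters for the patch above, the following standalone sketch shows what happens to a capability bit above bit 31 when it is stored in a 32-bit field, and how the "want & ~capable" check behaves with 64-bit masks. CAP_LOW, CAP_HIGH and the function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical capability bits; CAP_HIGH lies above bit 31. */
#define CAP_LOW  (1ULL << 29)
#define CAP_HIGH (1ULL << 32)

/* True if 'want' asks for any capability missing from 'capable'. */
static bool wants_unsupported(uint64_t want, uint64_t capable)
{
    return (want & ~capable) != 0;
}

/* What a 32-bit field would retain of a 64-bit capability mask. */
static uint32_t truncate_caps(uint64_t caps)
{
    return (uint32_t)caps;
}

/* Format a mask for logging; PRIx64 matches uint64_t on any platform. */
static int format_caps(char *buf, size_t len, uint64_t caps)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%" PRIx64, caps);
}
```

truncate_caps(CAP_HIGH) yields 0, which is exactly the silent loss a 32-bit ->capable or ->want field would cause for any capability numbered 32 or higher.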




Re: [Virtio-fs] [PULL 00/12] virtiofs queue

2022-02-17 Thread Dr. David Alan Gilbert
* Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> The following changes since commit c13b8e9973635f34f3ce4356af27a311c993729c:
> 
>   Merge remote-tracking branch 
> 'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging (2022-02-16 
> 09:57:11 +)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20220217
> 
> for you to fetch changes up to e138ec4ac86ea71d10ecd032edc433290776a5f2:
> 
>   virtiofsd: Add basic support for FUSE_SYNCFS request (2022-02-17 13:35:55 
> +)

NAK again
Some checkpatch fixes slipped through; v3 in flight

> 
> V2: virtiofs pull 2022-02-17
> 
> Security label improvements from Vivek
>   - includes a fix for building against new kernel headers
>   [V2: Fix building on old Linux]
> Blocking flock disable from Sebastian
> SYNCFS support from Greg
> 
> Signed-off-by: Dr. David Alan Gilbert 
> 
> 
> Greg Kurz (1):
>   virtiofsd: Add basic support for FUSE_SYNCFS request
> 
> Sebastian Hasler (1):
>   virtiofsd: Do not support blocking flock
> 
> Vivek Goyal (10):
>   virtiofsd: Fix breakage due to fuse_init_in size change
>   linux-headers: Update headers to v5.17-rc1
>   virtiofsd: Parse extended "struct fuse_init_in"
>   virtiofsd: Extend size of fuse_conn_info->capable and ->want fields
>   virtiofsd, fuse_lowlevel.c: Add capability to parse security context
>   virtiofsd: Move core file creation code in separate function
>   virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate
>   virtiofsd: Create new file with security context
>   virtiofsd: Create new file using O_TMPFILE and set security context
>   virtiofsd: Add an option to enable/disable security label
> 
>  docs/tools/virtiofsd.rst   |  32 ++
>  include/standard-headers/asm-x86/kvm_para.h|   1 +
>  include/standard-headers/drm/drm_fourcc.h  |  11 +
>  include/standard-headers/linux/ethtool.h   |   1 +
>  include/standard-headers/linux/fuse.h  |  60 +++-
>  include/standard-headers/linux/pci_regs.h  | 142 
>  include/standard-headers/linux/virtio_gpio.h   |  72 
>  include/standard-headers/linux/virtio_i2c.h|  47 +++
>  include/standard-headers/linux/virtio_iommu.h  |   8 +-
>  include/standard-headers/linux/virtio_pcidev.h |  65 
>  include/standard-headers/linux/virtio_scmi.h   |  24 ++
>  linux-headers/asm-generic/unistd.h |   5 +-
>  linux-headers/asm-mips/unistd_n32.h|   2 +
>  linux-headers/asm-mips/unistd_n64.h|   2 +
>  linux-headers/asm-mips/unistd_o32.h|   2 +
>  linux-headers/asm-powerpc/unistd_32.h  |   2 +
>  linux-headers/asm-powerpc/unistd_64.h  |   2 +
>  linux-headers/asm-riscv/bitsperlong.h  |  14 +
>  linux-headers/asm-riscv/mman.h |   1 +
>  linux-headers/asm-riscv/unistd.h   |  44 +++
>  linux-headers/asm-s390/unistd_32.h |   2 +
>  linux-headers/asm-s390/unistd_64.h |   2 +
>  linux-headers/asm-x86/kvm.h|  16 +-
>  linux-headers/asm-x86/unistd_32.h  |   1 +
>  linux-headers/asm-x86/unistd_64.h  |   1 +
>  linux-headers/asm-x86/unistd_x32.h |   1 +
>  linux-headers/linux/kvm.h  |  17 +
>  tools/virtiofsd/fuse_common.h  |   9 +-
>  tools/virtiofsd/fuse_i.h   |   7 +
>  tools/virtiofsd/fuse_lowlevel.c| 179 --
>  tools/virtiofsd/fuse_lowlevel.h|  13 +
>  tools/virtiofsd/helper.c   |   1 +
>  tools/virtiofsd/passthrough_ll.c   | 467 
> +++--
>  tools/virtiofsd/passthrough_seccomp.c  |   1 +
>  34 files changed, 1122 insertions(+), 132 deletions(-)
>  create mode 100644 include/standard-headers/linux/virtio_gpio.h
>  create mode 100644 include/standard-headers/linux/virtio_i2c.h
>  create mode 100644 include/standard-headers/linux/virtio_pcidev.h
>  create mode 100644 include/standard-headers/linux/virtio_scmi.h
>  create mode 100644 linux-headers/asm-riscv/bitsperlong.h
>  create mode 100644 linux-headers/asm-riscv/mman.h
>  create mode 100644 linux-headers/asm-riscv/unistd.h
> 
> ___
> Virtio-fs mailing list
> virtio...@redhat.com
> https://listman.redhat.com/mailman/listinfo/virtio-fs
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v2 00/12] GL & D-Bus display related fixes

2022-02-17 Thread Akihiko Odaki
On Fri, Feb 18, 2022 at 2:07 AM Marc-André Lureau
 wrote:
>
> Hi
>
> On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki  wrote:
>>
>> On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau
>>  wrote:
>> >
>> > Hi
>> >
>> > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki  
>> > wrote:
>> >>
>> >> On Thu, Feb 17, 2022 at 8:58 PM  wrote:
>> >> >
>> >> > From: Marc-André Lureau 
>> >> >
>> >> > Hi,
>> >> >
>> >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a console",
>> >> > Akihiko Odaki reported a number of issues with the GL and D-Bus display.
>> >> > His series proposes a different design and reverts some of my previous
>> >> > generic console changes to fix those issues.
>> >> >
>> >> > However, from what I have worked through so far, the issues can be solved
>> >> > by reasonably simple fixes while keeping the console changes generic (not
>> >> > specific to the D-Bus display backend). I believe a shared infrastructure
>> >> > is more beneficial long term than having GL-specific code in the D-Bus
>> >> > code (in particular, the egl-headless & VNC combination could potentially
>> >> > use it).
>> >> >
>> >> > Thanks a lot, Akihiko, for reporting the issues and proposing a different
>> >> > approach! Please test this alternative series and let me know of any
>> >> > further issues. My understanding is that you are mainly concerned with
>> >> > the Cocoa backend, and I don't have a way to test it, so please check it.
>> >> > If necessary, we may well have to revert my earlier changes and go your
>> >> > way, eventually.
>> >> >
>> >> > Marc-André Lureau (12):
>> >> >   ui/console: fix crash when using gl context with non-gl listeners
>> >> >   ui/console: fix texture leak when calling surface_gl_create_texture()
>> >> >   ui: do not create a surface when resizing a GL scanout
>> >> >   ui/console: move check for compatible GL context
>> >> >   ui/console: move dcl compatiblity check to a callback
>> >> >   ui/console: egl-headless is compatible with non-gl listeners
>> >> >   ui/dbus: associate the DBusDisplayConsole listener with the given
>> >> > console
>> >> >   ui/console: move console compatibility check to dcl_display_console()
>> >> >   ui/shader: fix potential leak of shader on error
>> >> >   ui/shader: free associated programs
>> >> >   ui/console: add a dpy_gfx_switch callback helper
>> >> >   ui/dbus: fix texture sharing
>> >> >
>> >> >  include/ui/console.h |  19 ---
>> >> >  ui/dbus.h|   3 ++
>> >> >  ui/console-gl.c  |   4 ++
>> >> >  ui/console.c | 119 ++-
>> >> >  ui/dbus-console.c|  27 +-
>> >> >  ui/dbus-listener.c   |  11 
>> >> >  ui/dbus.c|  33 +++-
>> >> >  ui/egl-headless.c|  17 ++-
>> >> >  ui/gtk.c |  18 ++-
>> >> >  ui/sdl2.c|   9 +++-
>> >> >  ui/shader.c  |   9 +++-
>> >> >  ui/spice-display.c   |   9 +++-
>> >> >  12 files changed, 192 insertions(+), 86 deletions(-)
>> >> >
>> >> > --
>> >> > 2.34.1.428.gdcc0cd074f0c
>> >> >
>> >> >
>> >>
>> >> You missed only one thing:
>> >> >- that console_select and register_displaychangelistener may not call
>> >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is
>> >> > incompatible with non-OpenGL displays
>> >>
>> >> displaychangelistener_display_console always has to call
>> >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't.
>> >
>> >
>> > Ok, would that be what you have in mind?
>> >
>> >  --- a/ui/console.c
>> > +++ b/ui/console.c
>> > @@ -1122,6 +1122,10 @@ static void 
>> > displaychangelistener_display_console(DisplayChangeListener *dcl,
>> >  } else if (con->scanout.kind == SCANOUT_SURFACE) {
>> >  dpy_gfx_create_texture(con, con->surface);
>> >  displaychangelistener_gfx_switch(dcl, con->surface);
>> > +} else {
>> > +/* use the fallback surface, egl-headless keeps it updated */
>> > +assert(con->surface);
>> > +displaychangelistener_gfx_switch(dcl, con->surface);
>> >  }
>>
>> It should call displaychangelistener_gfx_switch even when e.g.
>> con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content
>> to the last DisplaySurface it received while con->scanout.kind ==
>> SCANOUT_TEXTURE.
>
>
> I see, egl-headless is really not a "listener".
>
>>
>> >
>> > I wish such egl-headless specific code would be there, but we would need 
>> > more refactoring.
>> >
>> > I think I would rather have a backend split for GL context, like "-object 
>> > egl-context". egl-headless-specific copy code would be handled by 
>> > common/util code for anything that wants a pixman surface (VNC, screen 
>> > capture, non-GL display etc).
>> >
>> > This split would allow sharing the context code, and introduce other 
>> > system specific GL initialization, such as WGL etc. Right now, I doubt the 
>> > EGL code works on anything but Linux.
>>
>> 

[PULL 03/12] linux-headers: Update headers to v5.17-rc1

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

Update headers to 5.17-rc1. I need the latest fuse changes.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-3-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 include/standard-headers/asm-x86/kvm_para.h   |   1 +
 include/standard-headers/drm/drm_fourcc.h |  11 ++
 include/standard-headers/linux/ethtool.h  |   1 +
 include/standard-headers/linux/fuse.h |  60 +++-
 include/standard-headers/linux/pci_regs.h | 142 +-
 include/standard-headers/linux/virtio_gpio.h  |  72 +
 include/standard-headers/linux/virtio_i2c.h   |  47 ++
 include/standard-headers/linux/virtio_iommu.h |   8 +-
 .../standard-headers/linux/virtio_pcidev.h|  65 
 include/standard-headers/linux/virtio_scmi.h  |  24 +++
 linux-headers/asm-generic/unistd.h|   5 +-
 linux-headers/asm-mips/unistd_n32.h   |   2 +
 linux-headers/asm-mips/unistd_n64.h   |   2 +
 linux-headers/asm-mips/unistd_o32.h   |   2 +
 linux-headers/asm-powerpc/unistd_32.h |   2 +
 linux-headers/asm-powerpc/unistd_64.h |   2 +
 linux-headers/asm-riscv/bitsperlong.h |  14 ++
 linux-headers/asm-riscv/mman.h|   1 +
 linux-headers/asm-riscv/unistd.h  |  44 ++
 linux-headers/asm-s390/unistd_32.h|   2 +
 linux-headers/asm-s390/unistd_64.h|   2 +
 linux-headers/asm-x86/kvm.h   |  16 +-
 linux-headers/asm-x86/unistd_32.h |   1 +
 linux-headers/asm-x86/unistd_64.h |   1 +
 linux-headers/asm-x86/unistd_x32.h|   1 +
 linux-headers/linux/kvm.h |  17 +++
 26 files changed, 469 insertions(+), 76 deletions(-)
 create mode 100644 include/standard-headers/linux/virtio_gpio.h
 create mode 100644 include/standard-headers/linux/virtio_i2c.h
 create mode 100644 include/standard-headers/linux/virtio_pcidev.h
 create mode 100644 include/standard-headers/linux/virtio_scmi.h
 create mode 100644 linux-headers/asm-riscv/bitsperlong.h
 create mode 100644 linux-headers/asm-riscv/mman.h
 create mode 100644 linux-headers/asm-riscv/unistd.h

diff --git a/include/standard-headers/asm-x86/kvm_para.h 
b/include/standard-headers/asm-x86/kvm_para.h
index 204cfb8640..f0235e58a1 100644
--- a/include/standard-headers/asm-x86/kvm_para.h
+++ b/include/standard-headers/asm-x86/kvm_para.h
@@ -8,6 +8,7 @@
  * should be used to determine that a VM is running under KVM.
  */
 #define KVM_CPUID_SIGNATURE 0x40000000
+#define KVM_SIGNATURE "KVMKVMKVM\0\0\0"
 
 /* This CPUID returns two feature bitmaps in eax, edx. Before enabling
  * a particular paravirtualization, the appropriate feature bit should
diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 2c025cb4fe..4888f85f69 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -313,6 +313,13 @@ extern "C" {
  */
 #define DRM_FORMAT_P016 fourcc_code('P', '0', '1', '6') /* 2x2 subsampled Cr:Cb plane 16 bits per channel */
 
+/* 2 plane YCbCr420.
+ * 3 10 bit components and 2 padding bits packed into 4 bytes.
+ * index 0 = Y plane, [31:0] x:Y2:Y1:Y0 2:10:10:10 little endian
+ * index 1 = Cr:Cb plane, [63:0] x:Cr2:Cb2:Cr1:x:Cb1:Cr0:Cb0 [2:10:10:10:2:10:10:10] little endian
+ */
+#define DRM_FORMAT_P030 fourcc_code('P', '0', '3', '0') /* 2x2 subsampled Cr:Cb plane 10 bits per channel packed */
+
 /* 3 plane non-subsampled (444) YCbCr
  * 16 bits per component, but only 10 bits are used and 6 bits are padded
  * index 0: Y plane, [15:0] Y:x [10:6] little endian
@@ -853,6 +860,10 @@ drm_fourcc_canonicalize_nvidia_format_mod(uint64_t 
modifier)
  * and UV.  Some SAND-using hardware stores UV in a separate tiled
  * image from Y to reduce the column height, which is not supported
  * with these modifiers.
+ *
+ * The DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT modifier is also
+ * supported for DRM_FORMAT_P030 where the columns remain as 128 bytes
+ * wide, but as this is a 10 bpp format that translates to 96 pixels.
  */
 
 #define DRM_FORMAT_MOD_BROADCOM_SAND32_COL_HEIGHT(v) \
diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
index 688eb8dc39..38d5a4cd6e 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -231,6 +231,7 @@ enum tunable_id {
ETHTOOL_RX_COPYBREAK,
ETHTOOL_TX_COPYBREAK,
ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
+   ETHTOOL_TX_COPYBREAK_BUF_SIZE,
/*
 * Add your fresh new tunable attribute above and remember to update
 * tunable_strings[] in net/ethtool/common.c
diff --git a/include/standard-headers/linux/fuse.h 
b/include/standard-headers/linux/fuse.h
index 23ea31708b..bda06258be 100644
--- a/include/standard-headers/linux/fuse.h
+++ 

[PULL 02/12] virtiofsd: Fix breakage due to fuse_init_in size change

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Vivek Goyal 

Kernel version 5.17 has increased the size of "struct fuse_init_in".
Previously this struct was 16 bytes; it has now been extended to
64 bytes.

Once the qemu headers are updated to the latest version, do_init() will
expect to receive a 64-byte struct (for protocol version major 7 and
minor >= 6). But a guest booting an older kernel (older than 5.17) still
sends the older 16-byte fuse_init_in, so do_init() fails and, as a
result, mounting virtiofs fails.

Fix this by parsing only 16 bytes for now. Separate patches will be
posted to parse the rest of the bytes and enable the new functionality.
Right now we don't support any of the new functionality, so we don't
lose anything by not parsing the bytes beyond 16.

Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Vivek Goyal 
Message-Id: <20220208204813.682906-2-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_lowlevel.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e4679c73ab..5d431a7038 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1880,6 +1880,8 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 struct fuse_mbuf_iter *iter)
 {
 size_t compat_size = offsetof(struct fuse_init_in, max_readahead);
+size_t compat2_size = offsetof(struct fuse_init_in, flags) +
+  sizeof(uint32_t);
 struct fuse_init_in *arg;
 struct fuse_init_out outarg;
 struct fuse_session *se = req->se;
@@ -1897,7 +1899,7 @@ static void do_init(fuse_req_t req, fuse_ino_t nodeid,
 
 /* ...and now consume the new fields. */
 if (arg->major == 7 && arg->minor >= 6) {
-if (!fuse_mbuf_iter_advance(iter, sizeof(*arg) - compat_size)) {
+if (!fuse_mbuf_iter_advance(iter, compat2_size - compat_size)) {
 fuse_reply_err(req, EINVAL);
 return;
 }
-- 
2.35.1
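
The size arithmetic in the patch above can be checked with a small standalone sketch. The struct below is a simplified stand-in whose field layout is assumed from the 16-byte and 64-byte sizes given in the commit message; it is not a copy of the kernel header, and extra_init_bytes() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for the extended init struct. The layout is an
 * assumption chosen so the first four fields cover the old 16 bytes and
 * the whole struct covers the new 64 bytes.
 */
struct init_in_sketch {
    uint32_t major;
    uint32_t minor;
    uint32_t max_readahead;
    uint32_t flags;
    uint32_t flags2;
    uint32_t unused[11];
};

/* Oldest layout: only major and minor are present. */
#define COMPAT_SIZE offsetof(struct init_in_sketch, max_readahead)
/* Layout sent by pre-5.17 clients speaking 7.6 or later: up to flags. */
#define COMPAT2_SIZE \
    (offsetof(struct init_in_sketch, flags) + sizeof(uint32_t))

/* How many bytes beyond COMPAT_SIZE to consume for a given version. */
static size_t extra_init_bytes(uint32_t major, uint32_t minor)
{
    if (major == 7 && minor >= 6) {
        return COMPAT2_SIZE - COMPAT_SIZE;
    }
    return 0;
}
```

With this layout COMPAT_SIZE is 8 and COMPAT2_SIZE is 16, so a 7.6+ client is asked for 8 more bytes rather than the 56 that sizeof() of the extended struct would demand.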




[PULL 01/12] virtiofsd: Do not support blocking flock

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: Sebastian Hasler 

With the current implementation, blocking flock can lead to
deadlock. Thus, it's better to return EOPNOTSUPP if a user attempts
to perform a blocking flock request.

Signed-off-by: Sebastian Hasler 
Message-Id: <20220113153249.710216-1-sebastian.has...@stuvus.uni-stuttgart.de>
Signed-off-by: Dr. David Alan Gilbert 
Reviewed-by: Vivek Goyal 
Reviewed-by: Greg Kurz 
---
 tools/virtiofsd/passthrough_ll.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index b3d0674f6d..3e56d1cd95 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -2467,6 +2467,15 @@ static void lo_flock(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi,
 int res;
 (void)ino;
 
+if (!(op & LOCK_NB)) {
+/*
+ * Blocking flock can deadlock as there is only one thread
+ * serving the queue.
+ */
+fuse_reply_err(req, EOPNOTSUPP);
+return;
+}
+
 res = flock(lo_fi_fd(req, fi), op);
 
 fuse_reply_err(req, res == -1 ? errno : 0);
-- 
2.35.1
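
The policy from the patch above (refuse any flock request that could block, pass non-blocking ones through) can be sketched standalone as follows. try_flock() and flock_op_may_block() are hypothetical names; the sketch returns an errno-style value instead of replying to a FUSE request.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/file.h>

/* True if the operation may block, i.e. LOCK_NB is not set. */
static bool flock_op_may_block(int op)
{
    return !(op & LOCK_NB);
}

/*
 * Refuse potentially blocking requests up front; otherwise forward the
 * request to flock(). Returns 0 on success or a positive errno value.
 */
static int try_flock(int fd, int op)
{
    if (flock_op_may_block(op)) {
        return EOPNOTSUPP;
    }
    return flock(fd, op) == -1 ? errno : 0;
}
```

Because the check happens before flock() is called, a request such as LOCK_EX without LOCK_NB is rejected even for an invalid descriptor, while LOCK_EX | LOCK_NB reaches the kernel and reports its real result.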




[PULL 00/12] virtiofs queue

2022-02-17 Thread Dr. David Alan Gilbert (git)
From: "Dr. David Alan Gilbert" 

The following changes since commit c13b8e9973635f34f3ce4356af27a311c993729c:

  Merge remote-tracking branch 
'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging (2022-02-16 
09:57:11 +)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20220217b

for you to fetch changes up to 45b04ef48dbbeb18d93c2631bf5584ac493de749:

  virtiofsd: Add basic support for FUSE_SYNCFS request (2022-02-17 17:22:26 
+)


V3: virtiofs pull 2022-02-17

Security label improvements from Vivek
  - includes a fix for building against new kernel headers
  [V3: checkpatch style fixes]
  [V2: Fix building on old Linux]
Blocking flock disable from Sebastian
SYNCFS support from Greg

Signed-off-by: Dr. David Alan Gilbert 


Greg Kurz (1):
  virtiofsd: Add basic support for FUSE_SYNCFS request

Sebastian Hasler (1):
  virtiofsd: Do not support blocking flock

Vivek Goyal (10):
  virtiofsd: Fix breakage due to fuse_init_in size change
  linux-headers: Update headers to v5.17-rc1
  virtiofsd: Parse extended "struct fuse_init_in"
  virtiofsd: Extend size of fuse_conn_info->capable and ->want fields
  virtiofsd, fuse_lowlevel.c: Add capability to parse security context
  virtiofsd: Move core file creation code in separate function
  virtiofsd: Add helpers to work with /proc/self/task/tid/attr/fscreate
  virtiofsd: Create new file with security context
  virtiofsd: Create new file using O_TMPFILE and set security context
  virtiofsd: Add an option to enable/disable security label

 docs/tools/virtiofsd.rst   |  32 ++
 include/standard-headers/asm-x86/kvm_para.h|   1 +
 include/standard-headers/drm/drm_fourcc.h  |  11 +
 include/standard-headers/linux/ethtool.h   |   1 +
 include/standard-headers/linux/fuse.h  |  60 +++-
 include/standard-headers/linux/pci_regs.h  | 142 
 include/standard-headers/linux/virtio_gpio.h   |  72 
 include/standard-headers/linux/virtio_i2c.h|  47 +++
 include/standard-headers/linux/virtio_iommu.h  |   8 +-
 include/standard-headers/linux/virtio_pcidev.h |  65 
 include/standard-headers/linux/virtio_scmi.h   |  24 ++
 linux-headers/asm-generic/unistd.h |   5 +-
 linux-headers/asm-mips/unistd_n32.h|   2 +
 linux-headers/asm-mips/unistd_n64.h|   2 +
 linux-headers/asm-mips/unistd_o32.h|   2 +
 linux-headers/asm-powerpc/unistd_32.h  |   2 +
 linux-headers/asm-powerpc/unistd_64.h  |   2 +
 linux-headers/asm-riscv/bitsperlong.h  |  14 +
 linux-headers/asm-riscv/mman.h |   1 +
 linux-headers/asm-riscv/unistd.h   |  44 +++
 linux-headers/asm-s390/unistd_32.h |   2 +
 linux-headers/asm-s390/unistd_64.h |   2 +
 linux-headers/asm-x86/kvm.h|  16 +-
 linux-headers/asm-x86/unistd_32.h  |   1 +
 linux-headers/asm-x86/unistd_64.h  |   1 +
 linux-headers/asm-x86/unistd_x32.h |   1 +
 linux-headers/linux/kvm.h  |  17 +
 tools/virtiofsd/fuse_common.h  |   9 +-
 tools/virtiofsd/fuse_i.h   |   7 +
 tools/virtiofsd/fuse_lowlevel.c| 180 --
 tools/virtiofsd/fuse_lowlevel.h|  13 +
 tools/virtiofsd/helper.c   |   1 +
 tools/virtiofsd/passthrough_ll.c   | 467 +++--
 tools/virtiofsd/passthrough_seccomp.c  |   1 +
 34 files changed, 1123 insertions(+), 132 deletions(-)
 create mode 100644 include/standard-headers/linux/virtio_gpio.h
 create mode 100644 include/standard-headers/linux/virtio_i2c.h
 create mode 100644 include/standard-headers/linux/virtio_pcidev.h
 create mode 100644 include/standard-headers/linux/virtio_scmi.h
 create mode 100644 linux-headers/asm-riscv/bitsperlong.h
 create mode 100644 linux-headers/asm-riscv/mman.h
 create mode 100644 linux-headers/asm-riscv/unistd.h




[PULL v2 0/5] 9p queue (previous 2022-02-10)

2022-02-17 Thread Christian Schoenebeck
The following changes since commit c13b8e9973635f34f3ce4356af27a311c993729c:

  Merge remote-tracking branch 
'remotes/alistair/tags/pull-riscv-to-apply-20220216' into staging (2022-02-16 
09:57:11 +)

are available in the Git repository at:

  https://github.com/cschoenebeck/qemu.git tags/pull-9p-20220217

for you to fetch changes up to e64e27d5cb103b7764f1a05b6eda7e7fedd517c5:

  9pfs: Fix segfault in do_readdir_many caused by struct dirent overread 
(2022-02-17 16:57:58 +0100)


9pfs: fixes and cleanup

* Fifth patch fixes a 9pfs server crash that happened on some systems due
  to incorrect (system-dependent) handling of struct dirent size.

* Tests: Second patch fixes a test error that happened on some systems due
  to mkdir() being called twice for creating the test directory for the 9p
  'local' tests.

* Tests: Third patch fixes a memory leak.

* Tests: The remaining two patches are code cleanup.


Christian Schoenebeck (2):
  tests/9pfs: use g_autofree where possible
  tests/9pfs: fix mkdir() being called twice

Greg Kurz (2):
  tests/9pfs: Fix leak of local_test_path
  tests/9pfs: Use g_autofree and g_autoptr where possible

Vitaly Chikunov (1):
  9pfs: Fix segfault in do_readdir_many caused by struct dirent overread

 hw/9pfs/9p-synth.c | 18 +++--
 hw/9pfs/9p-synth.h |  5 +++
 hw/9pfs/codir.c|  3 +-
 include/qemu/osdep.h   | 13 ++
 tests/qtest/libqos/virtio-9p.c | 38 +++---
 tests/qtest/virtio-9p-test.c   | 90 +-
 util/osdep.c   | 21 ++
 7 files changed, 96 insertions(+), 92 deletions(-)



[PULL v2 1/5] tests/9pfs: use g_autofree where possible

2022-02-17 Thread Christian Schoenebeck
Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
Message-Id: 
---
 tests/qtest/virtio-9p-test.c | 90 +++-
 1 file changed, 27 insertions(+), 63 deletions(-)

diff --git a/tests/qtest/virtio-9p-test.c b/tests/qtest/virtio-9p-test.c
index 41fed41de1..502e5ad0c7 100644
--- a/tests/qtest/virtio-9p-test.c
+++ b/tests/qtest/virtio-9p-test.c
@@ -84,7 +84,7 @@ static void pci_config(void *obj, void *data, QGuestAllocator 
*t_alloc)
 QVirtio9P *v9p = obj;
 alloc = t_alloc;
 size_t tag_len = qvirtio_config_readw(v9p->vdev, 0);
-char *tag;
+g_autofree char *tag = NULL;
 int i;
 
 g_assert_cmpint(tag_len, ==, strlen(MOUNT_TAG));
@@ -94,7 +94,6 @@ static void pci_config(void *obj, void *data, QGuestAllocator 
*t_alloc)
 tag[i] = qvirtio_config_readb(v9p->vdev, i + 2);
 }
 g_assert_cmpmem(tag, tag_len, MOUNT_TAG, tag_len);
-g_free(tag);
 }
 
 #define P9_MAX_SIZE 4096 /* Max size of a T-message or R-message */
@@ -580,7 +579,7 @@ static void do_version(QVirtio9P *v9p)
 {
 const char *version = "9P2000.L";
 uint16_t server_len;
-char *server_version;
+g_autofree char *server_version = NULL;
 P9Req *req;
 
 req = v9fs_tversion(v9p, P9_MAX_SIZE, version, P9_NOTAG);
@@ -588,8 +587,6 @@ static void do_version(QVirtio9P *v9p)
 v9fs_rversion(req, &server_len, &server_version);
 
 g_assert_cmpmem(server_version, server_len, version, strlen(version));
-
-g_free(server_version);
 }
 
 /* utility function: walk to requested dir and return fid for that dir */
@@ -637,7 +634,7 @@ static void fs_walk(void *obj, void *data, QGuestAllocator 
*t_alloc)
 alloc = t_alloc;
 char *wnames[P9_MAXWELEM];
 uint16_t nwqid;
-v9fs_qid *wqid;
+g_autofree v9fs_qid *wqid = NULL;
 int i;
 P9Req *req;
 
@@ -655,8 +652,6 @@ static void fs_walk(void *obj, void *data, QGuestAllocator 
*t_alloc)
 for (i = 0; i < P9_MAXWELEM; i++) {
 g_free(wnames[i]);
 }
-
-g_free(wqid);
 }
 
 static bool fs_dirents_contain_name(struct V9fsDirent *e, const char* name)
@@ -872,9 +867,9 @@ static void fs_readdir(void *obj, void *data, 
QGuestAllocator *t_alloc)
 g_assert_cmpint(fs_dirents_contain_name(entries, "."), ==, true);
 g_assert_cmpint(fs_dirents_contain_name(entries, ".."), ==, true);
 for (int i = 0; i < QTEST_V9FS_SYNTH_READDIR_NFILES; ++i) {
-char *name = g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i);
+g_autofree char *name =
+g_strdup_printf(QTEST_V9FS_SYNTH_READDIR_FILE, i);
 g_assert_cmpint(fs_dirents_contain_name(entries, name), ==, true);
-g_free(name);
 }
 
 v9fs_free_dirents(entries);
@@ -984,7 +979,8 @@ static void fs_walk_dotdot(void *obj, void *data, 
QGuestAllocator *t_alloc)
 QVirtio9P *v9p = obj;
 alloc = t_alloc;
 char *const wnames[] = { g_strdup("..") };
-v9fs_qid root_qid, *wqid;
+v9fs_qid root_qid;
+g_autofree v9fs_qid *wqid = NULL;
 P9Req *req;
 
 do_version(v9p);
@@ -998,7 +994,6 @@ static void fs_walk_dotdot(void *obj, void *data, 
QGuestAllocator *t_alloc)
 
 g_assert_cmpmem(&root_qid, 13, &wqid[0], 13);
 
-g_free(wqid);
 g_free(wnames[0]);
 }
 
@@ -1027,7 +1022,7 @@ static void fs_write(void *obj, void *data, 
QGuestAllocator *t_alloc)
 alloc = t_alloc;
 static const uint32_t write_count = P9_MAX_SIZE / 2;
 char *const wnames[] = { g_strdup(QTEST_V9FS_SYNTH_WRITE_FILE) };
-char *buf = g_malloc0(write_count);
+g_autofree char *buf = g_malloc0(write_count);
 uint32_t count;
 P9Req *req;
 
@@ -1045,7 +1040,6 @@ static void fs_write(void *obj, void *data, 
QGuestAllocator *t_alloc)
 v9fs_rwrite(req, &count);
 g_assert_cmpint(count, ==, write_count);
 
-g_free(buf);
 g_free(wnames[0]);
 }
 
@@ -1125,7 +1119,7 @@ static void fs_flush_ignored(void *obj, void *data, 
QGuestAllocator *t_alloc)
 
 static void do_mkdir(QVirtio9P *v9p, const char *path, const char *cname)
 {
-char *const name = g_strdup(cname);
+g_autofree char *name = g_strdup(cname);
 uint32_t fid;
 P9Req *req;
 
@@ -1134,15 +1128,13 @@ static void do_mkdir(QVirtio9P *v9p, const char *path, 
const char *cname)
 req = v9fs_tmkdir(v9p, fid, name, 0750, 0, 0);
 v9fs_req_wait_for_reply(req, NULL);
 v9fs_rmkdir(req, NULL);
-
-g_free(name);
 }
 
 /* create a regular file with Tlcreate and return file's fid */
 static uint32_t do_lcreate(QVirtio9P *v9p, const char *path,
const char *cname)
 {
-char *const name = g_strdup(cname);
+g_autofree char *name = g_strdup(cname);
 uint32_t fid;
 P9Req *req;
 
@@ -1152,7 +1144,6 @@ static uint32_t do_lcreate(QVirtio9P *v9p, const char 
*path,
 v9fs_req_wait_for_reply(req, NULL);
 v9fs_rlcreate(req, NULL, NULL);
 
-g_free(name);
 return fid;
 }
 
@@ -1160,8 +1151,8 @@ static uint32_t do_lcreate(QVirtio9P *v9p, const char 
*path,
 static void 

Re: [PATCH 17/31] vdpa: adapt vhost_ops callbacks to svq

2022-02-17 Thread Eugenio Perez Martin
On Tue, Feb 8, 2022 at 4:58 AM Jason Wang  wrote:
>
>
> > On 2022/2/1 2:58 AM, Eugenio Perez Martin wrote:
> > On Sun, Jan 30, 2022 at 5:03 AM Jason Wang  wrote:
> >>
> >> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> >>> First half of the buffers forwarding part, preparing vhost-vdpa
> >>> callbacks to SVQ to offer it. QEMU cannot enable it at this moment, so
> >>> this is effectively dead code at the moment, but it helps to reduce
> >>> patch size.
> >>>
> >>> Signed-off-by: Eugenio Pérez 
> >>> ---
> >>>hw/virtio/vhost-shadow-virtqueue.h |   2 +-
> >>>hw/virtio/vhost-shadow-virtqueue.c |  21 -
> >>>hw/virtio/vhost-vdpa.c | 133 ++---
> >>>3 files changed, 143 insertions(+), 13 deletions(-)
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
> >>> b/hw/virtio/vhost-shadow-virtqueue.h
> >>> index 035207a469..39aef5ffdf 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.h
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.h
> >>> @@ -35,7 +35,7 @@ size_t vhost_svq_device_area_size(const 
> >>> VhostShadowVirtqueue *svq);
> >>>
> >>>void vhost_svq_stop(VhostShadowVirtqueue *svq);
> >>>
> >>> -VhostShadowVirtqueue *vhost_svq_new(void);
> >>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize);
> >>>
> >>>void vhost_svq_free(VhostShadowVirtqueue *vq);
> >>>
> >>> diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
> >>> b/hw/virtio/vhost-shadow-virtqueue.c
> >>> index f129ec8395..7c168075d7 100644
> >>> --- a/hw/virtio/vhost-shadow-virtqueue.c
> >>> +++ b/hw/virtio/vhost-shadow-virtqueue.c
> >>> @@ -277,9 +277,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
> >>>/**
> >>> * Creates vhost shadow virtqueue, and instruct vhost device to use 
> >>> the shadow
> >>> * methods and file descriptors.
> >>> + *
> >>> + * @qsize Shadow VirtQueue size
> >>> + *
> >>> + * Returns the new virtqueue or NULL.
> >>> + *
> >>> + * In case of error, reason is reported through error_report.
> >>> */
> >>> -VhostShadowVirtqueue *vhost_svq_new(void)
> >>> +VhostShadowVirtqueue *vhost_svq_new(uint16_t qsize)
> >>>{
> >>> +size_t desc_size = sizeof(vring_desc_t) * qsize;
> >>> +size_t device_size, driver_size;
> >>>g_autofree VhostShadowVirtqueue *svq = 
> >>> g_new0(VhostShadowVirtqueue, 1);
> >>>int r;
> >>>
> >>> @@ -300,6 +308,15 @@ VhostShadowVirtqueue *vhost_svq_new(void)
> >>>/* Placeholder descriptor, it should be deleted at set_kick_fd */
> >>>event_notifier_init_fd(&svq->svq_kick, INVALID_SVQ_KICK_FD);
> >>>
> >>> +svq->vring.num = qsize;
> >>
> >> I wonder if this is the best. E.g some hardware can support up to 32K
> >> queue size. So this will probably end up with:
> >>
> >> 1) SVQ use 32K queue size
> >> 2) hardware queue uses 256
> >>
> > In that case SVQ vring queue size will be 32K and guest's vring can
> > negotiate any number with SVQ equal or less than 32K,
>
>
> Sorry for being unclear what I meant is actually
>
> 1) SVQ uses 32K queue size
>
> 2) guest vq uses 256
>
> This looks like a burden that needs extra logic and may damage the
> performance.
>

Still not getting this point.

An available guest buffer, although contiguous in GPA/GVA, can expand
in multiple buffers if it's not contiguous in qemu's VA (by the while
loop in virtqueue_map_desc [1]). In that scenario it is better to have
"plenty" of SVQ buffers.
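
To make the expansion concrete, here is a rough sketch of how one guest
buffer can turn into several host-contiguous pieces. It is not QEMU's
virtqueue_map_desc(); the DemoRegion table and demo_count_host_chunks() are
made up for illustration and assume a flat list of GPA to HVA mappings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping of one guest-physical range to a host virtual address. */
typedef struct DemoRegion {
    uint64_t gpa;  /* guest physical start */
    uint64_t len;  /* length in bytes */
    uint64_t hva;  /* host virtual start */
} DemoRegion;

/*
 * Returns how many host-contiguous pieces the guest buffer [gpa, gpa + len)
 * splits into, or 0 if part of it is not mapped.
 */
size_t demo_count_host_chunks(const DemoRegion *map, size_t n,
                              uint64_t gpa, uint64_t len)
{
    size_t chunks = 0;
    uint64_t prev_end_hva = 0;

    assert(map != NULL || n == 0);

    while (len > 0) {
        const DemoRegion *r = NULL;

        /* Find the region that contains the current guest address. */
        for (size_t i = 0; i < n; i++) {
            if (gpa >= map[i].gpa && gpa - map[i].gpa < map[i].len) {
                r = &map[i];
                break;
            }
        }
        if (r == NULL) {
            return 0;
        }

        uint64_t off = gpa - r->gpa;
        uint64_t step = r->len - off < len ? r->len - off : len;
        uint64_t hva = r->hva + off;

        /* A new piece starts whenever the host address is not adjacent. */
        if (chunks == 0 || hva != prev_end_hva) {
            chunks++;
        }
        prev_end_hva = hva + step;
        gpa += step;
        len -= step;
    }
    return chunks;
}
```

With two regions that are adjacent in GPA but far apart in HVA, a buffer
straddling the boundary counts as two pieces, so it would need two SVQ
descriptors in this simplified model.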

I'm ok if we decide to put an upper limit though, or if we decide not
to handle this situation. But we would leave out valid virtio drivers.
Maybe set a fixed upper limit (1024?), or add another parameter
(x-svq-size-n=N)?

If you mean we lose performance because memory gets more sparse I
think the only possibility is to limit that way.

> And this can lead other interesting situation:
>
> 1) SVQ uses 256
>
> 2) guest vq uses 1024
>
> Where a lot of more SVQ logic is needed.
>

If we agree that a guest descriptor can expand in multiple SVQ
descriptors, this should be already handled by the previous logic too.

But this should only happen in case that qemu is launched with a "bad"
cmdline, isn't it?

If I run that example with vp_vdpa, L0 qemu will happily accept 1024
as a queue size [2]. But if the vdpa device maximum queue size is
effectively 256, this will result in an error: We're not exposing it
to the guest at any moment but with qemu's cmdline.

>
> > including 256.
> > Is that what you mean?
>
>
> I mean, it looks to me the logic will be much more simplified if we just
> allocate the shadow virtqueue with the size what guest can see (guest
> vring).
>
> Then we don't need to think if the difference of the queue size can have
> any side effects.
>

I think that we cannot avoid that extra logic unless we force GPA to
be contiguous in IOVA. If we are sure the guest's buffers cannot be at
more than one descriptor in SVQ, then yes, we can simplify things. If
not, I think we are forced to carry all of it.

But if we prove it I'm not opposed to simplifying things and making
head at 

[PULL v2 3/5] tests/9pfs: Fix leak of local_test_path

2022-02-17 Thread Christian Schoenebeck
From: Greg Kurz 

local_test_path is allocated in virtio_9p_create_local_test_dir() to hold
the path of the temporary directory. It should be freed in
virtio_9p_remove_local_test_dir() when the temporary directory is removed.
Clarify the lifecycle of local_test_path while here.

Based-on: 

Signed-off-by: Greg Kurz 
Message-Id: <20220201151508.190035-2-gr...@kaod.org>
Reviewed-by: Christian Schoenebeck 
Signed-off-by: Christian Schoenebeck 
---
 tests/qtest/libqos/virtio-9p.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index ef96ef006a..5d18e5eae5 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -39,8 +39,13 @@ static char *concat_path(const char* a, const char* b)
 
 void virtio_9p_create_local_test_dir(void)
 {
+g_assert(local_test_path == NULL);
 struct stat st;
 char *pwd = g_get_current_dir();
+/*
+ * template gets cached into local_test_path and freed in
+ * virtio_9p_remove_local_test_dir().
+ */
 char *template = concat_path(pwd, "qtest-9p-local-XX");
 
 local_test_path = mkdtemp(template);
@@ -66,6 +71,8 @@ void virtio_9p_remove_local_test_dir(void)
 /* ignore error, dummy check to prevent compiler error */
 }
 g_free(cmd);
+g_free(local_test_path);
+local_test_path = NULL;
 }
 
 char *virtio_9p_test_path(const char *path)
-- 
2.20.1




Re: [PATCH v2 9/9] spapr: implement nested-hv capability for the virtual hypervisor

2022-02-17 Thread Cédric Le Goater

On 2/16/22 13:30, Nicholas Piggin wrote:

Excerpts from Nicholas Piggin's message of February 16, 2022 9:38 pm:

Excerpts from Cédric Le Goater's message of February 16, 2022 8:52 pm:

On 2/16/22 11:25, Nicholas Piggin wrote:

This implements the Nested KVM HV hcall API for spapr under TCG.

The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
L1 is switched back in returned from the hcall when a HV exception
is sent to the vhyp. Register state is copied in and out according to
the nested KVM HV hcall API specification.

The hdecr timer is started when the L2 is switched in, and it provides
the HDEC / 0x980 return to L1.

The MMU re-uses the bare metal radix 2-level page table walker by
using the get_pate method to point the MMU to the nested partition
table entry. MMU faults due to partition scope errors raise HV
exceptions and accordingly are routed back to the L1.

The MMU does not tag translations for the L1 (direct) vs L2 (nested)
guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
and exit).
Reviewed-by: Fabiano Rosas 
Signed-off-by: Nicholas Piggin 


Reviewed-by: Cédric Le Goater 

Some last comments below,


[...]


diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index edbf3eeed0..852fe61b36 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -199,6 +199,9 @@ struct SpaprMachineState {
   bool has_graphics;
   uint32_t vsmt;   /* Virtual SMT mode (KVM's "core stride") */
   
+/* Nested HV support (TCG only) */

+uint64_t nested_ptcr;
+


this is new state to migrate.



[...]


+/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
+struct kvmppc_pt_regs {
+uint64_t gpr[32];
+uint64_t nip;
+uint64_t msr;
+uint64_t orig_gpr3;/* Used for restarting system calls */
+uint64_t ctr;
+uint64_t link;
+uint64_t xer;
+uint64_t ccr;
+uint64_t softe;/* Soft enabled/disabled */
+uint64_t trap; /* Reason for being here */
+uint64_t dar;  /* Fault registers */
+uint64_t dsisr;/* on 4xx/Book-E used for ESR */
+uint64_t result;   /* Result of a system call */
+};


I think we need to start moving all the spapr hcall definitions under
spapr_hcall.h. It can come later.


Sure.

[...]


diff --git a/include/hw/ppc/spapr_cpu_core.h b/include/hw/ppc/spapr_cpu_core.h
index dab3dfc76c..b560514560 100644
--- a/include/hw/ppc/spapr_cpu_core.h
+++ b/include/hw/ppc/spapr_cpu_core.h
@@ -48,6 +48,11 @@ typedef struct SpaprCpuState {
   bool prod; /* not migrated, only used to improve dispatch latencies */
   struct ICPState *icp;
   struct XiveTCTX *tctx;
+
+/* Fields for nested-HV support */
+bool in_nested; /* true while the L2 is executing */
+CPUPPCState *nested_host_state; /* holds the L1 state while L2 executes */
+int64_t nested_tb_offset; /* L1->L2 TB offset */


This needs a new vmstate.


How about instead of the vmstate (we would need all the L1 state in
nested_host_state as well), we just add a migration blocker in the
L2 entry path. We could limit the max hdecr to say 1 second to
ensure it unblocks before long.

I know migration blockers are not preferred but in this case it gives
us some iterations to debug and optimise first, which might change
the data to migrate.


This should be roughly the incremental patch to do this.


I think we can merge without it.

Adding support shouldn't be too complex and TCG migration of an L1
running L2 is not the most important feature today. It would be
better to have something clean (a blocker if incomplete, or decent
support) before 7.0 is released, though.

However, there is an issue with TCG migration and it has been there
for a while :

https://lore.kernel.org/qemu-devel/fb2e56cc-15d1-65ee-9d9c-64223483e...@kaod.org/

Thanks,

C.



[PULL v2 4/5] tests/9pfs: Use g_autofree and g_autoptr where possible

2022-02-17 Thread Christian Schoenebeck
From: Greg Kurz 

It is recommended to use g_autofree or g_autoptr as it reduces
the odds of introducing memory leaks in future changes.
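
For readers unfamiliar with the mechanism: g_autofree is built on the
compiler's cleanup attribute (GCC and Clang). The sketch below shows the
same idea without GLib. AUTO_FREE, auto_free_cleanup() and
copy_and_measure() are hypothetical names for illustration only and are not
part of this patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counts how many times the cleanup handler ran; only for demonstration. */
static int cleanup_calls;

/* Cleanup handler: receives a pointer to the variable going out of scope. */
static void auto_free_cleanup(void *p)
{
    void **pp = p;

    free(*pp);
    cleanup_calls++;
}

/* Hypothetical stand-in for g_autofree; requires GCC or Clang. */
#define AUTO_FREE __attribute__((cleanup(auto_free_cleanup)))

/* Returns the length of a heap copy of @s; the copy is freed on every path. */
size_t copy_and_measure(const char *s)
{
    AUTO_FREE char *copy = malloc(strlen(s) + 1);

    assert(s != NULL);
    if (copy == NULL) {
        return 0; /* cleanup still runs; free(NULL) is a no-op */
    }
    strcpy(copy, s);
    return strlen(copy); /* no explicit free() needed */
}

/* Accessor so callers can observe that the handler really ran. */
int cleanup_call_count(void)
{
    return cleanup_calls;
}
```

Because the handler runs on every exit from the scope, adding an early
return later cannot leak the buffer, which is the property the commit
message refers to.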

Signed-off-by: Greg Kurz 
Message-Id: <20220201151508.190035-3-gr...@kaod.org>
Reviewed-by: Christian Schoenebeck 
Signed-off-by: Christian Schoenebeck 
---
 tests/qtest/libqos/virtio-9p.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index 5d18e5eae5..f51f0635cc 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -41,7 +41,7 @@ void virtio_9p_create_local_test_dir(void)
 {
 g_assert(local_test_path == NULL);
 struct stat st;
-char *pwd = g_get_current_dir();
+g_autofree char *pwd = g_get_current_dir();
 /*
  * template gets cached into local_test_path and freed in
  * virtio_9p_remove_local_test_dir().
@@ -52,7 +52,6 @@ void virtio_9p_create_local_test_dir(void)
 if (!local_test_path) {
 g_test_message("mkdtemp('%s') failed: %s", template, strerror(errno));
 }
-g_free(pwd);
 
 g_assert(local_test_path != NULL);
 
@@ -65,12 +64,11 @@ void virtio_9p_create_local_test_dir(void)
 void virtio_9p_remove_local_test_dir(void)
 {
 g_assert(local_test_path != NULL);
-char *cmd = g_strdup_printf("rm -fr '%s'\n", local_test_path);
+g_autofree char *cmd = g_strdup_printf("rm -fr '%s'\n", local_test_path);
 int res = system(cmd);
 if (res < 0) {
 /* ignore error, dummy check to prevent compiler error */
 }
-g_free(cmd);
 g_free(local_test_path);
 local_test_path = NULL;
 }
@@ -216,8 +214,8 @@ static void *virtio_9p_pci_create(void *pci_bus, 
QGuestAllocator *t_alloc,
 static void regex_replace(GString *haystack, const char *pattern,
   const char *replace_fmt, ...)
 {
-GRegex *regex;
-char *replace, *s;
+g_autoptr(GRegex) regex = NULL;
+g_autofree char *replace = NULL, *s = NULL;
 va_list argp;
 
 va_start(argp, replace_fmt);
@@ -227,9 +225,6 @@ static void regex_replace(GString *haystack, const char 
*pattern,
 regex = g_regex_new(pattern, 0, 0, NULL);
 s = g_regex_replace(regex, haystack->str, -1, 0, replace, 0, NULL);
 g_string_assign(haystack, s);
-g_free(s);
-g_regex_unref(regex);
-g_free(replace);
 }
 
 void virtio_9p_assign_local_driver(GString *cmd_line, const char *args)
-- 
2.20.1




Re: [PATCH] tcg: Remove dh_alias indirection for dh_typecode

2022-02-17 Thread Keith Packard via
Richard Henderson  writes:

> Reported-by: Keith Packard 
> Signed-off-by: Richard Henderson 

Looks good to me, and it passes my very simple test when run on s390.

Tested-by: Keith Packard 

-- 
-keith


signature.asc
Description: PGP signature


[PULL v2 2/5] tests/9pfs: fix mkdir() being called twice

2022-02-17 Thread Christian Schoenebeck
The 9p test cases use mkdtemp() to create a temporary directory for
running the 'local' 9p tests with real files/dirs. Unlike mktemp()
which only generates a unique file name, mkdtemp() also creates the
directory, therefore the subsequent mkdir() was wrong and caused
errors on some systems.
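
The failure mode can be reproduced in isolation. The following is a minimal
sketch, not the test code itself; create_then_mkdir() and the /tmp template
path are assumptions made for illustration. It shows that once mkdtemp()
succeeds the directory already exists, so a follow-up mkdir() on the same
path fails with EEXIST.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a unique directory with mkdtemp(), then call
 * mkdir() on the same path and return the errno of that second call
 * (0 if mkdir() unexpectedly succeeded, -1 if mkdtemp() itself failed).
 */
int create_then_mkdir(void)
{
    char template[] = "/tmp/mkdtemp-demo-XXXXXX"; /* path is an assumption */
    struct stat st;
    int err = 0;

    if (mkdtemp(template) == NULL) {
        return -1;
    }

    /* mkdtemp() has already created the directory. */
    assert(stat(template, &st) == 0 && S_ISDIR(st.st_mode));

    /* A second creation attempt on the same path is therefore an error. */
    if (mkdir(template, 0777) < 0) {
        err = errno;
    }

    rmdir(template);
    return err;
}
```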

Signed-off-by: Christian Schoenebeck 
Fixes: 136b7af2 (tests/9pfs: fix test dir for parallel tests)
Reported-by: Daniel P. Berrangé 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/832
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Greg Kurz 
Message-Id: 

---
 tests/qtest/libqos/virtio-9p.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index b4e1143288..ef96ef006a 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -37,31 +37,19 @@ static char *concat_path(const char* a, const char* b)
 return g_build_filename(a, b, NULL);
 }
 
-static void init_local_test_path(void)
+void virtio_9p_create_local_test_dir(void)
 {
+struct stat st;
 char *pwd = g_get_current_dir();
 char *template = concat_path(pwd, "qtest-9p-local-XX");
+
 local_test_path = mkdtemp(template);
 if (!local_test_path) {
 g_test_message("mkdtemp('%s') failed: %s", template, strerror(errno));
 }
-g_assert(local_test_path);
 g_free(pwd);
-}
-
-void virtio_9p_create_local_test_dir(void)
-{
-struct stat st;
-int res;
-
-init_local_test_path();
 
 g_assert(local_test_path != NULL);
-res = mkdir(local_test_path, 0777);
-if (res < 0) {
-g_test_message("mkdir('%s') failed: %s", local_test_path,
-   strerror(errno));
-}
 
 /* ensure test directory exists now ... */
 g_assert(stat(local_test_path, &st) == 0);
-- 
2.20.1




Re: [PATCH v2 00/12] GL & D-Bus display related fixes

2022-02-17 Thread Marc-André Lureau
Hi

On Thu, Feb 17, 2022 at 8:39 PM Akihiko Odaki 
wrote:

> On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau
>  wrote:
> >
> > Hi
> >
> > On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki 
> wrote:
> >>
> >> On Thu, Feb 17, 2022 at 8:58 PM  wrote:
> >> >
> >> > From: Marc-André Lureau 
> >> >
> >> > Hi,
> >> >
> >> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a console",
> >> > Akihiko Odaki reported a number of issues with the GL and D-Bus display.
> >> > His series proposes a different design, and reverting some of my
> >> > previous generic console changes to fix those issues.
> >> >
> >> > However, as I work through the issues so far, they can be solved by
> >> > reasonably simple fixes while keeping the console changes generic (not
> >> > specific to the D-Bus display backend). I believe a shared
> >> > infrastructure is more beneficial long term than having GL-specific
> >> > code in the DBus code (in particular, the egl-headless & VNC
> >> > combination could potentially use it)
> >> >
> >> > Thanks a lot Akihiko for reporting the issues and proposing a different
> >> > approach! Please test this alternative series and let me know of any
> >> > further issues. My understanding is that you are mainly concerned with
> >> > the Cocoa backend, and I don't have a way to test it, so please check
> >> > it. If necessary, we may well have to revert my earlier changes and go
> >> > your way, eventually.
> >> >
> >> > Marc-André Lureau (12):
> >> >   ui/console: fix crash when using gl context with non-gl listeners
> >> >   ui/console: fix texture leak when calling
> surface_gl_create_texture()
> >> >   ui: do not create a surface when resizing a GL scanout
> >> >   ui/console: move check for compatible GL context
> >> >   ui/console: move dcl compatiblity check to a callback
> >> >   ui/console: egl-headless is compatible with non-gl listeners
> >> >   ui/dbus: associate the DBusDisplayConsole listener with the given
> >> > console
> >> >   ui/console: move console compatibility check to
> dcl_display_console()
> >> >   ui/shader: fix potential leak of shader on error
> >> >   ui/shader: free associated programs
> >> >   ui/console: add a dpy_gfx_switch callback helper
> >> >   ui/dbus: fix texture sharing
> >> >
> >> >  include/ui/console.h |  19 ---
> >> >  ui/dbus.h|   3 ++
> >> >  ui/console-gl.c  |   4 ++
> >> >  ui/console.c | 119
> ++-
> >> >  ui/dbus-console.c|  27 +-
> >> >  ui/dbus-listener.c   |  11 
> >> >  ui/dbus.c|  33 +++-
> >> >  ui/egl-headless.c|  17 ++-
> >> >  ui/gtk.c |  18 ++-
> >> >  ui/sdl2.c|   9 +++-
> >> >  ui/shader.c  |   9 +++-
> >> >  ui/spice-display.c   |   9 +++-
> >> >  12 files changed, 192 insertions(+), 86 deletions(-)
> >> >
> >> > --
> >> > 2.34.1.428.gdcc0cd074f0c
> >> >
> >> >
> >>
> >> You missed only one thing:
> >> >- that console_select and register_displaychangelistener may not call
> >> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is
> >> > incompatible with non-OpenGL displays
> >>
> >> displaychangelistener_display_console always has to call
> >> dpy_gfx_switch for non-OpenGL displays, but it still doesn't.
> >
> >
> > Ok, would that be what you have in mind?
> >
> >  --- a/ui/console.c
> > +++ b/ui/console.c
> > @@ -1122,6 +1122,10 @@ static void
> displaychangelistener_display_console(DisplayChangeListener *dcl,
> >  } else if (con->scanout.kind == SCANOUT_SURFACE) {
> >  dpy_gfx_create_texture(con, con->surface);
> >  displaychangelistener_gfx_switch(dcl, con->surface);
> > +} else {
> > +/* use the fallback surface, egl-headless keeps it updated */
> > +assert(con->surface);
> > +displaychangelistener_gfx_switch(dcl, con->surface);
> >  }
>
> It should call displaychangelistener_gfx_switch even when e.g.
> con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content
> to the last DisplaySurface it received while con->scanout.kind ==
> SCANOUT_TEXTURE.
>

I see, egl-headless is really not a "listener".


> >
> > I wish such egl-headless specific code would be there, but we would need
> more refactoring.
> >
> > I think I would rather have a backend split for GL context, like
> "-object egl-context". egl-headless-specific copy code would be handled by
> common/util code for anything that wants a pixman surface (VNC, screen
> capture, non-GL display etc).
> >
> > This split would allow sharing the context code, and introduce other
> system specific GL initialization, such as WGL etc. Right now, I doubt the
> EGL code works on anything but Linux.
>
> Sharing the context code is unlikely to happen. Usually the toolkit
> (GTK, SDL, or Cocoa in my fork) knows what graphics accelerator should
> be used and how the context should be created for a particular window.
> The context sharing can be achieved only for headless 

[PATCH] migration: NULL transport_data after freeing

2022-02-17 Thread Hanna Reitz
migration_incoming_state_destroy() NULLs all objects it frees after they
are freed, presumably so that a subsequent call to the same function
will not free them again, unless new objects have been created in the
meantime.

transport_data is the exception, and it shows exactly this problem: When
an incoming migration uses transport_cleanup() and transport_data, and a
subsequent incoming migration (e.g. loadvm) occurs that does not, then
when this second one is done, it will call transport_cleanup() on the
old transport_data again -- which has already been freed.  This is
sometimes visible in the iotest 201, though for some reason I can only
reproduce it with -m32.

To fix this, call transport_cleanup() only when transport_data is not
NULL (otherwise there is nothing to clean up), and set transport_data to
NULL when it has been cleaned up (i.e. freed).

(transport_cleanup() is used only by migration/socket.c, where
socket_start_incoming_migration_internal() sets both it and
transport_data to non-NULL values.)
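
The pattern described above can be sketched outside of QEMU as follows.
DemoState and the demo_* functions are hypothetical, simplified stand-ins
for the incoming migration state; only the guard and the NULLing mirror what
this patch does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the incoming migration state. */
typedef struct DemoState {
    void (*transport_cleanup)(void *data);
    void *transport_data;
} DemoState;

static int cleanup_runs; /* demonstration counter */

static void demo_cleanup(void *data)
{
    free(data);
    cleanup_runs++;
}

/* Safe to call repeatedly: cleanup runs only while data is still owned. */
void demo_state_destroy(DemoState *s)
{
    if (s->transport_cleanup && s->transport_data) {
        s->transport_cleanup(s->transport_data);
        s->transport_data = NULL; /* forget the freed pointer */
    }
}

/* Destroys the same state twice and reports how often cleanup ran. */
int destroy_twice(void)
{
    DemoState s = { demo_cleanup, malloc(16) };

    assert(s.transport_data != NULL);
    cleanup_runs = 0;
    demo_state_destroy(&s);
    demo_state_destroy(&s); /* would be a double free without the NULLing */
    return cleanup_runs;
}
```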

Signed-off-by: Hanna Reitz 
---
 migration/migration.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index bcc385b94b..cdb2e76d02 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -287,8 +287,9 @@ void migration_incoming_state_destroy(void)
 g_array_free(mis->postcopy_remote_fds, TRUE);
 mis->postcopy_remote_fds = NULL;
 }
-if (mis->transport_cleanup) {
+if (mis->transport_cleanup && mis->transport_data) {
 mis->transport_cleanup(mis->transport_data);
+mis->transport_data = NULL;
 }
 
 qemu_event_reset(&mis->main_thread_load_event);
-- 
2.34.1




Re: [PATCH v5 11/20] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU

2022-02-17 Thread Stefan Hajnoczi
On Tue, Feb 08, 2022 at 09:35:04AM -0500, Emanuele Giuseppe Esposito wrote:
> Once job lock is used and aiocontext is removed, mirror has
> to perform job operations under the same critical section,
> using the helpers prepared in previous commit.
> 
> Note: at this stage, job_{lock/unlock} and job lock guard macros
> are *nop*.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  block/mirror.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)

My understanding is that MirrorBlockJob itself does not need a lock because
it's only accessed from the coroutines - and they run in only one thread.

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature


[PATCH v1] aio-posix: fix build failure io_uring 2.2

2022-02-17 Thread Haiyue Wang
The io_uring fixed "Don't truncate addr fields to 32-bit on 32-bit":
https://git.kernel.dk/cgit/liburing/commit/?id=d84c29b19ed0b13619cff40141bb1fc3615b

This leads to build failure:
../util/fdmon-io_uring.c: In function ‘add_poll_remove_sqe’:
../util/fdmon-io_uring.c:182:36: error: passing argument 2 of 
‘io_uring_prep_poll_remove’ makes integer from pointer without a cast 
[-Werror=int-conversion]
  182 | io_uring_prep_poll_remove(sqe, node);
  |^~~~
  ||
  |AioHandler *
In file included from /root/io/qemu/include/block/aio.h:18,
 from ../util/aio-posix.h:20,
 from ../util/fdmon-io_uring.c:49:
/usr/include/liburing.h:415:17: note: expected ‘__u64’ {aka ‘long long unsigned 
int’} but argument is of type ‘AioHandler *’
  415 |   __u64 user_data)
  |   ~~^
cc1: all warnings being treated as errors

So convert the parameter to the right type according to the io_uring definition.
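
The type mismatch can be looked at without liburing. The sketch below uses
a made-up demo_set_user_data() as a stand-in for an API that takes 64-bit
user data, as in the prototype quoted in the error message. Going through
uintptr_t is a general C idiom for pointer to integer conversion and is an
assumption of this sketch; the patch itself casts to __u64 directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an API that stores 64-bit user data. */
static uint64_t stored_user_data;

void demo_set_user_data(uint64_t user_data)
{
    stored_user_data = user_data;
}

/* Pointer to 64-bit integer; uintptr_t keeps the widths matched on 32-bit. */
uint64_t ptr_to_u64(const void *p)
{
    assert(sizeof(uintptr_t) <= sizeof(uint64_t));
    return (uint64_t)(uintptr_t)p;
}

/* Inverse conversion, for reading the pointer back from a completion. */
void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Round-trips a pointer through the 64-bit field; returns 1 on success. */
int roundtrip_ok(void)
{
    int node = 0;

    demo_set_user_data(ptr_to_u64(&node));
    return u64_to_ptr(stored_user_data) == (void *)&node;
}
```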

Signed-off-by: Haiyue Wang 
---
 util/fdmon-io_uring.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 1461dfa407..e7febbf11f 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -179,7 +179,11 @@ static void add_poll_remove_sqe(AioContext *ctx, 
AioHandler *node)
 {
 struct io_uring_sqe *sqe = get_sqe(ctx);
 
+#ifdef LIBURING_HAVE_DATA64
+io_uring_prep_poll_remove(sqe, (__u64)node);
+#else
 io_uring_prep_poll_remove(sqe, node);
+#endif
 }
 
 /* Add a timeout that self-cancels when another cqe becomes ready */
-- 
2.35.1




[PULL v2 5/5] 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread

2022-02-17 Thread Christian Schoenebeck
From: Vitaly Chikunov 

`struct dirent' returned from readdir(3) could be shorter (or longer)
than `sizeof(struct dirent)', thus memcpy of sizeof length will overread
into unallocated page causing SIGSEGV. Example stack trace:

 #0  0x559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 
0x497eed)
 #1  0x559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9)
 #2  0x55eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 
0x963983)
 #3  0x773e0be0 n/a (n/a + 0x0)

While fixing this, provide a helper for any future `struct dirent' cloning.
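
As an illustration of the idea (not the helper added by this patch, whose
body is not reproduced here), a duplication routine can copy only the bytes
the entry is known to occupy: the fixed header plus the NUL-terminated name.
dirent_dup_sketch() and dup_roundtrip_ok() are hypothetical names.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative only: copy the header and the NUL-terminated name instead
 * of a fixed sizeof(struct dirent), so the source is never read past the
 * end of its name.
 */
struct dirent *dirent_dup_sketch(const struct dirent *src)
{
    size_t used = offsetof(struct dirent, d_name) + strlen(src->d_name) + 1;
    /* Allocate at least a full struct so later field accesses stay in bounds. */
    size_t alloc = used > sizeof(struct dirent) ? used : sizeof(struct dirent);
    struct dirent *dst = calloc(1, alloc);

    if (dst != NULL) {
        memcpy(dst, src, used);
    }
    return dst;
}

/* Builds an entry named @name, duplicates it and checks the copy. */
int dup_roundtrip_ok(const char *name)
{
    struct dirent src;
    struct dirent *copy;
    size_t name_sz = strlen(name) + 1;
    int ok;

    assert(name_sz <= sizeof(src.d_name));
    memset(&src, 0, sizeof(src));
    src.d_ino = 42;
    memcpy(src.d_name, name, name_sz);

    copy = dirent_dup_sketch(&src);
    assert(copy != NULL);
    ok = copy->d_ino == 42 && strcmp(copy->d_name, name) == 0;

    free(copy);
    return ok;
}
```

Since the copy length is derived from the name rather than from the type,
the same routine would also cope with a record that readdir(3) returned
shorter than sizeof(struct dirent).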

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841
Cc: qemu-sta...@nongnu.org
Co-authored-by: Christian Schoenebeck 
Reviewed-by: Dmitry V. Levin 
Signed-off-by: Vitaly Chikunov 
Tested-by: Christian Schoenebeck 
Reviewed-by: Christian Schoenebeck 
Acked-by: Greg Kurz 
Tested-by: Vitaly Chikunov 
Message-Id: <20220216181821.3481527-1...@altlinux.org>
[C.S. - Fix typo in source comment. ]
Signed-off-by: Christian Schoenebeck 
---
 hw/9pfs/9p-synth.c   | 18 +++---
 hw/9pfs/9p-synth.h   |  5 +
 hw/9pfs/codir.c  |  3 +--
 include/qemu/osdep.h | 13 +
 util/osdep.c | 21 +
 5 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/hw/9pfs/9p-synth.c b/hw/9pfs/9p-synth.c
index b38088e066..7a7cd5c5ba 100644
--- a/hw/9pfs/9p-synth.c
+++ b/hw/9pfs/9p-synth.c
@@ -182,7 +182,12 @@ static int synth_opendir(FsContext *ctx,
 V9fsSynthOpenState *synth_open;
 V9fsSynthNode *node = *(V9fsSynthNode **)fs_path->data;
 
-synth_open = g_malloc(sizeof(*synth_open));
+/*
+ * V9fsSynthOpenState contains 'struct dirent' which have OS-specific
+ * properties, thus it's zero cleared on allocation here and below
+ * in synth_open.
+ */
+synth_open = g_new0(V9fsSynthOpenState, 1);
 synth_open->node = node;
 node->open_count++;
 fs->private = synth_open;
@@ -220,7 +225,14 @@ static void synth_rewinddir(FsContext *ctx, 
V9fsFidOpenState *fs)
 static void synth_direntry(V9fsSynthNode *node,
 struct dirent *entry, off_t off)
 {
-strcpy(entry->d_name, node->name);
+size_t sz = strlen(node->name) + 1;
+/*
+ * 'entry' is always inside of V9fsSynthOpenState which have NAME_MAX
+ * back padding. Ensure we do not overflow it.
+ */
+g_assert(sizeof(struct dirent) + NAME_MAX >=
+ offsetof(struct dirent, d_name) + sz);
+memcpy(entry->d_name, node->name, sz);
 entry->d_ino = node->attr->inode;
 entry->d_off = off + 1;
 }
@@ -266,7 +278,7 @@ static int synth_open(FsContext *ctx, V9fsPath *fs_path,
 V9fsSynthOpenState *synth_open;
 V9fsSynthNode *node = *(V9fsSynthNode **)fs_path->data;
 
-synth_open = g_malloc(sizeof(*synth_open));
+synth_open = g_new0(V9fsSynthOpenState, 1);
 synth_open->node = node;
 node->open_count++;
 fs->private = synth_open;
diff --git a/hw/9pfs/9p-synth.h b/hw/9pfs/9p-synth.h
index 036d7e4a5b..eeb246f377 100644
--- a/hw/9pfs/9p-synth.h
+++ b/hw/9pfs/9p-synth.h
@@ -41,6 +41,11 @@ typedef struct V9fsSynthOpenState {
 off_t offset;
 V9fsSynthNode *node;
 struct dirent dent;
+/*
+ * Ensure there is enough space for 'dent' above, some systems have a
+ * d_name size of just 1, which would cause a buffer overrun.
+ */
+char dent_trailing_space[NAME_MAX];
 } V9fsSynthOpenState;
 
 int qemu_v9fs_synth_mkdir(V9fsSynthNode *parent, int mode,
diff --git a/hw/9pfs/codir.c b/hw/9pfs/codir.c
index 032cce04c4..c0873bde16 100644
--- a/hw/9pfs/codir.c
+++ b/hw/9pfs/codir.c
@@ -143,8 +143,7 @@ static int do_readdir_many(V9fsPDU *pdu, V9fsFidState *fidp,
 } else {
 e = e->next = g_malloc0(sizeof(V9fsDirEnt));
 }
-e->dent = g_malloc0(sizeof(struct dirent));
-memcpy(e->dent, dent, sizeof(struct dirent));
+e->dent = qemu_dirent_dup(dent);
 
 /* perform a full stat() for directory entry if requested by caller */
 if (dostat) {
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index d1660d67fa..ce12f64853 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -805,6 +805,19 @@ static inline int platform_does_not_support_system(const char *command)
 }
 #endif /* !HAVE_SYSTEM_FUNCTION */
 
+/**
+ * Duplicate directory entry @dent.
+ *
+ * It is highly recommended to use this function instead of open coding
+ * duplication of @c dirent objects, because the actual @c struct @c dirent
+ * size may be bigger or shorter than @c sizeof(struct dirent) and correct
+ * handling is platform specific (see gitlab issue #841).
+ *
+ * @dent - original directory entry to be duplicated
+ * @returns duplicated directory entry which should be freed with g_free()
+ */
+struct dirent *qemu_dirent_dup(struct dirent *dent);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/util/osdep.c b/util/osdep.c
index 42a0a4986a..67fbf22778 100644
--- 

Re: [PATCH v2 00/12] GL & D-Bus display related fixes

2022-02-17 Thread Akihiko Odaki
On Fri, Feb 18, 2022 at 1:12 AM Marc-André Lureau wrote:
>
> Hi
>
> On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki  wrote:
>>
>> On Thu, Feb 17, 2022 at 8:58 PM  wrote:
>> >
>> > From: Marc-André Lureau 
>> >
>> > Hi,
>> >
>> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a console",
>> > Akihiko Odaki reported a number of issues with the GL and D-Bus display.
>> > His series proposes a different design, and reverts some of my previous
>> > generic console changes to fix those issues.
>> >
>> > However, as far as I have worked through the issues so far, they can be
>> > solved by reasonably simple fixes while keeping the console changes generic
>> > (not specific to the D-Bus display backend). I believe a shared
>> > infrastructure is more beneficial long term than having GL-specific code in
>> > the DBus code (in particular, the egl-headless & VNC combination could
>> > potentially use it).
>> >
>> > Thanks a lot Akihiko for reporting the issues and proposing a different
>> > approach! Please test this alternative series and let me know of any
>> > further issues. My understanding is that you are mainly concerned with the
>> > Cocoa backend, and I don't have a way to test it, so please check it. If
>> > necessary, we may well have to revert my earlier changes and go your way,
>> > eventually.
>> >
>> > Marc-André Lureau (12):
>> >   ui/console: fix crash when using gl context with non-gl listeners
>> >   ui/console: fix texture leak when calling surface_gl_create_texture()
>> >   ui: do not create a surface when resizing a GL scanout
>> >   ui/console: move check for compatible GL context
>> >   ui/console: move dcl compatiblity check to a callback
>> >   ui/console: egl-headless is compatible with non-gl listeners
>> >   ui/dbus: associate the DBusDisplayConsole listener with the given
>> > console
>> >   ui/console: move console compatibility check to dcl_display_console()
>> >   ui/shader: fix potential leak of shader on error
>> >   ui/shader: free associated programs
>> >   ui/console: add a dpy_gfx_switch callback helper
>> >   ui/dbus: fix texture sharing
>> >
>> >  include/ui/console.h |  19 ---
>> >  ui/dbus.h            |   3 ++
>> >  ui/console-gl.c      |   4 ++
>> >  ui/console.c         | 119 ++-
>> >  ui/dbus-console.c    |  27 +-
>> >  ui/dbus-listener.c   |  11
>> >  ui/dbus.c            |  33 +++-
>> >  ui/egl-headless.c    |  17 ++-
>> >  ui/gtk.c             |  18 ++-
>> >  ui/sdl2.c            |   9 +++-
>> >  ui/shader.c          |   9 +++-
>> >  ui/spice-display.c   |   9 +++-
>> >  12 files changed, 192 insertions(+), 86 deletions(-)
>> >
>> > --
>> > 2.34.1.428.gdcc0cd074f0c
>> >
>> >
>>
>> You missed only one thing:
>> >- that console_select and register_displaychangelistener may not call
>> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is
>> > incompatible with non-OpenGL displays
>>
>> displaychangelistener_display_console always has to call
>> dpy_gfx_switch for non-OpenGL displays, but it still doesn't.
>
>
> Ok, would that be what you have in mind?
>
> --- a/ui/console.c
> +++ b/ui/console.c
> @@ -1122,6 +1122,10 @@ static void displaychangelistener_display_console(DisplayChangeListener *dcl,
>  } else if (con->scanout.kind == SCANOUT_SURFACE) {
>  dpy_gfx_create_texture(con, con->surface);
>  displaychangelistener_gfx_switch(dcl, con->surface);
> +} else {
> +/* use the fallback surface, egl-headless keeps it updated */
> +assert(con->surface);
> +displaychangelistener_gfx_switch(dcl, con->surface);
>  }

It should call displaychangelistener_gfx_switch even when e.g.
con->scanout.kind == SCANOUT_TEXTURE. egl-headless renders the content
to the last DisplaySurface it received while con->scanout.kind ==
SCANOUT_TEXTURE.
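
To make the point concrete, here is a toy model of that dispatch rule. It
is only a sketch of my reading of the paragraph above, not QEMU code: every
toy_* type and function is a hypothetical stand-in, and the real
DisplayChangeListener and QemuConsole structures are considerably richer.
The model hands the fallback surface to a non-GL listener regardless of the
scanout kind, and only a GL listener takes the texture path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT QEMU's real types. */
enum toy_scanout_kind {
    TOY_SCANOUT_NONE,
    TOY_SCANOUT_SURFACE,
    TOY_SCANOUT_TEXTURE
};

struct toy_surface {
    int width;
    int height;
};

struct toy_console {
    enum toy_scanout_kind kind;
    struct toy_surface *surface;       /* fallback surface, kept up to date */
};

struct toy_listener {
    bool is_gl;                        /* can the listener consume textures? */
    const struct toy_surface *current; /* last surface it was switched to */
    int texture_scanouts;              /* number of texture scanouts received */
};

/* Models the "gfx switch" callback: the listener adopts a new surface. */
static void toy_gfx_switch(struct toy_listener *l, const struct toy_surface *s)
{
    l->current = s;
}

/* Models the texture scanout callback, meaningful for GL listeners only. */
static void toy_scanout_texture(struct toy_listener *l)
{
    l->texture_scanouts++;
}

/*
 * Attach a listener to a console. A non-GL listener is always switched to
 * the console surface, whatever the scanout kind is. A GL listener gets the
 * texture when the scanout is a texture, and the surface otherwise.
 */
static void toy_display_console(struct toy_listener *l,
                                const struct toy_console *con)
{
    if (l->is_gl && con->kind == TOY_SCANOUT_TEXTURE) {
        toy_scanout_texture(l);
        return;
    }
    assert(con->surface != NULL);
    toy_gfx_switch(l, con->surface);
}
```

With a console whose scanout kind is TOY_SCANOUT_TEXTURE, a non-GL listener
still ends up pointing at the fallback surface, which is the behaviour asked
for above.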

>
> I wish such egl-headless specific code would be there, but we would need
> more refactoring.
>
> I think I would rather have a backend split for GL context, like "-object
> egl-context". egl-headless-specific copy code would be handled by
> common/util code for anything that wants a pixman surface (VNC, screen
> capture, non-GL display etc).
>
> This split would allow sharing the context code, and introduce other system
> specific GL initialization, such as WGL etc. Right now, I doubt the EGL
> code works on anything but Linux.

Sharing the context code is unlikely to happen. Usually the toolkit
(GTK, SDL, or Cocoa in my fork) knows what graphics accelerator should
be used and how the context should be created for a particular window.
The context sharing can be achieved only for headless displays, namely
dbus, egl-headless and spice. Few people would want to use them in
combination.

>
>>
>> Anything else should be addressed with this patch series. (And it also
>> has nice fixes for shader leaks.)
>
>
> thanks
>
>>
>>
>> cocoa doesn't have OpenGL output and egl-headless, 

Re: [PATCH v5 1/3] s390x/tcg: Implement Miscellaneous-Instruction-Extensions Facility 3 for the s390x

2022-02-17 Thread David Miller
Will submit patch later today, thanks



Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-17 Thread Stefan Hajnoczi
On Thu, 17 Feb 2022 at 14:12, Stefano Garzarella  wrote:
>
> On Mon, Feb 14, 2022 at 02:01:52PM +, Stefan Hajnoczi wrote:
> >On Mon, 14 Feb 2022 at 07:11, Jason Wang  wrote:
> >>
> >> On Fri, Jan 28, 2022 at 11:47 PM Stefan Hajnoczi  
> >> wrote:
> >> >
> >> > Dear QEMU, KVM, and rust-vmm communities,
> >> > QEMU will apply for Google Summer of Code 2022
> >> > (https://summerofcode.withgoogle.com/) and has been accepted into
> >> > Outreachy May-August 2022 (https://www.outreachy.org/). You can now
> >> > submit internship project ideas for QEMU, KVM, and rust-vmm!
> >> >
> >> > If you have experience contributing to QEMU, KVM, or rust-vmm you can
> >> > be a mentor. It's a great way to give back and you get to work with
> >> > people who are just starting out in open source.
> >> >
> >> > Please reply to this email by February 21st with your project ideas.
> >> >
> >> > Good project ideas are suitable for remote work by a competent
> >> > programmer who is not yet familiar with the codebase. In
> >> > addition, they are:
> >> > - Well-defined - the scope is clear
> >> > - Self-contained - there are few dependencies
> >> > - Uncontroversial - they are acceptable to the community
> >> > - Incremental - they produce deliverables along the way
> >> >
> >> > Feel free to post ideas even if you are unable to mentor the project.
> >> > It doesn't hurt to share the idea!
> >>
> >> Implementing the VIRTIO_F_IN_ORDER feature for both QEMU and kernel
> >> (vhost/virtio drivers) would be an interesting idea.
> >>
> >> It satisfies all the points above since it's supported by virtio spec.
> >>
> >> (Unfortunately, I won't have time for the mentoring)
> >
> >Thanks for this idea. As a stretch goal we could add implementing the
> >packed virtqueue layout in Linux vhost, QEMU's libvhost-user, and/or
> >QEMU's virtio qtest code.
> >
> >Stefano: Thank you for volunteering to mentor the project. Please
> >write a project description (see template below) and I will add this
> >idea:
> >
>
> I wrote a description of the project below. Let me know if there is
> anything to change.

Thanks, I have added the idea:
https://wiki.qemu.org/Google_Summer_of_Code_2022#VIRTIO_F_IN_ORDER_support_for_virtio_devices

Stefan



Re: [PATCH v5] 9pfs: Fix segfault in do_readdir_many caused by struct dirent overread

2022-02-17 Thread Christian Schoenebeck
On Wednesday, 16 February 2022 19:18:21 CET Vitaly Chikunov wrote:
> `struct dirent' returned from readdir(3) could be shorter (or longer)
> than `sizeof(struct dirent)', thus memcpy of sizeof length will overread
> into unallocated page causing SIGSEGV. Example stack trace:
> 
>  #0  0x559ebeed v9fs_co_readdir_many (/usr/bin/qemu-system-x86_64 + 0x497eed)
>  #1  0x559ec2e9 v9fs_readdir (/usr/bin/qemu-system-x86_64 + 0x4982e9)
>  #2  0x55eb7983 coroutine_trampoline (/usr/bin/qemu-system-x86_64 + 0x963983)
>  #3  0x773e0be0 n/a (n/a + 0x0)
> 
> While fixing this, provide a helper for any future `struct dirent' cloning.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/841
> Cc: qemu-sta...@nongnu.org
> Co-authored-by: Christian Schoenebeck 
> Reviewed-by: Dmitry V. Levin 
> Signed-off-by: Vitaly Chikunov 
> ---

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

Thanks! I am preparing a new PR now.

Best regards,
Christian Schoenebeck
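
For readers following along, the idea behind the fix can be sketched
outside of QEMU as below. This is a minimal illustration under stated
assumptions, not the qemu_dirent_dup() implementation from the patch: the
name dirent_dup_sketch is hypothetical, it uses plain malloc() instead of
GLib, and it always takes the portable route of sizing the copy from
offsetof(struct dirent, d_name) plus the length of the name, so it never
reads past the end of an entry that is shorter than sizeof(struct dirent).

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, NOT the qemu_dirent_dup() from the patch: copy only
 * the bytes that are known to belong to the entry (the fixed header fields
 * plus the NUL-terminated name) instead of sizeof(struct dirent) bytes.
 *
 * Returns a heap copy to be released with free(), or NULL if malloc() fails.
 */
static struct dirent *dirent_dup_sketch(const struct dirent *dent)
{
    size_t sz;
    struct dirent *dup;

    assert(dent != NULL);

    /* Header up to d_name, then the name itself including its NUL byte. */
    sz = offsetof(struct dirent, d_name) + strlen(dent->d_name) + 1;

    dup = malloc(sz);
    if (dup != NULL) {
        memcpy(dup, dent, sz);
    }
    return dup;
}
```

Note that the copy may be smaller than sizeof(struct dirent), so callers
should only touch the header fields and the name, never the storage beyond
the terminating NUL.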





Re: Call for GSoC and Outreachy project ideas for summer 2022

2022-02-17 Thread Stefan Hajnoczi
On Thu, 17 Feb 2022 at 07:08, Alice Frosi  wrote:
>
> On Fri, Jan 28, 2022 at 6:04 PM Stefan Hajnoczi  wrote:
> >
> > Dear QEMU, KVM, and rust-vmm communities,
> > QEMU will apply for Google Summer of Code 2022
> > (https://summerofcode.withgoogle.com/) and has been accepted into
> > Outreachy May-August 2022 (https://www.outreachy.org/). You can now
> > submit internship project ideas for QEMU, KVM, and rust-vmm!
> >
> > If you have experience contributing to QEMU, KVM, or rust-vmm you can
> > be a mentor. It's a great way to give back and you get to work with
> > people who are just starting out in open source.
> >
> > Please reply to this email by February 21st with your project ideas.
> >
> > Good project ideas are suitable for remote work by a competent
> > programmer who is not yet familiar with the codebase. In
> > addition, they are:
> > - Well-defined - the scope is clear
> > - Self-contained - there are few dependencies
> > - Uncontroversial - they are acceptable to the community
> > - Incremental - they produce deliverables along the way
> >
> > Feel free to post ideas even if you are unable to mentor the project.
> > It doesn't hurt to share the idea!
> >
>
> I'd like to propose this idea:
>
> Title: Create encrypted storage using VM-based container runtimes
>
> Cryptsetup requires root privileges in order to be able to encrypt
> storage with luks. However, privileged containers are generally
> discouraged for security reasons. A possible solution to avoid extra
> privileges is using VM-based container runtimes (e.g. crun with libkrun
> or kata-containers) and running inside the Virtual Machine the tools
> for the storage encryption.
>
> This internship focuses on a PoC for integrating and extending crun with
> libkrun in order to be able to create encrypted storage. The initial
> step will focus on creating encrypted images to demonstrate the
> feasibility and the necessary changes in the stack. If the timeframe
> allows it, an interesting follow-up of the first step is the
> encryption of persistent storage using block-based PVCs.
>
> Language: C, rust, golang
> Skills: containers and virtualization would be a big plus
> I won't put a level but the intern needs to be willing to dig into
> different source codes like crun (written in C), libkrun (written in
> Rust) and possibly podman or other kubernetes/containers projects
> (written in go)
> Mentor: Alice Frosi, Co-mentor: Sergio Lopez Pascual
>
> Let me know if the idea sounds feasible to you!
Thanks, I have added the idea:
https://wiki.qemu.org/Google_Summer_of_Code_2022#Create_encrypted_storage_using_VM-based_container_runtimes

Stefan



Re: qemu crash 100% CPU with Ubuntu10.04 guest (solved)

2022-02-17 Thread Kashyap Chamarthy
On Thu, Feb 17, 2022 at 12:07:15PM +1100, Ben Smith wrote:
> Hi All,
Hi,

> I'm cross-posting this from Reddit qemu_kvm, in case it helps in some
> way. I know my setup is ancient and unique; let me know if you would
> like more info.
> 
> Symptoms:
> 1. Ubuntu10.04 32-bit guest locks up randomly between 0 and 30 days.
> 2. The console shows a CPU trace dump, nothing else logged on the guest or host.
> 3. Host system (Ubuntu20.04) 100% CPU for qemu process.
> 
> Solution:
> When using virt-install, always use the "--os-variant" parameter!
> e.g. --os-variant ubuntu10.04
> 
> From the man page "--os-variant... Optimize the guest configuration
> for a specific operating system".
> In this case, "optimize" apparently means "stop the crashing".

The "--os-variant" option selects virtio devices where applicable, a
recommended machine type, guest resources (e.g. CPU, memory, disk size)
and other settings that improve performance.

> I was deliberately avoiding the option because the VM was already
> performing much better than expected and I didn't want to complicate
> the configuration.

Using it is always recommended when using `virt-install`.  The command
`osinfo-query os` will list all the OSes that you can use with
"--os-variant".  Note: even if you don't find the latest version of $OS
in `osinfo-query`, just using the most recent version still suffices.


> This was very, very painful to troubleshoot, involving spinning up 60
> VMs simultaneously, waiting for a failure, changing one parameter,
> repeat. :(

Yikes!  Kudos for having such a high threshold for frustration.

I think providing a clear reproducer can still be useful.  E.g. your
full guest QEMU command-line and your QEMU version.  (The
libvirt-generated QEMU log contains the version info.)


-- 
/kashyap




Re: [PATCH v2 00/12] GL & D-Bus display related fixes

2022-02-17 Thread Marc-André Lureau
Hi

On Thu, Feb 17, 2022 at 5:09 PM Akihiko Odaki wrote:

> On Thu, Feb 17, 2022 at 8:58 PM  wrote:
> >
> > From: Marc-André Lureau 
> >
> > Hi,
> >
> > In the thread "[PATCH 0/6] ui/dbus: Share one listener for a console",
> > Akihiko Odaki reported a number of issues with the GL and D-Bus display.
> > His series proposes a different design, and reverts some of my previous
> > generic console changes to fix those issues.
> >
> > However, as far as I have worked through the issues so far, they can be
> > solved by reasonably simple fixes while keeping the console changes generic
> > (not specific to the D-Bus display backend). I believe a shared
> > infrastructure is more beneficial long term than having GL-specific code in
> > the DBus code (in particular, the egl-headless & VNC combination could
> > potentially use it).
> >
> > Thanks a lot Akihiko for reporting the issues and proposing a different
> > approach! Please test this alternative series and let me know of any
> > further issues. My understanding is that you are mainly concerned with the
> > Cocoa backend, and I don't have a way to test it, so please check it. If
> > necessary, we may well have to revert my earlier changes and go your way,
> > eventually.
> >
> > Marc-André Lureau (12):
> >   ui/console: fix crash when using gl context with non-gl listeners
> >   ui/console: fix texture leak when calling surface_gl_create_texture()
> >   ui: do not create a surface when resizing a GL scanout
> >   ui/console: move check for compatible GL context
> >   ui/console: move dcl compatiblity check to a callback
> >   ui/console: egl-headless is compatible with non-gl listeners
> >   ui/dbus: associate the DBusDisplayConsole listener with the given
> > console
> >   ui/console: move console compatibility check to dcl_display_console()
> >   ui/shader: fix potential leak of shader on error
> >   ui/shader: free associated programs
> >   ui/console: add a dpy_gfx_switch callback helper
> >   ui/dbus: fix texture sharing
> >
> >  include/ui/console.h |  19 ---
> >  ui/dbus.h            |   3 ++
> >  ui/console-gl.c      |   4 ++
> >  ui/console.c         | 119 ++-
> >  ui/dbus-console.c    |  27 +-
> >  ui/dbus-listener.c   |  11
> >  ui/dbus.c            |  33 +++-
> >  ui/egl-headless.c    |  17 ++-
> >  ui/gtk.c             |  18 ++-
> >  ui/sdl2.c            |   9 +++-
> >  ui/shader.c          |   9 +++-
> >  ui/spice-display.c   |   9 +++-
> >  12 files changed, 192 insertions(+), 86 deletions(-)
> >
> > --
> > 2.34.1.428.gdcc0cd074f0c
> >
> >
>
> You missed only one thing:
> >- that console_select and register_displaychangelistener may not call
> > dpy_gfx_switch and call dpy_gl_scanout_texture instead. It is
> > incompatible with non-OpenGL displays
>
> displaychangelistener_display_console always has to call
> dpy_gfx_switch for non-OpenGL displays, but it still doesn't.
>

Ok, would that be what you have in mind?

--- a/ui/console.c
+++ b/ui/console.c
@@ -1122,6 +1122,10 @@ static void displaychangelistener_display_console(DisplayChangeListener *dcl,
 } else if (con->scanout.kind == SCANOUT_SURFACE) {
 dpy_gfx_create_texture(con, con->surface);
 displaychangelistener_gfx_switch(dcl, con->surface);
+} else {
+/* use the fallback surface, egl-headless keeps it updated */
+assert(con->surface);
+displaychangelistener_gfx_switch(dcl, con->surface);
 }

I wish such egl-headless specific code would be there, but we would need
more refactoring.

I think I would rather have a backend split for GL context, like "-object
egl-context". egl-headless-specific copy code would be handled by
common/util code for anything that wants a pixman surface (VNC, screen
capture, non-GL display etc).

This split would allow sharing the context code, and introduce other system
specific GL initialization, such as WGL etc. Right now, I doubt the EGL
code works on anything but Linux.


> Anything else should be addressed with this patch series. (And it also
> has nice fixes for shader leaks.)
>

thanks


>
> cocoa doesn't have OpenGL output and egl-headless, the cause of many
> pains addressed here, does not work on macOS so you need little
> attention. I have an out-of-tree OpenGL support for cocoa but it is
> out-of-tree anyway and I can fix it anytime.
>

Great!

By the way, I suppose you checked your DBus changes against the WIP
"qemu-display" project. What was your experience? I don't think many people
have tried it yet. Do you think this could be made to work on macOS? At
least the non-dmabuf support should work, as long as Gtk4 has good macOS
support. I don't know if dmabuf or something similar exists there, any idea?


-- 
Marc-André Lureau

