Re: [Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic

2022-04-12 Thread Divya Garg



On 12/04/22 8:46 pm, Vitaly Kuznetsov wrote:

Divya Garg  writes:


On 12/04/22 6:18 pm, Vitaly Kuznetsov wrote:

Divya Garg  writes:


Hi Vitaly Kuznetsov!
I was working on the hyperv flags and saw that we introduced new
dependencies some time back
(https://sourcegraph.com/github.com/qemu/qemu/-/commit/c686193072a47032d83cb4e131dc49ae30f9e5d7?visible=1).
After these changes, if we try to live-migrate a VM from an older QEMU to a
newer one that has these changes, it fails with a dependency error.

I was wondering if this is the expected behaviour, whether there is any
workaround for handling it, or whether something needs to be done to ensure
backward compatibility?

Hi Divya,

configurations with 'hv-stimer' and without 'hv-synic'/'hv-time' were
always incorrect as Windows can't use the feature, that's why the
dependencies were added. It is true that it doesn't seem to be possible
to forward-migrate such VMs to newer QEMU versions. We could've tied
these new dependencies to newer machine types I guess (so old machine
types would not fail to start) but we didn't do that back in 4.1 and
it's been a while since... Not sure whether it would make much sense to
introduce something for pre-4.1 machine types now.

Out of curiosity, why do such "incorrect" configurations exist? Can you
just update them to include missing flags on older QEMU so they migrate
to newer ones without issues?


Hi Vitaly!

Thanks for the response. I understand that these were incorrect
configurations and should be corrected. The only issue is that we want to
avoid power cycling those VMs. But since the configurations were wrong, I
think we should update and power cycle them. Just for understanding: is it
possible to disable the feature with a warning message, and to update
libvirt to mitigate this change and handle live migration?


I'm not exactly sure about libvirt, I was under the impression it makes
sure that QEMU command line is the same on the destination and on the
source. If there's a way to add something, I'd suggest you add the
missing features (hv-time, hv-synic) on the destination rather than
remove 'hv-stimer' as it is probably safer.
Yes, libvirt makes sure that the configuration remains constant on source
and destination. And true that adding the new features is the safer route.
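For illustration, a destination-side sketch with the dependencies added rather than hv-stimer removed (hv-time, hv-synic and hv-stimer are real QEMU hyperv enlightenment flags; the machine type, memory size and incoming port here are made-up placeholders):

```shell
# Hypothetical destination command line: keep hv-stimer and add its
# hv-time/hv-synic dependencies so the hyperv feature set matches.
qemu-system-x86_64 \
    -machine pc-i440fx-4.0,accel=kvm \
    -cpu host,hv-time,hv-synic,hv-stimer \
    -m 4096 \
    -incoming tcp:0:4444
```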



Or maybe update libvirt not to ask for this feature from qemu during live
migration, and to handle different configurations on the source and
destination hosts?

You can also modify QEMU locally and throw away these dependencies,
it'll allow these configurations again but generally speaking checking
that the set of hyper-v features is exactly the same on the source and
destination is the right thing to do: there are no guarantees that guest
OS (Windows) will keep behaving sane when the corresponding CPUIDs
change while it's running, all sorts of things are possible I believe.
True that. It's really difficult to predict the change in a guest's
behaviour when CPUIDs change, especially when disabling a bit. I agree the
best solution will be to power cycle the VMs and update to the correct
CPUIDs, maintaining the correct dependencies. Thank you for clearing up the
doubts and helping towards a better understanding.

Regards
Divya



Re: XIVE VFIO kernel resample failure in INTx mode under heavy load

2022-04-12 Thread Alexey Kardashevskiy




On 3/17/22 06:16, Cédric Le Goater wrote:

Timothy,

On 3/16/22 17:29, Cédric Le Goater wrote:

Hello,



I've been struggling for some time with what is looking like a
potential bug in QEMU/KVM on the POWER9 platform.  It appears that
in XIVE mode, when the in-kernel IRQ chip is enabled, an external
device that rapidly asserts IRQs via the legacy INTx level mechanism
will only receive one interrupt in the KVM guest.


Indeed. I could reproduce with a pass-through PCI adapter using
'pci=nomsi'. The virtio devices operate correctly but the network
adapter only receives one event (*):


$ cat /proc/interrupts
            CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  16:       2198   1378   1519   1216      0      0      0      0  XIVE-IPI    0 Edge  IPI-0
  17:          0      0      0      0   2003   1936   1335   1507  XIVE-IPI    1 Edge  IPI-1
  18:          0   6401      0      0      0      0      0      0  XIVE-IRQ 4609 Level virtio3, virtio0, virtio2
  19:          0      0      0      0      0    204      0      0  XIVE-IRQ 4610 Level virtio1
  20:          0      0      0      0      0      0      0      0  XIVE-IRQ 4608 Level xhci-hcd:usb1
  21:          0      1      0      0      0      0      0      0  XIVE-IRQ 4612 Level eth1 (*)
  23:          0      0      0      0      0      0      0      0  XIVE-IRQ 4096 Edge  RAS_EPOW
  24:          0      0      0      0      0      0      0      0  XIVE-IRQ 4592 Edge  hvc_console
  26:          0      0      0      0      0      0      0      0  XIVE-IRQ 4097 Edge  RAS_HOTPLUG



Changing any one of those items appears to avoid the glitch, e.g. XICS


XICS is very different from XIVE. The driver implements the previous
interrupt controller architecture (P5-P8) and the hypervisor mediates
the delivery to the guest. With XIVE, vCPUs are directly signaled by
the IC. When under KVM, we use different KVM devices for each mode :

* KVM XIVE is a XICS-on-XIVE implementation (P9/P10 hosts) for guests
   not using the XIVE native interface. RHEL7 for instance.
* KVM XIVE native is a XIVE implementation (P9/P10 hosts) for guests
   using the XIVE native interface. Linux > 4.14.
* KVM XICS is for P8 hosts (no XIVE HW)

VFIO adds some complexity with the source events. I think the problem
comes from the assertion state. I will talk about it later.


mode with the in-kernel IRQ chip works (all interrupts are passed
through),


All interrupts are passed through using XIVE also. Run 'info pic' in
the monitor. On the host, check the IRQ mapping in the debugfs file :

   /sys/kernel/debug/powerpc/kvm-xive-*

and XIVE mode with the in-kernel IRQ chip disabled also works. 


In that case, no KVM device backs the QEMU device and all state
is in one place.


We
are also not seeing any problems in XIVE mode with the in-kernel
chip from MSI/MSI-X devices.


Yes, pass-through devices are expected to operate correctly :)


The device in question is a real time card that needs to raise an
interrupt every 1ms.  It works perfectly on the host, but fails in
the guest -- with the in-kernel IRQ chip and XIVE enabled, it
receives exactly one interrupt, at which point the host continues to
see INTx+ but the guest sees INTX-, and the IRQ handler in the guest
kernel is never reentered.


OK, same symptom as the scenario above.


We have also seen some very rare glitches where, over a long period
of time, we can enter a similar deadlock in XICS mode.


with the in-kernel XICS IRQ chip ?


Disabling
the in-kernel IRQ chip in XIVE mode will also lead to the lockup
with this device, since the userspace IRQ emulation cannot keep up
with the rapid interrupt firing (measurements show around 100ms
required for processing each interrupt in the user mode).


MSI emulation in QEMU is slower indeed (35%). LSI is very slow because
it is handled as a special case in the device/driver. To maintain the
assertion state, all LSI handling is done with a special HCALL :
H_INT_ESB which is implemented in QEMU. This generates a lot of back
and forth in the KVM stack.


My understanding is the resample mechanism does some clever tricks
with level IRQs, but that QEMU needs to check if the IRQ is still
asserted by the device on guest EOI.


Yes, the problem is in that area.
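The invariant under discussion can be sketched abstractly (plain Python, not QEMU/KVM code; the names are illustrative): on guest EOI the hypervisor must re-check the device line, and if it is still asserted the interrupt has to be re-injected, otherwise exactly the observed "one interrupt, then silence" symptom appears.

```python
# Abstract model of level-triggered (INTx) resampling. Not QEMU/KVM
# code -- just the invariant the thread is discussing.
class IntxLine:
    def __init__(self):
        self.asserted = False          # current device line level
        self.pending_in_guest = False  # injected, guest has not EOI'd yet

    def device_assert(self):
        self.asserted = True
        if not self.pending_in_guest:
            self.pending_in_guest = True   # inject into the guest

    def guest_eoi(self):
        # Resample on EOI: if the device still asserts the line, the
        # interrupt must be re-injected.  Losing this step leaves the
        # host seeing INTx+ while the guest handler is never re-entered.
        self.pending_in_guest = self.asserted

irq = IntxLine()
irq.device_assert()          # level goes high -> first injection
irq.guest_eoi()              # still high -> re-injected
print(irq.pending_in_guest)  # True
irq.asserted = False         # device deasserts
irq.guest_eoi()              # line low on EOI -> stays idle
print(irq.pending_in_guest)  # False
```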


Since a failure here would
explain these symptoms I'm wondering if there is a bug in either
QEMU or KVM for POWER / pSeries (SPAPr) where the IRQ is not
resampled and therefore not re-fired in the guest?


KVM I would say. The assertion state is maintained in KVM for the KVM
XICS-on-XIVE implementation and in QEMU for the KVM XIVE native
device. These are good candidates. I will take a look.


All works 

[PATCH] docs: Correct the default thread-pool-size

2022-04-12 Thread Liu Yiding
Refer to 26ec190964 virtiofsd: Do not use a thread pool by default

Signed-off-by: Liu Yiding 
---
 docs/tools/virtiofsd.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tools/virtiofsd.rst b/docs/tools/virtiofsd.rst
index 0c0560203c..33fed08c6f 100644
--- a/docs/tools/virtiofsd.rst
+++ b/docs/tools/virtiofsd.rst
@@ -127,7 +127,7 @@ Options
 .. option:: --thread-pool-size=NUM
 
   Restrict the number of worker threads per request queue to NUM.  The default
-  is 64.
+  is 0.
 
 .. option:: --cache=none|auto|always
 
-- 
2.31.1






Re: [RFC PATCH 0/4] 9pfs: Add 9pfs support for Windows host

2022-04-12 Thread Bin Meng
+Guohuai

On Tue, Apr 12, 2022 at 8:27 PM Christian Schoenebeck
 wrote:
>
> On Freitag, 8. April 2022 19:10:09 CEST Bin Meng wrote:
> > At present there is no Windows support for 9p file system.
> > This series adds initial Windows support for 9p file system.
>
> Nice!
>
> > Only 'local' file system driver backend is supported. security_model
> > should be 'none' due to limitations on Windows host.
>
> We have 3 fs drivers: local, synth, proxy. I don't mind about proxy, it is in
> bad shape and we will probably deprecate it in near future anyway. But it
> would be good to have support for the synth driver, because we are using it
> for running test cases and fuzzing tests (QA).
>
> What are the limitations against security_model=mapped on Windows? Keep in
> mind that with security_model=none you are very limited in what you can do
> with 9p.
>

Regards,
Bin



Re: [PATCH 0/5] target/arm: Support variable sized coprocessor registers

2022-04-12 Thread Gavin Shan

Hi Peter,

On 4/11/22 8:10 PM, Peter Maydell wrote:

On Mon, 11 Apr 2022 at 13:02, Andrew Jones  wrote:

On Mon, Apr 11, 2022 at 10:22:59AM +0100, Peter Maydell wrote:

Also, we support SVE today, and we don't have variable size
coprocessor registers. Is there a bug here that we would be
fixing ?


SVE registers are KVM_REG_SIZE_U2048 and KVM_REG_SIZE_U256 sized
registers. They work fine (just like the VFP registers which are
KVM_REG_SIZE_U128 sized). They work because they don't get stored in the
cpreg list. SVE and CORE (which includes VFP) registers are filtered
out by kvm_arm_reg_syncs_via_cpreg_list(). Since they're filtered
out they need to be handled specifically by kvm_arch_get/put_registers()


Right, this is the distinction between ONE_REG registers and
coprocessor registers (which are a subset of ONE_REG registers).
We wouldn't want to handle SVE regs in the copro list anyway,
I think, because we want their state to end up in env->vfp.zregs[]
so the gdbstub can find it there. And we wouldn't have benefited
from the copro regs handling's "no need for new QEMU to handle
migrating state of a new register" because we needed a lot of
special case code for SVE and couldn't enable it by default
for other reasons.



Yep. For those newly introduced SDEI pseudo-registers, the intention
is to avoid the special-case code, so the coprocessor register
list fits the need well. The only barrier to using the coprocessor
register list is the variable register sizes.


If we do add non-64-bit cpregs on the kernel side then we need to
make those new registers opt-in, because currently deployed QEMU
will refuse to start if the kernel passes it a register in the
KVM_GET_REG_LIST that is larger than 64 bits and isn't
KVM_REG_ARM_CORE or KVM_REG_ARM64_SVE (assuming I'm not misreading
the QEMU code).
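For reference, the size check in question keys off the size field KVM encodes in bits 52..55 of each register id (constants as in the kvm uapi headers); a quick sketch:

```python
# KVM_REG_SIZE encoding from the kvm uapi headers: bits 52..55 of a
# register id hold log2 of the register size in bytes.
KVM_REG_SIZE_MASK  = 0x00f0000000000000
KVM_REG_SIZE_SHIFT = 52
KVM_REG_SIZE_U64   = 0x0030000000000000
KVM_REG_SIZE_U128  = 0x0040000000000000
KVM_REG_SIZE_U2048 = 0x0080000000000000

def reg_size_bits(reg_id):
    return 8 * (1 << ((reg_id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))

print(reg_size_bits(KVM_REG_SIZE_U64))    # 64
print(reg_size_bits(KVM_REG_SIZE_U2048))  # 2048

# A register wider than 64 bits that is neither KVM_REG_ARM_CORE nor
# KVM_REG_ARM64_SVE is the case that makes deployed QEMU refuse to start.
```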



Yes, we absolutely need to make those new registers opt-in. Otherwise,
old QEMU, which doesn't support variable-sized registers, will crash on a
host kernel where large-sized registers are exposed unconditionally.

I spent some time thinking about the mechanisms for opt-in. There are
two options as far as I can see: (1) Using KVM_CAP_ARM_SDEI to check
whether the large-sized registers exist. (2) Using the newly introduced
pseudo-register ("KVM_REG_ARM_STD_BMAP") in Raghavendra's series [1].
Each bit in this pseudo-register corresponds to one service in the
"standard hypervisor" category, which SDEI falls into.

I prefer (2) because those services or firmware interfaces are
exposed in a collective way by KVM_REG_ARM_STD_BMAP, compared
to individual capabilities. However, they are the same in essence.
Another benefit of using KVM_REG_ARM_STD_BMAP is that it hides the SDEI
interface and the large-sized registers from old QEMU.

[1] 
https://lore.kernel.org/linux-arm-kernel/20220407011605.1966778-10-rana...@google.com/T/#m0bc1aa4048ca157e8e99c593b3f349b879032543

Thanks,
Gavin




Re: [PATCH v5 4/4] hw/acpi/aml-build: Use existing CPU topology to build PPTT table

2022-04-12 Thread Gavin Shan

Hi Jonathan,

On 4/12/22 11:40 PM, Jonathan Cameron wrote:

On Sun,  3 Apr 2022 22:59:53 +0800
Gavin Shan  wrote:

When the PPTT table is built, the CPU topology is re-calculated, but
it's unnecessary because the CPU topology has already been populated in
virt_possible_cpu_arch_ids() on the arm/virt machine.

This reworks build_pptt() to avoid that by reusing the existing topology
in ms->possible_cpus. Currently, the only user of build_pptt() is the
arm/virt machine.

Signed-off-by: Gavin Shan 


My compiler isn't being very smart today and gives a bunch of
maybe used uninitialized for socket_offset, cluster_offset and core_offset.

They probably are initialized in all real paths, but I think you need to
set them to something at instantiation to keep the compiler happy.



Thanks for reporting the warning raised from the compiler. I think
your compiler may be smarter than mine :)

Yeah, after checking the code again, they're initialized in all real
paths. So let's initialize them with zeroes in v6, but I would wait
for Igor's and Yanan's reviews on this (v5) series.

Thanks,
Gavin
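As a rough model of the reworked logic (plain Python, not the QEMU code; field names are illustrative, and the core/thread leaf handling of the real patch is elided): walk the pre-sorted possible_cpus list once and emit a hierarchy node only when the socket/cluster id changes, with one leaf per CPU entry:

```python
# Toy model of the single-pass build_pptt() walk over possible_cpus.
def build_topology(cpus, clusters_supported=True):
    nodes = []
    socket_id = cluster_id = None
    for n, cpu in enumerate(cpus):
        if cpu["socket"] != socket_id:
            socket_id, cluster_id = cpu["socket"], None
            nodes.append(("socket", socket_id))    # physical package node
        if clusters_supported and cpu["cluster"] != cluster_id:
            cluster_id = cpu["cluster"]
            nodes.append(("cluster", cluster_id))  # re-emitted per socket
        nodes.append(("leaf", n))                  # one leaf per CPU entry
    return nodes

# 2 sockets x 1 cluster x 2 cpus: 2 socket nodes, 2 cluster nodes, 4 leaves.
cpus = [{"socket": s, "cluster": 0} for s in range(2) for _ in range(2)]
print(len(build_topology(cpus)))  # 8
```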




---
  hw/acpi/aml-build.c | 100 +---
  1 file changed, 38 insertions(+), 62 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 4086879ebf..4b0f9df3e3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2002,86 +2002,62 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
  const char *oem_id, const char *oem_table_id)
  {
  MachineClass *mc = MACHINE_GET_CLASS(ms);
-GQueue *list = g_queue_new();
-guint pptt_start = table_data->len;
-guint parent_offset;
-guint length, i;
-int uid = 0;
-int socket;
+CPUArchIdList *cpus = ms->possible_cpus;
+int64_t socket_id = -1, cluster_id = -1, core_id = -1;
+uint32_t socket_offset, cluster_offset, core_offset;
+uint32_t pptt_start = table_data->len;
+int n;
  AcpiTable table = { .sig = "PPTT", .rev = 2,
  .oem_id = oem_id, .oem_table_id = oem_table_id };
  
  acpi_table_begin(, table_data);
  
-for (socket = 0; socket < ms->smp.sockets; socket++) {
-g_queue_push_tail(list,
-GUINT_TO_POINTER(table_data->len - pptt_start));
-build_processor_hierarchy_node(
-table_data,
-/*
- * Physical package - represents the boundary
- * of a physical package
- */
-(1 << 0),
-0, socket, NULL, 0);
-}
+for (n = 0; n < cpus->len; n++) {
+if (cpus->cpus[n].props.socket_id != socket_id) {
+socket_id = cpus->cpus[n].props.socket_id;
+cluster_id = -1;
+core_id = -1;
+socket_offset = table_data->len - pptt_start;
+build_processor_hierarchy_node(table_data,
+(1 << 0), /* Physical package */
+0, socket_id, NULL, 0);
+}
  
-if (mc->smp_props.clusters_supported) {
-length = g_queue_get_length(list);
-for (i = 0; i < length; i++) {
-int cluster;
-
-parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
-for (cluster = 0; cluster < ms->smp.clusters; cluster++) {
-g_queue_push_tail(list,
-GUINT_TO_POINTER(table_data->len - pptt_start));
-build_processor_hierarchy_node(
-table_data,
-(0 << 0), /* not a physical package */
-parent_offset, cluster, NULL, 0);
+if (mc->smp_props.clusters_supported) {
+if (cpus->cpus[n].props.cluster_id != cluster_id) {
+cluster_id = cpus->cpus[n].props.cluster_id;
+core_id = -1;
+cluster_offset = table_data->len - pptt_start;
+build_processor_hierarchy_node(table_data,
+(0 << 0), /* Not a physical package */
+socket_offset, cluster_id, NULL, 0);
  }
+} else {
+cluster_offset = socket_offset;
  }
-}
  
-length = g_queue_get_length(list);
-for (i = 0; i < length; i++) {
-int core;
-
-parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
-for (core = 0; core < ms->smp.cores; core++) {
-if (ms->smp.threads > 1) {
-g_queue_push_tail(list,
-GUINT_TO_POINTER(table_data->len - pptt_start));
-build_processor_hierarchy_node(
-table_data,
+if (ms->smp.threads <= 1) {
+build_processor_hierarchy_node(table_data,
+(1 << 1) | /* ACPI Processor ID valid */
+(1 << 3),  /* Node is a Leaf */
+cluster_offset, n, NULL, 0);
+} else {
+if (cpus->cpus[n].props.core_id != core_id) {
+core_id = cpus->cpus[n].props.core_id;
+core_offset = table_data->len - 

[ANNOUNCE] QEMU 7.0.0-rc4 is now available

2022-04-12 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
fifth release candidate for the QEMU 7.0 release. This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-7.0.0-rc4.tar.xz
  http://download.qemu-project.org/qemu-7.0.0-rc4.tar.xz.sig

A note from the maintainer:

  rc4 contains three fixes for late-breaking security bugs. The plan
  is to make the final 7.0 release in a week's time on the 19th April,
  with no further changes, unless we discover some last-minute
  catastrophic problem.

You can help improve the quality of the QEMU 7.0 release by testing this
release and reporting bugs using our GitLab issue tracker:

  https://gitlab.com/qemu-project/qemu/-/issues

The release plan, as well as documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/7.0

Please add entries to the ChangeLog for the 7.0 release below:

  http://wiki.qemu.org/ChangeLog/7.0

Thank you to everyone involved!

Changes since rc3:

81c7ed41a1: Update version for v7.0.0-rc4 release (Peter Maydell)
4bf58c7213: virtio-iommu: use-after-free fix (Wentao Liang)
fa892e9abb: ui/cursor: fix integer overflow in cursor_alloc (CVE-2021-4206) 
(Mauro Matteo Cascella)
9569f5cb5b: display/qxl-render: fix race condition in qxl_cursor 
(CVE-2021-4207) (Mauro Matteo Cascella)



[PATCH] net/vhost-user: Save ack_features to net_clients during vhost_user_start

2022-04-12 Thread Yi Wang
From: Liu Xiangyu 

During vhost_user_start, if openvswitch.service restarts, the final
features are not as expected, because QEMU does not save the ack_features
promptly.

Signed-off-by: Liu Xiangyu 
Signed-off-by: Yi Wang 
---
 net/vhost-user.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index b1a0247..ce9dcb6 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -92,6 +92,10 @@ static int vhost_user_start(int queues, NetClientState 
*ncs[],
 goto err;
 }
 
+if (s->vhost_net) {
+s->acked_features = vhost_net_get_acked_features(net);
+}
+
 if (i == 0) {
 max_queues = vhost_net_get_max_queues(net);
 if (queues > max_queues) {
-- 
1.8.3.1



[PATCH v3 for 7.1 1/1] block: add 'force' parameter to 'blockdev-change-medium' command

2022-04-12 Thread Denis V. Lunev
'blockdev-change-medium' is a convenient wrapper for the following
sequence of commands:
 * blockdev-open-tray
 * blockdev-remove-medium
 * blockdev-insert-medium
 * blockdev-close-tray
and should be used e.g. to change the ISO image inside the CD-ROM tray.
However, the guest can lock the tray, and some Linux guests such as
CentOS 8.5 actually do that. In this case the execution of this
command results in an error like the following:
  Device 'scsi0-0-1-0' is locked and force was not specified,
  wait for tray to open and try again.

This situation can be resolved by passing the 'force' flag to
'blockdev-open-tray'. Thus it seems reasonable to add the same
capability to 'blockdev-change-medium' too.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Acked-by: "Dr. David Alan Gilbert" 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: Eric Blake 
CC: Markus Armbruster 
---
 block/qapi-sysemu.c |  3 ++-
 hmp-commands.hx | 11 +++
 monitor/hmp-cmds.c  |  4 +++-
 qapi/block.json |  6 ++
 ui/cocoa.m  |  1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

Changes from v2:
- fixed parameter's order in changeDeviceMedia(). This is a VERY interesting
  story, actually. Both versions of the patch (v2 & v3) compile silently.
  In order to see the difference one needs to enable -Weverything compilation
  option!

Changes from v1:
- added kludge to Objective C code
- simplified a bit call of do_open_tray() (thanks, Vova!)
- added record to hmp-command.hx

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 8498402ad4..680c7ee342 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -318,6 +318,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
 bool has_id, const char *id,
 const char *filename,
 bool has_format, const char *format,
+bool has_force, bool force,
 bool has_read_only,
 BlockdevChangeReadOnlyMode read_only,
 Error **errp)
@@ -380,7 +381,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
 
 rc = do_open_tray(has_device ? device : NULL,
   has_id ? id : NULL,
-  false, );
+  force, );
 if (rc && rc != -ENOSYS) {
 error_propagate(errp, err);
 goto fail;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..6ec593ea08 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -202,9 +202,9 @@ ERST
 
 {
 .name   = "change",
-.args_type  = "device:B,target:F,arg:s?,read-only-mode:s?",
-.params = "device filename [format [read-only-mode]]",
-.help   = "change a removable medium, optional format",
+.args_type  = "device:B,force:-f,target:F,arg:s?,read-only-mode:s?",
+.params = "device [-f] filename [format [read-only-mode]]",
+.help   = "change a removable medium, optional format, use -f to force the operation",
 .cmd= hmp_change,
 },
 
@@ -212,11 +212,14 @@ SRST
 ``change`` *device* *setting*
   Change the configuration of a device.
 
-  ``change`` *diskdevice* *filename* [*format* [*read-only-mode*]]
+  ``change`` *diskdevice* [-f] *filename* [*format* [*read-only-mode*]]
 Change the medium for a removable disk device to point to *filename*. eg::
 
   (qemu) change ide1-cd0 /path/to/some.iso
 
+``-f``
+  forces the operation even if the guest has locked the tray.
+
 *format* is optional.
 
 *read-only-mode* may be used to change the read-only status of the device.
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 634968498b..d8b98bed6c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1472,6 +1472,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 const char *target = qdict_get_str(qdict, "target");
 const char *arg = qdict_get_try_str(qdict, "arg");
 const char *read_only = qdict_get_try_str(qdict, "read-only-mode");
+bool force = qdict_get_try_bool(qdict, "force", false);
 BlockdevChangeReadOnlyMode read_only_mode = 0;
 Error *err = NULL;
 
@@ -1508,7 +1509,8 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 }
 
 qmp_blockdev_change_medium(true, device, false, NULL, target,
-   !!arg, arg, !!read_only, read_only_mode,
+   !!arg, arg, true, force,
+   !!read_only, read_only_mode,
);
 }
 
diff --git a/qapi/block.json b/qapi/block.json
index 82fcf2c914..3f100d4887 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -326,6 +326,11 @@
 # @read-only-mode: change the read-only mode of the device; defaults
 #  to 'retain'
 #
+# @force: if false (the default), an 

Re: [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard

2022-04-12 Thread Eric Blake
On Tue, Apr 12, 2022 at 09:41:57PM +0200, Paolo Bonzini wrote:
> The .reply_possible field of s->requests is never set to false.  This is
> not a big problem as it is only a safeguard to detect protocol errors,
> but fix it anyway.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 567872ac53..6a5e410e5f 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -454,15 +454,16 @@ static coroutine_fn int 
> nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
>  nbd_channel_error(s, -EINVAL);
>  return -EINVAL;
>  }
> -if (s->reply.handle == handle) {
> -/* We are done */
> -return 0;
> -}
>  ind2 = HANDLE_TO_INDEX(s, s->reply.handle);
>  if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].reply_possible) {
>  nbd_channel_error(s, -EINVAL);
>  return -EINVAL;
>  }
> +s->requests[ind2].reply_possible = 
> nbd_reply_is_structured(>reply);

If the reply is simple (not structured), then we expect no further
replies, so this sets things to false.  But if the reply is
structured, the answer depends on NBD_REPLY_FLAG_DONE, as in:

s->requests[ind2].reply_possible =
  nbd_reply_is_structured(>reply) &&
  (s->reply.structured.flags & NBD_REPLY_FLAG_DONE);

> +if (s->reply.handle == handle) {
> +/* We are done */
> +return 0;
> +}
>  nbd_recv_coroutine_wake_one(>requests[ind2]);
>  }
>  }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK

2022-04-12 Thread Andy Lutomirski
On Tue, Apr 12, 2022, at 7:36 AM, Jason Gunthorpe wrote:
> On Fri, Apr 08, 2022 at 08:54:02PM +0200, David Hildenbrand wrote:
>
>> RLIMIT_MEMLOCK was the obvious candidate, but as we discovered in the
>> past already with secretmem, it's not 100% that good of a fit (unmovable
>> is worse than mlocked). But it gets the job done for now at least.
>
> No, it doesn't. There are too many different interpretations of how
> MEMLOCK is supposed to work:
>
> eg VFIO accounts per-process so hostile users can just fork to go past
> it.
>
> RDMA is per-process but uses a different counter, so you can double up
>
> io_uring is per-user and uses a 3rd counter, so it can triple up on
> the above two.
>
>> So I'm open for alternative to limit the amount of unmovable memory we
>> might allocate for user space, and then we could convert seretmem as well.
>
> I think it has to be cgroup based considering where we are now :\
>

So this is another situation where the actual backend (TDX, SEV, pKVM, pure
software) makes a difference -- depending on exactly which backend we're
using, the memory may not be unmovable.  It might even be swappable (in the
potentially distant future).

Anyway, here's a concrete proposal, with a bit of handwaving:

We add new cgroup limits:

memory.unmoveable
memory.locked

These can be set to an actual number or to the special value ROOT_CAP.  If
they're set to ROOT_CAP, then anyone in the cgroup with
capable(CAP_SYS_RESOURCE) (i.e. the global capability) can allocate unmovable
or locked memory with this (and potentially other) new APIs.  If it's 0, then
they can't.  If it's another value, then the memory can be allocated, charged
to the cgroup, up to the limit, with no particular capability needed.  The
default at boot is ROOT_CAP.  Anyone who wants to configure it differently is
free to do so.  This avoids introducing a DoS, makes it easy to run tests
without configuring cgroups, and lets serious users set up their cgroups.

Nothing is charged per mm.

To make this fully sensible, we need to know what the backend is for the 
private memory before allocating any so that we can charge it accordingly.
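Purely as a sketch of the proposed knobs (hypothetical: memory.unmoveable and memory.locked do not exist in any current kernel ABI; the paths and values are illustrative):

```shell
# Hypothetical cgroup interface from the proposal above -- not a real ABI.
CG=/sys/fs/cgroup/mygroup
echo ROOT_CAP           > "$CG/memory.unmoveable"  # gate on CAP_SYS_RESOURCE
echo $((512*1024*1024)) > "$CG/memory.locked"      # or charge up to a byte limit
echo 0                  > "$CG/memory.unmoveable"  # or forbid it entirely
```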



Re: [PATCH v2 04/39] util/log: Pass Error pointer to qemu_set_log

2022-04-12 Thread Alex Bennée


Richard Henderson  writes:

> Do not force exit within qemu_set_log; return bool and pass
> an Error value back up the stack as per usual.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH v2 07/39] util/log: Rename qemu_log_lock to qemu_log_trylock

2022-04-12 Thread Alex Bennée


Richard Henderson  writes:

> This function can fail, which makes it more like ftrylockfile
> or pthread_mutex_trylock than flockfile or pthread_mutex_lock,
> so rename it.
>
> To closer match the other trylock functions, release rcu_read_lock
> along the failure path, so that qemu_log_unlock need not be called
> on failure.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH] hw/nvme: fix narrowing conversion

2022-04-12 Thread Klaus Jensen
On Apr 12 11:59, Dmitry Tikhov wrote:
> Since nlbas is of type int, it does not work with large namespace size
> values: e.g., a 9 TB file backing the namespace, with 8-byte metadata and
> a 4096-byte lbasz, gives a negative nlbas value, which is later promoted
> to a negative int64_t value and results in a negative ns->moff, which
> breaks the namespace.
> 
> Signed-off-by: Dmitry Tikhov 
> ---
>  hw/nvme/ns.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> index 324f53ea0c..af6504fad2 100644
> --- a/hw/nvme/ns.c
> +++ b/hw/nvme/ns.c
> @@ -29,7 +29,8 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  {
>  NvmeIdNs *id_ns = >id_ns;
>  BlockDriverInfo bdi;
> -int npdg, nlbas, ret;
> +int npdg, ret;
> +int64_t nlbas;
>  
>  ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
>  ns->lbasz = 1 << ns->lbaf.ds;
> @@ -42,7 +43,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  id_ns->ncap = id_ns->nsze;
>  id_ns->nuse = id_ns->ncap;
>  
> -ns->moff = (int64_t)nlbas << ns->lbaf.ds;
> +ns->moff = nlbas << ns->lbaf.ds;
>  
>  npdg = ns->blkconf.discard_granularity / ns->lbasz;
>  
> -- 
> 2.35.1
> 
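The overflow described in the quoted commit message is easy to reproduce numerically (a sketch; the 9 TB / 4096-byte / 8-byte figures are taken from the message above):

```python
import ctypes

size = 9 * 1024**4            # 9 TiB backing file
lbasz, ms_bytes = 4096, 8     # 4096-byte LBA plus 8 bytes of metadata
nlbas = size // (lbasz + ms_bytes)

print(nlbas)                        # exceeds INT_MAX (2147483647)
print(ctypes.c_int32(nlbas).value)  # negative once stored in a C 'int'
```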

Thanks Dmitry. Looks reasonable,

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature


Re: [PATCH v2 03/39] util/log: Return bool from qemu_set_log_filename

2022-04-12 Thread Alex Bennée


Richard Henderson  writes:

> Per the recommendations in qapi/error.h, return false on failure.
>
> Use the return value in the monitor, the only place we aren't
> already passing error_fatal or error_abort.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée



Re: [PATCH] Warn user if the vga flag is passed but no vga device is created

2022-04-12 Thread Gautam Agrawal
hi,

> thanks for your patch, looks pretty good already, but there is a small
> issue: Try for example:
>
>   ./qemu-system-s390x -vga none
>
> ... and it will print the warning "qemu-system-s390x: warning: No vga device
> is created", though the user only asked for no VGA device. This seems to
> happen if a machine does not have any VGA device by default, but still
> requests "-vga none" on the command line.

This can be solved by adding the condition (vga_interface_type != VGA_NONE).


> On 08/04/2022 12.45, Gautam Agrawal wrote:
> > This patch is in regards to this 
> > issue:https://gitlab.com/qemu-project/qemu/-/issues/581#.
>
> Better write this right in front of your Signed-off-by line:
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581
>
> ... then the ticket will be automatically be closed once your patch gets 
> merged.
>
I apologize for this mistake

> vga_interface_type is also used in hw/mips/fuloong2e.c and
> hw/xenpv/xen_machine_pv.c ... do they need a change, too?

I can definitely make similar changes in them too, since they also
specify the vga_interface_type. Shall I proceed with this?

> This will trigger a warning from the scripts/checkpatch.pl script:
>
> ERROR: do not initialise globals to 0 or NULL
> #238: FILE: softmmu/globals.c:43:
> +bool vga_interface_created = false;

Could you kindly suggest a better approach to this than creating a
global variable?


> I'm not a native speaker, and maybe it's just a matter of taste, but I'd
> rather say it in past tense: "No VGA device has been created"

I will correct the warning message, as suggested by Peter Maydell.

Regards,
Gautam Agrawal



Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-12 Thread Kirill A. Shutemov
On Mon, Mar 28, 2022 at 01:16:48PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 10, 2022 at 6:09 AM Chao Peng  wrote:
> >
> > This is the v5 of this series which tries to implement the fd-based KVM
> > guest private memory. The patches are based on latest kvm/queue branch
> > commit:
> >
> >   d5089416b7fb KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
> 
> Can this series be run and a VM booted without TDX?  A feature like
> that might help push it forward.

It would require enlightenment of the guest code. We have two options.

The simple one is to limit the enabling to the guest kernel, but that would
require a non-destructive conversion between shared and private memory, which
does not seem to be compatible with the current design.

The other option is to make memory private from time 0 of VM boot, but that
requires modifying the virtual BIOS to set up shared ranges as needed. I'm not
sure anybody will volunteer to work on the BIOS code to make it happen.

Hm.

-- 
 Kirill A. Shutemov



[PATCH for-7.1 8/8] nbd: document what is protected by the CoMutexes

2022-04-12 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 block/nbd.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 8954243f50..8297da7e89 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -82,12 +82,18 @@ typedef struct BDRVNBDState {
 NBDClientRequest requests[MAX_NBD_REQUESTS];
 QEMUTimer *reconnect_delay_timer;
 
+/* Protects sending data on the socket.  */
 CoMutex send_mutex;
+
+/*
+ * Protects receiving reply headers from the socket, as well as the
+ * fields reply, requests[].receiving and requests[].reply_possible
+ */
 CoMutex receive_mutex;
+NBDReply reply;
 
 QEMUTimer *open_timer;
 
-NBDReply reply;
 BlockDriverState *bs;
 
 /* Connection parameters */
-- 
2.35.1




[PATCH for-7.1 5/8] nbd: use a QemuMutex to synchronize reconnection with coroutines

2022-04-12 Thread Paolo Bonzini
The condition for waiting on the s->free_sema queue depends on
both s->in_flight and s->state.  The latter is currently using
atomics, but this is quite dubious and probably wrong.

Because s->state is written in the main thread too, for example by
the reconnect timer callback, it cannot be protected by a CoMutex.
Introduce a separate lock that can be used by nbd_co_send_request();
later on this lock will also be used for s->state.  There will not
be any contention on the lock unless there is a reconnect, so this
is not performance sensitive.
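The resulting pattern, modeled here with plain pthreads rather than QEMU's primitives (a toy sketch of the idea, not the actual patch): the new lock protects only the bookkeeping and is dropped around the blocking connection attempt, so there is no contention unless a reconnect is in progress.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t requests_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_flight;
static bool connected;

/* Stand-in for the potentially slow (re)connection step; runs unlocked. */
static void establish_connection(void)
{
    connected = true;
}

/* Called with requests_lock held, mirroring nbd_reconnect_attempt(). */
static void reconnect_attempt(void)
{
    pthread_mutex_unlock(&requests_lock);  /* don't block other senders */
    establish_connection();
    pthread_mutex_lock(&requests_lock);
}

static void send_request(void)
{
    pthread_mutex_lock(&requests_lock);
    in_flight++;                 /* bookkeeping protected by the lock */
    if (!connected) {
        reconnect_attempt();     /* the only contended path */
    }
    pthread_mutex_unlock(&requests_lock);
}
```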

Signed-off-by: Paolo Bonzini 
---
 block/nbd.c | 46 +++---
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 0ff41cb914..c908ea6ae3 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -72,17 +72,22 @@ typedef struct BDRVNBDState {
 QIOChannel *ioc; /* The current I/O channel */
 NBDExportInfo info;
 
-CoMutex send_mutex;
+/*
+ * Protects free_sema, in_flight, requests[].coroutine,
+ * reconnect_delay_timer.
+ */
+QemuMutex requests_lock;
 CoQueue free_sema;
-
-CoMutex receive_mutex;
 int in_flight;
+NBDClientRequest requests[MAX_NBD_REQUESTS];
+QEMUTimer *reconnect_delay_timer;
+
+CoMutex send_mutex;
+CoMutex receive_mutex;
 NBDClientState state;
 
-QEMUTimer *reconnect_delay_timer;
 QEMUTimer *open_timer;
 
-NBDClientRequest requests[MAX_NBD_REQUESTS];
 NBDReply reply;
 BlockDriverState *bs;
 
@@ -351,7 +356,7 @@ int coroutine_fn 
nbd_co_do_establish_connection(BlockDriverState *bs,
 return 0;
 }
 
-/* called under s->send_mutex */
+/* Called with s->requests_lock taken.  */
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
 bool blocking = nbd_client_connecting_wait(s);
@@ -383,9 +388,9 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 s->ioc = NULL;
 }
 
-qemu_co_mutex_unlock(&s->send_mutex);
+qemu_mutex_unlock(&s->requests_lock);
 nbd_co_do_establish_connection(s->bs, blocking, NULL);
-qemu_co_mutex_lock(&s->send_mutex);
+qemu_mutex_lock(&s->requests_lock);
 
 /*
  * The reconnect attempt is done (maybe successfully, maybe not), so
@@ -468,11 +473,10 @@ static int coroutine_fn 
nbd_co_send_request(BlockDriverState *bs,
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 int rc, i = -1;
 
-qemu_co_mutex_lock(&s->send_mutex);
-
+qemu_mutex_lock(&s->requests_lock);
 while (s->in_flight == MAX_NBD_REQUESTS ||
(!nbd_client_connected(s) && s->in_flight > 0)) {
-qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
+qemu_co_queue_wait(&s->free_sema, &s->requests_lock);
 }
 
 s->in_flight++;
@@ -493,14 +497,14 @@ static int coroutine_fn 
nbd_co_send_request(BlockDriverState *bs,
 }
 }
 
-g_assert(qemu_in_coroutine());
 assert(i < MAX_NBD_REQUESTS);
-
 s->requests[i].coroutine = qemu_coroutine_self();
 s->requests[i].offset = request->from;
 s->requests[i].receiving = false;
 s->requests[i].reply_possible = true;
+qemu_mutex_unlock(&s->requests_lock);
 
+qemu_co_mutex_lock(&s->send_mutex);
 request->handle = INDEX_TO_HANDLE(s, i);
 
 assert(s->ioc);
@@ -520,17 +524,19 @@ static int coroutine_fn 
nbd_co_send_request(BlockDriverState *bs,
 } else {
 rc = nbd_send_request(s->ioc, request);
 }
+qemu_co_mutex_unlock(&s->send_mutex);
 
-err:
 if (rc < 0) {
+qemu_mutex_lock(&s->requests_lock);
+err:
 nbd_channel_error(s, rc);
 if (i != -1) {
 s->requests[i].coroutine = NULL;
 }
 s->in_flight--;
 qemu_co_queue_next(&s->free_sema);
+qemu_mutex_unlock(&s->requests_lock);
 }
-qemu_co_mutex_unlock(&s->send_mutex);
 return rc;
 }
 
@@ -1020,12 +1026,11 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState 
*s,
 return true;
 
 break_loop:
+qemu_mutex_lock(&s->requests_lock);
 s->requests[HANDLE_TO_INDEX(s, handle)].coroutine = NULL;
-
-qemu_co_mutex_lock(&s->send_mutex);
 s->in_flight--;
 qemu_co_queue_next(&s->free_sema);
-qemu_co_mutex_unlock(&s->send_mutex);
+qemu_mutex_unlock(&s->requests_lock);
 
 return false;
 }
@@ -1858,8 +1863,9 @@ static int nbd_open(BlockDriverState *bs, QDict *options, 
int flags,
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
 s->bs = bs;
-qemu_co_mutex_init(&s->send_mutex);
+qemu_mutex_init(&s->requests_lock);
 qemu_co_queue_init(&s->free_sema);
+qemu_co_mutex_init(&s->send_mutex);
 qemu_co_mutex_init(&s->receive_mutex);
 
 if (!yank_register_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name), errp)) {
@@ -2046,9 +2052,11 @@ static void nbd_cancel_in_flight(BlockDriverState *bs)
 
 reconnect_delay_timer_del(s);
 
+qemu_mutex_lock(&s->requests_lock);
 if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
 s->state = NBD_CLIENT_CONNECTING_NOWAIT;
 }
+qemu_mutex_unlock(&s->requests_lock);
 
 

[PATCH for-7.1 4/8] nbd: keep send_mutex/free_sema handling outside nbd_co_do_establish_connection

2022-04-12 Thread Paolo Bonzini
Elevate s->in_flight early so that other incoming requests will wait
on the CoQueue in nbd_co_send_request; restart them after getting back
from nbd_reconnect_attempt.  This could be after the reconnect timer or
nbd_cancel_in_flight have cancelled the attempt, so there is no
need anymore to cancel the requests there.

nbd_co_send_request now handles both stopping and restarting pending
requests after a successful connection, and there is no need to
hold send_mutex in nbd_co_do_establish_connection.  The current setup
is confusing because nbd_co_do_establish_connection is called both with
send_mutex taken and without it.  Before the patch it uses free_sema which
(at least in theory...) is protected by send_mutex, after the patch it
does not anymore.

Signed-off-by: Paolo Bonzini 
---
 block/coroutines.h |  4 +--
 block/nbd.c| 66 ++
 2 files changed, 33 insertions(+), 37 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index b293e943c8..8f6e438ef3 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -59,7 +59,7 @@ int coroutine_fn bdrv_co_writev_vmstate(BlockDriverState *bs,
 QEMUIOVector *qiov, int64_t pos);
 
 int coroutine_fn
-nbd_co_do_establish_connection(BlockDriverState *bs, Error **errp);
+nbd_co_do_establish_connection(BlockDriverState *bs, bool blocking, Error 
**errp);
 
 
 int coroutine_fn
@@ -109,7 +109,7 @@ bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState **file,
int *depth);
 int generated_co_wrapper
-nbd_do_establish_connection(BlockDriverState *bs, Error **errp);
+nbd_do_establish_connection(BlockDriverState *bs, bool blocking, Error **errp);
 
 int generated_co_wrapper
 blk_do_preadv(BlockBackend *blk, int64_t offset, int64_t bytes,
diff --git a/block/nbd.c b/block/nbd.c
index 02db52a230..0ff41cb914 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -188,9 +188,6 @@ static void reconnect_delay_timer_cb(void *opaque)
 if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
 s->state = NBD_CLIENT_CONNECTING_NOWAIT;
 nbd_co_establish_connection_cancel(s->conn);
-while (qemu_co_enter_next(&s->free_sema, NULL)) {
-/* Resume all queued requests */
-}
 }
 
 reconnect_delay_timer_del(s);
@@ -311,11 +308,10 @@ static int nbd_handle_updated_info(BlockDriverState *bs, 
Error **errp)
 }
 
 int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
-Error **errp)
+bool blocking, Error **errp)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 int ret;
-bool blocking = nbd_client_connecting_wait(s);
 IO_CODE();
 
 assert(!s->ioc);
@@ -351,7 +347,6 @@ int coroutine_fn 
nbd_co_do_establish_connection(BlockDriverState *bs,
 
 /* successfully connected */
 s->state = NBD_CLIENT_CONNECTED;
-qemu_co_queue_restart_all(&s->free_sema);
 
 return 0;
 }
@@ -359,25 +354,25 @@ int coroutine_fn 
nbd_co_do_establish_connection(BlockDriverState *bs,
 /* called under s->send_mutex */
 static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 {
-assert(nbd_client_connecting(s));
-assert(s->in_flight == 0);
-
-if (nbd_client_connecting_wait(s) && s->reconnect_delay &&
-!s->reconnect_delay_timer)
-{
-/*
- * It's first reconnect attempt after switching to
- * NBD_CLIENT_CONNECTING_WAIT
- */
-reconnect_delay_timer_init(s,
-qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
-s->reconnect_delay * NANOSECONDS_PER_SECOND);
-}
+bool blocking = nbd_client_connecting_wait(s);
 
 /*
  * Now we are sure that nobody is accessing the channel, and no one will
  * try until we set the state to CONNECTED.
  */
+assert(nbd_client_connecting(s));
+assert(s->in_flight == 1);
+
+if (blocking && !s->reconnect_delay_timer) {
+/*
+ * It's first reconnect attempt after switching to
+ * NBD_CLIENT_CONNECTING_WAIT
+ */
+g_assert(s->reconnect_delay);
+reconnect_delay_timer_init(s,
+qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
+s->reconnect_delay * NANOSECONDS_PER_SECOND);
+}
 
 /* Finalize previous connection if any */
 if (s->ioc) {
@@ -388,7 +383,9 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState 
*s)
 s->ioc = NULL;
 }
 
-nbd_co_do_establish_connection(s->bs, NULL);
+qemu_co_mutex_unlock(&s->send_mutex);
 nbd_co_do_establish_connection(s->bs, blocking, NULL);
+qemu_co_mutex_lock(&s->send_mutex);
 
 /*
  * The reconnect attempt is done (maybe successfully, maybe not), so
@@ -474,21 +471,21 @@ static int coroutine_fn 
nbd_co_send_request(BlockDriverState *bs,
 qemu_co_mutex_lock(&s->send_mutex);
 
 while 

[PATCH for-7.1 6/8] nbd: move s->state under requests_lock

2022-04-12 Thread Paolo Bonzini
Remove the confusing, and most likely wrong, atomics.  The only function
that used to be somewhat in a hot path was nbd_client_connected(),
but it is not anymore after the previous patches.

The function nbd_client_connecting_wait() was used mostly to check if
a request had to be reissued (outside requests_lock), but also
under requests_lock in nbd_reconnect_attempt().  The two uses have to
be separated; for the former we rename it to nbd_client_will_reconnect()
and make it take s->requests_lock; for the latter the access can simply
be inlined.  The new name is clearer, and ensures that a missing
conversion is caught by the compiler.

Signed-off-by: Paolo Bonzini 
---
 block/nbd.c | 88 +
 1 file changed, 48 insertions(+), 40 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index c908ea6ae3..6d80bd59e2 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -35,7 +35,6 @@
 #include "qemu/option.h"
 #include "qemu/cutils.h"
 #include "qemu/main-loop.h"
-#include "qemu/atomic.h"
 
 #include "qapi/qapi-visit-sockets.h"
 #include "qapi/qmp/qstring.h"
@@ -73,10 +72,11 @@ typedef struct BDRVNBDState {
 NBDExportInfo info;
 
 /*
- * Protects free_sema, in_flight, requests[].coroutine,
+ * Protects state, free_sema, in_flight, requests[].coroutine,
  * reconnect_delay_timer.
  */
 QemuMutex requests_lock;
+NBDClientState state;
 CoQueue free_sema;
 int in_flight;
 NBDClientRequest requests[MAX_NBD_REQUESTS];
@@ -84,7 +84,6 @@ typedef struct BDRVNBDState {
 
 CoMutex send_mutex;
 CoMutex receive_mutex;
-NBDClientState state;
 
 QEMUTimer *open_timer;
 
@@ -133,11 +132,6 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
 s->x_dirty_bitmap = NULL;
 }
 
-static bool nbd_client_connected(BDRVNBDState *s)
-{
-return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED;
-}
-
 static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
 {
 if (req->receiving) {
@@ -160,14 +154,15 @@ static void coroutine_fn 
nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
 }
 }
 
-static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
+/* Called with s->requests_lock held.  */
+static void coroutine_fn nbd_channel_error_locked(BDRVNBDState *s, int ret)
 {
-if (nbd_client_connected(s)) {
+if (s->state == NBD_CLIENT_CONNECTED) {
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 }
 
 if (ret == -EIO) {
-if (nbd_client_connected(s)) {
+if (s->state == NBD_CLIENT_CONNECTED) {
 s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT :
 NBD_CLIENT_CONNECTING_NOWAIT;
 }
@@ -178,6 +173,12 @@ static void coroutine_fn nbd_channel_error(BDRVNBDState 
*s, int ret)
 nbd_recv_coroutines_wake(s, true);
 }
 
+static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
+{
+QEMU_LOCK_GUARD(&s->requests_lock);
+nbd_channel_error_locked(s, ret);
+}
+
 static void reconnect_delay_timer_del(BDRVNBDState *s)
 {
 if (s->reconnect_delay_timer) {
@@ -190,20 +191,18 @@ static void reconnect_delay_timer_cb(void *opaque)
 {
 BDRVNBDState *s = opaque;
 
-if (qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT) {
-s->state = NBD_CLIENT_CONNECTING_NOWAIT;
-nbd_co_establish_connection_cancel(s->conn);
-}
-
 reconnect_delay_timer_del(s);
+WITH_QEMU_LOCK_GUARD(&s->requests_lock) {
+if (s->state != NBD_CLIENT_CONNECTING_WAIT) {
+return;
+}
+s->state = NBD_CLIENT_CONNECTING_NOWAIT;
+}
+nbd_co_establish_connection_cancel(s->conn);
 }
 
 static void reconnect_delay_timer_init(BDRVNBDState *s, uint64_t 
expire_time_ns)
 {
-if (qatomic_load_acquire(&s->state) != NBD_CLIENT_CONNECTING_WAIT) {
-return;
-}
-
 assert(!s->reconnect_delay_timer);
 s->reconnect_delay_timer = aio_timer_new(bdrv_get_aio_context(s->bs),
  QEMU_CLOCK_REALTIME,
@@ -226,7 +225,9 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 s->ioc = NULL;
 }
 
-s->state = NBD_CLIENT_QUIT;
+WITH_QEMU_LOCK_GUARD(&s->requests_lock) {
+s->state = NBD_CLIENT_QUIT;
+}
 }
 
 static void open_timer_del(BDRVNBDState *s)
@@ -255,16 +256,13 @@ static void open_timer_init(BDRVNBDState *s, uint64_t 
expire_time_ns)
 timer_mod(s->open_timer, expire_time_ns);
 }
 
-static bool nbd_client_connecting(BDRVNBDState *s)
+static bool nbd_client_will_reconnect(BDRVNBDState *s)
 {
-NBDClientState state = qatomic_load_acquire(&s->state);
-return state == NBD_CLIENT_CONNECTING_WAIT ||
-state == NBD_CLIENT_CONNECTING_NOWAIT;
-}
-
-static bool nbd_client_connecting_wait(BDRVNBDState *s)
-{
-return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTING_WAIT;
+/*
+ * Called only after a socket error, so this is not performance 

[PATCH for-7.1 7/8] nbd: take receive_mutex when reading requests[].receiving

2022-04-12 Thread Paolo Bonzini
requests[].receiving is set by nbd_receive_replies() under the receive_mutex;
Read it under the same mutex as well.  Waking up receivers on errors happens
after each reply finishes processing, in nbd_co_receive_one_chunk().
If there is no currently-active reply, there are two cases:

* either there is no active request at all, in which case no
element of request[] can have .receiving = true

* or nbd_receive_replies() must be running and waiting for receive_mutex;
in that case it will get back to nbd_co_receive_one_chunk() because
the socket has been shutdown, and all waiting coroutines will wake up
in turn.
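The wake-one invariant being protected here can be modeled in plain C (a simplified, hypothetical shape of requests[].receiving, not the real BDRVNBDState): with receive_mutex held, at most one waiter is woken per finished reply.

```c
#include <stdbool.h>

#define MAX_NBD_REQUESTS 16

static struct { bool receiving; } requests[MAX_NBD_REQUESTS];

/* Model of nbd_recv_coroutines_wake() after this patch: wake the first
 * coroutine parked in nbd_receive_replies() and stop; returns the index
 * that was woken, or -1 if nobody was waiting. */
static int recv_wake_one(void)
{
    for (int i = 0; i < MAX_NBD_REQUESTS; i++) {
        if (requests[i].receiving) {
            requests[i].receiving = false;   /* "wake" exactly this one */
            return i;
        }
    }
    return -1;
}
```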

Signed-off-by: Paolo Bonzini 
---
 block/nbd.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6d80bd59e2..8954243f50 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -132,6 +132,7 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
 s->x_dirty_bitmap = NULL;
 }
 
+/* Called with s->receive_mutex taken.  */
 static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
 {
 if (req->receiving) {
@@ -143,12 +144,13 @@ static bool coroutine_fn 
nbd_recv_coroutine_wake_one(NBDClientRequest *req)
 return false;
 }
 
-static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
+static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s)
 {
 int i;
 
+QEMU_LOCK_GUARD(&s->receive_mutex);
 for (i = 0; i < MAX_NBD_REQUESTS; i++) {
-if (nbd_recv_coroutine_wake_one(&s->requests[i]) && !all) {
+if (nbd_recv_coroutine_wake_one(&s->requests[i])) {
 return;
 }
 }
@@ -169,8 +171,6 @@ static void coroutine_fn 
nbd_channel_error_locked(BDRVNBDState *s, int ret)
 } else {
 s->state = NBD_CLIENT_QUIT;
 }
-
-nbd_recv_coroutines_wake(s, true);
 }
 
 static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
@@ -433,11 +433,10 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t handle)
 
 qemu_coroutine_yield();
 /*
- * We may be woken for 3 reasons:
+ * We may be woken for 2 reasons:
  * 1. From this function, executing in parallel coroutine, when our
  *handle is received.
- * 2. From nbd_channel_error(), when connection is lost.
- * 3. From nbd_co_receive_one_chunk(), when previous request is
+ * 2. From nbd_co_receive_one_chunk(), when previous request is
  *finished and s->reply.handle set to 0.
  * Anyway, it's OK to lock the mutex and go to the next iteration.
  */
@@ -931,7 +930,7 @@ static coroutine_fn int nbd_co_receive_one_chunk(
 }
 s->reply.handle = 0;
 
-nbd_recv_coroutines_wake(s, false);
+nbd_recv_coroutines_wake(s);
 
 return ret;
 }
-- 
2.35.1





[PATCH for-7.1 2/8] nbd: mark more coroutine_fns

2022-04-12 Thread Paolo Bonzini
Several coroutine functions in block/nbd.c are not marked as such.  This
patch adds a few more markers; it is not exhaustive, but it focuses
especially on:

- places that wake other coroutines, because aio_co_wake() has very
different semantics inside a coroutine (queuing after yield vs. entering
immediately);

- functions with _co_ in their names, to avoid confusion

Signed-off-by: Paolo Bonzini 
---
 block/nbd.c | 64 ++---
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 6a5e410e5f..81b319318e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -133,7 +133,7 @@ static bool nbd_client_connected(BDRVNBDState *s)
 return qatomic_load_acquire(&s->state) == NBD_CLIENT_CONNECTED;
 }
 
-static bool nbd_recv_coroutine_wake_one(NBDClientRequest *req)
+static bool coroutine_fn nbd_recv_coroutine_wake_one(NBDClientRequest *req)
 {
 if (req->receiving) {
 req->receiving = false;
@@ -144,7 +144,7 @@ static bool nbd_recv_coroutine_wake_one(NBDClientRequest 
*req)
 return false;
 }
 
-static void nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
+static void coroutine_fn nbd_recv_coroutines_wake(BDRVNBDState *s, bool all)
 {
 int i;
 
@@ -155,7 +155,7 @@ static void nbd_recv_coroutines_wake(BDRVNBDState *s, bool 
all)
 }
 }
 
-static void nbd_channel_error(BDRVNBDState *s, int ret)
+static void coroutine_fn nbd_channel_error(BDRVNBDState *s, int ret)
 {
 if (nbd_client_connected(s)) {
 qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
@@ -468,9 +468,9 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t handle)
 }
 }
 
-static int nbd_co_send_request(BlockDriverState *bs,
-   NBDRequest *request,
-   QEMUIOVector *qiov)
+static int coroutine_fn nbd_co_send_request(BlockDriverState *bs,
+NBDRequest *request,
+QEMUIOVector *qiov)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 int rc, i = -1;
@@ -724,9 +724,9 @@ static int nbd_parse_error_payload(NBDStructuredReplyChunk 
*chunk,
 return 0;
 }
 
-static int nbd_co_receive_offset_data_payload(BDRVNBDState *s,
-  uint64_t orig_offset,
-  QEMUIOVector *qiov, Error **errp)
+static int coroutine_fn nbd_co_receive_offset_data_payload(BDRVNBDState *s,
+   uint64_t 
orig_offset,
+   QEMUIOVector *qiov, 
Error **errp)
 {
 QEMUIOVector sub_qiov;
 uint64_t offset;
@@ -1042,8 +1042,8 @@ break_loop:
 return false;
 }
 
-static int nbd_co_receive_return_code(BDRVNBDState *s, uint64_t handle,
-  int *request_ret, Error **errp)
+static int coroutine_fn nbd_co_receive_return_code(BDRVNBDState *s, uint64_t 
handle,
+   int *request_ret, Error 
**errp)
 {
 NBDReplyChunkIter iter;
 
@@ -1056,9 +1056,9 @@ static int nbd_co_receive_return_code(BDRVNBDState *s, 
uint64_t handle,
 return iter.ret;
 }
 
-static int nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t handle,
-uint64_t offset, QEMUIOVector *qiov,
-int *request_ret, Error **errp)
+static int coroutine_fn nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t 
handle,
+ uint64_t offset, 
QEMUIOVector *qiov,
+ int *request_ret, Error 
**errp)
 {
 NBDReplyChunkIter iter;
 NBDReply reply;
@@ -1108,10 +1108,10 @@ static int nbd_co_receive_cmdread_reply(BDRVNBDState 
*s, uint64_t handle,
 return iter.ret;
 }
 
-static int nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
-uint64_t handle, uint64_t length,
-NBDExtent *extent,
-int *request_ret, Error **errp)
+static int coroutine_fn nbd_co_receive_blockstatus_reply(BDRVNBDState *s,
+ uint64_t handle, 
uint64_t length,
+ NBDExtent *extent,
+ int *request_ret, 
Error **errp)
 {
 NBDReplyChunkIter iter;
 NBDReply reply;
@@ -1168,8 +1168,8 @@ static int nbd_co_receive_blockstatus_reply(BDRVNBDState 
*s,
 return iter.ret;
 }
 
-static int nbd_co_request(BlockDriverState *bs, NBDRequest *request,
-  QEMUIOVector *write_qiov)
+static int coroutine_fn nbd_co_request(BlockDriverState *bs, NBDRequest 
*request,
+   

[PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard

2022-04-12 Thread Paolo Bonzini
The .reply_possible field of s->requests is never set to false.  This is
not a big problem as it is only a safeguard to detect protocol errors,
but fix it anyway.

Signed-off-by: Paolo Bonzini 
---
 block/nbd.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 567872ac53..6a5e410e5f 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -454,15 +454,16 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t handle)
 nbd_channel_error(s, -EINVAL);
 return -EINVAL;
 }
-if (s->reply.handle == handle) {
-/* We are done */
-return 0;
-}
 ind2 = HANDLE_TO_INDEX(s, s->reply.handle);
 if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].reply_possible) {
 nbd_channel_error(s, -EINVAL);
 return -EINVAL;
 }
+s->requests[ind2].reply_possible = nbd_reply_is_structured(&s->reply);
+if (s->reply.handle == handle) {
+/* We are done */
+return 0;
+}
 nbd_recv_coroutine_wake_one(&s->requests[ind2]);
 }
 }
-- 
2.35.1





[PATCH for-7.1 0/8] nbd: actually make s->state thread-safe

2022-04-12 Thread Paolo Bonzini
The main point of this series is patch 6, which removes the dubious and
probably wrong use of atomics in block/nbd.c.  This in turn is enabled
mostly by the cleanups in patches 3-5.  Together, they introduce a
QemuMutex that synchronizes the NBD client coroutines, the reconnect_delay
timer and nbd_cancel_in_flight() as well.

The fixes happen to remove an incorrect use of qemu_co_queue_restart_all
and qemu_co_enter_next on the s->free_sema CoQueue, which was not guarded
by s->send_mutex.

The rest is bugfixes, simplifying the code a bit, and extra documentation.

Paolo Bonzini (8):
  nbd: actually implement reply_possible safeguard
  nbd: mark more coroutine_fns
  nbd: remove peppering of nbd_client_connected
  nbd: keep send_mutex/free_sema handling outside
nbd_co_do_establish_connection
  nbd: use a QemuMutex to synchronize reconnection with coroutines
  nbd: move s->state under requests_lock
  nbd: take receive_mutex when reading requests[].receiving
  nbd: document what is protected by the CoMutexes

 block/coroutines.h |   4 +-
 block/nbd.c| 295 +++--
 2 files changed, 154 insertions(+), 145 deletions(-)

-- 
2.35.1




[PATCH for-7.1 3/8] nbd: remove peppering of nbd_client_connected

2022-04-12 Thread Paolo Bonzini
It is unnecessary to check nbd_client_connected() because every time
s->state is moved out of NBD_CLIENT_CONNECTED the socket is shut down
and all coroutines are resumed.

The only case where it was actually needed is when the NBD
server disconnects and there is no reconnect-delay.  In that
case, nbd_receive_replies() does not set s->reply.handle and
nbd_co_do_receive_one_chunk() cannot continue.  For that one case,
check the return value of nbd_receive_replies().

As to the others:

* nbd_receive_replies() can put the current coroutine to sleep if another
reply is ongoing; then it will be woken by nbd_channel_error() called
by the ongoing reply.  Or it can try itself to read a reply header and
fail, thus calling nbd_channel_error() itself.

* nbd_co_send_request() will write the body of the request and fail

* nbd_reply_chunk_iter_receive() will call nbd_co_receive_one_chunk()
and then nbd_co_do_receive_one_chunk(), which will handle the failure as
above; or it will just detect a previous call to nbd_iter_channel_error()
via iter->ret < 0.

Signed-off-by: Paolo Bonzini 
---
 block/nbd.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 81b319318e..02db52a230 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -410,10 +410,6 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t handle)
 return 0;
 }
 
-if (!nbd_client_connected(s)) {
-return -EIO;
-}
-
 if (s->reply.handle != 0) {
 /*
  * Some other request is being handled now. It should already be
@@ -515,7 +511,7 @@ static int coroutine_fn 
nbd_co_send_request(BlockDriverState *bs,
 if (qiov) {
 qio_channel_set_cork(s->ioc, true);
 rc = nbd_send_request(s->ioc, request);
-if (nbd_client_connected(s) && rc >= 0) {
+if (rc >= 0) {
 if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
NULL) < 0) {
 rc = -EIO;
@@ -832,8 +828,8 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 }
 *request_ret = 0;
 
-nbd_receive_replies(s, handle);
-if (!nbd_client_connected(s)) {
+ret = nbd_receive_replies(s, handle);
+if (ret < 0) {
 error_setg(errp, "Connection closed");
 return -EIO;
 }
@@ -985,11 +981,6 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
 NBDReply local_reply;
 NBDStructuredReplyChunk *chunk;
 Error *local_err = NULL;
-if (!nbd_client_connected(s)) {
-error_setg(&local_err, "Connection closed");
-nbd_iter_channel_error(iter, -EIO, &local_err);
-goto break_loop;
-}
 
 if (iter->done) {
 /* Previous iteration was last. */
@@ -1010,7 +1001,7 @@ static bool nbd_reply_chunk_iter_receive(BDRVNBDState *s,
 }
 
 /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
-if (nbd_reply_is_simple(reply) || !nbd_client_connected(s)) {
+if (nbd_reply_is_simple(reply) || iter->ret < 0) {
 goto break_loop;
 }
 
-- 
2.35.1





Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK

2022-04-12 Thread Kirill A. Shutemov
On Tue, Apr 12, 2022 at 09:39:25PM +0800, Chao Peng wrote:
> On Mon, Apr 11, 2022 at 06:32:33PM +0300, Kirill A. Shutemov wrote:
> > On Thu, Apr 07, 2022 at 04:05:36PM +, Sean Christopherson wrote:
> > > Hmm, shmem_writepage() already handles SHM_F_INACCESSIBLE by rejecting 
> > > the swap, so
> > > maybe it's just the page migration path that needs to be updated?
> > 
> > My early version prevented migration with -ENOTSUPP for
> > address_space_operations::migratepage().
> > 
> > What's wrong with that approach?
> 
> I previously thought migratepage will not be called since we already
> marked the pages as UNMOVABLE, sounds not correct?

Do you mean missing __GFP_MOVABLE? I can be wrong, but I don't see that it
directly affects whether the page is migratable. It is a hint to the page
allocator to group unmovable pages into separate page blocks and improve the
availability of higher-order pages this way. The page allocator tries to
allocate unmovable pages from page blocks that already contain unmovable pages.

-- 
 Kirill A. Shutemov



Re: [PULL 03/15] multifd: Make no compression operations into its own structure

2022-04-12 Thread Peter Maydell
On Fri, 28 Feb 2020 at 09:26, Juan Quintela  wrote:
>
> It will be used later.
>
> Signed-off-by: Juan Quintela 
> Reviewed-by: Dr. David Alan Gilbert 
>

Hi; Coverity thinks there might be a buffer overrun here.
It's probably wrong, but it's not completely obvious why
it can't happen, so an assert somewhere would help...
(This is CID 1487239.)

> +MultiFDCompression migrate_multifd_compression(void)
> +{
> +MigrationState *s;
> +
> +s = migrate_get_current();
> +
> +return s->parameters.multifd_compression;

This function returns an enum of type MultiFDCompression,
whose (autogenerated from QAPI) definition is:

typedef enum MultiFDCompression {
MULTIFD_COMPRESSION_NONE,
MULTIFD_COMPRESSION_ZLIB,
#if defined(CONFIG_ZSTD)
MULTIFD_COMPRESSION_ZSTD,
#endif /* defined(CONFIG_ZSTD) */
MULTIFD_COMPRESSION__MAX,
} MultiFDCompression;

> @@ -604,6 +745,7 @@ int multifd_save_setup(Error **errp)
>  multifd_send_state->pages = multifd_pages_init(page_count);
>  qemu_sem_init(&multifd_send_state->channels_ready, 0);
>  atomic_set(&multifd_send_state->exiting, 0);
> +multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];

Here we take the result of the function and use it as an
array index into multifd_ops, whose definition is
 static MultiFDMethods *multifd_ops[MULTIFD_COMPRESSION__MAX] = { ... }

Coverity doesn't see any reason why the return value from
migrate_multifd_compression() can't be MULTIFD_COMPRESSION__MAX,
so it complains that if it is then we are going to index off the
end of the array.

An assert in migrate_multifd_compression() that the value being
returned is within the expected range would probably placate it.
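A sketch of that assert (the MigrationState access is stubbed out for illustration; only the assert line is the actual suggestion):

```c
#include <assert.h>

typedef enum MultiFDCompression {
    MULTIFD_COMPRESSION_NONE,
    MULTIFD_COMPRESSION_ZLIB,
    MULTIFD_COMPRESSION__MAX,
} MultiFDCompression;

/* Stand-in for MigrationState / migrate_get_current(). */
static struct { MultiFDCompression multifd_compression; } current_state = {
    MULTIFD_COMPRESSION_ZLIB,
};

MultiFDCompression migrate_multifd_compression(void)
{
    /* Documents (to Coverity and readers alike) that __MAX never escapes,
     * so multifd_ops[migrate_multifd_compression()] stays in bounds. */
    assert(current_state.multifd_compression < MULTIFD_COMPRESSION__MAX);
    return current_state.multifd_compression;
}
```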

Alternatively, if the qapi type codegen didn't put the __MAX
value as a possible value of the enum type then Coverity
and probably also the compiler wouldn't consider it to be
a possible value of this kind of variable. But that might
have other awkward side-effects.

thanks
-- PMM



RE: [PATCH v8 10/12] target/hexagon: import parser for idef-parser

2022-04-12 Thread Taylor Simpson


> From: Anton Johansson  
> Sent: Tuesday, April 12, 2022 10:11 AM
> To: Taylor Simpson ; qemu-devel@nongnu.org
> Cc: a...@rev.ng; Brian Cain ; Michael Lambert 
> ; bab...@rev.ng; ni...@rev.ng; 
> richard.hender...@linaro.org
> Subject: Re: [PATCH v8 10/12] target/hexagon: import parser for idef-parser
> 
> Very nice catch, this is the bug that plagued us a few weeks ago when 
> rebasing,
> it has since been fixed. Actually the `gen_set_overflow` fucntion has been 
> removed
> completely as it was only called when we handled `asl/asr_r_r_sat`.
> Current way we handle overflow:
>
> Overflow is now only set by saturates, this assumption holds for the 
> instructions
> we parse in phase 1. In the parser, all saturates call `gen_rvalue_sat` which 
> emits
> a call to `gen_set_usr_field_if` in `genptr.c` to set USR_OVF only if the 
> value is
> non-zero. It does this via OR'ing with the previous value.
>
> See 
> https://gitlab.com/AntonJohansson/qemu/-/blob/feature/idef-parser/target/hexagon/idef-parser/parser-helpers.c#L2109
>  for `gen_rvalue_sat`
> and 
> https://gitlab.com/AntonJohansson/qemu/-/blob/feature/idef-parser/target/hexagon/genptr.c#L669
>  for `gen_set_usr_field_if`

Your implementation of gen_set_usr_field_if is not correct.  Don't extract the 
bits from the value based on the reg_field_info.  The easiest thing to do is 
compare val with zero and jump over a call to gen_set_usr_field.  If you are 
determined to do an "or", you have to assert that reg_field_info[field].width 
== 1.  Then, you can extract 1 bit from val starting at offset zero, shift 
the result left by reg_field_info[field].offset, and then "or" with USR.

Thanks,
Taylor



Re: [PATCH for-7.1 0/8] nbd: actually make s->state thread-safe

2022-04-12 Thread Vladimir Sementsov-Ogievskiy

12.04.2022 20:32, Paolo Bonzini wrote:

The main point of this series is patch 6, which removes the dubious and
probably wrong use of atomics in block/nbd.c.  This in turn is enabled
mostly by the cleanups in patches 3-5.  Together, they introduce a
QemuMutex that synchronizes the NBD client coroutines, the reconnect_delay
timer and nbd_cancel_in_flight() as well.

The fixes happen to remove an incorrect use of qemu_co_queue_restart_all
and qemu_co_enter_next on the s->free_sema CoQueue, which was not guarded
by s->send_mutex.

The rest is bugfixes, simplifying the code a bit, and extra documentation.

Paolo Bonzini (8):
   nbd: actually implement reply_possible safeguard
   nbd: mark more coroutine_fns
   nbd: remove peppering of nbd_client_connected
   nbd: keep send_mutex/free_sema handling outside
 nbd_co_do_establish_connection
   nbd: use a QemuMutex to synchronize reconnection with coroutines
   nbd: move s->state under requests_lock
   nbd: take receive_mutex when reading requests[].receiving
   nbd: document what is protected by the CoMutexes

  block/coroutines.h |   4 +-
  block/nbd.c| 303 +++--
  2 files changed, 157 insertions(+), 150 deletions(-)



Hmm, no patches came to me except the cover letter. Nor here: 
https://patchew.org/QEMU/20220412173216.308065-1-pbonz...@redhat.com/

--
Best regards,
Vladimir



[PATCH for-7.1 0/8] nbd: actually make s->state thread-safe

2022-04-12 Thread Paolo Bonzini
The main point of this series is patch 6, which removes the dubious and
probably wrong use of atomics in block/nbd.c.  This in turn is enabled
mostly by the cleanups in patches 3-5.  Together, they introduce a
QemuMutex that synchronizes the NBD client coroutines, the reconnect_delay
timer and nbd_cancel_in_flight() as well.

The fixes happen to remove an incorrect use of qemu_co_queue_restart_all
and qemu_co_enter_next on the s->free_sema CoQueue, which was not guarded
by s->send_mutex.

The rest is bugfixes, simplifying the code a bit, and extra documentation.

Paolo Bonzini (8):
  nbd: actually implement reply_possible safeguard
  nbd: mark more coroutine_fns
  nbd: remove peppering of nbd_client_connected
  nbd: keep send_mutex/free_sema handling outside
nbd_co_do_establish_connection
  nbd: use a QemuMutex to synchronize reconnection with coroutines
  nbd: move s->state under requests_lock
  nbd: take receive_mutex when reading requests[].receiving
  nbd: document what is protected by the CoMutexes

 block/coroutines.h |   4 +-
 block/nbd.c| 303 +++--
 2 files changed, 157 insertions(+), 150 deletions(-)

-- 
2.35.1




[RFC PATCH 1/3] disas: Remove old libopcode s390 disassembler

2022-04-12 Thread Thomas Huth
Capstone should be superior to the old libopcode disassembler, so
we can drop the old file nowadays.

Signed-off-by: Thomas Huth 
---
 include/disas/dis-asm.h |1 -
 disas/s390.c| 1892 ---
 target/s390x/cpu.c  |1 -
 MAINTAINERS |2 -
 disas/meson.build   |1 -
 5 files changed, 1897 deletions(-)
 delete mode 100644 disas/s390.c

diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index fadf6a65ef..aebd23dc20 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -450,7 +450,6 @@ int print_insn_d10v (bfd_vma, disassemble_info*);
 int print_insn_v850 (bfd_vma, disassemble_info*);
 int print_insn_tic30(bfd_vma, disassemble_info*);
 int print_insn_ppc  (bfd_vma, disassemble_info*);
-int print_insn_s390 (bfd_vma, disassemble_info*);
 int print_insn_crisv32  (bfd_vma, disassemble_info*);
 int print_insn_crisv10  (bfd_vma, disassemble_info*);
 int print_insn_microblaze   (bfd_vma, disassemble_info*);
diff --git a/disas/s390.c b/disas/s390.c
deleted file mode 100644
index a9ec8fa593..00
--- a/disas/s390.c
+++ /dev/null
@@ -1,1892 +0,0 @@
-/* opcodes/s390-dis.c revision 1.12 */
-/* s390-dis.c -- Disassemble S390 instructions
-   Copyright 2000, 2001, 2002, 2003, 2005 Free Software Foundation, Inc.
-   Contributed by Martin Schwidefsky (schwidef...@de.ibm.com).
-
-   This file is part of GDB, GAS and the GNU binutils.
-
-   This program is free software; you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 2 of the License, or
-   (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
-   02110-1301, USA.  */
-
-#include "qemu/osdep.h"
-#include "disas/dis-asm.h"
-
-/* include/opcode/s390.h revision 1.9 */
-/* s390.h -- Header file for S390 opcode table
-   Copyright 2000, 2001, 2003 Free Software Foundation, Inc.
-   Contributed by Martin Schwidefsky (schwidef...@de.ibm.com).
-
-   This file is part of BFD, the Binary File Descriptor library.
-
-   This program is free software; you can redistribute it and/or modify
-   it under the terms of the GNU General Public License as published by
-   the Free Software Foundation; either version 2 of the License, or
-   (at your option) any later version.
-
-   This program is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-   GNU General Public License for more details.
-
-   You should have received a copy of the GNU General Public License
-   along with this program; if not, write to the Free Software
-   Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA
-   02110-1301, USA.  */
-
-#ifndef S390_H
-#define S390_H
-
-/* List of instruction sets variations. */
-
-enum s390_opcode_mode_val
-  {
-S390_OPCODE_ESA = 0,
-S390_OPCODE_ZARCH
-  };
-
-enum s390_opcode_cpu_val
-  {
-S390_OPCODE_G5 = 0,
-S390_OPCODE_G6,
-S390_OPCODE_Z900,
-S390_OPCODE_Z990,
-S390_OPCODE_Z9_109,
-S390_OPCODE_Z9_EC,
-S390_OPCODE_Z10
-  };
-
-/* The opcode table is an array of struct s390_opcode.  */
-
-struct s390_opcode
-  {
-/* The opcode name.  */
-const char * name;
-
-/* The opcode itself.  Those bits which will be filled in with
-   operands are zeroes.  */
-unsigned char opcode[6];
-
-/* The opcode mask.  This is used by the disassembler.  This is a
-   mask containing ones indicating those bits which must match the
-   opcode field, and zeroes indicating those bits which need not
-   match (and are presumably filled in by operands).  */
-unsigned char mask[6];
-
-/* The opcode length in bytes. */
-int oplen;
-
-/* An array of operand codes.  Each code is an index into the
-   operand table.  They appear in the order which the operands must
-   appear in assembly code, and are terminated by a zero.  */
-unsigned char operands[6];
-
-/* Bitmask of execution modes this opcode is available for.  */
-unsigned int modes;
-
-/* First cpu this opcode is available for.  */
-enum s390_opcode_cpu_val min_cpu;
-  };
-
-/* The table itself is sorted by major opcode number, and is otherwise
-   in the order in which the disassembler should consider
-   instructions.  */
-/* QEMU: Mark these static.  */
-static const struct 

[RFC PATCH 0/3] Remove some of the old libopcode based disassemblers

2022-04-12 Thread Thomas Huth
Many of the disassemblers in the disas folder are based on old
versions from the GNU tools (libopcode, GDB, ...) that were still
licensed under the GPL v2. The GNU tools switched to GPL v3 at one
point in time, so QEMU is stuck with the old versions, i.e. these
files did not see much updates for new processors anymore. But
for most architectures, we're preferring the Capstone disassembler
now anyway, so the old libopcode disassemblers are also hardly
used anymore.

I'm not 100% sure (thus this is marked as RFC), but I think we could
simply drop the old disassemblers nowadays, and hardly anybody would
miss them, since we now always embed capstone as a submodule anyway.
Or is there still an advantage in keeping these old files around?

This RFC series tackles s390, arm (32-bit) and i386 ... I wanted
to get some feedback first, but if we agree that these can be removed,
the sparc, mips and ppc disassemblers likely can be removed, too.
(I think we should keep m68k.c since Capstone does not have support
for Coldfire CPUs yet).

Thomas Huth (3):
  disas: Remove old libopcode s390 disassembler
  disas: Remove old libopcode arm disassembler
  disas: Remove old libopcode i386 disassembler

 include/disas/dis-asm.h |3 -
 disas.c |3 -
 disas/arm.c | 4012 ---
 disas/i386.c| 6771 ---
 disas/s390.c| 1892 ---
 target/arm/cpu.c|8 -
 target/i386/cpu.c   |1 -
 target/s390x/cpu.c  |1 -
 MAINTAINERS |6 -
 disas/meson.build   |3 -
 10 files changed, 12700 deletions(-)
 delete mode 100644 disas/arm.c
 delete mode 100644 disas/i386.c
 delete mode 100644 disas/s390.c

-- 
2.27.0




[PATCH for-7.1] hw/block/fdc-sysbus: Always mark sysbus floppy controllers as not having DMA

2022-04-12 Thread Peter Maydell
The sysbus floppy controllers (devices sysbus-fdc and sun-fdtwo)
don't support DMA.  The core floppy controller code expects this to
be indicated by setting FDCtrl::dma_chann to -1.  This used to be
done in the device instance_init functions sysbus_fdc_initfn() and
sun4m_fdc_initfn(), but in commit 1430759ec3e we refactored this code
and accidentally lost the setting of dma_chann.

For sysbus-fdc this has no ill effects because we were redundantly
also setting dma_chann in fdctrl_init_sysbus(), but for sun-fdtwo
this means that guests which try to enable DMA on the floppy
controller will cause QEMU to crash because FDCtrl::dma is NULL.

Set dma_chann to -1 in the common instance init, and remove the
redundant code in fdctrl_init_sysbus() that is also setting it.

There is a six-year-old FIXME comment in the jazz board code to the
effect that in theory it should support doing DMA via a custom DMA
controller.  If anybody ever chooses to fix that they can do it by
adding support for setting both FDCtrl::dma_chann and FDCtrl::dma.
(A QOM link property 'dma-controller' on the sysbus device which can
be set to an instance of IsaDmaClass is probably the way to go.)

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/958
Signed-off-by: Peter Maydell 
---
 include/hw/block/fdc.h |  3 +--
 hw/block/fdc-sysbus.c  | 14 +++---
 hw/mips/jazz.c |  2 +-
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index 1ecca7cac7f..35248c08379 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -10,8 +10,7 @@
 #define TYPE_ISA_FDC "isa-fdc"
 
 void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds);
-void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
-hwaddr mmio_base, DriveInfo **fds);
+void fdctrl_init_sysbus(qemu_irq irq, hwaddr mmio_base, DriveInfo **fds);
 void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
DriveInfo **fds, qemu_irq *fdc_tc);
 
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 57fc8773f12..6c22c3510da 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -94,8 +94,7 @@ static void fdctrl_handle_tc(void *opaque, int irq, int level)
 trace_fdctrl_tc_pulse(level);
 }
 
-void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
-hwaddr mmio_base, DriveInfo **fds)
+void fdctrl_init_sysbus(qemu_irq irq, hwaddr mmio_base, DriveInfo **fds)
 {
 FDCtrl *fdctrl;
 DeviceState *dev;
@@ -105,7 +104,6 @@ void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 dev = qdev_new("sysbus-fdc");
 sys = SYSBUS_FDC(dev);
fdctrl = &sys->state;
-fdctrl->dma_chann = dma_chann; /* FIXME */
 sbd = SYS_BUS_DEVICE(dev);
sysbus_realize_and_unref(sbd, &error_fatal);
 sysbus_connect_irq(sbd, 0, irq);
@@ -138,6 +136,16 @@ static void sysbus_fdc_common_instance_init(Object *obj)
 FDCtrlSysBus *sys = SYSBUS_FDC(obj);
FDCtrl *fdctrl = &sys->state;
 
+/*
+ * DMA is not currently supported for sysbus floppy controllers.
+ * If we wanted to add support then probably the best approach is
+ * to have a QOM link property 'dma-controller' which the board
+ * code can set to an instance of IsaDmaClass, and an integer
+ * property 'dma-channel', so that we can set fdctrl->dma and
+ * fdctrl->dma_chann accordingly.
+ */
+fdctrl->dma_chann = -1;
+
 qdev_set_legacy_instance_id(dev, 0 /* io */, 2); /* FIXME */
 
memory_region_init_io(&sys->iomem, obj,
diff --git a/hw/mips/jazz.c b/hw/mips/jazz.c
index 44f0d48bfd7..14eaa517435 100644
--- a/hw/mips/jazz.c
+++ b/hw/mips/jazz.c
@@ -354,7 +354,7 @@ static void mips_jazz_init(MachineState *machine,
 fds[n] = drive_get(IF_FLOPPY, 0, n);
 }
 /* FIXME: we should enable DMA with a custom IsaDma device */
-fdctrl_init_sysbus(qdev_get_gpio_in(rc4030, 1), -1, 0x80003000, fds);
+fdctrl_init_sysbus(qdev_get_gpio_in(rc4030, 1), 0x80003000, fds);
 
 /* Real time clock */
 mc146818_rtc_init(isa_bus, 1980, NULL);
-- 
2.25.1




Re: [PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames

2022-04-12 Thread Matthew Rosato

On 4/12/22 11:50 AM, Pierre Morel wrote:



On 4/4/22 20:17, Matthew Rosato wrote:

The v1 uapi is deprecated and will be replaced by v2 at some point;
this patch just tolerates the renaming of uapi fields to reflect
v1 / deprecated status.

Signed-off-by: Matthew Rosato 
---
  hw/vfio/common.c    |  2 +-
  hw/vfio/migration.c | 19 +++
  2 files changed, 12 insertions(+), 9 deletions(-)



I do not understand why you need this patch in this series.
Shouldn't it be separate?


This patch is included because of the patch 1 kernel header sync, which 
pulls in uapi headers from kernel version 5.18-rc1 + my unmerged kernel 
uapi changes.


This patch is unnecessary without a header sync (and in fact would break 
QEMU compile), and is unrelated to the rest of the series -- but QEMU 
will not compile without it once you update linux uapi headers to 
5.18-rc1 (or greater) due to the v1 uapi for vfio migration being 
deprecated [1].  This means that ANY series that does a linux header 
sync starting from here on will need something like this patch to go 
along with the header sync (or a series that replaces v1 usage with v2?).


If this patch looks good it could be included whenever a header sync is 
next needed, doesn't necessarily have to be with this series.


[1] https://www.spinics.net/lists/kernel/msg4288200.html





diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f5..7b1e12fb69 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -380,7 +380,7 @@ static bool 
vfio_devices_all_running_and_saving(VFIOContainer *container)

  return false;
  }
-    if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&
+    if ((migration->device_state & 
VFIO_DEVICE_STATE_V1_SAVING) &&
  (migration->device_state & 
VFIO_DEVICE_STATE_RUNNING)) {

  continue;
  } else {
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ff6b45de6b..e109cee551 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -432,7 +432,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
  }
  ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-   VFIO_DEVICE_STATE_SAVING);
+   VFIO_DEVICE_STATE_V1_SAVING);
  if (ret) {
  error_report("%s: Failed to set state SAVING", vbasedev->name);
  return ret;
@@ -532,7 +532,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, 
void *opaque)

  int ret;
  ret = vfio_migration_set_state(vbasedev, 
~VFIO_DEVICE_STATE_RUNNING,

-   VFIO_DEVICE_STATE_SAVING);
+   VFIO_DEVICE_STATE_V1_SAVING);
  if (ret) {
  error_report("%s: Failed to set state STOP and SAVING",
   vbasedev->name);
@@ -569,7 +569,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, 
void *opaque)

  return ret;
  }
-    ret = vfio_migration_set_state(vbasedev, 
~VFIO_DEVICE_STATE_SAVING, 0);
+    ret = vfio_migration_set_state(vbasedev, 
~VFIO_DEVICE_STATE_V1_SAVING, 0);

  if (ret) {
  error_report("%s: Failed to set state STOPPED", 
vbasedev->name);

  return ret;
@@ -730,7 +730,7 @@ static void vfio_vmstate_change(void *opaque, bool 
running, RunState state)

   * start saving data.
   */
  if (state == RUN_STATE_SAVE_VM) {
-    value = VFIO_DEVICE_STATE_SAVING;
+    value = VFIO_DEVICE_STATE_V1_SAVING;
  } else {
  value = 0;
  }
@@ -768,8 +768,9 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)

  case MIGRATION_STATUS_FAILED:
  bytes_transferred = 0;
  ret = vfio_migration_set_state(vbasedev,
-  ~(VFIO_DEVICE_STATE_SAVING | 
VFIO_DEVICE_STATE_RESUMING),

-  VFIO_DEVICE_STATE_RUNNING);
+   ~(VFIO_DEVICE_STATE_V1_SAVING |
+ VFIO_DEVICE_STATE_RESUMING),
+   VFIO_DEVICE_STATE_RUNNING);
  if (ret) {
  error_report("%s: Failed to set state RUNNING", 
vbasedev->name);

  }
@@ -864,8 +865,10 @@ int vfio_migration_probe(VFIODevice *vbasedev, 
Error **errp)

  goto add_blocker;
  }
-    ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
-   VFIO_REGION_SUBTYPE_MIGRATION, &info);

+    ret = vfio_get_dev_region_info(vbasedev,
+   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+   &info);
  if (ret) {
  goto add_blocker;
  }








Re: [RFC 1/3] serial: Enable MSI capablity and option

2022-04-12 Thread Marc Zyngier

On 2022-04-12 03:10, Atish Patra wrote:

The serial-pci device doesn't support MSI. Enable the device to provide
MSI so that platforms that only support MSI can also use this serial
device. MSI can be enabled via the newly introduced device property. It
is disabled by default, preserving the current behavior of the
serial-pci device.


This seems really odd. Switching to MSI implies that you now have
edge signalling.
again if it acks the MSI and doesn't service the device, as you'd
expect for a level interrupt (which is what the device generates today).

From what I understand of the patch, you signal an MSI on each
transition of the device state, which is equally odd (you get
an interrupt even when the device goes idle?).

While this may work for some guests, this completely changes the
semantics of the device. You may want to at least document the new
behaviour.

Thanks,

M.



Signed-off-by: Atish Patra 
---
 hw/char/serial-pci.c | 36 +---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/hw/char/serial-pci.c b/hw/char/serial-pci.c
index 93d6f9924425..ca93c2ce2776 100644
--- a/hw/char/serial-pci.c
+++ b/hw/char/serial-pci.c
@@ -31,6 +31,7 @@
 #include "hw/char/serial.h"
 #include "hw/irq.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
 #include "qom/object.h"
@@ -39,26 +40,54 @@ struct PCISerialState {
 PCIDevice dev;
 SerialState state;
 uint8_t prog_if;
+bool msi_enabled;
 };

 #define TYPE_PCI_SERIAL "pci-serial"
 OBJECT_DECLARE_SIMPLE_TYPE(PCISerialState, PCI_SERIAL)

+
+static void msi_irq_handler(void *opaque, int irq_num, int level)
+{
+PCIDevice *pci_dev = opaque;
+
+assert(level == 0 || level == 1);
+
+if (msi_enabled(pci_dev)) {
+msi_notify(pci_dev, 0);
+}
+}
+
 static void serial_pci_realize(PCIDevice *dev, Error **errp)
 {
 PCISerialState *pci = DO_UPCAST(PCISerialState, dev, dev);
SerialState *s = &pci->state;
+Error *err = NULL;
+int ret;

 if (!qdev_realize(DEVICE(s), NULL, errp)) {
 return;
 }

 pci->dev.config[PCI_CLASS_PROG] = pci->prog_if;
-pci->dev.config[PCI_INTERRUPT_PIN] = 0x01;
-s->irq = pci_allocate_irq(&pci->dev);
-
+if (pci->msi_enabled) {
+pci->dev.config[PCI_INTERRUPT_PIN] = 0x00;
+s->irq = qemu_allocate_irq(msi_irq_handler, &pci->dev, 100);
+} else {
+pci->dev.config[PCI_INTERRUPT_PIN] = 0x01;
+s->irq = pci_allocate_irq(&pci->dev);
+}
memory_region_init_io(&s->io, OBJECT(pci), &serial_io_ops, s, "serial", 8);

pci_register_bar(&pci->dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &s->io);
+
+if (!pci->msi_enabled) {
+return;
+}
+
+ret = msi_init(&pci->dev, 0, 1, true, false, &err);
+if (ret == -ENOTSUP) {
+fprintf(stdout, "MSIX INIT FAILED\n");
+}
 }

 static void serial_pci_exit(PCIDevice *dev)
@@ -83,6 +112,7 @@ static const VMStateDescription vmstate_pci_serial = 
{


 static Property serial_pci_properties[] = {
 DEFINE_PROP_UINT8("prog_if",  PCISerialState, prog_if, 0x02),
+DEFINE_PROP_BOOL("msi",  PCISerialState, msi_enabled, false),
 DEFINE_PROP_END_OF_LIST(),
 };


--
Jazz is not dead. It just smells funny...



[PATCH] vhost: Track descriptor chain in private at SVQ

2022-04-12 Thread Eugenio Pérez
Only the first descriptor of the chain was properly enqueued back.

While we're at it, harden SVQ: the device could have access to modify
them, and it definitely has access when we implement packed vq. Harden
SVQ by maintaining a private copy of the descriptor chain. Other fields,
like buffer addresses, are already maintained separately.

Fixes: 100890f7ca ("vhost: Shadow virtqueue buffers forwarding")

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  6 ++
 hw/virtio/vhost-shadow-virtqueue.c | 27 +--
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index e5e24c536d..c132c994e9 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -53,6 +53,12 @@ typedef struct VhostShadowVirtqueue {
 /* Next VirtQueue element that guest made available */
 VirtQueueElement *next_guest_avail_elem;

+/*
+ * Backup next field for each descriptor so we can recover securely, not
+ * needing to trust the device access.
+ */
+uint16_t *desc_next;
+
 /* Next head to expose to the device */
 uint16_t shadow_avail_idx;

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index b232803d1b..a2531d5874 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -138,6 +138,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue 
*svq, hwaddr *sg,
 for (n = 0; n < num; n++) {
 if (more_descs || (n + 1 < num)) {
 descs[i].flags = flags | cpu_to_le16(VRING_DESC_F_NEXT);
+descs[i].next = cpu_to_le16(svq->desc_next[i]);
 } else {
 descs[i].flags = flags;
 }
@@ -145,10 +146,10 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue 
*svq, hwaddr *sg,
 descs[i].len = cpu_to_le32(iovec[n].iov_len);

 last = i;
-i = cpu_to_le16(descs[i].next);
+i = cpu_to_le16(svq->desc_next[i]);
 }

-svq->free_head = le16_to_cpu(descs[last].next);
+svq->free_head = le16_to_cpu(svq->desc_next[last]);
 }

 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
@@ -333,13 +334,22 @@ static void 
vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
 }

+static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
+ uint16_t num, uint16_t i)
+{
+for (uint16_t j = 0; j < num; ++j) {
+i = le16_to_cpu(svq->desc_next[i]);
+}
+
+return i;
+}
+
 static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
uint32_t *len)
 {
-vring_desc_t *descs = svq->vring.desc;
 const vring_used_t *used = svq->vring.used;
 vring_used_elem_t used_elem;
-uint16_t last_used;
+uint16_t last_used, last_used_chain, num;

 if (!vhost_svq_more_used(svq)) {
 return NULL;
@@ -365,7 +375,10 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 return NULL;
 }

-descs[used_elem.id].next = svq->free_head;
+num = svq->ring_id_maps[used_elem.id]->in_num +
+  svq->ring_id_maps[used_elem.id]->out_num;
+last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
+svq->desc_next[last_used_chain] = svq->free_head;
 svq->free_head = used_elem.id;

 *len = used_elem.len;
@@ -540,8 +553,9 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
VirtIODevice *vdev,
 svq->vring.used = qemu_memalign(qemu_real_host_page_size, device_size);
 memset(svq->vring.used, 0, device_size);
 svq->ring_id_maps = g_new0(VirtQueueElement *, svq->vring.num);
+svq->desc_next = g_new0(uint16_t, svq->vring.num);
 for (unsigned i = 0; i < svq->vring.num - 1; i++) {
-svq->vring.desc[i].next = cpu_to_le16(i + 1);
+svq->desc_next[i] = cpu_to_le16(i + 1);
 }
 }

@@ -574,6 +588,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 virtqueue_detach_element(svq->vq, next_avail_elem, 0);
 }
 svq->vq = NULL;
+g_free(svq->desc_next);
 g_free(svq->ring_id_maps);
 qemu_vfree(svq->vring.desc);
 qemu_vfree(svq->vring.used);
--
2.27.0




Re: [PATCH v5 2/9] vfio: tolerate migration protocol v1 uapi renames

2022-04-12 Thread Pierre Morel




On 4/4/22 20:17, Matthew Rosato wrote:

The v1 uapi is deprecated and will be replaced by v2 at some point;
this patch just tolerates the renaming of uapi fields to reflect
v1 / deprecated status.

Signed-off-by: Matthew Rosato 
---
  hw/vfio/common.c|  2 +-
  hw/vfio/migration.c | 19 +++
  2 files changed, 12 insertions(+), 9 deletions(-)



I do not understand why you need this patch in this series.
Shouldn't it be separate?



diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 080046e3f5..7b1e12fb69 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -380,7 +380,7 @@ static bool 
vfio_devices_all_running_and_saving(VFIOContainer *container)
  return false;
  }
  
-if ((migration->device_state & VFIO_DEVICE_STATE_SAVING) &&

+if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
  (migration->device_state & VFIO_DEVICE_STATE_RUNNING)) {
  continue;
  } else {
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ff6b45de6b..e109cee551 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -432,7 +432,7 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
  }
  
  ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,

-   VFIO_DEVICE_STATE_SAVING);
+   VFIO_DEVICE_STATE_V1_SAVING);
  if (ret) {
  error_report("%s: Failed to set state SAVING", vbasedev->name);
  return ret;
@@ -532,7 +532,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
  int ret;
  
  ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_RUNNING,

-   VFIO_DEVICE_STATE_SAVING);
+   VFIO_DEVICE_STATE_V1_SAVING);
  if (ret) {
  error_report("%s: Failed to set state STOP and SAVING",
   vbasedev->name);
@@ -569,7 +569,7 @@ static int vfio_save_complete_precopy(QEMUFile *f, void 
*opaque)
  return ret;
  }
  
-ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_SAVING, 0);

+ret = vfio_migration_set_state(vbasedev, ~VFIO_DEVICE_STATE_V1_SAVING, 0);
  if (ret) {
  error_report("%s: Failed to set state STOPPED", vbasedev->name);
  return ret;
@@ -730,7 +730,7 @@ static void vfio_vmstate_change(void *opaque, bool running, 
RunState state)
   * start saving data.
   */
  if (state == RUN_STATE_SAVE_VM) {
-value = VFIO_DEVICE_STATE_SAVING;
+value = VFIO_DEVICE_STATE_V1_SAVING;
  } else {
  value = 0;
  }
@@ -768,8 +768,9 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
  case MIGRATION_STATUS_FAILED:
  bytes_transferred = 0;
  ret = vfio_migration_set_state(vbasedev,
-  ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
-  VFIO_DEVICE_STATE_RUNNING);
+   ~(VFIO_DEVICE_STATE_V1_SAVING |
+ VFIO_DEVICE_STATE_RESUMING),
+   VFIO_DEVICE_STATE_RUNNING);
  if (ret) {
  error_report("%s: Failed to set state RUNNING", vbasedev->name);
  }
@@ -864,8 +865,10 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error 
**errp)
  goto add_blocker;
  }
  
-ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,

-   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+ret = vfio_get_dev_region_info(vbasedev,
+   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+   &info);
  if (ret) {
  goto add_blocker;
  }



--
Pierre Morel
IBM Lab Boeblingen



Re: [PATCH v5 4/4] hw/acpi/aml-build: Use existing CPU topology to build PPTT table

2022-04-12 Thread Jonathan Cameron via
On Sun,  3 Apr 2022 22:59:53 +0800
Gavin Shan  wrote:

> When the PPTT table is built, the CPU topology is re-calculated, but
> it's unnecessary because the CPU topology has been populated in
> virt_possible_cpu_arch_ids() on the arm/virt machine.
> 
> This reworks build_pptt() to avoid that by reusing the existing one in
> ms->possible_cpus. Currently, the only user of build_pptt() is the
> arm/virt machine.
> 
> Signed-off-by: Gavin Shan 

Hi Gavin,

My compiler isn't being very smart today and gives a bunch of
maybe-used-uninitialized warnings for socket_offset, cluster_offset and core_offset.

They probably are initialized in all real paths, but I think you need to
set them to something at instantiation to keep the compiler happy.

Thanks,

Jonathan

> ---
>  hw/acpi/aml-build.c | 100 +---
>  1 file changed, 38 insertions(+), 62 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 4086879ebf..4b0f9df3e3 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -2002,86 +2002,62 @@ void build_pptt(GArray *table_data, BIOSLinker 
> *linker, MachineState *ms,
>  const char *oem_id, const char *oem_table_id)
>  {
>  MachineClass *mc = MACHINE_GET_CLASS(ms);
> -GQueue *list = g_queue_new();
> -guint pptt_start = table_data->len;
> -guint parent_offset;
> -guint length, i;
> -int uid = 0;
> -int socket;
> +CPUArchIdList *cpus = ms->possible_cpus;
> +int64_t socket_id = -1, cluster_id = -1, core_id = -1;
> +uint32_t socket_offset, cluster_offset, core_offset;
> +uint32_t pptt_start = table_data->len;
> +int n;
>  AcpiTable table = { .sig = "PPTT", .rev = 2,
>  .oem_id = oem_id, .oem_table_id = oem_table_id };
>  
>  acpi_table_begin(, table_data);
>  
> -for (socket = 0; socket < ms->smp.sockets; socket++) {
> -g_queue_push_tail(list,
> -GUINT_TO_POINTER(table_data->len - pptt_start));
> -build_processor_hierarchy_node(
> -table_data,
> -/*
> - * Physical package - represents the boundary
> - * of a physical package
> - */
> -(1 << 0),
> -0, socket, NULL, 0);
> -}
> +for (n = 0; n < cpus->len; n++) {
> +if (cpus->cpus[n].props.socket_id != socket_id) {
> +socket_id = cpus->cpus[n].props.socket_id;
> +cluster_id = -1;
> +core_id = -1;
> +socket_offset = table_data->len - pptt_start;
> +build_processor_hierarchy_node(table_data,
> +(1 << 0), /* Physical package */
> +0, socket_id, NULL, 0);
> +}
>  
> -if (mc->smp_props.clusters_supported) {
> -length = g_queue_get_length(list);
> -for (i = 0; i < length; i++) {
> -int cluster;
> -
> -parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
> -for (cluster = 0; cluster < ms->smp.clusters; cluster++) {
> -g_queue_push_tail(list,
> -GUINT_TO_POINTER(table_data->len - pptt_start));
> -build_processor_hierarchy_node(
> -table_data,
> -(0 << 0), /* not a physical package */
> -parent_offset, cluster, NULL, 0);
> +if (mc->smp_props.clusters_supported) {
> +if (cpus->cpus[n].props.cluster_id != cluster_id) {
> +cluster_id = cpus->cpus[n].props.cluster_id;
> +core_id = -1;
> +cluster_offset = table_data->len - pptt_start;
> +build_processor_hierarchy_node(table_data,
> +(0 << 0), /* Not a physical package */
> +socket_offset, cluster_id, NULL, 0);
>  }
> +} else {
> +cluster_offset = socket_offset;
>  }
> -}
>  
> -length = g_queue_get_length(list);
> -for (i = 0; i < length; i++) {
> -int core;
> -
> -parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
> -for (core = 0; core < ms->smp.cores; core++) {
> -if (ms->smp.threads > 1) {
> -g_queue_push_tail(list,
> -GUINT_TO_POINTER(table_data->len - pptt_start));
> -build_processor_hierarchy_node(
> -table_data,
> +if (ms->smp.threads <= 1) {
> +build_processor_hierarchy_node(table_data,
> +(1 << 1) | /* ACPI Processor ID valid */
> +(1 << 3),  /* Node is a Leaf */
> +cluster_offset, n, NULL, 0);
> +} else {
> +if (cpus->cpus[n].props.core_id != core_id) {
> +core_id = cpus->cpus[n].props.core_id;
> +core_offset = table_data->len - pptt_start;
> +build_processor_hierarchy_node(table_data,
>  (0 << 0), /* not a physical package */
> -   

Re: [Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic

2022-04-12 Thread Vitaly Kuznetsov
Divya Garg  writes:

> On 12/04/22 6:18 pm, Vitaly Kuznetsov wrote:
>> Divya Garg  writes:
>>
>>> Hi Vitaly Kuznetsov !
>>> I was working on hyperv flags and saw that we introduced new
>>> dependencies some
>>> time back
>>> (https://urldefense.proofpoint.com/v2/url?u=https-3A__sourcegraph.com_github.com_qemu_qemu_-2D_commit_c686193072a47032d83cb4e131dc49ae30f9e5d7-3Fvisible-3D1=DwIBAg=s883GpUCOChKOHiocYtGcg=2QGHz-fTCVWImEBKe1ZcSe5t6UfasnhvdzD5DcixwOE=ln-t0rKlkFkOEKe97jJTLi2BoKK5E9lLMPHjPihl4kpdbvBStPeD0Ku9wTed7GPf=AtipQDs1Mi-0FQtb1AyvBpR34bpjp64troGF_nr_08E=
>>>  ).
>>> After these changes, if we try to live migrate a vm from older qemu to newer
>>> one having these changes, it fails showing dependency issue.
>>>
>>> I was wondering if this is the expected behaviour or if there is any work
>>> around for handling it? Or something needs to be done to ensure backward
>>> compatibility ?
>> Hi Divya,
>>
>> configurations with 'hv-stimer' and without 'hv-synic'/'hv-time' were
>> always incorrect as Windows can't use the feature, that's why the
>> dependencies were added. It is true that it doesn't seem to be possible
>> to forward-migrate such VMs to newer QEMU versions. We could've tied
>> these new dependencies to newer machine types I guess (so old machine
>> types would not fail to start) but we didn't do that back in 4.1 and
>> it's been awhile since... Not sure whether it would make much sense to
>> introduce something for pre-4.1 machine types now.
>>
>> Out of curiosity, why do such "incorrect" configurations exist? Can you
>> just update them to include missing flags on older QEMU so they migrate
>> to newer ones without issues?
>>
> Hi Vitaly !
>
> Thanks for the response. I understand that these were incorrect 
> configurations
> and should be corrected. Only issue is, we want to avoid power cycling those
> VMs. But yeah I think, since the configurations were wrong we should 
> update and
> power cycle the VM.  Just for understanding purpose, is it possible to 
> disable
> the feature by throwing out some warning message and update libvirt to 
> mitigate
> this change and handle live migration ?
>

I'm not exactly sure about libvirt, I was under the impression it makes
sure that QEMU command line is the same on the destination and on the
source. If there's a way to add something, I'd suggest you add the
missing features (hv-time, hv-synic) on the destination rather than
remove 'hv-stimer' as it is probably safer.

> Or maybe update libvirt to not to ask for this feature from qemu during live
> migration and handle different configuration on source and destination 
> host ?

You can also modify QEMU locally and throw away these dependencies,
it'll allow these configurations again but generally speaking checking
that the set of hyper-v features is exactly the same on the source and
destination is the right thing to do: there are no guarantees that guest
OS (Windows) will keep behaving sane when the corresponding CPUIDs
change while it's running, all sorts of things are possible I believe.

-- 
Vitaly
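As an aside, the dependency check under discussion reduces to something like the following sketch (the flag names and the helper are illustrative, not QEMU's actual hyperv feature machinery):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits standing in for QEMU's hv-* properties. */
enum {
    HV_TIME   = 1u << 0,
    HV_SYNIC  = 1u << 1,
    HV_STIMER = 1u << 2,
};

/* hv-stimer is only usable by Windows together with hv-time and
 * hv-synic, so a configuration enabling it without both is rejected. */
static bool hv_features_valid(uint32_t features)
{
    const uint32_t stimer_deps = HV_TIME | HV_SYNIC;

    if ((features & HV_STIMER) && (features & stimer_deps) != stimer_deps) {
        return false;
    }
    return true;
}
```

With such a check, a VM configured with hv-stimer but not hv-time/hv-synic fails to start on the destination, which is exactly the forward-migration failure discussed above.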




Re: [PATCH v8 10/12] target/hexagon: import parser for idef-parser

2022-04-12 Thread Anton Johansson via
Very nice catch, this is the bug that plagued us a few weeks ago when
rebasing; it has since been fixed. Actually the `gen_set_overflow` function
has been removed completely, as it was only called when we handled
`asl/asr_r_r_sat`.

Current way we handle overflow:

Overflow is now only set by saturates; this assumption holds for the
instructions we parse in phase 1. In the parser, all saturates call
`gen_rvalue_sat`, which emits a call to `gen_set_usr_field_if` in
`genptr.c` to set USR_OVF only if the value is non-zero. It does this
via OR'ing with the previous value.

See here for `gen_rvalue_sat`
and here for `gen_set_usr_field_if`
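The "set USR_OVF only if the value is non-zero, OR'ing with the previous value" behaviour can be sketched with a toy model (made-up names, not the actual genptr.c helpers):

```c
#include <stdint.h>

#define USR_OVF (1u << 0)   /* hypothetical bit position */

/* Sticky overflow update: OR the new indication into the previous USR
 * value; a bit that is already set is never cleared. */
static uint32_t set_usr_field_if(uint32_t usr, uint32_t overflowed)
{
    if (overflowed) {
        usr |= USR_OVF;
    }
    return usr;
}
```

The key property is that a previously set OVF bit survives an instruction that does not overflow.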





-Original Message-
From: Anton Johansson
Sent: Wednesday, February 9, 2022 11:03 AM
To:qemu-devel@nongnu.org
Cc:a...@rev.ng; Taylor Simpson; Brian Cain
; Michael Lambert;
bab...@rev.ng;ni...@rev.ng;richard.hender...@linaro.org
Subject: [PATCH v8 10/12] target/hexagon: import parser for idef-parser

Signed-off-by: Alessandro Di Federico
Signed-off-by: Paolo Montesel
Signed-off-by: Anton Johansson



diff --git a/target/hexagon/idef-parser/parser-helpers.c
b/target/hexagon/idef-parser/parser-helpers.c
new file mode 100644




+void gen_set_overflow(Context *c, YYLTYPE *locp, HexValue *value)
+{
+HexValue value_m = *value;
+
+if (is_inside_ternary(c)) {
+/* Inside ternary operator, need to take care of the side-effect */
+HexValue cond = get_ternary_cond(c, locp);
+HexValue zero = gen_constant(c, locp, "0", cond.bit_width, UNSIGNED);
+bool is_64bit = cond.bit_width == 64;
+unsigned bit_width = cond.bit_width;
+value_m = rvalue_materialize(c, locp, &value_m);
+if (is_64bit) {
+value_m = gen_rvalue_extend(c, locp, &value_m);
+}
+OUT(c, locp, "tcg_gen_movcond_i", &bit_width,
+ "(TCG_COND_NE, ", &value_m, ", ", &cond);
+OUT(c, locp, ", ", &zero, ", ", &value_m, ", ", &zero, ");\n");

You shouldn't write zero when the condition is false - you should do nothing.  
Try a test where OVF is already set.  You can't overwrite the bit with zero 
when the current instruction doesn't overflow.




+if (is_64bit) {
+value_m = gen_rvalue_truncate(c, locp, &value_m);
+}
+gen_rvalue_free(c, locp, &cond);
+}
+
+OUT(c, locp, "SET_USR_FIELD(USR_OVF, ", &value_m, ");\n");
+gen_rvalue_free(c, locp, &value_m);
+}

Re: [PULL v2 29/35] hw/intc: Add RISC-V AIA APLIC device emulation

2022-04-12 Thread Peter Maydell
On Wed, 16 Feb 2022 at 08:43, Alistair Francis
 wrote:
>
> From: Anup Patel 
>
> The RISC-V AIA (Advanced Interrupt Architecture) defines a new
> interrupt controller for wired interrupts called APLIC (Advanced
> Platform Level Interrupt Controller). The APLIC is capabable of
> forwarding wired interupts to RISC-V HARTs directly or as MSIs
> (Message Signaled Interupts).
>
> This patch adds device emulation for RISC-V AIA APLIC.

Hi; Coverity has some issues with this change; they're kind of
false positives but they do flag up a minor issue with the code.
This is CID 1487105, 1487113, 1487185, 1487208.

> +} else if ((APLIC_TARGET_BASE <= addr) &&
> +(addr < (APLIC_TARGET_BASE + (aplic->num_irqs - 1) * 4))) {
> +irq = ((addr - APLIC_TARGET_BASE) >> 2) + 1;
> +return aplic->target[irq];
> +} else if (!aplic->msimode && (APLIC_IDC_BASE <= addr) &&
> +(addr < (APLIC_IDC_BASE + aplic->num_harts * APLIC_IDC_SIZE))) {
> +idc = (addr - APLIC_IDC_BASE) / APLIC_IDC_SIZE;

In expressions like these where we're checking "is addr between
some base address and an upper bound calculated from num_irqs
or num_harts", Coverity warns that we calculate expressions like
(APLIC_TARGET_BASE + (aplic->num_irqs - 1) * 4) using 32-bits and
then compare against the 64-bit 'addr', so there might be an
unintentional overflow. Now clearly in this case num_irqs and num_harts
should never be so large that there is an overflow, so in that
sense Coverity is wrong and these are false positives. However...
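The flagged pattern can be reduced to a standalone sketch (the constant and names are illustrative): the upper bound is computed in 32-bit arithmetic and only then compared against the 64-bit addr, so a huge num_irqs wraps before the comparison, while widening first does not:

```c
#include <stdbool.h>
#include <stdint.h>

#define TARGET_BASE 0x3004u

/* Bound computed in 32-bit arithmetic (assumes 32-bit unsigned int),
 * mirroring the expression Coverity complains about. */
static bool in_range_32(uint64_t addr, uint32_t num_irqs)
{
    return (TARGET_BASE <= addr) &&
           (addr < (TARGET_BASE + (num_irqs - 1) * 4));
}

/* Widened before the multiplication, so no wrap-around is possible. */
static bool in_range_64(uint64_t addr, uint32_t num_irqs)
{
    return (TARGET_BASE <= addr) &&
           (addr < (TARGET_BASE + ((uint64_t)num_irqs - 1) * 4));
}
```

For sane num_irqs values both functions agree; only a bogus value such as 0x40000000 makes the 32-bit bound wrap around and the range collapse.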

> +static void riscv_aplic_realize(DeviceState *dev, Error **errp)
> +{
> +uint32_t i;
> +RISCVAPLICState *aplic = RISCV_APLIC(dev);
> +
> +aplic->bitfield_words = (aplic->num_irqs + 31) >> 5;
> +aplic->sourcecfg = g_new0(uint32_t, aplic->num_irqs);
> +aplic->state = g_new(uint32_t, aplic->num_irqs);
> +aplic->target = g_new0(uint32_t, aplic->num_irqs);
> +if (!aplic->msimode) {
> +for (i = 0; i < aplic->num_irqs; i++) {
> +aplic->target[i] = 1;
> +}
> +}
> +aplic->idelivery = g_new0(uint32_t, aplic->num_harts);
> +aplic->iforce = g_new0(uint32_t, aplic->num_harts);
> +aplic->ithreshold = g_new0(uint32_t, aplic->num_harts);
> +
> +memory_region_init_io(&aplic->mmio, OBJECT(dev), &riscv_aplic_ops, aplic,
> +  TYPE_RISCV_APLIC, aplic->aperture_size);
> +sysbus_init_mmio(SYS_BUS_DEVICE(dev), &aplic->mmio);
> +
> +/*
> + * Only root APLICs have hardware IRQ lines. All non-root APLICs
> + * have IRQ lines delegated by their parent APLIC.
> + */
> +if (!aplic->parent) {
> +qdev_init_gpio_in(dev, riscv_aplic_request, aplic->num_irqs);
> +}
> +
> +/* Create output IRQ lines for non-MSI mode */
> +if (!aplic->msimode) {
> +aplic->external_irqs = g_malloc(sizeof(qemu_irq) * aplic->num_harts);
> +qdev_init_gpio_out(dev, aplic->external_irqs, aplic->num_harts);
> +
> +/* Claim the CPU interrupt to be triggered by this APLIC */
> +for (i = 0; i < aplic->num_harts; i++) {
> +RISCVCPU *cpu = RISCV_CPU(qemu_get_cpu(aplic->hartid_base + i));
> +if (riscv_cpu_claim_interrupts(cpu,
> +(aplic->mmode) ? MIP_MEIP : MIP_SEIP) < 0) {
> +error_report("%s already claimed",
> + (aplic->mmode) ? "MEIP" : "SEIP");
> +exit(1);
> +}
> +}
> +}
> +
> +msi_nonbroken = true;
> +}

...in the realize method we don't do any sanity checking of our
QOM properties that set aplic->num_irqs and aplic->num_harts, so the
creator of the device could in theory pass us some bogus values that
cause overflow and other bad things.

> +/*
> + * Create APLIC device.
> + */
> +DeviceState *riscv_aplic_create(hwaddr addr, hwaddr size,
> +uint32_t hartid_base, uint32_t num_harts, uint32_t num_sources,
> +uint32_t iprio_bits, bool msimode, bool mmode, DeviceState *parent)
> +{
> +DeviceState *dev = qdev_new(TYPE_RISCV_APLIC);
> +uint32_t i;
> +
> +assert(num_harts < APLIC_MAX_IDC);
> +assert((APLIC_IDC_BASE + (num_harts * APLIC_IDC_SIZE)) <= size);
> +assert(num_sources < APLIC_MAX_SOURCE);
> +assert(APLIC_MIN_IPRIO_BITS <= iprio_bits);
> +assert(iprio_bits <= APLIC_MAX_IPRIO_BITS);
> +
> +qdev_prop_set_uint32(dev, "aperture-size", size);
> +qdev_prop_set_uint32(dev, "hartid-base", hartid_base);
> +qdev_prop_set_uint32(dev, "num-harts", num_harts);
> +qdev_prop_set_uint32(dev, "iprio-mask", ((1U << iprio_bits) - 1));
> +qdev_prop_set_uint32(dev, "num-irqs", num_sources + 1);

You do make some assert()s on num_harts and num_sources here, but
this is the wrong place to do this error checking, because there's
no particular reason why a board model has to use this function
rather than directly creating the device. Instead these checks should
go in the realize method and should cause realize to fail.
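The suggested shape, validating the properties where the device is realized and reporting a failure instead of asserting in a convenience constructor, could look roughly like this (a simplified sketch with illustrative limits, not the actual QOM error API):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative limits standing in for APLIC_MAX_IDC / APLIC_MAX_SOURCE. */
#define MAX_HARTS 16384u
#define MAX_IRQS  1024u

/* Realize-style check: reject bogus property values via an error string
 * (standing in for Error **errp) rather than assert()ing. */
static bool aplic_check_props(uint32_t num_harts, uint32_t num_irqs,
                              const char **errp)
{
    if (num_harts == 0 || num_harts > MAX_HARTS) {
        *errp = "num-harts out of range";
        return false;
    }
    if (num_irqs == 0 || num_irqs > MAX_IRQS) {
        *errp = "num-irqs out of range";
        return false;
    }
    *errp = NULL;
    return true;
}
```

Checking here covers every creator of the device, not just the one helper function, and also removes the overflow concern from the MMIO range checks.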

> +

Re: [RFC PATCH] target/i386: fix byte swap issue with XMM register access

2022-04-12 Thread Richard Henderson

On 4/12/22 01:54, Alex Bennée wrote:

During the conversion to the gdb_get_reg128 helpers the high and low
parts of the XMM register were inadvertently swapped. This causes
reads of the register to report the incorrect value to gdb.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/971
Fixes: b7b8756a9c (target/i386: use gdb_get_reg helpers)
Signed-off-by: Alex Bennée 
Cc: qemu-sta...@nongnu.org
---
  target/i386/gdbstub.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 


r~



diff --git a/target/i386/gdbstub.c b/target/i386/gdbstub.c
index 098a2ad15a..c3a2cf6f28 100644
--- a/target/i386/gdbstub.c
+++ b/target/i386/gdbstub.c
@@ -129,8 +129,8 @@ int x86_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
  n -= IDX_XMM_REGS;
  if (n < CPU_NB_REGS32 || TARGET_LONG_BITS == 64) {
  return gdb_get_reg128(mem_buf,
-  env->xmm_regs[n].ZMM_Q(0),
-  env->xmm_regs[n].ZMM_Q(1));
+  env->xmm_regs[n].ZMM_Q(1),
+  env->xmm_regs[n].ZMM_Q(0));
  }
  } else {
  switch (n) {





Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK

2022-04-12 Thread Jason Gunthorpe
On Fri, Apr 08, 2022 at 08:54:02PM +0200, David Hildenbrand wrote:

> RLIMIT_MEMLOCK was the obvious candidate, but as we discovered in the
> past already with secretmem, it's not 100% that good of a fit (unmovable
> is worse than mlocked). But it gets the job done for now at least.

No, it doesn't. There are too many different interpretations of how
MEMLOCK is supposed to work.

eg VFIO accounts per-process so hostile users can just fork to go past
it.

RDMA is per-process but uses a different counter, so you can double up

io_uring is per-user and uses a 3rd counter, so it can triple up on
the above two

> So I'm open for alternatives to limit the amount of unmovable memory we
> might allocate for user space, and then we could convert secretmem as well.

I think it has to be cgroup based considering where we are now :\

Jason
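The point about per-process accounting being trivial to bypass by forking can be shown with a toy model (nothing to do with the real VFIO/RDMA code):

```c
#include <stdbool.h>
#include <stdint.h>

#define LIMIT_PAGES 100u

/* Toy per-process pin accounting: each process carries its own counter,
 * so N cooperating (forked) processes can pin N * LIMIT_PAGES overall. */
struct proc {
    uint32_t pinned;
};

static bool try_pin(struct proc *p, uint32_t pages)
{
    if (p->pinned + pages > LIMIT_PAGES) {
        return false;
    }
    p->pinned += pages;
    return true;
}
```

A per-user or cgroup-scoped counter would charge both processes against the same budget instead.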



Re: [PATCH for-7.1] target/mips: Remove stale TODO file

2022-04-12 Thread Richard Henderson

On 4/12/22 04:38, Thomas Huth wrote:

The last change to this file has been done in 2012, so it
seems like this is not really used anymore, and the content
is likely very out of date now.

Signed-off-by: Thomas Huth
---
  target/mips/TODO | 51 
  1 file changed, 51 deletions(-)
  delete mode 100644 target/mips/TODO


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 14/16] target/arm: Implement ESB instruction

2022-04-12 Thread Richard Henderson

On 4/12/22 02:56, Peter Maydell wrote:

On Mon, 11 Apr 2022 at 23:14, Richard Henderson
 wrote:


On 4/11/22 09:18, Peter Maydell wrote:

+  ESB 0011 0010    0001 
+]


Why don't we decode bits [11:8] here? I see it's the same
as YIELD/WFE/WFI, but I'm not sure why we're not decoding
those bits in those insns either...


See page F4-7074 in H.a, where bits [11:8] of the imm12 field are described 
with ''.


Hmm. That just means "decodes to the NOP/WFI/ESB/whatever
instruction-description whatever the value of those bits",
but when the specific instruction-description then marks
those bits as "(0)" or "(1)", that has the usual CONSTRAINED
UNPREDICTABLE meaning described in section F1.7.2, where
we get a free choice of UNDEF, NOP, ignore the bit, or
any-dest-regs-are-UNKNOWN. So we're within the spec to
not decode [11:8] but I think it would be more consistent
with how we try to handle those (0) and (1) bits generally
if we insist that [11:8] is all zeroes here.

For this series, I guess go along with the current way we
handle hint instructions, and maybe fix this as a separate
cleanup later.


Ok.

r~



Re: [PATCH v2 for 7.1 1/1] block: add 'force' parameter to 'blockdev-change-medium' command

2022-04-12 Thread Denis V. Lunev

On 12.04.2022 16:17, Vladimir Sementsov-Ogievskiy wrote:

12.04.2022 12:50, Denis V. Lunev wrote:

'blockdev-change-medium' is a convinient wrapper for the following
sequence of commands:
  * blockdev-open-tray
  * blockdev-remove-medium
  * blockdev-insert-medium
  * blockdev-close-tray
and should be used e.g. to change the ISO image inside the CD-ROM tray.
Though the guest could lock the tray, and some Linux guests like
CentOS 8.5 actually do that. In this case the execution of this
command results in an error like the following:
   Device 'scsi0-0-1-0' is locked and force was not specified,
   wait for tray to open and try again.

This situation could be resolved by 'blockdev-open-tray' with the
'force' flag passed in. Thus it seems reasonable to add the same
capability for 'blockdev-change-medium' too.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: "Dr. David Alan Gilbert" 
CC: Eric Blake 
CC: Markus Armbruster 
CC: Vladimir Sementsov-Ogievskiy 
---
  block/qapi-sysemu.c |  3 ++-
  hmp-commands.hx | 11 +++
  monitor/hmp-cmds.c  |  4 +++-
  qapi/block.json |  6 ++
  ui/cocoa.m  |  1 +
  5 files changed, 19 insertions(+), 6 deletions(-)

Changes from v1:
- added kludge to Objective C code
- simplified a bit call of do_open_tray() (thanks, Vova!)
- added record to hmp-command.hx

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 8498402ad4..680c7ee342 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -318,6 +318,7 @@ void qmp_blockdev_change_medium(bool has_device, 
const char *device,

  bool has_id, const char *id,
  const char *filename,
  bool has_format, const char *format,
+    bool has_force, bool force,
  bool has_read_only,
  BlockdevChangeReadOnlyMode read_only,
  Error **errp)
@@ -380,7 +381,7 @@ void qmp_blockdev_change_medium(bool has_device, 
const char *device,

    rc = do_open_tray(has_device ? device : NULL,
    has_id ? id : NULL,
-  false, &err);
+  force, &err);
  if (rc && rc != -ENOSYS) {
  error_propagate(errp, err);
  goto fail;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..6ec593ea08 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -202,9 +202,9 @@ ERST
    {
  .name   = "change",
-    .args_type  = "device:B,target:F,arg:s?,read-only-mode:s?",
-    .params = "device filename [format [read-only-mode]]",
-    .help   = "change a removable medium, optional format",
+    .args_type  = 
"device:B,force:-f,target:F,arg:s?,read-only-mode:s?",

+    .params = "device [-f] filename [format [read-only-mode]]",
+    .help   = "change a removable medium, optional format, 
use -f to force the operation",

  .cmd    = hmp_change,
  },
  @@ -212,11 +212,14 @@ SRST
  ``change`` *device* *setting*
    Change the configuration of a device.
  -  ``change`` *diskdevice* *filename* [*format* [*read-only-mode*]]
+  ``change`` *diskdevice* [-f] *filename* [*format* [*read-only-mode*]]
  Change the medium for a removable disk device to point to 
*filename*. eg::

      (qemu) change ide1-cd0 /path/to/some.iso
  +    ``-f``
+  forces the operation even if the guest has locked the tray.
+
  *format* is optional.
    *read-only-mode* may be used to change the read-only status 
of the device.

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 634968498b..d8b98bed6c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1472,6 +1472,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
  const char *target = qdict_get_str(qdict, "target");
  const char *arg = qdict_get_try_str(qdict, "arg");
  const char *read_only = qdict_get_try_str(qdict, 
"read-only-mode");

+    bool force = qdict_get_try_bool(qdict, "force", false);
  BlockdevChangeReadOnlyMode read_only_mode = 0;
  Error *err = NULL;
  @@ -1508,7 +1509,8 @@ void hmp_change(Monitor *mon, const QDict 
*qdict)

  }
    qmp_blockdev_change_medium(true, device, false, NULL, 
target,
-   !!arg, arg, !!read_only, 
read_only_mode,

+   !!arg, arg, true, force,
+   !!read_only, read_only_mode,
 &err);
  }
  diff --git a/qapi/block.json b/qapi/block.json
index 82fcf2c914..3f100d4887 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -326,6 +326,11 @@
  # @read-only-mode: change the read-only mode of the device; defaults
  #  to 'retain'
  #
+# @force: if false (the default), an eject request through 
blockdev-open-tray
+# will be sent to the guest if it has locked the tray (and 

Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK

2022-04-12 Thread Chao Peng
On Mon, Apr 11, 2022 at 06:32:33PM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 07, 2022 at 04:05:36PM +, Sean Christopherson wrote:
> > Hmm, shmem_writepage() already handles SHM_F_INACCESSIBLE by rejecting the 
> > swap, so
> > maybe it's just the page migration path that needs to be updated?
> 
> My early version prevented migration with -ENOTSUPP for
> address_space_operations::migratepage().
> 
> What's wrong with that approach?

I previously thought migratepage will not be called since we already
marked the pages as UNMOVABLE, sounds not correct?

Thanks,
Chao
> 
> -- 
>  Kirill A. Shutemov



Re: [PATCH v2 for 7.1 1/1] block: add 'force' parameter to 'blockdev-change-medium' command

2022-04-12 Thread Vladimir Sementsov-Ogievskiy

12.04.2022 12:50, Denis V. Lunev wrote:

'blockdev-change-medium' is a convinient wrapper for the following
sequence of commands:
  * blockdev-open-tray
  * blockdev-remove-medium
  * blockdev-insert-medium
  * blockdev-close-tray
and should be used e.g. to change the ISO image inside the CD-ROM tray.
Though the guest could lock the tray, and some Linux guests like
CentOS 8.5 actually do that. In this case the execution of this
command results in an error like the following:
   Device 'scsi0-0-1-0' is locked and force was not specified,
   wait for tray to open and try again.

This situation could be resolved by 'blockdev-open-tray' with the
'force' flag passed in. Thus it seems reasonable to add the same
capability for 'blockdev-change-medium' too.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: "Dr. David Alan Gilbert" 
CC: Eric Blake 
CC: Markus Armbruster 
CC: Vladimir Sementsov-Ogievskiy 
---
  block/qapi-sysemu.c |  3 ++-
  hmp-commands.hx | 11 +++
  monitor/hmp-cmds.c  |  4 +++-
  qapi/block.json |  6 ++
  ui/cocoa.m  |  1 +
  5 files changed, 19 insertions(+), 6 deletions(-)

Changes from v1:
- added kludge to Objective C code
- simplified a bit call of do_open_tray() (thanks, Vova!)
- added record to hmp-command.hx

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 8498402ad4..680c7ee342 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -318,6 +318,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
  bool has_id, const char *id,
  const char *filename,
  bool has_format, const char *format,
+bool has_force, bool force,
  bool has_read_only,
  BlockdevChangeReadOnlyMode read_only,
  Error **errp)
@@ -380,7 +381,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
  
  rc = do_open_tray(has_device ? device : NULL,

has_id ? id : NULL,
-  false, &err);
+  force, &err);
  if (rc && rc != -ENOSYS) {
  error_propagate(errp, err);
  goto fail;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..6ec593ea08 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -202,9 +202,9 @@ ERST
  
  {

  .name   = "change",
-.args_type  = "device:B,target:F,arg:s?,read-only-mode:s?",
-.params = "device filename [format [read-only-mode]]",
-.help   = "change a removable medium, optional format",
+.args_type  = "device:B,force:-f,target:F,arg:s?,read-only-mode:s?",
+.params = "device [-f] filename [format [read-only-mode]]",
+.help   = "change a removable medium, optional format, use -f to force the operation",
  .cmd= hmp_change,
  },
  
@@ -212,11 +212,14 @@ SRST

  ``change`` *device* *setting*
Change the configuration of a device.
  
-  ``change`` *diskdevice* *filename* [*format* [*read-only-mode*]]

+  ``change`` *diskdevice* [-f] *filename* [*format* [*read-only-mode*]]
  Change the medium for a removable disk device to point to *filename*. eg::
  
(qemu) change ide1-cd0 /path/to/some.iso
  
+``-f``

+  forces the operation even if the guest has locked the tray.
+
  *format* is optional.
  
  *read-only-mode* may be used to change the read-only status of the device.

diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 634968498b..d8b98bed6c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1472,6 +1472,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
  const char *target = qdict_get_str(qdict, "target");
  const char *arg = qdict_get_try_str(qdict, "arg");
  const char *read_only = qdict_get_try_str(qdict, "read-only-mode");
+bool force = qdict_get_try_bool(qdict, "force", false);
  BlockdevChangeReadOnlyMode read_only_mode = 0;
  Error *err = NULL;
  
@@ -1508,7 +1509,8 @@ void hmp_change(Monitor *mon, const QDict *qdict)

  }
  
  qmp_blockdev_change_medium(true, device, false, NULL, target,

-   !!arg, arg, !!read_only, read_only_mode,
+   !!arg, arg, true, force,
+   !!read_only, read_only_mode,
 &err);
  }
  
diff --git a/qapi/block.json b/qapi/block.json

index 82fcf2c914..3f100d4887 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -326,6 +326,11 @@
  # @read-only-mode: change the read-only mode of the device; defaults
  #  to 'retain'
  #
+# @force: if false (the default), an eject request through blockdev-open-tray
+# will be sent to the guest if it has locked the tray (and the tray
+# will not be opened immediately); if 

Re: [PATCH v5 03/13] mm/shmem: Support memfile_notifier

2022-04-12 Thread Chao Peng
On Mon, Apr 11, 2022 at 06:26:47PM +0300, Kirill A. Shutemov wrote:
> On Thu, Mar 10, 2022 at 10:09:01PM +0800, Chao Peng wrote:
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 9b31a7056009..7b43e274c9a2 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -903,6 +903,28 @@ static struct folio *shmem_get_partial_folio(struct 
> > inode *inode, pgoff_t index)
> > return page ? page_folio(page) : NULL;
> >  }
> >  
> > +static void notify_fallocate(struct inode *inode, pgoff_t start, pgoff_t 
> > end)
> > +{
> > +#ifdef CONFIG_MEMFILE_NOTIFIER
> > +   struct shmem_inode_info *info = SHMEM_I(inode);
> > +
> > +   memfile_notifier_fallocate(&info->memfile_notifiers, start, end);
> > +#endif
> 
> All these #ifdefs look ugly. Could you provide dummy memfile_* for
> !MEMFILE_NOTIFIER case?
Sure.

Chao
> 
> -- 
>  Kirill A. Shutemov
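The dummy-helper pattern Kirill is asking for usually looks like this in a header (illustrative types and names, not the actual series):

```c
#include <stdint.h>

typedef uint64_t pgoff_t;
struct memfile_notifier_list { int notified; };

#ifdef CONFIG_MEMFILE_NOTIFIER
void memfile_notifier_fallocate(struct memfile_notifier_list *list,
                                pgoff_t start, pgoff_t end);
#else
/* Empty inline stub for the !CONFIG case, so callers such as
 * notify_fallocate() need no #ifdef of their own. */
static inline void memfile_notifier_fallocate(struct memfile_notifier_list *list,
                                              pgoff_t start, pgoff_t end)
{
    (void)list;
    (void)start;
    (void)end;
}
#endif

/* A caller compiles unchanged whether or not the feature is configured. */
static void notify_fallocate(struct memfile_notifier_list *list)
{
    memfile_notifier_fallocate(list, 0, 16);
}
```

This keeps all the conditional compilation in one header instead of scattering #ifdefs through mm/shmem.c.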



Re: [PATCH v5 01/13] mm/memfd: Introduce MFD_INACCESSIBLE flag

2022-04-12 Thread Chao Peng
On Mon, Apr 11, 2022 at 06:10:23PM +0300, Kirill A. Shutemov wrote:
> On Thu, Mar 10, 2022 at 10:08:59PM +0800, Chao Peng wrote:
> > From: "Kirill A. Shutemov" 
> > 
> > Introduce a new memfd_create() flag indicating the content of the
> > created memfd is inaccessible from userspace through ordinary MMU
> > access (e.g., read/write/mmap). However, the file content can be
> > accessed via a different mechanism (e.g. KVM MMU) indirectly.
> > 
> > It provides semantics required for KVM guest private memory support
> > that a file descriptor with this flag set is going to be used as the
> > source of guest memory in confidential computing environments such
> > as Intel TDX/AMD SEV but may not be accessible from host userspace.
> > 
> > Since page migration/swapping is not yet supported for such usages
> > so these pages are currently marked as UNMOVABLE and UNEVICTABLE
> > which makes them behave like long-term pinned pages.
> > 
> > The flag can not coexist with MFD_ALLOW_SEALING, future sealing is
> > also impossible for a memfd created with this flag.
> > 
> > At this time only shmem implements this flag.
> > 
> > Signed-off-by: Kirill A. Shutemov 
> > Signed-off-by: Chao Peng 
> > ---
> >  include/linux/shmem_fs.h   |  7 +
> >  include/uapi/linux/memfd.h |  1 +
> >  mm/memfd.c | 26 +++--
> >  mm/shmem.c | 57 ++
> >  4 files changed, 88 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index e65b80ed09e7..2dde843f28ef 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -12,6 +12,9 @@
> >  
> >  /* inode in-kernel data */
> >  
> > +/* shmem extended flags */
> > +#define SHM_F_INACCESSIBLE 0x0001  /* prevent ordinary MMU access (e.g. 
> > read/write/mmap) to file content */
> > +
> >  struct shmem_inode_info {
> > spinlock_t  lock;
> > unsigned intseals;  /* shmem seals */
> > @@ -24,6 +27,7 @@ struct shmem_inode_info {
> > struct shared_policypolicy; /* NUMA memory alloc policy */
> > struct simple_xattrsxattrs; /* list of xattrs */
> > atomic_tstop_eviction;  /* hold when working on inode */
> > +   unsigned intxflags; /* shmem extended flags */
> > struct inodevfs_inode;
> >  };
> >  
> 
> AFAICS, only two bits of 'flags' are used. And that's very strange that
> VM_ flags are used for the purpose. My guess that someone was lazy to
> introduce new constants for this.
> 
> I think we should fix this: introduce SHM_F_LOCKED and SHM_F_NORESERVE
> alongside with SHM_F_INACCESSIBLE and stuff them all into info->flags.
> It also makes shmem_file_setup_xflags() go away.

Did a quick search and sounds we only use SHM_F_LOCKED/SHM_F_NORESERVE and
that definitely don't have to be VM_ flags.

Chao
> 
> -- 
>  Kirill A. Shutemov



Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-12 Thread Chao Peng
On Fri, Apr 08, 2022 at 11:35:05AM -1000, Vishal Annapurve wrote:
> On Mon, Mar 28, 2022 at 10:17 AM Andy Lutomirski  wrote:
> >
> > On Thu, Mar 10, 2022 at 6:09 AM Chao Peng  
> > wrote:
> > >
> > > This is the v5 of this series which tries to implement the fd-based KVM
> > > guest private memory. The patches are based on latest kvm/queue branch
> > > commit:
> > >
> > >   d5089416b7fb KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
> >
> > Can this series be run and a VM booted without TDX?  A feature like
> > that might help push it forward.
> >
> > --Andy
> 
> I have posted a RFC series with selftests to exercise the UPM feature
> with normal non-confidential VMs via
> https://lore.kernel.org/kvm/20220408210545.3915712-1-vannapu...@google.com/

Thanks Vishal, this sounds very helpful, it already started to find
bugs.

Chao
> 
> -- Vishal



Re: [Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic

2022-04-12 Thread Divya Garg



On 12/04/22 6:18 pm, Vitaly Kuznetsov wrote:

Divya Garg  writes:


Hi Vitaly Kuznetsov !
I was working on hyperv flags and saw that we introduced new
dependencies some
time back
(https://urldefense.proofpoint.com/v2/url?u=https-3A__sourcegraph.com_github.com_qemu_qemu_-2D_commit_c686193072a47032d83cb4e131dc49ae30f9e5d7-3Fvisible-3D1=DwIBAg=s883GpUCOChKOHiocYtGcg=2QGHz-fTCVWImEBKe1ZcSe5t6UfasnhvdzD5DcixwOE=ln-t0rKlkFkOEKe97jJTLi2BoKK5E9lLMPHjPihl4kpdbvBStPeD0Ku9wTed7GPf=AtipQDs1Mi-0FQtb1AyvBpR34bpjp64troGF_nr_08E=
 ).
After these changes, if we try to live migrate a vm from older qemu to newer
one having these changes, it fails showing dependency issue.

I was wondering if this is the expected behaviour or if there is any work
around for handling it? Or something needs to be done to ensure backward
compatibility ?

Hi Divya,

configurations with 'hv-stimer' and without 'hv-synic'/'hv-time' were
always incorrect as Windows can't use the feature, that's why the
dependencies were added. It is true that it doesn't seem to be possible
to forward-migrate such VMs to newer QEMU versions. We could've tied
these new dependencies to newer machine types I guess (so old machine
types would not fail to start) but we didn't do that back in 4.1 and
it's been awhile since... Not sure whether it would make much sense to
introduce something for pre-4.1 machine types now.

Out of curiosity, why do such "incorrect" configurations exist? Can you
just update them to include missing flags on older QEMU so they migrate
to newer ones without issues?


Hi Vitaly !

Thanks for the response. I understand that these were incorrect
configurations and should be corrected. The only issue is, we want to avoid
power cycling those VMs. But yeah I think, since the configurations were
wrong, we should update and power cycle the VM. Just for understanding
purposes, is it possible to disable the feature by throwing out some warning
message, and update libvirt to mitigate this change and handle live
migration?

Or maybe update libvirt to not ask for this feature from qemu during live
migration, and handle different configurations on the source and destination
hosts?


Thanks
Regards
Divya



[PATCH] hw/nvme: fix narrowing conversion

2022-04-12 Thread Dmitry Tikhov
Since nlbas is of type int, it does not work with large namespace size
values. For example, a 9 TB file backing the namespace and 8 bytes of
metadata with a 4096-byte lbasz give a negative nlbas value, which is
later promoted to a negative int64_t value and results in a negative
ns->moff, which breaks the namespace.

Signed-off-by: Dmitry Tikhov 
---
 hw/nvme/ns.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
index 324f53ea0c..af6504fad2 100644
--- a/hw/nvme/ns.c
+++ b/hw/nvme/ns.c
@@ -29,7 +29,8 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 {
 NvmeIdNs *id_ns = >id_ns;
 BlockDriverInfo bdi;
-int npdg, nlbas, ret;
+int npdg, ret;
+int64_t nlbas;
 
 ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
 ns->lbasz = 1 << ns->lbaf.ds;
@@ -42,7 +43,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 id_ns->ncap = id_ns->nsze;
 id_ns->nuse = id_ns->ncap;
 
-ns->moff = (int64_t)nlbas << ns->lbaf.ds;
+ns->moff = nlbas << ns->lbaf.ds;
 
 npdg = ns->blkconf.discard_granularity / ns->lbasz;
 
-- 
2.35.1
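As a side note, the overflow the patch describes can be reproduced in isolation. The following standalone sketch uses the sizes from the commit message (not the QEMU source); helper names are illustrative:

```c
#include <stdint.h>

/*
 * Sketch of the narrowing bug: the LBA count of a 9 TiB namespace with
 * 4096-byte LBAs plus 8 bytes of metadata exceeds INT_MAX, so storing it
 * in an int wraps to a negative value (on the usual two's-complement
 * ABIs), which then sign-extends into a negative moff.
 */
static int64_t nlbas_for(int64_t size, int64_t lbasz, int64_t ms)
{
    return size / (lbasz + ms);
}

static int64_t moff_before_fix(int64_t size)
{
    int nlbas = (int)nlbas_for(size, 4096, 8); /* narrowing, as before the patch */
    return (int64_t)nlbas << 12;               /* 1 << ds with ds = 12 */
}

static int64_t moff_after_fix(int64_t size)
{
    int64_t nlbas = nlbas_for(size, 4096, 8);  /* 64-bit, as after the patch */
    return nlbas << 12;
}
```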




Re: [PATCH] block: fix core for unlock not permitted

2022-04-12 Thread suruifeng (A)
Hi,

The recurrence probability is extremely low. I have not reproduced this on the
latest version. However, after reviewing the latest code, we find that the
issue also exists there.

This is my understanding of the latest code; if there is a mistake in my
understanding, please tell me.

bdrv_flush_all()                        // bdrv_next() runs in a loop; shown in execution order
-> bdrv_next()
   -> bdrv_ref(bs)                      // ref mirror_top_bs, current ref=2
-> bdrv_flush()                         // generated by block-coroutine-wrapper.py
   -> bdrv_poll_co()
      -> BDRV_POLL_WHILE()
         -> mirror_exit_common()        // triggered by I/O error on iothread context
            -> bdrv_unref(mirror_top_bs)  // unref mirror_top_bs, current ref=1

-> bdrv_next()
   -> bdrv_unref(old_bs)                // unref mirror_top_bs, current ref=0
      -> bdrv_delete()
         -> bdrv_close()
            -> bdrv_flush()             // generated by block-coroutine-wrapper.py
               -> bdrv_poll_co()
                  -> BDRV_POLL_WHILE()
                     -> aio_context_release(ctx_);  // iothread ctx was not acquired, so the core dump is triggered

Suruifeng


-Original Message-
From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo Bonzini
Sent: April 12, 2022 19:05
To: suruifeng (A) ; qemu-devel@nongnu.org
Subject: Re: [PATCH] block: fix core for unlock not permitted
主题: Re: [PATCH] block: fix core for unlock not permitted

On 4/12/22 09:13, suruifeng via wrote:
> qemu coredump:
>0x7f9e7205c81b in raise () from /usr/lib64/libc.so.6
>0x7f9e7205db41 in abort () from /usr/lib64/libc.so.6
>0x7f9e71ddbe94 in error_exit (err=, 
> msg=msg@entry=0x7f9e71ec1b50 <__func__.20287> "qemu_mutex_unlock_impl")
>  at /usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:36
>0x7f9e71ddc61f in qemu_mutex_unlock_impl 
> (mutex=mutex@entry=0x5559850b0b90, file=file@entry=0x7f9e71ec0978 
> "/home/abuild/rpmbuild/BUILD/qemu-4.1.0/util/async.c",
>  line=line@entry=524) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:108
>0x7f9e71dd5bb5 in aio_context_release (ctx=ctx@entry=0x5559850b0b30) 
> at /usr/src/debug/qemu-4.1.0-170.x86_64/util/async.c:524
>0x7f9e70dfed28 in bdrv_flush (bs=bs@entry=0x5559851f0a20) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2778
>0x7f9e70e37f63 in bdrv_close (bs=bs@entry=0x5559851f0a20) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4025
>0x7f9e70e38193 in bdrv_delete (bs=0x5559851f0a20) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4271
>0x7f9e70e38225 in bdrv_unref (bs=) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block.c:5612
>0x7f9e70df9a92 in bdrv_next (it=it@entry=0x7ffc5e3547a0) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block/block-backend.c:576
>0x7f9e70dfee76 in bdrv_flush_all () at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2074
>0x7f9e71e3a08f in do_vm_stop (state=state@entry=RUN_STATE_SHUTDOWN, 
> send_stop=send_stop@entry=false) at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1140
>0x7f9e71e3a14c in vm_shutdown () at 
> /usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1151
> 
> During a mirror job run, the VM is shut down. During the shutdown, a mirror
> job I/O error triggers mirror_exit_common().
> In bdrv_flush_all(), bdrv_next() first increases the ref to mirror_top_bs,
> then bdrv_flush(bs) calls BDRV_POLL_WHILE and executes
> mirror_exit_common(), which decreases the ref to mirror_top_bs; finally
> bdrv_next() decreases the ref to mirror_top_bs, releasing mirror_top_bs.
> 
> Let's fix this by adding aio_context_acquire() and aio_context_release() to 
> bdrv_next().

Hi,

is this reproducible with a more recent version of QEMU?  In particular, 
bdrv_next does not have bdrv_unref anymore.

Paolo

> Signed-off-by: suruifeng 
> ---
>   block/block-backend.c | 10 ++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c index 
> e0e1aff4b1..5ae745c0ab 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -593,6 +593,7 @@ BlockBackend *blk_next(BlockBackend *blk)
>   BlockDriverState *bdrv_next(BdrvNextIterator *it)
>   {
>   BlockDriverState *bs, *old_bs;
> +AioContext *ctx = NULL;
>   
>   /* Must be called from the main loop */
>   assert(qemu_get_current_aio_context() == 
> qemu_get_aio_context()); @@ -613,11 +614,17 @@ BlockDriverState 
> *bdrv_next(BdrvNextIterator *it)
>   if (it->blk) {
>   blk_ref(it->blk);
>   }
> + ctx = blk_get_aio_context(old_blk);
> + aio_context_acquire(ctx);
>   blk_unref(old_blk);
> + aio_context_release(ctx);
>   
>   if 

Re: [PATCH v5 12/13] KVM: Expose KVM_MEM_PRIVATE

2022-04-12 Thread Chao Peng
On Tue, Mar 29, 2022 at 07:13:00PM +, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
> > KVM_MEM_PRIVATE is not exposed by default, but architecture code can turn
> > it on by implementing kvm_arch_private_memory_supported().
> > 
> > Signed-off-by: Yu Zhang 
> > Signed-off-by: Chao Peng 
> > ---
> >  include/linux/kvm_host.h |  1 +
> >  virt/kvm/kvm_main.c  | 24 +++-
> >  2 files changed, 20 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 186b9b981a65..0150e952a131 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -1432,6 +1432,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct 
> > kvm_vcpu *vcpu);
> >  int kvm_arch_post_init_vm(struct kvm *kvm);
> >  void kvm_arch_pre_destroy_vm(struct kvm *kvm);
> >  int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> > +bool kvm_arch_private_memory_supported(struct kvm *kvm);
> >  
> >  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
> >  /*
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 52319f49d58a..df5311755a40 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -1485,10 +1485,19 @@ static void kvm_replace_memslot(struct kvm *kvm,
> > }
> >  }
> >  
> > -static int check_memory_region_flags(const struct 
> > kvm_userspace_memory_region *mem)
> > +bool __weak kvm_arch_private_memory_supported(struct kvm *kvm)
> > +{
> > +   return false;
> > +}
> > +
> > +static int check_memory_region_flags(struct kvm *kvm,
> > +   const struct kvm_userspace_memory_region *mem)
> >  {
> > u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
> >  
> > +   if (kvm_arch_private_memory_supported(kvm))
> > +   valid_flags |= KVM_MEM_PRIVATE;
> > +
> >  #ifdef __KVM_HAVE_READONLY_MEM
> > valid_flags |= KVM_MEM_READONLY;
> >  #endif
> > @@ -1900,7 +1909,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > int as_id, id;
> > int r;
> >  
> > -   r = check_memory_region_flags(mem);
> > +   r = check_memory_region_flags(kvm, mem);
> > if (r)
> > return r;
> >  
> > @@ -1913,10 +1922,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > return -EINVAL;
> > if (mem->guest_phys_addr & (PAGE_SIZE - 1))
> > return -EINVAL;
> > -   /* We can read the guest memory with __xxx_user() later on. */
> > if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
> > -   (mem->userspace_addr != untagged_addr(mem->userspace_addr)) ||
> > -!access_ok((void __user *)(unsigned long)mem->userspace_addr,
> > +   (mem->userspace_addr != untagged_addr(mem->userspace_addr)))
> > +   return -EINVAL;
> > +   /* We can read the guest memory with __xxx_user() later on. */
> > +   if (!(mem->flags & KVM_MEM_PRIVATE) &&
> > +   !access_ok((void __user *)(unsigned long)mem->userspace_addr,
> 
> This should sanity check private_offset for private memslots.  At a bare 
> minimum,
> wrapping should be disallowed.

Will add this.

> 
> > mem->memory_size))
> > return -EINVAL;
> > if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
> > @@ -1957,6 +1968,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > if ((kvm->nr_memslot_pages + npages) < kvm->nr_memslot_pages)
> > return -EINVAL;
> > } else { /* Modify an existing slot. */
> > +   /* Private memslots are immutable, they can only be deleted. */
> > +   if (mem->flags & KVM_MEM_PRIVATE)
> > +   return -EINVAL;
> 
> These sanity checks belong in "KVM: Register private memslot to memory 
> backing store",
> e.g. that patch is "broken" without the immutability restriction.  It's 
> somewhat moot
> because the code is unreachable, but it makes reviewing confusing/difficult.
> 
> But rather than move the sanity checks back, I think I'd prefer to pull all 
> of patch 10
> here.  I think it also makes sense to drop "KVM: Use memfile_pfn_ops to 
> obtain pfn for
> private pages" and add the pointer in "struct kvm_memory_slot" in patch "KVM: 
> Extend the
> memslot to support fd-based private memory", with the use of the ops folded 
> into
> "KVM: Handle page fault for private memory".  Adding code to KVM and KVM-x86 
> in a single
> patch is ok, and overall makes things easier to review because the new 
> helpers have a
> user right away, especially since there will be #ifdeffery.
> 
> I.e. end up with something like:
> 
>   mm: Introduce memfile_notifier
>   mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK
>   KVM: Extend the memslot to support fd-based private memory
>   KVM: Use kvm_userspace_memory_region_ext
>   KVM: Add KVM_EXIT_MEMORY_ERROR exit
>   KVM: Handle page fault for private memory
>   KVM: Register private memslot to memory backing store
>   KVM: Zap existing KVM mappings when pages changed in the private fd
>   KVM: Enable and expose KVM_MEM_PRIVATE


Re: [BUG]QEMU jump into interrupt when single-stepping on aarch64

2022-04-12 Thread Shuai Xue
On 2022/4/7 12:10 PM, Shuai Xue wrote:
> On 2022/4/7 12:57 AM, Richard Henderson wrote:
>> On 4/6/22 09:30, Shuai Xue wrote:
>>> Dear, folks,
>>>
>>> I am trying to debug the Linux kernel with QEMU in single-stepping mode on
>>> the aarch64 platform;
>>> the added breakpoint hits, but after I type `step`, gdb always jumps
>>> into an interrupt.
>>>
>>> My env:
>>>
>>> gdb-10.2
>>> qemu-6.2.0
>>> host kernel: 5.10.84
>>> VM kernel: 5.10.84
>>>
>>> The steps to reproduce:
>>> # host console: run a VM with only one core; the important arg: <qemu:arg value='-s'/>
>>> # details can be found here: 
>>> https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt
>>> virsh create dev_core0.xml
>>> 
>>> # run gdb client
>>> gdb ./vmlinux
>>>
>>> # gdb client on host console
>>> (gdb) dir 
>>> ./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64
>>> (gdb) target remote localhost:1234
>>> (gdb) info b
>>> Num Type   Disp Enb Address    What
>>> 1   breakpoint keep y   
>>> 1.1 y   0x800010361444 
>>> mm/memory-failure.c:1318
>>> 1.2 y   0x800010361450 in memory_failure
>>>     at 
>>> mm/memory-failure.c:1488
>>> (gdb) c
>>> Continuing.
>>>
>>> # console in VM, use madvise to inject a hwposion at virtual address 
>>> vaddr,
>>> # which will hit the b inmemory_failur: madvise(vaddr, pagesize, 
>>> MADV_HWPOISON);
>>> # and the VM pause
>>> ./run_madvise.c
>>>
>>> # gdb client on host console
>>> (gdb)
>>> Continuing.
>>> Breakpoint 1, 0x800010361444 in memory_failure () at 
>>> mm/memory-failure.c:1318
>>> 1318    res = -EHWPOISON;
>>> (gdb) n
>>> vectors () at arch/arm64/kernel/entry.S:552
>>> 552 kernel_ventry   1, irq  // IRQ 
>>> EL1h
>>
>> The 'n' command is not a single-step: use stepi, which will suppress 
>> interrupts.
>> Anyway, not a bug.
>>
>> r~
> 
> Hi, Richard,
> 
> Thank you for your quick reply. I also tried `stepi`, but it does NOT work
> either.
> 
>   (gdb) c
>   Continuing.
> 
>   Breakpoint 1, memory_failure (pfn=1273982, flags=1) at 
> mm/memory-failure.c:1488
>   1488{
>   (gdb) stepi
>   vectors () at arch/arm64/kernel/entry.S:552
>   552 kernel_ventry   1, irq  // IRQ 
> EL1h
> 
> According to QEMU doc[1]: the default single stepping behavior is step with 
> the IRQs
> and timer service routines off. I checked the MASK bits used to control
> single stepping on my machine, as below:
> 
>   # gdb client on host (x86 platform)
>   (gdb) maintenance packet qqemu.sstepbits
>   sending: "qqemu.sstepbits"
>   received: "ENABLE=1,NOIRQ=2,NOTIMER=4"
> 
> The sstep MASK looks as expected, but does not work as expected.
> 
> I also tried the same kernel and QEMU versions on the x86 platform:
>>> gdb-10.2
>>> qemu-6.2.0
>>> host kernel: 5.10.84
>>> VM kernel: 5.10.84
> 
> 
> There, the command `n` steps to the next line, as expected.
> 
>   # gdb client on host (x86 platform)
>   (gdb) b memory-failure.c:1488
>   Breakpoint 1, memory_failure (pfn=1128931, flags=1) at 
> mm/memory-failure.c:1488
>   1488{
>   (gdb) n
>   1497if (!sysctl_memory_failure_recovery)
>   (gdb) stepi
>   0x812efdbc  1497if 
> (!sysctl_memory_failure_recovery)
>   (gdb) stepi
>   0x812efdbe  1497if 
> (!sysctl_memory_failure_recovery)
>   (gdb) n
>   1500p = pfn_to_online_page(pfn);
>   (gdb) l
>   1496
>   1497if (!sysctl_memory_failure_recovery)
>   1498panic("Memory failure on page %lx", pfn);
>   1499
>   1500p = pfn_to_online_page(pfn);
>   1501if (!p) {
> 
> Best Regards,
> Shuai
> 
> 
> [1] https://github.com/qemu/qemu/blob/master/docs/system/gdb.rst

Hi, Richard,

I was wondering whether you have any comments on this?

Best Regards,
Shuai



Re: [Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic

2022-04-12 Thread Vitaly Kuznetsov
Divya Garg  writes:

> Hi Vitaly Kuznetsov !
> I was working on hyperv flags and saw that we introduced new 
> dependencies some
> time back 
> (https://sourcegraph.com/github.com/qemu/qemu/-/commit/c686193072a47032d83cb4e131dc49ae30f9e5d7?visible=1).
> After these changes, if we try to live migrate a VM from an older QEMU to a
> newer one having these changes, it fails showing a dependency issue.
>
> I was wondering if this is the expected behaviour, or if there is any
> workaround for handling it? Or does something need to be done to ensure
> backward compatibility?

Hi Divya,

configurations with 'hv-stimer' and without 'hv-synic'/'hv-time' were
always incorrect as Windows can't use the feature, that's why the
dependencies were added. It is true that it doesn't seem to be possible
to forward-migrate such VMs to newer QEMU versions. We could've tied
these new dependencies to newer machine types I guess (so old machine
types would not fail to start) but we didn't do that back in 4.1 and
it's been a while since... Not sure whether it would make much sense to
introduce something for pre-4.1 machine types now.

Out of curiosity, why do such "incorrect" configurations exist? Can you
just update them to include missing flags on older QEMU so they migrate
to newer ones without issues?

-- 
Vitaly
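For reference, a guest configuration that satisfies the dependencies introduced in QEMU 4.1 enables the flags together. An illustrative command line (a sketch, not a complete invocation; `hv-vpindex` is included on the assumption that `hv-synic` depends on it, per the same dependency series):

```
qemu-system-x86_64 -enable-kvm \
    -cpu host,hv-vpindex,hv-time,hv-synic,hv-stimer \
    ...
```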




Re: [PATCH v5 11/13] KVM: Zap existing KVM mappings when pages changed in the private fd

2022-04-12 Thread Chao Peng
On Tue, Mar 29, 2022 at 07:23:04PM +, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 67349421eae3..52319f49d58a 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -841,8 +841,43 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
> >  #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
> >  
> >  #ifdef CONFIG_MEMFILE_NOTIFIER
> > +static void kvm_memfile_notifier_handler(struct memfile_notifier *notifier,
> > +pgoff_t start, pgoff_t end)
> > +{
> > +   int idx;
> > +   struct kvm_memory_slot *slot = container_of(notifier,
> > +   struct kvm_memory_slot,
> > +   notifier);
> > +   struct kvm_gfn_range gfn_range = {
> > +   .slot   = slot,
> > +   .start  = start - (slot->private_offset >> PAGE_SHIFT),
> > +   .end= end - (slot->private_offset >> PAGE_SHIFT),
> > +   .may_block  = true,
> > +   };
> > +   struct kvm *kvm = slot->kvm;
> > +
> > +   gfn_range.start = max(gfn_range.start, slot->base_gfn);
> > +   gfn_range.end = min(gfn_range.end, slot->base_gfn + slot->npages);
> > +
> > +   if (gfn_range.start >= gfn_range.end)
> > +   return;
> > +
> > +   idx = srcu_read_lock(&kvm->srcu);
> > +   KVM_MMU_LOCK(kvm);
> > +   kvm_unmap_gfn_range(kvm, &gfn_range);
> > +   kvm_flush_remote_tlbs(kvm);
> 
> This should check the result of kvm_unmap_gfn_range() and flush only if 
> necessary.

Yep.

> 
> kvm->mmu_notifier_seq needs to be incremented, otherwise KVM will incorrectly
> install a SPTE if the mapping is zapped between retrieving the pfn in faultin 
> and
> installing it after acquire mmu_lock.

Good catch.

Chao
> 
> 
> > +   KVM_MMU_UNLOCK(kvm);
> > +   srcu_read_unlock(&kvm->srcu, idx);
> > +}
> > +
> > +static struct memfile_notifier_ops kvm_memfile_notifier_ops = {
> > +   .invalidate = kvm_memfile_notifier_handler,
> > +   .fallocate = kvm_memfile_notifier_handler,
> > +};
> > +
> >  static inline int kvm_memfile_register(struct kvm_memory_slot *slot)
> >  {
> > +   slot->notifier.ops = &kvm_memfile_notifier_ops;
> > return memfile_register_notifier(file_inode(slot->private_file),
> >  &slot->notifier,
> >  &slot->pfn_ops);
> > @@ -1963,6 +1998,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > new->private_file = file;
> > new->private_offset = mem->flags & KVM_MEM_PRIVATE ?
> >   region_ext->private_offset : 0;
> > +   new->kvm = kvm;
> >  
> > r = kvm_set_memslot(kvm, old, new, change);
> > if (!r)
> > -- 
> > 2.17.1
> > 



Re: [PATCH v5 10/13] KVM: Register private memslot to memory backing store

2022-04-12 Thread Chao Peng
On Tue, Mar 29, 2022 at 07:01:52PM +, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
> > Add 'notifier' to memslot to make it a memfile_notifier node and then
> > register it to memory backing store via memfile_register_notifier() when
> > memslot gets created. When memslot is deleted, do the reverse with
> > memfile_unregister_notifier(). Note each KVM memslot can be registered
> > to different memory backing stores (or the same backing store but at
> > different offset) independently.
> > 
> > Signed-off-by: Yu Zhang 
> > Signed-off-by: Chao Peng 
> > ---
> >  include/linux/kvm_host.h |  1 +
> >  virt/kvm/kvm_main.c  | 75 
> >  2 files changed, 70 insertions(+), 6 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 6e1d770d6bf8..9b175aeca63f 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -567,6 +567,7 @@ struct kvm_memory_slot {
> > struct file *private_file;
> > loff_t private_offset;
> > struct memfile_pfn_ops *pfn_ops;
> > +   struct memfile_notifier notifier;
> >  };
> >  
> >  static inline bool kvm_slot_is_private(const struct kvm_memory_slot *slot)
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index d11a2628b548..67349421eae3 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -840,6 +840,37 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
> >  
> >  #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
> >  
> > +#ifdef CONFIG_MEMFILE_NOTIFIER
> > +static inline int kvm_memfile_register(struct kvm_memory_slot *slot)
> 
> This is a good opportunity to hide away the memfile details a bit.  Maybe
> kvm_private_mem_{,un}register()?

Happy to change.

> 
> > +{
> > +   return memfile_register_notifier(file_inode(slot->private_file),
> > +&slot->notifier,
> > +&slot->pfn_ops);
> > +}
> > +
> > +static inline void kvm_memfile_unregister(struct kvm_memory_slot *slot)
> > +{
> > +   if (slot->private_file) {
> > +   memfile_unregister_notifier(file_inode(slot->private_file),
> > +   &slot->notifier);
> > +   fput(slot->private_file);
> 
> This should not do fput(), it makes the helper imbalanced with respect to the
> register path and will likely lead to double fput().  Indeed, if preparing the
> region fails, __kvm_set_memory_region() will double up on fput() due to 
> checking
> its local "file" for null, not slot->private for null.

Right.

> 
> > +   slot->private_file = NULL;
> > +   }
> > +}
> > +
> > +#else /* !CONFIG_MEMFILE_NOTIFIER */
> > +
> > +static inline int kvm_memfile_register(struct kvm_memory_slot *slot)
> > +{
> 
> This should WARN_ON_ONCE().  Ditto for unregister.
> 
> > +   return -EOPNOTSUPP;
> > +}
> > +
> > +static inline void kvm_memfile_unregister(struct kvm_memory_slot *slot)
> > +{
> > +}
> > +
> > +#endif /* CONFIG_MEMFILE_NOTIFIER */
> > +
> >  #ifdef CONFIG_HAVE_KVM_PM_NOTIFIER
> >  static int kvm_pm_notifier_call(struct notifier_block *bl,
> > unsigned long state,
> > @@ -884,6 +915,9 @@ static void kvm_destroy_dirty_bitmap(struct 
> > kvm_memory_slot *memslot)
> >  /* This does not remove the slot from struct kvm_memslots data structures 
> > */
> >  static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> >  {
> > +   if (slot->flags & KVM_MEM_PRIVATE)
> > +   kvm_memfile_unregister(slot);
> 
> With fput() move out of unregister, this needs to be:

Agreed.

> 
>   if (slot->flags & KVM_MEM_PRIVATE) {
>   kvm_private_mem_unregister(slot);
>   fput(slot->private_file);
>   }
> > +
> > kvm_destroy_dirty_bitmap(slot);
> >  
> > kvm_arch_free_memslot(kvm, slot);
> > @@ -1738,6 +1772,12 @@ static int kvm_set_memslot(struct kvm *kvm,
> > kvm_invalidate_memslot(kvm, old, invalid_slot);
> > }
> >  
> > +   if (new->flags & KVM_MEM_PRIVATE && change == KVM_MR_CREATE) {
> > +   r = kvm_memfile_register(new);
> > +   if (r)
> > +   return r;
> > +   }
> 
> This belongs in kvm_prepare_memory_region().  The shenanigans for DELETE and 
> MOVE
> are special.

Sure.

> 
> > +
> > r = kvm_prepare_memory_region(kvm, old, new, change);
> > if (r) {
> > /*
> > @@ -1752,6 +1792,10 @@ static int kvm_set_memslot(struct kvm *kvm,
> > } else {
> > mutex_unlock(&kvm->slots_arch_lock);
> > }
> > +
> > +   if (new->flags & KVM_MEM_PRIVATE && change == KVM_MR_CREATE)
> > +   kvm_memfile_unregister(new);
> > +
> > return r;
> > }
> >  
> > @@ -1817,6 +1861,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> > enum kvm_mr_change change;
> > unsigned long npages;
> > gfn_t base_gfn;
> > +   struct file *file = NULL;
> 
> Nit, 

Re: [PATCH] Warn user if the vga flag is passed but no vga device is created

2022-04-12 Thread Peter Maydell
On Tue, 12 Apr 2022 at 13:13, Thomas Huth  wrote:
> On 08/04/2022 12.45, Gautam Agrawal wrote:
> > +if (!vga_interface_created && !default_vga) {
> > +warn_report("No vga device is created");
>
> I'm not a native speaker, and maybe it's just a matter of taste, but I'd
> rather say it in past tense: "No VGA device has been created"

I think we could phrase the warning to tell the user more
clearly what has happened:

"A -vga option was passed but this machine type does not use that
option; no VGA device has been created"

thanks
-- PMM



Re: [PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2

2022-04-12 Thread David Hildenbrand
On 05.04.22 19:03, David Miller wrote:
> Recommendation for comment?
> 
> /* vri-d encoding matches vrr for 4b imm.
>   .insn does not handle this encoding variant.
> */
> 

Sorry for the late reply.

".insn doesn't handle vri-d properly. So instead, we use vrr, which
matches vri-d with a 4b imm -- good enough for our purpose."


-- 
Thanks,

David / dhildenb




Re: [RFC PATCH 0/4] 9pfs: Add 9pfs support for Windows host

2022-04-12 Thread Christian Schoenebeck
On Freitag, 8. April 2022 19:10:09 CEST Bin Meng wrote:
> At present there is no Windows support for 9p file system.
> This series adds initial Windows support for 9p file system.

Nice!

> Only 'local' file system driver backend is supported. security_model
> should be 'none' due to limitations on Windows host.

We have 3 fs drivers: local, synth, proxy. I don't mind about proxy; it is in
bad shape and we will probably deprecate it in the near future anyway. But it
would be good to have support for the synth driver, because we are using it
for running test cases and fuzzing tests (QA).

What are the limitations against security_model=mapped on Windows? Keep in 
mind that with security_model=none you are very limited in what you can do 
with 9p.

> Example command line to test:
> 
>   "-fsdev local,path=c:\msys64,security_model=none,id=p9 -device
> virtio-9p-pci,fsdev=p9,mount_tag=p9fs"
> 
> 
> Guohuai Shi (4):
>   fsdev: Add missing definitions for Windows in file-op-9p.h
>   hw/9pfs: Update 'local' file system backend driver to support Windows
>   fsdev: Enable 'local' file system driver backend for Windows
>   meson.build: Turn on virtfs for Windows host
> 
>  meson.build |  10 +-
>  fsdev/file-op-9p.h  |  33 ++
>  hw/9pfs/9p-util.h   |   4 +
>  hw/9pfs/9p.h|  22 
>  fsdev/qemu-fsdev.c  |   2 +
>  hw/9pfs/9p-local.c  | 273 +++-
>  hw/9pfs/9p.c|  85 +-
>  hw/9pfs/codir.c |  17 +++
>  fsdev/meson.build   |   1 +
>  hw/9pfs/meson.build |  10 +-
>  10 files changed, 449 insertions(+), 8 deletions(-)





[RFC PATCH 4/4] net: slirp: allow CFI with libslirp >= 4.7

2022-04-12 Thread Paolo Bonzini
slirp 4.7 introduces a new CFI-friendly timer callback that does
not pass function pointers within libslirp as callbacks for timers.
Check the version number and, if it is new enough, allow using CFI
even with a system libslirp.

Signed-off-by: Paolo Bonzini 
---
 meson.build | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/meson.build b/meson.build
index 861de93c4f..92a83580a3 100644
--- a/meson.build
+++ b/meson.build
@@ -2485,21 +2485,21 @@ if have_system
 slirp = declare_dependency(link_with: libslirp,
dependencies: slirp_deps,
include_directories: slirp_inc)
+  else
+# slirp <4.7 is incompatible with CFI support in QEMU.  This is because
+# it passes function pointers within libslirp as callbacks for timers.
+# When using a system-wide shared libslirp, the type information for the
+# callback is missing and the timer call produces a false positive with CFI.
+#
+# Now that slirp_opt has been defined, check if the selected slirp is compatible
+# with control-flow integrity.
+if get_option('cfi') and slirp.found() and slirp.version().version_compare('<4.7')
+  error('Control-Flow Integrity is not compatible with system-wide slirp.' \
+ + ' Please configure with --enable-slirp=git or upgrade to libslirp 4.7')
+endif
   endif
 endif
 
-# For CFI, we need to compile slirp as a static library together with qemu.
-# This is because we register slirp functions as callbacks for QEMU Timers.
-# When using a system-wide shared libslirp, the type information for the
-# callback is missing and the timer call produces a false positive with CFI.
-#
-# Now that slirp_opt has been defined, check if the selected slirp is compatible
-# with control-flow integrity.
-if get_option('cfi') and slirp_opt == 'system'
-  error('Control-Flow Integrity is not compatible with system-wide slirp.' \
- + ' Please configure with --enable-slirp=git')
-endif
-
 fdt = not_found
 if have_system
   fdt_opt = get_option('fdt')
-- 
2.35.1




[RFC PATCH 2/4] net: slirp: switch to slirp_new

2022-04-12 Thread Paolo Bonzini
Replace slirp_init with slirp_new, so that a more recent cfg.version
can be specified.

Signed-off-by: Paolo Bonzini 
---
 net/slirp.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/net/slirp.c b/net/slirp.c
index f1e25d741f..b3a92d6e38 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -389,6 +389,7 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
 #if defined(CONFIG_SMBD_COMMAND)
 struct in_addr smbsrv = { .s_addr = 0 };
 #endif
+SlirpConfig cfg = { 0 };
 NetClientState *nc;
 SlirpState *s;
 char buf[20];
@@ -577,12 +578,26 @@ static int net_slirp_init(NetClientState *peer, const 
char *model,
 
 s = DO_UPCAST(SlirpState, nc, nc);
 
-s->slirp = slirp_init(restricted, ipv4, net, mask, host,
-  ipv6, ip6_prefix, vprefix6_len, ip6_host,
-  vhostname, tftp_server_name,
-  tftp_export, bootfile, dhcp,
-  dns, ip6_dns, dnssearch, vdomainname,
-  &slirp_cb, s);
+cfg.version = 3;
+cfg.restricted = restricted;
+cfg.in_enabled = ipv4;
+cfg.vnetwork = net;
+cfg.vnetmask = mask;
+cfg.vhost = host;
+cfg.in6_enabled = ipv6;
+cfg.vprefix_addr6 = ip6_prefix;
+cfg.vprefix_len = vprefix6_len;
+cfg.vhost6 = ip6_host;
+cfg.vhostname = vhostname;
+cfg.tftp_server_name = tftp_server_name;
+cfg.tftp_path = tftp_export;
+cfg.bootfile = bootfile;
+cfg.vdhcp_start = dhcp;
+cfg.vnameserver = dns;
+cfg.vnameserver6 = ip6_dns;
+cfg.vdnssearch = dnssearch;
+cfg.vdomainname = vdomainname;
+s->slirp = slirp_new(&cfg, &slirp_cb, s);
 QTAILQ_INSERT_TAIL(_stacks, s, entry);
 
 /*
-- 
2.35.1





Re: [PATCH v9 09/11] 9p: darwin: Implement compatibility for mknodat

2022-04-12 Thread Christian Schoenebeck
On Freitag, 8. April 2022 17:00:59 CEST Greg Kurz wrote:
> On Fri, 08 Apr 2022 15:52:25 +0200
> 
> Christian Schoenebeck  wrote:
> > On Sonntag, 27. Februar 2022 23:35:20 CEST Will Cohen wrote:
> > > From: Keno Fischer 
> > > 
> > > Darwin does not support mknodat. However, to avoid race conditions
> > > with later setting the permissions, we must avoid using mknod on
> > > the full path instead. We could try to fchdir, but that would cause
> > > problems if multiple threads try to call mknodat at the same time.
> > > However, luckily there is a solution: Darwin includes a function
> > > that sets the cwd for the current thread only.
> > > This should suffice to use mknod safely.
> > 
> > [...]
> > 
> > > diff --git a/hw/9pfs/9p-util-darwin.c b/hw/9pfs/9p-util-darwin.c
> > > index cdb4c9e24c..bec0253474 100644
> > > --- a/hw/9pfs/9p-util-darwin.c
> > > +++ b/hw/9pfs/9p-util-darwin.c
> > > @@ -7,6 +7,8 @@
> > > 
> > >  #include "qemu/osdep.h"
> > >  #include "qemu/xattr.h"
> > > 
> > > +#include "qapi/error.h"
> > > +#include "qemu/error-report.h"
> > > 
> > >  #include "9p-util.h"
> > >  
> > >  ssize_t fgetxattrat_nofollow(int dirfd, const char *filename, const
> > >  char
> > > 
> > > *name, @@ -62,3 +64,34 @@ int fsetxattrat_nofollow(int dirfd, const char
> > > *filename, const char *name, close_preserve_errno(fd);
> > > 
> > >  return ret;
> > >  
> > >  }
> > > 
> > > +
> > > +/*
> > > + * As long as mknodat is not available on macOS, this workaround
> > > + * using pthread_fchdir_np is needed.
> > > + *
> > > + * Radar filed with Apple for implementing mknodat:
> > > + * rdar://FB9862426 (https://openradar.appspot.com/FB9862426)
> > > + */
> > > +#if defined CONFIG_PTHREAD_FCHDIR_NP
> > > +
> > > +int qemu_mknodat(int dirfd, const char *filename, mode_t mode, dev_t
> > > dev)
> > > +{
> > > +int preserved_errno, err;
> > > +if (!pthread_fchdir_np) {
> > > +error_report_once("pthread_fchdir_np() not available on this
> > > version of macOS"); +return -ENOTSUP;
> > > +}
> > > +if (pthread_fchdir_np(dirfd) < 0) {
> > > +return -1;
> > > +}
> > > +err = mknod(filename, mode, dev);
> > 
> > I just tested this on macOS Monterey and realized mknod() seems to require
> > admin privileges on macOS to work. So if you run QEMU as ordinary user on
> > macOS then mknod() would fail with errno=1 (Operation not permitted).
> > 
> > That means a lot of stuff would simply not work on macOS, unless you
> > really
> > want to run QEMU with super user privileges, which does not sound
> > appealing to me. :/
> > 
> > Should we introduce another fake behaviour here, i.e. remapping this on
> > macOS hosts as regular file and make guest believe it would create a
> > device, similar as we already do for mapped links?
> 
> I'd rather keep that for the mapped security mode only to avoid
> confusion, but qemu_mknodat() is also used in passthrough mode.
> 
> Anyway, it seems that macOS's mknod() only creates device files,
> unlike linux (POSIX) which is also used to create FIFOs, sockets
> and regular files. And it also requires elevated privileges,
> CAP_MKNOD, in order to create device files.
> 
> It seems that this implementation of qemu_mknodat() is just missing
> some features that can be implemented with unprivileged syscalls like
> mkfifo(), socket() and open().

+Akihiko on CC.

As always when it comes to POSIX APIs, Apple's documentation on this is far
from great. I actually had to test out what's supported with mknod() on
macOS, in which way, and what is not (tested on macOS 12 "Monterey" only):

* S_IFIFO: works, even as regular user.

* S_IFREG: doesn't work, neither as regular user (ERRNO 1, Operation not 
  permitted), nor as super-user (ERRNO 22, Invalid argument). So we should 
  divert that to a regular open() call on macOS.

* S_IFCHR and S_IFBLK: work as super-user, but don't work for a regular user 
  (ERRNO 1, Operation not permitted). So if 9p is used with passthrough 
  permissions, we should probably stick to the direct mknod() call and accept 
  that the user would need to run QEMU as super-user to get this working. 
  Whereas if 9p is used with mapped permissions, we should fake those devices 
  by creating regular files, storing their type and major/minor numbers there, 
  and that's it. We don't expect the guest to ever try to read/write such 
  block/character devices, right? I.e. I'm assuming that any read/write is 
  handled as an overlay by the Linux kernel at its guest level, correct?

* S_IFSOCK: doesn't work, neither as regular user (ERRNO 1, Operation not 
  permitted), nor as super-user (ERRNO 22, Invalid argument). So we should 
  divert that to a socket() call on macOS.

Thoughts?

Best regards,
Christian Schoenebeck





[RFC PATCH 3/4] net: slirp: add support for CFI-friendly timer API

2022-04-12 Thread Paolo Bonzini
libslirp 4.7 introduces a CFI-friendly version of the .timer_new callback.
The new callback replaces the function pointer with an enum; invoking the
callback is done with a new function slirp_handle_timer.

Support the new API so that CFI can be made compatible with using a system
libslirp.

Signed-off-by: Paolo Bonzini 
---
 net/slirp.c | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/net/slirp.c b/net/slirp.c
index b3a92d6e38..57af42299d 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -184,10 +184,43 @@ static int64_t net_slirp_clock_get_ns(void *opaque)
 return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
 
+typedef struct SlirpTimer SlirpTimer;
 struct SlirpTimer {
 QEMUTimer timer;
+#if SLIRP_CHECK_VERSION(4,7,0)
+Slirp *slirp;
+SlirpTimerId id;
+void *cb_opaque;
+#endif
+};
+
+#if SLIRP_CHECK_VERSION(4,7,0)
+static void net_slirp_init_completed(Slirp *slirp, void *opaque)
+{
+SlirpState *s = opaque;
+s->slirp = slirp;
 }
 
+static void net_slirp_timer_cb(void *opaque)
+{
+SlirpTimer *t = opaque;
+slirp_handle_timer(t->slirp, t->id, t->cb_opaque);
+}
+
+static void *net_slirp_timer_new_opaque(SlirpTimerId id,
+void *cb_opaque, void *opaque)
+{
+SlirpState *s = opaque;
+SlirpTimer *t = g_new(SlirpTimer, 1);
+t->slirp = s->slirp;
+t->id = id;
+t->cb_opaque = cb_opaque;
+timer_init_full(&t->timer, NULL, QEMU_CLOCK_VIRTUAL,
+SCALE_MS, QEMU_TIMER_ATTR_EXTERNAL,
+net_slirp_timer_cb, t);
+return t;
+}
+#else
 static void *net_slirp_timer_new(SlirpTimerCb cb,
  void *cb_opaque, void *opaque)
 {
@@ -197,6 +230,7 @@ static void *net_slirp_timer_new(SlirpTimerCb cb,
 cb, cb_opaque);
 return t;
 }
+#endif
 
 static void net_slirp_timer_free(void *timer, void *opaque)
 {
@@ -231,7 +265,12 @@ static const SlirpCb slirp_cb = {
 .send_packet = net_slirp_send_packet,
 .guest_error = net_slirp_guest_error,
 .clock_get_ns = net_slirp_clock_get_ns,
+#if SLIRP_CHECK_VERSION(4,7,0)
+.init_completed = net_slirp_init_completed,
+.timer_new_opaque = net_slirp_timer_new_opaque,
+#else
 .timer_new = net_slirp_timer_new,
+#endif
 .timer_free = net_slirp_timer_free,
 .timer_mod = net_slirp_timer_mod,
 .register_poll_fd = net_slirp_register_poll_fd,
@@ -578,7 +617,7 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
 
 s = DO_UPCAST(SlirpState, nc, nc);
 
-cfg.version = 3;
+cfg.version = SLIRP_CHECK_VERSION(4,7,0) ? 4 : 3;
 cfg.restricted = restricted;
 cfg.in_enabled = ipv4;
 cfg.vnetwork = net;
-- 
2.35.1





[RFC PATCH 1/4] net: slirp: introduce a wrapper struct for QemuTimer

2022-04-12 Thread Paolo Bonzini
This struct will be extended in the next few patches to support the
new slirp_handle_timer() call.  For that we need to store an additional
"int" for each SLIRP timer, in addition to the cb_opaque.

Signed-off-by: Paolo Bonzini 
---
 net/slirp.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/net/slirp.c b/net/slirp.c
index bc5e9e4f77..f1e25d741f 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -184,23 +184,32 @@ static int64_t net_slirp_clock_get_ns(void *opaque)
 return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
 }
 
+struct SlirpTimer {
+QEMUTimer timer;
+};
+
 static void *net_slirp_timer_new(SlirpTimerCb cb,
  void *cb_opaque, void *opaque)
 {
-return timer_new_full(NULL, QEMU_CLOCK_VIRTUAL,
-  SCALE_MS, QEMU_TIMER_ATTR_EXTERNAL,
-  cb, cb_opaque);
+SlirpTimer *t = g_new(SlirpTimer, 1);
+timer_init_full(&t->timer, NULL, QEMU_CLOCK_VIRTUAL,
+SCALE_MS, QEMU_TIMER_ATTR_EXTERNAL,
+cb, cb_opaque);
+return t;
 }
 
 static void net_slirp_timer_free(void *timer, void *opaque)
 {
-timer_free(timer);
+SlirpTimer *t = timer;
+timer_del(&t->timer);
+g_free(t);
 }
 
 static void net_slirp_timer_mod(void *timer, int64_t expire_timer,
 void *opaque)
 {
-timer_mod(timer, expire_timer);
+SlirpTimer *t = timer;
+timer_mod(&t->timer, expire_timer);
 }
 
 static void net_slirp_register_poll_fd(int fd, void *opaque)
-- 
2.35.1





Re: [PATCH] Warn user if the vga flag is passed but no vga device is created

2022-04-12 Thread Thomas Huth



 Hi,

thanks for your patch, it looks pretty good already, but there is a small 
issue. Try for example:


 ./qemu-system-s390x -vga none

... and it will print the warning "qemu-system-s390x: warning: No vga device 
is created", even though the user explicitly asked for no VGA device. This 
seems to happen when a machine does not have any VGA device by default, but 
"-vga none" is still passed on the command line.


Some more comments below...

On 08/04/2022 12.45, Gautam Agrawal wrote:

This patch is in regard to this issue:
https://gitlab.com/qemu-project/qemu/-/issues/581


Better write this right in front of your Signed-off-by line:

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581

... then the ticket will automatically be closed once your patch gets merged.


A global boolean variable "vga_interface_created" (declared in softmmu/globals.c)
has been used to track the creation of the VGA interface. If the vga flag is
passed on the command line, the "default_vga" variable (declared in softmmu/vl.c)
is set to 0. To warn the user, the condition checks whether vga_interface_created
is false and default_vga is equal to 0.

The warning "No vga device is created" is logged if the vga flag is passed
but no VGA device is created. This patch has been tested on the x86_64,
i386, sparc, sparc64 and arm boards.

Signed-off-by: Gautam Agrawal 
---
  hw/isa/isa-bus.c| 1 +
  hw/pci/pci.c| 1 +
  hw/sparc/sun4m.c| 2 ++
  hw/sparc64/sun4u.c  | 1 +
  include/sysemu/sysemu.h | 1 +
  softmmu/globals.c   | 1 +
  softmmu/vl.c| 3 +++
  7 files changed, 10 insertions(+)


vga_interface_type is also used in hw/mips/fuloong2e.c and 
hw/xenpv/xen_machine_pv.c ... do they need a change, too?



diff --git a/hw/isa/isa-bus.c b/hw/isa/isa-bus.c
index 0ad1c5fd65..cd5ad3687d 100644
--- a/hw/isa/isa-bus.c
+++ b/hw/isa/isa-bus.c
@@ -166,6 +166,7 @@ bool isa_realize_and_unref(ISADevice *dev, ISABus *bus, 
Error **errp)
  
  ISADevice *isa_vga_init(ISABus *bus)

  {
+vga_interface_created = true;
  switch (vga_interface_type) {
  case VGA_CIRRUS:
  return isa_create_simple(bus, "isa-cirrus-vga");
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index dae9119bfe..fab9c80f8d 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2038,6 +2038,7 @@ PCIDevice *pci_nic_init_nofail(NICInfo *nd, PCIBus 
*rootbus,
  
  PCIDevice *pci_vga_init(PCIBus *bus)

  {
+vga_interface_created = true;
  switch (vga_interface_type) {
  case VGA_CIRRUS:
  return pci_create_simple(bus, -1, "cirrus-vga");
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 7f3a7c0027..f45e29acc8 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -921,6 +921,7 @@ static void sun4m_hw_init(MachineState *machine)
  /* sbus irq 5 */
  cg3_init(hwdef->tcx_base, slavio_irq[11], 0x0010,
   graphic_width, graphic_height, graphic_depth);
+vga_interface_created = true;
  } else {
  /* If no display specified, default to TCX */
  if (graphic_depth != 8 && graphic_depth != 24) {
@@ -936,6 +937,7 @@ static void sun4m_hw_init(MachineState *machine)
  
  tcx_init(hwdef->tcx_base, slavio_irq[11], 0x0010,

   graphic_width, graphic_height, graphic_depth);
+vga_interface_created = true;
  }
  }
  
diff --git a/hw/sparc64/sun4u.c b/hw/sparc64/sun4u.c

index cda7df36e3..75334dba71 100644
--- a/hw/sparc64/sun4u.c
+++ b/hw/sparc64/sun4u.c
@@ -633,6 +633,7 @@ static void sun4uv_init(MemoryRegion *address_space_mem,
  switch (vga_interface_type) {
  case VGA_STD:
  pci_create_simple(pci_busA, PCI_DEVFN(2, 0), "VGA");
+vga_interface_created = true;
  break;
  case VGA_NONE:
  break;
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b9421e03ff..a558b895e4 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -32,6 +32,7 @@ typedef enum {
  } VGAInterfaceType;
  
  extern int vga_interface_type;

+extern bool vga_interface_created;
  
  extern int graphic_width;

  extern int graphic_height;
diff --git a/softmmu/globals.c b/softmmu/globals.c
index 3ebd718e35..1a5f8d42ad 100644
--- a/softmmu/globals.c
+++ b/softmmu/globals.c
@@ -40,6 +40,7 @@ int nb_nics;
  NICInfo nd_table[MAX_NICS];
  int autostart = 1;
  int vga_interface_type = VGA_NONE;
+bool vga_interface_created = false;


This will trigger a warning from the scripts/checkpatch.pl script:

ERROR: do not initialise globals to 0 or NULL
#238: FILE: softmmu/globals.c:43:
+bool vga_interface_created = false;


  Chardev *parallel_hds[MAX_PARALLEL_PORTS];
  int win2k_install_hack;
  int singlestep;
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 6f646531a0..cb79fa1f42 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2734,6 +2734,9 @@ static void qemu_machine_creation_done(void)
  if (foreach_device_config(DEV_GDB, gdbserver_start) < 0) {
  exit(1);
  }

[RFC PATCH 0/4] net: support for CFI with libslirp >= 4.7

2022-04-12 Thread Paolo Bonzini
A system libslirp (either static or dynamic) cannot be used with QEMU if
QEMU is compiled with control-flow instrumentation, because of the way
timers are implemented in libslirp.   libslirp passes a function pointer
to the timer_new callback but the type information for the callback is
missing; invoking the timer callback produces a CFI false positive.

The fix requires the introduction of new interfaces in
libslirp.  This series is an example of how QEMU would use
the new interfaces introduced by libslirp merge request at
https://gitlab.freedesktop.org/slirp/libslirp/-/merge_requests/117.
It is RFC-only because the new interfaces have not been accepted yet.

Paolo Bonzini (4):
  net: slirp: introduce a wrapper struct for QemuTimer
  net: slirp: switch to slirp_new
  net: slirp: add support for CFI-friendly timer API
  net: slirp: allow CFI with libslirp >= 4.7

 meson.build | 24 +++
 net/slirp.c | 85 ++---
 2 files changed, 86 insertions(+), 23 deletions(-)

-- 
2.35.1




Re: [PATCH v5 09/13] KVM: Handle page fault for private memory

2022-04-12 Thread Chao Peng
On Tue, Mar 29, 2022 at 01:07:18AM +, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
> > @@ -3890,7 +3893,59 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu 
> > *vcpu, gpa_t cr2_or_gpa,
> > >   kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
> >  }
> >  
> > -static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault 
> > *fault, int *r)
> > +static bool kvm_vcpu_is_private_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
> > +{
> > +   /*
> > +* At this time private gfn has not been supported yet. Other patch
> > +* that enables it should change this.
> > +*/
> > +   return false;
> > +}
> > +
> > +static bool kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
> > +   struct kvm_page_fault *fault,
> > +   bool *is_private_pfn, int *r)
> 
> @is_private_pfn should be a field in @fault, not a separate parameter, and it
> should be a const property set by the original caller.  I would also name it
> "is_private", because if KVM proceeds past this point, it will be a property 
> of
> the fault/access _and_ the pfn
> 
> I say it's a property of the fault because the below kvm_vcpu_is_private_gfn()
> should instead be:
> 
>   if (fault->is_private)
> 
> The kvm_vcpu_is_private_gfn() check is TDX centric.  For SNP, private vs. 
> shared
> is communicated via error code.  For software-only (I'm being optimistic ;-) 
> ),
> we'd probably need to track private vs. shared internally in KVM, I don't 
> think
> we'd want to force it to be a property of the gfn.

Make sense.

> 
> Then you can also move the fault->is_private waiver into 
> is_page_fault_stale(),
> and drop the local is_private_pfn in direct_page_fault().
> 
> > +{
> > +   int order;
> > +   unsigned int flags = 0;
> > +   struct kvm_memory_slot *slot = fault->slot;
> > > +   long pfn = kvm_memfile_get_pfn(slot, fault->gfn, &order);
> 
> If get_lock_pfn() and thus kvm_memfile_get_pfn() returns a pure error code 
> instead
> of multiplexing the pfn, then this can be:
> 
>   bool is_private_pfn;
> 
> >   is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn, &fault->pfn,
> >   &order);
> 
> That self-documents the "pfn < 0" == shared logic.

Yes, agreed.

> 
> > +
> > +   if (kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT)) {
> > +   if (pfn < 0)
> > +   flags |= KVM_MEMORY_EXIT_FLAG_PRIVATE;
> > +   else {
> > +   fault->pfn = pfn;
> > +   if (slot->flags & KVM_MEM_READONLY)
> > +   fault->map_writable = false;
> > +   else
> > +   fault->map_writable = true;
> > +
> > +   if (order == 0)
> > +   fault->max_level = PG_LEVEL_4K;
> 
> This doesn't correctly handle order > 0, but less than the next page size, in 
> which
> case max_level needs to be PG_LEVEL_4k.  It also doesn't handle the case where
> max_level > PG_LEVEL_2M.
> 
> That said, I think the proper fix is to have the get_lock_pfn() API return 
> the max
> mapping level, not the order.  KVM, and presumably any other secondary MMU 
> that might
> use these APIs, doesn't care about the order of the struct page, KVM cares 
> about the
> max size/level of page it can map into the guest.  And similar to the 
> previous patch,
> "order" is specific to struct page, which we are trying to avoid.

I remember I suggested returning the max mapping level instead of the order,
but Kirill reminded me that PG_LEVEL_* is x86-specific, so I changed back
to 'order'. It's just a matter of whether the backing store or KVM converts
'order' to a mapping level.

> 
> > +   *is_private_pfn = true;
> 
> This is where KVM guarantees that is_private_pfn == fault->is_private.
> 
> > +   *r = RET_PF_FIXED;
> > +   return true;
> 
> Ewww.  This is super confusing.  Ditto for the "*r = -1" magic number.  I 
> totally
> understand why you took this approach, it's just hard to follow because it 
> kinda
> follows the kvm_faultin_pfn() semantics, but then inverts true and false in 
> this
> one case.
> 
> I think the least awful option is to forego the helper and open code 
> everything.
> If we ever refactor kvm_faultin_pfn() to be less weird then we can maybe move 
> this
> to a helper.
> 
> Open coding isn't too bad if you reorganize things so that the 
> exit-to-userspace
> path is a dedicated, early check.  IMO, it's a lot easier to read this way, 
> open
> coded or not.

Yes, the existing way of handling this is really awful, including the handling
for 'r' that will finally be returned to KVM_RUN as part of the uAPI. Let me
try your above suggestion.

> 
> I think this is correct?  "is_private_pfn" and "level" are locals, everything 
> else
> is in @fault.
> 
>   if (kvm_slot_is_private(slot)) {
>   is_private_pfn = !!kvm_memfile_get_pfn(slot, fault->gfn,
>  

Re: [PATCH for-7.1] target/i386: Remove unused XMMReg, YMMReg types and CPUState fields

2022-04-12 Thread Paolo Bonzini
Queued, thanks.

Paolo





[PATCH for-7.1] target/mips: Remove stale TODO file

2022-04-12 Thread Thomas Huth
The last change to this file was made in 2012, so it seems like it
is not really used anymore, and the content is likely very out of
date now.

Signed-off-by: Thomas Huth 
---
 target/mips/TODO | 51 
 1 file changed, 51 deletions(-)
 delete mode 100644 target/mips/TODO

diff --git a/target/mips/TODO b/target/mips/TODO
deleted file mode 100644
index 1d782d8027..00
--- a/target/mips/TODO
+++ /dev/null
@@ -1,51 +0,0 @@
-Unsolved issues/bugs in the mips/mipsel backend

-
-General

-- Unimplemented ASEs:
-  - MDMX
-  - SmartMIPS
-  - microMIPS DSP r1 & r2 encodings
-- MT ASE only partially implemented and not functional
-- Shadow register support only partially implemented,
-  lacks set switching on interrupt/exception.
-- 34K ITC not implemented.
-- A general lack of documentation, especially for technical internals.
-  Existing documentation is x86-centric.
-- Reverse endianness bit not implemented
-- The TLB emulation is very inefficient:
-  QEMU's softmmu implements a x86-style MMU, with separate entries
-  for read/write/execute, a TLB index which is just a modulo of the
-  virtual address, and a set of TLBs for each user/kernel/supervisor
-  MMU mode.
-  MIPS has a single entry for read/write/execute and only one MMU mode.
-  But it is fully associative with randomized entry indices, and uses
-  up to 256 ASID tags as additional matching criterion (which roughly
-  equates to 256 MMU modes). It also has a global flag which causes
-  entries to match regardless of ASID.
-  To cope with these differences, QEMU currently flushes the TLB at
-  each ASID change. Using the MMU modes to implement ASIDs hinges on
-  implementing the global bit efficiently.
-- save/restore of the CPU state is not implemented (see machine.c).
-
-MIPS64
---
-- Userland emulation (both n32 and n64) not functional.
-
-"Generic" 4Kc system emulation
---
-- Doesn't correspond to any real hardware. Should be removed some day,
-  U-Boot is the last remaining user.
-
-PICA 61 system emulation
-
-- No framebuffer support yet.
-
-MALTA system emulation
---
-- We fake firmware support instead of doing the real thing
-- Real firmware (YAMON) falls over when trying to init RAM, presumably
-  due to lacking system controller emulation.
-- Bonito system controller not implemented
-- MSC1 system controller not implemented
-- 
2.27.0




Re: [RFC PATCH] gdb/gic: expose cpu_index via MxTxAttrs

2022-04-12 Thread Alex Bennée


Peter Maydell  writes:

> On Tue, 12 Apr 2022 at 11:45, Alex Bennée  wrote:
>>
>> When accessing HW via the gdbstub we can't easily figure out what the
>> cpu_index is. The canonical case is current_cpu but for some cases
>> that will be NULL. For debug accesses we can overload requester_id and
>> make the GIC a bit smarter about fishing that out.
>>
>> [AJB: very much a PoC hack for now but interested if this makes sense.
>> We could encode cpu_index in another field but that would grow
>> MxTxAttrs even more.]
>>
>> Signed-off-by: Alex Bennée 
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/124
>> ---
>>  include/exec/memattrs.h |  2 +-
>>  hw/core/cpu-sysemu.c| 15 +++
>>  hw/intc/arm_gic.c   | 33 +++--
>>  3 files changed, 31 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
>> index 9fb98bc1ef..1333a34cb3 100644
>> --- a/include/exec/memattrs.h
>> +++ b/include/exec/memattrs.h
>> @@ -43,7 +43,7 @@ typedef struct MemTxAttrs {
>>   * (see MEMTX_ACCESS_ERROR).
>>   */
>>  unsigned int memory:1;
>> -/* Requester ID (for MSI for example) */
>> +/* Requester ID (for MSI for example) or cpu_index for debug */
>>  unsigned int requester_id:16;
>
> If we want to provide a requester ID for memory transactions we
> should provide it always, not just for debug. That way gic_get_current_cpu()
> and similar code can unconditionally use requester_id and never needs
> to look at current_cpu. (We would also need to figure out how we want
> to parcel out requester_ids in the system, so that PCI requester IDs
> don't clash with CPU requester IDs.)

We could have a requester_type field (0 for CPU, 1 for PCI for now)?

>
> -- PMM


-- 
Alex Bennée



[Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic

2022-04-12 Thread Divya Garg

Hi Vitaly Kuznetsov !
I was working on hyperv flags and saw that we introduced new 
dependencies some
time back 
(https://sourcegraph.com/github.com/qemu/qemu/-/commit/c686193072a47032d83cb4e131dc49ae30f9e5d7?visible=1).

After these changes, if we try to live migrate a VM from an older QEMU to a
newer one that has these changes, the migration fails with a dependency error.

I was wondering whether this is the expected behaviour or whether there is any
workaround for handling it? Or does something need to be done to ensure
backward compatibility?

Thank you
Regards
Divya Garg



Re: [PATCH V6 21/27] vfio-pci: cpr part 3 (intx)

2022-04-12 Thread Fam Zheng
On 2022-04-11 12:23, Steven Sistare wrote:
> On 3/29/2022 7:03 AM, Fam Zheng wrote:
> > On 2021-08-06 14:43, Steve Sistare wrote:
> >> Preserve vfio INTX state across cpr restart.  Preserve VFIOINTx fields as
> >> follows:
> >>   pin : Recover this from the vfio config in kernel space
> >>   interrupt : Preserve its eventfd descriptor across exec.
> >>   unmask : Ditto
> >>   route.irq : This could perhaps be recovered in vfio_pci_post_load by
> >> calling pci_device_route_intx_to_irq(pin), whose implementation reads
> >> config space for a bridge device such as ich9.  However, there is no
> >> guarantee that the bridge vmstate is read before vfio vmstate.  Rather
> >> than fiddling with MigrationPriority for vmstate handlers, explicitly
> >> save route.irq in vfio vmstate.
> >>   pending : save in vfio vmstate.
> >>   mmap_timeout, mmap_timer : Re-initialize
> >>   bool kvm_accel : Re-initialize
> >>
> >> In vfio_realize, defer calling vfio_intx_enable until the vmstate
> >> is available, in vfio_pci_post_load.  Modify vfio_intx_enable and
> >> vfio_intx_kvm_enable to skip vfio initialization, but still perform
> >> kvm initialization.
> >>
> >> Signed-off-by: Steve Sistare 
> > 
> > Hi Steve,
> > 
> > Not directly related to this patch, but since the context is close: it looks
> > like this series only takes care of the exec restart mode of vfio-pci. Have
> > you had any thoughts on the kexec reboot mode with vfio-pci?
> > 
> > The general idea is if DMAR context is not lost during kexec, we should be 
> > able
> > to set up irqfds again and things will just work?
> > 
> > Fam
> 
> Hi Fam,
>   I have thought about that use case, but only in general terms.
> IMO it best fits in the cpr framework as a new mode (rather than as 
> a new -restore command line argument).  

Yes I think that is better, I will try that.

> 
> In your code below, you would have fewer code changes if you set
> 'reused = true' for the new mode, rather than testing both 'reused' and
> 'restored' at multiple sites. Lastly, I cleaned up the vector handling
> somewhat from V6 to V7, so you may want to try your code using V7 as a base.

I am cleaning up the kernel patches and will post both parts once ready.

Fam



Re: [RFC PATCH] gdb/gic: expose cpu_index via MxTxAttrs

2022-04-12 Thread Peter Maydell
On Tue, 12 Apr 2022 at 11:45, Alex Bennée  wrote:
>
> When accessing HW via the gdbstub we can't easily figure out what the
> cpu_index is. The canonical case is current_cpu but for some cases
> that will be NULL. For debug accesses we can overload requester_id and
> make the GIC a bit smarter about fishing that out.
>
> [AJB: very much a PoC hack for now but interested if this makes sense.
> We could encode cpu_index in another field but that would grow
> MxTxAttrs even more.]
>
> Signed-off-by: Alex Bennée 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/124
> ---
>  include/exec/memattrs.h |  2 +-
>  hw/core/cpu-sysemu.c| 15 +++
>  hw/intc/arm_gic.c   | 33 +++--
>  3 files changed, 31 insertions(+), 19 deletions(-)
>
> diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
> index 9fb98bc1ef..1333a34cb3 100644
> --- a/include/exec/memattrs.h
> +++ b/include/exec/memattrs.h
> @@ -43,7 +43,7 @@ typedef struct MemTxAttrs {
>   * (see MEMTX_ACCESS_ERROR).
>   */
>  unsigned int memory:1;
> -/* Requester ID (for MSI for example) */
> +/* Requester ID (for MSI for example) or cpu_index for debug */
>  unsigned int requester_id:16;

If we want to provide a requester ID for memory transactions we
should provide it always, not just for debug. That way gic_get_current_cpu()
and similar code can unconditionally use requester_id and never needs
to look at current_cpu. (We would also need to figure out how we want
to parcel out requester_ids in the system, so that PCI requester IDs
don't clash with CPU requester IDs.)

-- PMM



Re: [PATCH] block: fix core for unlock not permitted

2022-04-12 Thread Paolo Bonzini

On 4/12/22 09:13, suruifeng wrote:

qemu coredump:
   0x7f9e7205c81b in raise () from /usr/lib64/libc.so.6
   0x7f9e7205db41 in abort () from /usr/lib64/libc.so.6
   0x7f9e71ddbe94 in error_exit (err=, msg=msg@entry=0x7f9e71ec1b50 
<__func__.20287> "qemu_mutex_unlock_impl")
 at /usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:36
   0x7f9e71ddc61f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5559850b0b90, 
file=file@entry=0x7f9e71ec0978 
"/home/abuild/rpmbuild/BUILD/qemu-4.1.0/util/async.c",
 line=line@entry=524) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:108
   0x7f9e71dd5bb5 in aio_context_release (ctx=ctx@entry=0x5559850b0b30) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/util/async.c:524
   0x7f9e70dfed28 in bdrv_flush (bs=bs@entry=0x5559851f0a20) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2778
   0x7f9e70e37f63 in bdrv_close (bs=bs@entry=0x5559851f0a20) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4025
   0x7f9e70e38193 in bdrv_delete (bs=0x5559851f0a20) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4271
   0x7f9e70e38225 in bdrv_unref (bs=) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block.c:5612
   0x7f9e70df9a92 in bdrv_next (it=it@entry=0x7ffc5e3547a0) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block/block-backend.c:576
   0x7f9e70dfee76 in bdrv_flush_all () at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2074
   0x7f9e71e3a08f in do_vm_stop (state=state@entry=RUN_STATE_SHUTDOWN, 
send_stop=send_stop@entry=false) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1140
   0x7f9e71e3a14c in vm_shutdown () at 
/usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1151

While a mirror job is running, the VM is shut down. During the shutdown, a
mirror job I/O error triggers mirror_exit_common().
In bdrv_flush_all(), bdrv_next() first increases the ref to mirror_top_bs,
then bdrv_flush(bs) calls BDRV_POLL_WHILE and executes mirror_exit_common(),
which decreases the ref to mirror_top_bs; finally bdrv_next() decreases the
ref to mirror_top_bs, releasing mirror_top_bs.

Let's fix this by adding aio_context_acquire() and aio_context_release() to 
bdrv_next().


Hi,

is this reproducible with a more recent version of QEMU?  In particular, 
bdrv_next does not have bdrv_unref anymore.


Paolo


Signed-off-by: suruifeng 
---
  block/block-backend.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..5ae745c0ab 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -593,6 +593,7 @@ BlockBackend *blk_next(BlockBackend *blk)
  BlockDriverState *bdrv_next(BdrvNextIterator *it)
  {
  BlockDriverState *bs, *old_bs;
+AioContext *ctx = NULL;
  
  /* Must be called from the main loop */

  assert(qemu_get_current_aio_context() == qemu_get_aio_context());
@@ -613,11 +614,17 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
  if (it->blk) {
  blk_ref(it->blk);
  }
+   ctx = blk_get_aio_context(old_blk);
+   aio_context_acquire(ctx);
  blk_unref(old_blk);
+   aio_context_release(ctx);
  
  if (bs) {

  bdrv_ref(bs);
+   ctx = bdrv_get_aio_context(old_bs);
+   aio_context_acquire(ctx);
  bdrv_unref(old_bs);
+   aio_context_release(ctx);
  return bs;
  }
  it->phase = BDRV_NEXT_MONITOR_OWNED;
@@ -636,7 +643,10 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
  if (bs) {
  bdrv_ref(bs);
  }
+ctx = bdrv_get_aio_context(old_bs);
+aio_context_acquire(ctx);
  bdrv_unref(old_bs);
+aio_context_release(ctx);
  
  return bs;

  }





[PATCH for-7.1] target/i386: Remove unused XMMReg, YMMReg types and CPUState fields

2022-04-12 Thread Peter Maydell
In commit b7711471f5 in 2014 we refactored the handling of the x86
vector registers so that instead of separate structs XMMReg, YMMReg
and ZMMReg for representing the 16-byte, 32-byte and 64-byte width
vector registers and multiple fields in the CPU state, we have a
single type (XMMReg, later renamed to ZMMReg) and a single struct
field (xmm_regs).  However, in 2017 in commit c97d6d2cdf97ed some of
the old struct types and CPU state fields got added back, when we
merged in the hvf support (which had been developed in a separate fork
that presumably did not have the refactoring of b7711471f5), as part
of code handling xsave.  Commit f585195ec07 then almost immediately
dropped that xsave code again in favour of sharing the xsave handling
with KVM, but forgot to remove the now unused CPU state fields and
struct types.

Delete the unused types and CPUState fields.

Signed-off-by: Peter Maydell 
---
 target/i386/cpu.h | 18 --
 1 file changed, 18 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 982c5323537..77b4f5696cf 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1217,20 +1217,6 @@ typedef struct SegmentCache {
 float64  _d_##n[(bits)/64]; \
 }
 
-typedef union {
-uint8_t _b[16];
-uint16_t _w[8];
-uint32_t _l[4];
-uint64_t _q[2];
-} XMMReg;
-
-typedef union {
-uint8_t _b[32];
-uint16_t _w[16];
-uint32_t _l[8];
-uint64_t _q[4];
-} YMMReg;
-
 typedef MMREG_UNION(ZMMReg, 512) ZMMReg;
 typedef MMREG_UNION(MMXReg, 64)  MMXReg;
 
@@ -1529,11 +1515,7 @@ typedef struct CPUArchState {
 ZMMReg xmm_t0;
 MMXReg mmx_t0;
 
-XMMReg ymmh_regs[CPU_NB_REGS];
-
 uint64_t opmask_regs[NB_OPMASK_REGS];
-YMMReg zmmh_regs[CPU_NB_REGS];
-ZMMReg hi16_zmm_regs[CPU_NB_REGS];
 #ifdef TARGET_X86_64
 uint8_t xtilecfg[64];
 uint8_t xtiledata[8192];
-- 
2.25.1




Re: [PATCH 1/2] gdbstub: Set current_cpu for memory read write

2022-04-12 Thread Alex Bennée


Bin Meng  writes:

> On Sat, Apr 2, 2022 at 7:20 PM Bin Meng  wrote:
>>
>> On Tue, Mar 29, 2022 at 12:43 PM Bin Meng  wrote:
>> >
>> > On Mon, Mar 28, 2022 at 5:10 PM Peter Maydell  
>> > wrote:
>> > >
>> > > On Mon, 28 Mar 2022 at 03:10, Bin Meng  wrote:
>> > > > IMHO it's too bad to just ignore this bug forever.
>> > > >
>> > > > This is a valid use case. It's not about whether we intentionally want
>> > > > to inspect the GIC register value from gdb. The case is that when
>> > > > single stepping the source codes it triggers the core dump for no
>> > > > reason if the instructions involved contain load/store to any of the
>> > > > GIC registers.
>> > >
>> > > Huh? Single-stepping the instruction should execute it inside
>> > > QEMU, which will do the load in the usual way. That should not
>> > > be going via gdbstub reads and writes.
>> >
>> > Yes, single-stepping the instruction is executed in the vCPU context,
>> > but a gdb client sends additional commands, more than just telling
>> > QEMU to execute a single instruction.
>> >
>> > For example, the following is the sequence a gdb client sent when doing a 
>> > "si":
>> >
>> > gdbstub_io_command Received: Z0,10,4
>> > gdbstub_io_reply Sent: OK
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: m18c430,4
>> > gdbstub_io_reply Sent: ff430091
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: vCont;s:p1.1;c:p1.-1
>> > gdbstub_op_stepping Stepping CPU 0
>> > gdbstub_op_continue_cpu Continuing CPU 1
>> > gdbstub_op_continue_cpu Continuing CPU 2
>> > gdbstub_op_continue_cpu Continuing CPU 3
>> > gdbstub_hit_break RUN_STATE_DEBUG
>> > gdbstub_io_reply Sent: T05thread:p01.01;
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: g
>> > gdbstub_io_reply Sent:
>> > 3848ed00f08fa6100300010001f930a5ec0034c41800c903
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: m18c434,4
>> > gdbstub_io_reply Sent: 00e004d1
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: m18c430,4
>> > gdbstub_io_reply Sent: ff430091
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: m18c434,4
>> > gdbstub_io_reply Sent: 00e004d1
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: m18c400,40
>> > gdbstub_io_reply Sent:
>> > ff4300d1e00300f98037005840f900a0019140f900b0009140f900e004911e7800f9fe0340f91ef9ff43009100e004d174390094bb390094
>> > gdbstub_io_got_ack Got ACK
>> > gdbstub_io_command Received: mf901,4
>> >
>> > Here "mf901,4" triggers the bug where 0xf901 is the GIC register.
>> >
>> > This is not something QEMU can ignore or control. The logic is inside
>> > the gdb client.
>> >
>>
>> Ping for this series?
>>
>
> Ping?

Can you have a look at:

  Subject: [RFC PATCH] gdb/gic: expose cpu_index via MxTxAttrs
  Date: Tue, 12 Apr 2022 11:45:19 +0100
  Message-Id: <20220412104519.201655-1-alex.ben...@linaro.org>

and let me know what you think? 

-- 
Alex Bennée
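As background for the trace above: the gdb Remote Serial Protocol's 'm'
packet has the form "m<addr>,<length>" with both fields in hex. A minimal
decoding sketch (illustrative only, not QEMU's gdbstub code) for the packets
shown in the trace, such as "mf901,4":

```python
def parse_rsp_memory_read(packet: str) -> tuple[int, int]:
    """Decode a gdb RSP 'm<addr>,<len>' memory-read packet (hex fields)."""
    if not packet.startswith("m"):
        raise ValueError("not a memory-read packet")
    addr, length = packet[1:].split(",")
    return int(addr, 16), int(length, 16)

# The packet that hits the GIC register in the trace above
print(parse_rsp_memory_read("mf901,4"))    # (63745, 4), i.e. addr 0xf901
print(parse_rsp_memory_read("m18c430,4"))
```

This is why QEMU cannot suppress such reads: the client decides which
addresses to fetch while stepping.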



[RFC PATCH] gdb/gic: expose cpu_index via MxTxAttrs

2022-04-12 Thread Alex Bennée
When accessing HW via the gdbstub we can't easily figure out what the
cpu_index is. The canonical case is current_cpu but for some cases
that will be NULL. For debug accesses we can overload requester_id and
make the GIC a bit smarter about fishing that out.

[AJB: very much a PoC hack for now but interested if this makes sense.
We could encode cpu_index in another field but that would grow
MxTxAttrs even more.]

Signed-off-by: Alex Bennée 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/124
---
 include/exec/memattrs.h |  2 +-
 hw/core/cpu-sysemu.c| 15 +++
 hw/intc/arm_gic.c   | 33 +++--
 3 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index 9fb98bc1ef..1333a34cb3 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -43,7 +43,7 @@ typedef struct MemTxAttrs {
  * (see MEMTX_ACCESS_ERROR).
  */
 unsigned int memory:1;
-/* Requester ID (for MSI for example) */
+/* Requester ID (for MSI for example) or cpu_index for debug */
 unsigned int requester_id:16;
 /* Invert endianness for this page */
 unsigned int byte_swap:1;
diff --git a/hw/core/cpu-sysemu.c b/hw/core/cpu-sysemu.c
index 00253f8929..77f0b1a289 100644
--- a/hw/core/cpu-sysemu.c
+++ b/hw/core/cpu-sysemu.c
@@ -51,13 +51,20 @@ hwaddr cpu_get_phys_page_attrs_debug(CPUState *cpu, vaddr 
addr,
  MemTxAttrs *attrs)
 {
 CPUClass *cc = CPU_GET_CLASS(cpu);
+MemTxAttrs local = { };
+hwaddr res;
 
 if (cc->sysemu_ops->get_phys_page_attrs_debug) {
-return cc->sysemu_ops->get_phys_page_attrs_debug(cpu, addr, attrs);
+res = cc->sysemu_ops->get_phys_page_attrs_debug(cpu, addr, &local);
+} else {
+/* Fallback for CPUs which don't implement the _attrs_ hook */
+local = MEMTXATTRS_UNSPECIFIED;
+res = cc->sysemu_ops->get_phys_page_debug(cpu, addr);
 }
-/* Fallback for CPUs which don't implement the _attrs_ hook */
-*attrs = MEMTXATTRS_UNSPECIFIED;
-return cc->sysemu_ops->get_phys_page_debug(cpu, addr);
+
+local.requester_id = cpu->cpu_index;
+*attrs = local;
+return res;
 }
 
 hwaddr cpu_get_phys_page_debug(CPUState *cpu, vaddr addr)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
index 492b2421ab..6a007a7f9e 100644
--- a/hw/intc/arm_gic.c
+++ b/hw/intc/arm_gic.c
@@ -56,17 +56,22 @@ static const uint8_t gic_id_gicv2[] = {
 0x04, 0x00, 0x00, 0x00, 0x90, 0xb4, 0x2b, 0x00, 0x0d, 0xf0, 0x05, 0xb1
 };
 
-static inline int gic_get_current_cpu(GICState *s)
+static inline int gic_get_current_cpu(GICState *s, MemTxAttrs attrs)
 {
 if (!qtest_enabled() && s->num_cpu > 1) {
-return current_cpu->cpu_index;
+if (current_cpu) {
+return current_cpu->cpu_index;
+} else {
+/* worst case this will be zeroed */
+return attrs.requester_id;
+}
 }
 return 0;
 }
 
-static inline int gic_get_current_vcpu(GICState *s)
+static inline int gic_get_current_vcpu(GICState *s, MemTxAttrs attrs)
 {
-return gic_get_current_cpu(s) + GIC_NCPU;
+return gic_get_current_cpu(s, attrs) + GIC_NCPU;
 }
 
 /* Return true if this GIC config has interrupt groups, which is
@@ -951,7 +956,7 @@ static uint32_t gic_dist_readb(void *opaque, hwaddr offset, 
MemTxAttrs attrs)
 int cm;
 int mask;
 
-cpu = gic_get_current_cpu(s);
+cpu = gic_get_current_cpu(s, attrs);
 cm = 1 << cpu;
 if (offset < 0x100) {
 if (offset == 0) {  /* GICD_CTLR */
@@ -1182,7 +1187,7 @@ static void gic_dist_writeb(void *opaque, hwaddr offset,
 int i;
 int cpu;
 
-cpu = gic_get_current_cpu(s);
+cpu = gic_get_current_cpu(s, attrs);
 if (offset < 0x100) {
 if (offset == 0) {
 if (s->security_extn && !attrs.secure) {
@@ -1476,7 +1481,7 @@ static void gic_dist_writel(void *opaque, hwaddr offset,
 int mask;
 int target_cpu;
 
-cpu = gic_get_current_cpu(s);
+cpu = gic_get_current_cpu(s, attrs);
 irq = value & 0xf;
 switch ((value >> 24) & 3) {
 case 0:
@@ -1780,7 +1785,7 @@ static MemTxResult gic_thiscpu_read(void *opaque, hwaddr 
addr, uint64_t *data,
 unsigned size, MemTxAttrs attrs)
 {
 GICState *s = (GICState *)opaque;
-return gic_cpu_read(s, gic_get_current_cpu(s), addr, data, attrs);
+return gic_cpu_read(s, gic_get_current_cpu(s, attrs), addr, data, attrs);
 }
 
 static MemTxResult gic_thiscpu_write(void *opaque, hwaddr addr,
@@ -1788,7 +1793,7 @@ static MemTxResult gic_thiscpu_write(void *opaque, hwaddr 
addr,
  MemTxAttrs attrs)
 {
 GICState *s = (GICState *)opaque;
-return gic_cpu_write(s, gic_get_current_cpu(s), addr, value, attrs);
+return gic_cpu_write(s, gic_get_current_cpu(s, attrs), addr, value, attrs);
 }
 
 /* Wrappers to read/write 

Re: [PATCH v2 for 7.1 1/1] block: add 'force' parameter to 'blockdev-change-medium' command

2022-04-12 Thread Dr. David Alan Gilbert
* Denis V. Lunev (d...@openvz.org) wrote:
> 'blockdev-change-medium' is a convenient wrapper for the following
> sequence of commands:
>  * blockdev-open-tray
>  * blockdev-remove-medium
>  * blockdev-insert-medium
>  * blockdev-close-tray
> and should be used e.g. to change the ISO image inside the CD-ROM tray.
> Though the guest could lock the tray, and some Linux guests like
> CentOS 8.5 actually do that. In this case the execution of this
> command results in an error like the following:
>   Device 'scsi0-0-1-0' is locked and force was not specified,
>   wait for tray to open and try again.
> 
> This situation could be resolved with 'blockdev-open-tray' by passing
> the 'force' flag. Thus it seems reasonable to add the same
> capability to 'blockdev-change-medium' too.

For HMP:

Acked-by: Dr. David Alan Gilbert 

(Although I'd be pretty careful with this; a guest OS might feel like
it could ignore anything else that was going on and keep its data
cached if it had its drive locked).

Dave

> Signed-off-by: Denis V. Lunev 
> CC: Kevin Wolf 
> CC: Hanna Reitz 
> CC: "Dr. David Alan Gilbert" 
> CC: Eric Blake 
> CC: Markus Armbruster 
> CC: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qapi-sysemu.c |  3 ++-
>  hmp-commands.hx | 11 +++
>  monitor/hmp-cmds.c  |  4 +++-
>  qapi/block.json |  6 ++
>  ui/cocoa.m  |  1 +
>  5 files changed, 19 insertions(+), 6 deletions(-)
> 
> Changes from v1:
> - added kludge to Objective C code
> - simplified a bit call of do_open_tray() (thanks, Vova!)
> - added record to hmp-command.hx
> 
> diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
> index 8498402ad4..680c7ee342 100644
> --- a/block/qapi-sysemu.c
> +++ b/block/qapi-sysemu.c
> @@ -318,6 +318,7 @@ void qmp_blockdev_change_medium(bool has_device, const 
> char *device,
>  bool has_id, const char *id,
>  const char *filename,
>  bool has_format, const char *format,
> +bool has_force, bool force,
>  bool has_read_only,
>  BlockdevChangeReadOnlyMode read_only,
>  Error **errp)
> @@ -380,7 +381,7 @@ void qmp_blockdev_change_medium(bool has_device, const 
> char *device,
>  
>  rc = do_open_tray(has_device ? device : NULL,
>has_id ? id : NULL,
> -  false, &err);
> +  force, &err);
>  if (rc && rc != -ENOSYS) {
>  error_propagate(errp, err);
>  goto fail;
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 8476277aa9..6ec593ea08 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -202,9 +202,9 @@ ERST
>  
>  {
>  .name   = "change",
> -.args_type  = "device:B,target:F,arg:s?,read-only-mode:s?",
> -.params = "device filename [format [read-only-mode]]",
> -.help   = "change a removable medium, optional format",
> +.args_type  = "device:B,force:-f,target:F,arg:s?,read-only-mode:s?",
> +.params = "device [-f] filename [format [read-only-mode]]",
> +.help   = "change a removable medium, optional format, use -f to 
> force the operation",
>  .cmd= hmp_change,
>  },
>  
> @@ -212,11 +212,14 @@ SRST
>  ``change`` *device* *setting*
>Change the configuration of a device.
>  
> -  ``change`` *diskdevice* *filename* [*format* [*read-only-mode*]]
> +  ``change`` *diskdevice* [-f] *filename* [*format* [*read-only-mode*]]
>  Change the medium for a removable disk device to point to *filename*. 
> eg::
>  
>(qemu) change ide1-cd0 /path/to/some.iso
>  
> +``-f``
> +  forces the operation even if the guest has locked the tray.
> +
>  *format* is optional.
>  
>  *read-only-mode* may be used to change the read-only status of the 
> device.
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index 634968498b..d8b98bed6c 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -1472,6 +1472,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
>  const char *target = qdict_get_str(qdict, "target");
>  const char *arg = qdict_get_try_str(qdict, "arg");
>  const char *read_only = qdict_get_try_str(qdict, "read-only-mode");
> +bool force = qdict_get_try_bool(qdict, "force", false);
>  BlockdevChangeReadOnlyMode read_only_mode = 0;
>  Error *err = NULL;
>  
> @@ -1508,7 +1509,8 @@ void hmp_change(Monitor *mon, const QDict *qdict)
>  }
>  
>  qmp_blockdev_change_medium(true, device, false, NULL, target,
> -   !!arg, arg, !!read_only, read_only_mode,
> +   !!arg, arg, true, force,
> +   !!read_only, read_only_mode,
> &err);
>  }
>  
> diff --git a/qapi/block.json b/qapi/block.json
> index 

Re: Re: [PATCH v4 0/8] Introduce akcipher service for virtio-crypto

2022-04-12 Thread zhenwei pi




On 4/12/22 17:47, Paolo Bonzini wrote:



In our plan, the feature is designed for HTTPS offloading case and
other applications which use kernel RSA/ecdsa by keyctl syscall.


Hi Zhenwei,

what is the % of time spent doing asymmetric key operations in your
benchmark?  I am not very familiar with crypto acceleration but my
understanding has always been that most time is spent doing either
hashing (for signing) or symmetric key operations (for encryption).

If I understand correctly, without support for acceleration these 
patches are more of a demonstration of virtio-crypto, or usable for 
testing purposes.




Hi, Paolo

This is the perf result of nginx+openssl CPU calculation; the heavy load
from openssl accounts for most of the time (as you mentioned):
27.37%  26.00%  nginx  libcrypto.so.1.1   [.] __bn_sqrx8x_reduction
20.58%  19.52%  nginx  libcrypto.so.1.1   [.] mulx4x_internal
16.73%  15.89%  nginx  libcrypto.so.1.1   [.] bn_sqrx8x_internal
 8.79%   0.00%  nginx  [unknown]          [k]
 7.26%   0.00%  nginx  [unknown]          [.] 0x89388669992a0cbc
 7.00%   0.00%  nginx  [unknown]          [k] 0x45f0e480d5f2a58e
 6.76%   0.02%  nginx  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
 6.74%   0.02%  nginx  [kernel.kallsyms]  [k] do_syscall_64
 6.61%   0.00%  nginx  [unknown]          [.] 0xa75a60d7820f9ffb
 6.47%   0.00%  nginx  [unknown]          [k] 0xe91223f6da36254c
 5.51%   0.01%  nginx  [kernel.kallsyms]  [k] asm_common_interrupt
 5.46%   0.01%  nginx  [kernel.kallsyms]  [k] common_interrupt
 5.16%   0.04%  nginx  [kernel.kallsyms]  [k] __softirqentry_text_start
 4.92%   0.01%  nginx  [kernel.kallsyms]  [k] irq_exit_rcu
 4.91%   0.04%  nginx  [kernel.kallsyms]  [k] net_rx_action



This is the result of nginx+openssl keyctl offload (virtio-crypto + host
keyctl + Intel QAT):
30.38%   0.08%  nginx  [kernel.kallsyms]    [k] entry_SYSCALL_64_after_hwframe
30.29%   0.07%  nginx  [kernel.kallsyms]    [k] do_syscall_64
23.84%   0.00%  nginx  [unknown]            [k]
14.24%   0.03%  nginx  [kernel.kallsyms]    [k] asm_common_interrupt
14.06%   0.05%  nginx  [kernel.kallsyms]    [k] common_interrupt
12.99%   0.11%  nginx  [kernel.kallsyms]    [k] __softirqentry_text_start
12.27%   0.12%  nginx  [kernel.kallsyms]    [k] net_rx_action
12.13%   0.03%  nginx  [kernel.kallsyms]    [k] __napi_poll
12.06%   0.06%  nginx  [kernel.kallsyms]    [k] irq_exit_rcu
10.49%   0.14%  nginx  libssl.so.1.1        [.] tls_process_client_key_exchange
10.21%   0.12%  nginx  [virtio_net]         [k] virtnet_poll
10.13%   0.04%  nginx  libc-2.28.so         [.] syscall
10.12%   0.03%  nginx  kctl-engine.so       [.] kctl_rsa_priv_dec
10.02%   0.02%  nginx  kctl-engine.so       [.] kctl_hw_rsa_priv_func
 9.98%   0.01%  nginx  libkeyutils.so.1.10  [.] keyctl_pkey_decrypt
 9.95%   0.02%  nginx  libkeyutils.so.1.10  [.] keyctl
 9.77%   0.03%  nginx  [kernel.kallsyms]    [k] keyctl_pkey_e_d_s
 8.97%   0.00%  nginx  [unknown]            [k] 0x7f4adbb81f0b
 8.78%   0.08%  nginx  libpthread-2.28.so   [.] __libc_write
 8.49%   0.05%  nginx  [kernel.kallsyms]    [k] netif_receive_skb_list_internal


The RSA part gets reduced, and the QPS of https improves to ~200%.

Something may have been missed in this cover letter:
[4] Currently RSA is supported only in the builtin driver. This driver is
supposed to test the full feature without depending on other software
(e.g. a vhost process) or hardware.

-> Yes, this patch is a demonstration of virtio-crypto.

[5] The keyctl backend is in development; we will post this feature in
Q2-2022. The keyctl backend can use hardware acceleration (e.g. Intel QAT).

-> This is our plan. Currently it's still in development.



Would it be possible to extend virtio-crypto to use keys already in the
host keyctl, or in a PKCS#11 smartcard, so that virtio-crypto could also
provide the functionality of an HSM?  Or does the standard require that
the keys are provided by the guest?

Paolo


I'm very interested in this, I'll try in Q3-2022 or later.

--
zhenwei pi



Re: [PATCH 14/16] target/arm: Implement ESB instruction

2022-04-12 Thread Peter Maydell
On Mon, 11 Apr 2022 at 23:14, Richard Henderson
 wrote:
>
> On 4/11/22 09:18, Peter Maydell wrote:
> >> +  ESB 0011 0010    0001 
> >> +]
> >
> > Why don't we decode bits [11:8] here? I see it's the same
> > as YIELD/WFE/WFI, but I'm not sure why we're not decoding
> > those bits in those insns either...
>
> See page F4-7074 in H.a, where bits [11:8] of the imm12 field are described 
> with ''.

Hmm. That just means "decodes to the NOP/WFI/ESB/whatever
instruction-description whatever the value of those bits",
but when the specific instruction-description then marks
those bits as "(0)" or "(1)", that has the usual CONSTRAINED
UNPREDICTABLE meaning described in section F1.7.2, where
we get a free choice of UNDEF, NOP, ignore the bit, or
any-dest-regs-are-UNKNOWN. So we're within the spec to
not decode [11:8] but I think it would be more consistent
with how we try to handle those (0) and (1) bits generally
if we insist that [11:8] is all zeroes here.

For this series, I guess go along with the current way we
handle hint instructions, and maybe fix this as a separate
cleanup later.

-- PMM



[PATCH v2 for 7.1 1/1] block: add 'force' parameter to 'blockdev-change-medium' command

2022-04-12 Thread Denis V. Lunev
'blockdev-change-medium' is a convenient wrapper for the following
sequence of commands:
 * blockdev-open-tray
 * blockdev-remove-medium
 * blockdev-insert-medium
 * blockdev-close-tray
and should be used e.g. to change the ISO image inside the CD-ROM tray.
Though the guest could lock the tray, and some Linux guests like
CentOS 8.5 actually do that. In this case the execution of this
command results in an error like the following:
  Device 'scsi0-0-1-0' is locked and force was not specified,
  wait for tray to open and try again.

This situation could be resolved with 'blockdev-open-tray' by passing
the 'force' flag. Thus it seems reasonable to add the same
capability to 'blockdev-change-medium' too.

Signed-off-by: Denis V. Lunev 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: "Dr. David Alan Gilbert" 
CC: Eric Blake 
CC: Markus Armbruster 
CC: Vladimir Sementsov-Ogievskiy 
---
 block/qapi-sysemu.c |  3 ++-
 hmp-commands.hx | 11 +++
 monitor/hmp-cmds.c  |  4 +++-
 qapi/block.json |  6 ++
 ui/cocoa.m  |  1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

Changes from v1:
- added kludge to Objective C code
- simplified a bit call of do_open_tray() (thanks, Vova!)
- added record to hmp-command.hx

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 8498402ad4..680c7ee342 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -318,6 +318,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
 bool has_id, const char *id,
 const char *filename,
 bool has_format, const char *format,
+bool has_force, bool force,
 bool has_read_only,
 BlockdevChangeReadOnlyMode read_only,
 Error **errp)
@@ -380,7 +381,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
 
 rc = do_open_tray(has_device ? device : NULL,
   has_id ? id : NULL,
-  false, &err);
+  force, &err);
 if (rc && rc != -ENOSYS) {
 error_propagate(errp, err);
 goto fail;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..6ec593ea08 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -202,9 +202,9 @@ ERST
 
 {
 .name   = "change",
-.args_type  = "device:B,target:F,arg:s?,read-only-mode:s?",
-.params = "device filename [format [read-only-mode]]",
-.help   = "change a removable medium, optional format",
+.args_type  = "device:B,force:-f,target:F,arg:s?,read-only-mode:s?",
+.params = "device [-f] filename [format [read-only-mode]]",
+.help   = "change a removable medium, optional format, use -f to 
force the operation",
 .cmd= hmp_change,
 },
 
@@ -212,11 +212,14 @@ SRST
 ``change`` *device* *setting*
   Change the configuration of a device.
 
-  ``change`` *diskdevice* *filename* [*format* [*read-only-mode*]]
+  ``change`` *diskdevice* [-f] *filename* [*format* [*read-only-mode*]]
 Change the medium for a removable disk device to point to *filename*. eg::
 
   (qemu) change ide1-cd0 /path/to/some.iso
 
+``-f``
+  forces the operation even if the guest has locked the tray.
+
 *format* is optional.
 
 *read-only-mode* may be used to change the read-only status of the device.
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 634968498b..d8b98bed6c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1472,6 +1472,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 const char *target = qdict_get_str(qdict, "target");
 const char *arg = qdict_get_try_str(qdict, "arg");
 const char *read_only = qdict_get_try_str(qdict, "read-only-mode");
+bool force = qdict_get_try_bool(qdict, "force", false);
 BlockdevChangeReadOnlyMode read_only_mode = 0;
 Error *err = NULL;
 
@@ -1508,7 +1509,8 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 }
 
 qmp_blockdev_change_medium(true, device, false, NULL, target,
-   !!arg, arg, !!read_only, read_only_mode,
+   !!arg, arg, true, force,
+   !!read_only, read_only_mode,
 &err);
 }
 
diff --git a/qapi/block.json b/qapi/block.json
index 82fcf2c914..3f100d4887 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -326,6 +326,11 @@
 # @read-only-mode: change the read-only mode of the device; defaults
 #  to 'retain'
 #
+# @force: if false (the default), an eject request through blockdev-open-tray
+# will be sent to the guest if it has locked the tray (and the tray
+# will not be opened immediately); if true, the tray will be opened
+# regardless of whether it is locked. (since 7.1)
+#
 # Features:
 

Re: [PATCH v4 0/8] Introduce akcipher service for virtio-crypto

2022-04-12 Thread Paolo Bonzini




In our plan, the feature is designed for HTTPS offloading case and
other applications which use kernel RSA/ecdsa by keyctl syscall.


Hi Zhenwei,

what is the % of time spent doing asymmetric key operations in your
benchmark?  I am not very familiar with crypto acceleration but my
understanding has always been that most time is spent doing either
hashing (for signing) or symmetric key operations (for encryption).

If I understand correctly, without support for acceleration these 
patches are more of a demonstration of virtio-crypto, or usable for 
testing purposes.


Would it be possible to extend virtio-crypto to use keys already in the
host keyctl, or in a PKCS#11 smartcard, so that virtio-crypto could also
provide the functionality of an HSM?  Or does the standard require that
the keys are provided by the guest?

Paolo



Re: Procedures adding new CPUs in sbsa-ref

2022-04-12 Thread Itaru Kitayama
On Tue, Apr 12, 2022 at 0:22 Alex Bennée  wrote:

>
> Itaru Kitayama  writes:
>
> > Good point; however per the SBSA specification, DEN0029F, there's the
> > PE architecture requirement at
> > each level from 1 to 7, so now I am wondering whether supporting
> > cortex-a57 and a72 are good enough to
> > set up a fully SBSA level 7 compliant "board" in QMEU.
>
> Not currently - we are working on cortex-a76/neoverse-n1 which will
> provide a v8.2 baseline for sbsa-ref. See:
>
>   Subject: [PATCH 00/16] target/arm: Implement features Debugv8p4, RAS,
> IESB
>   Date: Fri,  8 Apr 2022 17:07:26 -0700
>   Message-Id: <20220409000742.293691-1-richard.hender...@linaro.org>
>
> and:
>
>   Subject: [PATCH 0/7] target/arm: More trivial features, A76, N1
>   Date: Sat,  9 Apr 2022 22:57:18 -0700
>   Message-Id: <20220410055725.380246-1-richard.hender...@linaro.org>
>
> which are stepping stones to those concrete models. Please review if you
> can.


Sure, Shuichi and I will review Richard’s series and give feedback if
there’s any.

Itaru.


>
> > Also, the 'max'
> > is there, but does not boot.
>
> Generally the firmware has to be built with the knowledge of what system
> it is running on so will generally fall over if run on a different CPU
> feature set. However I believe Leif had a firmware branch which attempts
> to work with -cpu max by doing proper ID register probing before using
> features. However -cpu max is very a moving feast which is why there is
> a push for the concrete CPU types.
>
> I believe there is a proposal for a versioned sbsa-ref model which will
> step of the default CPU for higher levels.
>
> >
> > Itaru.
> >
> > On Sat, Apr 9, 2022 at 12:04 AM Peter Maydell 
> wrote:
> >>
> >> On Fri, 8 Apr 2022 at 15:59, Itaru Kitayama 
> wrote:
> >> > I'd like to add a64fx cpu to the sbsa-ref board, if there's a quick
> and dirty
> >> > way of completing that, advice from the  maintainers is greatly
> appreciated.
> >>
> >> I have cc'd the sbsa-ref maintainers (as listed in the MAINTAINERS
> file).
> >>
> >> However, I'm not sure why you want to add the a64fx CPU to this
> >> board model? The sbsa-ref board is intended as a platform for
> >> developing firmware that runs on Server Base System Architecture
> >> hardware, so it deliberately doesn't have support for every CPU
> >> type QEMU implements.
> >>
> >> thanks
> >> -- PMM
>
>
> --
> Alex Bennée
>


[PATCH] block: fix core for unlock not permitted

2022-04-12 Thread suruifeng via
qemu coredump:
  0x7f9e7205c81b in raise () from /usr/lib64/libc.so.6
  0x7f9e7205db41 in abort () from /usr/lib64/libc.so.6
  0x7f9e71ddbe94 in error_exit (err=, 
msg=msg@entry=0x7f9e71ec1b50 <__func__.20287> "qemu_mutex_unlock_impl")
at /usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:36
  0x7f9e71ddc61f in qemu_mutex_unlock_impl 
(mutex=mutex@entry=0x5559850b0b90, file=file@entry=0x7f9e71ec0978 
"/home/abuild/rpmbuild/BUILD/qemu-4.1.0/util/async.c",
line=line@entry=524) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/util/qemu-thread-posix.c:108
  0x7f9e71dd5bb5 in aio_context_release (ctx=ctx@entry=0x5559850b0b30) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/util/async.c:524
  0x7f9e70dfed28 in bdrv_flush (bs=bs@entry=0x5559851f0a20) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2778
  0x7f9e70e37f63 in bdrv_close (bs=bs@entry=0x5559851f0a20) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4025
  0x7f9e70e38193 in bdrv_delete (bs=0x5559851f0a20) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block.c:4271
  0x7f9e70e38225 in bdrv_unref (bs=) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block.c:5612
  0x7f9e70df9a92 in bdrv_next (it=it@entry=0x7ffc5e3547a0) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block/block-backend.c:576
  0x7f9e70dfee76 in bdrv_flush_all () at 
/usr/src/debug/qemu-4.1.0-170.x86_64/block/io.c:2074
  0x7f9e71e3a08f in do_vm_stop (state=state@entry=RUN_STATE_SHUTDOWN, 
send_stop=send_stop@entry=false) at 
/usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1140
  0x7f9e71e3a14c in vm_shutdown () at 
/usr/src/debug/qemu-4.1.0-170.x86_64/cpus.c:1151

While a mirror job is running, the VM is shut down. During the shutdown, a
mirror job I/O error triggers mirror_exit_common().
In bdrv_flush_all(), bdrv_next() first increases the ref on mirror_top_bs;
then bdrv_flush(bs) calls BDRV_POLL_WHILE, which executes
mirror_exit_common() and decreases the ref on mirror_top_bs;
finally bdrv_next() decreases the ref on mirror_top_bs again, resulting in
mirror_top_bs being released.

Let's fix this by adding aio_context_acquire() and aio_context_release() to
bdrv_next().

Signed-off-by: suruifeng 
---
 block/block-backend.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index e0e1aff4b1..5ae745c0ab 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -593,6 +593,7 @@ BlockBackend *blk_next(BlockBackend *blk)
 BlockDriverState *bdrv_next(BdrvNextIterator *it)
 {
 BlockDriverState *bs, *old_bs;
+AioContext *ctx = NULL;
 
 /* Must be called from the main loop */
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
@@ -613,11 +614,17 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
 if (it->blk) {
 blk_ref(it->blk);
 }
+   ctx = blk_get_aio_context(old_blk);
+   aio_context_acquire(ctx);
 blk_unref(old_blk);
+   aio_context_release(ctx);
 
 if (bs) {
 bdrv_ref(bs);
+   ctx = bdrv_get_aio_context(old_bs);
+   aio_context_acquire(ctx);
 bdrv_unref(old_bs);
+   aio_context_release(ctx);
 return bs;
 }
 it->phase = BDRV_NEXT_MONITOR_OWNED;
@@ -636,7 +643,10 @@ BlockDriverState *bdrv_next(BdrvNextIterator *it)
 if (bs) {
 bdrv_ref(bs);
 }
+ctx = bdrv_get_aio_context(old_bs);
+aio_context_acquire(ctx);
 bdrv_unref(old_bs);
+aio_context_release(ctx);
 
 return bs;
 }
-- 
2.27.0




Re: [libvirt RFC] virFile: new VIR_FILE_WRAPPER_BIG_PIPE to improve performance

2022-04-12 Thread Claudio Fontana
On 4/11/22 8:53 PM, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfont...@suse.de) wrote:
>> On 4/7/22 3:57 PM, Claudio Fontana wrote:
>>> On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
 * Claudio Fontana (cfont...@suse.de) wrote:
> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
>> * Claudio Fontana (cfont...@suse.de) wrote:
>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
 On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
 * Claudio Fontana (cfont...@suse.de) wrote:
> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
 On 3/16/22 1:17 PM, Claudio Fontana wrote:
> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana 
>> wrote:
>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
 On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana 
 wrote:
> the first user is the qemu driver,
>
> virsh save/resume would slow to a crawl with a default 
> pipe size (64k).
>
> This improves the situation by 400%.
>
> Going through io_helper still seems to incur some 
> penalty (~15%-ish)
> compared with direct qemu migration to a nc socket to a 
> file.
>
> Signed-off-by: Claudio Fontana 
> ---
>  src/qemu/qemu_driver.c|  6 +++---
>  src/qemu/qemu_saveimage.c | 11 ++-
>  src/util/virfile.c| 12 
>  src/util/virfile.h|  1 +
>  4 files changed, 22 insertions(+), 8 deletions(-)
>
> Hello, I initially thought this to be a qemu performance 
> issue,
> so you can find the discussion about this in qemu-devel:
>
> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>
> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>
>>>
 Current results show these experimental averages maximum 
 throughput
 migrating to /dev/null per each FdWrapper Pipe Size (as per 
 QEMU QMP
 "query-migrate", tests repeated 5 times for each).
 VM Size is 60G, most of the memory effectively touched before 
 migration,
 through user application allocating and touching all memory 
 with
 pseudorandom data.

 64K: 5200 Mbps (current situation)
 128K:5800 Mbps
 256K:   20900 Mbps
 512K:   21600 Mbps
 1M: 22800 Mbps
 2M: 22800 Mbps
 4M: 22400 Mbps
 8M: 22500 Mbps
 16M:22800 Mbps
 32M:22900 Mbps
 64M:22900 Mbps
 128M:   22800 Mbps

 This above is the throughput out of patched libvirt with 
 multiple Pipe Sizes for the FDWrapper.
>>>
>>> Ok, its bouncing around with noise after 1 MB. So I'd suggest 
>>> that
>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>> not try to go higher.
>>>
 As for the theoretical limit for the libvirt architecture,
 I ran a qemu migration directly issuing the appropriate QMP
 commands, setting the same migration parameters as per libvirt,
 and then migrating to a socket netcatted to /dev/null via
 {"execute": "migrate", "arguments": { "uri", 
 "unix:///tmp/netcat.sock" } } :

 QMP:37000 Mbps
>>>
 So although the Pipe size improves things (in particular the
 large jump is for the 256K size, although 1M seems a very good 
 value),
 there is still a second bottleneck in there somewhere that
 accounts for a loss of ~14200 Mbps in throughput.
>
>
> Interesting 
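The pipe-size tuning measured above can be reproduced with a small
standalone sketch using Linux's F_SETPIPE_SZ fcntl (Linux-only; the kernel
may round the requested size up, and sizes above /proc/sys/fs/pipe-max-size
need privileges):

```python
import fcntl
import os

# Linux fcntl constants; fcntl.F_SETPIPE_SZ / F_GETPIPE_SZ are exposed
# by the fcntl module on newer Pythons, so fall back to the raw values.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)

r, w = os.pipe()
fcntl.fcntl(w, F_SETPIPE_SZ, 1 << 20)   # request a 1 MiB pipe buffer
print(fcntl.fcntl(w, F_GETPIPE_SZ))     # effective size, >= 64 KiB default
os.close(r)
os.close(w)
```

This is the knob the patched libvirt raises for its FDWrapper pipe; per the
numbers above, throughput plateaus around the 1 MiB size.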

[RFC PATCH] target/i386: fix byte swap issue with XMM register access

2022-04-12 Thread Alex Bennée
During the conversion to the gdb_get_reg128 helpers the high and low
parts of the XMM register were inadvertently swapped. This causes
reads of the register to report the incorrect value to gdb.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/971
Fixes: b7b8756a9c (target/i386: use gdb_get_reg helpers)
Signed-off-by: Alex Bennée 
Cc: qemu-sta...@nongnu.org
---
 target/i386/gdbstub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/i386/gdbstub.c b/target/i386/gdbstub.c
index 098a2ad15a..c3a2cf6f28 100644
--- a/target/i386/gdbstub.c
+++ b/target/i386/gdbstub.c
@@ -129,8 +129,8 @@ int x86_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 n -= IDX_XMM_REGS;
 if (n < CPU_NB_REGS32 || TARGET_LONG_BITS == 64) {
 return gdb_get_reg128(mem_buf,
-  env->xmm_regs[n].ZMM_Q(0),
-  env->xmm_regs[n].ZMM_Q(1));
+  env->xmm_regs[n].ZMM_Q(1),
+  env->xmm_regs[n].ZMM_Q(0));
 }
 } else {
 switch (n) {
-- 
2.30.2
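The effect of the swap can be illustrated with a standalone sketch (a
hypothetical helper, not QEMU's actual gdb_get_reg128 implementation):

```python
import struct

def reg128_le(hi: int, lo: int) -> bytes:
    # Pack a 128-bit register as two little-endian 64-bit halves,
    # low half first, as a little-endian gdb target would expect.
    return struct.pack("<QQ", lo, hi)

hi, lo = 0x0123456789ABCDEF, 0xFEDCBA9876543210
correct = reg128_le(hi, lo)
swapped = reg128_le(lo, hi)  # arguments inadvertently swapped
print(correct != swapped)    # True: gdb would see the halves exchanged
```

Passing ZMM_Q(1)/ZMM_Q(0) instead of ZMM_Q(0)/ZMM_Q(1) is exactly this kind
of argument-order fix.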




Re: [RFC v2 1/8] blkio: add io_uring block driver using libblkio

2022-04-12 Thread Stefan Hajnoczi
On Thu, Apr 07, 2022 at 10:34:07AM +0200, Kevin Wolf wrote:
> Am 07.04.2022 um 10:25 hat Kevin Wolf geschrieben:
> > Am 07.04.2022 um 09:22 hat Stefan Hajnoczi geschrieben:
> > > On Wed, Apr 06, 2022 at 07:32:04PM +0200, Kevin Wolf wrote:
> > > > Am 05.04.2022 um 17:33 hat Stefan Hajnoczi geschrieben:
> > > > > libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
> > > > > high-performance disk I/O. It currently supports io_uring with
> > > > > additional drivers planned.
> > > > > 
> > > > > One of the reasons for developing libblkio is that other applications
> > > > > besides QEMU can use it. This will be particularly useful for
> > > > > vhost-user-blk which applications may wish to use for connecting to
> > > > > qemu-storage-daemon.
> > > > > 
> > > > > libblkio also gives us an opportunity to develop in Rust behind a C 
> > > > > API
> > > > > that is easy to consume from QEMU.
> > > > > 
> > > > > This commit adds an io_uring BlockDriver to QEMU using libblkio. For 
> > > > > now
> > > > > I/O buffers are copied through bounce buffers if the libblkio driver
> > > > > requires it. Later commits add an optimization for pre-registering 
> > > > > guest
> > > > > RAM to avoid bounce buffers. It will be easy to add other libblkio
> > > > > drivers since they will share the majority of code.
> > > > > 
> > > > > Signed-off-by: Stefan Hajnoczi 
> > > > 
> > > > > +static BlockDriver bdrv_io_uring = {
> > > > > +.format_name= "io_uring",
> > > > > +.protocol_name  = "io_uring",
> > > > > +.instance_size  = sizeof(BDRVBlkioState),
> > > > > +.bdrv_needs_filename= true,
> > > > > +.bdrv_parse_filename= blkio_parse_filename_io_uring,
> > > > > +.bdrv_file_open = blkio_file_open,
> > > > > +.bdrv_close = blkio_close,
> > > > > +.bdrv_getlength = blkio_getlength,
> > > > > +.has_variable_length= true,
> > > > 
> > > > This one is a bad idea. It means that every request will call
> > > > blkio_getlength() first, which looks up the "capacity" property in
> > > > libblkio and then calls lseek() for the io_uring backend.
> > > 
> > > Thanks for pointing this out. I didn't think this through. More below on
> > > what I was trying to do.
> > > 
> > > > For other backends like the vhost_user one (where I just copied your
> > > > definition and then noticed this behaviour), it involves a message over
> > > > the vhost socket, which is even worse.
> > > 
> > > (A vhost-user-blk driver could cache the capacity field and update it
> > > when a Configuration Change Notification is received. There is no need
> > > to send a vhost-user protocol message every time.)
> > 
> > In theory we could cache in libblkio, but then we would need a mechanism
> > to invalidate the cache so we can support resizing an image (similar to
> > what block_resize does in QEMU, except that it wouldn't set the new
> > size from a parameter, but just get the new value from the backend).
> 
> Oh, sorry, I misread. VHOST_USER_SLAVE_CONFIG_CHANGE_MSG is probably
> what you mean, so that the backend triggers the update. It exists in the
> spec, but neither libvhost-user nor rust-vmm seem to support it
> currently. We also don't set up the backchannel yet where this message
> could even be passed.
> 
> So it's an option, but probably only for later because it involves
> extending several places.

Agreed.

Stefan
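The capacity-caching idea discussed above can be sketched as follows. This
is a hypothetical model, not libblkio's or QEMU's actual API: the cached
value is returned on the fast path and is only re-queried after an
invalidation such as a configuration-change notification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity cache; the query callback stands in for the
 * expensive lookup (lseek() for io_uring, a socket round trip for
 * vhost-user). */
typedef struct {
    uint64_t capacity;
    bool valid;
    uint64_t (*query)(void); /* expensive backend query */
} CapCache;

uint64_t cap_get(CapCache *c)
{
    if (!c->valid) {             /* slow path: one backend query */
        c->capacity = c->query();
        c->valid = true;
    }
    return c->capacity;          /* fast path: no round trip */
}

/* Call when e.g. a config-change notification signals a resize. */
void cap_invalidate(CapCache *c)
{
    c->valid = false;
}

/* Test helper: counts how often the backend is actually queried. */
int fake_query_calls;
uint64_t fake_query(void)
{
    fake_query_calls++;
    return 1024;
}
```

The design point matches the thread: without the backchannel message to
trigger cap_invalidate(), the cache has no safe way to learn about a resize,
which is why caching alone is "an option, but probably only for later".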


signature.asc
Description: PGP signature


Re: [PATCH 0/3] vhost-user: Fixes for VHOST_USER_ADD/REM_MEM_REG

2022-04-12 Thread Stefan Hajnoczi
On Thu, Apr 07, 2022 at 03:36:54PM +0200, Kevin Wolf wrote:
> While implementing a vhost-user-blk driver for libblkio, I found some
> problems with VHOST_USER_ADD/REM_MEM_REG both in the spec and in the
> implementations in QEMU and libvhost-user that this series addresses.
> 
> I also noticed that you can use REM_MEM_REG or SET_MEM_TABLE to unmap a
> memory region that is still in use (e.g. a block I/O request using
> addresses from the region has been started, but not completed yet),
> which is not great. I'm not sure how to fix this best, though.
> 
> We would have to wait for these requests to complete (maybe introduce a
> refcount and wait for it to drop to zero), but waiting seems impossible
> in libvhost-user because it doesn't have any main loop integration. Just
> failing the memory region removal would be safe, but potentially a
> rather awkward interface because clients would have to implement some
> retry logic.
> 
> Kevin Wolf (3):
>   docs/vhost-user: Clarifications for VHOST_USER_ADD/REM_MEM_REG
>   libvhost-user: Fix extra vu_add/rem_mem_reg reply
>   vhost-user: Don't pass file descriptor for VHOST_USER_REM_MEM_REG
> 
>  docs/interop/vhost-user.rst   | 17 +
>  hw/virtio/vhost-user.c|  2 +-
>  subprojects/libvhost-user/libvhost-user.c | 17 +++--
>  3 files changed, 25 insertions(+), 11 deletions(-)
> 
> -- 
> 2.35.1
> 

Reviewed-by: Stefan Hajnoczi 
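The refcount approach suggested in the cover letter could look roughly like
this. It is a sketch under assumptions, not the libvhost-user
implementation: each in-flight request pins the region, and removal simply
fails while the count is nonzero (the "client retries" variant, since
libvhost-user has no main loop to wait in).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical guest memory region with an in-flight request count. */
typedef struct {
    unsigned refcount; /* in-flight I/O requests using this region */
    bool mapped;
} MemRegion;

bool region_acquire(MemRegion *r) /* called when a request starts */
{
    if (!r->mapped) {
        return false;
    }
    r->refcount++;
    return true;
}

void region_release(MemRegion *r) /* called when a request completes */
{
    assert(r->refcount > 0);
    r->refcount--;
}

/* REM_MEM_REG handler: refuse to unmap while requests are in flight. */
bool region_try_unmap(MemRegion *r)
{
    if (r->refcount > 0) {
        return false; /* still in use: client must retry later */
    }
    r->mapped = false;
    return true;
}
```

As the cover letter notes, failing the removal is safe but pushes retry
logic onto clients; waiting for the count to reach zero would need main-loop
integration that libvhost-user currently lacks.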




Re: [PATCH] contrib/vhost-user-blk: add missing GOptionEntry NULL terminator

2022-04-12 Thread Stefan Hajnoczi
On Mon, Apr 11, 2022 at 04:00:57PM +0100, Stefan Hajnoczi wrote:
> The GLib documentation says "a NULL-terminated array of GOptionEntrys"
> so we'd better make sure there is a terminator that lets
> g_option_context_add_main_entries() know when the end of the array has
> been reached.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  contrib/vhost-user-blk/vhost-user-blk.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Thanks, applied to my block-next tree:
https://gitlab.com/stefanha/qemu/commits/block-next

Stefan
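The bug class this patch fixes can be shown without GLib. count_entries()
below mimics how g_option_context_add_main_entries() walks a GOptionEntry
array: it keeps reading until it hits an entry whose name is NULL, so an
array without the terminator runs off the end. The struct is a simplified
stand-in, not GLib's real GOptionEntry.

```c
#include <stddef.h>

/* Simplified stand-in for GOptionEntry: only the long_name field. */
typedef struct {
    const char *long_name;
} Entry;

/* Walks the array until the NULL-named terminator, the way
 * g_option_context_add_main_entries() does. Without the terminator
 * this loop reads past the end of the array (undefined behavior). */
int count_entries(const Entry *entries)
{
    int n = 0;
    while (entries[n].long_name != NULL) {
        n++;
    }
    return n;
}
```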




Re: [libvirt] [PATCH RESEND v2 0/4] re-introduce

2022-04-12 Thread Michael S. Tsirkin
On Tue, Apr 12, 2022 at 09:52:26AM +0530, Ani Sinha wrote:
> On Tue, Apr 12, 2022 at 9:50 AM Ani Sinha  wrote:
> >
> > On Tue, Mar 8, 2022 at 10:28 PM Michael S. Tsirkin  wrote:
> > >
> > > On Tue, Mar 08, 2022 at 10:15:49PM +0530, Ani Sinha wrote:
> > > >
> > > > Change log:
> > > > v2: rebased the patchset. Laine's response is appended at the end.
> > > >
> > > > I am re-introducing the patchset for  which got
> > > > reverted here few months back:
> > > >
> > > > https://www.spinics.net/linux/fedora/libvir/msg224089.html
> > > >
> > > > The reason for the reversal was that there seemed to be some
> > > > instability/issues around the use of the qemu commandline which this
> > > > patchset tries to support. In particular, some guest operating systems
> > > > did not like the way QEMU was trying to disable native hotplug on pcie
> > > > root ports. Subsequently, in QEMU 6.2, we have changed the mechanism
> > > > by which we disable native hotplug. As I understand, we do not have
> > > > any reported issues so far in 6.2 around this area. QEMU will enter a
> > > > soft feature freeze in the first week of march in prep for 7.0 release.
> > >
> > > Right. But unfortunately we did not yet really work on
> > > a sane interface for this.
> > >
> > > The way I see it, at high level we thinkably need two flags
> > > - disable ACPI hotplug
> > > - enable native hotplug (maybe separately for pci and pcie?)

I still think this is the case.

> > pci does not have native hotplug. so this would be applicable only for
> > q35. For i440fx we have two separate flags already to disable acpi
> > hotplug, one for root bus and another for bridges.
> >
> > >
> > > and with both enabled guests actually can switch between
> > > the two.
> > >
> > > This will at least reflect the hardware, so has a chance to be
> > > stable.
> > >
> > > The big question however would be what is the actual use-case.
> > > Without that this begs the question of why do we bother at all.
> >
> > To me the main motivation is as I have described here:
> > https://listman.redhat.com/archives/libvir-list/2021-October/msg00068.html
> >
> > One concrete example of why one might still want to use native hotplug with
> > pcie-root-port controller is the fact that we are still discovering issues
> > with acpi hotplug on PCIE. One such issue is:
> > https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02146.html

This one was fixed, right?


> > Another reason is that users have been using native hotplug on pcie root
> > ports up until now. They have built and tested their systems based on
> > native hotplug. They may not want to suddenly move to acpi based hotplug
> > just because it is now the default in qemu. Supporting the option to
> > choose one or the other through libvirt makes things simpler for end users.
> 
> Essentially what I do not like is that we are imposing acpi hotplug on
> q35 for the entire community without giving them a choice to revert
> back to native hotplug through libvirt.

The reason qemu did it is because it was expected to be more or less
transparent. Barring bugs, but hey, there are always bugs with any change.

> >
> > > To allow hotplug of bridges? If it is really necessary for us then
> > > we should think hard about questions that surround this:
> > >
> > > - how does one hotplug a pcie switch?
> > > - any way to use e.g. dynamic ACPI to support hotplug of bridges?
> > > - do we want to bite the bullet and create an option for management
> > >   to fully control guest memory layout including all pci devices?
> > >
> > >
> > >
> > > > Libvirt is also entering a new release cycle phase. Hence, I am
> > > > introducing this patchset early enough in the release cycles so that if
> > > > we do see any issues on the qemu side during the rc0, rc1 cycles and if
> > > > reversal of this patchset is again required, it can be done in time
> > > > before the next libvirt release end of March.
> > > >
> > > > All the patches in this series had been previously reviewed. Some
> > > > subsequent fixes were made after my initial patches were pushed. I have
> > > > squashed all those fixes and consolidated them into four patches. I have
> > > > also updated the documentation to reflect the new changes from the QEMU
> > > > side and rebased my changes fixing the tests in the process.
> > > >
> > > > What changed in QEMU post version 6.1 ?
> > > > =
> > > >
> > > > We have made basically two major changes in QEMU. First is this change:
> > > >
> > > > (1) commit 211afe5c69b597acf85fdd577eb497f5be1ffbd8
> > > > Author: Julia Suvorova 
> > > > Date:   Fri Nov 12 06:08:56 2021 -0500
> > > >
> > > > hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC
> > > >
> > > > There are two ways to enable ACPI PCI Hot-plug:
> > > >
> > > > * Disable the Hot-plug Capable bit on PCIe slots.
> > > >
> > > > This was the first approach which led to regression [1-2], 

[PATCH 2/2] acpi/nvdimm: Fix aml_or() and aml_and() in if clause

2022-04-12 Thread Robert Hoo
It should be some typo originally: in the If conditions, bitwise and/or
were used rather than logical and/or.

The resulting change in AML code:

If (((Local6 == Zero) | (Arg0 != Local0)))
==>
If (((Local6 == Zero) || (Arg0 != Local0)))

If (((ObjectType (Arg3) == 0x04) & (SizeOf (Arg3) == One)))
==>
If (((ObjectType (Arg3) == 0x04) && (SizeOf (Arg3) == One)))

Fixes: 90623ebf603 ("nvdimm acpi: check UUID")
Fixes: 4568c948066 ("nvdimm acpi: save arg3 of _DSM method")
Signed-off-by: Robert Hoo 
Reviewed-by: Jingqi Liu 
---
 hw/acpi/nvdimm.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 7cc419401b..2cd26bb9e9 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -1040,7 +1040,7 @@ static void nvdimm_build_common_dsm(Aml *dev,
 
 uuid_invalid = aml_lnot(aml_equal(uuid, expected_uuid));
 
-unsupport = aml_if(aml_or(unpatched, uuid_invalid, NULL));
+unsupport = aml_if(aml_lor(unpatched, uuid_invalid));
 
 /*
  * function 0 is called to inquire what functions are supported by
@@ -1072,10 +1072,9 @@ static void nvdimm_build_common_dsm(Aml *dev,
  * in the DSM Spec.
  */
 pckg = aml_arg(3);
-ifctx = aml_if(aml_and(aml_equal(aml_object_type(pckg),
+ifctx = aml_if(aml_land(aml_equal(aml_object_type(pckg),
aml_int(4 /* Package */)) /* It is a Package? */,
-   aml_equal(aml_sizeof(pckg), aml_int(1)) /* 1 element? */,
-   NULL));
+   aml_equal(aml_sizeof(pckg), aml_int(1)) /* 1 element? */));
 
 pckg_index = aml_local(2);
 pckg_buf = aml_local(3);
-- 
2.31.1
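Why the distinction only mattered in principle here: bitwise operators
diverge from logical ones as soon as an operand is neither 0 nor 1, but the
AML comparisons in these conditions always produce 0 or 1, which is why the
typo never misbehaved. A quick C illustration (C semantics used as an
analogy for the ASL operators):

```c
/* Bitwise vs. logical operators agree for 0/1 operands but not in
 * general; the AML comparisons only ever yielded 0 or 1, so the
 * original bitwise typo happened to work. */
int bitwise_and(int a, int b) { return a & b; }
int logical_and(int a, int b) { return a && b; }
```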




Re: [libvirt] [PATCH RESEND v2 0/4] re-introduce

2022-04-12 Thread Michael S. Tsirkin
On Tue, Apr 12, 2022 at 09:50:15AM +0530, Ani Sinha wrote:
> On Tue, Mar 8, 2022 at 10:28 PM Michael S. Tsirkin  wrote:
> >
> > On Tue, Mar 08, 2022 at 10:15:49PM +0530, Ani Sinha wrote:
> > >
> > > Change log:
> > > v2: rebased the patchset. Laine's response is appended at the end.
> > >
> > > I am re-introducing the patchset for  which got
> > > reverted here few months back:
> > >
> > > https://www.spinics.net/linux/fedora/libvir/msg224089.html
> > >
> > > The reason for the reversal was that there seemed to be some
> > > instability/issues around the use of the qemu commandline which this
> > > patchset tries to support. In particular, some guest operating systems
> > > did not like the way QEMU was trying to disable native hotplug on pcie
> > > root ports. Subsequently, in QEMU 6.2, we have changed the mechanism
> > > by which we disable native hotplug. As I understand, we do not have
> > > any reported issues so far in 6.2 around this area. QEMU will enter a
> > > soft feature freeze in the first week of march in prep for 7.0 release.
> >
> > Right. But unfortunately we did not yet really work on
> > a sane interface for this.
> >
> > The way I see it, at high level we thinkably need two flags
> > - disable ACPI hotplug
> > - enable native hotplug (maybe separately for pci and pcie?)
> 
> pci does not have native hotplug.

shpc?

> so this would be applicable only for
> q35. For i440fx we have two separate flags already to disable acpi
> hotplug, one for root bus and another for bridges.
> 
> >
> > and with both enabled guests actually can switch between
> > the two.
> >
> > This will at least reflect the hardware, so has a chance to be
> > stable.
> >
> > The big question however would be what is the actual use-case.
> > Without that this begs the question of why do we bother at all.
> 
> To me the main motivation is as I have described here:
> https://listman.redhat.com/archives/libvir-list/2021-October/msg00068.html
> 
> One concrete example of why one might still want to use native hotplug with
> pcie-root-port controller is the fact that we are still discovering issues
> with acpi hotplug on PCIE. One such issue is:
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02146.html
> Another reason is that users have been using native hotplug on pcie root ports
> up until now. They have built and tested their systems based on native
> hotplug. They may not want to suddenly move to acpi based hotplug just
> because it is now the default in qemu. Supporting the option to choose
> one or the other through
> libvirt makes things simpler for end users.

To work around bugs then.

> > To allow hotplug of bridges? If it is really necessary for us then
> > we should think hard about questions that surround this:
> >
> > - how does one hotplug a pcie switch?
> > - any way to use e.g. dynamic ACPI to support hotplug of bridges?
> > - do we want to bite the bullet and create an option for management
> >   to fully control guest memory layout including all pci devices?
> >
> >
> >
> > > Libvirt is also entering a new release cycle phase. Hence, I am
> > > introducing this patchset early enough in the release cycles so that if
> > > we do see any issues on the qemu side during the rc0, rc1 cycles and if
> > > reversal of this patchset is again required, it can be done in time
> > > before the next libvirt release end of March.
> > >
> > > All the patches in this series had been previously reviewed. Some
> > > subsequent fixes were made after my initial patches were pushed. I have
> > > squashed all those fixes and consolidated them into four patches. I have
> > > also updated the documentation to reflect the new changes from the QEMU
> > > side and rebased my changes fixing the tests in the process.
> > >
> > > What changed in QEMU post version 6.1 ?
> > > =
> > >
> > > We have made basically two major changes in QEMU. First is this change:
> > >
> > > (1) commit 211afe5c69b597acf85fdd577eb497f5be1ffbd8
> > > Author: Julia Suvorova 
> > > Date:   Fri Nov 12 06:08:56 2021 -0500
> > >
> > > hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC
> > >
> > > There are two ways to enable ACPI PCI Hot-plug:
> > >
> > > * Disable the Hot-plug Capable bit on PCIe slots.
> > >
> > > This was the first approach which led to regression [1-2], as
> > > I/O space for a port is allocated only when it is hot-pluggable,
> > > which is determined by HPC bit.
> > >
> > > * Leave the HPC bit on and disable PCIe Native Hot-plug in 
> > > _OSC
> > >   method.
> > >
> > > This removes the (future) ability of hot-plugging switches with PCIe
> > > Native hotplug since ACPI PCI Hot-plug only works with cold-plugged
> > > bridges. If the user wants to explicitly use this feature, they can
> > > disable ACPI PCI Hot-plug with:
> > > --global 

[PATCH 1/2] acpi/nvdimm: Create _LS{I,R,W} method for NVDIMM device

2022-04-12 Thread Robert Hoo
Since ACPI 6.2, the previous NVDIMM _DSM functions "Get Namespace Label Data
Size (function index 4)", "Get Namespace Label Data (function index 5)" and
"Set Namespace Label Data (function index 6)" have been deprecated by the
ACPI standard methods _LSI, _LSR and _LSW respectively. The functions'
semantics are almost identical, so my implementation reuses the existing
_DSMs: the new _LS{I,R,W} interfaces just construct the parameters and call
the _DSMs.

Only child NVDIMM devices have these methods, not the Root device.

With this patch, a new NVDIMM sub-device in the ACPI namespace looks like
this:

Device (NV00)
{
Name (_ADR, One)  // _ADR: Address
Method (_LSI, 0, NotSerialized)  // _LSI: Label Storage Information
{
 Return (NCAL (ToUUID ("4309ac30-0d11-11e4-9191-0800200c9a66"), 
0x02, 0x04, Zero, One))
}

Method (_LSR, 2, Serialized)  // _LSR: Label Storage Read
{
CreateDWordField (BUFF, Zero, DWD0)
CreateDWordField (BUFF, 0x04, DWD1)
Name (PKG1, Package (0x01)
{
BUFF
})
DWD0 = Arg0
DWD1 = Arg1
Return (NCAL (ToUUID ("4309ac30-0d11-11e4-9191-0800200c9a66"), 
0x02, 0x05, PKG1, One))
}

Method (_LSW, 3, Serialized)  // _LSW: Label Storage Write
{
CreateDWordField (BUFF, Zero, DWD0)
CreateDWordField (BUFF, 0x04, DWD1)
CreateField (BUFF, 0x40, 0x7FA0, FILD)
Name (PKG1, Package (0x01)
{
BUFF
})
DWD0 = Arg0
DWD1 = Arg1
FILD = Arg2
Return (NCAL (ToUUID ("4309ac30-0d11-11e4-9191-0800200c9a66"), 
0x02, 0x06, PKG1, One))
 }

 Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
 {
Return (NCAL (Arg0, Arg1, Arg2, Arg3, One))
 }
}

Signed-off-by: Robert Hoo 
Reviewed-by: Jingqi Liu
---
 hw/acpi/nvdimm.c | 56 
 1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 0d43da19ea..7cc419401b 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -848,10 +848,10 @@ nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 
 nvdimm_debug("Revision 0x%x Handler 0x%x Function 0x%x.\n", in->revision,
  in->handle, in->function);
-
-if (in->revision != 0x1 /* Currently we only support DSM Spec Rev1. */) {
-nvdimm_debug("Revision 0x%x is not supported, expect 0x%x.\n",
- in->revision, 0x1);
+/* Currently we only support DSM Spec Rev1 and Rev2. */
+if (in->revision != 0x1 && in->revision != 0x2) {
+nvdimm_debug("Revision 0x%x is not supported, expect 0x1 or 0x2.\n",
+ in->revision);
 nvdimm_dsm_no_payload(NVDIMM_DSM_RET_STATUS_UNSUPPORT, dsm_mem_addr);
 goto exit;
 }
@@ -1247,6 +1247,11 @@ static void nvdimm_build_fit(Aml *dev)
 static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
 {
 uint32_t slot;
+Aml *method, *pkg, *buff;
+
+/* Build common shared buffer for params pass in/out */
+buff = aml_buffer(4096, NULL);
+aml_append(root_dev, aml_name_decl("BUFF", buff));
 
 for (slot = 0; slot < ram_slots; slot++) {
 uint32_t handle = nvdimm_slot_to_handle(slot);
@@ -1264,6 +1269,49 @@ static void nvdimm_build_nvdimm_devices(Aml *root_dev, uint32_t ram_slots)
  */
 aml_append(nvdimm_dev, aml_name_decl("_ADR", aml_int(handle)));
 
+/* Build _LSI, _LSR, _LSW */
+method = aml_method("_LSI", 0, AML_NOTSERIALIZED);
+aml_append(method, aml_return(aml_call5(NVDIMM_COMMON_DSM,
+aml_touuid("4309AC30-0D11-11E4-9191-0800200C9A66"),
+aml_int(2), aml_int(4), aml_int(0),
+aml_int(handle))));
+aml_append(nvdimm_dev, method);
+
+method = aml_method("_LSR", 2, AML_SERIALIZED);
+aml_append(method,
+aml_create_dword_field(aml_name("BUFF"), aml_int(0), "DWD0"));
+aml_append(method,
+aml_create_dword_field(aml_name("BUFF"), aml_int(4), "DWD1"));
+pkg = aml_package(1);
+aml_append(pkg, aml_name("BUFF"));
+aml_append(method, aml_name_decl("PKG1", pkg));
+aml_append(method, aml_store(aml_arg(0), aml_name("DWD0")));
+aml_append(method, aml_store(aml_arg(1), aml_name("DWD1")));
+aml_append(method, aml_return(aml_call5(NVDIMM_COMMON_DSM,
+aml_touuid("4309AC30-0D11-11E4-9191-0800200C9A66"),
+aml_int(2), aml_int(5), aml_name("PKG1"),
+aml_int(handle))));
+aml_append(nvdimm_dev, method);
+
+method = aml_method("_LSW", 3, 

[RESEND][PATCH 0/2] acpi/nvdimm: support NVDIMM _LS{I,R,W} methods

2022-04-12 Thread Robert Hoo
The original NVDIMM _DSM functions (index 4~6) for label operations have
been deprecated by new ACPI methods _LS{I,R,W}[1][2].

Patch 1 implements the new _LS{I,R,W} methods, on top of old _DSM
implementation.

Patch 2 fixes some typos where bitwise and/or was used instead of logical
and/or, though functionally they haven't caused trouble.

[1] https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/index.html, 6.5.10 NVDIMM 
Label Methods
[2] https://pmem.io/documents/IntelOptanePMem_DSM_Interface-V2.0.pdf, 3.10 
Deprecated Functions

---
Resend for previous failed delivery to "qemu-devel@nongnu.org" due to
550-'Message headers fail syntax check'. 

Robert Hoo (2):
  acpi/nvdimm: Create _LS{I,R,W} method for NVDIMM device
  acpi/nvdimm: Fix aml_or() and aml_and() in if clause

 hw/acpi/nvdimm.c | 60 +++-
 1 file changed, 54 insertions(+), 6 deletions(-)


base-commit: 95a3fcc7487e5bef262e1f937ed8636986764c4e
-- 
2.31.1