date:20220412

Re: [Qemu-devel] [PATCH 6/8] i386/kvm: hv-stimer requires hv-time and hv-synic

2022-04-12 Thread Divya Garg

On 12/04/22 8:46 pm, Vitaly Kuznetsov wrote:

Divya Garg writes:

On 12/04/22 6:18 pm, Vitaly Kuznetsov wrote:

Divya Garg writes:

Hi Vitaly Kuznetsov !
I was working on hyperv flags and saw that we introduced new
dependencies some
time back
(https://urldefense.proofpoint.com/v2/url?u=https-3A__sourcegraph.com_github.com_qemu_qemu_-2D_commit_c686193072a47032d83cb4e131dc49ae30f9e5d7-3Fvisible-3D1=DwIBAg=s883GpUCOChKOHiocYtGcg=2QGHz-fTCVWImEBKe1ZcSe5t6UfasnhvdzD5DcixwOE=ln-t0rKlkFkOEKe97jJTLi2BoKK5E9lLMPHjPihl4kpdbvBStPeD0Ku9wTed7GPf=AtipQDs1Mi-0FQtb1AyvBpR34bpjp64troGF_nr_08E=
).
After these changes, if we try to live migrate a vm from older qemu to newer
one having these changes, it fails showing dependency issue.

I was wondering if this is the expected behaviour or if there is any work
around for handing it ? Or something needs to be done to ensure backward
compatibility ?

Hi Divya,

configurations with 'hv-stimer' and without 'hv-synic'/'hv-time' were
always incorrect as Windows can't use the feature, that's why the
dependencies were added. It is true that it doesn't seem to be possible
to forward-migrate such VMs to newer QEMU versions. We could've tied
these new dependencies to newer machine types I guess (so old machine
types would not fail to start) but we didn't do that back in 4.1 and
it's been awhile since... Not sure whether it would make much sense to
introduce something for pre-4.1 machine types now.

Out of curiosity, why do such "incorrect" configurations exist? Can you
just update them to include missing flags on older QEMU so they migrate
to newer ones without issues?

Hi Vitaly !

Thanks for the response. I understand that these were incorrect
configurations
and should be corrected. Only issue is, we want to avoid power cycling those
VMs. But yeah I think, since the configurations were wrong we should
update and
power cycle the VM. Just for understanding purpose, is it possible to
disable
the feature by throwing out some warning message and update libvirt to
metigate
this change and handle live migration ?

I'm not exactly sure about libvirt, I was under the impression it makes
sure that QEMU command line is the same on the destination and on the
source. If there's a way to add something, I'd suggest you add the
missing features (hv-time, hv-synic) on the destination rather than
remove 'hv-stimer' as it is probably safer.
Yes libvirt makes sure that the configurations remains constant on
source and

destination. And true that adding new features is a safer route.

Or maybe update libvirt to not to ask for this feature from qemu during live
migration and handle different configuration on source and destination
host ?

You can also modify QEMU locally and throw away these dependencies,
it'll allow these configurations again but generally speaking checking
that the set of hyper-v features is exactly the same on the source and
destination is the right thing to do: there are no guarantees that guest
OS (Windows) will keep behaving sane when the corresponding CPUIDs
change while it's running, all sorts of things are possible I believe.
True that. Its really difficult to predict the change in behaviour of
guest on

changing CPUIDs especially disabling a bit. I agree best solution will be to
power cycle the VMs and update the correct CPUIDs maintaining correct
dependency. Thankyou for clearing out the doubts and helping in a better
understanding.

Regards
Divya

Re: XIVE VFIO kernel resample failure in INTx mode under heavy load

2022-04-12 Thread Alexey Kardashevskiy





On 3/17/22 06:16, Cédric Le Goater wrote:

Timothy,

On 3/16/22 17:29, Cédric Le Goater wrote:

Hello,



I've been struggling for some time with what is looking like a
potential bug in QEMU/KVM on the POWER9 platform.  It appears that
in XIVE mode, when the in-kernel IRQ chip is enabled, an external
device that rapidly asserts IRQs via the legacy INTx level mechanism
will only receive one interrupt in the KVM guest.


Indeed. I could reproduce with a pass-through PCI adapter using
'pci=nomsi'. The virtio devices operate correctly but the network
adapter only receives one event (*):


$ cat /proc/interrupts
    CPU0   CPU1   CPU2   CPU3   CPU4   
CPU5   CPU6   CPU7
  16:   2198   1378   1519   1216  0  
0  0  0  XIVE-IPI   0 Edge  IPI-0
  17:  0  0  0  0   2003   
1936   1335   1507  XIVE-IPI   1 Edge  IPI-1
  18:  0   6401  0  0  0  
0  0  0  XIVE-IRQ 4609 Level virtio3, virtio0, 
virtio2
  19:  0  0  0  0  0
204  0  0  XIVE-IRQ 4610 Level virtio1
  20:  0  0  0  0  0  
0  0  0  XIVE-IRQ 4608 Level xhci-hcd:usb1
  21:  0  1  0  0  0  
0  0  0  XIVE-IRQ 4612 Level eth1 (*)
  23:  0  0  0  0  0  
0  0  0  XIVE-IRQ 4096 Edge  RAS_EPOW
  24:  0  0  0  0  0  
0  0  0  XIVE-IRQ 4592 Edge  hvc_console
  26:  0  0  0  0  0  
0  0  0  XIVE-IRQ 4097 Edge  RAS_HOTPLUG



Changing any one of those items appears to avoid the glitch, e.g. XICS


XICS is very different from XIVE. The driver implements the previous
interrupt controller architecture (P5-P8) and the hypervisor mediates
the delivery to the guest. With XIVE, vCPUs are directly signaled by
the IC. When under KVM, we use different KVM devices for each mode :

* KVM XIVE is a XICS-on-XIVE implementation (P9/P10 hosts) for guests
   not using the XIVE native interface. RHEL7 for instance.
* KVM XIVE native is a XIVE implementation (P9/P10 hosts) for guests
   using the XIVE native interface. Linux > 4.14.
* KVM XICS is for P8 hosts (no XIVE HW)

VFIO adds some complexity with the source events. I think the problem
comes from the assertion state. I will talk about it later.


mode with the in-kernel IRQ chip works (all interrupts are passed
through),


All interrupts are passed through using XIVE also. Run 'info pic' in
the monitor. On the host, check the IRQ mapping in the debugfs file :

   /sys/kernel/debug/powerpc/kvm-xive-*

and XIVE mode with the in-kernel IRQ chip disabled also works. 


In that case, no KVM device backs the QEMU device and all state
is in one place.


We
are also not seeing any problems in XIVE mode with the in-kernel
chip from MSI/MSI-X devices.


Yes. pass-through devices are expected to operate correctly :)


The device in question is a real time card that needs to raise an
interrupt every 1ms.  It works perfectly on the host, but fails in
the guest -- with the in-kernel IRQ chip and XIVE enabled, it
receives exactly one interrupt, at which point the host continues to
see INTx+ but the guest sees INTX-, and the IRQ handler in the guest
kernel is never reentered.


ok. Same symptom as the scenario above.


We have also seen some very rare glitches where, over a long period
of time, we can enter a similar deadlock in XICS mode.


with the in-kernel XICS IRQ chip ?


Disabling
the in-kernel IRQ chip in XIVE mode will also lead to the lockup
with this device, since the userspace IRQ emulation cannot keep up
with the rapid interrupt firing (measurements show around 100ms
required for processing each interrupt in the user mode).


MSI emulation in QEMU is slower indeed (35%). LSI is very slow because
it is handled as a special case in the device/driver. To maintain the
assertion state, all LSI handling is done with a special HCALL :
H_INT_ESB which is implemented in QEMU. This generates a lot of back
and forth in the KVM stack.


My understanding is the resample mechanism does some clever tricks
with level IRQs, but that QEMU needs to check if the IRQ is still
asserted by the device on guest EOI.


Yes. the problem is in that area.


Since a failure here would
explain these symptoms I'm wondering if there is a bug in either
QEMU or KVM for POWER / pSeries (SPAPr) where the IRQ is not
resampled and therefore not re-fired in the guest?


KVM I would say. The assertion state is maintained in KVM for the KVM
XICS-on-XIVE implementation and in QEMU for the KVM XIVE native
device. These are good candidates. I will take a look.


All works

[PATCH] docs: Correct the default thread-pool-size

2022-04-12 Thread Liu Yiding

Refer to 26ec190964 virtiofsd: Do not use a thread pool by default

Signed-off-by: Liu Yiding 
---
 docs/tools/virtiofsd.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/tools/virtiofsd.rst b/docs/tools/virtiofsd.rst
index 0c0560203c..33fed08c6f 100644
--- a/docs/tools/virtiofsd.rst
+++ b/docs/tools/virtiofsd.rst
@@ -127,7 +127,7 @@ Options
 .. option:: --thread-pool-size=NUM
 
   Restrict the number of worker threads per request queue to NUM.  The default
-  is 64.
+  is 0.
 
 .. option:: --cache=none|auto|always
 
-- 
2.31.1

Re: [RFC PATCH 0/4] 9pfs: Add 9pfs support for Windows host

2022-04-12 Thread Bin Meng

+Guohuai

On Tue, Apr 12, 2022 at 8:27 PM Christian Schoenebeck
 wrote:
>
> On Freitag, 8. April 2022 19:10:09 CEST Bin Meng wrote:
> > At present there is no Windows support for 9p file system.
> > This series adds initial Windows support for 9p file system.
>
> Nice!
>
> > Only 'local' file system driver backend is supported. security_model
> > should be 'none' due to limitations on Windows host.
>
> We have 3 fs drivers: local, synth, proxy. I don't mind about proxy, it is in
> bad shape and we will probably deprecate it in near future anyway. But it
> would be good to have support for the synth driver, because we are using it
> for running test cases and fuzzing tests (QA).
>
> What are the limitations against security_model=mapped on Windows? Keep in
> mind that with security_model=none you are very limited in what you can do
> with 9p.
>

Regards,
Bin

Re: [PATCH 0/5] target/arm: Support variable sized coprocessor registers

2022-04-12 Thread Gavin Shan

Hi Peter,

On 4/11/22 8:10 PM, Peter Maydell wrote:

On Mon, 11 Apr 2022 at 13:02, Andrew Jones wrote:

On Mon, Apr 11, 2022 at 10:22:59AM +0100, Peter Maydell wrote:

Also, we support SVE today, and we don't have variable size
coprocessor registers. Is there a bug here that we would be
fixing ?

SVE registers are KVM_REG_SIZE_U2048 and KVM_REG_SIZE_U256 sized
registers. They work fine (just like the VFP registers which are
KVM_REG_SIZE_U128 sized). They work because they don't get stored in the
cpreg list. SVE and CORE (which includes VFP) registers are filtered
out by kvm_arm_reg_syncs_via_cpreg_list(). Since they're filtered
out they need to be handled specifically by kvm_arch_get/put_registers()

Right, this is the distinction between ONE_REG registers and
coprocessor registers (which are a subset of ONE_REG registers).
We wouldn't want to handle SVE regs in the copro list anyway,
I think, because we want their state to end up in env->vfp.zregs[]
so the gdbstub can find it there. And we wouldn't have benefited
from the copro regs handling's "no need for new QEMU to handle
migrating state of a new register" because we needed a lot of
special case code for SVE and couldn't enable it by default
for other reasons.

Yep, those new introduced SDEI pseudo-registers, the intention
is to avoid the special case code. So the coprocessor register
list fits the need well. The only barrier to use the coprocessor
register list is the variable register sizes.

If we do add non-64-bit cpregs on the kernel side then we need to
make those new registers opt-in, because currently deployed QEMU
will refuse to start if the kernel passes it a register in the
KVM_GET_REG_LIST that is larger than 64 bits and isn't
KVM_REG_ARM_CORE or KVM_REG_ARM64_SVE (assuming I'm not misreading
the QEMU code).

Yes, we need make those new registers opt-in absolutely. Otherwise,
the old qemu, which doesn't have variable sized registers supported,
will crash on host kernel, where large sized registers are exposed
unconditionally.

I spent some time to think of the mechanisms for opt-in. There are
two options as I can figure out: (1) Using KVM_CAP_ARM_SDEI to check
if the large sized registers exist. (2) Using newly introduced
pseudo-register ("KVM_REG_ARM_STD_BMAP") in Raghavendra's series [1].
The individual bit in this pseudo-register corresponds to one
service in "standard hypervisor" category, where SDEI falls into.

I prefer (2) because those services or firmware interfaces are
exposed in a collective way by KVM_REG_ARM_STD_BMAP, comparing
to the individual capabilities. However, they are same in essence.
Another benefit to use KVM_REG_ARM_STD_BMAP is hiding SDEI interface
and the large sized registers for old QEMU.

[1]
https://lore.kernel.org/linux-arm-kernel/20220407011605.1966778-10-rana...@google.com/T/#m0bc1aa4048ca157e8e99c593b3f349b879032543

Thanks,
Gavin

Re: [PATCH v5 4/4] hw/acpi/aml-build: Use existing CPU topology to build PPTT table

2022-04-12 Thread Gavin Shan


Hi Jonathan,

On 4/12/22 11:40 PM, Jonathan Cameron wrote:

On Sun,  3 Apr 2022 22:59:53 +0800
Gavin Shan  wrote:

When the PPTT table is built, the CPU topology is re-calculated, but
it's unecessary because the CPU topology has been populated in
virt_possible_cpu_arch_ids() on arm/virt machine.

This reworks build_pptt() to avoid by reusing the existing one in
ms->possible_cpus. Currently, the only user of build_pptt() is
arm/virt machine.

Signed-off-by: Gavin Shan 


My compiler isn't being very smart today and gives a bunch of
maybe used uninitialized for socket_offset, cluster_offset and core_offset.

They probably are initialized in all real paths, but I think you need to
set them to something at instantiation to keep the compiler happy.



Thanks for reporting the warning raised from the compiler. I think
your compiler may be smarter than mine :)

Yeah, they're initialized in all real paths after checking the code
again. So lets initialize them with zeroes in v6, but I would hold
for Igor and Yanan's reviews on this (v5) series.

Thanks,
Gavin




---
  hw/acpi/aml-build.c | 100 +---
  1 file changed, 38 insertions(+), 62 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 4086879ebf..4b0f9df3e3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -2002,86 +2002,62 @@ void build_pptt(GArray *table_data, BIOSLinker *linker, 
MachineState *ms,
  const char *oem_id, const char *oem_table_id)
  {
  MachineClass *mc = MACHINE_GET_CLASS(ms);
-GQueue *list = g_queue_new();
-guint pptt_start = table_data->len;
-guint parent_offset;
-guint length, i;
-int uid = 0;
-int socket;
+CPUArchIdList *cpus = ms->possible_cpus;
+int64_t socket_id = -1, cluster_id = -1, core_id = -1;
+uint32_t socket_offset, cluster_offset, core_offset;
+uint32_t pptt_start = table_data->len;
+int n;
  AcpiTable table = { .sig = "PPTT", .rev = 2,
  .oem_id = oem_id, .oem_table_id = oem_table_id };
  
  acpi_table_begin(, table_data);
  
-for (socket = 0; socket < ms->smp.sockets; socket++) {

-g_queue_push_tail(list,
-GUINT_TO_POINTER(table_data->len - pptt_start));
-build_processor_hierarchy_node(
-table_data,
-/*
- * Physical package - represents the boundary
- * of a physical package
- */
-(1 << 0),
-0, socket, NULL, 0);
-}
+for (n = 0; n < cpus->len; n++) {
+if (cpus->cpus[n].props.socket_id != socket_id) {
+socket_id = cpus->cpus[n].props.socket_id;
+cluster_id = -1;
+core_id = -1;
+socket_offset = table_data->len - pptt_start;
+build_processor_hierarchy_node(table_data,
+(1 << 0), /* Physical package */
+0, socket_id, NULL, 0);
+}
  
-if (mc->smp_props.clusters_supported) {

-length = g_queue_get_length(list);
-for (i = 0; i < length; i++) {
-int cluster;
-
-parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
-for (cluster = 0; cluster < ms->smp.clusters; cluster++) {
-g_queue_push_tail(list,
-GUINT_TO_POINTER(table_data->len - pptt_start));
-build_processor_hierarchy_node(
-table_data,
-(0 << 0), /* not a physical package */
-parent_offset, cluster, NULL, 0);
+if (mc->smp_props.clusters_supported) {
+if (cpus->cpus[n].props.cluster_id != cluster_id) {
+cluster_id = cpus->cpus[n].props.cluster_id;
+core_id = -1;
+cluster_offset = table_data->len - pptt_start;
+build_processor_hierarchy_node(table_data,
+(0 << 0), /* Not a physical package */
+socket_offset, cluster_id, NULL, 0);
  }
+} else {
+cluster_offset = socket_offset;
  }
-}
  
-length = g_queue_get_length(list);

-for (i = 0; i < length; i++) {
-int core;
-
-parent_offset = GPOINTER_TO_UINT(g_queue_pop_head(list));
-for (core = 0; core < ms->smp.cores; core++) {
-if (ms->smp.threads > 1) {
-g_queue_push_tail(list,
-GUINT_TO_POINTER(table_data->len - pptt_start));
-build_processor_hierarchy_node(
-table_data,
+if (ms->smp.threads <= 1) {
+build_processor_hierarchy_node(table_data,
+(1 << 1) | /* ACPI Processor ID valid */
+(1 << 3),  /* Node is a Leaf */
+cluster_offset, n, NULL, 0);
+} else {
+if (cpus->cpus[n].props.core_id != core_id) {
+core_id = cpus->cpus[n].props.core_id;
+core_offset = table_data->len -

[ANNOUNCE] QEMU 7.0.0-rc4 is now available

2022-04-12 Thread Michael Roth

Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
fifth release candidate for the QEMU 7.0 release. This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-7.0.0-rc4.tar.xz
  http://download.qemu-project.org/qemu-7.0.0-rc4.tar.xz.sig

A note from the maintainer:

  rc4 contains three fixes for late-breaking security bugs. The plan
  is to make the final 7.0 release in a week's time on the 19th April,
  with no further changes, unless we discover some last-minute
  catastrophic problem.

You can help improve the quality of the QEMU 7.0 release by testing this
release and reporting bugs using our GitLab issue tracker:

  https://gitlab.com/qemu-project/qemu/-/issues

The release plan, as well a documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/7.0

Please add entries to the ChangeLog for the 7.0 release below:

  http://wiki.qemu.org/ChangeLog/7.0

Thank you to everyone involved!

Changes since rc3:

81c7ed41a1: Update version for v7.0.0-rc4 release (Peter Maydell)
4bf58c7213: virtio-iommu: use-after-free fix (Wentao Liang)
fa892e9abb: ui/cursor: fix integer overflow in cursor_alloc (CVE-2021-4206) 
(Mauro Matteo Cascella)
9569f5cb5b: display/qxl-render: fix race condition in qxl_cursor 
(CVE-2021-4207) (Mauro Matteo Cascella)

[PATCH] net/vhost-user: Save ack_features to net_clients during vhost_user_start

2022-04-12 Thread Yi Wang

From: Liu Xiangyu 

During vhost_user_start, if openvswitch.service restart, cause the final 
features
not expected. Because qemu not save the ack_features promptly.

Signed-off-by: Liu Xiangyu 
Signed-off-by: Yi Wang 
---
 net/vhost-user.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/vhost-user.c b/net/vhost-user.c
index b1a0247..ce9dcb6 100644
--- a/net/vhost-user.c
+++ b/net/vhost-user.c
@@ -92,6 +92,10 @@ static int vhost_user_start(int queues, NetClientState 
*ncs[],
 goto err;
 }
 
+if (s->vhost_net) {
+s->acked_features = vhost_net_get_acked_features(net);
+}
+
 if (i == 0) {
 max_queues = vhost_net_get_max_queues(net);
 if (queues > max_queues) {
-- 
1.8.3.1

[PATCH v3 for 7.1 1/1] block: add 'force' parameter to 'blockdev-change-medium' command

2022-04-12 Thread Denis V. Lunev

'blockdev-change-medium' is a convinient wrapper for the following
sequence of commands:
 * blockdev-open-tray
 * blockdev-remove-medium
 * blockdev-insert-medium
 * blockdev-close-tray
and should be used f.e. to change ISO image inside the CD-ROM tray.
Though the guest could lock the tray and some linux guests like
CentOS 8.5 actually does that. In this case the execution if this
command results in the error like the following:
  Device 'scsi0-0-1-0' is locked and force was not specified,
  wait for tray to open and try again.

This situation is could be resolved 'blockdev-open-tray' by passing
flag 'force' inside. Thus is seems reasonable to add the same
capability for 'blockdev-change-medium' too.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Acked-by: "Dr. David Alan Gilbert" 
CC: Kevin Wolf 
CC: Hanna Reitz 
CC: Eric Blake 
CC: Markus Armbruster 
---
 block/qapi-sysemu.c |  3 ++-
 hmp-commands.hx | 11 +++
 monitor/hmp-cmds.c  |  4 +++-
 qapi/block.json |  6 ++
 ui/cocoa.m  |  1 +
 5 files changed, 19 insertions(+), 6 deletions(-)

Changes from v2:
- fixed parameter's order in changeDeviceMedia(). This is a VERY interesting
  story, actually. Both versions of the patch (v2 & v3) compile silently.
  In order to see the difference one needs to enable -Weverything compilation
  option!

Changes from v1:
- added kludge to Objective C code
- simplified a bit call of do_open_tray() (thanks, Vova!)
- added record to hmp-command.hx

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 8498402ad4..680c7ee342 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -318,6 +318,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
 bool has_id, const char *id,
 const char *filename,
 bool has_format, const char *format,
+bool has_force, bool force,
 bool has_read_only,
 BlockdevChangeReadOnlyMode read_only,
 Error **errp)
@@ -380,7 +381,7 @@ void qmp_blockdev_change_medium(bool has_device, const char 
*device,
 
 rc = do_open_tray(has_device ? device : NULL,
   has_id ? id : NULL,
-  false, );
+  force, );
 if (rc && rc != -ENOSYS) {
 error_propagate(errp, err);
 goto fail;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 8476277aa9..6ec593ea08 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -202,9 +202,9 @@ ERST
 
 {
 .name   = "change",
-.args_type  = "device:B,target:F,arg:s?,read-only-mode:s?",
-.params = "device filename [format [read-only-mode]]",
-.help   = "change a removable medium, optional format",
+.args_type  = "device:B,force:-f,target:F,arg:s?,read-only-mode:s?",
+.params = "device [-f] filename [format [read-only-mode]]",
+.help   = "change a removable medium, optional format, use -f to 
force the operation",
 .cmd= hmp_change,
 },
 
@@ -212,11 +212,14 @@ SRST
 ``change`` *device* *setting*
   Change the configuration of a device.
 
-  ``change`` *diskdevice* *filename* [*format* [*read-only-mode*]]
+  ``change`` *diskdevice* [-f] *filename* [*format* [*read-only-mode*]]
 Change the medium for a removable disk device to point to *filename*. eg::
 
   (qemu) change ide1-cd0 /path/to/some.iso
 
+``-f``
+  forces the operation even if the guest has locked the tray.
+
 *format* is optional.
 
 *read-only-mode* may be used to change the read-only status of the device.
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 634968498b..d8b98bed6c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1472,6 +1472,7 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 const char *target = qdict_get_str(qdict, "target");
 const char *arg = qdict_get_try_str(qdict, "arg");
 const char *read_only = qdict_get_try_str(qdict, "read-only-mode");
+bool force = qdict_get_try_bool(qdict, "force", false);
 BlockdevChangeReadOnlyMode read_only_mode = 0;
 Error *err = NULL;
 
@@ -1508,7 +1509,8 @@ void hmp_change(Monitor *mon, const QDict *qdict)
 }
 
 qmp_blockdev_change_medium(true, device, false, NULL, target,
-   !!arg, arg, !!read_only, read_only_mode,
+   !!arg, arg, true, force,
+   !!read_only, read_only_mode,
);
 }
 
diff --git a/qapi/block.json b/qapi/block.json
index 82fcf2c914..3f100d4887 100644
--- a/qapi/block.json
+++ b/qapi/block.json
@@ -326,6 +326,11 @@
 # @read-only-mode: change the read-only mode of the device; defaults
 #  to 'retain'
 #
+# @force: if false (the default), an

Re: [PATCH for-7.1 1/8] nbd: actually implement reply_possible safeguard

2022-04-12 Thread Eric Blake

On Tue, Apr 12, 2022 at 09:41:57PM +0200, Paolo Bonzini wrote:
> The .reply_possible field of s->requests is never set to false.  This is
> not a big problem as it is only a safeguard to detect protocol errors,
> but fix it anyway.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/nbd.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 567872ac53..6a5e410e5f 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -454,15 +454,16 @@ static coroutine_fn int 
> nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
>  nbd_channel_error(s, -EINVAL);
>  return -EINVAL;
>  }
> -if (s->reply.handle == handle) {
> -/* We are done */
> -return 0;
> -}
>  ind2 = HANDLE_TO_INDEX(s, s->reply.handle);
>  if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].reply_possible) {
>  nbd_channel_error(s, -EINVAL);
>  return -EINVAL;
>  }
> +s->requests[ind2].reply_possible = 
> nbd_reply_is_structured(>reply);

If the reply is simple (not structured), then we expect no further
replies, so this sets things to false.  But if the reply is
structured, the answer depends on NBD_REPLY_FLAG_DONE, as in:

s->requests[ind2].reply_possible =
  nbd_reply_is_structured(>reply) &&
  (s->reply.structured.flags & NBD_REPLY_FLAG_DONE);

> +if (s->reply.handle == handle) {
> +/* We are done */
> +return 0;
> +}
>  nbd_recv_coroutine_wake_one(>requests[ind2]);
>  }
>  }

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v5 04/13] mm/shmem: Restrict MFD_INACCESSIBLE memory against RLIMIT_MEMLOCK

2022-04-12 Thread Andy Lutomirski

On Tue, Apr 12, 2022, at 7:36 AM, Jason Gunthorpe wrote:
> On Fri, Apr 08, 2022 at 08:54:02PM +0200, David Hildenbrand wrote:
>
>> RLIMIT_MEMLOCK was the obvious candidate, but as we discovered int he
>> past already with secretmem, it's not 100% that good of a fit (unmovable
>> is worth than mlocked). But it gets the job done for now at least.
>
> No, it doesn't. There are too many different interpretations how
> MELOCK is supposed to work
>
> eg VFIO accounts per-process so hostile users can just fork to go past
> it.
>
> RDMA is per-process but uses a different counter, so you can double up
>
> iouring is per-user and users a 3rd counter, so it can triple up on
> the above two
>
>> So I'm open for alternative to limit the amount of unmovable memory we
>> might allocate for user space, and then we could convert seretmem as well.
>
> I think it has to be cgroup based considering where we are now :\
>

So this is another situation where the actual backend (TDX, SEV, pKVM, pure 
software) makes a difference -- depending on exactly what backend we're using, 
the memory may not be unmoveable.  It might even be swappable (in the 
potentially distant future).

Anyway, here's a concrete proposal, with a bit of handwaving:

We add new cgroup limits:

memory.unmoveable
memory.locked

These can be set to an actual number or they can be set to the special value 
ROOT_CAP.  If they're set to ROOT_CAP, then anyone in the cgroup with 
capable(CAP_SYS_RESOURCE) (i.e. the global capability) can allocate movable or 
locked memory with this (and potentially other) new APIs.  If it's 0, then they 
can't.  If it's another value, then the memory can be allocated, charged to the 
cgroup, up to the limit, with no particular capability needed.  The default at 
boot is ROOT_CAP.  Anyone who wants to configure it differently is free to do 
so.  This avoids introducing a DoS, makes it easy to run tests without 
configuring cgroup, and lets serious users set up their cgroups.

Nothing is charge per mm.

To make this fully sensible, we need to know what the backend is for the 
private memory before allocating any so that we can charge it accordingly.

Re: [PATCH v2 04/39] util/log: Pass Error pointer to qemu_set_log

2022-04-12 Thread Alex Bennée



Richard Henderson  writes:

> Do not force exit within qemu_set_log; return bool and pass
> an Error value back up the stack as per usual.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH v2 07/39] util/log: Rename qemu_log_lock to qemu_log_trylock

2022-04-12 Thread Alex Bennée



Richard Henderson  writes:

> This function can fail, which makes it more like ftrylockfile
> or pthread_mutex_trylock than flockfile or pthread_mutex_lock,
> so rename it.
>
> To closer match the other trylock functions, release rcu_read_lock
> along the failure path, so that qemu_log_unlock need not be called
> on failure.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH] hw/nvme: fix narrowing conversion

2022-04-12 Thread Klaus Jensen

On Apr 12 11:59, Dmitry Tikhov wrote:
> Since nlbas is of type int, it does not work with large namespace size
> values, e.g., 9 TB size of file backing namespace and 8 byte metadata
> with 4096 bytes lbasz gives negative nlbas value, which is later
> promoted to negative int64_t type value and results in negative
> ns->moff which breaks namespace
> 
> Signed-off-by: Dmitry Tikhov 
> ---
>  hw/nvme/ns.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/nvme/ns.c b/hw/nvme/ns.c
> index 324f53ea0c..af6504fad2 100644
> --- a/hw/nvme/ns.c
> +++ b/hw/nvme/ns.c
> @@ -29,7 +29,8 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  {
>  NvmeIdNs *id_ns = >id_ns;
>  BlockDriverInfo bdi;
> -int npdg, nlbas, ret;
> +int npdg, ret;
> +int64_t nlbas;
>  
>  ns->lbaf = id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
>  ns->lbasz = 1 << ns->lbaf.ds;
> @@ -42,7 +43,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
>  id_ns->ncap = id_ns->nsze;
>  id_ns->nuse = id_ns->ncap;
>  
> -ns->moff = (int64_t)nlbas << ns->lbaf.ds;
> +ns->moff = nlbas << ns->lbaf.ds;
>  
>  npdg = ns->blkconf.discard_granularity / ns->lbasz;
>  
> -- 
> 2.35.1
> 

Thanks Dmitry. Looks reasonable,

Reviewed-by: Klaus Jensen 


signature.asc
Description: PGP signature

Re: [PATCH v2 03/39] util/log: Return bool from qemu_set_log_filename

2022-04-12 Thread Alex Bennée



Richard Henderson  writes:

> Per the recommendations in qapi/error.h, return false on failure.
>
> Use the return value in the monitor, the only place we aren't
> already passing error_fatal or error_abort.
>
> Signed-off-by: Richard Henderson 

Reviewed-by: Alex Bennée 

-- 
Alex Bennée

Re: [PATCH] Warn user if the vga flag is passed but no vga device is created

2022-04-12 Thread Gautam Agrawal

hi,

> thanks for your patch, looks pretty good already, but there is a small
> issue: Try for example:
>
>   ./qemu-system-s390x -vga none
>
> ... and it will print the warning "qemu-system-s390x: warning: No vga device
> is created", though the user only asked for no VGA device. This seems to
> happen if a machine does not have any VGA device by default, but still
> requests "-vga none" on the command line.

This can be solved by adding this condition : (vga_interface_type != VGA_NONE)


> On 08/04/2022 12.45, Gautam Agrawal wrote:
> > This patch is in regards to this 
> > issue:https://gitlab.com/qemu-project/qemu/-/issues/581#.
>
> Better write this right in front of your Signed-off-by line:
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/581
>
> ... then the ticket will be automatically be closed once your patch gets 
> merged.
>
I apologize for this mistake

> vga_interface_type is also used in hw/mips/fuloong2e.c and
> hw/xenpv/xen_machine_pv.c ... do they need a change, too?

I can definitely make similar changes in them too since they also
specify the vga_interface_type, shall I proceed with this?

> This will trigger a warning from the scripts/checkpatch.pl script:
>
> ERROR: do not initialise globals to 0 or NULL
> #238: FILE: softmmu/globals.c:43:
> +bool vga_interface_created = false;

Could you kindly suggest a better approach to this than creating a
global variable.


> I'm not a native speaker, and maybe it's just a matter of taste, but I'd
> rather say it in past tense: "No VGA device has been created"

I will correct the warning message, as suggested by Peter Maydell.

Regards,
Gautam Agrawal

Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-04-12 Thread Kirill A. Shutemov

On Mon, Mar 28, 2022 at 01:16:48PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 10, 2022 at 6:09 AM Chao Peng  wrote:
> >
> > This is the v5 of this series which tries to implement the fd-based KVM
> > guest private memory. The patches are based on latest kvm/queue branch
> > commit:
> >
> >   d5089416b7fb KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
> 
> Can this series be run and a VM booted without TDX?  A feature like
> that might help push it forward.

It would require enlightenment of the guest code. We have two options.

Simple one is to limit enabling to the guest kernel, but it would require
non-destructive conversion between shared->private memory. This does not
seem to be compatible with current design.

Other option is get memory private from time 0 of VM boot, but it requires
modification of virtual BIOS to setup shared ranges as needed. I'm not
sure if anybody volunteer to work on BIOS code to make it happen.

Hm.

-- 
 Kirill A. Shutemov

98 matches

Mail list logo