date:20210216

Re: USB pass through into Mac OS 9.x with qemu-system-ppc

2021-02-16 Thread Gerd Hoffmann

  Hi,

> Thanks for the explanation. I've been successful in passing through an
> USB 2.0 stick into Mac OS 9.x when connected to a real USB 1.1 hub in
> a old Apple keyboard.

Ah, neat trick to force the device into usb1 mode.

take care,
  Gerd

Re: [PATCH 1/5] qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public

2021-02-16 Thread Vladimir Sementsov-Ogievskiy


16.02.2021 19:45, Vladimir Sementsov-Ogievskiy wrote:

Rename bytes_covered_by_bitmap_cluster() to
bdrv_dirty_bitmap_serialization_coverage() and make it public. It is
needed as we are going to make load_bitmap_data() public in the next
commit.


Not we are not. So the last sentence should be changed to

  It is needed as we are going to share it with bitmap loading in parallels 
format.

I started from sharing load_bitmap_data() function, but than dropped this idea.



Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/dirty-bitmap.h |  2 ++
  block/dirty-bitmap.c | 13 +
  block/qcow2-bitmap.c | 16 ++--
  3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/block/dirty-bitmap.h b/include/block/dirty-bitmap.h
index 36e8da4fc2..f581cf9fd7 100644
--- a/include/block/dirty-bitmap.h
+++ b/include/block/dirty-bitmap.h
@@ -57,6 +57,8 @@ void bdrv_dirty_iter_free(BdrvDirtyBitmapIter *iter);
  uint64_t bdrv_dirty_bitmap_serialization_size(const BdrvDirtyBitmap *bitmap,
uint64_t offset, uint64_t 
bytes);
  uint64_t bdrv_dirty_bitmap_serialization_align(const BdrvDirtyBitmap *bitmap);
+uint64_t bdrv_dirty_bitmap_serialization_coverage(int serialized_chunk_size,
+const BdrvDirtyBitmap *bitmap);
  void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
uint8_t *buf, uint64_t offset,
uint64_t bytes);
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 9b9cd71238..a0eaa28785 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -726,6 +726,19 @@ uint64_t bdrv_dirty_bitmap_serialization_align(const 
BdrvDirtyBitmap *bitmap)
  return hbitmap_serialization_align(bitmap->bitmap);
  }
  
+/* Return the disk size covered by a chunk of serialized bitmap data. */

+uint64_t bdrv_dirty_bitmap_serialization_coverage(int serialized_chunk_size,
+  const BdrvDirtyBitmap 
*bitmap)
+{
+uint64_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
+uint64_t limit = granularity * (serialized_chunk_size << 3);
+
+assert(QEMU_IS_ALIGNED(limit,
+   bdrv_dirty_bitmap_serialization_align(bitmap)));
+return limit;
+}
+
+
  void bdrv_dirty_bitmap_serialize_part(const BdrvDirtyBitmap *bitmap,
uint8_t *buf, uint64_t offset,
uint64_t bytes)
diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
index 5eef82fa55..42d81c44cd 100644
--- a/block/qcow2-bitmap.c
+++ b/block/qcow2-bitmap.c
@@ -278,18 +278,6 @@ static int free_bitmap_clusters(BlockDriverState *bs, 
Qcow2BitmapTable *tb)
  return 0;
  }
  
-/* Return the disk size covered by a single qcow2 cluster of bitmap data. */

-static uint64_t bytes_covered_by_bitmap_cluster(const BDRVQcow2State *s,
-const BdrvDirtyBitmap *bitmap)
-{
-uint64_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
-uint64_t limit = granularity * (s->cluster_size << 3);
-
-assert(QEMU_IS_ALIGNED(limit,
-   bdrv_dirty_bitmap_serialization_align(bitmap)));
-return limit;
-}
-
  /* load_bitmap_data
   * @bitmap_table entries must satisfy specification constraints.
   * @bitmap must be cleared */
@@ -312,7 +300,7 @@ static int load_bitmap_data(BlockDriverState *bs,
  }
  
  buf = g_malloc(s->cluster_size);

-limit = bytes_covered_by_bitmap_cluster(s, bitmap);
+limit = bdrv_dirty_bitmap_serialization_coverage(s->cluster_size, bitmap);
  for (i = 0, offset = 0; i < tab_size; ++i, offset += limit) {
  uint64_t count = MIN(bm_size - offset, limit);
  uint64_t entry = bitmap_table[i];
@@ -1303,7 +1291,7 @@ static uint64_t *store_bitmap_data(BlockDriverState *bs,
  }
  
  buf = g_malloc(s->cluster_size);

-limit = bytes_covered_by_bitmap_cluster(s, bitmap);
+limit = bdrv_dirty_bitmap_serialization_coverage(s->cluster_size, bitmap);
  assert(DIV_ROUND_UP(bm_size, limit) == tb_size);
  
  offset = 0;





--
Best regards,
Vladimir

Re: [PATCH RESEND] hostmem: Don't report pmem attribute if unsupported

2021-02-16 Thread Michal Privoznik


On 2/17/21 12:07 AM, John Snow wrote:

On 2/16/21 5:23 PM, Eduardo Habkost wrote:

On Tue, Jan 26, 2021 at 08:48:25AM +0100, Michal Privoznik wrote:

When management applications (like Libvirt) want to check whether
memory-backend-file.pmem is supported they can list object
properties using 'qom-list-properties'. However, 'pmem' is
declared always (and thus reported always) and only at runtime
QEMU errors out if it was built without libpmem (and thus can not
guarantee write persistence). This is suboptimal since we have
ability to declare attributes at compile time.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1915216
Signed-off-by: Michal Privoznik 
Reviewed-by: Daniel Henrique Barboza 
Tested-by: Daniel Henrique Barboza 
Reviewed-by: Igor Mammedov 


I'm not a fan of making availability of properties conditional
(even if at compile time), but if this helps libvirt I guess it
makes sense.



Compile time might be OK, but if we want to describe everything via QAPI 
eventually, we just need to be able to describe that compile-time 
requisite appropriately.


Conditional at run-time is I think the thing that absolutely has to go 
wherever it surfaces.


I'm open for discussion. How do you think libvirt (or any other mgmt 
tool/user) should inspect whether given feature is available?
What libvirt currently does it issues 'qom-list-properties' with 
'typename=memory-backend-file' and inspects whether pmem attribute is 
available. Is 'qom-list' preferred?






CCing John, who has been thinking a lot about these questions.



Thanks for the heads up. Good reminder that libvirt uses the existence 
of properties as a bellwether for feature support. I don't think I like 
that idea, but I like breaking libvirt even less.


That was at hand solution. If libvirt's not doing it right, I'm happy to 
make things better.


Michal

Re: USB pass through into Mac OS 9.x with qemu-system-ppc

2021-02-16 Thread Gerd Hoffmann

> Hi Gerd,
> 
> Thanks for looking into this. It looks to me that the usb storage
> device nicely reports endpoints 1 and 2 when asked, but that the host
> only ever communicates with endpoint 1.

EP 1 is host -> device.
EP 2 is device -> host.

So the host sends requests but the device never answers, probably due to
the wMaxPacketSize mismatch, macos being confused and sending broken
requests ...

> Can this also explain that other (non-mass-storage) devices cannot be
> passed through successfully ?

Yes, you can have simliar issues with other devices.  Devices can have
different interface descriptors for different device speeds.  This can
be rather small differences like the wMacPacketSize for usb sticks.  The
descriptors can be identical.  Or there can be large differences, like
usb-audio devices offering more channels when plugged into a usb2 port
(with enough bandwidth for that).

Problem is when the device is plugged into a usb2 port you can't query
the usb1 descriptors.  So qemu presents the wrong descriptors to the
guest in case host and guest use different usb speeds.  That may or may
not work ...

The other way around is less problematic, when plugging a usb2 device
into a usb3-capable (xhci) port I can tell the guest "this is a usb2
device".  But reporting "this is a usb2 device" via ohci isn't going to
fly for obvious reasons ...

take care,
  Gerd

Re: USB pass through into Mac OS 9.x with qemu-system-ppc

2021-02-16 Thread Howard Spoelstra

On Tue, Feb 16, 2021 at 4:42 PM Howard Spoelstra  wrote:
>
> On Tue, Feb 16, 2021 at 3:48 PM Gerd Hoffmann  wrote:
> >
> >   Hi,
> >
> > > Please find another pcap file attached. This one stems from an attempt
> > > to pass through a midi device when running qemu-system-ppc with Mac OS
> > > 9.2 in macOS host.
> >
> > Ah, yes, I remember now.  Problem is that the usb stick is plugged into
> > a high-speed port (usb2) on the host but passed as full-speed device
> > (usb1) to the guest.  That works in some cases, but is not guaranteed
> > to work.  usb_host_speed_compat() tries to catch some of the
> > incompatible cases.  The usb-storage incompatibility slips through
> > because the incompatibility is specific to the mass storage protocol.
> > Specifically the wMaxPacketSize is 64 for usb1 and 512 for usb2.
> >
> > Seems fedora deals better with the situation ...
> >
> > take care,
> >   Gerd
> >
>
> Hi Gerd,
>
> Thanks for looking into this. It looks to me that the usb storage
> device nicely reports endpoints 1 and 2 when asked, but that the host
> only ever communicates with endpoint 1.
> Is that the issue you refer to, or might that be libusb related?
>
> Can this also explain that other (non-mass-storage) devices cannot be
> passed through successfully ?
>
> Best,
> Howard

Hi Gerd,

Thanks for the explanation. I've been successful in passing through an
USB 2.0 stick into Mac OS 9.x when connected to a real USB 1.1 hub in
a old Apple keyboard. This at least gives some perspective to connect
other devices as well.

So we can consider this quest finished ;-)

Best,
Howard

Re: [PATCH v2] qga: return a more explicit error on why a command is disabled

2021-02-16 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20210217070944.2371327-1-marcandre.lur...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210217070944.2371327-1-marcandre.lur...@redhat.com
Subject: [PATCH v2] qga: return a more explicit error on why a command is 
disabled

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]  patchew/20210211225246.17315-1-danielhb...@gmail.com -> 
patchew/20210211225246.17315-1-danielhb...@gmail.com
 - [tag update]  patchew/20210216181316.794276-1-alx...@bu.edu -> 
patchew/20210216181316.794276-1-alx...@bu.edu
 - [tag update]  patchew/20210216224543.16142-1-rebe...@nuviainc.com -> 
patchew/20210216224543.16142-1-rebe...@nuviainc.com
 * [new tag] 
patchew/20210217070944.2371327-1-marcandre.lur...@redhat.com -> 
patchew/20210217070944.2371327-1-marcandre.lur...@redhat.com
Switched to a new branch 'test'
56eb0c9 qga: return a more explicit error on why a command is disabled

=== OUTPUT BEGIN ===
ERROR: open brace '{' following enum go on the same line
#32: FILE: include/qapi/qmp/dispatch.h:32:
+typedef enum QmpDisabled
+{

total: 1 errors, 0 warnings, 121 lines checked

Commit 56eb0c90cf28 (qga: return a more explicit error on why a command is 
disabled) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210217070944.2371327-1-marcandre.lur...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH v2] qga: return a more explicit error on why a command is disabled

2021-02-16 Thread marcandre . lureau

From: Marc-André Lureau 

qmp_disable_command() now takes an enum for the reason, to be able
to give more explicit error messages.

Fixes:
https://bugzilla.redhat.com/show_bug.cgi?id=1928806

Signed-off-by: Marc-André Lureau 
---

v2:
 - replace string with an enum for disabling reason
 - remove trailing dot

 include/qapi/qmp/dispatch.h | 12 ++--
 monitor/qmp-cmds-control.c  |  2 +-
 qapi/qmp-dispatch.c | 10 +-
 qapi/qmp-registry.c | 16 +---
 qga/main.c  |  4 ++--
 5 files changed, 31 insertions(+), 13 deletions(-)

diff --git a/include/qapi/qmp/dispatch.h b/include/qapi/qmp/dispatch.h
index 1486cac3ef..fda9ffad73 100644
--- a/include/qapi/qmp/dispatch.h
+++ b/include/qapi/qmp/dispatch.h
@@ -28,6 +28,13 @@ typedef enum QmpCommandOptions
 QCO_COROUTINE =  (1U << 3),
 } QmpCommandOptions;
 
+typedef enum QmpDisabled
+{
+QMP_DISABLED_NONE,
+QMP_DISABLED_GENERIC,
+QMP_DISABLED_FROZEN,
+} QmpDisabled;
+
 typedef struct QmpCommand
 {
 const char *name;
@@ -35,7 +42,7 @@ typedef struct QmpCommand
 QmpCommandFunc *fn;
 QmpCommandOptions options;
 QTAILQ_ENTRY(QmpCommand) node;
-bool enabled;
+QmpDisabled disabled;
 } QmpCommand;
 
 typedef QTAILQ_HEAD(QmpCommandList, QmpCommand) QmpCommandList;
@@ -44,7 +51,8 @@ void qmp_register_command(QmpCommandList *cmds, const char 
*name,
   QmpCommandFunc *fn, QmpCommandOptions options);
 const QmpCommand *qmp_find_command(const QmpCommandList *cmds,
const char *name);
-void qmp_disable_command(QmpCommandList *cmds, const char *name);
+void qmp_disable_command(QmpCommandList *cmds, const char *name,
+ QmpDisabled disabled);
 void qmp_enable_command(QmpCommandList *cmds, const char *name);
 
 bool qmp_command_is_enabled(const QmpCommand *cmd);
diff --git a/monitor/qmp-cmds-control.c b/monitor/qmp-cmds-control.c
index 509ae870bd..94a8e133b6 100644
--- a/monitor/qmp-cmds-control.c
+++ b/monitor/qmp-cmds-control.c
@@ -107,7 +107,7 @@ static void query_commands_cb(const QmpCommand *cmd, void 
*opaque)
 CommandInfo *info;
 CommandInfoList **list = opaque;
 
-if (!cmd->enabled) {
+if (!qmp_command_is_enabled(cmd)) {
 return;
 }
 
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 0a2b20a4e4..b65f670152 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -155,11 +155,19 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject 
*request,
   "The command %s has not been found", command);
 goto out;
 }
-if (!cmd->enabled) {
+switch (cmd->disabled) {
+case QMP_DISABLED_FROZEN:
+error_set(&err, ERROR_CLASS_COMMAND_NOT_FOUND,
+  "The command %s has been disabled after fsfreeze",
+  command);
+goto out;
+case QMP_DISABLED_GENERIC:
 error_set(&err, ERROR_CLASS_COMMAND_NOT_FOUND,
   "The command %s has been disabled for this instance",
   command);
 goto out;
+case QMP_DISABLED_NONE:
+break;
 }
 if (oob && !(cmd->options & QCO_ALLOW_OOB)) {
 error_setg(&err, "The command %s does not support OOB",
diff --git a/qapi/qmp-registry.c b/qapi/qmp-registry.c
index 58c65b5052..e39e3b449c 100644
--- a/qapi/qmp-registry.c
+++ b/qapi/qmp-registry.c
@@ -25,7 +25,7 @@ void qmp_register_command(QmpCommandList *cmds, const char 
*name,
 
 cmd->name = name;
 cmd->fn = fn;
-cmd->enabled = true;
+cmd->disabled = QMP_DISABLED_NONE;
 cmd->options = options;
 QTAILQ_INSERT_TAIL(cmds, cmd, node);
 }
@@ -43,31 +43,33 @@ const QmpCommand *qmp_find_command(const QmpCommandList 
*cmds, const char *name)
 }
 
 static void qmp_toggle_command(QmpCommandList *cmds, const char *name,
-   bool enabled)
+   QmpDisabled disabled)
 {
 QmpCommand *cmd;
 
 QTAILQ_FOREACH(cmd, cmds, node) {
 if (strcmp(cmd->name, name) == 0) {
-cmd->enabled = enabled;
+cmd->disabled = disabled;
 return;
 }
 }
 }
 
-void qmp_disable_command(QmpCommandList *cmds, const char *name)
+void qmp_disable_command(QmpCommandList *cmds, const char *name,
+ QmpDisabled disabled)
 {
-qmp_toggle_command(cmds, name, false);
+assert(disabled != QMP_DISABLED_NONE);
+qmp_toggle_command(cmds, name, disabled);
 }
 
 void qmp_enable_command(QmpCommandList *cmds, const char *name)
 {
-qmp_toggle_command(cmds, name, true);
+qmp_toggle_command(cmds, name, QMP_DISABLED_NONE);
 }
 
 bool qmp_command_is_enabled(const QmpCommand *cmd)
 {
-return cmd->enabled;
+return cmd->disabled != QMP_DISABLED_NONE;
 }
 
 const char *qmp_command_name(const QmpCommand *cmd)
diff --git a/qga/main.c b/qga/main.c
index e7f8f3b161..0dbf0cacd2 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -375,7

Re: [PATCH] fuzz-test: remove unneccessary debugging flags

2021-02-16 Thread Thomas Huth


On 16/02/2021 19.13, Alexander Bulekov wrote:

These flags cause the output to look strange for 'make check', and
they aren't needed to reproduce bugs, if they reappear.

Suggested-by: Peter Maydell 
Signed-off-by: Alexander Bulekov 
---
  tests/qtest/fuzz-test.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qtest/fuzz-test.c b/tests/qtest/fuzz-test.c
index cdb1100a0b..6f161c93be 100644
--- a/tests/qtest/fuzz-test.c
+++ b/tests/qtest/fuzz-test.c
@@ -39,8 +39,7 @@ static void test_lp1878642_pci_bus_get_irq_level_assert(void)
  QTestState *s;
  
  s = qtest_init("-M pc-q35-5.0 "

-   "-nographic -monitor none -serial none "
-   "-d guest_errors -trace pci*");
+   "-nographic -monitor none -serial none");
  
  qtest_outl(s, 0xcf8, 0x8400f841);

  qtest_outl(s, 0xcfc, 0xebed205d);



Reviewed-by: Thomas Huth

Re: [PATCH v4 18/24] python/qemu: add qemu package itself to pipenv

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:50PM -0500, John Snow wrote:
> This adds the python qemu packages themselves to the pipenv manifest.
> 'pipenv sync' will create a virtual environment sufficient to use the SDK.
> 'pipenv sync --dev' will create a virtual environment sufficient to use
> and test the SDK (with pylint, mypy, isort, flake8, etc.)
> 
> The qemu packages are installed in 'editable' mode; all changes made to
> the python package inside the git tree will be reflected in the
> installed package without reinstallation. This includes changes made
> via git pull and so on.
> 
> Signed-off-by: John Snow 
> ---
>  python/Pipfile  | 1 +
>  python/Pipfile.lock | 9 +++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 

Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 17/24] python/qemu: add isort to pipenv

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:49PM -0500, John Snow wrote:
> isort 5.0.0 through 5.0.4 has a bug that causes it to misinterpret
> certain "from ..." clauses that are not related to imports.
> 
> Require 5.0.5 or greater.
> 
> isort can be run with 'isort -c qemu' from the python root.
> 
> Signed-off-by: John Snow 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
>  python/Pipfile  | 1 +
>  python/Pipfile.lock | 4 ++--
>  2 files changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 16/24] python: move .isort.cfg into setup.cfg

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:48PM -0500, John Snow wrote:
> Signed-off-by: John Snow 
> ---
>  python/.isort.cfg | 7 ---
>  python/setup.cfg  | 8 
>  2 files changed, 8 insertions(+), 7 deletions(-)
>  delete mode 100644 python/.isort.cfg
> 

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 15/24] python: add mypy to pipenv

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:47PM -0500, John Snow wrote:
> 0.730 appears to be about the oldest version that works with the
> features we want, including nice human readable output (to make sure
> iotest 297 passes), and type-parameterized Popen generics.
> 
> 0.770, however, supports adding 'strict' to the config file, so require
> at least 0.770.
> 
> Now that we are checking a namespace package, we need to tell mypy to
> allow PEP420 namespaces, so modify the mypy config as part of the move.
> 
> mypy can now be run from the python root by typing 'mypy qemu'.
>

 $ mypy qemu 
 qemu/utils/accel.py: error: Source file found twice under different module 
names: 'qmp' and 'qemu.qmp'
 Found 1 error in 1 file (errors prevented further checking)

I guess you meant 'mypy -p qemu'.

> Signed-off-by: John Snow 
> ---
>  python/Pipfile  |  1 +
>  python/Pipfile.lock | 37 -
>  python/setup.cfg|  1 +
>  3 files changed, 38 insertions(+), 1 deletion(-)
> 

With that change,

Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH] pc-bios/s390-ccw: Use memory barriers in virtio code

2021-02-16 Thread Richard Henderson

On 2/16/21 8:15 AM, Thomas Huth wrote:
> On 16/02/2021 15.40, Halil Pasic wrote:
>> On Tue, 16 Feb 2021 12:00:56 +0100
>> Thomas Huth  wrote:
>>
>>> According to the virtio specification, a memory barrier should be
>>> used before incrementing the idx field in the "available" ring.
>>> So far, we did not do this in the s390-ccw bios yet, but recently
>>> Peter Maydell saw problems with the s390-ccw bios when running
>>> the qtests on an aarch64 host (the bios panic'ed with the message:
>>> "SCSI cannot report LUNs: response VS RESP=09"), which could
>>> maybe be related to the missing memory barriers. Thus let's add
>>> those barriers now. Since we've only seen the problem on TCG so far,
>>> a "bcr 14,0" should be sufficient here to trigger the tcg_gen_mb()
>>> in the TCG translate code.
>>>
>>> (Note: The virtio spec also talks about using a memory barrier
>>> *after* incrementing the idx field, but if I understood correctly
>>> this is only required when using notification suppression - which
>>> we don't use in the s390-ccw bios here)
>>
>> I suggest to the barrier after incrementing the idx field for two
>> reasons. First: If the device were to see the notification, but
>> not see the incremented idx field, it would effectively loose
>> initiative. That is pretty straight forward, because the
>> notification just says 'check out that queue', and if we don't
>> see the incremented index, miss the buffer that was made available
>> by incrementing idx.
> 
> I was just about to reply that this is certainly not necessary, since
> the DIAGNOSE instruction that we use for the notification hypercall
> should be serializing anyway ... but after looking at the PoP, it
> actually is not marked as a serializing instruction! (while e.g.
> SVC - supervisor call - is explicitly marked as serializing)
> 
> So maybe that's worth a try: Peter, could you please apply this patch
> on top an see whether it makes a difference?
> 
> diff --git a/pc-bios/s390-ccw/virtio.c b/pc-bios/s390-ccw/virtio.c
> --- a/pc-bios/s390-ccw/virtio.c
> +++ b/pc-bios/s390-ccw/virtio.c
> @@ -54,6 +54,7 @@ static long kvm_hypercall(unsigned long nr, unsigned long
> param1,
>  register ulong r_param3 asm("4") = param3;
>  register long retval asm("2");
>  
> +    virtio_mb();
>  asm volatile ("diag 2,4,0x500"
>    : "=d" (retval)
>    : "d" (r_nr), "0" (r_param1), "r"(r_param2), "d"(r_param3)

This is patching the wrong thing, IMO.  You're adding barriers to the guest
that I think ought not be architecturally required -- memory accesses on s390x
are strongly ordered, with or without diag/svc/whatever.

With

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index 1376cdc404..3c5f38be62 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1622,6 +1622,8 @@ static void tcg_out_tlb_read
 TCGType mask_type;
 uint64_t compare_mask;

+tcg_out_mb(s, TCG_MO_ALL);
+
 mask_type = (TARGET_PAGE_BITS + CPU_TLB_DYN_MAX_BITS > 32
  ? TCG_TYPE_I64 : TCG_TYPE_I32);

which is a gigantic hammer, adding a host barrier before every qemu guest
access, I can no longer provoke a failure (previously visible 1 in 4, now no
failures in 100).

With that as a data point for success, I'm going to try to use host
load-acquire / store-release instructions, and then apply TCG_GUEST_DEFAULT_MO
and see if I can find something that works reasonably.


r~

Re: [PATCH v4 13/24] python: Add flake8 to pipenv

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:45PM -0500, John Snow wrote:
> flake8 3.5.x does not support the --extend-ignore syntax used in the
> .flake8 file to gracefully extend default ignores, so 3.6.x is our
> minimum requirement. There is no known upper bound.
> 
> flake8 can be run from the python/ directory with no arguments.
> 
> Signed-off-by: John Snow 
> ---
>  python/Pipfile  |  1 +
>  python/Pipfile.lock | 51 -
>  2 files changed, 51 insertions(+), 1 deletion(-)
> 

Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 12/24] python: move flake8 config to setup.cfg

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:44PM -0500, John Snow wrote:
> Update the comment concerning the flake8 exception to match commit
> 42c0dd12, whose commit message stated:
> 
> A note on the flake8 exception: flake8 will warn on *any* bare except,
> but pylint's is context-aware and will suppress the warning if you
> re-raise the exception.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/machine/.flake8 | 2 --
>  python/setup.cfg| 3 +++
>  2 files changed, 3 insertions(+), 2 deletions(-)
>  delete mode 100644 python/qemu/machine/.flake8
>

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 11/24] python: add pylint to pipenv

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:43PM -0500, John Snow wrote:
> We are specifying >= pylint 2.6.x for two reasons:
> 
> 1. For setup.cfg support, added in pylint 2.5.x
> 2. To clarify that we are using a version that has incompatibly dropped
> bad-whitespace checks.
> 
> Signed-off-by: John Snow 
> ---
>  python/Pipfile  |   1 +
>  python/Pipfile.lock | 137 
>  2 files changed, 138 insertions(+)
>  create mode 100644 python/Pipfile.lock
> 

Tested at this point with:

 $ pipenv install --dev
 $ pipenv run pip freeze
 astroid==2.4.2
 isort==5.7.0
 lazy-object-proxy==1.4.3
 mccabe==0.6.1
 pylint==2.6.0
 six==1.15.0
 toml==0.10.2
 typed-ast==1.4.2
 wrapt==1.12.1

Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v2 1/3] target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe

2021-02-16 Thread Richard Henderson

On 2/16/21 2:45 PM, Rebecca Cran wrote:
> Add support for FEAT_SSBS. SSBS (Speculative Store Bypass Safe) is an
> optional feature in ARMv8.0, and mandatory in ARMv8.5.
> 
> Signed-off-by: Rebecca Cran 
> ---
>  target/arm/cpu.h   | 15 +++-
>  target/arm/helper.c| 37 
>  target/arm/internals.h |  6 
>  target/arm/translate-a64.c | 12 +++
>  4 files changed, 69 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson 

r~

Re: [PATCH v4 09/24] python: add pylint import exceptions

2021-02-16 Thread John Snow


On 2/16/21 10:07 PM, Cleber Rosa wrote:

On Thu, Feb 11, 2021 at 01:58:41PM -0500, John Snow wrote:

Pylint 2.5.x and 2.6.x have regressions that make import checking
inconsistent, see:

https://github.com/PyCQA/pylint/issues/3609
https://github.com/PyCQA/pylint/issues/3624
https://github.com/PyCQA/pylint/issues/3651

Pinning to 2.4.4 is worse, because it mandates versions of shared
dependencies that are too old for features we want in isort and mypy.
Oh well.

Signed-off-by: John Snow 
---
  python/qemu/machine/__init__.py | 3 +++
  python/qemu/machine/machine.py  | 2 +-
  python/qemu/machine/qtest.py| 2 +-
  3 files changed, 5 insertions(+), 2 deletions(-)



I can see your struggle on those issues, so I choose not to punish
myself attempting to replicate them.  I'll forever trust you in these
matters.



:~)


Reviewed-by: Cleber Rosa 



Thanks!

Re: [PATCH v4 08/24] python: Add pipenv support

2021-02-16 Thread John Snow


On 2/16/21 9:59 PM, Cleber Rosa wrote:

On Thu, Feb 11, 2021 at 01:58:40PM -0500, John Snow wrote:

pipenv is a tool used for managing virtual environments with pinned,
explicit dependencies. It is used for precisely recreating python
virtual environments.

pipenv uses two files to do this:

(1) Pipfile, which is similar in purpose and scope to what setup.py
lists. It specifies the requisite minimum to get a functional
environment for using this package.

(2) Pipfile.lock, which is similar in purpose to `pip freeze >
requirements.txt`. It specifies a canonical virtual environment used for
deployment or testing. This ensures that all users have repeatable
results.

The primary benefit of using this tool is to ensure repeatable CI
results with a known set of packages. Although I endeavor to support as
many versions as I can, the fluid nature of the Python toolchain often
means tailoring code for fairly specific versions.

Note that pipenv is *not* required to install or use this module; this is
purely for the sake of repeatable testing by CI or developers.

Here, a "blank" pipfile is added with no dependencies, but specifies
Python 3.6 for the virtual environment.

Pipfile will specify our version minimums, while Pipfile.lock specifies
an exact loudout of packages that were known to operate correctly. This


Layout? Loadout?


Whoops, "loadout", yeah.




latter file provides the real value for easy setup of container images
and CI environments.

Signed-off-by: John Snow 
---
  python/Pipfile | 11 +++
  1 file changed, 11 insertions(+)
  create mode 100644 python/Pipfile



Other than that,

Reviewed-by: Cleber Rosa 



Thanks!

Re: [PATCH v4 05/24] python: add qemu package installer

2021-02-16 Thread John Snow


On 2/16/21 9:23 PM, Cleber Rosa wrote:

On Thu, Feb 11, 2021 at 01:58:37PM -0500, John Snow wrote:

Add setup.cfg and setup.py, necessary for installing a package via
pip. Add a rst document explaining the basics of what this package is


Nitpick 1: setup.cfg and setup.py are indeed used by pip, your
statement is correct.  But, it hides the fact that these can be used
without pip.  On a source tree based install, you may want to simply
use "python setup.py develop" to achieve what "pip install -e" would
do (without pip ever entering the picture).



This is intentional. pip and setup.py actually use different pathways 
that are not identical. I do not recommend you call setup.py directly 
anymore. Once pyproject.toml is more widespread, there won't even *be* a 
setup.py.



Nitpick 2: while most people will understand what you mean by "rst
document", I believe that "Add a README file in reStructuredText
format" would be more obvious.



Sure.


for and who to contact for more information. This document will be used
as the landing page for the package on PyPI.

I am not yet using a pyproject.toml style package manifest, because
"editable" installs are not defined by PEP-517 and pip did not have
support for this for some time; I consider the feature necessary for
development.



I'm glad you kept it like this... I bet there's going to be another
PEP out, replacing the status quo, by the time I finish this review.



Actually, between writing this series last year and this latest respin, 
pip supports editable installs for pyproject.toml distributions now :)


...but I was still hesitant to take the leap, because maybe that's still 
too modern for the 3.6-based distributions we target and support.


...And I left the message alone, because I didn't re-research the exact 
reason I'm not using pyproject.toml now. Eh.



Use a light-weight setup.py instead.

Signed-off-by: John Snow 
---
  python/PACKAGE.rst | 32 
  python/setup.cfg   | 19 +++
  python/setup.py| 23 +++
  3 files changed, 74 insertions(+)
  create mode 100644 python/PACKAGE.rst
  create mode 100644 python/setup.cfg
  create mode 100755 python/setup.py

diff --git a/python/PACKAGE.rst b/python/PACKAGE.rst
new file mode 100644
index 000..0e714c87eb3
--- /dev/null
+++ b/python/PACKAGE.rst
@@ -0,0 +1,32 @@
+QEMU Python Tooling
+===
+
+This package provides QEMU tooling used by the QEMU project to build,
+configure, and test QEMU. It is not a fully-fledged SDK and it is subject
+to change at any time.
+
+Usage
+-
+
+The ``qemu.qmp`` subpackage provides a library for communicating with
+QMP servers. The ``qemu.machine`` subpackage offers rudimentary
+facilities for launching and managing QEMU processes. Refer to each
+package's documentation
+(``>>> help(qemu.qmp)``, ``>>> help(qemu.machine)``)
+for more information.
+


This gives the impression that those are the only subpackages (and
they were).  Better to reword it taking into account the qemu.utils
subpackage and possibly others (leave it open so that it doesn't rot
so quickly).



Intentional again. I don't, for the moment, consider that utils package 
something "supported" in the public sense, so I didn't feel the urge to 
draw attention to it.



- Cleber.

Re: [PATCH v4 00/24] python: create installable package

2021-02-16 Thread Cleber Rosa

On Mon, Feb 15, 2021 at 04:32:29PM -0500, John Snow wrote:
> On 2/11/21 9:52 PM, Cleber Rosa wrote:
> > On Thu, Feb 11, 2021 at 01:58:32PM -0500, John Snow wrote:
> > > This series factors the python/qemu directory as an installable
> > > package. It does not yet actually change the mechanics of how any other
> > > python source in the tree actually consumes it (yet), beyond the import
> > > path. (some import statements change in a few places.)
> > > 
> > > git: https://gitlab.com/jsnow/qemu/-/commits/python-package-mk3
> > > CI: https://gitlab.com/jsnow/qemu/-/pipelines/254940717
> > > (New CI job: https://gitlab.com/jsnow/qemu/-/jobs/1024230604)
> > > 
> > > The primary motivation of this series is primarily to formalize our
> > > dependencies on mypy, flake8, isort, and pylint alongside versions that
> > > are known to work. It does this using the setup.cfg and setup.py
> > > files. It also adds explicitly pinned versions (using Pipfile.lock) of
> > > these dependencies that should behave in a repeatable and known way for
> > > developers and CI environments both. Lastly, it enables those CI checks
> > > such that we can enforce Python coding quality checks via the CI tests.
> > > 
> > > An auxiliary motivation is that this package is formatted in such a way
> > > that it COULD be uploaded to https://pypi.org/project/qemu and installed
> > > independently of qemu.git with `pip install qemu`, but that button
> > > remains *unpushed* and this series *will not* cause any such
> > > releases. We have time to debate finer points like API guarantees and
> > > versioning even after this series is merged.
> > > 
> > > Some other things this enables that might be of interest:
> > > 
> > > With the python tooling as a proper package, you can install this
> > > package in editable or production mode to a virtual environment, your
> > > local user environment, or your system packages. The primary benefit of
> > > this is to gain access to QMP tooling regardless of CWD, without needing
> > > to battle sys.path (and confounding other python analysis tools).
> > > 
> > > For example: when developing, you may go to qemu/python/ and run `make
> > > venv` followed by `pipenv shell` to activate a virtual environment that
> > > contains the qemu python packages. These packages will always reflect
> > > the current version of the source files in the tree. When you are
> > > finished, you can simply exit the shell (^d) to remove these packages
> > > from your python environment.
> > > 
> > > When not developing, you could install a version of this package to your
> > > environment outright to gain access to the QMP and QEMUMachine classes
> > > for lightweight scripting and testing by using pip: "pip install [--user] 
> > > ."
> > > 
> > > TESTING THIS SERIES:
> > > 
> > > First of all, nothing should change. Without any intervention,
> > > everything should behave exactly as it was before. The only new
> > > information here comes from how to interact with and run the linters
> > > that will be enforcing code quality standards in this subdirectory.
> > > 
> > > To test those, CD to qemu/python first, and then:
> > > 
> > > 1. Try "make venv && pipenv shell" to get a venv with the package
> > > installed to it in editable mode. Ctrl+d exits this venv shell. While in
> > > this shell, any python script that uses "from qemu.[qmp|machine] import
> > > ..." should work correctly regardless of where the script is, or what
> > > your CWD is.
> > > 
> > 
> > Ack here, works as expected.
> > 
> 
> That's a relief!
> 
> > > You will need Python 3.6 installed on your system to do this step. For
> > > Fedora: "dnf install python3.6" will do the trick.
> > > 
> > 
> > You may have explained this before, so forgive me asking again.  Why
> > is this dependent on a given Python version, and not a *minimum*
> > Python version? Are the checkers themselves susceptible to different
> > behavior depending on the Python version used?  If so, any hint on the
> > strategy for developing with say, Python 3.8, and then having issues
> > caught only on Python 3.6?
> > 
> 
> It's a good question, and I have struggled with communicating the language.
> So here are a few points:
> 
> (1) Users will not need Python 3.6 on their local systems to be able to run
> the linters; they will be able to run "make check" to run it under *any*
> environment -- granted they have the necessary packages. (mypy, pylint,
> pytest, flake8, and isort.) See note #2 below:
> 
> (2) `pip install [--user] .[devel]` will install the needed dependencies to
> the local environment/venv regardless of python version. These dependencies
> are not pinned, but do express a minimum viable version (in setup.cfg) for
> each tool that I tested rigorously.
> 
> (3) The CI system will target Python 3.6 specifically, because that is our
> minimum version. This will work to prevent future features from bleeding
> into the codebase, which was a notable problem when we were targeting
> simult

Re: [PATCH v4 10/24] python: move pylintrc into setup.cfg

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:42PM -0500, John Snow wrote:
> Delete the empty settings now that it's sharing a home with settings for
> other tools.
> 
> pylint can now be run from this folder as "pylint qemu".
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/machine/pylintrc | 58 
>  python/setup.cfg | 29 ++
>  2 files changed, 29 insertions(+), 58 deletions(-)
>  delete mode 100644 python/qemu/machine/pylintrc
>

Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 09/24] python: add pylint import exceptions

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:41PM -0500, John Snow wrote:
> Pylint 2.5.x and 2.6.x have regressions that make import checking
> inconsistent, see:
> 
> https://github.com/PyCQA/pylint/issues/3609
> https://github.com/PyCQA/pylint/issues/3624
> https://github.com/PyCQA/pylint/issues/3651
> 
> Pinning to 2.4.4 is worse, because it mandates versions of shared
> dependencies that are too old for features we want in isort and mypy.
> Oh well.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/machine/__init__.py | 3 +++
>  python/qemu/machine/machine.py  | 2 +-
>  python/qemu/machine/qtest.py| 2 +-
>  3 files changed, 5 insertions(+), 2 deletions(-)
>

I can see your struggle on those issues, so I choose not to punish
myself attempting to replicate them.  I'll forever trust you in these
matters.

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 08/24] python: Add pipenv support

2021-02-16 Thread Cleber Rosa

On Tue, Feb 16, 2021 at 09:59:47PM -0500, Cleber Rosa wrote:
> On Thu, Feb 11, 2021 at 01:58:40PM -0500, John Snow wrote:
> > pipenv is a tool used for managing virtual environments with pinned,
> > explicit dependencies. It is used for precisely recreating python
> > virtual environments.
> > 
> > pipenv uses two files to do this:
> > 
> > (1) Pipfile, which is similar in purpose and scope to what setup.py
> > lists. It specifies the requisite minimum to get a functional
> > environment for using this package.
> > 
> > (2) Pipfile.lock, which is similar in purpose to `pip freeze >
> > requirements.txt`. It specifies a canonical virtual environment used for
> > deployment or testing. This ensures that all users have repeatable
> > results.
> > 
> > The primary benefit of using this tool is to ensure repeatable CI
> > results with a known set of packages. Although I endeavor to support as
> > many versions as I can, the fluid nature of the Python toolchain often
> > means tailoring code for fairly specific versions.
> > 
> > Note that pipenv is *not* required to install or use this module; this is
> > purely for the sake of repeatable testing by CI or developers.
> > 
> > Here, a "blank" pipfile is added with no dependencies, but specifies
> > Python 3.6 for the virtual environment.
> > 
> > Pipfile will specify our version minimums, while Pipfile.lock specifies
> > an exact loudout of packages that were known to operate correctly. This
> 
> Layout? Loadout?
> 
> > latter file provides the real value for easy setup of container images
> > and CI environments.
> > 
> > Signed-off-by: John Snow 
> > ---
> >  python/Pipfile | 11 +++
> >  1 file changed, 11 insertions(+)
> >  create mode 100644 python/Pipfile
> >
> 
> Other than that,
> 
> Reviewed-by: Cleber Rosa 

Actually, just one suggestion: bump the position of this patch twice.
It makes it easier to understand its purpose if it is placed right
before the "python: add pylint to pipenv" patch.

Cheers,
- Cleber.


signature.asc
Description: PGP signature

Re: [PATCH v4 08/24] python: Add pipenv support

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:40PM -0500, John Snow wrote:
> pipenv is a tool used for managing virtual environments with pinned,
> explicit dependencies. It is used for precisely recreating python
> virtual environments.
> 
> pipenv uses two files to do this:
> 
> (1) Pipfile, which is similar in purpose and scope to what setup.py
> lists. It specifies the requisite minimum to get a functional
> environment for using this package.
> 
> (2) Pipfile.lock, which is similar in purpose to `pip freeze >
> requirements.txt`. It specifies a canonical virtual environment used for
> deployment or testing. This ensures that all users have repeatable
> results.
> 
> The primary benefit of using this tool is to ensure repeatable CI
> results with a known set of packages. Although I endeavor to support as
> many versions as I can, the fluid nature of the Python toolchain often
> means tailoring code for fairly specific versions.
> 
> Note that pipenv is *not* required to install or use this module; this is
> purely for the sake of repeatable testing by CI or developers.
> 
> Here, a "blank" pipfile is added with no dependencies, but specifies
> Python 3.6 for the virtual environment.
> 
> Pipfile will specify our version minimums, while Pipfile.lock specifies
> an exact loudout of packages that were known to operate correctly. This

Layout? Loadout?

> latter file provides the real value for easy setup of container images
> and CI environments.
> 
> Signed-off-by: John Snow 
> ---
>  python/Pipfile | 11 +++
>  1 file changed, 11 insertions(+)
>  create mode 100644 python/Pipfile
>

Other than that,

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 07/24] python: add directory structure README.rst files

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:39PM -0500, John Snow wrote:
> Add short readmes to python/, python/qemu/, python/qemu/machine,
> python/qemu/qmp, and python/qemu/utils that explain the directory
> hierarchy. These readmes are visible when browsing the source on
> e.g. gitlab/github and are designed to help new developers/users quickly
> make sense of the source tree.
> 
> They are not designed for inclusion in a published manual.
> 
> Signed-off-by: John Snow 
> ---
>  python/README.rst  | 41 ++
>  python/qemu/README.rst |  8 +++
>  python/qemu/machine/README.rst |  9 
>  python/qemu/qmp/README.rst |  9 
>  python/qemu/utils/README.rst   |  9 
>  5 files changed, 76 insertions(+)
>  create mode 100644 python/README.rst
>  create mode 100644 python/qemu/README.rst
>  create mode 100644 python/qemu/machine/README.rst
>  create mode 100644 python/qemu/qmp/README.rst
>  create mode 100644 python/qemu/utils/README.rst
>

It's not often I complain about too much documentation, but I honestly
think this will not scale.  I understand that the intention is to help
users browsing through the directory structure it has a huge potential
for bit rot.

The READMEs at the first two levels seem OK, but the ones at the
subpackages level will be a maintainance nightmare.  I would *very
much* move those (subpackage ones) documentation into the Python file
themselves.

> diff --git a/python/README.rst b/python/README.rst
> new file mode 100644
> index 000..6a14b99e104
> --- /dev/null
> +++ b/python/README.rst
> @@ -0,0 +1,41 @@
> +QEMU Python Tooling
> +===
> +
> +This directory houses Python tooling used by the QEMU project to build,
> +configure, and test QEMU. It is organized by namespace (``qemu``), and
> +then by package (``qemu/machine``, ``qemu/qmp``).
> +
> +``setup.py`` is used by ``pip`` to install this tooling to the current
> +environment. ``setup.cfg`` provides the packaging configuration used by
> +setup.py in a setuptools specific format. You will generally invoke it
> +by doing one of the following:
> +
> +1. ``pip3 install .`` will install these packages to your current
> +   environment. If you are inside a virtual environment, they will
> +   install there. If you are not, it will attempt to install to the
> +   global environment, which is not recommended.
> +
> +2. ``pip3 install --user .`` will install these packages to your user's
> +   local python packages. If you are inside of a virtual environment,
> +   this will fail.
> +
> +If you amend the ``-e`` argument, pip will install in "editable" mode;
> +which installs a version of the package that installs a forwarder
> +pointing to these files, such that the package always reflects the
> +latest version in your git tree.
> +
> +See `Installing packages using pip and virtual environments
> +`_
> +for more information.
> +
> +
> +Files in this directory
> +---
> +
> +- ``qemu/`` Python package source directory.
> +- ``PACKAGE.rst`` is used as the README file that is visible on PyPI.org.
> +- ``README.rst`` you are here!
> +- ``VERSION`` contains the PEP-440 compliant version used to describe
> +  this package; it is referenced by ``setup.cfg``.
> +- ``setup.cfg`` houses setuptools package configuration.
> +- ``setup.py`` is the setuptools installer used by pip; See above.
> diff --git a/python/qemu/README.rst b/python/qemu/README.rst
> new file mode 100644
> index 000..d04943f526c
> --- /dev/null
> +++ b/python/qemu/README.rst
> @@ -0,0 +1,8 @@
> +QEMU Python Namespace
> +=
> +
> +This directory serves as the root of a `Python PEP 420 implicit
> +namespace package `_.
> +
> +Each directory below is assumed to be an installable Python package that
> +is available under the ``qemu.`` namespace.
> diff --git a/python/qemu/machine/README.rst b/python/qemu/machine/README.rst
> new file mode 100644
> index 000..ac2b4fffb42
> --- /dev/null
> +++ b/python/qemu/machine/README.rst
> @@ -0,0 +1,9 @@
> +qemu.machine package
> +
> +
> +This package provides core utilities used for testing and debugging
> +QEMU. It is used by the iotests, vm tests, acceptance tests, and several
> +other utilities in the ./scripts directory. It is not a fully-fledged
> +SDK and it is subject to change at any time.
> +
> +See the documentation in ``__init__.py`` for more information.
> diff --git a/python/qemu/qmp/README.rst b/python/qemu/qmp/README.rst
> new file mode 100644
> index 000..c21951491cf
> --- /dev/null
> +++ b/python/qemu/qmp/README.rst
> @@ -0,0 +1,9 @@
> +qemu.qmp package
> +
> +
> +This package provides a library used for connecting to and communicating
> +with QMP servers. It is used extensively by iotests, vm tests,
> +acceptance tests, and

Re: [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 07:52:39PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This is marked as a v3 as it started as a result of discussions that
> followed the v2 [1]. 
> 
> The idea with this series is to add CPU hotunplug timeout to avoid the
> situations where the kernel refuses to release the CPU. The reasoning
> for a timeout approach is described in patch 05.
> 
> While investigating putting a timeout in memory hotunplug, I have found
> out that we have a way to determine, at least in some cases, when the kernel
> refuses to release the DIMM during a memory hotunplug. This alleviate one
> of the most common issues (at least AFAIK) with memory hotunplug and it
> made me gave up attempting to put a timeout in memory hotunplug altogether.
> 
> At this point I didn't add timeouts for PCI hotunplug operations, but it
> is trivial to do so if desirable.
> 
> The series goes as follows:
> 
> - Patches 1-4: DRC simplifications/cleanups. The idea with these
>   cleanups were to trim the spapr_drc_detach use as much as possible,
>   since the function would be used to start the timeout timer
> 
> - Patch 5: timeout timer infrastructure
> 
> - Patch 6: add cpu unplug timeout
> 
> - Patch 7: reset DIMM unplug state when the kernel reconfigures the DRC
>   connector

Very nice start.  More comments throughout.

> 
> 
> 
> v2 link: [1] 
> https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html
> 
> 
> Daniel Henrique Barboza (7):
>   spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
>   spapr_pci.c: simplify spapr_pci_unplug_request() function handling
>   spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
>   spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
>   spapr_drc.c: introduce unplug_timeout_timer
>   spapr_drc.c: add hotunplug timeout for CPUs
>   spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
> 
>  hw/ppc/spapr.c |  40 -
>  hw/ppc/spapr_drc.c | 116 +++--
>  hw/ppc/spapr_pci.c |  44 +-
>  hw/ppc/trace-events|   2 +-
>  include/hw/ppc/spapr.h |   2 +
>  include/hw/ppc/spapr_drc.h |   7 ++-
>  6 files changed, 147 insertions(+), 64 deletions(-)
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> Handling errors in memory hotunplug in the pSeries machine is more complex
> than any other device type, because there are all the complications that other
> devices has, and more.
> 
> For instance, determining a timeout for a DIMM hotunplug must consider if 
> it's a
> Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug 
> DIMMs.
> The size of the DIMM is also a factor, given that longer DIMMs naturally takes
> longer to be hotunplugged from the kernel. And there's also the guest memory 
> usage to
> be considered: if there's a process that is consuming memory that would be 
> lost by
> the DIMM unplug, the kernel will postpone the unplug process until the process
> finishes, and then initiate the regular hotunplug process. The first two
> considerations are manageable, but the last one is a deal breaker.
> 
> There is no sane way for the pSeries machine to determine the memory load in 
> the guest
> when attempting a DIMM hotunplug - and even if there was a way, the guest can 
> start
> using all the RAM in the middle of the unplug process and invalidate our 
> previous
> assumptions - and in result we can't even begin to calculate a timeout for the
> operation. This means that we can't implement a viable timeout mechanism for 
> memory
> unplug in pSeries.
> 
> Going back to why we would consider an unplug timeout, the reason is that we 
> can't
> know if the kernel is giving up the unplug. Turns out that, sometimes, we can.
> Consider a failed memory hotunplug attempt where the kernel will error out 
> with
> the following message:
> 
> 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed 
> LMBs'
> 
> This happens when there is a LMB that the kernel gave up in removing, and the 
> LMBs
> marked for removal of the same DIMM are now being added back. This process 
> happens

We need to be a little careful about terminology here.  From the
guest's point of view, there's no such thing as a DIMM, only LMBs.
What the guest is doing here is essentially rejecting a single "index
+ number" DRC unplug request, which corresponds to one DIMM on the
qemu side.

> in the pseries kernel in [1], dlpar_memory_remove_by_ic() into 
> dlpar_add_lmb(), and
> after that update_lmb_associativity_index(). In this function, the kernel is 
> configuring
> the LMB DRC connector again. Note that this is a valid usage in LOPAR, as 
> stated in
> section "ibm,configure-connector RTAS Call":
> 
> 'A subsequent sequence of calls to ibm,configure-connector with the same 
> entry from
> the “ibm,drc-indexes” or “ibm,drc-info” property will restart the 
> configuration of
> devices which were not completely configured.'
> 
> We can use this kernel behavior in our favor. If a DRC connector 
> reconfiguration
> for a LMB that we marked as unplug pending happens, this indicates that the 
> kernel
> changed its mind about the unplug and is reasserting that it will keep using 
> the
> DIMM. In this case, it's safe to assume that the whole DIMM unplug was 
> cancelled.
> 
> This patch hops into rtas_ibm_configure_connector() and, in the scenario 
> described
> above, clear the unplug state for the DIMM device. This will not solve all the
> problems we still have with memory unplug, but it will cover this case where 
> the
> kernel reconfigures LMBs after a failed unplug. We are a bit more resilient,
> without using an unreliable timeout, and we didn't make the remaining error 
> cases
> any worse.

I wonder if we could use this as a beginning of a hotplug failure
reporting mechanism.  As noted, this is explicitly allowed by PAPR and
I think in general it makes sense that a configure-connector would
re-assert that the guest is using the resource and we can't unplug it.

Could we extend guests to do an indicative configure-connector on any
unplug it knows it can't complete?  Or if configure-connector is too
disruptive could we use an (extra) H_SET_INDICATOR to "UNISOLATE"
state? If I'm reading right, that should be both permitted and a no-op
for existing PAPR implementations, so it should be a pretty safe way
to add that indication.

> 
> [1] arch/powerpc/platforms/pseries/hotplug-memory.c
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr.c | 30 ++
>  hw/ppc/spapr_drc.c | 14 ++
>  include/hw/ppc/spapr.h |  2 ++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ecce8abf14..4bcded4a1a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3575,6 +3575,36 @@ static SpaprDimmState 
> *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
>  return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
>  }
>  
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +   PCDIMMDevice *dimm)
> +{
> +SpaprDimmState *ds = spapr_pending_dimm_unplugs

Re: [PATCH v4 06/24] python: add VERSION file

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:38PM -0500, John Snow wrote:
> Python infrastructure as it exists today is not capable reliably of
> single-sourcing a package version from a parent directory. The authors
> of pip are working to correct this, but as of today this is not possible.
> 
> The problem is that when using pip to build and install a python
> package, it copies files over to a temporary directory and performs its
> build there. This loses access to any information in the parent
> directory, including git itself.
> 
> Further, Python versions have a standard (PEP 440) that may or may not
> follow QEMU's versioning. In general, it does; but naturally QEMU does
> not follow PEP 440. To avoid any automatically-generated conflict, a
> manual version file is preferred.
> 
> 
> I am proposing:
> 
> - Python tooling follows the QEMU version, indirectly, but with a major
>   version of 0 to indicate that the API is not expected to be
>   stable. This would mean version 0.5.2.0, 0.5.1.1, 0.5.3.0, etc.
> 
> - In the event that a Python package needs to be updated independently
>   of the QEMU version, a pre-release alpha version should be preferred,
>   but *only* after inclusion to the qemu development or stable branches.
> 
>   e.g. 0.5.2.0a1, 0.5.2.0a2, and so on should be preferred prior to
>   5.2.0's release.
> 
> - The Python core tooling makes absolutely no version compatibility
>   checks or constraints. It *may* work with releases of QEMU from the
>   past or future, but it is not required to.
> 
>   i.e., "qemu.machine" will, for now, remain in lock-step with QEMU.
> 
> - We reserve the right to split the qemu package into independently
>   versioned subpackages at a later date. This might allow for us to
>   begin versioning QMP independently from QEMU at a later date, if
>   we so choose.
> 
> 
> Implement this versioning scheme by adding a VERSION file and setting it
> to 0.6.0.0a1.
> 
> Signed-off-by: John Snow 
> ---
>  python/VERSION   | 1 +
>  python/setup.cfg | 1 +
>  2 files changed, 2 insertions(+)
>  create mode 100644 python/VERSION
> 

Reviewed-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH v4 05/24] python: add qemu package installer

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:37PM -0500, John Snow wrote:
> Add setup.cfg and setup.py, necessary for installing a package via
> pip. Add a rst document explaining the basics of what this package is

Nitpick 1: setup.cfg and setup.py are indeed used by pip, your
statement is correct.  But, it hides the fact that these can be used
without pip.  On a source tree based install, you may want to simply
use "python setup.py develop" to achieve what "pip install -e" would
do (without pip ever entering the picture).

Nitpick 2: while most people will understand what you mean by "rst
document", I believe that "Add a README file in reStructuredText
format" would be more obvious.

> for and who to contact for more information. This document will be used
> as the landing page for the package on PyPI.
> 
> I am not yet using a pyproject.toml style package manifest, because
> "editable" installs are not defined by PEP-517 and pip did not have
> support for this for some time; I consider the feature necessary for
> development.
>

I'm glad you kept it like this... I bet there's going to be another
PEP out, replacing the status quo, by the time I finish this review.

> Use a light-weight setup.py instead.
> 
> Signed-off-by: John Snow 
> ---
>  python/PACKAGE.rst | 32 
>  python/setup.cfg   | 19 +++
>  python/setup.py| 23 +++
>  3 files changed, 74 insertions(+)
>  create mode 100644 python/PACKAGE.rst
>  create mode 100644 python/setup.cfg
>  create mode 100755 python/setup.py
> 
> diff --git a/python/PACKAGE.rst b/python/PACKAGE.rst
> new file mode 100644
> index 000..0e714c87eb3
> --- /dev/null
> +++ b/python/PACKAGE.rst
> @@ -0,0 +1,32 @@
> +QEMU Python Tooling
> +===
> +
> +This package provides QEMU tooling used by the QEMU project to build,
> +configure, and test QEMU. It is not a fully-fledged SDK and it is subject
> +to change at any time.
> +
> +Usage
> +-
> +
> +The ``qemu.qmp`` subpackage provides a library for communicating with
> +QMP servers. The ``qemu.machine`` subpackage offers rudimentary
> +facilities for launching and managing QEMU processes. Refer to each
> +package's documentation
> +(``>>> help(qemu.qmp)``, ``>>> help(qemu.machine)``)
> +for more information.
> +

This gives the impression that those are the only subpackages (and
they were).  Better to reword it taking into account the qemu.utils
subpackage and possibly others (leave it open so that it doesn't rot
so quickly).

- Cleber.

signature.asc
Description: PGP signature

Re: [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 07:52:45PM -0300, Daniel Henrique Barboza wrote:
> There is a reliable way to make a CPU hotunplug fail in the pseries
> machine. Hotplug a CPU A, then offline all other CPUs inside the guest
> but A. When trying to hotunplug A the guest kernel will refuse to do
> it, because A is now the last online CPU of the guest. PAPR has no
> 'error callback' in this situation to report back to the platform,
> so the guest kernel will deny the unplug in silent and QEMU will never
> know what happened. The unplug pending state of A will remain until
> the guest is shutdown or rebooted.
> 
> Previous attempts of fixing it (see [1] and [2]) were aimed at trying to
> mitigate the effects of the problem. In [1] we were trying to guess which
> guest CPUs were online to forbid hotunplug of the last online CPU in the QEMU
> layer, avoiding the scenario described above because QEMU is now failing
> in behalf of the guest. This is not robust because the last online CPU of
> the guest can change while we're in the middle of the unplug process, and
> our initial assumptions are now invalid. In [2] we were accepting that our
> unplug process is uncertain and the user should be allowed to spam the IRQ
> hotunplug queue of the guest in case the CPU hotunplug fails.
> 
> This patch presents another alternative, using the timeout infrastructure
> introduced in the previous patch. CPU hotunplugs in the pSeries machine will
> now timeout after 15 seconds. This is a long time for a single CPU unplug
> to occur, regardless of guest load - although the user is *strongly* 
> encouraged
> to *not* hotunplug devices from a guest under high load - and we can be sure
> that something went wrong if it takes longer than that for the guest to 
> release
> the CPU (the same can't be said about memory hotunplug - more on that in the
> next patch).
> 
> Timing out the unplug operation will reset the unplug state of the CPU and
> allow the user to try it again, regardless of the error situation that
> prevented the hotunplug to occur. Of all the not so pretty fixes/mitigations
> for CPU hotunplug errors in pSeries, timing out the operation is an admission
> that we have no control in the process, and must assume the worst case if
> the operation doesn't succeed in a sensible time frame.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html
> 
> Reported-by: Xujun Ma 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  hw/ppc/spapr.c |  4 
>  hw/ppc/spapr_drc.c | 17 +
>  include/hw/ppc/spapr_drc.h |  3 +++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b066df68cb..ecce8abf14 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3724,6 +3724,10 @@ void spapr_core_unplug_request(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  if (!spapr_drc_unplug_requested(drc)) {
>  spapr_drc_unplug_request(drc);
>  spapr_hotplug_req_remove_by_index(drc);
> +} else {
> +error_setg(errp, "core-id %d unplug is still pending, %d seconds "
> +   "timeout remaining",
> +   cc->core_id, spapr_drc_unplug_timeout_remaining_sec(drc));

Reporting this information is a nice touch.

>  }
>  }
>  
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index c88bb524c5..c143bfb6d3 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -398,6 +398,12 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
>  
>  drc->unplug_requested = true;
>  
> +if (drck->unplug_timeout_seconds != 0) {
> +timer_mod(drc->unplug_timeout_timer,
> +  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +  drck->unplug_timeout_seconds * 1000);
> +}
> +
>  if (drc->state != drck->empty_state) {
>  trace_spapr_drc_awaiting_quiesce(spapr_drc_index(drc));
>  return;
> @@ -406,6 +412,16 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
>  spapr_drc_release(drc);
>  }
>  
> +int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc)
> +{
> +if (drc->unplug_requested && timer_pending(drc->unplug_timeout_timer)) {
> +return 
> (qemu_timeout_ns_to_ms(drc->unplug_timeout_timer->expire_time) -
> +qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)) / 1000;

Hmm.  Reaching into the timer's internal fields isn't ideal.  I wonder
if we should add a helper in the timer code for reporting this information.

> +}
> +
> +return 0;
> +}
> +
>  bool spapr_drc_reset(SpaprDrc *drc)
>  {
>  SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> @@ -706,6 +722,7 @@ static void spapr_drc_cpu_class_init(ObjectClass *k, void 
> *data)
>  drck->drc_name_prefix = "CPU ";
>  drck->release = spapr_core_release;
>  drck->dt_populate = spapr_core_dt_populate

Re: [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 07:52:44PM -0300, Daniel Henrique Barboza wrote:
> The LoPAR spec provides no way for the guest kernel to report failure of
> hotplug/hotunplug events. This wouldn't be bad if those operations were
> granted to always succeed, but that's far for the reality.
> 
> What ends up happening is that, in the case of a failed hotunplug,
> regardless of whether it was a QEMU error or a guest misbehavior, the pSeries
> machine is retaining the unplug state of the device in the running guest.
> This state is cleanup in machine reset, where it is assumed that this state
> represents a device that is pending unplug, and the device is hotunpluged
> from the board. Until the reset occurs, any hotunplug operation of the same
> device is forbid because there is a pending unplug state.
> 
> This behavior has at least one undesirable side effect. A long standing 
> pending
> unplug state is, more often than not, the result of a hotunplug error. The 
> user
> had to dealt with it, since retrying to unplug the device is noy allowed, and 
> then
> in the machine reset we're removing the device from the guest. This means that
> we're failing the user twice - failed to hotunplug when asked, then 
> hotunplugged
> without notice.
> 
> Solutions to this problem range between trying to predict when the hotunplug 
> will
> fail and forbid the operation from the QEMU layer, from opening up the IRQ 
> queue
> to allow for multiple hotunplug attempts, from telling the users to 'reboot 
> the
> machine if something goes wrong'. The first solution is flawed because we 
> can't
> fully predict guest behavior from QEMU, the second solution is a trial and 
> error
> remediation that counts on a hope that the unplug will eventually succeed, 
> and the
> third is ... well.
> 
> This patch introduces a crude, but effective solution to hotunplug errors in
> the pSeries machine. For each unplug done, we'll timeout after some time. If
> a certain amount of time passes, we'll cleanup the hotunplug state from the 
> machine.
> During the timeout period, any unplug operations in the same device will still
> be blocked. After that, we'll assume that the guest failed the operation, and
> allow the user to try again. If the timeout is too short we'll prevent 
> legitimate
> hotunplug situations to occur, so we'll need to overestimate the regular time
> an unplug operation takes to succeed to account that.
> 
> The true solution for the hotunplug errors in the pSeries machines is a PAPR 
> change
> to allow for the guest to warn the platform about it. For now, the work done 
> in this
> timeout design can be used for the new PAPR 'abort hcall' in the future, 
> given that
> for both cases we'll need code to cleanup the existing unplug states of the 
> DRCs.
> 
> At this moment we're adding the basic wiring of the timer into the DRC. Next 
> patch
> will use the timer to timeout failed CPU hotunplugs.
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr_drc.c | 36 
>  include/hw/ppc/spapr_drc.h |  2 ++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 67041fb212..c88bb524c5 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -57,6 +57,8 @@ static void spapr_drc_release(SpaprDrc *drc)
>  drck->release(drc->dev);
>  
>  drc->unplug_requested = false;
> +timer_del(drc->unplug_timeout_timer);
> +
>  g_free(drc->fdt);
>  drc->fdt = NULL;
>  drc->fdt_start_offset = 0;
> @@ -453,6 +455,24 @@ static const VMStateDescription 
> vmstate_spapr_drc_unplug_requested = {
>  }
>  };
>  
> +static bool spapr_drc_unplug_timeout_timer_needed(void *opaque)
> +{
> +SpaprDrc *drc = opaque;
> +
> +return timer_pending(drc->unplug_timeout_timer);
> +}
> +
> +static const VMStateDescription vmstate_spapr_drc_unplug_timeout_timer = {
> +.name = "DRC unplug timeout timer",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_drc_unplug_timeout_timer_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_TIMER_PTR(unplug_timeout_timer, SpaprDrc),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
>  static bool spapr_drc_needed(void *opaque)
>  {
>  SpaprDrc *drc = opaque;
> @@ -486,10 +506,20 @@ static const VMStateDescription vmstate_spapr_drc = {
>  },
>  .subsections = (const VMStateDescription * []) {
>  &vmstate_spapr_drc_unplug_requested,
> +&vmstate_spapr_drc_unplug_timeout_timer,
>  NULL
>  }
>  };
>  
> +static void drc_unplug_timeout_cb(void *opaque)
> +{
> +SpaprDrc *drc = opaque;
> +
> +if (drc->unplug_requested) {
> +drc->unplug_requested = false;
> +}

Sorry, forgot to mention in first pass - I think we want some kind of
reporting here.  At least a trace, and maybe also a warn_report() or
the like.

Hrm.. looking wider, I wonder if we should add a DEVICE_DELETE_FAILED
message to QAPI to m

Re: [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 07:52:43PM -0300, Daniel Henrique Barboza wrote:
> spapr_drc_detach() is not the best name for what the function does.
> The function does not detach the DRC, it makes an uncommited attempt
> to do it. It'll mark the DRC as pending unplug, via the 'unplug_request'
> flag, and only if the DRC state is drck->empty_state it will detach the
> DRC, via spapr_drc_release().
> 
> This is a contrast with its pair spapr_drc_attach(), where the function is
> indeed creating the DRC QOM object. If you know what spapr_drc_attach()
> does, you can be misled into thinking that spapr_drc_detach() is removing
> the DRC from QEMU internal state, which isn't true.
> 
> The current role of this function is better described as a request for
> detach, since there's no guarantee that we're going to detach the DRC in
> the end. Rename the function to spapr_drc_unplug_request to reflect what is is
> doing.
> 
> The initial idea was to change the name to spapr_drc_detach_request(), and
> later on change the unplug_request flag to detach_request. However,
> unplug_request is a migratable boolean for a long time now and renaming it
> is not worth the trouble. spapr_drc_unplug_request() setting 
> drc->unplug_request
> is more natural than spapr_drc_detach_request setting drc->unplug_request.
> 
> Signed-off-by: Daniel Henrique Barboza 

Good reasoning.

Reviewed-by: David Gibson 

> ---
>  hw/ppc/spapr.c | 6 +++---
>  hw/ppc/spapr_drc.c | 4 ++--
>  hw/ppc/spapr_pci.c | 2 +-
>  hw/ppc/trace-events| 2 +-
>  include/hw/ppc/spapr_drc.h | 2 +-
>  5 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 85fe65f894..b066df68cb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3654,7 +3654,7 @@ static void spapr_memory_unplug_request(HotplugHandler 
> *hotplug_dev,
>addr / SPAPR_MEMORY_BLOCK_SIZE);
>  g_assert(drc);
>  
> -spapr_drc_detach(drc);
> +spapr_drc_unplug_request(drc);
>  addr += SPAPR_MEMORY_BLOCK_SIZE;
>  }
>  
> @@ -3722,7 +3722,7 @@ void spapr_core_unplug_request(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  g_assert(drc);
>  
>  if (!spapr_drc_unplug_requested(drc)) {
> -spapr_drc_detach(drc);
> +spapr_drc_unplug_request(drc);
>  spapr_hotplug_req_remove_by_index(drc);
>  }
>  }
> @@ -3985,7 +3985,7 @@ static void spapr_phb_unplug_request(HotplugHandler 
> *hotplug_dev,
>  assert(drc);
>  
>  if (!spapr_drc_unplug_requested(drc)) {
> -spapr_drc_detach(drc);
> +spapr_drc_unplug_request(drc);
>  spapr_hotplug_req_remove_by_index(drc);
>  }
>  }
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 555a25517d..67041fb212 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -386,11 +386,11 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>   NULL, 0);
>  }
>  
> -void spapr_drc_detach(SpaprDrc *drc)
> +void spapr_drc_unplug_request(SpaprDrc *drc)
>  {
>  SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
> -trace_spapr_drc_detach(spapr_drc_index(drc));
> +trace_spapr_drc_unplug_request(spapr_drc_index(drc));
>  
>  g_assert(drc->dev);
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 1791d98a49..9334ba5dbb 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1726,7 +1726,7 @@ static void spapr_pci_unplug_request(HotplugHandler 
> *plug_handler,
>  if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
>  /* Mark the DRC as requested unplug if needed. */
>  if (!spapr_drc_unplug_requested(func_drc)) {
> -spapr_drc_detach(func_drc);
> +spapr_drc_unplug_request(func_drc);
>  }
>  spapr_hotplug_req_remove_by_index(func_drc);
>  }
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 1e91984526..b4bbfbb013 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -50,7 +50,7 @@ spapr_drc_set_allocation_state(uint32_t index, int state) 
> "drc: 0x%"PRIx32", sta
>  spapr_drc_set_allocation_state_finalizing(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_set_configured(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_attach(uint32_t index) "drc: 0x%"PRIx32
> -spapr_drc_detach(uint32_t index) "drc: 0x%"PRIx32
> +spapr_drc_unplug_request(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_awaiting_quiesce(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_reset(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_realize(uint32_t index) "drc: 0x%"PRIx32
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 8982927d5c..02a63b3666 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -243,7 +243,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, 
> uint32_t drc_type_mask);
>   * beforehand (eg

Re: [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 07:52:44PM -0300, Daniel Henrique Barboza wrote:
> The LoPAR spec provides no way for the guest kernel to report failure of
> hotplug/hotunplug events. This wouldn't be bad if those operations were
> granted to always succeed, but that's far for the reality.
> 
> What ends up happening is that, in the case of a failed hotunplug,
> regardless of whether it was a QEMU error or a guest misbehavior, the pSeries
> machine is retaining the unplug state of the device in the running guest.
> This state is cleanup in machine reset, where it is assumed that this state
> represents a device that is pending unplug, and the device is hotunpluged
> from the board. Until the reset occurs, any hotunplug operation of the same
> device is forbid because there is a pending unplug state.
> 
> This behavior has at least one undesirable side effect. A long standing 
> pending
> unplug state is, more often than not, the result of a hotunplug error. The 
> user
> had to dealt with it, since retrying to unplug the device is noy allowed, and 
> then
> in the machine reset we're removing the device from the guest. This means that
> we're failing the user twice - failed to hotunplug when asked, then 
> hotunplugged
> without notice.
> 
> Solutions to this problem range between trying to predict when the hotunplug 
> will
> fail and forbid the operation from the QEMU layer, from opening up the IRQ 
> queue
> to allow for multiple hotunplug attempts, from telling the users to 'reboot 
> the
> machine if something goes wrong'. The first solution is flawed because we 
> can't
> fully predict guest behavior from QEMU, the second solution is a trial and 
> error
> remediation that counts on a hope that the unplug will eventually succeed, 
> and the
> third is ... well.
> 
> This patch introduces a crude, but effective solution to hotunplug errors in
> the pSeries machine. For each unplug done, we'll timeout after some time. If
> a certain amount of time passes, we'll cleanup the hotunplug state from the 
> machine.
> During the timeout period, any unplug operations in the same device will still
> be blocked. After that, we'll assume that the guest failed the operation, and
> allow the user to try again. If the timeout is too short we'll prevent 
> legitimate
> hotunplug situations to occur, so we'll need to overestimate the regular time
> an unplug operation takes to succeed to account that.
> 
> The true solution for the hotunplug errors in the pSeries machines is a PAPR 
> change
> to allow for the guest to warn the platform about it. For now, the work done 
> in this
> timeout design can be used for the new PAPR 'abort hcall' in the future, 
> given that
> for both cases we'll need code to cleanup the existing unplug states of the 
> DRCs.
> 
> At this moment we're adding the basic wiring of the timer into the DRC. Next 
> patch
> will use the timer to timeout failed CPU hotunplugs.
> 
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  hw/ppc/spapr_drc.c | 36 
>  include/hw/ppc/spapr_drc.h |  2 ++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 67041fb212..c88bb524c5 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -57,6 +57,8 @@ static void spapr_drc_release(SpaprDrc *drc)
>  drck->release(drc->dev);
>  
>  drc->unplug_requested = false;
> +timer_del(drc->unplug_timeout_timer);
> +
>  g_free(drc->fdt);
>  drc->fdt = NULL;
>  drc->fdt_start_offset = 0;
> @@ -453,6 +455,24 @@ static const VMStateDescription 
> vmstate_spapr_drc_unplug_requested = {
>  }
>  };
>  
> +static bool spapr_drc_unplug_timeout_timer_needed(void *opaque)
> +{
> +SpaprDrc *drc = opaque;
> +
> +return timer_pending(drc->unplug_timeout_timer);
> +}
> +
> +static const VMStateDescription vmstate_spapr_drc_unplug_timeout_timer = {
> +.name = "DRC unplug timeout timer",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.needed = spapr_drc_unplug_timeout_timer_needed,
> +.fields = (VMStateField[]) {
> +VMSTATE_TIMER_PTR(unplug_timeout_timer, SpaprDrc),
> +VMSTATE_END_OF_LIST()
> +}
> +};

I think we can probably avoid adding extra data to the migration
stream.  Because the exact length of the timeout isn't super
important, so long as it's "long enough" I think it's acceptable if we
restart the timeout period after a migration.  That can be
accomplished with a post-load hook that just restarts the timer at the
initial duration if the DRC is in the unplug_requested state.

>  static bool spapr_drc_needed(void *opaque)
>  {
>  SpaprDrc *drc = opaque;
> @@ -486,10 +506,20 @@ static const VMStateDescription vmstate_spapr_drc = {
>  },
>  .subsections = (const VMStateDescription * []) {
>  &vmstate_spapr_drc_unplug_requested,
> +&vmstate_spapr_drc_unplug_timeout_timer,
>  NULL
>  }
>  };
>  
> +static void drc_unplug_timeou

Re: [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 07:52:42PM -0300, Daniel Henrique Barboza wrote:
> When moving a physical DRC to "Available", drc_isolate_physical() will
> move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked
> for unplug, call spapr_drc_detach(). For physical DRCs, drck->empty_state
> is STATE_PHYSICAL_POWERON, meaning that we're sure that spapr_drc_detach()
> will end up calling spapr_drc_release() in the end.
> 
> Likewise, for logical DRCs, drc_set_unusable will move the DRC to "Unusable"
> state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is the
> drck->empty_state for logical DRCs. spapr_drc_detach() will call
> spapr_drc_release() in this case as well.
> 
> In both scenarios, spapr_drc_detach() is being used as a spapr_drc_release(),
> wrapper, where we also set unplug_requested (which is already true, otherwise
> spapr_drc_detach() wouldn't be called in the first place) and check if
> drc->state == drck->empty_state, which we also know it's guaranteed to
> be true because we just set it.
> 
> Just use spapr_drc_release() in these functions to be clear of our intentions
> in both these functions.
> 
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: David Gibson 

> ---
>  hw/ppc/spapr_drc.c | 32 
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 84bd3c881f..555a25517d 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -50,6 +50,20 @@ uint32_t spapr_drc_index(SpaprDrc *drc)
>  | (drc->id & DRC_INDEX_ID_MASK);
>  }
>  
> +static void spapr_drc_release(SpaprDrc *drc)
> +{
> +SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +drck->release(drc->dev);
> +
> +drc->unplug_requested = false;
> +g_free(drc->fdt);
> +drc->fdt = NULL;
> +drc->fdt_start_offset = 0;
> +object_property_del(OBJECT(drc), "device");
> +drc->dev = NULL;
> +}
> +
>  static uint32_t drc_isolate_physical(SpaprDrc *drc)
>  {
>  switch (drc->state) {
> @@ -68,7 +82,7 @@ static uint32_t drc_isolate_physical(SpaprDrc *drc)
>  if (drc->unplug_requested) {
>  uint32_t drc_index = spapr_drc_index(drc);
>  trace_spapr_drc_set_isolation_state_finalizing(drc_index);
> -spapr_drc_detach(drc);
> +spapr_drc_release(drc);
>  }
>  
>  return RTAS_OUT_SUCCESS;
> @@ -209,7 +223,7 @@ static uint32_t drc_set_unusable(SpaprDrc *drc)
>  if (drc->unplug_requested) {
>  uint32_t drc_index = spapr_drc_index(drc);
>  trace_spapr_drc_set_allocation_state_finalizing(drc_index);
> -spapr_drc_detach(drc);
> +spapr_drc_release(drc);
>  }
>  
>  return RTAS_OUT_SUCCESS;
> @@ -372,20 +386,6 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>   NULL, 0);
>  }
>  
> -static void spapr_drc_release(SpaprDrc *drc)
> -{
> -SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -
> -drck->release(drc->dev);
> -
> -drc->unplug_requested = false;
> -g_free(drc->fdt);
> -drc->fdt = NULL;
> -drc->fdt_start_offset = 0;
> -object_property_del(OBJECT(drc), "device");
> -drc->dev = NULL;
> -}
> -
>  void spapr_drc_detach(SpaprDrc *drc)
>  {
>  SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling

2021-02-16 Thread David Gibson

On Tue, Feb 16, 2021 at 02:44:44PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 2/16/21 2:16 PM, Greg Kurz wrote:
> > On Tue, 16 Feb 2021 13:09:43 -0300
> > Daniel Henrique Barboza  wrote:
> > 
> > > 
> > > 
> > > On 2/16/21 12:50 PM, Greg Kurz wrote:
> > > > On Thu, 11 Feb 2021 19:52:41 -0300
> > > > Daniel Henrique Barboza  wrote:
> > > > 
> > > > > When hotunplugging a PCI function we'll branch out the logic in two 
> > > > > cases,
> > > > > function zero and non-zero. If non-zero, we'll call 
> > > > > spapr_drc_detach() and
> > > > > nothing else. If it's function zero, we'll loop it once between all 
> > > > > the
> > > > > functions in the slot to call spapr_drc_detach() on them, and 
> > > > > afterwards
> > > > > we'll do another backwards loop where we'll signal the event to the 
> > > > > guest.
> > > > > 
> > > > > We can simplify this logic. We can ignore all the DRC handling for 
> > > > > non-zero
> > > > > functions, since we'll end up doing that regardless when unplugging 
> > > > > function
> > > > > zero. And for function zero, everything can be done in a single loop, 
> > > > > since
> > > > > tt doesn't matter if we end up marking the function DRCs as unplug 
> > > > > pending in
> > > > > backwards order or not, as long as we call spapr_drc_detach() before 
> > > > > issuing
> > > > > the hotunplug event to the guest.
> > > > > 
> > > > > This will also avoid a possible scenario where the user starts to 
> > > > > hotunplug
> > > > > the slot, starting with a non-zero function, and then delays/forgets 
> > > > > to
> > > > > hotunplug function zero afterwards. This would keep the function DRC 
> > > > > marked
> > > > > as unplug requested indefinitely.
> > > > > 
> > > > 
> > > > ... or until the guest is reset, which will no longer happen with this
> > > > patch applied, i.e. breaks the long standing policy that machine reset
> > > > causes pending hot-unplug requests to succeed. I don't see an obvious
> > > > reason to special case non-zero PCI functions.
> > > 
> > > It's not possible to hotunplug the non-zero functions during machine 
> > > reset for
> > > multifunction PCI devices. We need to unplug the entire slot, and that 
> > > will only
> > > happen when function zero is unplugged. In fact, I think bad things will 
> > > happen
> > > in this case you mentioned if we are forcing the removal of non-zero 
> > > functions
> > > without function zero (spoiler: didn't test it).
> > > 
> > 
> > I've tested with the aggregation of two e1000e emulated devices:
> > 
> > device_add e1000e,addr=10.1,id=netfn1
> > device_add e1000e,multifunction=on,addr=10.0,id=netfn0
> > 
> > And I don't quite see what "bad things" could happen. We're resetting the
> > machine to a stable state and the new OS instance will just not see the
> > removed function (just like only function netfn0 got added).
> 
> 
> Interesting. Thanks for looking this up.
> 
> Given that the intention of this patch was a simplification of the existing
> design, without changing what we currently do regarding PCI functions and 
> unplug,
> and apparently it just did that, let's drop it.

I think that's best.  As Greg says, I think maintaining the behaviour
that reset completes pending hotplugs should be retained, and the
usual constraints on non-zero function hot-unplug don't apply at reset
time.

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()

2021-02-16 Thread David Gibson

On Mon, Feb 15, 2021 at 11:40:06AM +0100, Greg Kurz wrote:
> On Thu, 11 Feb 2021 19:52:40 -0300
> Daniel Henrique Barboza  wrote:
> 
> > drc_isolate_logical() is used to move the DRC from the "Configured" to the
> > "Available" state, erroring out if the DRC is in the unexpected "Unisolate"
> > state and doing nothing (with RTAS_OUT_SUCCESS) if the DRC is already in
> > "Available" or in "Unusable" state.
> > 
> > When moving from "Configured" to "Available", the DRC is moved to the
> > LOGICAL_AVAILABLE state, a drc->unplug_requested check is done and, if true,
> > spapr_drc_detach() is called.
> > 
> > What spapr_drc_detach() does then is:
> > 
> > - set drc->unplug_requested to true. In fact, this is the only place where
> > unplug_request is set to true;
> > - does nothing else if drc->state != drck->empty_state. If the DRC state
> > is equal to drck->empty_state, spapr_drc_release() is called. For logical
> > DRCs, drck->empty_state = LOGICAL_UNUSABLE.
> > 
> > In short, calling spapr_drc_detach() in drc_isolate_logical() does nothing. 
> > It'll
> > set unplug_request to true again ('again' since it was already true - 
> > otherwise the
> > function wouldn't be called), and will return without calling 
> > spapr_drc_release()
> > because the DRC is not in LOGICAL_UNUSABLE, since drc_isolate_logical() just
> > moved it to LOGICAL_AVAILABLE. The only place where the logical DRC is 
> > released
> > is when called from drc_set_unusable(), when it is moved to the "Unusable" 
> > state.
> > As it should, according to PAPR.
> > 
> > Even though calling spapr_drc_detach() in drc_isolate_logical() is benign, 
> > removing
> > it will avoid further thought about the matter. So let's go ahead and do 
> > that.
> > 
> 
> Good catch. This path looks useless for logical DRCs indeed.
> 
> > As a note, this logic was introduced in commit bbf5c878ab76. Since then, 
> > the DRC
> > handling code was refactored and enhanced, and PAPR itself went through some
> > changes in the DRC area as well. It is expected that some assumptions we 
> > had back
> > then are now deprecated.
> > 
> 
> As specified in [1]:
> 
> Please do not use lines that are longer than 76 characters in your
> commit message (so that the text still shows up nicely with "git show"
> in a 80-columns terminal window).
> 
> [1] 
> https://wiki.qemu.org/Contribute/SubmitAPatch#Write_a_meaningful_commit_message

I've applied this patch, but re-wrapped the commit message.

> > Signed-off-by: Daniel Henrique Barboza 
> > ---
> >  hw/ppc/spapr_drc.c | 13 -
> >  1 file changed, 13 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 8571d5bafe..84bd3c881f 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -132,19 +132,6 @@ static uint32_t drc_isolate_logical(SpaprDrc *drc)
> >  
> >  drc->state = SPAPR_DRC_STATE_LOGICAL_AVAILABLE;
> >  
> > -/* if we're awaiting release, but still in an unconfigured state,
> > - * it's likely the guest is still in the process of configuring
> > - * the device and is transitioning the devices to an ISOLATED
> > - * state as a part of that process. so we only complete the
> > - * removal when this transition happens for a device in a
> > - * configured state, as suggested by the state diagram from PAPR+
> > - * 2.7, 13.4
> > - */
> > -if (drc->unplug_requested) {
> > -uint32_t drc_index = spapr_drc_index(drc);
> > -trace_spapr_drc_set_isolation_state_finalizing(drc_index);
> 
> I was expecting a change in hw/ppc/trace-events to ditch this trace,
> but it is still called by drc_isolate_physical(), so we're good.
> 
> Reviewed-by: Greg Kurz 
> 
> > -spapr_drc_detach(drc);
> > -}
> >  return RTAS_OUT_SUCCESS;
> >  }
> >  
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v2 3/7] device_tree: add qemu_fdt_setprop_string_array helper

2021-02-16 Thread David Gibson

On Thu, Feb 11, 2021 at 05:19:41PM +, Alex Bennée wrote:
> A string array in device tree is simply a series of \0 terminated
> strings next to each other. As libfdt doesn't support that directly
> we need to build it ourselves.

Hm, that might not make a bad extension to libfdt...

> 
> Signed-off-by: Alex Bennée 
> Reviewed-by: Alistair Francis 
> Message-Id: <20201105175153.30489-4-alex.ben...@linaro.org>
> 
> ---
> v2
>   - checkpatch long line fix
> ---
>  include/sysemu/device_tree.h | 17 +
>  softmmu/device_tree.c| 26 ++
>  2 files changed, 43 insertions(+)
> 
> diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
> index 982c89345f..8a2fe55622 100644
> --- a/include/sysemu/device_tree.h
> +++ b/include/sysemu/device_tree.h
> @@ -70,6 +70,23 @@ int qemu_fdt_setprop_u64(void *fdt, const char *node_path,
>   const char *property, uint64_t val);
>  int qemu_fdt_setprop_string(void *fdt, const char *node_path,
>  const char *property, const char *string);
> +
> +/**
> + * qemu_fdt_setprop_string_array: set a string array property
> + *
> + * @fdt: pointer to the dt blob
> + * @name: node name
> + * @prop: property array
> + * @array: pointer to an array of string pointers
> + * @len: length of array
> + *
> + * assigns a string array to a property. This function converts and
> + * array of strings to a sequential string with \0 separators before
> + * setting the property.
> + */
> +int qemu_fdt_setprop_string_array(void *fdt, const char *node_path,
> +  const char *prop, char **array, int len);
> +
>  int qemu_fdt_setprop_phandle(void *fdt, const char *node_path,
>   const char *property,
>   const char *target_node_path);
> diff --git a/softmmu/device_tree.c b/softmmu/device_tree.c
> index b9a3ddc518..2691c58cf6 100644
> --- a/softmmu/device_tree.c
> +++ b/softmmu/device_tree.c
> @@ -21,6 +21,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/option.h"
>  #include "qemu/bswap.h"
> +#include "qemu/cutils.h"
>  #include "sysemu/device_tree.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/loader.h"
> @@ -397,6 +398,31 @@ int qemu_fdt_setprop_string(void *fdt, const char 
> *node_path,
>  return r;
>  }
>  
> +/*
> + * libfdt doesn't allow us to add string arrays directly but they are
> + * test a series of null terminated strings with a length. We build
> + * the string up here so we can calculate the final length.
> + */
> +int qemu_fdt_setprop_string_array(void *fdt, const char *node_path,
> +  const char *prop, char **array, int len)
> +{
> +int ret, i, total_len = 0;
> +char *str, *p;
> +for (i = 0; i < len; i++) {
> +total_len += strlen(array[i]) + 1;
> +}
> +p = str = g_malloc0(total_len);
> +for (i = 0; i < len; i++) {
> +int len = strlen(array[i]) + 1;
> +pstrcpy(p, len, array[i]);
> +p += len;
> +}
> +
> +ret = qemu_fdt_setprop(fdt, node_path, prop, str, total_len);
> +g_free(str);
> +return ret;
> +}
> +
>  const void *qemu_fdt_getprop(void *fdt, const char *node_path,
>   const char *property, int *lenp, Error **errp)
>  {

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature

Re: [PATCH v4 04/24] python: create utils sub-package

2021-02-16 Thread Cleber Rosa

On Thu, Feb 11, 2021 at 01:58:36PM -0500, John Snow wrote:
> Create a space for miscellaneous things that don't belong strictly in
> "qemu.machine" nor "qemu.qmp" packages.
> 
> Signed-off-by: John Snow 
> ---
>  python/qemu/machine/__init__.py |  8 
>  python/qemu/utils/__init__.py   | 23 +++
>  python/qemu/{machine => utils}/accel.py |  0
>  tests/acceptance/boot_linux.py  |  2 +-
>  tests/acceptance/virtio-gpu.py  |  2 +-
>  tests/acceptance/virtiofs_submounts.py  |  2 +-
>  tests/vm/aarch64vm.py   |  2 +-
>  tests/vm/basevm.py  |  3 ++-
>  8 files changed, 29 insertions(+), 13 deletions(-)
>  create mode 100644 python/qemu/utils/__init__.py
>  rename python/qemu/{machine => utils}/accel.py (100%)
> 
> diff --git a/python/qemu/machine/__init__.py b/python/qemu/machine/__init__.py
> index 27b0b19abd3..891a8f784b5 100644
> --- a/python/qemu/machine/__init__.py
> +++ b/python/qemu/machine/__init__.py
> @@ -8,10 +8,6 @@
>   - QEMUQtestMachine: VM class, with a qtest socket.
>  
>  - QEMUQtestProtocol: Connect to, send/receive qtest messages.
> -
> -- list_accel: List available accelerators
> -- kvm_available: Probe for KVM support
> -- tcg_available: Probe for TCG support
>  """
>  
>  # Copyright (C) 2020 John Snow for Red Hat Inc.
> @@ -26,15 +22,11 @@
>  # the COPYING file in the top-level directory.
>  #
>  
> -from .accel import kvm_available, list_accel, tcg_available
>  from .machine import QEMUMachine
>  from .qtest import QEMUQtestMachine, QEMUQtestProtocol
>  
>  
>  __all__ = (
> -'list_accel',
> -'kvm_available',
> -'tcg_available',
>  'QEMUMachine',
>  'QEMUQtestProtocol',
>  'QEMUQtestMachine',
> diff --git a/python/qemu/utils/__init__.py b/python/qemu/utils/__init__.py
> new file mode 100644
> index 000..edf807a93e5
> --- /dev/null
> +++ b/python/qemu/utils/__init__.py
> @@ -0,0 +1,23 @@
> +"""
> +QEMU development and testing utilities
> +
> +This library provides a small handful of utilities for performing various 
> tasks
> +not directly related to the launching of a VM.
> +
> +The only module included at present is accel; its public functions are
> +repeated here for your convenience:
> +
> +- list_accel: List available accelerators
> +- kvm_available: Probe for KVM support
> +- tcg_available: Prove for TCG support
> +"""
> +
> +# pylint: disable=import-error
> +from .accel import kvm_available, list_accel, tcg_available
> +
> +
> +__all__ = (
> +'list_accel',
> +'kvm_available',
> +'tcg_available',
> +)
> diff --git a/python/qemu/machine/accel.py b/python/qemu/utils/accel.py
> similarity index 100%
> rename from python/qemu/machine/accel.py
> rename to python/qemu/utils/accel.py
> diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
> index 212365fd185..824cf03d5f4 100644
> --- a/tests/acceptance/boot_linux.py
> +++ b/tests/acceptance/boot_linux.py
> @@ -12,7 +12,7 @@
>  
>  from avocado_qemu import Test, BUILD_DIR
>  
> -from qemu.machine import kvm_available, tcg_available
> +from qemu.utils import kvm_available, tcg_available
>  

With the latest changes merged earlier Today, this won't be necessary
anymore on boot_linux.py, but the equivalent change will be necessary
on tests/acceptance/avocado_qemu/__init__.py.

With the change mentioned above (which you would catch on a rebase):

Reviewed-by: Cleber Rosa 
Tested-by: Cleber Rosa 


signature.asc
Description: PGP signature

Re: [PATCH RFC v3 00/12] hw/block/nvme: metadata and end-to-end data protection support

2021-02-16 Thread Keith Busch

On Mon, Feb 15, 2021 at 12:02:28AM +0100, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This is RFC v3 of a series that adds support for metadata and end-to-end data
> protection.
> 
> First, on the subject of metadata, in v1, support was restricted to
> extended logical blocks, which was pretty trivial to implement, but
> required special initialization and broke DULBE. In v2, metadata is
> always stored continuously at the end of the underlying block device.
> This has the advantage of not breaking DULBE since the data blocks
> remains aligned and allows bdrv_block_status to be used to determinate
> allocation status. It comes at the expense of complicating the extended
> LBA emulation, but on the other hand it also gains support for metadata
> transfered as a separate buffer.
> 
> The end-to-end data protection support blew up in terms of required
> changes. This is due to the fact that a bunch of new commands has been
> added to the device since v1 (zone append, compare, copy), and they all
> require various special handling for protection information. If
> potential reviewers would like it split up into multiple patches, each
> adding pi support to one command, shout out.
> 
> The core of the series (metadata and eedp) is preceeded by a set of
> patches that refactors mapping (yes, again) and tries to deal with the
> qsg/iov duality mess (maybe also again?).
> 
> Support fro metadata and end-to-end data protection is all joint work
> with Gollu Appalanaidu.

Patches 1 - 8 look good to me:

Reviewed-by: Keith Busch 

I like the LBA format and protection info support too, but might need
some minor changes.

The verify implementation looked fine, but lacking a generic backing for
it sounds to me the use cases aren't there to justify taking on this
feature.

Re: [PATCH 0/3] virtiofsd: Add options to enable/disable posix acl

2021-02-16 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20210216233611.33400-1-vgo...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210216233611.33400-1-vgo...@redhat.com
Subject: [PATCH 0/3] virtiofsd: Add options to enable/disable posix acl

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20210216233611.33400-1-vgo...@redhat.com -> 
patchew/20210216233611.33400-1-vgo...@redhat.com
Switched to a new branch 'test'
599c357 virtiofsd: Change umask if posix acls are enabled
599cc19 virtiofsd: Add umask to seccom allow list
72ea185 virtiofsd: Add an option to enable/disable posix acls

=== OUTPUT BEGIN ===
1/3 Checking commit 72ea185ac754 (virtiofsd: Add an option to enable/disable 
posix acls)
WARNING: Block comments should align the * on each line
#56: FILE: tools/virtiofsd/passthrough_ll.c:648:
+ * Either user specified to disable posix_acl, or did not specify
+  * anything. In both the cases do not enable posix acl.

WARNING: Block comments should align the * on each line
#57: FILE: tools/virtiofsd/passthrough_ll.c:649:
+  * anything. In both the cases do not enable posix acl.
+ */

total: 0 errors, 2 warnings, 45 lines checked

Patch 1/3 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
2/3 Checking commit 599cc19faf56 (virtiofsd: Add umask to seccom allow list)
3/3 Checking commit 599c3575ec83 (virtiofsd: Change umask if posix acls are 
enabled)
ERROR: braces {} are necessary for all arms of this statement
#95: FILE: tools/virtiofsd/passthrough_ll.c:1072:
+if (change_umask)
[...]

total: 1 errors, 0 warnings, 96 lines checked

Patch 3/3 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210216233611.33400-1-vgo...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH 0/3] virtiofsd: Add options to enable/disable posix acl

2021-02-16 Thread Vivek Goyal

Luis Henriques reported that fstest generic/099 fails with virtiofs.
Little debugging showed that we don't enable acl support. So this
patch series provides option to enable/disable posix acl support. By
default it is disabled.

I have run blogbench and pjdfstests with posix acl enabled and
things work fine. 

Luis, can you please apply these patches, and run virtiofsd with
"-o posix_acl" and see if it fixes the failure you are seeing. I
ran the steps you provided manually and it fixes the issue for
me.

Vivek Goyal (3):
  virtiofsd: Add an option to enable/disable posix acls
  virtiofsd: Add umask to seccom allow list
  virtiofsd: Change umask if posix acls are enabled

 tools/virtiofsd/passthrough_ll.c  | 45 +++
 tools/virtiofsd/passthrough_seccomp.c |  1 +
 2 files changed, 40 insertions(+), 6 deletions(-)

-- 
2.25.4

[PATCH 2/3] virtiofsd: Add umask to seccom allow list

2021-02-16 Thread Vivek Goyal

Next patch is going to make use of "umask" syscall. So allow it.

Signed-off-by: Vivek Goyal 
---
 tools/virtiofsd/passthrough_seccomp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/virtiofsd/passthrough_seccomp.c 
b/tools/virtiofsd/passthrough_seccomp.c
index ea852e2e33..f0313c5ce4 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -114,6 +114,7 @@ static const int syscall_whitelist[] = {
 SCMP_SYS(utimensat),
 SCMP_SYS(write),
 SCMP_SYS(writev),
+SCMP_SYS(umask),
 };
 
 /* Syscalls used when --syslog is enabled */
-- 
2.25.4

[PATCH 3/3] virtiofsd: Change umask if posix acls are enabled

2021-02-16 Thread Vivek Goyal

When parent directory has default acl and a file is created in that
directory, then umask is ignored and final file permissions are
determined using default acl instead. (man 2 umask).

Currently, fuse applies the umask and sends modified mode in create
request accordingly. fuse server can set FUSE_DONT_MASK and tell
fuse client to not apply umask and fuse server will take care of
it as needed.

With posix acls enabled, requirement will be that we want umask
to determine final file mode if parent directory does not have
default acl.

So if posix acls are enabled, opt in for FUSE_DONT_MASK. virtiofsd
will set umask of the thread doing file creation. And host kernel
should use that umask if parent directory does not have default
acls, otherwise umask does not take affect.

Miklos mentioned that we already call unshare(CLONE_FS) for
every thread. That means umask has now become property of per
thread and it should be ok to manipulate it in file creation path.

So this patch opts in for FUSE_DONT_MASK if posix acls are enabled
and changes umask to caller umask before file creation and restores
original umask after file creation is done.

This should fix fstest generic/099.

Reported-by: Luis Henriques 
Signed-off-by: Vivek Goyal 
---
 tools/virtiofsd/passthrough_ll.c | 26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 34b2848e61..84691571d2 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -120,6 +120,7 @@ struct lo_inode {
 struct lo_cred {
 uid_t euid;
 gid_t egid;
+mode_t umask;
 };
 
 enum {
@@ -169,6 +170,8 @@ struct lo_data {
 /* An O_PATH file descriptor to /proc/self/fd/ */
 int proc_self_fd;
 int user_posix_acl;
+/* If set, virtiofsd is responsible for setting umask during creation */
+bool change_umask;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -641,7 +644,8 @@ static void lo_init(void *userdata, struct fuse_conn_info 
*conn)
  * in fuse_lowlevel.c
  */
 fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling posix acl\n");
-conn->want |= FUSE_CAP_POSIX_ACL;
+conn->want |= FUSE_CAP_POSIX_ACL | FUSE_CAP_DONT_MASK;
+lo->change_umask = true;
 } else {
 /*
  * Either user specified to disable posix_acl, or did not specify
@@ -649,6 +653,7 @@ static void lo_init(void *userdata, struct fuse_conn_info 
*conn)
  */
 fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling posix_acl\n");
 conn->want &= ~FUSE_CAP_POSIX_ACL;
+lo->change_umask = false;
 }
 }
 
@@ -1043,7 +1048,8 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, 
const char *name)
  * ownership of caller.
  * TODO: What about selinux context?
  */
-static int lo_change_cred(fuse_req_t req, struct lo_cred *old)
+static int lo_change_cred(fuse_req_t req, struct lo_cred *old,
+  bool change_umask)
 {
 int res;
 
@@ -1063,11 +1069,14 @@ static int lo_change_cred(fuse_req_t req, struct 
lo_cred *old)
 return errno_save;
 }
 
+if (change_umask)
+old->umask = umask(req->ctx.umask);
+
 return 0;
 }
 
 /* Regain Privileges */
-static void lo_restore_cred(struct lo_cred *old)
+static void lo_restore_cred(struct lo_cred *old, bool restore_umask)
 {
 int res;
 
@@ -1082,6 +1091,9 @@ static void lo_restore_cred(struct lo_cred *old)
 fuse_log(FUSE_LOG_ERR, "setegid(%u): %m\n", old->egid);
 exit(1);
 }
+
+if (restore_umask)
+umask(old->umask);
 }
 
 static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
@@ -1106,7 +1118,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t 
parent,
 return;
 }
 
-saverr = lo_change_cred(req, &old);
+saverr = lo_change_cred(req, &old, lo->change_umask && !S_ISLNK(mode));
 if (saverr) {
 goto out;
 }
@@ -1115,7 +1127,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t 
parent,
 
 saverr = errno;
 
-lo_restore_cred(&old);
+lo_restore_cred(&old, lo->change_umask && !S_ISLNK(mode));
 
 if (res == -1) {
 goto out;
@@ -1780,7 +1792,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, 
const char *name,
 return;
 }
 
-err = lo_change_cred(req, &old);
+err = lo_change_cred(req, &old, lo->change_umask);
 if (err) {
 goto out;
 }
@@ -1791,7 +1803,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, 
const char *name,
 fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
 err = fd == -1 ? errno : 0;
 
-lo_restore_cred(&old);
+lo_restore_cred(&old, lo->change_umask);
 
 /* Ignore the error if file exists and O_EXCL was not given */
 if (err && (err != EEXIST || (fi->flags & O_EXCL))) {
-- 
2.25.4

[PATCH 1/3] virtiofsd: Add an option to enable/disable posix acls

2021-02-16 Thread Vivek Goyal

fuse has an option FUSE_POSIX_ACL which needs to be opted in by fuse
server to enable posix acls.

Add virtiofsd option "-o posix_acl/no_posix_acl" to let users enable/disable
posix acl support. By default it is disabled as of now.

Signed-off-by: Vivek Goyal 
---
 tools/virtiofsd/passthrough_ll.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 147b59338a..34b2848e61 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -168,6 +168,7 @@ struct lo_data {
 
 /* An O_PATH file descriptor to /proc/self/fd/ */
 int proc_self_fd;
+int user_posix_acl;
 };
 
 static const struct fuse_opt lo_opts[] = {
@@ -198,6 +199,8 @@ static const struct fuse_opt lo_opts[] = {
 { "allow_direct_io", offsetof(struct lo_data, allow_direct_io), 1 },
 { "no_allow_direct_io", offsetof(struct lo_data, allow_direct_io), 0 },
 { "announce_submounts", offsetof(struct lo_data, announce_submounts), 1 },
+{ "posix_acl", offsetof(struct lo_data, user_posix_acl), 1 },
+{ "no_posix_acl", offsetof(struct lo_data, user_posix_acl), 0 },
 FUSE_OPT_END
 };
 static bool use_syslog = false;
@@ -630,6 +633,23 @@ static void lo_init(void *userdata, struct fuse_conn_info 
*conn)
  "does not support it\n");
 lo->announce_submounts = false;
 }
+
+if (lo->user_posix_acl == 1) {
+/*
+ * User explicitly asked for this option. Enable it unconditionally.
+ * If connection does not have this capability, it should fail
+ * in fuse_lowlevel.c
+ */
+fuse_log(FUSE_LOG_DEBUG, "lo_init: enabling posix acl\n");
+conn->want |= FUSE_CAP_POSIX_ACL;
+} else {
+/*
+ * Either user specified to disable posix_acl, or did not specify
+  * anything. In both the cases do not enable posix acl.
+ */
+fuse_log(FUSE_LOG_DEBUG, "lo_init: disabling posix_acl\n");
+conn->want &= ~FUSE_CAP_POSIX_ACL;
+}
 }
 
 static void lo_getattr(fuse_req_t req, fuse_ino_t ino,
@@ -3533,6 +3553,7 @@ int main(int argc, char *argv[])
 .posix_lock = 0,
 .allow_direct_io = 0,
 .proc_self_fd = -1,
+.user_posix_acl = -1,
 };
 struct lo_map_elem *root_elem;
 struct lo_map_elem *reserve_elem;
-- 
2.25.4

Re: [PATCH v13 0/5] UFFD write-tracking migration/snapshots

2021-02-16 Thread Peter Xu

Hi, Andrey,

On Sat, Feb 13, 2021 at 12:34:07PM +0300, Andrey Gruzdev wrote:
> On 12.02.2021 19:11, Peter Xu wrote:
> > On Fri, Feb 12, 2021 at 09:52:52AM +0100, David Hildenbrand wrote:
> > > On 12.02.21 04:06, Peter Xu wrote:
> > > > On Thu, Feb 11, 2021 at 10:09:58PM +0100, David Hildenbrand wrote:
> > > > > The issue is when the discard happened before starting the snapshot. 
> > > > > Write-protection won‘t work and the zeroed content won‘t be retained 
> > > > > in the snapshot.
> > > > I see what you mean now, and iiuc it will only be a problem if 
> > > > init_on_free=1.
> > > > I think CONFIG_INIT_ON_FREE_DEFAULT_ON should be off for most distros, 
> > > > so the
> > > Yes, some distros seem to enable init_on_alloc instead. Looking at the
> > > introducing commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1
> > > and init_on_free=1 boot options") there are security use cases and it 
> > > might
> > > become important with memory tagging.
> > > 
> > > Note that in Linux, there was also the option to poison pages with 0,
> > > removed via f289041ed4cf ("mm, page_poison: remove
> > > CONFIG_PAGE_POISONING_ZERO"), available in some kernels that supported 
> > > free
> > > page reporting.
> > > 
> > > It got removed and use cases got told to use init_on_free.
> 
> I think we talk about init_on_free()/init_on_alloc() on guest side, right?

Right.  IIUC it's the init_on_free() that matters.

We'll have no issue if init_on_alloc=1 && init_on_free=0, since in that case
all pages will be zeroed after all before the new page returned to the caller
to allocate the page. Then we're safe, I think.

> Still can't get how it relates to host's unpopulated pages..
> Try to look from hardware side. Untouched SDRAM in hardware is required to 
> contain zeroes somehow? No.
> These 'trash' pages in migration stream are like never written physical 
> memory pages, they are really
> not needed in snapshot but they don't do any harm as well as there's no harm 
> in that never-written physical
> page is full of garbage.
> 
> Do these 'trash' pages in snapshot contain sensitive information not allowed 
> to be accessed by the same VM?
> I think no. Or we need a good example how it can be potentially exploited.
> 
> The only issue that I see is madvise(MADV_DONTNEED) for RAM blocks during 
> snapshotting. And free page reporting
> or memory balloon is secondary - the point is that UFFD_WP snapshot is 
> incompatible with madvise(MADV_DONTNEED) on
> hypervisor side. No matter which guest functionality can induce it.

I think the problem is if with init_on_free=1, the kernel will assume that
all the pages that got freed has been zeroed before-hand so it thinks that it's
a waste of time to zero it again when the page is reused/reallocated.  As a
reference see kernel prep_new_page() where there's:

if (!free_pages_prezeroed() && want_init_on_alloc(gfp_flags))
kernel_init_free_pages(page, 1 << order);

In this case I believe free_pages_prezeroed() will return true, then we don't
even need to check want_init_on_alloc() at all. Note that it'll cover all the
cases where kernel allocates with __GFP_ZERO: it means it could happen that
even the guest kernel tries to alloc_page(__GFP_ZERO) it may got a page with
random data after the live snapshot is loaded.  So it's not about any hardware,
it's the optimization of guest kernel instead.  It is actually reasonable and
efficient since if we *know* that page is zero page then we shouldn't bother
zeroing it again.  However it brought us a bit of trouble on live snapshot that
the current solution might not work for all guest OS configurations.

> 
> > > > impact should be small, I think.  I thought about it, but indeed I 
> > > > didn't see a
> > > > good way to fix this if without fixing the zero page copy for live 
> > > > snapshot.
> > > We should really document this (unexpected) behavior of snapshotting.
> > > Otherwise, the next feature comes around that relies on pages that were
> > > discarded to remain zeroed (I even have one in mind ;) ) and forgets to
> > > disable snapshots.
> > Agreed.  I'll see whether Andrey would have any idea to workaround this, or
> > further comment.  Or I can draft a patch to document this next week (or 
> > unless
> > Andrey would beat me to it :).
> > 
> Really better to document this specific behaviour but also clarify that the 
> saved state remains
> consistent and secure, off course if you agree with my arguments.

Yes, no rush on anything yet, let's reach a consensus on understanding the
problem before trying to even document anything.  If you agree with above, feel
free to draft a documentation update patch if you'd like, that could also
contain the code to prohibit virtio-balloon page reporting when live snapshot.

IMHO if above issue exists, it'll be indeed better to implement the MISSING
tracking as well with UFFDIO_ZEROCOPY - it's still not optimized to keep trash
pages in the live snapshot, since for a high d

Re: [PATCH RFC v3 12/12] hw/block/nvme: add support for the format nvm command

2021-02-16 Thread Keith Busch

On Mon, Feb 15, 2021 at 12:02:40AM +0100, Klaus Jensen wrote:
> From: Minwoo Im 
> 
> Format NVM admin command can make a namespace or namespaces to be
> with different LBA size and metadata size with protection information
> types.
> 
> This patch introduces Format NVM command with LBA format, Metadata, and
> Protection Information for the device. The secure erase operation things
> are yet to be added.
> 
> The parameter checks inside of this patch has been referred from
> Keith's old branch.

Oh, and here's the format command now, so my previous comment on patch
11 doesn't matter.

> +struct nvme_aio_format_ctx {
> +NvmeRequest   *req;
> +NvmeNamespace *ns;
> +
> +/* number of outstanding write zeroes for this namespace */
> +int *count;

Shouldn't this count be the NvmeRequest's opaque value?

Re: [PATCH RFC v3 11/12] hw/block/nvme: support multiple lba formats

2021-02-16 Thread Keith Busch

On Mon, Feb 15, 2021 at 12:02:39AM +0100, Klaus Jensen wrote:
> From: Minwoo Im 
> 
> This patch introduces multiple LBA formats supported with the typical
> logical block sizes of 512 bytes and 4096 bytes as well as metadata
> sizes of 0, 8, 16 and 64 bytes. The format will be chosed based on the
> lbads and ms parameters of the nvme-ns device.

This is much less useful without support for Format NVM command.

Re: [PATCH RFC v3 09/12] hw/block/nvme: add verify command

2021-02-16 Thread Keith Busch

On Mon, Feb 15, 2021 at 12:02:37AM +0100, Klaus Jensen wrote:
> From: Gollu Appalanaidu 
> 
> See NVM Express 1.4, section 6.14 ("Verify Command").
> 
> Signed-off-by: Gollu Appalanaidu 
> [k.jensen: rebased, refactored for e2e]
> Signed-off-by: Klaus Jensen 

Verify is a generic block command supported in other protocols beyond
nvme. If we're going to support the command in nvme, I prefer the
implementation had generic backing out of the qemu block API rather than
emulate the entirety out of the nvme device.

Re: [PATCH RFC v3 08/12] hw/block/nvme: end-to-end data protection

2021-02-16 Thread Keith Busch

On Mon, Feb 15, 2021 at 12:02:36AM +0100, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Add support for namespaces formatted with protection information. The
> type of end-to-end data protection (i.e. Type 1, Type 2 or Type 3) is
> selected with the `pi` nvme-ns device parameter. If the number of
> metadata bytes is larger than 8, the `pil` nvme-ns device parameter may
> be used to control the location of the 8-byte DIF tuple. The default
> `pil` value of '0', causes the DIF tuple to be transferred as the last
> 8 bytes of the metadata. Set to 1 to store this in the first eight bytes
> instead.

This file is getting quite large. I think this feature can have the bulk
of the implementation in a separate file. For ex, nvme-dif.c. But like
the linux implementation this is based on, it isn't really nvme
specific, so even better if t10 dif is implemented in a generic location
with an API for nvme and others.

Re: [PATCH RESEND] hostmem: Don't report pmem attribute if unsupported

2021-02-16 Thread John Snow


On 2/16/21 5:23 PM, Eduardo Habkost wrote:

On Tue, Jan 26, 2021 at 08:48:25AM +0100, Michal Privoznik wrote:

When management applications (like Libvirt) want to check whether
memory-backend-file.pmem is supported they can list object
properties using 'qom-list-properties'. However, 'pmem' is
declared always (and thus reported always) and only at runtime
QEMU errors out if it was built without libpmem (and thus can not
guarantee write persistence). This is suboptimal since we have
ability to declare attributes at compile time.

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1915216
Signed-off-by: Michal Privoznik 
Reviewed-by: Daniel Henrique Barboza 
Tested-by: Daniel Henrique Barboza 
Reviewed-by: Igor Mammedov 


I'm not a fan of making availability of properties conditional
(even if at compile time), but if this helps libvirt I guess it
makes sense.



Compile time might be OK, but if we want to describe everything via QAPI 
eventually, we just need to be able to describe that compile-time 
requisite appropriately.


Conditional at run-time is I think the thing that absolutely has to go 
wherever it surfaces.



CCing John, who has been thinking a lot about these questions.



Thanks for the heads up. Good reminder that libvirt uses the existence 
of properties as a bellwether for feature support. I don't think I like 
that idea, but I like breaking libvirt even less.


--js


I'm queueing this on machine-next.  Thanks, and sorry for the delay!


---

This is just a resend of a patch I've sent earlier with Reviewed-by and
Tested-by added:

https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg04558.html

  backends/hostmem-file.c | 13 -
  1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 40e1e5b3e3..7e30eb5985 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -123,6 +123,7 @@ static void file_memory_backend_set_align(Object *o, 
Visitor *v,
  fb->align = val;
  }
  
+#ifdef CONFIG_LIBPMEM

  static bool file_memory_backend_get_pmem(Object *o, Error **errp)
  {
  return MEMORY_BACKEND_FILE(o)->is_pmem;
@@ -139,17 +140,9 @@ static void file_memory_backend_set_pmem(Object *o, bool 
value, Error **errp)
  return;
  }
  
-#ifndef CONFIG_LIBPMEM

-if (value) {
-error_setg(errp, "Lack of libpmem support while setting the 'pmem=on'"
-   " of %s. We can't ensure data persistence.",
-   object_get_typename(o));
-return;
-}
-#endif
-
  fb->is_pmem = value;
  }
+#endif /* CONFIG_LIBPMEM */
  
  static void file_backend_unparent(Object *obj)

  {
@@ -180,8 +173,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
  file_memory_backend_get_align,
  file_memory_backend_set_align,
  NULL, NULL);
+#ifdef CONFIG_LIBPMEM
  object_class_property_add_bool(oc, "pmem",
  file_memory_backend_get_pmem, file_memory_backend_set_pmem);
+#endif
  }
  
  static void file_backend_instance_finalize(Object *o)

--
2.26.2

[PATCH v2 1/3] target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe

2021-02-16 Thread Rebecca Cran

Add support for FEAT_SSBS. SSBS (Speculative Store Bypass Safe) is an
optional feature in ARMv8.0, and mandatory in ARMv8.5.

Signed-off-by: Rebecca Cran 
---
 target/arm/cpu.h   | 15 +++-
 target/arm/helper.c| 37 
 target/arm/internals.h |  6 
 target/arm/translate-a64.c | 12 +++
 4 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index f240275407bc..a0a3ee7bcde9 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1201,6 +1201,7 @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_TE  (1U << 30) /* AArch32 only */
 #define SCTLR_EnIB(1U << 30) /* v8.3, AArch64 only */
 #define SCTLR_EnIA(1U << 31) /* v8.3, AArch64 only */
+#define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
 #define SCTLR_BT0 (1ULL << 35) /* v8.5-BTI */
 #define SCTLR_BT1 (1ULL << 36) /* v8.5-BTI */
 #define SCTLR_ITFSB   (1ULL << 37) /* v8.5-MemTag */
@@ -1208,7 +1209,7 @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_TCF (3ULL << 40) /* v8.5-MemTag */
 #define SCTLR_ATA0(1ULL << 42) /* v8.5-MemTag */
 #define SCTLR_ATA (1ULL << 43) /* v8.5-MemTag */
-#define SCTLR_DSSBS   (1ULL << 44) /* v8.5 */
+#define SCTLR_DSSBS_64 (1ULL << 44) /* v8.5, AArch64 only */
 
 #define CPTR_TCPAC(1U << 31)
 #define CPTR_TTA  (1U << 20)
@@ -1245,6 +1246,7 @@ void pmu_init(ARMCPU *cpu);
 #define CPSR_IL (1U << 20)
 #define CPSR_DIT (1U << 21)
 #define CPSR_PAN (1U << 22)
+#define CPSR_SSBS (1U << 23)
 #define CPSR_J (1U << 24)
 #define CPSR_IT_0_1 (3U << 25)
 #define CPSR_Q (1U << 27)
@@ -1307,6 +1309,7 @@ void pmu_init(ARMCPU *cpu);
 #define PSTATE_A (1U << 8)
 #define PSTATE_D (1U << 9)
 #define PSTATE_BTYPE (3U << 10)
+#define PSTATE_SSBS (1U << 12)
 #define PSTATE_IL (1U << 20)
 #define PSTATE_SS (1U << 21)
 #define PSTATE_PAN (1U << 22)
@@ -3883,6 +3886,11 @@ static inline bool isar_feature_aa32_dit(const 
ARMISARegisters *id)
 return FIELD_EX32(id->id_pfr0, ID_PFR0, DIT) != 0;
 }
 
+static inline bool isar_feature_aa32_ssbs(const ARMISARegisters *id)
+{
+return FIELD_EX32(id->id_pfr2, ID_PFR2, SSBS) != 0;
+}
+
 /*
  * 64-bit feature tests via id registers.
  */
@@ -4137,6 +4145,11 @@ static inline bool isar_feature_aa64_dit(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, DIT) != 0;
 }
 
+static inline bool isar_feature_aa64_ssbs(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, SSBS) != 0;
+}
+
 /*
  * Feature tests for "does this exist in either 32-bit or 64-bit?"
  */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0e1a3b94211c..fedcf2e739e2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4450,6 +4450,24 @@ static const ARMCPRegInfo dit_reginfo = {
 .readfn = aa64_dit_read, .writefn = aa64_dit_write
 };
 
+static uint64_t aa64_ssbs_read(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+return env->pstate & PSTATE_SSBS;
+}
+
+static void aa64_ssbs_write(CPUARMState *env, const ARMCPRegInfo *ri,
+   uint64_t value)
+{
+env->pstate = (env->pstate & ~PSTATE_SSBS) | (value & PSTATE_SSBS);
+}
+
+static const ARMCPRegInfo ssbs_reginfo = {
+.name = "SSBS", .state = ARM_CP_STATE_AA64,
+.opc0 = 3, .opc1 = 3, .crn = 4, .crm = 2, .opc2 = 6,
+.type = ARM_CP_NO_RAW, .access = PL0_RW,
+.readfn = aa64_ssbs_read, .writefn = aa64_ssbs_write
+};
+
 static CPAccessResult aa64_cacheop_poc_access(CPUARMState *env,
   const ARMCPRegInfo *ri,
   bool isread)
@@ -8244,6 +8262,9 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if (cpu_isar_feature(aa64_dit, cpu)) {
 define_one_arm_cp_reg(cpu, &dit_reginfo);
 }
+if (cpu_isar_feature(aa64_ssbs, cpu)) {
+define_one_arm_cp_reg(cpu, &ssbs_reginfo);
+}
 
 if (arm_feature(env, ARM_FEATURE_EL2) && cpu_isar_feature(aa64_vh, cpu)) {
 define_arm_cp_regs(cpu, vhe_reginfo);
@@ -9463,6 +9484,14 @@ static void take_aarch32_exception(CPUARMState *env, int 
new_mode,
 env->uncached_cpsr &= ~(CPSR_IL | CPSR_J);
 env->daif |= mask;
 
+if (cpu_isar_feature(aa32_ssbs, env_archcpu(env))) {
+if (env->cp15.sctlr_el[new_el] & SCTLR_DSSBS_32) {
+env->uncached_cpsr |= CPSR_SSBS;
+} else {
+env->uncached_cpsr &= ~CPSR_SSBS;
+}
+}
+
 if (new_mode == ARM_CPU_MODE_HYP) {
 env->thumb = (env->cp15.sctlr_el[2] & SCTLR_TE) != 0;
 env->elr_el[2] = env->regs[15];
@@ -9973,6 +10002,14 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 new_mode |= PSTATE_TCO;
 }
 
+if (cpu_isar_feature(aa64_ssbs, cpu)) {
+if (env->cp15.sctlr_el[new_el] & SCTLR_DSSBS_64) {
+new_mode |= PSTATE_SSBS;
+} else {
+new_mode &= ~PSTATE_SSBS;
+}
+}
+
 pstate_write(env, PSTATE_DAIF | new_mode);
 env->aarch

[PATCH v2 2/3] target/arm: Enable FEAT_SSBS for "max" AARCH64 CPU

2021-02-16 Thread Rebecca Cran

Set ID_AA64PFR1_EL1.SSBS to 2 and ID_PFR2.SSBS to 1.

Signed-off-by: Rebecca Cran 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu64.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index c255f1bcc393..f0a9e968c9c1 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -674,6 +674,7 @@ static void aarch64_max_initfn(Object *obj)
 
 t = cpu->isar.id_aa64pfr1;
 t = FIELD_DP64(t, ID_AA64PFR1, BT, 1);
+t = FIELD_DP64(t, ID_AA64PFR1, SSBS, 2);
 /*
  * Begin with full support for MTE. This will be downgraded to MTE=0
  * during realize if the board provides no tag memory, much like
@@ -723,6 +724,10 @@ static void aarch64_max_initfn(Object *obj)
 u = FIELD_DP32(u, ID_PFR0, DIT, 1);
 cpu->isar.id_pfr0 = u;
 
+u = cpu->isar.id_pfr2;
+u = FIELD_DP32(u, ID_PFR2, SSBS, 1);
+cpu->isar.id_pfr2 = u;
+
 u = cpu->isar.id_mmfr3;
 u = FIELD_DP32(u, ID_MMFR3, PAN, 2); /* ATS1E1 */
 cpu->isar.id_mmfr3 = u;
-- 
2.26.2

[PATCH v2 3/3] target/arm: Set ID_PFR2.SSBS to 1 for "max" 32-bit CPU

2021-02-16 Thread Rebecca Cran

Enable FEAT_SSBS for the "max" 32-bit CPU.

Signed-off-by: Rebecca Cran 
Reviewed-by: Richard Henderson 
---
 target/arm/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 5cf6c056c50f..88a6b183d325 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -2206,6 +2206,10 @@ static void arm_max_initfn(Object *obj)
 t = cpu->isar.id_pfr0;
 t = FIELD_DP32(t, ID_PFR0, DIT, 1);
 cpu->isar.id_pfr0 = t;
+
+t = cpu->isar.id_pfr2;
+t = FIELD_DP32(t, ID_PFR2, SSBS, 1);
+cpu->isar.id_mfr2 = t;
 }
 #endif
 }
-- 
2.26.2

[PATCH v2 0/3] target/arm: Add support for FEAT_SSBS

2021-02-16 Thread Rebecca Cran



Add support for FEAT_SSBS, Speculative Store Bypass Safe. SSBS is an
optional feature in ARMv8.0 and is mandatory in ARMv8.5.

Changes from v1 to v2:

o Removed changes to cpsr_write_from_spsr_elx and cpsr_read_for_spsr_elx.
o Moved the SSBS case in translate-a64.c above DIT to keep the numbers in
  order.
o Moved the check for SCTLR_DSSBS_32 in take_aarch32_exception.

Rebecca Cran (3):
  target/arm: Add support for FEAT_SSBS, Speculative Store Bypass Safe
  target/arm: Enable FEAT_SSBS for "max" AARCH64 CPU
  target/arm: Set ID_PFR2.SSBS to 1 for "max" 32-bit CPU

 target/arm/cpu.c   |  4 +++
 target/arm/cpu.h   | 15 +++-
 target/arm/cpu64.c |  5 +++
 target/arm/helper.c| 37 
 target/arm/internals.h |  6 
 target/arm/translate-a64.c | 12 +++
 6 files changed, 78 insertions(+), 1 deletion(-)

-- 
2.26.2

Re: [PATCH v4 07/10] hw/i386: declare ACPI mother board resource for MMCONFIG region

2021-02-16 Thread Igor Mammedov

On Mon, 15 Feb 2021 17:04:12 -0800
isaku.yamah...@gmail.com wrote:

> From: Isaku Yamahata 
> 
> Declare PNP0C01 device to reserve MMCONFIG region to conform to the
> spec better and play nice with guest BIOSes/OSes.
> 
> According to PCI Firmware Specification[0], MMCONFIG region must be
> reserved by declaring a motherboard resource. It's optional to reserve
> the region in memory map by Int 15 E820h or EFIGetMemoryMap.
> Guest Linux checks if the MMCFG region is reserved by bios memory map
> or ACPI resource. If it's not reserved, Linux falls back to legacy PCI
> configuration access.
> 
> TDVF [1] [2] doesn't reserve MMCONFIG the region in memory map.
> On the other hand OVMF reserves it in memory map without declaring a
> motherboard resource. With memory map reservation, linux guest uses
> MMCONFIG region. However it doesn't comply to PCI Firmware
> specification.
> 
> [0] PCI Firmware specification Revision 3.2
>   4.1.2 MCFG Table Description table 4-2 NOTE 2
>   If the operating system does not natively comprehend reserving the
>   MMCFG region, The MMCFG region must e reserved by firmware. ...
>   For most systems, the mortheroard resource would appear at the root
>   of the ACPI namespace (under \_SB)...
>   The resource can optionally be returned in Int15 E820h or
>   EFIGetMemoryMap as reserved memory but must always be reported
>   through ACPI as a motherboard resource
> 
> [1] TDX: Intel Trust Domain Extension
> 
> https://software.intel.com/content/www/us/en/develop/articles/intel-trust-domain-extensions.html
> [2] TDX Virtual Firmware
> https://github.com/tianocore/edk2-staging/tree/TDVF
> 
> The change to DSDT is as follows.
> 
> @@ -68,32 +68,51 @@
> 
>  If ((CDW3 != Local0))
>  {
>  CDW1 |= 0x10
>  }
> 
>  CDW3 = Local0
>  }
>  Else
>  {
>  CDW1 |= 0x04
>  }
> 
>  Return (Arg3)
>  }
>  }
> +
> +Device (DRAC)
> +{
> +Name (_HID, "PNP0C01" /* System Board */)  // _HID: Hardware ID
> +Name (RBUF, ResourceTemplate ()
> +{
> +DWordMemory (ResourceProducer, PosDecode, MinFixed, 
> MaxFixed, NonCacheable, ReadWrite,
> +0x, // Granularity
> +0xB000, // Range Minimum
> +0xB000, // Range Maximum
> +0x, // Translation Offset
> +0x1000, // Length
> +,, , AddressRangeMemory, TypeStatic)
> +})
> +Method (_CRS, 0, Serialized)  // _CRS: Current Resource Settings
> +{
> +Return (RBUF) /* \_SB_.DRAC.RBUF */
> +}
> +}
>  }
> 
>  Scope (_SB)
>  {
>  Device (HPET)
>  {
>  Name (_HID, EisaId ("PNP0103") /* HPET System Timer */)  // 
> _HID: Hardware ID
>  Name (_UID, Zero)  // _UID: Unique ID
>  OperationRegion (HPTM, SystemMemory, 0xFED0, 0x0400)
>  Field (HPTM, DWordAcc, Lock, Preserve)
>  {
>  VEND,   32,
>  PRD,32
>  }
> 
>  Method (_STA, 0, NotSerialized)  // _STA: Status
> 
> Signed-off-by: Isaku Yamahata 
> ---
>  hw/i386/acpi-build.c | 55 +++-
>  1 file changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index e3386ae674..30326f69b3 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -1072,6 +1072,55 @@ static void build_q35_pci0_int(Aml *table)
>  aml_append(table, sb_scope);
>  }
>  
> +static Aml *build_q35_dram_controller(AcpiMcfgInfo *mcfg)
> +{
> +Aml *dev;
> +Aml *rbuf;
> +Aml *resource_template;
> +Aml *rbuf_name;
> +Aml *crs;
> +
> +/* DRAM controller */
> +dev = aml_device("DRAC");
> +aml_append(dev, aml_name_decl("_HID", aml_string("PNP0C01")));
> +
> +resource_template = aml_resource_template();
> +if (mcfg->base + mcfg->size - 1 >= (1ULL << 32)) {
> +aml_append(resource_template,
> +   aml_qword_memory(AML_POS_DECODE,
> +AML_MIN_FIXED,
> +AML_MAX_FIXED,
> +AML_NON_CACHEABLE,
> +AML_READ_WRITE,
> +0x,
> +mcfg->base,
> +mcfg->base,
here   ^^^

> +aml_append(resource_template,
> +   aml_dword_memory(AML_POS_DECODE,
> +AML_MIN_FIXED,
> +

Re: [PATCH RESEND] hostmem: Don't report pmem attribute if unsupported

2021-02-16 Thread Eduardo Habkost

On Tue, Jan 26, 2021 at 08:48:25AM +0100, Michal Privoznik wrote:
> When management applications (like Libvirt) want to check whether
> memory-backend-file.pmem is supported they can list object
> properties using 'qom-list-properties'. However, 'pmem' is
> declared always (and thus reported always) and only at runtime
> QEMU errors out if it was built without libpmem (and thus can not
> guarantee write persistence). This is suboptimal since we have
> ability to declare attributes at compile time.
> 
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1915216
> Signed-off-by: Michal Privoznik 
> Reviewed-by: Daniel Henrique Barboza 
> Tested-by: Daniel Henrique Barboza 
> Reviewed-by: Igor Mammedov 

I'm not a fan of making availability of properties conditional
(even if at compile time), but if this helps libvirt I guess it
makes sense.

CCing John, who has been thinking a lot about these questions.

I'm queueing this on machine-next.  Thanks, and sorry for the delay!

> ---
> 
> This is just a resend of a patch I've sent earlier with Reviewed-by and
> Tested-by added:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2021-01/msg04558.html
> 
>  backends/hostmem-file.c | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index 40e1e5b3e3..7e30eb5985 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -123,6 +123,7 @@ static void file_memory_backend_set_align(Object *o, 
> Visitor *v,
>  fb->align = val;
>  }
>  
> +#ifdef CONFIG_LIBPMEM
>  static bool file_memory_backend_get_pmem(Object *o, Error **errp)
>  {
>  return MEMORY_BACKEND_FILE(o)->is_pmem;
> @@ -139,17 +140,9 @@ static void file_memory_backend_set_pmem(Object *o, bool 
> value, Error **errp)
>  return;
>  }
>  
> -#ifndef CONFIG_LIBPMEM
> -if (value) {
> -error_setg(errp, "Lack of libpmem support while setting the 
> 'pmem=on'"
> -   " of %s. We can't ensure data persistence.",
> -   object_get_typename(o));
> -return;
> -}
> -#endif
> -
>  fb->is_pmem = value;
>  }
> +#endif /* CONFIG_LIBPMEM */
>  
>  static void file_backend_unparent(Object *obj)
>  {
> @@ -180,8 +173,10 @@ file_backend_class_init(ObjectClass *oc, void *data)
>  file_memory_backend_get_align,
>  file_memory_backend_set_align,
>  NULL, NULL);
> +#ifdef CONFIG_LIBPMEM
>  object_class_property_add_bool(oc, "pmem",
>  file_memory_backend_get_pmem, file_memory_backend_set_pmem);
> +#endif
>  }
>  
>  static void file_backend_instance_finalize(Object *o)
> -- 
> 2.26.2
> 

-- 
Eduardo

Re: [PATCH v4 04/10] acpi/core: always set SCI_EN when SMM isn't supported

2021-02-16 Thread Igor Mammedov

On Mon, 15 Feb 2021 17:04:09 -0800
isaku.yamah...@gmail.com wrote:

> From: Isaku Yamahata 
> 
> If SMM is not supported, ACPI fixed hardware doesn't support
> legacy-mode. ACPI-only platform. Where SCI_EN in PM1_CNT register is
> always set.
> The bit tells OS legacy mode(SCI_EN cleared) or ACPI mode(SCI_EN set).
> 
> With the next patch (setting fadt.smi_cmd = 0 when smm isn't enabled),
> guest Linux tries to switch to ACPI mode, finds smi_cmd = 0, and then
> fails to initialize acpi subsystem. This patch proactively fixes it.
> 
> This patch changes guest ABI. To keep compatibility, use
> "x-smm-compat-5" introduced by earlier patch.
> If the property is true, disable new behavior.
> 
> ACPI spec 4.8.10.1 PM1 Event Grouping
> PM1 Eanble Registers
> > For ACPI-only platforms (where SCI_EN is always set)  
> 
> Signed-off-by: Isaku Yamahata 

Reviewed-by: Igor Mammedov 

> ---
>  hw/acpi/core.c | 11 ++-
>  hw/acpi/ich9.c |  2 +-
>  hw/acpi/piix4.c|  3 ++-
>  hw/isa/vt82c686.c  |  2 +-
>  include/hw/acpi/acpi.h |  4 +++-
>  5 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/acpi/core.c b/hw/acpi/core.c
> index 7170bff657..1e004d0078 100644
> --- a/hw/acpi/core.c
> +++ b/hw/acpi/core.c
> @@ -579,6 +579,10 @@ void acpi_pm1_cnt_update(ACPIREGS *ar,
>   bool sci_enable, bool sci_disable)
>  {
>  /* ACPI specs 3.0, 4.7.2.5 */
> +if (ar->pm1.cnt.acpi_only) {
> +return;
> +}
> +
>  if (sci_enable) {
>  ar->pm1.cnt.cnt |= ACPI_BITMASK_SCI_ENABLE;
>  } else if (sci_disable) {
> @@ -608,11 +612,13 @@ static const MemoryRegionOps acpi_pm_cnt_ops = {
>  };
>  
>  void acpi_pm1_cnt_init(ACPIREGS *ar, MemoryRegion *parent,
> -   bool disable_s3, bool disable_s4, uint8_t s4_val)
> +   bool disable_s3, bool disable_s4, uint8_t s4_val,
> +   bool acpi_only)
>  {
>  FWCfgState *fw_cfg;
>  
>  ar->pm1.cnt.s4_val = s4_val;
> +ar->pm1.cnt.acpi_only = acpi_only;
>  ar->wakeup.notify = acpi_notify_wakeup;
>  qemu_register_wakeup_notifier(&ar->wakeup);
>  
> @@ -638,6 +644,9 @@ void acpi_pm1_cnt_init(ACPIREGS *ar, MemoryRegion *parent,
>  void acpi_pm1_cnt_reset(ACPIREGS *ar)
>  {
>  ar->pm1.cnt.cnt = 0;
> +if (ar->pm1.cnt.acpi_only) {
> +ar->pm1.cnt.cnt |= ACPI_BITMASK_SCI_ENABLE;
> +}
>  }
>  
>  /* ACPI GPE */
> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
> index 5ff4e01c36..853447cf9d 100644
> --- a/hw/acpi/ich9.c
> +++ b/hw/acpi/ich9.c
> @@ -282,7 +282,7 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
>  acpi_pm_tmr_init(&pm->acpi_regs, ich9_pm_update_sci_fn, &pm->io);
>  acpi_pm1_evt_init(&pm->acpi_regs, ich9_pm_update_sci_fn, &pm->io);
>  acpi_pm1_cnt_init(&pm->acpi_regs, &pm->io, pm->disable_s3, 
> pm->disable_s4,
> -  pm->s4_val);
> +  pm->s4_val, !pm->smm_compat && !smm_enabled);
>  
>  acpi_gpe_init(&pm->acpi_regs, ICH9_PMIO_GPE0_LEN);
>  memory_region_init_io(&pm->io_gpe, OBJECT(lpc_pci), &ich9_gpe_ops, pm,
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 30dd9b2309..1efc0ded9f 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -497,7 +497,8 @@ static void piix4_pm_realize(PCIDevice *dev, Error **errp)
>  
>  acpi_pm_tmr_init(&s->ar, pm_tmr_timer, &s->io);
>  acpi_pm1_evt_init(&s->ar, pm_tmr_timer, &s->io);
> -acpi_pm1_cnt_init(&s->ar, &s->io, s->disable_s3, s->disable_s4, 
> s->s4_val);
> +acpi_pm1_cnt_init(&s->ar, &s->io, s->disable_s3, s->disable_s4, 
> s->s4_val,
> +  !s->smm_compat && !s->smm_enabled);
>  acpi_gpe_init(&s->ar, GPE_LEN);
>  
>  s->powerdown_notifier.notify = piix4_pm_powerdown_req;
> diff --git a/hw/isa/vt82c686.c b/hw/isa/vt82c686.c
> index a6f5a0843d..071b64b497 100644
> --- a/hw/isa/vt82c686.c
> +++ b/hw/isa/vt82c686.c
> @@ -240,7 +240,7 @@ static void vt82c686b_pm_realize(PCIDevice *dev, Error 
> **errp)
>  
>  acpi_pm_tmr_init(&s->ar, pm_tmr_timer, &s->io);
>  acpi_pm1_evt_init(&s->ar, pm_tmr_timer, &s->io);
> -acpi_pm1_cnt_init(&s->ar, &s->io, false, false, 2);
> +acpi_pm1_cnt_init(&s->ar, &s->io, false, false, 2, false);
>  }
>  
>  static Property via_pm_properties[] = {
> diff --git a/include/hw/acpi/acpi.h b/include/hw/acpi/acpi.h
> index 22b0b65bb2..9e8a76f2e2 100644
> --- a/include/hw/acpi/acpi.h
> +++ b/include/hw/acpi/acpi.h
> @@ -128,6 +128,7 @@ struct ACPIPM1CNT {
>  MemoryRegion io;
>  uint16_t cnt;
>  uint8_t s4_val;
> +bool acpi_only;
>  };
>  
>  struct ACPIGPE {
> @@ -163,7 +164,8 @@ void acpi_pm1_evt_init(ACPIREGS *ar, acpi_update_sci_fn 
> update_sci,
>  
>  /* PM1a_CNT: piix and ich9 don't implement PM1b CNT. */
>  void acpi_pm1_cnt_init(ACPIREGS *ar, MemoryRegion *parent,
> -   bool disable_s3, bool disable_s4, uint8_t s4_val);
> +   bool disable_s3, bool dis

Re: [PATCH v4 03/10] ich9, piix4: add properoty, smm-compat, to keep compatibility of SMM

2021-02-16 Thread Igor Mammedov

On Mon, 15 Feb 2021 17:04:08 -0800
isaku.yamah...@gmail.com wrote:

> From: Isaku Yamahata 
> 
> The following patch will introduce incompatible behavior of SMM.
> Introduce a property to keep the old behavior for compatibility.
> To enable smm compat, use "-global ICH9-LPC.smm-compat=on" or
> "-global PIIX4.smm-compat=on"
> 
> Suggested-by: Igor Mammedov 
> Signed-off-by: Isaku Yamahata 

looks good to me, but doesn't apply to master anymore, so needs to be rebased
> ---
>  hw/acpi/piix4.c| 2 ++
>  hw/core/machine.c  | 5 -
>  hw/isa/lpc_ich9.c  | 1 +
>  include/hw/acpi/ich9.h | 1 +
>  4 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
> index 669be5bbf6..30dd9b2309 100644
> --- a/hw/acpi/piix4.c
> +++ b/hw/acpi/piix4.c
> @@ -74,6 +74,7 @@ struct PIIX4PMState {
>  qemu_irq irq;
>  qemu_irq smi_irq;
>  int smm_enabled;
> +bool smm_compat;
>  Notifier machine_ready;
>  Notifier powerdown_notifier;
>  
> @@ -642,6 +643,7 @@ static Property piix4_pm_properties[] = {
>   use_acpi_root_pci_hotplug, true),
>  DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
>   acpi_memory_hotplug.is_enabled, true),
> +DEFINE_PROP_BOOL("smm-compat", PIIX4PMState, smm_compat, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index de3b8f1b31..870c9201df 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -33,7 +33,10 @@
>  #include "migration/global_state.h"
>  #include "migration/vmstate.h"
>  
> -GlobalProperty hw_compat_5_2[] = {};
> +GlobalProperty hw_compat_5_2[] = {
> +{ "ICH9-LPC", "smm-compat", "on"},
> +{ "PIIX4_PM", "smm-compat", "on"},
> +};
I'd put this hunk into the 4/10, where behavior changes

and add a note to commit this message that property will be used by the next 
patch

>  const size_t hw_compat_5_2_len = G_N_ELEMENTS(hw_compat_5_2);
>  
>  GlobalProperty hw_compat_5_1[] = {
> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
> index d3145bf014..3963b73520 100644
> --- a/hw/isa/lpc_ich9.c
> +++ b/hw/isa/lpc_ich9.c
> @@ -775,6 +775,7 @@ static const VMStateDescription vmstate_ich9_lpc = {
>  
>  static Property ich9_lpc_properties[] = {
>  DEFINE_PROP_BOOL("noreboot", ICH9LPCState, pin_strap.spkr_hi, true),
> +DEFINE_PROP_BOOL("smm-compat", ICH9LPCState, pm.smm_compat, false),
>  DEFINE_PROP_BIT64("x-smi-broadcast", ICH9LPCState, smi_host_features,
>ICH9_LPC_SMI_F_BROADCAST_BIT, true),
>  DEFINE_PROP_BIT64("x-smi-cpu-hotplug", ICH9LPCState, smi_host_features,
> diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
> index 54571c77e0..df519e40b5 100644
> --- a/include/hw/acpi/ich9.h
> +++ b/include/hw/acpi/ich9.h
> @@ -59,6 +59,7 @@ typedef struct ICH9LPCPMRegs {
>  uint8_t disable_s4;
>  uint8_t s4_val;
>  uint8_t smm_enabled;
> +bool smm_compat;
>  bool enable_tco;
>  TCOIORegs tco_regs;
>  } ICH9LPCPMRegs;

Re: [PATCH] acpi: Remove duplicated code handling OEM ID and OEM table ID fields

2021-02-16 Thread Igor Mammedov

On Mon, 15 Feb 2021 20:26:10 +0200
Marian Postevca  wrote:

> Igor Mammedov  writes:
> 
> > hmm, looks like adding instead of removing
> >  
> Do you mean that the commit message does not describe the change
> correctly, or that my refactoring is too extreme?
I've meant that diff-stat shows that patch adds more lines than it removes
so I'm not sure it removes duplication.
Maybe rephrase commit message a little.


> If it is the latter, I think I tried to simplify things, by creating
> macros to be used in multiple places where this structure is created.
> And passing the structure around instead of two parameters seemed simpler.
> 
> > have you considered, putting this field into X86MachineState?
> > (that way you will be able to handle both PC and microvm in one place,
> > without duplication io init/property setters)
> >  
> I did not, will try this approach.
this would be pure deduplication, and simpler if you do it as separate patch
and then re-factoring to static fields on top of that.
(i.e. don't mix re-factoring and deduplication)

Re: [PATCH v3 07/10] hw/i386: declare ACPI mother board resource for MMCONFIG region

2021-02-16 Thread Igor Mammedov

On Tue, 16 Feb 2021 10:13:25 -0800
Isaku Yamahata  wrote:

> On Tue, Feb 16, 2021 at 08:45:48AM -0500,
> "Michael S. Tsirkin"  wrote:
> 
> > On Tue, Feb 16, 2021 at 01:43:01AM -0800, Isaku Yamahata wrote:  
> > > On Mon, Feb 15, 2021 at 01:48:32PM +0100,
> > > Igor Mammedov  wrote:
> > >   
> > > > On Fri, 12 Feb 2021 12:51:57 -0800
> > > > Isaku Yamahata  wrote:
> > > >   
> > > > > On Fri, Feb 12, 2021 at 04:40:38PM +0100,
> > > > > Igor Mammedov  wrote:
> > > > >   
> > > > > > On Wed, 10 Feb 2021 22:46:43 -0800
> > > > > > Isaku Yamahata  wrote:
> > > > > > 
> > > > > > > +Aml *dev;
> > > > > > > +Aml *rbuf;
> > > > > > > +Aml *resource_template;
> > > > > > > +Aml *rbuf_name;
> > > > > > > +Aml *crs;
> > > > > > > +
> > > > > > > +if (!acpi_get_mcfg(&mcfg)) {
> > > > > > > +return NULL;
> > > > > > > +}
> > > > > > > +
> > > > > > > +/* DRAM controller */
> > > > > > > +dev = aml_device("DRAC");
> > > > > > > +aml_append(dev, aml_name_decl("_HID", 
> > > > > > > aml_string("PNP0C01")));
> > > > > > > +
> > > > > > > +resource_template = aml_resource_template();
> > > > > > > +aml_append(resource_template,
> > > > > > > +   aml_qword_memory(AML_POS_DECODE,
> > > > > > > +AML_MIN_FIXED,
> > > > > > > +AML_MAX_FIXED,
> > > > > > > +AML_NON_CACHEABLE,
> > > > > > > +AML_READ_WRITE,
> > > > > > > +0x,
> > > > > > > +mcfg.base,
> > > > > > > +mcfg.base + mcfg.size - 1,
> > > > > > s/mcfg.base + mcfg.size - 1/mcfg.base/
> > > > > 
> > > > > AddressMaximum needs to be the highest address of the region.
> > > > > Not base address. By disassemble/assembl it, iasl complains as 
> > > > > follows.
> > > > > Also there are similar examples in acpi-build.c.  
> > > > I mostly clean up all places to use the same base address for min/max,
> > > > but probably something were left behind.
> > > > 
> > > > spec says:
> > > > 
> > > > acpi 6.3: 19.6.110 QWordMemory
> > > > 
> > > > AddressMaximum evaluates to a 64-bit integer that specifies the highest 
> > > > possible base address of the
> > > > Memory range. The value must have ‘0’ in all bits where the 
> > > > corresponding bit in AddressGranularity is
> > > > ‘1’. For bridge devices which translate addresses, this is the address 
> > > > on the secondary bus. The 64-bit
> > > > field DescriptorName ._MAX is automatically created to refer to this 
> > > > portion of the resource descriptor.  
> > > 
> > > Ok, Linux guest is happy with min=max.
> > > I conlude that it's iasl issue.
> > > 
> > > Thanks,  
> > 
> > OK but what about all the other places in the code that seem to use this
> > field differently?  
> 
> Igor, what do you think?
> The followings are the summary of the situation.
> 
> - acpi 6.4: 19.6.110 QWordMemory
>   _MAX: maximum of base address.
> 
> - acpi 6.4: 6.4.3.5 Address Space Resource Descriptors
>   table 6.44
>   If _LEN > 0 and _MIF = 1 and _MAF = 1, then _LEN = _MAX - _MIN + 1
>   This doesn't match with the above description
I'd say it 19.6.110 doesn't describes whole possibilities of resource,
but only subset in its current phrasing.

> - iasl
>   If _LEN > 0 and _MIF = 1 and _MAF = 1,
>   it emits warning on _LEN != _MAX - _MIN + 1
so IASL is right and follows spec.

Anyways, My apologies for misleading and your patch is correct.

I need to add similar checks to QEMU code plus comments to avoid confusion
in future.

> - Linux Guest MMCONFIG check 
>   check_mcfg_resoure() uses only _MIN. doesn't use _MAX.
>   _MAX value doesn't matter
> 
> - Linux acpi code
>   acpi_decode_space() uses _MAX to calulate range [start, end], not use _LEN.
>   i.e. It assumes _LEN = _MAX - _MIN + 1 if _LEN > 0, _MIF = 1, _MAF = 1
> 
> - qemu code sets for [qd]word_memory_resource (except this patch)
>   If _LEN > 0 and _MIF = 1 and _MAF = 1, then set _LEN = _MAX - _MIN + 1

Re: [PATCH] hw/display/tcx: Drop unnecessary code for handling BGR format outputs

2021-02-16 Thread Mark Cave-Ayland


On 16/02/2021 10:11, Peter Maydell wrote:


Would you like this to go via a qemu-sparc PR or is it easier to go as part of a
group alongside your other display surface patches via target-arm.next?


I'm happy either way -- if you don't happen to have anything else
queued up for sparc I can just put it in with the arm queue.


Nothing at the moment. I'm not sure whether the ESP patches will go via qemu-sparc or 
Laurent's m68k queue, and the ESP series is probably large enough by itself already. 
So if you've got a pending PR feel free to add it in, otherwise I'll have a look 
post-ESP :)



ATB,

Mark.

Re: [PATCH v2 11/42] esp: apply transfer length adjustment when STC is zero at TC load time

2021-02-16 Thread Mark Cave-Ayland


On 16/02/2021 07:33, Philippe Mathieu-Daudé wrote:


On 2/9/21 8:29 PM, Mark Cave-Ayland wrote:

Perform the length adjustment whereby a value of 0 in the STC represents
a transfer length of 0x1 at the point where the TC is loaded at the


0x1 -> 64 KiB?


I'd prefer to keep these as they are, since TC is described in the documentation as 
16-bit counter: it is the number of bits that is relevant here as opposed to the 
absolute size.


There is a slight bit of trickery here in that the ESP emulation already handles a 
later variant of the chip which has a 24-bit counter which is why we can get away 
with setting its value to 0x1 - guests that don't check for this will simply 
ignore the register containing the MSB.



start of a DMA command rather than just when a TI (Transfer Information)
command is executed. This better matches the description as given in the
datasheet.

Signed-off-by: Mark Cave-Ayland 
---
  hw/scsi/esp.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index a1acc2c9bd..02b7876394 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -562,9 +562,6 @@ static void handle_ti(ESPState *s)
  }
  
  dmalen = esp_get_tc(s);

-if (dmalen == 0) {
-dmalen = 0x1;
-}
  s->dma_counter = dmalen;
  
  if (s->do_cmd) {

@@ -699,7 +696,11 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t 
val)
  if (val & CMD_DMA) {
  s->dma = 1;
  /* Reload DMA counter.  */
-esp_set_tc(s, esp_get_stc(s));
+if (esp_get_stc(s) == 0) {
+esp_set_tc(s, 0x1);


0x1 -> 64 * KiB


And same here too.


Reviewed-by: Philippe Mathieu-Daudé 



ATB,

Mark.

Re: [PATCH v2 30/42] esp: add 4 byte PDMA read and write transfers

2021-02-16 Thread Mark Cave-Ayland


On 16/02/2021 07:30, Philippe Mathieu-Daudé wrote:


Are you planning to review any more of this series? I'm keen to put out
a (hopefully final) v3 soon, but I'll hold off for little while if you
want more time to look over the remaining patches.


I talked about this series with Laurent on Sunday, asking him for
review help ;) I don't remember if there is any big comment to
address in patches 1-14. If not I can review the missing ones
there today and you could send directly a pull request for this
first set, then send the rest as v3. Does that help?
For the rest I doubt having time to focus before Friday.


I'd prefer to merge the entire series, since there is more than one migration 
compatibility break for the q800 machine and I think it would be almost impossible to 
ensure that all test images didn't regress at some point until all patches have been 
applied.


I should probably add that I expanded the test suite to booting 10 images from v1 to 
v2 across a mix of SPARC, m68k, x86_64 and hppa including both commercial and free OSs.



ATB,

Mark.

Re: [PATCH v8 00/35] Hexagon patch series

2021-02-16 Thread Peter Maydell

On Tue, 16 Feb 2021 at 20:59, Taylor Simpson  wrote:
> > -Original Message-
> > From: Richard Henderson 
> > I've completed review of this round, and there are some nits.  But they're
> > minor enough that I wouldn't even mind them being addressed via the
> > normal
> > development process.  I.e. I'd be keen to not look through that diffstat 
> > again.
> >  ;-)

> > Any objections from anyone else on that?
> >
> > I don't suppose you and Peter Maydell signed gpg keys when we all met in
> > Lyon?
>
> Nope.  Peter, please advise

We effectively are operating a TOFU policy for gpg keys,
ie put them on a public keyserver, to the extent that you can arrange
to get them signed by other people who also have gpg keys please do,
and at some point we may be able to meet up and get a shorter
trust path.

For this patchset, I would prefer it if Richard collected the patches
and sent me a pullrequest. First pullrequests from new submaintainers
are higher-effort for me, because I need to look them through carefully
to be sure that they're the right format and so on; so I'd rather
not do that with an enormous patchset. It's easier for me if that
work is postponed and done with something smaller later.

thanks
-- PMM

Re: [PATCH 02/10] qemu-options: update to show preferred boolean syntax for -chardev

2021-02-16 Thread Peter Maydell

On Tue, 16 Feb 2021 at 19:10, Daniel P. Berrangé  wrote:
>
> The preferred syntax is to use "foo=on|off", rather than a bare
> "foo" or "nofoo".
>
> Signed-off-by: Daniel P. Berrangé 
> -"-chardev 
> socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds]\n"
> -" 
> [,server][,nowait][,telnet][,websocket][,reconnect=seconds][,mux=on|off]\n"
> +"-chardev 
> socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,delay=on|off][,reconnect=seconds]\n"

Watch out, 'nodelay' is a special case, where the option name itself
starts with 'no':

{ 'struct': 'ChardevSocket',
  'data': { 'addr': 'SocketAddressLegacy',
'*tls-creds': 'str',
'*tls-authz'  : 'str',
'*server': 'bool',
'*wait': 'bool',
'*nodelay': 'bool',
'*telnet': 'bool',
'*tn3270': 'bool',
'*websocket': 'bool',
'*reconnect': 'int' },
  'base': 'ChardevCommon' }

(because it's setting the TCP_NODELAY option).

This probably applies elsewhere, eg I suspect the 'nodelay' in
patch 1 also should remain.

-- PMM

[Bug 1906193] Re: riscv32 user mode emulation: fork return values broken

2021-02-16 Thread Alistair Francis

A fix has been merged into master.

** Changed in: qemu
   Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1906193

Title:
  riscv32 user mode emulation: fork return values broken

Status in QEMU:
  Fix Committed

Bug description:
  When running in a chroot with riscv32 (on x86_64; qemu git master as
  of today):

  The following short program forks; the child immediately returns with
  exit(42). The parent checks for the return value - and obtains 40!

  gcc-10.2

  ===
  #include 
  #include 
  #include 
  #include 

  main(c, v)
   int c;
   char **v;
  {
pid_t pid, p;
int s, i, n;

s = 0;
pid = fork();
if (pid == 0)
  exit(42);

/* wait for the process */
p = wait(&s);
if (p != pid)
  exit (255);

if (WIFEXITED(s))
{
   int r=WEXITSTATUS(s);
   if (r!=42) {
printf("child wants to return %i (0x%X), parent received %i (0x%X), 
difference %i\n",42,42,r,r,r-42);
   }
}
  }
  ===

  (riscv-ilp32 chroot) farino /tmp # ./wait-test-short 
  child wants to return 42 (0x2A), parent received 40 (0x28), difference -2

  ===
  (riscv-ilp32 chroot) farino /tmp # gcc --version
  gcc (Gentoo 10.2.0-r1 p2) 10.2.0
  Copyright (C) 2020 Free Software Foundation, Inc.
  Dies ist freie Software; die Kopierbedingungen stehen in den Quellen. Es
  gibt KEINE Garantie; auch nicht für MARKTGÄNGIGKEIT oder FÜR SPEZIELLE ZWECKE.

  (riscv-ilp32 chroot) farino /tmp # ld --version
  GNU ld (Gentoo 2.34 p6) 2.34.0
  Copyright (C) 2020 Free Software Foundation, Inc.
  This program is free software; you may redistribute it under the terms of
  the GNU General Public License version 3 or (at your option) a later version.
  This program has absolutely no warranty.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1906193/+subscriptions

RE: [PATCH v8 00/35] Hexagon patch series

2021-02-16 Thread Taylor Simpson



> -Original Message-
> From: Richard Henderson 
> Sent: Sunday, February 14, 2021 7:23 PM
> To: Taylor Simpson ; qemu-devel@nongnu.org
> Cc: phi...@redhat.com; alex.ben...@linaro.org; laur...@vivier.eu;
> a...@rev.ng; Brian Cain ; Peter Maydell
> 
> Subject: Re: [PATCH v8 00/35] Hexagon patch series
>
> On 2/7/21 9:45 PM, Taylor Simpson wrote:
> > This series adds support for the Hexagon processor with Linux user support
> >
> > See patch 02 Hexagon README for detailed information.
> >
> > This series assumes int128_or() is implemented.
> > https://lists.nongnu.org/archive/html/qemu-devel/2020-10/msg06004.html
> >

What's the status of this patch?  Could you go ahead and merge it ahead of the 
rest of the series that it's in?

> >
> > Once the series is applied, the Hexagon port will pass "make check-tcg".
> > The series also includes Hexagon-specific tests in tcg/tests/hexagon.
> >
> > The final patch in the series add docker support.  Thanks to Alessandro
> > Di Federico  and Brian Cain  for making
> this
> > happen.  The default container (debian-hexagon-cross) uses a toolchain
> built
> > by rev.ng.  Alternatively, there is a container that will build the 
> > toolchain
> > locally (debian-hexagon-cross-build-local).
>
> Right.  This is in really good shape.
>
> I've completed review of this round, and there are some nits.  But they're
> minor enough that I wouldn't even mind them being addressed via the
> normal
> development process.  I.e. I'd be keen to not look through that diffstat 
> again.
>  ;-)

Thanks Richard!!  We're having unprecedented winter weather in Austin, so I'll 
address your latest feedback once our datacenter recovers from a power outage.

>
> Any objections from anyone else on that?
>
> I don't suppose you and Peter Maydell signed gpg keys when we all met in
> Lyon?

Nope.  Peter, please advise

Re: [RFC v19 13/15] i386: slit svm_helper into softmmu and stub-only user

2021-02-16 Thread Claudio Fontana

Hello Eric,

On 2/16/21 8:34 PM, Eric Blake wrote:
> On 2/16/21 4:46 AM, Claudio Fontana wrote:
> 
> In the subject, s/slit/split/

thanks, fixed.

> 
>> XXX Should we assert that cpu hidden flag SVME is not set,
>> when attempting to generate VMRUN and related instructions
>> in CONFIG_USER_ONLY?
>>
>> XXX Similarily, should we check for CONFIG_USER_ONLY when
>> attempting to cpu_load_efer?
> 
> If you don't fix the comment, s/Similarily/Similarly/

Thanks, will probably replace the whole comment,

I was just trying to make noise with this one to attract attention :-)

Thanks,

Claudio

> 
>>
>> Signed-off-by: Claudio Fontana 
>> Cc: Paolo Bonzini 
>> ---
> 
>

Re: [RFC PATCH v4 3/7] ebpf: Added eBPF RSS program.

2021-02-16 Thread Toke Høiland-Jørgensen

> From: Andrew 
> 
> RSS program and Makefile to build it.
> The bpftool used to generate '.h' file.
> The data in that file may be loaded by libbpf.
> EBPF compilation is not required for building qemu.
> You can use Makefile if you need to regenerate rss.bpf.skeleton.h.
> 
> Signed-off-by: Yuri Benditovich 
> Signed-off-by: Andrew Melnychenko 

A few comments on the BPF implementation:

> ---
>  tools/ebpf/Makefile.ebpf |  33 +++
>  tools/ebpf/rss.bpf.c | 505 +++
>  2 files changed, 538 insertions(+)
>  create mode 100755 tools/ebpf/Makefile.ebpf
>  create mode 100644 tools/ebpf/rss.bpf.c
> 
> diff --git a/tools/ebpf/Makefile.ebpf b/tools/ebpf/Makefile.ebpf
> new file mode 100755
> index 00..d32d1680b8
> --- /dev/null
> +++ b/tools/ebpf/Makefile.ebpf
> @@ -0,0 +1,33 @@
> +OBJS = rss.bpf.o
> +
> +LLC ?= llc
> +CLANG ?= clang
> +INC_FLAGS = `$(CLANG) -print-file-name=include`
> +EXTRA_CFLAGS ?= -O2 -emit-llvm -fno-stack-protector
> +
> +ifdef linuxhdrs
> +LINUXINCLUDE =  -I $(linuxhdrs)/arch/x86/include/uapi \
> +-I $(linuxhdrs)/arch/x86/include/generated/uapi \
> +-I $(linuxhdrs)/arch/x86/include/generated \
> +-I $(linuxhdrs)/include/generated/uapi \
> +-I $(linuxhdrs)/include/uapi \
> +-I $(linuxhdrs)/include \
> +-I $(linuxhdrs)/tools/lib
> +INC_FLAGS += -nostdinc -isystem
> +endif

It should be possible to set things up so you don't need a full set of
kernel headers to build stuff. What we usually do for BPF projects is to
just include the headers we need in a separate directory, clearly marked
as originating from the kernel tree. That way you don't incur a
dependency on the full kernel-headers, and users won't run into issues
where their kernel headers are too old. See an example here:

https://github.com/xdp-project/bpf-examples/tree/master/headers

> +
> +all: $(OBJS)
> +
> +.PHONY: clean
> +
> +clean:
> +   rm -f $(OBJS)
> +
> +$(OBJS):  %.o:%.c
> +   $(CLANG) $(INC_FLAGS) \
> +-D__KERNEL__ -D__ASM_SYSREG_H \
> +-I../include $(LINUXINCLUDE) \
> +$(EXTRA_CFLAGS) -c $< -o -| $(LLC) -march=bpf -filetype=obj 
> -o 
> $@
> +   bpftool gen skeleton rss.bpf.o > rss.bpf.skeleton.h
> +   cp rss.bpf.skeleton.h ../../ebpf/
> +
> diff --git a/tools/ebpf/rss.bpf.c b/tools/ebpf/rss.bpf.c
> new file mode 100644
> index 00..eb377247fc
> --- /dev/null
> +++ b/tools/ebpf/rss.bpf.c
> @@ -0,0 +1,505 @@
> +/*
> + * eBPF RSS program
> + *
> + * Developed by Daynix Computing LTD (http://www.daynix.com)
> + *
> + * Authors:
> + *  Andrew Melnychenko 
> + *  Yuri Benditovich 
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + *
> + * Prepare:
> + * Requires llvm, clang, bpftool, linux kernel tree
> + *
> + * Build rss.bpf.skeleton.h:
> + * make -f Makefile.ebpf clean all
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#define INDIRECTION_TABLE_SIZE 128
> +#define HASH_CALCULATION_BUFFER_SIZE 36
> +
> +struct rss_config_t {
> +__u8 redirect;
> +__u8 populate_hash;

two-byte hole here...

> +__u32 hash_types;
> +__u16 indirections_len;
> +__u16 default_queue;
> +};
> +
> +struct toeplitz_key_data_t {
> +__u32 leftmost_32_bits;
> +__u8 next_byte[HASH_CALCULATION_BUFFER_SIZE];
> +};
> +
> +struct packet_hash_info_t {
> +__u8 is_ipv4;
> +__u8 is_ipv6;
> +__u8 is_udp;
> +__u8 is_tcp;
> +__u8 is_ipv6_ext_src;
> +__u8 is_ipv6_ext_dst;
> +
> +__u16 src_port;
> +__u16 dst_port;

...and there's going to be a hole here as well I think.

> +union {
> +struct {
> +__be32 in_src;
> +__be32 in_dst;
> +};
> +
> +struct {
> +struct in6_addr in6_src;
> +struct in6_addr in6_dst;
> +struct in6_addr in6_ext_src;
> +struct in6_addr in6_ext_dst;
> +};
> +};
> +};
> +
> +struct bpf_map_def SEC("maps")
> +tap_rss_map_configurations = {
> +.type= BPF_MAP_TYPE_ARRAY,
> +.key_size= sizeof(__u32),
> +.value_size  = sizeof(struct rss_config_t),
> +.max_entries = 1,
> +};
> +
> +struct bpf_map_def SEC("maps")
> +tap_rss_map_toeplitz_key = {
> +.type= BPF_MAP_TYPE_ARRAY,
> +.key_size= sizeof(__u32),
> +.value_size  = sizeof(struct toeplitz_key_data_t),
> +.max_entries = 1,
> +};

Which version of LLVM and libbpf are you targeting?

Libbpf 0.0.3 (which is almost two years old now) added support for
relocations of global data section, so instead of defining these
one-element maps you could just define the config structs as global
variables and use them like you would from regular C

Re: [RFC v19 13/15] i386: slit svm_helper into softmmu and stub-only user

2021-02-16 Thread Eric Blake

On 2/16/21 4:46 AM, Claudio Fontana wrote:

In the subject, s/slit/split/

> XXX Should we assert that cpu hidden flag SVME is not set,
> when attempting to generate VMRUN and related instructions
> in CONFIG_USER_ONLY?
> 
> XXX Similarily, should we check for CONFIG_USER_ONLY when
> attempting to cpu_load_efer?

If you don't fix the comment, s/Similarily/Similarly/

> 
> Signed-off-by: Claudio Fontana 
> Cc: Paolo Bonzini 
> ---


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH 00/33] migration: capture error reports into Error object

2021-02-16 Thread Dr. David Alan Gilbert

* Daniel P. Berrangé (berra...@redhat.com) wrote:
> On Mon, Feb 15, 2021 at 07:01:28PM +, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berra...@redhat.com) wrote:
> > > On Mon, Feb 15, 2021 at 06:38:05PM +, Dr. David Alan Gilbert wrote:
> > > > One thing to check, and I *think* you're OK, but we have one place where
> > > > we actually check the error number:
> > > > 
> > > > migration.c:
> > > > 3414 static MigThrError migration_detect_error(MigrationState *s)
> > > > ...
> > > > 3426 /* Try to detect any file errors */
> > > > 3427 ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> > > > 3428 if (!ret) {
> > > > 3429 /* Everything is fine */
> > > > 3430 assert(!local_error);
> > > > 3431 return MIG_THR_ERR_NONE;
> > > > 3432 }
> > > > 3433 
> > > > 3434 if (local_error) {
> > > > 3435 migrate_set_error(s, local_error);
> > > > 3436 error_free(local_error);
> > > > 3437 }
> > > > 3438 
> > > > 3439 if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> > > > 3440 /*
> > > > 3441  * For postcopy, we allow the network to be down for a
> > > > 3442  * while. After that, it can be continued by a
> > > > 3443  * recovery phase.
> > > > 3444  */
> > > > 3445 return postcopy_pause(s);
> > > > 3446 } else {
> > > > 
> > > > This is to go into postcopy pause if the network connection broke (but
> > > > not if for example a device moaned about being in an invalid state)
> > > > 
> > > > If I read this correctly, file errors are still being preserved - is
> > > > that correct?
> > > 
> > > Yes, in places where QemuFile is reporting an actual I/O error I've
> > > tried to preserve that. Only removed setting of fake I/O errors. So
> > > if anything, we ought to get more accurate at detecting the recoverable
> > > scenarios once we fully cleanup errors.
> > 
> > OK, good.
> 
> One scenario to possibly check though is that in a few places we used
> error_report_err() but didn't immediately return an error code back to
> the caller, instead carrying on doing other calls. It is possible that
> we thus reported an error about bad data, and then later hit the EIO
> check for QemuFile.

That's generally OK; it gets pretty painful to do the qemu file checks
after every read.

Dave

> 
> Regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH v2] linux-user/mmap: Return EFAULT/EINVAL for invalid addresses

2021-02-16 Thread Laurent Vivier

Le 16/02/2021 à 20:01, Richard Purdie a écrit :
> When using qemu-i386 to build qemux86 webkitgtk on musl, it sits in an
> infinite loop of mremap calls of ever decreasing/increasing addresses.
> 
> I suspect something in the musl memory allocation code loops
> indefinitely if it only sees ENOMEM and only exits when it hits other
> errors such as EFAULT or EINVAL.
> 
> According to the docs, trying to mremap outside the address space
> can/should return EFAULT and changing this allows the build to succeed.
> 
> A better return value for the other cases of invalid addresses is
> EINVAL rather than ENOMEM so adjust the other part of the test to this.
> 
> Signed-off-by: Richard Purdie  
> Index: qemu-5.2.0/linux-user/mmap.c
> ===
> --- qemu-5.2.0.orig/linux-user/mmap.c
> +++ qemu-5.2.0/linux-user/mmap.c
> @@ -722,12 +722,14 @@ abi_long target_mremap(abi_ulong old_add
>  int prot;
>  void *host_addr;
>  
> -if (!guest_range_valid(old_addr, old_size) ||
> -((flags & MREMAP_FIXED) &&
> - !guest_range_valid(new_addr, new_size)) ||
> -((flags & MREMAP_MAYMOVE) == 0 &&
> - !guest_range_valid(old_addr, new_size))) {
> -errno = ENOMEM;
> +if (!guest_range_valid(old_addr, old_size)) {
> +errno = EFAULT;
> +return -1;
> +}
> +
> +if (((flags & MREMAP_FIXED) && !guest_range_valid(new_addr, new_size)) ||
> +((flags & MREMAP_MAYMOVE) == 0 && !guest_range_valid(old_addr, 
> new_size))) {
> +errno = EINVAL;
>  return -1;
>  }
>  
> 
> 

Reviewed-by: Laurent Vivier

[PATCH 08/10] docs: update to show preferred boolean syntax for -vnc

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"foo" or "nofoo".

The on|off syntax has been supported since -vnc switched to use
QemuOpts in commit 4db14629c38611061fc19ec6927405923de84f08

Signed-off-by: Daniel P. Berrangé 
---
 docs/system/vnc-security.rst | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/system/vnc-security.rst b/docs/system/vnc-security.rst
index ebca656d87..830f6acc73 100644
--- a/docs/system/vnc-security.rst
+++ b/docs/system/vnc-security.rst
@@ -44,7 +44,7 @@ the password all clients will be rejected.
 
 .. parsed-literal::
 
-   |qemu_system| [...OPTIONS...] -vnc :1,password -monitor stdio
+   |qemu_system| [...OPTIONS...] -vnc :1,password=on -monitor stdio
(qemu) change vnc password
Password: 
(qemu)
@@ -104,7 +104,7 @@ authentication to provide two layers of authentication for 
clients.
 
|qemu_system| [...OPTIONS...] \
  -object 
tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \
- -vnc :1,tls-creds=tls0,password -monitor stdio
+ -vnc :1,tls-creds=tls0,password=on -monitor stdio
(qemu) change vnc password
Password: 
(qemu)
@@ -128,7 +128,7 @@ can be launched with:
 
 .. parsed-literal::
 
-   |qemu_system| [...OPTIONS...] -vnc :1,sasl -monitor stdio
+   |qemu_system| [...OPTIONS...] -vnc :1,sasl=on -monitor stdio
 
 .. _vnc_005fsec_005fcertificate_005fsasl:
 
@@ -146,7 +146,7 @@ x509 options:
 
|qemu_system| [...OPTIONS...] \
  -object 
tls-creds-x509,id=tls0,dir=/etc/pki/qemu,endpoint=server,verify-peer=on \
- -vnc :1,tls-creds=tls0,sasl -monitor stdio
+ -vnc :1,tls-creds=tls0,sasl=on -monitor stdio
 
 .. _vnc_005fsetup_005fsasl:
 
-- 
2.29.2

[PATCH 07/10] docs: update to show preferred boolean syntax for -chardev

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"foo" or "nofoo".

Signed-off-by: Daniel P. Berrangé 
---
 docs/COLO-FT.txt   |  8 
 docs/ccid.txt  |  6 --
 docs/colo-proxy.txt| 16 
 docs/devel/writing-qmp-commands.txt|  2 +-
 docs/interop/live-block-operations.rst |  4 ++--
 docs/interop/qmp-intro.txt |  4 ++--
 docs/system/cpu-hotplug.rst|  2 +-
 docs/system/s390x/3270.rst |  2 +-
 docs/system/target-avr.rst |  2 +-
 docs/tools/qemu-storage-daemon.rst |  4 ++--
 scripts/qmp/qemu-ga-client |  2 +-
 tests/test-char.c  |  4 ++--
 12 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index bc5fb2a1bb..8874690e83 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -162,11 +162,11 @@ instance.
-device piix3-usb-uhci -device usb-tablet -name primary \
-netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
-device rtl8139,id=e0,netdev=hn0 \
-   -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server,nowait \
-   -chardev socket,id=compare1,host=0.0.0.0,port=9004,server,wait \
-   -chardev socket,id=compare0,host=127.0.0.1,port=9001,server,nowait \
+   -chardev socket,id=mirror0,host=0.0.0.0,port=9003,server=on,wait=off \
+   -chardev socket,id=compare1,host=0.0.0.0,port=9004,server=on,wait=on \
+   -chardev socket,id=compare0,host=127.0.0.1,port=9001,server=on,wait=off \
-chardev socket,id=compare0-0,host=127.0.0.1,port=9001 \
-   -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server,nowait \
+   -chardev socket,id=compare_out,host=127.0.0.1,port=9005,server=on,wait=off \
-chardev socket,id=compare_out0,host=127.0.0.1,port=9005 \
-object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \
-object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out \
diff --git a/docs/ccid.txt b/docs/ccid.txt
index c7fda6d07d..c97fbd2de0 100644
--- a/docs/ccid.txt
+++ b/docs/ccid.txt
@@ -109,7 +109,8 @@ NSS.  Registration can be done from Firefox or the command 
line:
 
 on the host specify the ccid-card-passthru device with a suitable chardev:
 
-qemu -chardev socket,server,host=0.0.0.0,port=2001,id=ccid,nowait -usb 
-device usb-ccid -device ccid-card-passthru,chardev=ccid
+qemu -chardev socket,server=on,host=0.0.0.0,port=2001,id=ccid,wait=off \
+ -usb -device usb-ccid -device ccid-card-passthru,chardev=ccid
 
 on the client run vscclient, built when you built QEMU:
 
@@ -125,7 +126,8 @@ Follow instructions as per #4, except run QEMU and 
vscclient as follows:
 Run qemu as per #5, and run vscclient from the "fake-smartcard"
 directory as follows:
 
-qemu -chardev socket,server,host=0.0.0.0,port=2001,id=ccid,nowait -usb 
-device usb-ccid -device ccid-card-passthru,chardev=ccid
+qemu -chardev socket,server=on,host=0.0.0.0,port=2001,id=ccid,wait=off \
+ -usb -device usb-ccid -device ccid-card-passthru,chardev=ccid
 vscclient -e "db=\"sql:$PWD\" use_hw=no 
soft=(,Test,CAC,,id-cert,signing-cert,encryption-cert)"  2001
 
 
diff --git a/docs/colo-proxy.txt b/docs/colo-proxy.txt
index fa1cef0278..1fc38aed1b 100644
--- a/docs/colo-proxy.txt
+++ b/docs/colo-proxy.txt
@@ -164,11 +164,11 @@ clearly describe the usage.
 Primary(ip:3.3.3.3):
 -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
 -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
--chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
--chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
--chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server=on,wait=off
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server=on,wait=off
+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server=on,wait=off
 -chardev socket,id=compare0-0,host=3.3.3.3,port=9001
--chardev socket,id=compare_out,host=3.3.3.3,port=9005,server,nowait
+-chardev socket,id=compare_out,host=3.3.3.3,port=9005,server=on,wait=off
 -chardev socket,id=compare_out0,host=3.3.3.3,port=9005
 -object iothread,id=iothread1
 -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0
@@ -190,11 +190,11 @@ If you want to use virtio-net-pci or other driver with 
vnet_header:
 Primary(ip:3.3.3.3):
 -netdev tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
 -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66
--chardev socket,id=mirror0,host=3.3.3.3,port=9003,server,nowait
--chardev socket,id=compare1,host=3.3.3.3,port=9004,server,nowait
--chardev socket,id=compare0,host=3.3.3.3,port=9001,server,nowait
+-chardev socket,id=mirror0,host=3.3.3.3,port=9003,server=on,wait=off
+-chardev socket,id=compare1,host=3.3.3.3,port=9004,server=on,wait=off
+-chardev socket,id=compare0,host=3.3.3.3,port=9001,server=on,wait=off
 -chardev socket,id=compare0-0,ho

Re: [PATCH 1/5] qcow2-bitmap: make bytes_covered_by_bitmap_cluster() public

2021-02-16 Thread Eric Blake

On 2/16/21 10:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> Rename bytes_covered_by_bitmap_cluster() to
> bdrv_dirty_bitmap_serialization_coverage() and make it public. It is
> needed as we are going to make load_bitmap_data() public in the next
> commit.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/block/dirty-bitmap.h |  2 ++
>  block/dirty-bitmap.c | 13 +
>  block/qcow2-bitmap.c | 16 ++--
>  3 files changed, 17 insertions(+), 14 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

[PATCH 09/10] docs: update to show preferred boolean syntax for -cpu

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"+foo" or "-foo"

Signed-off-by: Daniel P. Berrangé 
---
 docs/COLO-FT.txt   | 4 ++--
 docs/interop/firmware.json | 2 +-
 docs/system/cpu-models-x86.rst.inc | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/COLO-FT.txt b/docs/COLO-FT.txt
index 8874690e83..8d6d53a5a2 100644
--- a/docs/COLO-FT.txt
+++ b/docs/COLO-FT.txt
@@ -158,7 +158,7 @@ instance.
 
 # imagefolder="/mnt/vms/colo-test-primary"
 
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp 
stdio \
+# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp 
stdio \
-device piix3-usb-uhci -device usb-tablet -name primary \
-netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
-device rtl8139,id=e0,netdev=hn0 \
@@ -189,7 +189,7 @@ any IP's here, except for the $primary_ip variable.
 
 # qemu-img create -f qcow2 $imagefolder/secondary-hidden.qcow2 10G
 
-# qemu-system-x86_64 -enable-kvm -cpu qemu64,+kvmclock -m 512 -smp 1 -qmp 
stdio \
+# qemu-system-x86_64 -enable-kvm -cpu qemu64,kvmclock=on -m 512 -smp 1 -qmp 
stdio \
-device piix3-usb-uhci -device usb-tablet -name secondary \
-netdev tap,id=hn0,vhost=off,helper=/usr/lib/qemu/qemu-bridge-helper \
-device rtl8139,id=e0,netdev=hn0 \
diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
index 989f10b626..9d94ccafa9 100644
--- a/docs/interop/firmware.json
+++ b/docs/interop/firmware.json
@@ -129,7 +129,7 @@
 #"-machine smm=on". (On the "pc-q35-*" machine types of
 #the @i386 emulation target, @requires-smm presents
 #further CPU requirements; one combination known to work
-#is "-cpu coreduo,-nx".) If the firmware is marked as
+#is "-cpu coreduo,nx=off".) If the firmware is marked as
 #both @secure-boot and @requires-smm, then write
 #accesses to the pflash chip (NVRAM) that holds the UEFI
 #variable store must be restricted to code that executes
diff --git a/docs/system/cpu-models-x86.rst.inc 
b/docs/system/cpu-models-x86.rst.inc
index 9a2327828e..867c8216b5 100644
--- a/docs/system/cpu-models-x86.rst.inc
+++ b/docs/system/cpu-models-x86.rst.inc
@@ -364,7 +364,7 @@ Host passthrough with feature customization:
 
 .. parsed-literal::
 
-  |qemu_system| -cpu host,-vmx,...
+  |qemu_system| -cpu host,vmx=off,...
 
 Named CPU models:
 
@@ -376,7 +376,7 @@ Named CPU models with feature customization:
 
 .. parsed-literal::
 
-  |qemu_system| -cpu Westmere,+pcid,...
+  |qemu_system| -cpu Westmere,pcid=on,...
 
 Libvirt guest XML
 ^
-- 
2.29.2

[PATCH 06/10] qemu-options: update to show preferred boolean syntax for -vnc

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"foo" or "nofoo".

The on|off syntax has been supported since -vnc switched to use
QemuOpts in commit 4db14629c38611061fc19ec6927405923de84f08

Signed-off-by: Daniel P. Berrangé 
---
 qemu-options.hx | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index da0ddf8a3a..34be5a7a2d 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2123,13 +2123,13 @@ SRST
 Following the display value there may be one or more option flags
 separated by commas. Valid options are
 
-``reverse``
+``reverse=on|off``
 Connect to a listening VNC client via a "reverse" connection.
 The client is specified by the display. For reverse network
 connections (host:d,``reverse``), the d argument is a TCP port
 number, not a display number.
 
-``websocket``
+``websocket=on|off``
 Opens an additional TCP listening port dedicated to VNC
 Websocket connections. If a bare websocket option is given, the
 Websocket port is 5700+display. An alternative port can be
@@ -2143,7 +2143,7 @@ SRST
 runs in unencrypted mode. If TLS credentials are provided, the
 websocket connection requires encrypted client connections.
 
-``password``
+``password=on|off``
 Require that password based authentication is used for client
 connections.
 
@@ -2180,7 +2180,7 @@ SRST
 on the fly while the VNC server is active. If missing, it will
 default to denying access.
 
-``sasl``
+``sasl=on|off``
 Require that the client use SASL to authenticate with the VNC
 server. The exact choice of authentication method used is
 controlled from the system / user's SASL configuration file for
@@ -2203,7 +2203,7 @@ SRST
 fly while the VNC server is active. If missing, it will default
 to denying access.
 
-``acl``
+``acl=on|off``
 Legacy method for enabling authorization of clients against the
 x509 distinguished name and SASL username. It results in the
 creation of two ``authz-list`` objects with IDs of
@@ -2213,13 +2213,13 @@ SRST
 This option is deprecated and should no longer be used. The new
 ``sasl-authz`` and ``tls-authz`` options are a replacement.
 
-``lossy``
+``lossy=on|off``
 Enable lossy compression methods (gradient, JPEG, ...). If this
 option is set, VNC client may receive lossy framebuffer updates
 depending on its encoding settings. Enabling this option can
 save a lot of bandwidth at the expense of quality.
 
-``non-adaptive``
+``non-adaptive=on|off``
 Disable adaptive encodings. Adaptive encodings are enabled by
 default. An adaptive encoding will try to detect frequently
 updated screen regions, and send updates in these regions using
@@ -2254,7 +2254,7 @@ SRST
 must be omitted, otherwise is must be present and specify a
 valid audiodev.
 
-``power-control``
+``power-control=on|off``
 Permit the remote client to issue shutdown, reboot or reset power
 control requests.
 ERST
-- 
2.29.2

[PATCH 10/10] target/i386: update to show preferred boolean syntax for -cpu

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"+foo" or "-foo"

Signed-off-by: Daniel P. Berrangé 
---
 target/i386/cpu.c   |  2 +-
 tests/qtest/test-x86-cpuid-compat.c | 52 ++---
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 9c3d2d60b7..a5091cc85c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6451,7 +6451,7 @@ static void x86_cpu_expand_features(X86CPU *cpu, Error 
**errp)
 } else if (cpu->env.cpuid_min_level < 0x14) {
 mark_unavailable_features(cpu, FEAT_7_0_EBX,
 CPUID_7_0_EBX_INTEL_PT,
-"Intel PT need CPUID leaf 0x14, please set by \"-cpu 
...,+intel-pt,min-level=0x14\"");
+"Intel PT need CPUID leaf 0x14, please set by \"-cpu 
...,intel-pt=on,min-level=0x14\"");
 }
 }
 
diff --git a/tests/qtest/test-x86-cpuid-compat.c 
b/tests/qtest/test-x86-cpuid-compat.c
index 7ca1883a29..6470f0a85d 100644
--- a/tests/qtest/test-x86-cpuid-compat.c
+++ b/tests/qtest/test-x86-cpuid-compat.c
@@ -235,82 +235,82 @@ int main(int argc, char **argv)
 /* If level is not large enough, it should increase automatically: */
 /* CPUID[6].EAX: */
 add_cpuid_test("x86/cpuid/auto-level/phenom/arat",
-   "-cpu 486,+arat", "level", 6);
+   "-cpu 486,arat=on", "level", 6);
 /* CPUID[EAX=7,ECX=0].EBX: */
 add_cpuid_test("x86/cpuid/auto-level/phenom/fsgsbase",
-   "-cpu phenom,+fsgsbase", "level", 7);
+   "-cpu phenom,fsgsbase=on", "level", 7);
 /* CPUID[EAX=7,ECX=0].ECX: */
 add_cpuid_test("x86/cpuid/auto-level/phenom/avx512vbmi",
-   "-cpu phenom,+avx512vbmi", "level", 7);
+   "-cpu phenom,avx512vbmi=on", "level", 7);
 /* CPUID[EAX=0xd,ECX=1].EAX: */
 add_cpuid_test("x86/cpuid/auto-level/phenom/xsaveopt",
-   "-cpu phenom,+xsaveopt", "level", 0xd);
+   "-cpu phenom,xsaveopt=on", "level", 0xd);
 /* CPUID[8000_0001].EDX: */
 add_cpuid_test("x86/cpuid/auto-xlevel/486/3dnow",
-   "-cpu 486,+3dnow", "xlevel", 0x8001);
+   "-cpu 486,3dnow=on", "xlevel", 0x8001);
 /* CPUID[8000_0001].ECX: */
 add_cpuid_test("x86/cpuid/auto-xlevel/486/sse4a",
-   "-cpu 486,+sse4a", "xlevel", 0x8001);
+   "-cpu 486,sse4a=on", "xlevel", 0x8001);
 /* CPUID[8000_0007].EDX: */
 add_cpuid_test("x86/cpuid/auto-xlevel/486/invtsc",
-   "-cpu 486,+invtsc", "xlevel", 0x8007);
+   "-cpu 486,invtsc=on", "xlevel", 0x8007);
 /* CPUID[8000_000A].EDX: */
 add_cpuid_test("x86/cpuid/auto-xlevel/486/npt",
-   "-cpu 486,+svm,+npt", "xlevel", 0x800A);
+   "-cpu 486,svm=on,npt=on", "xlevel", 0x800A);
 /* CPUID[C000_0001].EDX: */
 add_cpuid_test("x86/cpuid/auto-xlevel2/phenom/xstore",
-   "-cpu phenom,+xstore", "xlevel2", 0xC001);
+   "-cpu phenom,xstore=on", "xlevel2", 0xC001);
 /* SVM needs CPUID[0x800A] */
 add_cpuid_test("x86/cpuid/auto-xlevel/athlon/svm",
-   "-cpu athlon,+svm", "xlevel", 0x800A);
+   "-cpu athlon,svm=on", "xlevel", 0x800A);
 
 
 /* If level is already large enough, it shouldn't change: */
 add_cpuid_test("x86/cpuid/auto-level/SandyBridge/multiple",
-   "-cpu SandyBridge,+arat,+fsgsbase,+avx512vbmi",
+   "-cpu SandyBridge,arat=on,fsgsbase=on,avx512vbmi=on",
"level", 0xd);
 /* If level is explicitly set, it shouldn't change: */
 add_cpuid_test("x86/cpuid/auto-level/486/fixed/0xF",
-   "-cpu 486,level=0xF,+arat,+fsgsbase,+avx512vbmi,+xsaveopt",
+   "-cpu 
486,level=0xF,arat=on,fsgsbase=on,avx512vbmi=on,xsaveopt=on",
"level", 0xF);
 add_cpuid_test("x86/cpuid/auto-level/486/fixed/2",
-   "-cpu 486,level=2,+arat,+fsgsbase,+avx512vbmi,+xsaveopt",
+   "-cpu 
486,level=2,arat=on,fsgsbase=on,avx512vbmi=on,xsaveopt=on",
"level", 2);
 add_cpuid_test("x86/cpuid/auto-level/486/fixed/0",
-   "-cpu 486,level=0,+arat,+fsgsbase,+avx512vbmi,+xsaveopt",
+   "-cpu 
486,level=0,arat=on,fsgsbase=on,avx512vbmi=on,xsaveopt=on",
"level", 0);
 
 /* if xlevel is already large enough, it shouldn't change: */
 add_cpuid_test("x86/cpuid/auto-xlevel/phenom/3dnow",
-   "-cpu phenom,+3dnow,+sse4a,+invtsc,+npt,+svm",
+   "-cpu phenom,3dnow=on,sse4a=on,invtsc=on,npt=on,svm=on",
"xlevel", 0x801A);
 /* If xlevel is explicitly set, it shouldn't change: */
 add_cpuid_test("x86/cpuid/auto-xlevel/486/fixed/8002",
-

[PATCH 04/10] qemu-options: update to show preferred boolean syntax for -netdev

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"foo" or "nofoo".

Signed-off-by: Daniel P. Berrangé 
---
 qemu-options.hx | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index bdf159c929..fb2050cda9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2406,8 +2406,8 @@ DEFHEADING(Network options:)
 
 DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #ifdef CONFIG_SLIRP
-"-netdev user,id=str[,ipv4[=on|off]][,net=addr[/mask]][,host=addr]\n"
-" [,ipv6[=on|off]][,ipv6-net=addr[/int]][,ipv6-host=addr]\n"
+"-netdev user,id=str[,ipv4=on|off][,net=addr[/mask]][,host=addr]\n"
+" [,ipv6=on|off][,ipv6-net=addr[/int]][,ipv6-host=addr]\n"
 " [,restrict=on|off][,hostname=host][,dhcpstart=addr]\n"
 " 
[,dns=addr][,ipv6-dns=addr][,dnssearch=domain][,domainname=domain]\n"
 " 
[,tftp=dir][,tftp-server-name=name][,bootfile=f][,hostfwd=rule][,guestfwd=rule]"
@@ -2454,8 +2454,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 #endif
 #ifdef __linux__
 "-netdev 
l2tpv3,id=str,src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport]\n"
-" 
[,rxsession=rxsession],txsession=txsession[,ipv6=on/off][,udp=on/off]\n"
-" [,cookie64=on/off][,counter][,pincounter][,txcookie=txcookie]\n"
+" 
[,rxsession=rxsession],txsession=txsession[,ipv6=on|off][,udp=on|off]\n"
+" [,cookie64=on|off][,counter][,pincounter][,txcookie=txcookie]\n"
 " [,rxcookie=rxcookie][,offset=offset]\n"
 "configure a network backend with ID 'str' connected to\n"
 "an Ethernet over L2TPv3 pseudowire.\n"
@@ -2884,7 +2884,7 @@ SRST
  -device e1000,netdev=n1,mac=52:54:00:12:34:56 \\
  -netdev 
socket,id=n1,mcast=239.192.168.1:1102,localaddr=1.2.3.4
 
-``-netdev 
l2tpv3,id=id,src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport],txsession=txsession[,rxsession=rxsession][,ipv6][,udp][,cookie64][,counter][,pincounter][,txcookie=txcookie][,rxcookie=rxcookie][,offset=offset]``
+``-netdev 
l2tpv3,id=id,src=srcaddr,dst=dstaddr[,srcport=srcport][,dstport=dstport],txsession=txsession[,rxsession=rxsession][,ipv6=on|off][,udp=on|off][,cookie64][,counter][,pincounter][,txcookie=txcookie][,rxcookie=rxcookie][,offset=offset]``
 Configure a L2TPv3 pseudowire host network backend. L2TPv3 (RFC3931)
 is a popular protocol to transport Ethernet (and other Layer 2) data
 frames between two systems. It is present in routers, firewalls and
-- 
2.29.2

[PATCH 05/10] qemu-options: update to show preferred boolean syntax for -incoming

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"foo" or "nofoo".

Signed-off-by: Daniel P. Berrangé 
---
 qemu-options.hx | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index fb2050cda9..da0ddf8a3a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4135,8 +4135,8 @@ SRST
 ERST
 
 DEF("incoming", HAS_ARG, QEMU_OPTION_incoming, \
-"-incoming tcp:[host]:port[,to=maxport][,ipv4][,ipv6]\n" \
-"-incoming rdma:host:port[,ipv4][,ipv6]\n" \
+"-incoming tcp:[host]:port[,to=maxport][,ipv4=on|off][,ipv6=on|off]\n" \
+"-incoming rdma:host:port[,ipv4=on|off][,ipv6=on|off]\n" \
 "-incoming unix:socketpath\n" \
 "prepare for incoming migration, listen on\n" \
 "specified protocol and socket address\n" \
@@ -4148,9 +4148,9 @@ DEF("incoming", HAS_ARG, QEMU_OPTION_incoming, \
 "wait for the URI to be specified via migrate_incoming\n",
 QEMU_ARCH_ALL)
 SRST
-``-incoming tcp:[host]:port[,to=maxport][,ipv4][,ipv6]``
+``-incoming tcp:[host]:port[,to=maxport][,ipv4=on|off][,ipv6=on|off]``
   \ 
-``-incoming rdma:host:port[,ipv4][,ipv6]``
+``-incoming rdma:host:port[,ipv4=on|off][,ipv6=on|off]``
 Prepare for incoming migration, listen on a given tcp port.
 
 ``-incoming unix:socketpath``
-- 
2.29.2

[PATCH 03/10] qemu-options: update to show preferred boolean syntax for -spice

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"foo" or "nofoo".

Signed-off-by: Daniel P. Berrangé 
---
 qemu-options.hx | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 972ef412cc..bdf159c929 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1894,16 +1894,17 @@ DEF("spice", HAS_ARG, QEMU_OPTION_spice,
 "-spice [port=port][,tls-port=secured-port][,x509-dir=]\n"
 "   [,x509-key-file=][,x509-key-password=]\n"
 "   [,x509-cert-file=][,x509-cacert-file=]\n"
-"   [,x509-dh-key-file=][,addr=addr][,ipv4|ipv6|unix]\n"
+"   [,x509-dh-key-file=][,addr=addr]\n"
+"   [,ipv4=on|off][,ipv6=on|off][,unix=on|off]\n"
 "   [,tls-ciphers=]\n"
 "   [,tls-channel=[main|display|cursor|inputs|record|playback]]\n"
 "   
[,plaintext-channel=[main|display|cursor|inputs|record|playback]]\n"
-"   [,sasl][,password=][,disable-ticketing]\n"
+"   [,sasl=on|off][,password=][,disable-ticketing=on|off]\n"
 "   [,image-compression=[auto_glz|auto_lz|quic|glz|lz|off]]\n"
 "   [,jpeg-wan-compression=[auto|never|always]]\n"
 "   [,zlib-glz-wan-compression=[auto|never|always]]\n"
-"   [,streaming-video=[off|all|filter]][,disable-copy-paste]\n"
-"   [,disable-agent-file-xfer][,agent-mouse=[on|off]]\n"
+"   [,streaming-video=[off|all|filter]][,disable-copy-paste=on|off]\n"
+"   [,disable-agent-file-xfer=on|off][,agent-mouse=[on|off]]\n"
 "   [,playback-compression=[on|off]][,seamless-migration=[on|off]]\n"
 "   [,gl=[on|off]][,rendernode=]\n"
 "   enable spice\n"
@@ -1920,13 +1921,13 @@ SRST
 Set the IP address spice is listening on. Default is any
 address.
 
-``ipv4``; \ ``ipv6``; \ ``unix``
+``ipv4=on|off``; \ ``ipv6=on|off``; \ ``unix=on|off``
 Force using the specified IP version.
 
 ``password=``
 Set the password you need to authenticate.
 
-``sasl``
+``sasl=on|off``
 Require that the client use SASL to authenticate with the spice.
 The exact choice of authentication method used is controlled
 from the system / user's SASL configuration file for the 'qemu'
@@ -1940,13 +1941,13 @@ SRST
 data encryption preventing compromise of authentication
 credentials.
 
-``disable-ticketing``
+``disable-ticketing=on|off``
 Allow client connects without authentication.
 
-``disable-copy-paste``
+``disable-copy-paste=on|off``
 Disable copy paste between the client and the guest.
 
-``disable-agent-file-xfer``
+``disable-agent-file-xfer=on|off``
 Disable spice-vdagent based file-xfer between the client and the
 guest.
 
-- 
2.29.2

[PATCH 02/10] qemu-options: update to show preferred boolean syntax for -chardev

2021-02-16 Thread Daniel P . Berrangé

The preferred syntax is to use "foo=on|off", rather than a bare
"foo" or "nofoo".

Signed-off-by: Daniel P. Berrangé 
---
 qemu-options.hx | 78 -
 1 file changed, 39 insertions(+), 39 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 6c34c7050f..972ef412cc 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3032,13 +3032,13 @@ DEFHEADING(Character device options:)
 DEF("chardev", HAS_ARG, QEMU_OPTION_chardev,
 "-chardev help\n"
 "-chardev null,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
-"-chardev 
socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds]\n"
-" 
[,server][,nowait][,telnet][,websocket][,reconnect=seconds][,mux=on|off]\n"
+"-chardev 
socket,id=id[,host=host],port=port[,to=to][,ipv4=on|off][,ipv6=on|off][,delay=on|off][,reconnect=seconds]\n"
+" 
[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,mux=on|off]\n"
 " [,logfile=PATH][,logappend=on|off][,tls-creds=ID][,tls-authz=ID] 
(tcp)\n"
-"-chardev 
socket,id=id,path=path[,server][,nowait][,telnet][,websocket][,reconnect=seconds]\n"
+"-chardev 
socket,id=id,path=path[,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds]\n"
 " 
[,mux=on|off][,logfile=PATH][,logappend=on|off][,abstract=on|off][,tight=on|off]
 (unix)\n"
 "-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr]\n"
-" [,localport=localport][,ipv4][,ipv6][,mux=on|off]\n"
+" 
[,localport=localport][,ipv4=on|off][,ipv6=on|off][,mux=on|off]\n"
 " [,logfile=PATH][,logappend=on|off]\n"
 "-chardev msmouse,id=id[,mux=on|off][,logfile=PATH][,logappend=on|off]\n"
 "-chardev 
vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]]\n"
@@ -3148,21 +3148,21 @@ The available backends are:
 A void device. This device will not emit any data, and will drop any
 data it receives. The null backend does not take any options.
 
-``-chardev socket,id=id[,TCP options or unix 
options][,server][,nowait][,telnet][,websocket][,reconnect=seconds][,tls-creds=id][,tls-authz=id]``
+``-chardev socket,id=id[,TCP options or unix 
options][,server=on|off][,wait=on|off][,telnet=on|off][,websocket=on|off][,reconnect=seconds][,tls-creds=id][,tls-authz=id]``
 Create a two-way stream socket, which can be either a TCP or a unix
 socket. A unix socket will be created if ``path`` is specified.
 Behaviour is undefined if TCP options are specified for a unix
 socket.
 
-``server`` specifies that the socket shall be a listening socket.
+``server=on|off`` specifies that the socket shall be a listening socket.
 
-``nowait`` specifies that QEMU should not block waiting for a client
+``wait=on|off`` specifies that QEMU should not block waiting for a client
 to connect to a listening socket.
 
-``telnet`` specifies that traffic on the socket should interpret
+``telnet=on|off`` specifies that traffic on the socket should interpret
 telnet escape sequences.
 
-``websocket`` specifies that the socket uses WebSocket protocol for
+``websocket=on|off`` specifies that the socket uses WebSocket protocol for
 communication.
 
 ``reconnect`` sets the timeout for reconnecting on non-server
@@ -3183,7 +3183,7 @@ The available backends are:
 
 TCP and unix socket options are given below:
 
-``TCP options: port=port[,host=host][,to=to][,ipv4][,ipv6][,nodelay]``
+``TCP options: 
port=port[,host=host][,to=to][,ipv4=on|off][,ipv6=on|off][,delay=on|off]``
 ``host`` for a listening socket specifies the local address to
 be bound. For a connecting socket species the remote host to
 connect to. ``host`` is optional for listening sockets. If not
@@ -3199,21 +3199,21 @@ The available backends are:
 bind to subsequent ports up to and including ``to`` until it
 succeeds. ``to`` must be specified as a port number.
 
-``ipv4`` and ``ipv6`` specify that either IPv4 or IPv6 must be
-used. If neither is specified the socket may use either
-protocol.
+``ipv4=on|off`` and ``ipv6=on|off`` specify that either IPv4
+or IPv6 must be used. If neither is specified the socket may
+use either protocol.
 
-``nodelay`` disables the Nagle algorithm.
+``delay=on|off`` disables the Nagle algorithm.
 
 ``unix options: path=path[,abstract=on|off][,tight=on|off]``
 ``path`` specifies the local path of the unix socket. ``path``
 is required.
-``abstract`` specifies the use of the abstract socket namespace,
+``abstract=on|off`` specifies the use of the abstract socket namespace,
 rather than the filesystem.  Optional, defaults to false.
-``tight`` sets the socket length of abstract sockets to their minimum,
+``tight=on|off`` sets

[PATCH 01/10] gdbstub: use preferred boolean option syntax

2021-02-16 Thread Daniel P . Berrangé

Signed-off-by: Daniel P. Berrangé 
---
 gdbstub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdbstub.c b/gdbstub.c
index 759bb00bcf..3ee40479b6 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -3505,7 +3505,7 @@ int gdbserver_start(const char *device)
 if (strstart(device, "tcp:", NULL)) {
 /* enforce required TCP attributes */
 snprintf(gdbstub_device_name, sizeof(gdbstub_device_name),
- "%s,nowait,nodelay,server", device);
+ "%s,wait=off,delay=off,server=on", device);
 device = gdbstub_device_name;
 }
 #ifndef _WIN32
-- 
2.29.2

[PATCH 00/10] qemu-options: always use key=on|off syntax for bool option docs

2021-02-16 Thread Daniel P . Berrangé

In discussing the proposal to deprecate the short form bool options
without a value, it was pointed out that the docs mostly use the
short form:

  https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg05018.html

This series converts all usage of the short form into the canonical
key=on|off boolean syntax. As we can see, there was quite alot in
the docs using the various short forms. Not sure if I got all of
them since identifying them is quite hard.

Daniel P. Berrangé (10):
  gdbstub: use preferred boolean option syntax
  qemu-options: update to show preferred boolean syntax for -chardev
  qemu-options: update to show preferred boolean syntax for -spice
  qemu-options: update to show preferred boolean syntax for -netdev
  qemu-options: update to show preferred boolean syntax for -incoming
  qemu-options: update to show preferred boolean syntax for -vnc
  docs: update to show preferred boolean syntax for -chardev
  docs: update to show preferred boolean syntax for -vnc
  docs: update to show preferred boolean syntax for -cpu
  target/i386: update to show preferred boolean syntax for -cpu

 docs/COLO-FT.txt   |  12 +--
 docs/ccid.txt  |   6 +-
 docs/colo-proxy.txt|  16 +--
 docs/devel/writing-qmp-commands.txt|   2 +-
 docs/interop/firmware.json |   2 +-
 docs/interop/live-block-operations.rst |   4 +-
 docs/interop/qmp-intro.txt |   4 +-
 docs/system/cpu-hotplug.rst|   2 +-
 docs/system/cpu-models-x86.rst.inc |   4 +-
 docs/system/s390x/3270.rst |   2 +-
 docs/system/target-avr.rst |   2 +-
 docs/system/vnc-security.rst   |   8 +-
 docs/tools/qemu-storage-daemon.rst |   4 +-
 gdbstub.c  |   2 +-
 qemu-options.hx| 131 +
 scripts/qmp/qemu-ga-client |   2 +-
 target/i386/cpu.c  |   2 +-
 tests/qtest/test-x86-cpuid-compat.c|  52 +-
 tests/test-char.c  |   4 +-
 19 files changed, 132 insertions(+), 129 deletions(-)

-- 
2.29.2

Re: [PATCH v5 00/11] virtio-mem: vfio support

2021-02-16 Thread Alex Williamson

On Tue, 16 Feb 2021 19:49:09 +0100
David Hildenbrand  wrote:

> On 16.02.21 19:33, Alex Williamson wrote:
> > On Mon, 15 Feb 2021 15:03:43 +0100
> > David Hildenbrand  wrote:
> >   
> >> On 08.02.21 09:28, David Hildenbrand wrote:  
> >>> On 27.01.21 13:45, Michael S. Tsirkin wrote:  
>  On Thu, Jan 21, 2021 at 12:05:29PM +0100, David Hildenbrand wrote:  
> > A virtio-mem device manages a memory region in guest physical address
> > space, represented as a single (currently large) memory region in QEMU,
> > mapped into system memory address space. Before the guest is allowed to 
> > use
> > memory blocks, it must coordinate with the hypervisor (plug blocks). 
> > After
> > a reboot, all memory is usually unplugged - when the guest comes up, it
> > detects the virtio-mem device and selects memory blocks to plug (based 
> > on
> > resize requests from the hypervisor).
> >
> > Memory hot(un)plug consists of (un)plugging memory blocks via a 
> > virtio-mem
> > device (triggered by the guest). When unplugging blocks, we discard the
> > memory - similar to memory balloon inflation. In contrast to memory
> > ballooning, we always know which memory blocks a guest may actually use 
> > -
> > especially during a reboot, after a crash, or after kexec (and during
> > hibernation as well). Guests agreed to not access unplugged memory 
> > again,
> > especially not via DMA.
> >
> > The issue with vfio is, that it cannot deal with random discards - for 
> > this
> > reason, virtio-mem and vfio can currently only run mutually exclusive.
> > Especially, vfio would currently map the whole memory region (with 
> > possible
> > only little/no plugged blocks), resulting in all pages getting pinned 
> > and
> > therefore resulting in a higher memory consumption than expected 
> > (turning
> > virtio-mem basically useless in these environments).
> >
> > To make vfio work nicely with virtio-mem, we have to map only the 
> > plugged
> > blocks, and map/unmap properly when plugging/unplugging blocks 
> > (including
> > discarding of RAM when unplugging). We achieve that by using a new 
> > notifier
> > mechanism that communicates changes.  
> 
>  series
> 
>  Acked-by: Michael S. Tsirkin 
> 
>  virtio bits
> 
>  Reviewed-by: Michael S. Tsirkin 
> 
>  This needs to go through vfio tree I assume.  
> >>>
> >>> Thanks Michael.
> >>>
> >>> @Alex, what are your suggestions?  
> >>
> >> Gentle ping.  
> > 
> > Sorry for the delay.  It looks to me like patches 1, 8, and 9 are
> > Memory API that are still missing an Ack from Paolo.  I'll toss in my
> > A-b+R-b for patches 6 and 7.  I don't see that this necessarily needs
> > to go in through vfio, I'm more than happy if someone else wants to
> > grab it.  Thanks,  
> 
> Thanks, I assume patch #11 is fine with you as well?

Yep, sent my acks for it as well.  Thanks,

Alex
 
> @Paolo, it would be great if I can get your feedback on patch #1. I have 
> more stuff coming up that will reuse RamDiscardMgr (i.e., for better 
> migration handling and better guest memory dump handling).
>

Re: [PATCH v5 11/11] vfio: Disable only uncoordinated discards for VFIO_TYPE1 iommus

2021-02-16 Thread Alex Williamson

On Thu, 21 Jan 2021 12:05:40 +0100
David Hildenbrand  wrote:

> We support coordinated discarding of RAM using the RamDiscardMgr for
> the VFIO_TYPE1 iommus. Let's unlock support for coordinated discards,
> keeping uncoordinated discards (e.g., via virtio-balloon) disabled if
> possible.
> 
> This unlocks virtio-mem + vfio on x86-64. Note that vfio used via "nvme://"
> by the block layer has to be implemented/unlocked separately. For now,
> virtio-mem only supports x86-64; we don't restrict RamDiscardMgr to x86-64,
> though: arm64 and s390x are supposed to work as well, and we'll test
> once unlocking virtio-mem support. The spapr IOMMUs will need special
> care, to be tackled later, e.g.., once supporting virtio-mem.
> 
> Note: The block size of a virtio-mem device has to be set to sane sizes,
> depending on the maximum hotplug size - to not run out of vfio mappings.
> The default virtio-mem block size is usually in the range of a couple of
> MBs. The maximum number of mapping is 64k, shared with other users.
> Assume you want to hotplug 256GB using virtio-mem - the block size would
> have to be set to at least 8 MiB (resulting in 32768 separate mappings).
> 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/vfio/common.c | 63 +++-
>  1 file changed, 51 insertions(+), 12 deletions(-)


Acked-by: Alex Williamson 
Reviewed-by: Alex Williamson 


> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 15ecd05a4b..d879b8ab92 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -135,6 +135,27 @@ static const char *index_to_str(VFIODevice *vbasedev, 
> int index)
>  }
>  }
>  
> +static int vfio_ram_block_discard_disable(VFIOContainer *container, bool 
> state)
> +{
> +switch (container->iommu_type) {
> +case VFIO_TYPE1v2_IOMMU:
> +case VFIO_TYPE1_IOMMU:
> +/* We support coordinated discarding of RAM via the RamDiscardMgr. */
> +return ram_block_uncoordinated_discard_disable(state);
> +default:
> +/*
> + * VFIO_SPAPR_TCE_IOMMU most probably works just fine with
> + * RamDiscardMgr, however, it is completely untested.
> + *
> + * VFIO_SPAPR_TCE_v2_IOMMU with "DMA memory preregistering" does
> + * completely the opposite of managing mapping/pinning dynamically as
> + * required by RamDiscardMgr. We would have to special-case sections
> + * with a RamDiscardMgr.
> + */
> +return ram_block_discard_disable(state);
> +}
> +}
> +
>  int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
> int action, int fd, Error **errp)
>  {
> @@ -1979,15 +2000,25 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as,
>   * new memory, it will not yet set ram_block_discard_set_required() and
>   * therefore, neither stops us here or deals with the sudden memory
>   * consumption of inflated memory.
> + *
> + * We do support discarding of memory coordinated via the RamDiscardMgr
> + * with some IOMMU types. vfio_ram_block_discard_disable() handles the
> + * details once we know which type of IOMMU we are using.
>   */
> -ret = ram_block_discard_disable(true);
> -if (ret) {
> -error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
> -return ret;
> -}
>  
>  QLIST_FOREACH(container, &space->containers, next) {
>  if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
> +ret = vfio_ram_block_discard_disable(container, true);
> +if (ret) {
> +error_setg_errno(errp, -ret,
> + "Cannot set discarding of RAM broken");
> +if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER,
> +  &container->fd)) {
> +error_report("vfio: error disconnecting group %d from"
> + " container", group->groupid);
> +}
> +return ret;
> +}
>  group->container = container;
>  QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>  vfio_kvm_device_add_group(group);
> @@ -2025,6 +2056,12 @@ static int vfio_connect_container(VFIOGroup *group, 
> AddressSpace *as,
>  goto free_container_exit;
>  }
>  
> +ret = vfio_ram_block_discard_disable(container, true);
> +if (ret) {
> +error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
> +goto free_container_exit;
> +}
> +
>  switch (container->iommu_type) {
>  case VFIO_TYPE1v2_IOMMU:
>  case VFIO_TYPE1_IOMMU:
> @@ -2072,7 +2109,7 @@ static int vfio_co

[PATCH v2] linux-user/mmap: Return EFAULT/EINVAL for invalid addresses

2021-02-16 Thread Richard Purdie

When using qemu-i386 to build qemux86 webkitgtk on musl, it sits in an
infinite loop of mremap calls of ever decreasing/increasing addresses.

I suspect something in the musl memory allocation code loops
indefinitely if it only sees ENOMEM and only exits when it hits other
errors such as EFAULT or EINVAL.

According to the docs, trying to mremap outside the address space
can/should return EFAULT and changing this allows the build to succeed.

A better return value for the other cases of invalid addresses is
EINVAL rather than ENOMEM so adjust the other part of the test to this.

Signed-off-by: Richard Purdie

Re: [PATCH v11 12/12] migration: introduce snapshot-{save,load,delete} QMP commands

2021-02-16 Thread John Snow


On 2/4/21 7:48 AM, Daniel P. Berrangé wrote:

savevm, loadvm and delvm are some of the few HMP commands that have never
been converted to use QMP. The reasons for the lack of conversion are
that they blocked execution of the event thread, and the semantics
around choice of disks were ill-defined.

Despite this downside, however, libvirt and applications using libvirt
have used these commands for as long as QMP has existed, via the
"human-monitor-command" passthrough command. IOW, while it is clearly
desirable to be able to fix the problems, they are not a blocker to
all real world usage.

Meanwhile there is a need for other features which involve adding new
parameters to the commands. This is possible with HMP passthrough, but
it provides no reliable way for apps to introspect features, so using
QAPI modelling is highly desirable.

This patch thus introduces new snapshot-{load,save,delete} commands to
QMP that are intended to replace the old HMP counterparts. The new
commands are given different names, because they will be using the new
QEMU job framework and thus will have diverging behaviour from the HMP
originals. It would thus be misleading to keep the same name.

While this design uses the generic job framework, the current impl is
still blocking. The intention that the blocking problem is fixed later.
None the less applications using these new commands should assume that
they are asynchronous and thus wait for the job status change event to
indicate completion.

In addition to using the job framework, the new commands require the
caller to be explicit about all the block device nodes used in the
snapshot operations, with no built-in default heuristics in use.

Note that the existing "query-named-block-nodes" can be used to query
what snapshots currently exist for block nodes.



I wasn't sure how you were actually tackling this, but the approach laid 
out in the commit message here looks like a very good idea that doesn't 
require the full resolution of the savevm problem.


Acked-by: John Snow 


Acked-by: Markus Armbruster 
Signed-off-by: Daniel P. Berrangé 
---
  migration/savevm.c| 184 +++
  qapi/job.json |   9 +-
  qapi/migration.json   | 173 ++
  .../tests/internal-snapshots-qapi | 386 +
  .../tests/internal-snapshots-qapi.out | 520 ++
  5 files changed, 1271 insertions(+), 1 deletion(-)
  create mode 100755 tests/qemu-iotests/tests/internal-snapshots-qapi
  create mode 100644 tests/qemu-iotests/tests/internal-snapshots-qapi.out

diff --git a/migration/savevm.c b/migration/savevm.c
index 48186918a3..6b320423c7 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3077,3 +3077,187 @@ bool vmstate_check_only_migratable(const 
VMStateDescription *vmsd)
  
  return !(vmsd && vmsd->unmigratable);

  }
+
+typedef struct SnapshotJob {
+Job common;
+char *tag;
+char *vmstate;
+strList *devices;
+Coroutine *co;
+Error **errp;
+bool ret;
+} SnapshotJob;
+
+static void qmp_snapshot_job_free(SnapshotJob *s)
+{
+g_free(s->tag);
+g_free(s->vmstate);
+qapi_free_strList(s->devices);
+}
+
+
+static void snapshot_load_job_bh(void *opaque)
+{
+Job *job = opaque;
+SnapshotJob *s = container_of(job, SnapshotJob, common);
+int orig_vm_running;
+
+job_progress_set_remaining(&s->common, 1);
+
+orig_vm_running = runstate_is_running();
+vm_stop(RUN_STATE_RESTORE_VM);
+
+s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
+if (s->ret && orig_vm_running) {
+vm_start();
+}
+
+job_progress_update(&s->common, 1);
+
+qmp_snapshot_job_free(s);
+aio_co_wake(s->co);
+}
+
+static void snapshot_save_job_bh(void *opaque)
+{
+Job *job = opaque;
+SnapshotJob *s = container_of(job, SnapshotJob, common);
+
+job_progress_set_remaining(&s->common, 1);
+s->ret = save_snapshot(s->tag, false, s->vmstate,
+   true, s->devices, s->errp);
+job_progress_update(&s->common, 1);
+
+qmp_snapshot_job_free(s);
+aio_co_wake(s->co);
+}
+
+static void snapshot_delete_job_bh(void *opaque)
+{
+Job *job = opaque;
+SnapshotJob *s = container_of(job, SnapshotJob, common);
+
+job_progress_set_remaining(&s->common, 1);
+s->ret = delete_snapshot(s->tag, true, s->devices, s->errp);
+job_progress_update(&s->common, 1);
+
+qmp_snapshot_job_free(s);
+aio_co_wake(s->co);
+}
+
+static int coroutine_fn snapshot_save_job_run(Job *job, Error **errp)
+{
+SnapshotJob *s = container_of(job, SnapshotJob, common);
+s->errp = errp;
+s->co = qemu_coroutine_self();
+aio_bh_schedule_oneshot(qemu_get_aio_context(),
+snapshot_save_job_bh, job);
+qemu_coroutine_yield();
+return s->ret ? 0 : -1;
+}
+
+static int coroutine_fn snapshot_load_job_run(Job *job, Error **errp)
+{
+SnapshotJob *

Re: [PATCH v5 01/11] memory: Introduce RamDiscardMgr for RAM memory regions

2021-02-16 Thread David Hildenbrand


+/**
+ * @is_populated:
+ *
+ * Check whether the given range within the #MemoryRegion is completely
+ * populated (i.e., no parts are currently discarded). There are no
+ * alignment requirements for the range.
+ *
+ * @rdm: the #RamDiscardMgr
+ * @mr: the #MemoryRegion
+ * @offset: offset into the #MemoryRegion
+ * @size: size in the #MemoryRegion
+ *
+ * Returns whether the given range is completely populated.
+ */
+bool (*is_populated)(const RamDiscardMgr *rdm, const MemoryRegion *mr,
+ ram_addr_t start, ram_addr_t offset);


Minor thing I noticed in the meanwhile: Variable naming here is wrong. 
Last line should be:


"ram_addr_t offset, ram_addr_t size"

--
Thanks,

David / dhildenb

Re: [PATCH v5 00/11] virtio-mem: vfio support

2021-02-16 Thread David Hildenbrand


On 16.02.21 19:33, Alex Williamson wrote:

On Mon, 15 Feb 2021 15:03:43 +0100
David Hildenbrand  wrote:


On 08.02.21 09:28, David Hildenbrand wrote:

On 27.01.21 13:45, Michael S. Tsirkin wrote:

On Thu, Jan 21, 2021 at 12:05:29PM +0100, David Hildenbrand wrote:

A virtio-mem device manages a memory region in guest physical address
space, represented as a single (currently large) memory region in QEMU,
mapped into system memory address space. Before the guest is allowed to use
memory blocks, it must coordinate with the hypervisor (plug blocks). After
a reboot, all memory is usually unplugged - when the guest comes up, it
detects the virtio-mem device and selects memory blocks to plug (based on
resize requests from the hypervisor).

Memory hot(un)plug consists of (un)plugging memory blocks via a virtio-mem
device (triggered by the guest). When unplugging blocks, we discard the
memory - similar to memory balloon inflation. In contrast to memory
ballooning, we always know which memory blocks a guest may actually use -
especially during a reboot, after a crash, or after kexec (and during
hibernation as well). Guests agreed to not access unplugged memory again,
especially not via DMA.

The issue with vfio is, that it cannot deal with random discards - for this
reason, virtio-mem and vfio can currently only run mutually exclusive.
Especially, vfio would currently map the whole memory region (with possible
only little/no plugged blocks), resulting in all pages getting pinned and
therefore resulting in a higher memory consumption than expected (turning
virtio-mem basically useless in these environments).

To make vfio work nicely with virtio-mem, we have to map only the plugged
blocks, and map/unmap properly when plugging/unplugging blocks (including
discarding of RAM when unplugging). We achieve that by using a new notifier
mechanism that communicates changes.


series

Acked-by: Michael S. Tsirkin 

virtio bits

Reviewed-by: Michael S. Tsirkin 

This needs to go through vfio tree I assume.


Thanks Michael.

@Alex, what are your suggestions?


Gentle ping.


Sorry for the delay.  It looks to me like patches 1, 8, and 9 are
Memory API that are still missing an Ack from Paolo.  I'll toss in my
A-b+R-b for patches 6 and 7.  I don't see that this necessarily needs
to go in through vfio, I'm more than happy if someone else wants to
grab it.  Thanks,


Thanks, I assume patch #11 is fine with you as well?

@Paolo, it would be great if I can get your feedback on patch #1. I have 
more stuff coming up that will reuse RamDiscardMgr (i.e., for better 
migration handling and better guest memory dump handling).


--
Thanks,

David / dhildenb

[PULL 3/6] tools/virtiofsd: Replace the word 'whitelist'

2021-02-16 Thread Dr. David Alan Gilbert (git)

From: Philippe Mathieu-Daudé 

Follow the inclusive terminology from the "Conscious Language in your
Open Source Projects" guidelines [*] and replace the words "whitelist"
appropriately.

[*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md

Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Philippe Mathieu-Daudé 
Message-Id: <20210205171817.2108907-3-phi...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/passthrough_ll.c  |  6 +++---
 tools/virtiofsd/passthrough_seccomp.c | 12 ++--
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 147b59338a..5f3afe8557 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -3204,7 +3204,7 @@ static void setup_mounts(const char *source)
 }
 
 /*
- * Only keep whitelisted capabilities that are needed for file system operation
+ * Only keep capabilities in allowlist that are needed for file system 
operation
  * The (possibly NULL) modcaps_in string passed in is free'd before exit.
  */
 static void setup_capabilities(char *modcaps_in)
@@ -3214,8 +3214,8 @@ static void setup_capabilities(char *modcaps_in)
 capng_restore_state(&cap.saved);
 
 /*
- * Whitelist file system-related capabilities that are needed for a file
- * server to act like root.  Drop everything else like networking and
+ * Add to allowlist file system-related capabilities that are needed for a
+ * file server to act like root.  Drop everything else like networking and
  * sysadmin capabilities.
  *
  * Exclusions:
diff --git a/tools/virtiofsd/passthrough_seccomp.c 
b/tools/virtiofsd/passthrough_seccomp.c
index ea852e2e33..62441cfcdb 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -21,7 +21,7 @@
 #endif
 #endif
 
-static const int syscall_whitelist[] = {
+static const int syscall_allowlist[] = {
 /* TODO ireg sem*() syscalls */
 SCMP_SYS(brk),
 SCMP_SYS(capget), /* For CAP_FSETID */
@@ -117,12 +117,12 @@ static const int syscall_whitelist[] = {
 };
 
 /* Syscalls used when --syslog is enabled */
-static const int syscall_whitelist_syslog[] = {
+static const int syscall_allowlist_syslog[] = {
 SCMP_SYS(send),
 SCMP_SYS(sendto),
 };
 
-static void add_whitelist(scmp_filter_ctx ctx, const int syscalls[], size_t 
len)
+static void add_allowlist(scmp_filter_ctx ctx, const int syscalls[], size_t 
len)
 {
 size_t i;
 
@@ -153,10 +153,10 @@ void setup_seccomp(bool enable_syslog)
 exit(1);
 }
 
-add_whitelist(ctx, syscall_whitelist, G_N_ELEMENTS(syscall_whitelist));
+add_allowlist(ctx, syscall_allowlist, G_N_ELEMENTS(syscall_allowlist));
 if (enable_syslog) {
-add_whitelist(ctx, syscall_whitelist_syslog,
-  G_N_ELEMENTS(syscall_whitelist_syslog));
+add_allowlist(ctx, syscall_allowlist_syslog,
+  G_N_ELEMENTS(syscall_allowlist_syslog));
 }
 
 /* libvhost-user calls this for post-copy migration, we don't need it */
-- 
2.29.2

[PULL 4/6] virtiofsd: Save error code early at the failure callsite

2021-02-16 Thread Dr. David Alan Gilbert (git)

From: Vivek Goyal 

Change error code handling slightly in lo_setattr(). Right now we seem
to jump to out_err and assume that "errno" is valid and use that to
send reply.

But if caller has to do some other operations before jumping to out_err,
then it does the dance of first saving errno to saverr and the restore
errno before jumping to out_err. This makes it more confusing.

I am about to make more changes where caller will have to do some
work after error before jumping to out_err. I found it easier to
change the convention a bit. That is caller saves error in "saverr"
before jumping to out_err. And out_err uses "saverr" to send error
back and does not rely on "errno" having actual error.

v3: Resolved conflicts in lo_setattr() due to lo_inode_open() changes.

Signed-off-by: Vivek Goyal 
Reviewed-by: Dr. David Alan Gilbert 
Message-Id: <20210208224024.43555-2-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/passthrough_ll.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 5f3afe8557..216c0bc026 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -698,6 +698,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
struct stat *attr,
 res = fchmodat(lo->proc_self_fd, procname, attr->st_mode, 0);
 }
 if (res == -1) {
+saverr = errno;
 goto out_err;
 }
 }
@@ -707,6 +708,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
struct stat *attr,
 
 res = fchownat(ifd, "", uid, gid, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW);
 if (res == -1) {
+saverr = errno;
 goto out_err;
 }
 }
@@ -718,16 +720,15 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
struct stat *attr,
 } else {
 truncfd = lo_inode_open(lo, inode, O_RDWR);
 if (truncfd < 0) {
-errno = -truncfd;
+saverr = -truncfd;
 goto out_err;
 }
 }
 
 res = ftruncate(truncfd, attr->st_size);
+saverr = res == -1 ? errno : 0;
 if (!fi) {
-saverr = errno;
 close(truncfd);
-errno = saverr;
 }
 if (res == -1) {
 goto out_err;
@@ -760,6 +761,7 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
struct stat *attr,
 res = utimensat(lo->proc_self_fd, procname, tv, 0);
 }
 if (res == -1) {
+saverr = errno;
 goto out_err;
 }
 }
@@ -768,7 +770,6 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, 
struct stat *attr,
 return lo_getattr(req, ino, fi);
 
 out_err:
-saverr = errno;
 lo_inode_put(lo, &inode);
 fuse_reply_err(req, saverr);
 }
-- 
2.29.2

[PULL 2/6] virtiofsd: vu_dispatch locking should never fail

2021-02-16 Thread Dr. David Alan Gilbert (git)

From: Greg Kurz 

pthread_rwlock_rdlock() and pthread_rwlock_wrlock() can fail if a
deadlock condition is detected or the current thread already owns
the lock. They can also fail, like pthread_rwlock_unlock(), if the
mutex wasn't properly initialized. None of these are ever expected
to happen with fv_VuDev::vu_dispatch_rwlock.

Some users already check the return value and assert, some others
don't. Introduce rdlock/wrlock/unlock wrappers that just do the
former and use them everywhere for improved consistency and
robustness.

This is just cleanup. It doesn't fix any actual issue.

Signed-off-by: Greg Kurz 
Message-Id: <20210203182434.93870-1-gr...@kaod.org>
Reviewed-by: Vivek Goyal 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_virtio.c | 49 +--
 1 file changed, 35 insertions(+), 14 deletions(-)

diff --git a/tools/virtiofsd/fuse_virtio.c b/tools/virtiofsd/fuse_virtio.c
index ddcefee427..523ee64fb7 100644
--- a/tools/virtiofsd/fuse_virtio.c
+++ b/tools/virtiofsd/fuse_virtio.c
@@ -187,6 +187,31 @@ static void copy_iov(struct iovec *src_iov, int src_count,
 }
 }
 
+/*
+ * pthread_rwlock_rdlock() and pthread_rwlock_wrlock can fail if
+ * a deadlock condition is detected or the current thread already
+ * owns the lock. They can also fail, like pthread_rwlock_unlock(),
+ * if the mutex wasn't properly initialized. None of these are ever
+ * expected to happen.
+ */
+static void vu_dispatch_rdlock(struct fv_VuDev *vud)
+{
+int ret = pthread_rwlock_rdlock(&vud->vu_dispatch_rwlock);
+assert(ret == 0);
+}
+
+static void vu_dispatch_wrlock(struct fv_VuDev *vud)
+{
+int ret = pthread_rwlock_wrlock(&vud->vu_dispatch_rwlock);
+assert(ret == 0);
+}
+
+static void vu_dispatch_unlock(struct fv_VuDev *vud)
+{
+int ret = pthread_rwlock_unlock(&vud->vu_dispatch_rwlock);
+assert(ret == 0);
+}
+
 /*
  * Called back by ll whenever it wants to send a reply/message back
  * The 1st element of the iov starts with the fuse_out_header
@@ -240,12 +265,12 @@ int virtio_send_msg(struct fuse_session *se, struct 
fuse_chan *ch,
 
 copy_iov(iov, count, in_sg, in_num, tosend_len);
 
-pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+vu_dispatch_rdlock(qi->virtio_dev);
 pthread_mutex_lock(&qi->vq_lock);
 vu_queue_push(dev, q, elem, tosend_len);
 vu_queue_notify(dev, q);
 pthread_mutex_unlock(&qi->vq_lock);
-pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+vu_dispatch_unlock(qi->virtio_dev);
 
 req->reply_sent = true;
 
@@ -403,12 +428,12 @@ int virtio_send_data_iov(struct fuse_session *se, struct 
fuse_chan *ch,
 
 ret = 0;
 
-pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+vu_dispatch_rdlock(qi->virtio_dev);
 pthread_mutex_lock(&qi->vq_lock);
 vu_queue_push(dev, q, elem, tosend_len);
 vu_queue_notify(dev, q);
 pthread_mutex_unlock(&qi->vq_lock);
-pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+vu_dispatch_unlock(qi->virtio_dev);
 
 err:
 if (ret == 0) {
@@ -558,12 +583,12 @@ out:
 fuse_log(FUSE_LOG_DEBUG, "%s: elem %d no reply sent\n", __func__,
  elem->index);
 
-pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
+vu_dispatch_rdlock(qi->virtio_dev);
 pthread_mutex_lock(&qi->vq_lock);
 vu_queue_push(dev, q, elem, 0);
 vu_queue_notify(dev, q);
 pthread_mutex_unlock(&qi->vq_lock);
-pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+vu_dispatch_unlock(qi->virtio_dev);
 }
 
 pthread_mutex_destroy(&req->ch.lock);
@@ -596,7 +621,6 @@ static void *fv_queue_thread(void *opaque)
  qi->qidx, qi->kick_fd);
 while (1) {
 struct pollfd pf[2];
-int ret;
 
 pf[0].fd = qi->kick_fd;
 pf[0].events = POLLIN;
@@ -645,8 +669,7 @@ static void *fv_queue_thread(void *opaque)
 break;
 }
 /* Mutual exclusion with virtio_loop() */
-ret = pthread_rwlock_rdlock(&qi->virtio_dev->vu_dispatch_rwlock);
-assert(ret == 0); /* there is no possible error case */
+vu_dispatch_rdlock(qi->virtio_dev);
 pthread_mutex_lock(&qi->vq_lock);
 /* out is from guest, in is too guest */
 unsigned int in_bytes, out_bytes;
@@ -672,7 +695,7 @@ static void *fv_queue_thread(void *opaque)
 }
 
 pthread_mutex_unlock(&qi->vq_lock);
-pthread_rwlock_unlock(&qi->virtio_dev->vu_dispatch_rwlock);
+vu_dispatch_unlock(qi->virtio_dev);
 
 /* Process all the requests. */
 if (!se->thread_pool_size && req_list != NULL) {
@@ -799,7 +822,6 @@ int virtio_loop(struct fuse_session *se)
 while (!fuse_session_exited(se)) {
 struct pollfd pf[1];
 bool ok;
-int ret;
 pf[0].fd = se->vu_socketfd;
 pf[0].events = POLLIN;
 pf[0].revents = 0;
@@ -825,12

[PULL 0/6] virtiofs queue

2021-02-16 Thread Dr. David Alan Gilbert (git)

From: "Dr. David Alan Gilbert" 

The following changes since commit 18543229fd7a2c79dcd6818c7b1f0f62512b5220:

  Merge remote-tracking branch 
'remotes/cleber-gitlab/tags/python-next-pull-request' into staging (2021-02-16 
14:37:57 +)

are available in the Git repository at:

  https://gitlab.com/dagrh/qemu.git tags/pull-virtiofs-20210216

for you to fetch changes up to 26ec1909648e0c06ff06ebc3ddb2f88ebeeaa6a9:

  virtiofsd: Do not use a thread pool by default (2021-02-16 17:54:18 +)


virtiofsd pull 2021-02-16

Vivek's support for new FUSE KILLPRIV_V2
and some smaller cleanups.

Signed-off-by: Dr. David Alan Gilbert 


Greg Kurz (1):
  virtiofsd: vu_dispatch locking should never fail

Philippe Mathieu-Daudé (1):
  tools/virtiofsd: Replace the word 'whitelist'

Vivek Goyal (3):
  virtiofsd: Save error code early at the failure callsite
  viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2
  virtiofsd: Do not use a thread pool by default

Wainer dos Santos Moschetta (1):
  virtiofsd: Allow to build it without the tools

 tools/meson.build |  7 ++-
 tools/virtiofsd/fuse_common.h | 15 ++
 tools/virtiofsd/fuse_lowlevel.c   | 13 -
 tools/virtiofsd/fuse_lowlevel.h   |  1 +
 tools/virtiofsd/fuse_virtio.c | 49 -
 tools/virtiofsd/passthrough_ll.c  | 99 ++-
 tools/virtiofsd/passthrough_seccomp.c | 12 ++---
 7 files changed, 158 insertions(+), 38 deletions(-)

[PULL 5/6] viriofsd: Add support for FUSE_HANDLE_KILLPRIV_V2

2021-02-16 Thread Dr. David Alan Gilbert (git)

From: Vivek Goyal 

This patch adds basic support for FUSE_HANDLE_KILLPRIV_V2. virtiofsd
can enable/disable this by specifying option "-o killpriv_v2/no_killpriv_v2".
By default this is enabled as long as client supports it

Enabling this option helps with performance in write path. Without this
option, currently every write is first preceeded with a getxattr() operation
to find out if security.capability is set. (Write is supposed to clear
security.capability). With this option enabled, server is signing up for
clearing security.capability on every WRITE and also clearing suid/sgid
subject to certain rules. This gets rid of extra getxattr() call for every
WRITE and improves performance. This is true when virtiofsd is run with
option -o xattr.

What does enabling FUSE_HANDLE_KILLPRIV_V2 mean for file server implementation.
It needs to adhere to following rules. Thanks to Miklos for this summary.

- clear "security.capability" on write, truncate and chown unconditionally
- clear suid/sgid in case of following. Note, sgid is cleared only if
  group executable bit is set.
o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
o setattr has FATTR_UID or FATTR_GID
o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
o write has FUSE_WRITE_KILL_SUIDGID

>From Linux VFS client perspective, here are the requirements.

- caps are always cleared on chown/write/truncate
- suid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID.
- sgid is always cleared on chown, while for truncate/write it is cleared
  only if caller does not have CAP_FSETID as well as file has group execute
  permission.

virtiofsd implementation has not changed much to adhere to above ruls. And
reason being that current assumption is that we are running on Linux
and on top of filesystems like ext4/xfs which already follow above rules.
On write, truncate, chown, seucurity.capability is cleared. And virtiofsd
drops CAP_FSETID if need be and that will lead to clearing of suid/sgid.

But if virtiofsd is running on top a filesystem which breaks above assumptions,
then it will have to take extra actions to emulate above. That's a TODO
for later when need arises.

Note: create normally is supposed to be called only when file does not
  exist. So generally there should not be any question of clearing
  setuid/setgid. But it is possible that after client checks that
  file is not present, some other client creates file on server
  and this race can trigger sending FUSE_CREATE. In that case, if
  O_TRUNC is set, we should clear suid/sgid if FUSE_OPEN_KILL_SUIDGID
  is also set.

v3:
  - Resolved conflicts due to lo_inode_open() changes.
  - Moved capability code in lo_do_open() so that both lo_open() and
lo_create() can benefit from common code.
  - Dropped changes to kernel headers as these are part of qemu already.

Signed-off-by: Vivek Goyal 
Acked-by: Stefan Hajnoczi 
Reviewed-by: Dr. David Alan Gilbert 
Message-Id: <20210208224024.43555-3-vgo...@redhat.com>
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_common.h| 15 ++
 tools/virtiofsd/fuse_lowlevel.c  | 11 -
 tools/virtiofsd/fuse_lowlevel.h  |  1 +
 tools/virtiofsd/passthrough_ll.c | 84 +---
 4 files changed, 103 insertions(+), 8 deletions(-)

diff --git a/tools/virtiofsd/fuse_common.h b/tools/virtiofsd/fuse_common.h
index a090040bb2..fa9671872e 100644
--- a/tools/virtiofsd/fuse_common.h
+++ b/tools/virtiofsd/fuse_common.h
@@ -357,6 +357,21 @@ struct fuse_file_info {
  */
 #define FUSE_CAP_SUBMOUNTS (1 << 27)
 
+/**
+ * Indicates that the filesystem is responsible for clearing
+ * security.capability xattr and clearing setuid and setgid bits. Following
+ * are the rules.
+ * - clear "security.capability" on write, truncate and chown unconditionally
+ * - clear suid/sgid if following is true. Note, sgid is cleared only if
+ *   group executable bit is set.
+ *o setattr has FATTR_SIZE and FATTR_KILL_SUIDGID set.
+ *o setattr has FATTR_UID or FATTR_GID
+ *o open has O_TRUNC and FUSE_OPEN_KILL_SUIDGID
+ *o create has O_TRUNC and FUSE_OPEN_KILL_SUIDGID flag set.
+ *o write has FUSE_WRITE_KILL_SUIDGID
+ */
+#define FUSE_CAP_HANDLE_KILLPRIV_V2 (1 << 28)
+
 /**
  * Ioctl flags
  *
diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index e94b71110b..f78692ef66 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -855,7 +855,7 @@ static void do_setattr(fuse_req_t req, fuse_ino_t nodeid,
   FUSE_SET_ATTR_GID | FUSE_SET_ATTR_SIZE |
   FUSE_SET_ATTR_ATIME | FUSE_SET_ATTR_MTIME |
   FUSE_SET_ATTR_ATIME_NOW | FUSE_SET_ATTR_MTIME_NOW |
-  FUSE_SET_ATTR_CTIME;
+  FUSE_SET_ATTR_CTIME | FUSE_SET_ATTR_KILL_SUIDGID;
 
 req->se->

[PULL 6/6] virtiofsd: Do not use a thread pool by default

2021-02-16 Thread Dr. David Alan Gilbert (git)

From: Vivek Goyal 

Currently we created a thread pool (With 64 max threads per pool) for
each virtqueue. We hoped that this will provide us with better scalability
and performance.

But in practice, we are getting better numbers in most of the cases
when we don't create a thread pool at all and a single thread per
virtqueue receives the request and processes it.

Hence, I am proposing that we switch to no thread pool by default
(equivalent of --thread-pool-size=0). This will provide out of
box better performance to most of the users. In fact other users
have confirmed that not using a thread pool gives them better
numbers. So why not use this as default. It can be changed when
somebody can fix the issues with thread pool performance.

Signed-off-by: Vivek Goyal 
Message-Id: <20210210182744.27324-2-vgo...@redhat.com>
Reviewed-by: Dr. David Alan Gilbert 
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/virtiofsd/fuse_lowlevel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index f78692ef66..1aa26c6333 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -18,7 +18,7 @@
 
 #include 
 
-#define THREAD_POOL_SIZE 64
+#define THREAD_POOL_SIZE 0
 
 #define OFFSET_MAX 0x7fffLL
 
-- 
2.29.2

[PULL 1/6] virtiofsd: Allow to build it without the tools

2021-02-16 Thread Dr. David Alan Gilbert (git)

From: Wainer dos Santos Moschetta 

This changed the Meson build script to allow virtiofsd be built even
though the tools build is disabled, thus honoring the --enable-virtiofsd
option.

Fixes: cece116c939d219070b250338439c2d16f94e3da (configure: add option for 
virtiofsd)
Signed-off-by: Wainer dos Santos Moschetta 
Message-Id: <20210201211456.1133364-2-waine...@redhat.com>
Reviewed-by: Misono Tomohiro 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Dr. David Alan Gilbert 
---
 tools/meson.build | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/meson.build b/tools/meson.build
index fdce66857d..3e5a0abfa2 100644
--- a/tools/meson.build
+++ b/tools/meson.build
@@ -10,8 +10,11 @@ if get_option('virtiofsd').enabled()
   error('virtiofsd requires Linux')
 elif not seccomp.found() or not libcap_ng.found()
   error('virtiofsd requires libcap-ng-devel and seccomp-devel')
-elif not have_tools or 'CONFIG_VHOST_USER' not in config_host
-  error('virtiofsd needs tools and vhost-user support')
+elif 'CONFIG_VHOST_USER' not in config_host
+  error('virtiofsd needs vhost-user support')
+else
+  # Disabled all the tools but virtiofsd.
+  have_virtiofsd = true
 endif
   endif
 elif get_option('virtiofsd').disabled() or not have_system
-- 
2.29.2

Re: [PATCH v5 07/11] vfio: Support for RamDiscardMgr in the vIOMMU case

2021-02-16 Thread Alex Williamson

On Thu, 21 Jan 2021 12:05:36 +0100
David Hildenbrand  wrote:

> vIOMMU support works already with RamDiscardMgr as long as guests only
> map populated memory. Both, populated and discarded memory is mapped
> into &address_space_memory, where vfio_get_xlat_addr() will find that
> memory, to create the vfio mapping.
> 
> Sane guests will never map discarded memory (e.g., unplugged memory
> blocks in virtio-mem) into an IOMMU - or keep it mapped into an IOMMU while
> memory is getting discarded. However, there are two cases where a malicious
> guests could trigger pinning of more memory than intended.
> 
> One case is easy to handle: the guest trying to map discarded memory
> into an IOMMU.
> 
> The other case is harder to handle: the guest keeping memory mapped in
> the IOMMU while it is getting discarded. We would have to walk over all
> mappings when discarding memory and identify if any mapping would be a
> violation. Let's keep it simple for now and print a warning, indicating
> that setting RLIMIT_MEMLOCK can mitigate such attacks.
> 
> We have to take care of incoming migration: at the point the
> IOMMUs get restored and start creating mappings in vfio, RamDiscardMgr
> implementations might not be back up and running yet: let's add runstate
> priorities to enforce the order when restoring.
> 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/vfio/common.c| 35 +++


Acked-by: Alex Williamson 
Reviewed-by: Alex Williamson 


>  hw/virtio/virtio-mem.c  |  1 +
>  include/migration/vmstate.h |  1 +
>  3 files changed, 37 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 166ec6ec62..15ecd05a4b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -36,6 +36,7 @@
>  #include "qemu/range.h"
>  #include "sysemu/kvm.h"
>  #include "sysemu/reset.h"
> +#include "sysemu/runstate.h"
>  #include "trace.h"
>  #include "qapi/error.h"
>  #include "migration/migration.h"
> @@ -574,6 +575,40 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, 
> void **vaddr,
>  error_report("iommu map to non memory area %"HWADDR_PRIx"",
>   xlat);
>  return false;
> +} else if (memory_region_has_ram_discard_mgr(mr)) {
> +RamDiscardMgr *rdm = memory_region_get_ram_discard_mgr(mr);
> +RamDiscardMgrClass *rdmc = RAM_DISCARD_MGR_GET_CLASS(rdm);
> +
> +/*
> + * Malicious VMs can map memory into the IOMMU, which is expected
> + * to remain discarded. vfio will pin all pages, populating memory.
> + * Disallow that. vmstate priorities make sure any RamDiscardMgr were
> + * already restored before IOMMUs are restored.
> + */
> +if (!rdmc->is_populated(rdm, mr, xlat, len)) {
> +error_report("iommu map to discarded memory (e.g., unplugged via"
> + " virtio-mem): %"HWADDR_PRIx"",
> + iotlb->translated_addr);
> +return false;
> +}
> +
> +/*
> + * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> + * pages will remain pinned inside vfio until unmapped, resulting in 
> a
> + * higher memory consumption than expected. If memory would get
> + * populated again later, there would be an inconsistency between 
> pages
> + * pinned by vfio and pages seen by QEMU. This is the case until
> + * unmapped from the IOMMU (e.g., during device reset).
> + *
> + * With malicious guests, we really only care about pinning more 
> memory
> + * than expected. RLIMIT_MEMLOCK set for the user/process can never 
> be
> + * exceeded and can be used to mitigate this problem.
> + */
> +warn_report_once("Using vfio with vIOMMUs and coordinated discarding 
> of"
> + " RAM (e.g., virtio-mem) works, however, malicious"
> + " guests can trigger pinning of more memory than"
> + " intended via an IOMMU. It's possible to mitigate "
> + " by setting/adjusting RLIMIT_MEMLOCK.");
>  }
>  
>  /*
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index 6200813bb8..f419a758f3 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -871,6 +871,7 @@ static const VMStateDescription vmstate_virtio_mem_device 
> = {
>  .name = "virtio-mem-device",
>  .minimum_version_id = 1,
>  .version_id = 1,
> +.priority = MIG_PRI_VIRTIO_MEM,
>  .post_load = virtio_mem_post_load,
>  .fields = (VMStateField[]) {
>  VMSTATE_WITH_TMP(VirtIOMEM, VirtIOMEMMigSanityChecks,
> diff --git a/include/migration/vmstate.h b/include/migrat

Re: [PATCH v5 06/11] vfio: Sanity check maximum number of DMA mappings with RamDiscardMgr

2021-02-16 Thread Alex Williamson

On Thu, 21 Jan 2021 12:05:35 +0100
David Hildenbrand  wrote:

> Although RamDiscardMgr can handle running into the maximum number of
> DMA mappings by propagating errors when creating a DMA mapping, we want
> to sanity check and warn the user early that there is a theoretical setup
> issue and that virtio-mem might not be able to provide as much memory
> towards a VM as desired.
> 
> As suggested by Alex, let's use the number of KVM memory slots to guess
> how many other mappings we might see over time.
> 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Alex Williamson 
> Cc: Dr. David Alan Gilbert 
> Cc: Igor Mammedov 
> Cc: Pankaj Gupta 
> Cc: Peter Xu 
> Cc: Auger Eric 
> Cc: Wei Yang 
> Cc: teawater 
> Cc: Marek Kedzierski 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/vfio/common.c | 43 +++
>  1 file changed, 43 insertions(+)


Acked-by: Alex Williamson 



> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 78be813a53..166ec6ec62 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -761,6 +761,49 @@ static void 
> vfio_register_ram_discard_notifier(VFIOContainer *container,
>vfio_ram_discard_notify_discard_all);
>  rdmc->register_listener(rdm, section->mr, &vrdl->listener);
>  QLIST_INSERT_HEAD(&container->vrdl_list, vrdl, next);
> +
> +/*
> + * Sanity-check if we have a theoretically problematic setup where we 
> could
> + * exceed the maximum number of possible DMA mappings over time. We 
> assume
> + * that each mapped section in the same address space as a RamDiscardMgr
> + * section consumes exactly one DMA mapping, with the exception of
> + * RamDiscardMgr sections; i.e., we don't expect to have gIOMMU sections 
> in
> + * the same address space as RamDiscardMgr sections.
> + *
> + * We assume that each section in the address space consumes one memslot.
> + * We take the number of KVM memory slots as a best guess for the maximum
> + * number of sections in the address space we could have over time,
> + * also consuming DMA mappings.
> + */
> +if (container->dma_max_mappings) {
> +unsigned int vrdl_count = 0, vrdl_mappings = 0, max_memslots = 512;
> +
> +#ifdef CONFIG_KVM
> +if (kvm_enabled()) {
> +max_memslots = kvm_get_max_memslots();
> +}
> +#endif
> +
> +QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
> +hwaddr start, end;
> +
> +start = QEMU_ALIGN_DOWN(vrdl->offset_within_address_space,
> +vrdl->granularity);
> +end = ROUND_UP(vrdl->offset_within_address_space + vrdl->size,
> +   vrdl->granularity);
> +vrdl_mappings += (end - start) / vrdl->granularity;
> +vrdl_count++;
> +}
> +
> +if (vrdl_mappings + max_memslots - vrdl_count >
> +container->dma_max_mappings) {
> +warn_report("%s: possibly running out of DMA mappings. E.g., try"
> +" increasing the 'block-size' of virtio-mem devies."
> +" Maximum possible DMA mappings: %d, Maximum 
> possible"
> +" memslots: %d", __func__, 
> container->dma_max_mappings,
> +max_memslots);
> +}
> +}
>  }
>  
>  static void vfio_unregister_ram_discard_listener(VFIOContainer *container,

1 2 3 4 >

1 - 100 of 356 matches

Mail list logo