Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

2021-04-26 Thread Alistair Francis
On Fri, Apr 23, 2021 at 4:46 PM Bin Meng  wrote:
>
> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng  wrote:
> >
> > On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> >  wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > >  wrote:
> > > > >
> > > > > Dear Bin,
> > > > >
> > > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > >  wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > > Hi Francisco,
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > >  wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Bin,
> > > > > > > > > > >
> > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > > Hi Francisco,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > > From: Bin Meng 
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > > > example, depending on the address mode, either a 3-byte address or a
> > > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is bytes,
> > > > > > > > > > > > > > not bits or cycles. However, for some reason the model has been
> > > > > > > > > > > > > > using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > > > the SPI protocol. For example, the 6 dummy cycles of Fast Read Quad
> > > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8):
> > > > > > > > > > > > > > 6 cycles, 4 bits per cycle on the 4 data lines, 8 bits per byte.
> > > > > > > > > > > > >
> > > > > > > > > > > > > While not being the original implementor, I must assume that the above
> > > > > > > > > > > > > solution was considered but not chosen by the developers due to its
> > > > > > > > > > > > > inaccuracy: it wouldn't be possible to model exactly 6 dummy cycles,
> > > > > > > > > > > > > only a multiple of 8, meaning that if the controller were wrongly
> > > > > > > > > > > > > programmed to generate 7, the error wouldn't be caught and the
> > > > > > > > > > > > > controller would still be considered "correct". Now that we have this
> > > > > > > > > > > > > detail in the implementation, I'm in favor of keeping it, also because
> > > > > > > > > > > > > the detail is already in use for catching exactly the above error.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I found no clue in the commit message that my proposed solution here
> > > > > > > > > > > > was ever considered; otherwise all SPI controller models supporting
> > > > > > > > > > > > software generation would have been found to be seriously broken a
> > > > > > > > > > > > long time ago!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The controllers you are referring to might lack support for commands
> > > > > > > > > > > requiring dummy clock cycles, but I really hope they work with the
> > > > > > > > > > > other commands? If so I
> > > > > > > > > >
> > > > > > > > > > I am not sure why you view dummy clock cycles as something 
> > > > > > > > > > 

Re: [PATCH 1/5] hw/ppc/spapr_iommu: Register machine reset handler

2021-04-26 Thread David Gibson
On Sat, Apr 24, 2021 at 06:22:25PM +0200, Philippe Mathieu-Daudé wrote:
> The TYPE_SPAPR_TCE_TABLE device is bus-less, thus isn't reset
> automatically.  Register a reset handler to get reset with the
> machine.
> 
> It doesn't seem to be an issue, as it has been that way since the
> device was QDev'ified 8 years ago, in commit a83000f5e3f
> ("spapr-tce: make sPAPRTCETable a proper device").
> Still, it is better to use the API properly.

So, the reason this works now is that we explicitly call
device_reset() on the TCE table from the TCE table's "owner", either a
PHB (spapr_phb_reset()) or a VIO device (spapr_vio_quiesce_one()).

I think we want either that, or the register_reset(), not both.

> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/ppc/spapr_iommu.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 24537ffcbd3..f7dad1dc0fe 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -24,6 +24,7 @@
>  #include "sysemu/kvm.h"
>  #include "kvm_ppc.h"
>  #include "migration/vmstate.h"
> +#include "sysemu/reset.h"
>  #include "sysemu/dma.h"
>  #include "exec/address-spaces.h"
>  #include "trace.h"
> @@ -302,6 +303,11 @@ static const VMStateDescription vmstate_spapr_tce_table = {
>      }
>  };
>  
> +static void spapr_tce_reset_handler(void *dev)
> +{
> +    device_legacy_reset(DEVICE(dev));
> +}
> +
>  static void spapr_tce_table_realize(DeviceState *dev, Error **errp)
>  {
>      SpaprTceTable *tcet = SPAPR_TCE_TABLE(dev);
> @@ -324,6 +330,8 @@ static void spapr_tce_table_realize(DeviceState *dev, Error **errp)
>  
>      vmstate_register(VMSTATE_IF(tcet), tcet->liobn, &vmstate_spapr_tce_table,
>                       tcet);
> +
> +    qemu_register_reset(spapr_tce_reset_handler, dev);
>  }
>  
>  void spapr_tce_set_need_vfio(SpaprTceTable *tcet, bool need_vfio)
> @@ -425,6 +433,8 @@ static void spapr_tce_table_unrealize(DeviceState *dev)
>  {
>      SpaprTceTable *tcet = SPAPR_TCE_TABLE(dev);
>  
> +    qemu_unregister_reset(spapr_tce_reset_handler, dev);
> +
>      vmstate_unregister(VMSTATE_IF(tcet), &vmstate_spapr_tce_table, tcet);
>  
>      QLIST_REMOVE(tcet, list);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 4/5] hw/pci-host/raven: Manually reset the OR_IRQ device

2021-04-26 Thread David Gibson
On Sat, Apr 24, 2021 at 06:22:28PM +0200, Philippe Mathieu-Daudé wrote:
> The OR_IRQ device is bus-less, thus isn't reset automatically.
> Add the raven_pcihost_reset() handler to manually reset the OR IRQ.
> 
> Fixes: f40b83a4e31 ("40p: use OR gate to wire up raven PCI interrupts")
> Signed-off-by: Philippe Mathieu-Daudé 

Acked-by: David Gibson 

> ---
>  hw/pci-host/prep.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/hw/pci-host/prep.c b/hw/pci-host/prep.c
> index 0a9162fba97..275379e4c78 100644
> --- a/hw/pci-host/prep.c
> +++ b/hw/pci-host/prep.c
> @@ -230,6 +230,15 @@ static void raven_change_gpio(void *opaque, int n, int level)
>      s->contiguous_map = level;
>  }
>  
> +static void raven_pcihost_reset(DeviceState *dev)
> +{
> +    PREPPCIState *s = RAVEN_PCI_HOST_BRIDGE(dev);
> +
> +    if (!s->is_legacy_prep) {
> +        device_legacy_reset(DEVICE(&s->or_irq));
> +    }
> +}
> +
>  static void raven_pcihost_realizefn(DeviceState *d, Error **errp)
>  {
>      SysBusDevice *dev = SYS_BUS_DEVICE(d);
> @@ -422,6 +431,7 @@ static void raven_pcihost_class_init(ObjectClass *klass, void *data)
>  
>      set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
>      dc->realize = raven_pcihost_realizefn;
> +    dc->reset = raven_pcihost_reset;
>      device_class_set_props(dc, raven_pcihost_properties);
>      dc->fw_name = "pci";
>  }



Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

26.04.2021 21:30, John Snow wrote:

On 4/26/21 2:05 PM, Daniel P. Berrangé wrote:

On Mon, Apr 26, 2021 at 09:00:36PM +0300, Vladimir Sementsov-Ogievskiy wrote:

26.04.2021 20:34, John Snow wrote:

On 4/23/21 8:59 AM, Vladimir Sementsov-Ogievskiy wrote:

The modern way is to use blockdev-add + blockdev-backup, which provides a
lot more control over how the target is opened.

As an example of drive-backup problems, consider the following:

A user of drive-backup expects the target to be opened in the same
cache and aio mode as the source. The corresponding logic is in
drive_backup_prepare(), where we take bs->open_flags of the source.

It works rather badly if the source was added by blockdev-add. Assume the
source is a qcow2 image. On blockdev-add we should specify aio and cache
options for the file child of the qcow2 node. What happens next:

drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
But there is neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO: BDRV_O_NOCACHE
is placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
as file-posix parses the options and simply sets s->use_linux_aio.



No complaints from me, especially if Virtuozzo is on board. I would like to see 
some documentation changes alongside this deprecation, though.


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! I remember I suggested deprecating drive-backup some time ago,
and nobody complained. But that old patch was inside a series with
other, more questionable deprecations, and it did not land.

Let's finally deprecate what should have been deprecated long ago.

We have now faced a problem in our downstream, described in the commit message.
In downstream I've fixed it by simply enabling O_DIRECT and linux_aio
unconditionally for the drive_backup target. But actually this just shows
that using drive-backup in the blockdev era is a bad idea. So let's motivate
everyone (including Virtuozzo, of course) to move to the new interfaces and
avoid problems with all that outdated option inheritance.

   docs/system/deprecated.rst | 5 +
   qapi/block-core.json   | 5 -
   2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 80cae86252..b6f5766e17 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -186,6 +186,11 @@ Use the more generic commands ``block-export-add`` and ``block-export-del``
   instead.  As part of this deprecation, where ``nbd-server-add`` used a
   single ``bitmap``, the new ``block-export-add`` uses a list of ``bitmaps``.
+``drive-backup`` (since 6.0)
+
+
+Use ``blockdev-backup`` in pair with ``blockdev-add`` instead.
+


1) Let's add a sphinx reference to 
https://qemu-project.gitlab.io/qemu/interop/live-block-operations.html#live-disk-backup-drive-backup-and-blockdev-backup


2) Just a thought, not a request: we may also wish to update
https://qemu-project.gitlab.io/qemu/interop/bitmaps.html to use the new,
preferred method. However, this doc is a bit old and in need of an overhaul
anyway (especially to add the NBD pull workflow). Can we ask Kashyap to help
us here, if he has time?


3) Let's add a small explanation here that outlines the differences in using 
these two commands. Here's a suggestion:

This change primarily separates the creation/opening of the backup target into explicit, separate steps.
BlockdevBackup uses mostly the same arguments as DriveBackup, except that the "format" and "mode"
options are removed in favor of explicit "blockdev-create" and "blockdev-add" calls.

The "target" argument changes semantics. It no longer accepts filenames, and
now accepts arbitrary node names in addition to device names.


4) Also not a request: If we want to go above and beyond, it might be nice to 
spell out the exact steps required to transition from the old interface to the 
new one. Here's a (hasty) suggestion for how that might look:

- The MODE argument is deprecated.
   - "existing" is replaced by using "blockdev-add" commands.
   - "absolute-paths" is replaced by using "blockdev-add" and
     "blockdev-create" commands.

- The FORMAT argument is deprecated.
   - Format information is given to "blockdev-add"/"blockdev-create".

- The TARGET argument has new semantics:
   - Filenames are no longer supported; use blockdev-add/blockdev-create
     as necessary instead.
   - Device targets remain supported.


Example:

drive-backup $ARGS format=$FORMAT mode=$MODE target=$FILENAME becomes:

(taking some liberties with syntax to just illustrate the idea ...)

blockdev-create options={
     "driver": "file",
     "filename": $FILENAME,
     "size": 0,
}

blockdev-add arguments={
     "driver": "file",
     "filename": $FILENAME,
     "node-name": "Example_Filenode0"
}

blockdev-create options={
     "driver": $FORMAT,
     "file": "Example_Filenode0",
     "size": $SIZE,
}

blockdev-add arguments={
     "driver": $FORMAT,
     "file": 

Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread John Snow

On 4/26/21 2:41 PM, Vladimir Sementsov-Ogievskiy wrote:

26.04.2021 21:30, John Snow wrote:

On 4/26/21 2:05 PM, Daniel P. Berrangé wrote:
On Mon, Apr 26, 2021 at 09:00:36PM +0300, Vladimir Sementsov-Ogievskiy wrote:

26.04.2021 20:34, John Snow wrote:

On 4/23/21 8:59 AM, Vladimir Sementsov-Ogievskiy wrote:

The modern way is to use blockdev-add + blockdev-backup, which provides a
lot more control over how the target is opened.

As an example of drive-backup problems, consider the following:

A user of drive-backup expects the target to be opened in the same
cache and aio mode as the source. The corresponding logic is in
drive_backup_prepare(), where we take bs->open_flags of the source.

It works rather badly if the source was added by blockdev-add. Assume the
source is a qcow2 image. On blockdev-add we should specify aio and cache
options for the file child of the qcow2 node. What happens next:

drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
But there is neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO: BDRV_O_NOCACHE
is placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
as file-posix parses the options and simply sets s->use_linux_aio.



No complaints from me, especially if Virtuozzo is on board. I would 
like to see some documentation changes alongside this deprecation, 
though.


Signed-off-by: Vladimir Sementsov-Ogievskiy 


---

Hi all! I remember I suggested deprecating drive-backup some time ago,
and nobody complained. But that old patch was inside a series with
other, more questionable deprecations, and it did not land.

Let's finally deprecate what should have been deprecated long ago.

We have now faced a problem in our downstream, described in the commit
message. In downstream I've fixed it by simply enabling O_DIRECT and
linux_aio unconditionally for the drive_backup target. But actually this
just shows that using drive-backup in the blockdev era is a bad idea. So
let's motivate everyone (including Virtuozzo, of course) to move to the
new interfaces and avoid problems with all that outdated option inheritance.

   docs/system/deprecated.rst | 5 +
   qapi/block-core.json   | 5 -
   2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 80cae86252..b6f5766e17 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -186,6 +186,11 @@ Use the more generic commands ``block-export-add`` and ``block-export-del``
   instead.  As part of this deprecation, where ``nbd-server-add`` used a
   single ``bitmap``, the new ``block-export-add`` uses a list of ``bitmaps``.
+``drive-backup`` (since 6.0)
+
+
+Use ``blockdev-backup`` in pair with ``blockdev-add`` instead.
+


1) Let's add a sphinx reference to 
https://qemu-project.gitlab.io/qemu/interop/live-block-operations.html#live-disk-backup-drive-backup-and-blockdev-backup 




2) Just a thought, not a request: we may also wish to update
https://qemu-project.gitlab.io/qemu/interop/bitmaps.html to use the
new, preferred method. However, this doc is a bit old and in need of
an overhaul anyway (especially to add the NBD pull workflow). Can we
ask Kashyap to help us here, if he has time?



3) Let's add a small explanation here that outlines the differences 
in using these two commands. Here's a suggestion:


This change primarily separates the creation/opening of the backup
target into explicit, separate steps. BlockdevBackup uses mostly the
same arguments as DriveBackup, except that the "format" and "mode"
options are removed in favor of explicit "blockdev-create" and
"blockdev-add" calls.




(Here, I accidentally used the names of the argument objects instead of 
the names of the commands. It's likely better to spell out the names of 
the commands instead.)


The "target" argument changes semantics. It no longer accepts
filenames, and now accepts arbitrary node names in addition to
device names.



4) Also not a request: If we want to go above and beyond, it might 
be nice to spell out the exact steps required to transition from 
the old interface to the new one. Here's a (hasty) suggestion for 
how that might look:


- The MODE argument is deprecated.
   - "existing" is replaced by using "blockdev-add" commands.
   - "absolute-paths" is replaced by using "blockdev-add" and
     "blockdev-create" commands.

- The FORMAT argument is deprecated.
   - Format information is given to "blockdev-add"/"blockdev-create".

- The TARGET argument has new semantics:
   - Filenames are no longer supported; use blockdev-add/blockdev-create
     as necessary instead.
   - Device targets remain supported.


Example:

drive-backup $ARGS format=$FORMAT mode=$MODE target=$FILENAME becomes:

(taking some liberties with syntax to just illustrate the idea ...)

blockdev-create options={
     "driver": "file",
     "filename": $FILENAME,
     "size": 0,
}

blockdev-add arguments={
     

Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread John Snow

On 4/26/21 2:05 PM, Daniel P. Berrangé wrote:

On Mon, Apr 26, 2021 at 09:00:36PM +0300, Vladimir Sementsov-Ogievskiy wrote:

26.04.2021 20:34, John Snow wrote:

On 4/23/21 8:59 AM, Vladimir Sementsov-Ogievskiy wrote:

The modern way is to use blockdev-add + blockdev-backup, which provides a
lot more control over how the target is opened.

As an example of drive-backup problems, consider the following:

A user of drive-backup expects the target to be opened in the same
cache and aio mode as the source. The corresponding logic is in
drive_backup_prepare(), where we take bs->open_flags of the source.

It works rather badly if the source was added by blockdev-add. Assume the
source is a qcow2 image. On blockdev-add we should specify aio and cache
options for the file child of the qcow2 node. What happens next:

drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
But there is neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO: BDRV_O_NOCACHE
is placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
as file-posix parses the options and simply sets s->use_linux_aio.



No complaints from me, especially if Virtuozzo is on board. I would like to see 
some documentation changes alongside this deprecation, though.


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! I remember I suggested deprecating drive-backup some time ago,
and nobody complained. But that old patch was inside a series with
other, more questionable deprecations, and it did not land.

Let's finally deprecate what should have been deprecated long ago.

We have now faced a problem in our downstream, described in the commit message.
In downstream I've fixed it by simply enabling O_DIRECT and linux_aio
unconditionally for the drive_backup target. But actually this just shows
that using drive-backup in the blockdev era is a bad idea. So let's motivate
everyone (including Virtuozzo, of course) to move to the new interfaces and
avoid problems with all that outdated option inheritance.

   docs/system/deprecated.rst | 5 +
   qapi/block-core.json   | 5 -
   2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 80cae86252..b6f5766e17 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -186,6 +186,11 @@ Use the more generic commands ``block-export-add`` and ``block-export-del``
   instead.  As part of this deprecation, where ``nbd-server-add`` used a
   single ``bitmap``, the new ``block-export-add`` uses a list of ``bitmaps``.
+``drive-backup`` (since 6.0)
+
+
+Use ``blockdev-backup`` in pair with ``blockdev-add`` instead.
+


1) Let's add a sphinx reference to 
https://qemu-project.gitlab.io/qemu/interop/live-block-operations.html#live-disk-backup-drive-backup-and-blockdev-backup


2) Just a thought, not a request: we may also wish to update
https://qemu-project.gitlab.io/qemu/interop/bitmaps.html to use the new,
preferred method. However, this doc is a bit old and in need of an overhaul
anyway (especially to add the NBD pull workflow). Can we ask Kashyap to help
us here, if he has time?


3) Let's add a small explanation here that outlines the differences in using 
these two commands. Here's a suggestion:

This change primarily separates the creation/opening of the backup target into explicit, separate steps.
BlockdevBackup uses mostly the same arguments as DriveBackup, except that the "format" and "mode"
options are removed in favor of explicit "blockdev-create" and "blockdev-add" calls.

The "target" argument changes semantics. It no longer accepts filenames, and
now accepts arbitrary node names in addition to device names.


4) Also not a request: If we want to go above and beyond, it might be nice to 
spell out the exact steps required to transition from the old interface to the 
new one. Here's a (hasty) suggestion for how that might look:

- The MODE argument is deprecated.
   - "existing" is replaced by using "blockdev-add" commands.
   - "absolute-paths" is replaced by using "blockdev-add" and
     "blockdev-create" commands.

- The FORMAT argument is deprecated.
   - Format information is given to "blockdev-add"/"blockdev-create".

- The TARGET argument has new semantics:
   - Filenames are no longer supported; use blockdev-add/blockdev-create
     as necessary instead.
   - Device targets remain supported.


Example:

drive-backup $ARGS format=$FORMAT mode=$MODE target=$FILENAME becomes:

(taking some liberties with syntax to just illustrate the idea ...)

blockdev-create options={
     "driver": "file",
     "filename": $FILENAME,
     "size": 0,
}

blockdev-add arguments={
     "driver": "file",
     "filename": $FILENAME,
     "node-name": "Example_Filenode0"
}

blockdev-create options={
     "driver": $FORMAT,
     "file": "Example_Filenode0",
     "size": $SIZE,
}

blockdev-add arguments={
     "driver": $FORMAT,
     "file": "Example_Filenode0",
     "node-name": 

Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread Daniel P . Berrangé
On Mon, Apr 26, 2021 at 09:00:36PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> 26.04.2021 20:34, John Snow wrote:
> > On 4/23/21 8:59 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > The modern way is to use blockdev-add + blockdev-backup, which provides a
> > > lot more control over how the target is opened.
> > > 
> > > As an example of drive-backup problems, consider the following:
> > > 
> > > A user of drive-backup expects the target to be opened in the same
> > > cache and aio mode as the source. The corresponding logic is in
> > > drive_backup_prepare(), where we take bs->open_flags of the source.
> > > 
> > > It works rather badly if the source was added by blockdev-add. Assume the
> > > source is a qcow2 image. On blockdev-add we should specify aio and cache
> > > options for the file child of the qcow2 node. What happens next:
> > > 
> > > drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
> > > But there is neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO: BDRV_O_NOCACHE
> > > is placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
> > > as file-posix parses the options and simply sets s->use_linux_aio.
> > > 
> > 
> > No complaints from me, especially if Virtuozzo is on board. I would like to 
> > see some documentation changes alongside this deprecation, though.
> > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy 
> > > ---
> > > 
> > > Hi all! I remember I suggested deprecating drive-backup some time ago,
> > > and nobody complained. But that old patch was inside a series with
> > > other, more questionable deprecations, and it did not land.
> > > 
> > > Let's finally deprecate what should have been deprecated long ago.
> > > 
> > > We have now faced a problem in our downstream, described in the commit
> > > message. In downstream I've fixed it by simply enabling O_DIRECT and
> > > linux_aio unconditionally for the drive_backup target. But actually this
> > > just shows that using drive-backup in the blockdev era is a bad idea. So
> > > let's motivate everyone (including Virtuozzo, of course) to move to the new
> > > interfaces and avoid problems with all that outdated option inheritance.
> > > 
> > >   docs/system/deprecated.rst | 5 +
> > >   qapi/block-core.json   | 5 -
> > >   2 files changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
> > > index 80cae86252..b6f5766e17 100644
> > > --- a/docs/system/deprecated.rst
> > > +++ b/docs/system/deprecated.rst
> > > @@ -186,6 +186,11 @@ Use the more generic commands ``block-export-add`` and ``block-export-del``
> > >   instead.  As part of this deprecation, where ``nbd-server-add`` used a
> > >   single ``bitmap``, the new ``block-export-add`` uses a list of ``bitmaps``.
> > > +``drive-backup`` (since 6.0)
> > > +
> > > +
> > > +Use ``blockdev-backup`` in pair with ``blockdev-add`` instead.
> > > +
> > 
> > 1) Let's add a sphinx reference to 
> > https://qemu-project.gitlab.io/qemu/interop/live-block-operations.html#live-disk-backup-drive-backup-and-blockdev-backup
> > 
> > 
> > 2) Just a thought, not a request: we may also wish to update
> > https://qemu-project.gitlab.io/qemu/interop/bitmaps.html to use the new,
> > preferred method. However, this doc is a bit old and in need of an
> > overhaul anyway (especially to add the NBD pull workflow). Can we ask
> > Kashyap to help us here, if he has time?
> > 
> > 
> > 3) Let's add a small explanation here that outlines the differences in 
> > using these two commands. Here's a suggestion:
> > 
> > This change primarily separates the creation/opening of the backup
> > target into explicit, separate steps. BlockdevBackup uses mostly the same
> > arguments as DriveBackup, except that the "format" and "mode" options are
> > removed in favor of explicit "blockdev-create" and "blockdev-add"
> > calls.
> > 
> > The "target" argument changes semantics. It no longer accepts filenames,
> > and now accepts arbitrary node names in addition to device
> > names.
> > 
> > 
> > 4) Also not a request: If we want to go above and beyond, it might be nice 
> > to spell out the exact steps required to transition from the old interface 
> > to the new one. Here's a (hasty) suggestion for how that might look:
> > 
> > - The MODE argument is deprecated.
> >   - "existing" is replaced by using "blockdev-add" commands.
> >   - "absolute-paths" is replaced by using "blockdev-add" and
> >     "blockdev-create" commands.
> > 
> > - The FORMAT argument is deprecated.
> >   - Format information is given to "blockdev-add"/"blockdev-create".
> > 
> > - The TARGET argument has new semantics:
> >   - Filenames are no longer supported; use blockdev-add/blockdev-create
> >     as necessary instead.
> >   - Device targets remain supported.
> > 
> > 
> > Example:
> > 
> > drive-backup $ARGS format=$FORMAT mode=$MODE target=$FILENAME becomes:
> > 
> > (taking some liberties 

Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

26.04.2021 20:34, John Snow wrote:

On 4/23/21 8:59 AM, Vladimir Sementsov-Ogievskiy wrote:

The modern way is to use blockdev-add + blockdev-backup, which provides a
lot more control over how the target is opened.

As an example of drive-backup problems, consider the following:

A user of drive-backup expects the target to be opened in the same
cache and aio mode as the source. The corresponding logic is in
drive_backup_prepare(), where we take bs->open_flags of the source.

It works rather badly if the source was added by blockdev-add. Assume the
source is a qcow2 image. On blockdev-add we should specify aio and cache
options for the file child of the qcow2 node. What happens next:

drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
But there is neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO: BDRV_O_NOCACHE
is placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
as file-posix parses the options and simply sets s->use_linux_aio.



No complaints from me, especially if Virtuozzo is on board. I would like to see 
some documentation changes alongside this deprecation, though.


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! I remember I suggested deprecating drive-backup some time ago,
and nobody complained. But that old patch was inside a series with
other, more questionable deprecations, and it did not land.

Let's finally deprecate what should have been deprecated long ago.

We have now faced a problem in our downstream, described in the commit message.
In downstream I've fixed it by simply enabling O_DIRECT and linux_aio
unconditionally for the drive_backup target. But actually this just shows
that using drive-backup in the blockdev era is a bad idea. So let's motivate
everyone (including Virtuozzo, of course) to move to the new interfaces and
avoid problems with all that outdated option inheritance.

  docs/system/deprecated.rst | 5 +
  qapi/block-core.json   | 5 -
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 80cae86252..b6f5766e17 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -186,6 +186,11 @@ Use the more generic commands ``block-export-add`` and ``block-export-del``
  instead.  As part of this deprecation, where ``nbd-server-add`` used a
  single ``bitmap``, the new ``block-export-add`` uses a list of ``bitmaps``.
+``drive-backup`` (since 6.0)
+
+
+Use ``blockdev-backup`` in pair with ``blockdev-add`` instead.
+


1) Let's add a sphinx reference to 
https://qemu-project.gitlab.io/qemu/interop/live-block-operations.html#live-disk-backup-drive-backup-and-blockdev-backup


2) Just a thought, not a request: We also may wish to update
https://qemu-project.gitlab.io/qemu/interop/bitmaps.html to use the new,
preferred method. However, this doc is a bit old and needs an overhaul
anyway (especially to add the NBD pull workflow), so can we ask Kashyap
to help us here, if he has time?


3) Let's add a small explanation here that outlines the differences in using 
these two commands. Here's a suggestion:

This change primarily splits the creation and opening of the backup target
into explicit, separate steps. BlockdevBackup uses mostly the same arguments
as DriveBackup, except that the "format" and "mode" options are removed in
favor of explicit "blockdev-create" and "blockdev-add" calls.

The "target" argument changes semantics. It no longer accepts filenames, and
now accepts arbitrary node names in addition to device names.


4) Also not a request: If we want to go above and beyond, it might be nice to 
spell out the exact steps required to transition from the old interface to the 
new one. Here's a (hasty) suggestion for how that might look:

- The MODE argument is deprecated.
  - "existing" is replaced by using "blockdev-add" commands.
  - "absolute-paths" is replaced by using "blockdev-add" and
    "blockdev-create" commands.

- The FORMAT argument is deprecated.
  - Format information is given to "blockdev-add"/"blockdev-create".

- The TARGET argument has new semantics:
  - Filenames are no longer supported; use blockdev-add/blockdev-create
    as necessary instead.
  - Device targets remain supported.


Example:

drive-backup $ARGS format=$FORMAT mode=$MODE target=$FILENAME becomes:

(taking some liberties with syntax to just illustrate the idea ...)

blockdev-create options={
    "driver": "file",
    "filename": $FILENAME,
    "size": 0,
}

blockdev-add arguments={
    "driver": "file",
    "filename": $FILENAME,
    "node-name": "Example_Filenode0"
}

blockdev-create options={
    "driver": $FORMAT,
    "file": "Example_Filenode0",
    "size": $SIZE,
}

blockdev-add arguments={
    "driver": $FORMAT,
    "file": "Example_Filenode0",
    "node-name": "Example_Formatnode0",
}

blockdev-backup arguments={
    $ARGS ...,
    "target": "Example_Formatnode0",
}



Good ideas. Hmm. Do you think that the whole 

Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread John Snow

On 4/23/21 8:59 AM, Vladimir Sementsov-Ogievskiy wrote:

The modern way is to use blockdev-add + blockdev-backup, which provides a
lot more control over how the target is opened.

As an example of drive-backup problems, consider the following:

A user of drive-backup expects the target to be opened in the same
cache and aio mode as the source. The corresponding logic is in
drive_backup_prepare(), where we take bs->open_flags of the source.

This works rather badly if the source was added by blockdev-add. Assume
the source is a qcow2 image. On blockdev-add we should specify aio and
cache options for the file child of the qcow2 node. What happens next:

drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
But there is neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO: BDRV_O_NOCACHE is
placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
as file-posix parses the options and simply sets s->use_linux_aio.
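
To make the flag-inheritance problem concrete, here is a minimal C sketch. The types, field names and the SKETCH_O_NOCACHE value are hypothetical stand-ins, not QEMU's BlockDriverState or BDRV_O_* definitions. It shows that when the cache flag is recorded only on the file child, reading open_flags of the qcow2 node finds nothing, while the AIO mode lives in driver-private state that no flag lookup can see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; not QEMU's real BDRV_O_NOCACHE. */
#define SKETCH_O_NOCACHE 0x1

/* Hypothetical, heavily simplified stand-in for a block node. */
typedef struct SketchNode {
    const char *driver;       /* e.g. "qcow2" or "file" */
    int open_flags;           /* flags recorded on this node only */
    bool use_linux_aio;       /* driver-private state, never an open flag */
    struct SketchNode *file;  /* protocol child, NULL for a leaf node */
} SketchNode;

/* Roughly what a drive_backup_prepare()-style lookup sees: the node itself. */
static int flags_on_node(const SketchNode *bs)
{
    return bs->open_flags;
}

/* Walking down the file chain does find the cache flag set on the child. */
static int flags_in_file_chain(const SketchNode *bs)
{
    int flags = 0;

    for (; bs != NULL; bs = bs->file) {
        flags |= bs->open_flags;
    }
    return flags;
}
```

Even the chain walk only recovers the cache flag; the AIO mode stays invisible, which is one more argument for letting the user open the target explicitly with blockdev-add.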



No complaints from me, especially if Virtuozzo is on board. I would like 
to see some documentation changes alongside this deprecation, though.



Signed-off-by: Vladimir Sementsov-Ogievskiy 
---

Hi all! I remember I suggested deprecating drive-backup some time ago,
and nobody complained. But that old patch was inside a series with
other, more questionable deprecations, and it did not land.

Let's finally deprecate what should have been deprecated long ago.

We have now faced a problem in our downstream, described in the commit
message. In downstream I've fixed it by simply enabling O_DIRECT and
linux_aio unconditionally for the drive_backup target. But actually this
just shows that using drive-backup in the blockdev era is a bad idea. So
let's motivate everyone (including Virtuozzo, of course) to move to the
new interfaces and avoid problems with all that outdated option inheritance.

  docs/system/deprecated.rst | 5 +
  qapi/block-core.json   | 5 -
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/docs/system/deprecated.rst b/docs/system/deprecated.rst
index 80cae86252..b6f5766e17 100644
--- a/docs/system/deprecated.rst
+++ b/docs/system/deprecated.rst
@@ -186,6 +186,11 @@ Use the more generic commands ``block-export-add`` and ``block-export-del``
  instead.  As part of this deprecation, where ``nbd-server-add`` used a
  single ``bitmap``, the new ``block-export-add`` uses a list of ``bitmaps``.
  
+``drive-backup`` (since 6.0)
+
+
+Use ``blockdev-backup`` in pair with ``blockdev-add`` instead.
+


1) Let's add a sphinx reference to 
https://qemu-project.gitlab.io/qemu/interop/live-block-operations.html#live-disk-backup-drive-backup-and-blockdev-backup



2) Just a thought, not a request: We also may wish to update
https://qemu-project.gitlab.io/qemu/interop/bitmaps.html to use the new,
preferred method. However, this doc is a bit old and needs an overhaul
anyway (especially to add the NBD pull workflow), so can we ask Kashyap
to help us here, if he has time?



3) Let's add a small explanation here that outlines the differences in 
using these two commands. Here's a suggestion:


This change primarily splits the creation and opening of the backup
target into explicit, separate steps. BlockdevBackup uses mostly the
same arguments as DriveBackup, except that the "format" and "mode"
options are removed in favor of explicit "blockdev-create" and
"blockdev-add" calls.


The "target" argument changes semantics. It no longer accepts filenames,
and now accepts arbitrary node names in addition to device names.



4) Also not a request: If we want to go above and beyond, it might be 
nice to spell out the exact steps required to transition from the old 
interface to the new one. Here's a (hasty) suggestion for how that might 
look:


- The MODE argument is deprecated.
  - "existing" is replaced by using "blockdev-add" commands.
  - "absolute-paths" is replaced by using "blockdev-add" and
    "blockdev-create" commands.

- The FORMAT argument is deprecated.
  - Format information is given to "blockdev-add"/"blockdev-create".

- The TARGET argument has new semantics:
  - Filenames are no longer supported; use blockdev-add/blockdev-create
    as necessary instead.
  - Device targets remain supported.


Example:

drive-backup $ARGS format=$FORMAT mode=$MODE target=$FILENAME becomes:

(taking some liberties with syntax to just illustrate the idea ...)

blockdev-create options={
    "driver": "file",
    "filename": $FILENAME,
    "size": 0,
}

blockdev-add arguments={
    "driver": "file",
    "filename": $FILENAME,
    "node-name": "Example_Filenode0"
}

blockdev-create options={
    "driver": $FORMAT,
    "file": "Example_Filenode0",
    "size": $SIZE,
}

blockdev-add arguments={
    "driver": $FORMAT,
    "file": "Example_Filenode0",
    "node-name": "Example_Formatnode0",
}

blockdev-backup arguments={
    $ARGS ...,
    "target": "Example_Formatnode0",
}
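
For reference, the same sequence can be spelled out as real QMP messages instead of the loose syntax used in the suggestion. The C sketch below only formats them into a buffer. The job ids, the node names target-file and target, and sync=full are illustrative choices, not anything the patch defines. blockdev-create runs as a background job, so a real client has to wait for each create job to conclude (and dismiss it with job-dismiss) before sending the next command; the sketch does not model that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the QMP commands that replace
 *   drive-backup device=DEV format=FMT mode=absolute-paths target=FILE
 * into buf. Job ids and node names are illustrative. Returns the
 * snprintf() result, so a value >= len means the buffer was too small.
 */
static int format_backup_qmp(char *buf, size_t len, const char *device,
                             const char *filename, const char *format,
                             unsigned long long size)
{
    return snprintf(buf, len,
        /* 1. create the (empty) protocol-level file */
        "{\"execute\": \"blockdev-create\", \"arguments\": {"
        "\"job-id\": \"create-file\", \"options\": {"
        "\"driver\": \"file\", \"filename\": \"%s\", \"size\": 0}}}\n"
        /* 2. open it as a protocol node */
        "{\"execute\": \"blockdev-add\", \"arguments\": {"
        "\"driver\": \"file\", \"filename\": \"%s\", "
        "\"node-name\": \"target-file\"}}\n"
        /* 3. format it with the requested image format and size */
        "{\"execute\": \"blockdev-create\", \"arguments\": {"
        "\"job-id\": \"create-fmt\", \"options\": {"
        "\"driver\": \"%s\", \"file\": \"target-file\", \"size\": %llu}}}\n"
        /* 4. open the format node that will be the backup target */
        "{\"execute\": \"blockdev-add\", \"arguments\": {"
        "\"driver\": \"%s\", \"file\": \"target-file\", "
        "\"node-name\": \"target\"}}\n"
        /* 5. start the backup job */
        "{\"execute\": \"blockdev-backup\", \"arguments\": {"
        "\"job-id\": \"backup0\", \"device\": \"%s\", "
        "\"target\": \"target\", \"sync\": \"full\"}}\n",
        filename, filename, format, size, format, device);
}
```

As in the pseudo-syntax version, the file node is created with size 0 and only the format layer receives the real image size.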



  System accelerators
  ---
  
diff --git 

Re: [PATCH v3 22/36] block: add bdrv_remove_filter_or_cow transaction action

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

26.04.2021 19:26, Kevin Wolf wrote:

On 17.03.2021 at 15:35, Vladimir Sementsov-Ogievskiy wrote:

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block.c | 78 +++--
  1 file changed, 76 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 11f7ad0818..2fca1f2ad5 100644
--- a/block.c
+++ b/block.c
@@ -2929,12 +2929,19 @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
  }
  }
  
+static void bdrv_child_free(void *opaque)
+{
+BdrvChild *c = opaque;
+
+g_free(c->name);
+g_free(c);
+}
+
  static void bdrv_remove_empty_child(BdrvChild *child)
  {
  assert(!child->bs);
  QLIST_SAFE_REMOVE(child, next);
-g_free(child->name);
-g_free(child);
+bdrv_child_free(child);
  }
  
  typedef struct BdrvAttachChildCommonState {
@@ -4956,6 +4963,73 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
  return ret;
  }
  
+typedef struct BdrvRemoveFilterOrCowChild {
+BdrvChild *child;
+bool is_backing;
+} BdrvRemoveFilterOrCowChild;
+
+/* this doesn't restore original child bs, only the child itself */


Hm, this comment tells me that it's intentional, but why is it correct?


That's because bdrv_remove_filter_or_cow_child_abort() aborts only part of
bdrv_remove_filter_or_cow_child().

Look: bdrv_remove_filter_or_cow_child() first does
bdrv_replace_child_safe(child, NULL, tran), so the bs will be restored by
the .abort() of the bdrv_replace_child_safe() action.


So, an improved comment may look like:

This doesn't restore the original child bs, only the child itself. The bs
will be restored by the .abort() of the bdrv_replace_child_safe() sub-action
of the bdrv_remove_filter_or_cow_child() action.

Probably it would be more correct to rename

BdrvRemoveFilterOrCowChild -> BdrvRemoveFilterOrCowChildNoBs
bdrv_remove_filter_or_cow_child_abort -> bdrv_remove_filter_or_cow_child_no_bs_abort
bdrv_remove_filter_or_cow_child_commit -> bdrv_remove_filter_or_cow_child_no_bs_commit

and assert in .abort() and .commit() that s->child->bs is NULL.
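
The ordering argument can be illustrated with a toy transaction. This is a hypothetical C model, not QEMU's Transaction API or the patch's code; it assumes abort handlers run in reverse order of registration. Under that assumption the "remove child" abort runs first and only re-links the child (asserting that bs is still NULL, as suggested above), and the earlier "replace child bs" abort then restores bs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for BdrvChild. */
typedef struct Child {
    const char *bs;   /* node the child points to; NULL once replaced */
    bool attached;    /* linked into the parent's children list */
} Child;

typedef void (*AbortFn)(void *opaque);

/* Toy transaction: abort handlers run last-registered first. */
typedef struct Tran {
    struct { AbortFn abort; void *opaque; } actions[8];
    int n;
} Tran;

static void tran_push(Tran *t, AbortFn fn, void *opaque)
{
    assert(t->n < 8);
    t->actions[t->n].abort = fn;
    t->actions[t->n].opaque = opaque;
    t->n++;
}

static void tran_abort_all(Tran *t)
{
    while (t->n > 0) {
        t->n--;
        t->actions[t->n].abort(t->actions[t->n].opaque);
    }
}

typedef struct ReplaceState {
    Child *child;
    const char *old_bs;
} ReplaceState;

/* Sub-action 1's abort: the only place that restores bs. */
static void replace_bs_abort(void *opaque)
{
    ReplaceState *s = opaque;
    s->child->bs = s->old_bs;
}

/* Sub-action 2's abort: re-links the child; bs must still be NULL here. */
static void remove_child_no_bs_abort(void *opaque)
{
    Child *c = opaque;
    assert(c->bs == NULL);
    c->attached = true;
}

/* Same order as described above: first drop bs, then unlink the child. */
static void remove_child(Tran *t, Child *c, ReplaceState *rs)
{
    rs->child = c;
    rs->old_bs = c->bs;
    c->bs = NULL;
    tran_push(t, replace_bs_abort, rs);

    c->attached = false;
    tran_push(t, remove_child_no_bs_abort, c);
}
```

Each abort handler undoes exactly its own step, which is why the child-removal abort can legitimately leave bs alone.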




+static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
+{
+BdrvRemoveFilterOrCowChild *s = opaque;
+BlockDriverState *parent_bs = s->child->opaque;
+
+QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
+if (s->is_backing) {
+parent_bs->backing = s->child;
+} else {
+parent_bs->file = s->child;
+}
+}


Kevin




--
Best regards,
Vladimir



Re: [PATCH v3 18/36] block: add bdrv_attach_child_common() transaction action

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

26.04.2021 19:14, Kevin Wolf wrote:

On 17.03.2021 at 15:35, Vladimir Sementsov-Ogievskiy wrote:

Split out the no-perm part of bdrv_root_attach_child() into a separate
transaction action. bdrv_root_attach_child() now moves to the new
permission update paradigm: first update graph relations, then update
permissions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block.c | 189 
  1 file changed, 135 insertions(+), 54 deletions(-)

diff --git a/block.c b/block.c
index 98ff44dbf7..b6bdc534d2 100644
--- a/block.c
+++ b/block.c
@@ -2921,37 +2921,73 @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
  }
  }
  
-/*
- * This function steals the reference to child_bs from the caller.
- * That reference is later dropped by bdrv_root_unref_child().
- *
- * On failure NULL is returned, errp is set and the reference to
- * child_bs is also dropped.
- *
- * The caller must hold the AioContext lock @child_bs, but not that of @ctx
- * (unless @child_bs is already in @ctx).
- */
-BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
-  const char *child_name,
-  const BdrvChildClass *child_class,
-  BdrvChildRole child_role,
-  uint64_t perm, uint64_t shared_perm,
-  void *opaque, Error **errp)
+static void bdrv_remove_empty_child(BdrvChild *child)
  {
-BdrvChild *child;
-Error *local_err = NULL;
-int ret;
-AioContext *ctx;
+assert(!child->bs);
+QLIST_SAFE_REMOVE(child, next);
+g_free(child->name);
+g_free(child);
+}
  
-ret = bdrv_check_update_perm(child_bs, NULL, perm, shared_perm, NULL, errp);
-if (ret < 0) {
-bdrv_abort_perm_update(child_bs);
-bdrv_unref(child_bs);
-return NULL;
+typedef struct BdrvAttachChildCommonState {
+BdrvChild **child;
+AioContext *old_parent_ctx;
+AioContext *old_child_ctx;
+} BdrvAttachChildCommonState;
+
+static void bdrv_attach_child_common_abort(void *opaque)
+{
+BdrvAttachChildCommonState *s = opaque;
+BdrvChild *child = *s->child;
+BlockDriverState *bs = child->bs;
+
+bdrv_replace_child_noperm(child, NULL);
+
+if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
+bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
  }
  
-child = g_new(BdrvChild, 1);
-*child = (BdrvChild) {
+if (bdrv_child_get_parent_aio_context(child) != s->old_parent_ctx) {
+GSList *ignore = g_slist_prepend(NULL, child);
+
+child->klass->can_set_aio_ctx(child, s->old_parent_ctx, &ignore,
+  &error_abort);
+g_slist_free(ignore);
+ignore = g_slist_prepend(NULL, child);
+child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
+
+g_slist_free(ignore);
+}
+
+bdrv_unref(bs);
+bdrv_remove_empty_child(child);
+*s->child = NULL;
+}
+
+static TransactionActionDrv bdrv_attach_child_common_drv = {
+.abort = bdrv_attach_child_common_abort,
+};
+
+/*
+ * Common part of attaching bdrv child to bs or to blk or to job
+ */
+static int bdrv_attach_child_common(BlockDriverState *child_bs,
+const char *child_name,
+const BdrvChildClass *child_class,
+BdrvChildRole child_role,
+uint64_t perm, uint64_t shared_perm,
+void *opaque, BdrvChild **child,
+Transaction *tran, Error **errp)
+{
+BdrvChild *new_child;
+AioContext *parent_ctx;
+AioContext *child_ctx = bdrv_get_aio_context(child_bs);
+
+assert(child);
+assert(*child == NULL);
+
+new_child = g_new(BdrvChild, 1);
+*new_child = (BdrvChild) {
  .bs = NULL,
  .name   = g_strdup(child_name),
  .klass  = child_class,
@@ -2961,37 +2997,92 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
  .opaque = opaque,
  };
  
-ctx = bdrv_child_get_parent_aio_context(child);
-
-/* If the AioContexts don't match, first try to move the subtree of
+/*
+ * If the AioContexts don't match, first try to move the subtree of
   * child_bs into the AioContext of the new parent. If this doesn't work,
- * try moving the parent into the AioContext of child_bs instead. */
-if (bdrv_get_aio_context(child_bs) != ctx) {
-ret = bdrv_try_set_aio_context(child_bs, ctx, &local_err);
+ * try moving the parent into the AioContext of child_bs instead.
+ */
+parent_ctx = bdrv_child_get_parent_aio_context(new_child);
+if (child_ctx != parent_ctx) {
+Error *local_err = NULL;
+int ret = bdrv_try_set_aio_context(child_bs, parent_ctx, &local_err);
+
  if (ret < 0 && 

Re: [PATCH v3 22/36] block: add bdrv_remove_filter_or_cow transaction action

2021-04-26 Thread Kevin Wolf
On 17.03.2021 at 15:35, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block.c | 78 +++--
>  1 file changed, 76 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 11f7ad0818..2fca1f2ad5 100644
> --- a/block.c
> +++ b/block.c
> @@ -2929,12 +2929,19 @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
>  }
>  }
>  
> +static void bdrv_child_free(void *opaque)
> +{
> +BdrvChild *c = opaque;
> +
> +g_free(c->name);
> +g_free(c);
> +}
> +
>  static void bdrv_remove_empty_child(BdrvChild *child)
>  {
>  assert(!child->bs);
>  QLIST_SAFE_REMOVE(child, next);
> -g_free(child->name);
> -g_free(child);
> +bdrv_child_free(child);
>  }
>  
>  typedef struct BdrvAttachChildCommonState {
> @@ -4956,6 +4963,73 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
>  return ret;
>  }
>  
> +typedef struct BdrvRemoveFilterOrCowChild {
> +BdrvChild *child;
> +bool is_backing;
> +} BdrvRemoveFilterOrCowChild;
> +
> +/* this doesn't restore original child bs, only the child itself */

Hm, this comment tells me that it's intentional, but why is it correct?

> +static void bdrv_remove_filter_or_cow_child_abort(void *opaque)
> +{
> +BdrvRemoveFilterOrCowChild *s = opaque;
> +BlockDriverState *parent_bs = s->child->opaque;
> +
> +QLIST_INSERT_HEAD(&parent_bs->children, s->child, next);
> +if (s->is_backing) {
> +parent_bs->backing = s->child;
> +} else {
> +parent_bs->file = s->child;
> +}
> +}

Kevin




Re: [PATCH v3 18/36] block: add bdrv_attach_child_common() transaction action

2021-04-26 Thread Kevin Wolf
On 17.03.2021 at 15:35, Vladimir Sementsov-Ogievskiy wrote:
> Split out the no-perm part of bdrv_root_attach_child() into a separate
> transaction action. bdrv_root_attach_child() now moves to the new
> permission update paradigm: first update graph relations, then update
> permissions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block.c | 189 
>  1 file changed, 135 insertions(+), 54 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 98ff44dbf7..b6bdc534d2 100644
> --- a/block.c
> +++ b/block.c
> @@ -2921,37 +2921,73 @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
>  }
>  }
>  
> -/*
> - * This function steals the reference to child_bs from the caller.
> - * That reference is later dropped by bdrv_root_unref_child().
> - *
> - * On failure NULL is returned, errp is set and the reference to
> - * child_bs is also dropped.
> - *
> - * The caller must hold the AioContext lock @child_bs, but not that of @ctx
> - * (unless @child_bs is already in @ctx).
> - */
> -BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
> -  const char *child_name,
> -  const BdrvChildClass *child_class,
> -  BdrvChildRole child_role,
> -  uint64_t perm, uint64_t shared_perm,
> -  void *opaque, Error **errp)
> +static void bdrv_remove_empty_child(BdrvChild *child)
>  {
> -BdrvChild *child;
> -Error *local_err = NULL;
> -int ret;
> -AioContext *ctx;
> +assert(!child->bs);
> +QLIST_SAFE_REMOVE(child, next);
> +g_free(child->name);
> +g_free(child);
> +}
>  
> -ret = bdrv_check_update_perm(child_bs, NULL, perm, shared_perm, NULL, errp);
> -if (ret < 0) {
> -bdrv_abort_perm_update(child_bs);
> -bdrv_unref(child_bs);
> -return NULL;
> +typedef struct BdrvAttachChildCommonState {
> +BdrvChild **child;
> +AioContext *old_parent_ctx;
> +AioContext *old_child_ctx;
> +} BdrvAttachChildCommonState;
> +
> +static void bdrv_attach_child_common_abort(void *opaque)
> +{
> +BdrvAttachChildCommonState *s = opaque;
> +BdrvChild *child = *s->child;
> +BlockDriverState *bs = child->bs;
> +
> +bdrv_replace_child_noperm(child, NULL);
> +
> +if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
> +bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
>  }
>  
> -child = g_new(BdrvChild, 1);
> -*child = (BdrvChild) {
> +if (bdrv_child_get_parent_aio_context(child) != s->old_parent_ctx) {
> +GSList *ignore = g_slist_prepend(NULL, child);
> +
> +child->klass->can_set_aio_ctx(child, s->old_parent_ctx, &ignore,
> +  &error_abort);
> +g_slist_free(ignore);
> +ignore = g_slist_prepend(NULL, child);
> +child->klass->set_aio_ctx(child, s->old_parent_ctx, &ignore);
> +
> +g_slist_free(ignore);
> +}
> +
> +bdrv_unref(bs);
> +bdrv_remove_empty_child(child);
> +*s->child = NULL;
> +}
> +
> +static TransactionActionDrv bdrv_attach_child_common_drv = {
> +.abort = bdrv_attach_child_common_abort,
> +};
> +
> +/*
> + * Common part of attaching bdrv child to bs or to blk or to job
> + */
> +static int bdrv_attach_child_common(BlockDriverState *child_bs,
> +const char *child_name,
> +const BdrvChildClass *child_class,
> +BdrvChildRole child_role,
> +uint64_t perm, uint64_t shared_perm,
> +void *opaque, BdrvChild **child,
> +Transaction *tran, Error **errp)
> +{
> +BdrvChild *new_child;
> +AioContext *parent_ctx;
> +AioContext *child_ctx = bdrv_get_aio_context(child_bs);
> +
> +assert(child);
> +assert(*child == NULL);
> +
> +new_child = g_new(BdrvChild, 1);
> +*new_child = (BdrvChild) {
>  .bs = NULL,
>  .name   = g_strdup(child_name),
>  .klass  = child_class,
> @@ -2961,37 +2997,92 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
>  .opaque = opaque,
>  };
>  
> -ctx = bdrv_child_get_parent_aio_context(child);
> -
> -/* If the AioContexts don't match, first try to move the subtree of
> +/*
> + * If the AioContexts don't match, first try to move the subtree of
>   * child_bs into the AioContext of the new parent. If this doesn't work,
> - * try moving the parent into the AioContext of child_bs instead. */
> -if (bdrv_get_aio_context(child_bs) != ctx) {
> -ret = bdrv_try_set_aio_context(child_bs, ctx, &local_err);
> + * try moving the parent into the AioContext of child_bs instead.
> + */
> +parent_ctx = 

Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Peter Maydell
On Fri, 23 Apr 2021 at 06:21, Klaus Jensen  wrote:
>
> From: Klaus Jensen 
>
> Commit 1901b4967c3f changed the nvme device from using an exclusive BAR
> for MSI-X to sharing bar0.
>
> Unfortunately, the msix_uninit_exclusive_bar() call remains in
> nvme_exit() which causes havoc when the device is removed with, say,
> device_del. Fix this.
>
> Additionally, a subregion is added but not removed on exit, which
> causes a reference to linger and the drive to never be unlocked.
>
> Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
> Signed-off-by: Klaus Jensen 
> ---
>  hw/block/nvme.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 624a1431d072..5fe082ec34c5 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>  if (n->pmr.dev) {
>  host_memory_backend_set_mapped(n->pmr.dev, false);
>  }
> -msix_uninit_exclusive_bar(pci_dev);
> +msix_uninit(pci_dev, &n->bar0, &n->bar0);
> +memory_region_del_subregion(&n->bar0, &n->iomem);
>  }
>
>  static Property nvme_props[] = {
> --

Applied this patch (but not patch 2) to master for 6.0; thanks.

-- PMM
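
Why a forgotten memory_region_del_subregion() keeps the drive locked can be modelled with a plain reference count. This hypothetical sketch assumes, as the commit message implies, that adding a subregion takes a reference that only removal drops; Region and its helpers are illustrative stand-ins, not QEMU's MemoryRegion API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted region; not QEMU's MemoryRegion. */
typedef struct Region {
    int refs;                 /* references that keep the owner alive */
    struct Region *container; /* region this one is mapped into, if any */
} Region;

/* Mapping a subregion pins it (and, by extension, its owning device). */
static void region_add_subregion(Region *container, Region *sub)
{
    assert(sub->container == NULL);
    sub->container = container;
    sub->refs++;
}

static void region_del_subregion(Region *container, Region *sub)
{
    assert(sub->container == container);
    sub->container = NULL;
    sub->refs--;
}

/* Only an unpinned region lets the device be finalized and the drive released. */
static bool owner_can_be_released(const Region *sub)
{
    return sub->refs == 0;
}
```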



Re: [PATCH 2/5] hw/pcmcia/microdrive: Register machine reset handler

2021-04-26 Thread Philippe Mathieu-Daudé
On 4/25/21 8:36 PM, Peter Maydell wrote:
> On Sat, 24 Apr 2021 at 17:22, Philippe Mathieu-Daudé  wrote:
>>
>> The abstract PCMCIA_CARD is a bus-less TYPE_DEVICE, so devices
>> implementing it are not reset automatically.
>> Register a reset handler so children get reset on machine reset.
>>
>> Note that the DSCM-1 device (TYPE_DSCM1), which inherits
>> TYPE_MICRODRIVE and PCMCIA_CARD, resets itself when a disk is
>> attached or detached, but was not resetting itself on machine
>> reset.
>>
>> It doesn't seem to be an issue, since it has been that way since the
>> device's QDev'ification 8 years ago, in commit d1f2c96a81a
>> ("pcmcia: QOM'ify PCMCIACardState and MicroDriveState").
>> Still, it is correct to use the API properly.
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/pcmcia/pcmcia.c | 25 +
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/hw/pcmcia/pcmcia.c b/hw/pcmcia/pcmcia.c
>> index 03d13e7d670..73656257227 100644
>> --- a/hw/pcmcia/pcmcia.c
>> +++ b/hw/pcmcia/pcmcia.c
>> @@ -6,14 +6,39 @@
>>
>>  #include "qemu/osdep.h"
>>  #include "qemu/module.h"
>> +#include "sysemu/reset.h"
>>  #include "hw/pcmcia.h"
>>
>> +static void pcmcia_card_reset_handler(void *dev)
>> +{
>> +device_legacy_reset(DEVICE(dev));
>> +}
>> +
>> +static void pcmcia_card_realize(DeviceState *dev, Error **errp)
>> +{
>> +qemu_register_reset(pcmcia_card_reset_handler, dev);
>> +}
>> +
>> +static void pcmcia_card_unrealize(DeviceState *dev)
>> +{
>> +qemu_unregister_reset(pcmcia_card_reset_handler, dev);
>> +}
> 
> Why isn't a pcmcia card something that plugs into a bus ?

No clue, looks like a very old device with unfinished qdev-ification?

See pxa2xx_pcmcia_attach():

/* Insert a new card into a slot */
int pxa2xx_pcmcia_attach(void *opaque, PCMCIACardState *card)
{
PXA2xxPCMCIAState *s = (PXA2xxPCMCIAState *) opaque;
PCMCIACardClass *pcc;

...
s->card = card;
pcc = PCMCIA_CARD_GET_CLASS(s->card);
...
s->card->slot = &s->slot;
pcc->attach(s->card);
...
}
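
The mechanism the patch relies on can be modelled in a few lines. In this hypothetical C sketch, sketch_register_reset() and sketch_unregister_reset() only mimic the shape of qemu_register_reset()/qemu_unregister_reset() from sysemu/reset.h; they are not QEMU's implementation. The point is that a device nobody reaches through a bus walk gets reset only if it registers itself in realize and unregisters in unrealize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a global reset-handler list. */
typedef void ResetFn(void *opaque);

typedef struct {
    ResetFn *fn;
    void *opaque;
} ResetEntry;

static ResetEntry reset_list[16];
static size_t reset_count;

static void sketch_register_reset(ResetFn *fn, void *opaque)
{
    assert(reset_count < 16);
    reset_list[reset_count++] = (ResetEntry){ fn, opaque };
}

static void sketch_unregister_reset(ResetFn *fn, void *opaque)
{
    for (size_t i = 0; i < reset_count; i++) {
        if (reset_list[i].fn == fn && reset_list[i].opaque == opaque) {
            reset_list[i] = reset_list[--reset_count];  /* order not kept */
            return;
        }
    }
}

/* Machine reset: call every registered handler. */
static void sketch_machine_reset(void)
{
    for (size_t i = 0; i < reset_count; i++) {
        reset_list[i].fn(reset_list[i].opaque);
    }
}

/* A bus-less card: nothing walks a bus to reach it, so it registers itself. */
typedef struct {
    int resets;
} Card;

static void card_reset_handler(void *opaque)
{
    ((Card *)opaque)->resets++;
}

static void card_realize(Card *c)
{
    sketch_register_reset(card_reset_handler, c);
}

static void card_unrealize(Card *c)
{
    sketch_unregister_reset(card_reset_handler, c);
}
```

Pairing registration with unrealize matters as much as the registration itself: a stale handler would otherwise be called with a freed device.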



Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Michael S. Tsirkin
On Mon, Apr 26, 2021 at 11:27:04AM +0200, Philippe Mathieu-Daudé wrote:
> On 4/26/21 6:40 AM, Klaus Jensen wrote:
> > On Apr 23 07:21, Klaus Jensen wrote:
> >> From: Klaus Jensen 
> >>
> >> Commit 1901b4967c3f changed the nvme device from using an exclusive BAR
> >> for MSI-X to sharing bar0.
> >>
> >> Unfortunately, the msix_uninit_exclusive_bar() call remains in
> >> nvme_exit() which causes havoc when the device is removed with, say,
> >> device_del. Fix this.
> >>
> >> Additionally, a subregion is added but not removed on exit, which
> >> causes a reference to linger and the drive to never be unlocked.
> >>
> >> Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
> >> Signed-off-by: Klaus Jensen 

Reviewed-by: Michael S. Tsirkin 

> >> ---
> >> hw/block/nvme.c | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> >> index 624a1431d072..5fe082ec34c5 100644
> >> --- a/hw/block/nvme.c
> >> +++ b/hw/block/nvme.c
> >> @@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
> >>     if (n->pmr.dev) {
> >>     host_memory_backend_set_mapped(n->pmr.dev, false);
> >>     }
> >> -    msix_uninit_exclusive_bar(pci_dev);
> >> +    msix_uninit(pci_dev, &n->bar0, &n->bar0);
> >> +    memory_region_del_subregion(&n->bar0, &n->iomem);
> >> }
> >>
> >> static Property nvme_props[] = {
> >> -- 
> >> 2.31.1
> >>
> > 
> > Ping for a review on this please :)
> 
> You forgot to Cc the maintainers :/ (doing it now).
> 
> $ ./scripts/get_maintainer.pl -f include/hw/pci/msix.h
> "Michael S. Tsirkin"  (supporter:PCI)
> Marcel Apfelbaum  (supporter:PCI)




Re: [PATCH 03/11] block/block-gen.h: bind monitor

2021-04-26 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> 24.04.2021 08:23, Markus Armbruster wrote:
>> Vladimir Sementsov-Ogievskiy  writes:
>> 
>>> If we have a current monitor, let's bind it to the wrapper coroutine too.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/block-gen.h | 10 ++
>>>   1 file changed, 10 insertions(+)
>>>
>>> diff --git a/block/block-gen.h b/block/block-gen.h
>>> index c1fd3f40de..61f055a8cc 100644
>>> --- a/block/block-gen.h
>>> +++ b/block/block-gen.h
>>> @@ -27,6 +27,7 @@
>>>   #define BLOCK_BLOCK_GEN_H
>>>   
>>>   #include "block/block_int.h"
>>> +#include "monitor/monitor.h"
>>>   
>>>   /* Base structure for argument packing structures */
>>>   typedef struct AioPollCo {
>>> @@ -38,11 +39,20 @@ typedef struct AioPollCo {
>>>   
>>>   static inline int aio_poll_co(AioPollCo *s)
>>>   {
>>> +Monitor *mon = monitor_cur();
>> 
>> This gets the currently executing coroutine's monitor from the hash
>> table.
>> 
>>>   assert(!qemu_in_coroutine());
>>>   
>>> +if (mon) {
>>> +monitor_set_cur(s->co, mon);
>> 
>> This writes it back.  No-op, since the coroutine hasn't changed.  Why?
>
> No. s->co != qemu_corotuine_current(), so it's not a write-back but the
> creation of a new entry in the hash map: s->co is a new coroutine which
> we are going to start.

Ah, that's what I missed.  Thanks!

[...]
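
The point Markus missed can be shown with a toy map keyed by coroutine. In this hypothetical sketch, plain arrays stand in for the hash table behind monitor_cur()/monitor_set_cur(), and pointers to ordinary variables stand in for coroutines. Setting the monitor for a different key adds a new entry instead of rewriting the caller's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the coroutine -> monitor hash table. */
typedef struct {
    const void *co;
    const void *mon;
} MonEntry;

static MonEntry mon_map[8];
static size_t mon_map_len;

static const void *sketch_monitor_for(const void *co)
{
    for (size_t i = 0; i < mon_map_len; i++) {
        if (mon_map[i].co == co) {
            return mon_map[i].mon;
        }
    }
    return NULL;
}

static void sketch_monitor_set(const void *co, const void *mon)
{
    for (size_t i = 0; i < mon_map_len; i++) {
        if (mon_map[i].co == co) {
            mon_map[i].mon = mon;   /* same key: a real write-back */
            return;
        }
    }
    assert(mon_map_len < 8);
    mon_map[mon_map_len++] = (MonEntry){ co, mon };  /* new key: new entry */
}
```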




Re: [PATCH v6 00/12] qcow2: fix parallel rewrite and discard (lockless)

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

22.04.2021 19:30, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

It's an alternative lock-less solution to
   [PATCH v4 0/3] qcow2: fix parallel rewrite and discard (rw-lock)

In v6 a lot of things are rewritten.

What is changed:

1. rename the feature to host_range_refcnt, move it to a separate file
2. better naming for everything (I hope)
3. cover reads, not only writes
4. do "ref" in qcow2_get_host_offset(), qcow2_alloc_host_offset(),
   qcow2_alloc_compressed_cluster_offset(), and have callers do "unref"
   appropriately.
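
A minimal sketch of the general idea behind item 4, assuming that a discard hitting a range still used by in-flight I/O is deferred until the last "unref". The struct and function names are hypothetical and this is not the series' code; the series keeps its state in a hash map, and its exact semantics may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-range state; the series keeps its state in a hash map. */
typedef struct HostRange {
    uint64_t host_offset;
    unsigned refcnt;       /* in-flight requests using this host range */
    bool discard_pending;  /* discard requested while still referenced */
    bool freed;            /* range handed back to the allocator */
} HostRange;

static void range_release(HostRange *r)
{
    r->discard_pending = false;
    r->freed = true;
}

/* "ref": taken where the host offset is looked up or allocated. */
static void range_ref(HostRange *r)
{
    assert(!r->freed);
    r->refcnt++;
}

/* "unref": done by the caller once its I/O has completed. */
static void range_unref(HostRange *r)
{
    assert(r->refcnt > 0);
    if (--r->refcnt == 0 && r->discard_pending) {
        range_release(r);
    }
}

/* Discard never frees a range that a reader or writer still uses. */
static void range_discard(HostRange *r)
{
    if (r->refcnt > 0) {
        r->discard_pending = true;
    } else {
        range_release(r);
    }
}
```

Covering reads as well as writes (item 3) matters in this model too: a read holding a reference is exactly what keeps a discarded cluster from being reallocated and overwritten underneath it.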




About performance: with this series we do extra allocations and hash-map
operations. Still, testing with

./build/qemu-img bench -c 100 -s 4K --image-opts driver=null-co,size=5G

and

./build/qemu-img bench -c 100 -s 4K -w --image-opts driver=null-co,size=5G

I see a difference of less than 1%.


--
Best regards,
Vladimir



Re: [PATCH] hw/block/nvme: fix csi field for cns 0x00 and 0x11

2021-04-26 Thread Klaus Jensen

On Apr 26 13:16, Gollu Appalanaidu wrote:

As per TP 4056d (Namespace Types), the CSI field shouldn't be used for
CNS 0x00 and CNS 0x11, but it is being used for these two Identify
command CNS values. Fix that.

Signed-off-by: Gollu Appalanaidu 
---
hw/nvme/ctrl.c | 11 ---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2e7498a73e..1657b1d04a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4244,11 +4244,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
}
}

-if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
-return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
+if (active && nvme_csi_has_nvm_support(ns)) {
+goto out;
+} else if (!active && ns->csi == NVME_CSI_NVM) {
+goto out;
+} else {
+return NVME_INVALID_CMD_SET | NVME_DNR;
}

-return NVME_INVALID_CMD_SET | NVME_DNR;
+out:
+return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
}

static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req)
--
2.17.1




Looking closer at this, since we only support the NVM and Zoned command 
sets, we can get rid of the `nvme_csi_has_nvm_support()` helper and just 
assume NVM command set support for all namespaces. The way different 
command sets are handled doesn't scale anyway, so we might as well 
simplify the logic a bit.


Something like this (compile-tested only) patch maybe?

diff --git i/hw/nvme/ctrl.c w/hw/nvme/ctrl.c
index 2e7498a73e70..7fcd6992358d 100644
--- i/hw/nvme/ctrl.c
+++ w/hw/nvme/ctrl.c
@@ -4178,16 +4178,6 @@ static uint16_t nvme_rpt_empty_id_struct(NvmeCtrl *n, NvmeRequest *req)
 return nvme_c2h(n, id, sizeof(id), req);
 }

-static inline bool nvme_csi_has_nvm_support(NvmeNamespace *ns)
-{
-switch (ns->csi) {
-case NVME_CSI_NVM:
-case NVME_CSI_ZONED:
-return true;
-}
-return false;
-}
-
 static uint16_t nvme_identify_ctrl(NvmeCtrl *n, NvmeRequest *req)
 {
 trace_pci_nvme_identify_ctrl();
@@ -4244,7 +4234,7 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
 }
 }

-if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
+if (active || ns->csi == NVME_CSI_NVM) {
 return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
 }

@@ -4315,7 +4305,7 @@ static uint16_t nvme_identify_ns_csi(NvmeCtrl *n, NvmeRequest *req,
 }
 }

-if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
+if (c->csi == NVME_CSI_NVM) {
 return nvme_rpt_empty_id_struct(n, req);
 } else if (c->csi == NVME_CSI_ZONED && ns->csi == NVME_CSI_ZONED) {
 return nvme_c2h(n, (uint8_t *)ns->id_ns_zoned, sizeof(NvmeIdNsZoned),
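
The decision logic in the proposed diff can be written as two small predicates, which makes the resulting behavior easy to tabulate. This is a hypothetical C sketch, not the device model: the CSI values follow the NVMe specification (NVM = 0h, Zoned = 2h), and because what nvme_identify_ns_csi() returns in the remaining case is cut off in the diff, it is only labelled REPLY_OTHER here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Command Set Identifier values from the NVMe specification. */
enum { SKETCH_CSI_NVM = 0x0, SKETCH_CSI_ZONED = 0x2 };

/*
 * nvme_identify_ns() as proposed: CNS 0x00 (active) returns the NVM
 * Identify Namespace data for every namespace; CNS 0x11 (allocated)
 * only for NVM namespaces. The command's CSI field is not consulted.
 */
static bool identify_ns_returns_id_ns(bool active, uint8_t ns_csi)
{
    return active || ns_csi == SKETCH_CSI_NVM;
}

typedef enum {
    REPLY_EMPTY_ID,   /* nvme_rpt_empty_id_struct() */
    REPLY_ZONED_ID,   /* zoned Identify Namespace data */
    REPLY_OTHER,      /* whatever follows; not shown in the diff */
} CsiReply;

/* nvme_identify_ns_csi() as proposed: keyed on the command's CSI field. */
static CsiReply identify_ns_csi_reply(uint8_t cmd_csi, uint8_t ns_csi)
{
    if (cmd_csi == SKETCH_CSI_NVM) {
        return REPLY_EMPTY_ID;
    } else if (cmd_csi == SKETCH_CSI_ZONED && ns_csi == SKETCH_CSI_ZONED) {
        return REPLY_ZONED_ID;
    }
    return REPLY_OTHER;
}
```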





Re: [PATCH 1/2] block/export: Free ignored Error

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

26.04.2021 13:33, Max Reitz wrote:

On 26.04.21 11:44, Vladimir Sementsov-Ogievskiy wrote:

22.04.2021 17:53, Max Reitz wrote:

When invoking block-export-add with some iothread and
fixed-iothread=false, and changing the node's iothread fails, the error
is supposed to be ignored.

However, it is still stored in *errp, which is wrong.  If a second error
occurs, the "*errp must be NULL" assertion in error_setv() fails:

   qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion
   `*errp == NULL' failed.

So the error from bdrv_try_set_aio_context() must be freed when it is
ignored.

Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef
    ("block/export: add iothread and fixed-iothread options")
Signed-off-by: Max Reitz 
---
  block/export/export.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index fec7d9f738..ce5dd3e59b 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -68,6 +68,7 @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
  BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
  {
+    ERRP_GUARD();
  bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
  const BlockExportDriver *drv;
  BlockExport *exp = NULL;
@@ -127,6 +128,9 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
  ctx = new_ctx;
  } else if (fixed_iothread) {
  goto fail;
+    } else {
+    error_free(*errp);
+    *errp = NULL;
  }
  }



I don't think ERRP_GUARD is needed in this case: we don't need to do anything
with errp except free it if it was set.


Perhaps not, but style-wise, I prefer not special-casing the
errp == NULL case.

(It can be argued that ERRP_GUARD similarly special-cases it, but that’s hidden 
from my view.  Also, the errp == NULL case actually doesn’t even happen, so 
ERRP_GUARD is effectively a no-op and it won’t cost performance (not that it 
really matters).)
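A rough sketch of why ERRP_GUARD() hides the errp == NULL case, written out by hand rather than with the real macro (which also handles &error_fatal and relies on automatic cleanup). The names are hypothetical. The assumption modeled is that the guard substitutes a local Error * for a NULL errp and propagates it on exit, and that propagating into a NULL destination simply frees the error.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct MockError {
    int code;
} MockError;

/* A callee that, like code under ERRP_GUARD(), assumes errp is non-NULL. */
static void mock_error_set(MockError **errp, int code)
{
    assert(errp != NULL && *errp == NULL);
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->code = code;
}

/* Like error_propagate(): hand the error to *dst, or free it if unwanted. */
static void mock_error_propagate(MockError **dst, MockError *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

/*
 * Hand-expanded approximation of the guard's effect for errp == NULL: the
 * body works on a local Error *, so it may always dereference errp, and
 * whatever ends up there is discarded on the way out.
 */
static int mock_guarded_fn(MockError **errp, int fail)
{
    MockError *local_err = NULL;
    MockError **orig_errp = errp;
    int ret = 0;

    if (!errp) {
        errp = &local_err;
    }

    if (fail) {
        mock_error_set(errp, 42);
        ret = -1;
    }

    if (errp == &local_err) {
        mock_error_propagate(orig_errp, local_err);  /* orig is NULL: freed */
    }
    return ret;
}
```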


Hm, I don't know. Maybe you are right. Actually, I don't care too much, so the 
patch is OK as is:

Reviewed-by: Vladimir Sementsov-Ogievskiy 



Of course we could also do this:

ret = bdrv_try_set_aio_context(bs, new_ctx, fixed_iothread ? errp : NULL);

Would be even shorter.
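Max's one-liner relies on the convention that passing a NULL errp means the caller does not want the error at all. Here is a hypothetical sketch of that variant in the same mock style. None of these names are QEMU's, and the "NULL errp discards the error" behavior is modeled, not copied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct MockError {
    const char *msg;
} MockError;

/* errp == NULL means "the caller does not want the error". */
static void mock_error_set(MockError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->msg = msg;
}

/* Hypothetical step playing the role of bdrv_try_set_aio_context(). */
static int mock_set_iothread(bool fail, MockError **errp)
{
    if (fail) {
        mock_error_set(errp, "iothread conflict");
        return -1;
    }
    return 0;
}

/* Forward errp only when a conflict is fatal; otherwise pass NULL. */
static int mock_export_add(bool fixed_iothread, bool iothread_fails,
                           MockError **errp)
{
    int ret = mock_set_iothread(iothread_fails,
                                fixed_iothread ? errp : NULL);

    if (ret < 0 && fixed_iothread) {
        return -1;
    }
    return 0;
}
```

In this model nothing is ever stored when the conflict is ignored, so there is nothing to free afterwards and no need to dereference errp in the caller.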


So we can simply do:

} else if (errp) {
    error_free(*errp);
    *errp = NULL;
}

Let's just check that errp is really set on the failure paths of 
bdrv_try_set_aio_context():


OK,  but out of interest, why?  error_free() doesn’t care.  I mean it might be 
a problem if blk_exp_add() returns an error without setting *errp, but 
that’d’ve been pre-existing.



I remember we still have some functions that don't set errp on some error paths; 
bdrv_open_driver() has a workaround for such bad .*open handlers in some 
drivers. So I decided to check.




bdrv_try_set_aio_context() fails iff bdrv_can_set_aio_context() fails, which in 
turn may fail iff bdrv_parent_can_set_aio_context() or 
bdrv_child_can_set_aio_context() fails.

bdrv_parent_can_set_aio_context() has two failure paths: on the first it sets 
errp by hand, and on the second it asserts that errp is set.

bdrv_child_can_set_aio_context() can fail only if the nested call to 
bdrv_can_set_aio_context() fails, so the recursion is closed.
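The argument above is a small invariant proof: every path that returns failure has set *errp, either directly or through a nested call that does. A hypothetical, heavily simplified C model of that structure (one parent check per node, at most one child, no real QEMU types) can check the invariant with an assertion on the callback failure path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object; not the real API. */
typedef struct MockError {
    const char *msg;
} MockError;

static void mock_error_set(MockError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->msg = msg;
}

/* Hypothetical parent callback: returns false and sets *errp on refusal. */
typedef bool (*MockCanSetFn)(MockError **errp);

typedef struct MockNode {
    bool has_parent;
    MockCanSetFn parent_cb;      /* NULL models a parent without a callback */
    struct MockNode *child;      /* at most one child, to keep it short */
} MockNode;

static bool mock_refuse(MockError **errp)
{
    mock_error_set(errp, "parent refuses the new context");
    return false;
}

static bool mock_accept(MockError **errp)
{
    (void)errp;
    return true;
}

/* Two failure paths, mirroring the structure described above. */
static bool mock_parent_can_set(const MockNode *bs, MockError **errp)
{
    if (!bs->parent_cb) {
        mock_error_set(errp, "parent cannot change context");  /* by hand */
        return false;
    }
    if (!bs->parent_cb(errp)) {
        assert(!errp || *errp);  /* the callback must have set the error */
        return false;
    }
    return true;
}

/* Fails only via the parent check or the nested call, so errp is set. */
static bool mock_can_set(const MockNode *bs, MockError **errp)
{
    if (bs->has_parent && !mock_parent_can_set(bs, errp)) {
        return false;
    }
    return !bs->child || mock_can_set(bs->child, errp);
}
```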







--
Best regards,
Vladimir



Re: [PATCH 1/2] block/export: Free ignored Error

2021-04-26 Thread Max Reitz

On 26.04.21 11:44, Vladimir Sementsov-Ogievskiy wrote:

22.04.2021 17:53, Max Reitz wrote:

When invoking block-export-add with some iothread and
fixed-iothread=false, and changing the node's iothread fails, the error
is supposed to be ignored.

However, it is still stored in *errp, which is wrong.  If a second error
occurs, the "*errp must be NULL" assertion in error_setv() fails:

   qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion
   `*errp == NULL' failed.

So the error from bdrv_try_set_aio_context() must be freed when it is
ignored.

Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef
    ("block/export: add iothread and fixed-iothread options")
Signed-off-by: Max Reitz 
---
 block/export/export.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index fec7d9f738..ce5dd3e59b 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -68,6 +68,7 @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
 
 BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
 {
+    ERRP_GUARD();
     bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
     const BlockExportDriver *drv;
     BlockExport *exp = NULL;
@@ -127,6 +128,9 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
             ctx = new_ctx;
         } else if (fixed_iothread) {
             goto fail;
+        } else {
+            error_free(*errp);
+            *errp = NULL;
         }
     }



I don't think ERRP_GUARD is needed in this case: we don't need to do anything 
with errp except free the error if it was set.


Perhaps not, but style-wise, I prefer not special-casing the
errp == NULL case.

(It can be argued that ERRP_GUARD similarly special-cases it, but that’s 
hidden from my view.  Also, the errp == NULL case actually doesn’t even 
happen, so ERRP_GUARD is effectively a no-op and it won’t cost 
performance (not that it really matters).)


Of course we could also do this:

ret = bdrv_try_set_aio_context(bs, new_ctx, fixed_iothread ? errp : NULL);

Would be even shorter.


So we can simply do:

} else if (errp) {
    error_free(*errp);
    *errp = NULL;
}

Let's just check that errp is really set on the failure paths of 
bdrv_try_set_aio_context():


OK,  but out of interest, why?  error_free() doesn’t care.  I mean it 
might be a problem if blk_exp_add() returns an error without setting 
*errp, but that’d’ve been pre-existing.


Max

bdrv_try_set_aio_context() fails iff bdrv_can_set_aio_context() fails, 
which in turn may fail iff bdrv_parent_can_set_aio_context() or 
bdrv_child_can_set_aio_context() fails.


bdrv_parent_can_set_aio_context() has two failure paths: on the first it sets 
errp by hand, and on the second it asserts that errp is set.


bdrv_child_can_set_aio_context() can fail only if the nested call to 
bdrv_can_set_aio_context() fails, so the recursion is closed.








Re: [PATCH 2/2] iotests/307: Test iothread conflict for exports

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

22.04.2021 17:53, Max Reitz wrote:

Passing fixed-iothread=true should make iothread conflicts fatal,
whereas fixed-iothread=false should not.

Combine the second case with an error condition that is checked after
the iothread is handled, to verify that qemu does not crash if there is
such an error after changing the iothread failed.

Signed-off-by: Max Reitz 


Reviewed-by: Vladimir Sementsov-Ogievskiy 
Tested-by: Vladimir Sementsov-Ogievskiy 


--
Best regards,
Vladimir



Re: [PATCH 0/2] iotests/qsd-jobs: Use common.qemu for the QSD

2021-04-26 Thread Stefan Hajnoczi
On Thu, Apr 01, 2021 at 03:28:13PM +0200, Max Reitz wrote:
> (Alternative to: “iotests/qsd-jobs: Filter events in the first test”)
> 
> Hi,
> 
> The qsd-jobs test has kind of unreliable output, because sometimes the
> job is ready before ‘quit’, and sometimes it is not.  This series
> presents one approach to fix that, which is to extend common.qemu to
> allow running the storage daemon instead of qemu, and then to use that
> in qsd-jobs to wait for the BLOCK_JOB_READY event before issuing the
> ‘quit’ command.
> 
> I took patch 1 from my “qcow2: Improve refcount structure rebuilding”
> series.
> (https://lists.nongnu.org/archive/html/qemu-block/2021-03/msg00654.html)
> 
> As noted above, this series is an alternative to “iotests/qsd-jobs:
> Filter events in the first test”.  I like this series here better
> because I’d prefer it if tests that do QMP actually check the output so
> they control what’s really happening.
> On the other hand, this may be too complicated for 6.0, and we might
> want to fix qsd-jobs in 6.0.
> 
> 
> Max Reitz (2):
>   iotests/common.qemu: Allow using the QSD
>   iotests/qsd-jobs: Use common.qemu for the QSD
> 
>  tests/qemu-iotests/common.qemu| 53 +-
>  tests/qemu-iotests/tests/qsd-jobs | 55 ---
>  tests/qemu-iotests/tests/qsd-jobs.out | 10 -
>  3 files changed, 92 insertions(+), 26 deletions(-)
> 
> -- 
> 2.29.2
> 

Acked-by: Stefan Hajnoczi 




Re: [PATCH 1/2] block/export: Free ignored Error

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

22.04.2021 17:53, Max Reitz wrote:

When invoking block-export-add with some iothread and
fixed-iothread=false, and changing the node's iothread fails, the error
is supposed to be ignored.

However, it is still stored in *errp, which is wrong.  If a second error
occurs, the "*errp must be NULL" assertion in error_setv() fails:

   qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion
   `*errp == NULL' failed.

So the error from bdrv_try_set_aio_context() must be freed when it is
ignored.

Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef
("block/export: add iothread and fixed-iothread options")
Signed-off-by: Max Reitz 
---
 block/export/export.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/export/export.c b/block/export/export.c
index fec7d9f738..ce5dd3e59b 100644
--- a/block/export/export.c
+++ b/block/export/export.c
@@ -68,6 +68,7 @@ static const BlockExportDriver *blk_exp_find_driver(BlockExportType type)
 
 BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
 {
+    ERRP_GUARD();
     bool fixed_iothread = export->has_fixed_iothread && export->fixed_iothread;
     const BlockExportDriver *drv;
     BlockExport *exp = NULL;
@@ -127,6 +128,9 @@ BlockExport *blk_exp_add(BlockExportOptions *export, Error **errp)
             ctx = new_ctx;
         } else if (fixed_iothread) {
             goto fail;
+        } else {
+            error_free(*errp);
+            *errp = NULL;
         }
     }
  



I don't think ERRP_GUARD is needed in this case: we don't need to do anything 
with errp except free the error if it was set. So we can simply do:

} else if (errp) {
    error_free(*errp);
    *errp = NULL;
}

Let's just check that errp is really set on the failure paths of 
bdrv_try_set_aio_context():

bdrv_try_set_aio_context() fails iff bdrv_can_set_aio_context() fails, which in 
turn may fail iff bdrv_parent_can_set_aio_context() or 
bdrv_child_can_set_aio_context() fails.

bdrv_parent_can_set_aio_context() has two failure paths: on the first it sets 
errp by hand, and on the second it asserts that errp is set.

bdrv_child_can_set_aio_context() can fail only if the nested call to 
bdrv_can_set_aio_context() fails, so the recursion is closed.


--
Best regards,
Vladimir



Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Klaus Jensen

On Apr 26 11:27, Philippe Mathieu-Daudé wrote:

On 4/26/21 6:40 AM, Klaus Jensen wrote:

On Apr 23 07:21, Klaus Jensen wrote:

From: Klaus Jensen 

Commit 1901b4967c3f changed the nvme device from using a bar exclusive
for MSI-x to sharing it on bar0.

Unfortunately, the msix_uninit_exclusive_bar() call remains in
nvme_exit() which causes havoc when the device is removed with, say,
device_del. Fix this.

Additionally, a subregion is added but it is not removed on exit which
causes a reference to linger and the drive to never be unlocked.

Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
Signed-off-by: Klaus Jensen 
---
hw/block/nvme.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 624a1431d072..5fe082ec34c5 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
     if (n->pmr.dev) {
         host_memory_backend_set_mapped(n->pmr.dev, false);
     }
-    msix_uninit_exclusive_bar(pci_dev);
+    msix_uninit(pci_dev, &n->bar0, &n->bar0);
+    memory_region_del_subregion(&n->bar0, &n->iomem);
 }

static Property nvme_props[] = {
-- 
2.31.1



Ping for a review on this please :)


You forgot to Cc the maintainers :/ (doing it now).

$ ./scripts/get_maintainer.pl -f include/hw/pci/msix.h
"Michael S. Tsirkin"  (supporter:PCI)
Marcel Apfelbaum  (supporter:PCI)



I didn't consider CC'ing the PCI maintainers directly, but it makes total 
sense here, thanks!




Re: [PATCH for-6.0 v2 1/2] hw/block/nvme: fix invalid msix exclusive uninit

2021-04-26 Thread Philippe Mathieu-Daudé
On 4/26/21 6:40 AM, Klaus Jensen wrote:
> On Apr 23 07:21, Klaus Jensen wrote:
>> From: Klaus Jensen 
>>
>> Commit 1901b4967c3f changed the nvme device from using a bar exclusive
>> for MSI-x to sharing it on bar0.
>>
>> Unfortunately, the msix_uninit_exclusive_bar() call remains in
>> nvme_exit() which causes havoc when the device is removed with, say,
>> device_del. Fix this.
>>
>> Additionally, a subregion is added but it is not removed on exit which
>> causes a reference to linger and the drive to never be unlocked.
>>
>> Fixes: 1901b4967c3f ("hw/block/nvme: move msix table and pba to BAR 0")
>> Signed-off-by: Klaus Jensen 
>> ---
>> hw/block/nvme.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
>> index 624a1431d072..5fe082ec34c5 100644
>> --- a/hw/block/nvme.c
>> +++ b/hw/block/nvme.c
>> @@ -6235,7 +6235,8 @@ static void nvme_exit(PCIDevice *pci_dev)
>>     if (n->pmr.dev) {
>>         host_memory_backend_set_mapped(n->pmr.dev, false);
>>     }
>> -    msix_uninit_exclusive_bar(pci_dev);
>> +    msix_uninit(pci_dev, &n->bar0, &n->bar0);
>> +    memory_region_del_subregion(&n->bar0, &n->iomem);
>> }
>>
>> static Property nvme_props[] = {
>> -- 
>> 2.31.1
>>
> 
> Ping for a review on this please :)

You forgot to Cc the maintainers :/ (doing it now).

$ ./scripts/get_maintainer.pl -f include/hw/pci/msix.h
"Michael S. Tsirkin"  (supporter:PCI)
Marcel Apfelbaum  (supporter:PCI)




Re: [PATCH 03/11] block/block-gen.h: bind monitor

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

24.04.2021 08:23, Markus Armbruster wrote:

Vladimir Sementsov-Ogievskiy  writes:


If we have a current monitor, let's bind it to the wrapper coroutine too.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/block-gen.h | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/block/block-gen.h b/block/block-gen.h
index c1fd3f40de..61f055a8cc 100644
--- a/block/block-gen.h
+++ b/block/block-gen.h
@@ -27,6 +27,7 @@
 #define BLOCK_BLOCK_GEN_H
 
 #include "block/block_int.h"
+#include "monitor/monitor.h"
 
 /* Base structure for argument packing structures */
 typedef struct AioPollCo {
@@ -38,11 +39,20 @@ typedef struct AioPollCo {
 
 static inline int aio_poll_co(AioPollCo *s)
 {
+    Monitor *mon = monitor_cur();


This gets the currently executing coroutine's monitor from the hash
table.


     assert(!qemu_in_coroutine());
 
+    if (mon) {
+        monitor_set_cur(s->co, mon);


This writes it back.  No-op, since the coroutine hasn't changed.  Why?


No: s->co != qemu_coroutine_self(), so this is not a write-back; it creates a 
new entry in the hash table. s->co is a new coroutine that we are about to start.




+    }
+
     aio_co_enter(s->ctx, s->co);
     AIO_WAIT_WHILE(s->ctx, s->in_progress);
 
+    if (mon) {
+        monitor_set_cur(s->co, NULL);


This removes s->co's monitor from the hash table.  Why?


+    }
+
     return s->ret;
 }
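To make the distinction concrete, here is a toy model of a per-coroutine "current monitor" map. It is only a sketch under stated assumptions: the fixed-size table, MockCo, MockMon and the mock_* functions are hypothetical stand-ins for QEMU's hash table, monitor_cur() and monitor_set_cur(). Binding the caller's monitor to the new coroutine before entering it is what lets code inside that coroutine find the monitor, and the entry is removed once the coroutine is done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: coroutines and monitors are opaque tags here. */
typedef struct MockCo { int id; } MockCo;
typedef struct MockMon { const char *name; } MockMon;

#define MOCK_MAX_BINDINGS 8

/* Tiny coroutine -> monitor map, playing the role of QEMU's hash table. */
static struct {
    MockCo *co;
    MockMon *mon;
} mock_bindings[MOCK_MAX_BINDINGS];

static MockCo *mock_current_co;  /* the coroutine that is "running" */

/* Bind @mon to @co; passing NULL removes @co's entry. */
static void mock_monitor_set_cur(MockCo *co, MockMon *mon)
{
    size_t free_slot = MOCK_MAX_BINDINGS;

    for (size_t i = 0; i < MOCK_MAX_BINDINGS; i++) {
        if (mock_bindings[i].co == co) {
            mock_bindings[i].co = mon ? co : NULL;
            mock_bindings[i].mon = mon;
            return;
        }
        if (!mock_bindings[i].co && free_slot == MOCK_MAX_BINDINGS) {
            free_slot = i;
        }
    }
    if (mon) {
        assert(free_slot < MOCK_MAX_BINDINGS);
        mock_bindings[free_slot].co = co;
        mock_bindings[free_slot].mon = mon;
    }
}

/* Look up the monitor bound to the currently running coroutine. */
static MockMon *mock_monitor_cur(void)
{
    for (size_t i = 0; i < MOCK_MAX_BINDINGS; i++) {
        if (mock_bindings[i].co && mock_bindings[i].co == mock_current_co) {
            return mock_bindings[i].mon;
        }
    }
    return NULL;
}

/*
 * Models aio_poll_co(): copy the caller's monitor to the *new* coroutine
 * before entering it, and drop that entry once it has finished. Returns the
 * monitor that code running inside @new_co would see.
 */
static MockMon *mock_run_in_new_co(MockCo *new_co)
{
    MockMon *mon = mock_monitor_cur();   /* caller's monitor */
    MockCo *caller = mock_current_co;
    MockMon *seen;

    if (mon) {
        mock_monitor_set_cur(new_co, mon);   /* new entry keyed by new_co */
    }
    mock_current_co = new_co;                /* "enter" the coroutine */
    seen = mock_monitor_cur();
    mock_current_co = caller;                /* coroutine has completed */
    if (mon) {
        mock_monitor_set_cur(new_co, NULL);  /* remove the entry again */
    }
    return seen;
}
```

If the two mock_monitor_set_cur() calls in mock_run_in_new_co() are dropped, the new coroutine sees NULL, which roughly corresponds to the 249 failure below, where the message falls back to stderr.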




If I comment out the new code of this patch (keeping the rest of the series 
applied), test 249 fails, because the error message goes to stderr instead of 
the monitor:

249   fail   [11:56:54] [11:56:54]   0.3s   (last: 0.2s)  output mismatch (see 249.out.bad)
--- /work/src/qemu/up/hmp-qemu-io/tests/qemu-iotests/249.out
+++ 249.out.bad
@@ -9,7 +9,8 @@

 { 'execute': 'human-monitor-command',
'arguments': {'command-line': 'qemu-io none0 "aio_write 0 2k"'}}
-{"return": "Block node is read-only\r\n"}
+QEMU_PROG: Block node is read-only
+{"return": ""}

 === Run block-commit on base using an invalid filter node name

@@ -24,7 +25,8 @@

 { 'execute': 'human-monitor-command',
'arguments': {'command-line': 'qemu-io none0 "aio_write 0 2k"'}}
-{"return": "Block node is read-only\r\n"}
+QEMU_PROG: Block node is read-only
+{"return": ""}

 === Run block-commit on base using the default filter node name

@@ -43,5 +45,6 @@

 { 'execute': 'human-monitor-command',
'arguments': {'command-line': 'qemu-io none0 "aio_write 0 2k"'}}
-{"return": "Block node is read-only\r\n"}
+QEMU_PROG: Block node is read-only
+{"return": ""}
 *** done
Failures: 249
Failed 1 of 1 iotests



--
Best regards,
Vladimir



Re: [PATCH v3 06/33] util/async: aio_co_schedule(): support reschedule in same ctx

2021-04-26 Thread Vladimir Sementsov-Ogievskiy

23.04.2021 13:09, Roman Kagan wrote:

On Fri, Apr 16, 2021 at 11:08:44AM +0300, Vladimir Sementsov-Ogievskiy wrote:

With the following patch we want to wake a coroutine from a thread,
and that doesn't work with aio_co_wake(). Assume we have no iothreads.
Assume we have a coroutine A, which waits at its yield point for an
external aio_co_wake(), and no progress can be made until that happens.
The main thread is in a blocking aio_poll() (for example, in blk_read()).

Now, in a separate thread we call aio_co_wake(). It calls aio_co_enter(),
which goes through the last "else" branch and does aio_context_acquire(ctx).

Now we have a deadlock: aio_poll() will not release the context lock
until some progress is made, and no progress can be made until
aio_co_wake() wakes coroutine A, which it can't do because it is waiting
in aio_context_acquire().

However, aio_co_schedule() works well in parallel with a blocking
aio_poll(), so we want to use it. Let's add the possibility of
rescheduling a coroutine in the same ctx where it yielded.

Fetch co->ctx the same way aio_co_wake() does.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/aio.h | 2 +-
 util/async.c        | 8 ++++++++
  2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 5f342267d5..744b695525 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -643,7 +643,7 @@ static inline bool aio_node_check(AioContext *ctx, bool is_external)
 
 /**
  * aio_co_schedule:
- * @ctx: the aio context
+ * @ctx: the aio context, if NULL, the current ctx of @co will be used.
   * @co: the coroutine
   *
   * Start a coroutine on a remote AioContext.
diff --git a/util/async.c b/util/async.c
index 674dbefb7c..750be555c6 100644
--- a/util/async.c
+++ b/util/async.c
@@ -545,6 +545,14 @@ fail:
 
 void aio_co_schedule(AioContext *ctx, Coroutine *co)
 {
+    if (!ctx) {
+        /*
+         * Read coroutine before co->ctx.  Matches smp_wmb in
+         * qemu_coroutine_enter.
+         */
+        smp_read_barrier_depends();
+        ctx = qatomic_read(&co->ctx);
+    }


I'd rather not extend this interface, but add a new one on top.  And
document how it's different from aio_co_wake().



Agree, that's better. Will do.
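Following Roman's suggestion, the NULL-ctx lookup could live in a separate helper rather than in aio_co_schedule() itself. The sketch below is hypothetical: mock_co_reschedule_home() and the mock types are illustrative names, not QEMU functions. A C11 acquire load stands in for QEMU's smp_read_barrier_depends() plus qatomic_read() pairing; acquire is at least as strong as the dependency ordering the original relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct MockCtx { int id; } MockCtx;

/* Hypothetical coroutine whose home context is published atomically. */
typedef struct MockCo {
    _Atomic(MockCtx *) ctx;
} MockCo;

static MockCtx *last_scheduled_ctx;  /* records where work was queued */

/* Strict scheduler: the caller must always name a context. */
static void mock_co_schedule(MockCtx *ctx, MockCo *co)
{
    (void)co;
    assert(ctx != NULL);
    last_scheduled_ctx = ctx;  /* real code would queue co on ctx */
}

/* Separate helper: resolve the coroutine's own context, then schedule. */
static void mock_co_reschedule_home(MockCo *co)
{
    MockCtx *ctx = atomic_load_explicit(&co->ctx, memory_order_acquire);

    mock_co_schedule(ctx, co);
}
```

Keeping the scheduler strict about a non-NULL ctx preserves its current contract for existing callers, and the new helper can be documented on its own, including how it differs from aio_co_wake().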



--
Best regards,
Vladimir



Re: [PATCH 00/14] hw(/block/)nvme: spring cleaning

2021-04-26 Thread Klaus Jensen

On Apr 19 21:27, Klaus Jensen wrote:

From: Klaus Jensen 

This series consists of various clean up patches.

The final patch moves nvme emulation from hw/block to hw/nvme.

Klaus Jensen (14):
 hw/block/nvme: rename __nvme_zrm_open
 hw/block/nvme: rename __nvme_advance_zone_wp
 hw/block/nvme: rename __nvme_select_ns_iocs
 hw/block/nvme: consolidate header files
 hw/block/nvme: cleanup includes
 hw/block/nvme: remove non-shared defines from header file
 hw/block/nvme: replace nvme_ns_status
 hw/block/nvme: cache lba and ms sizes
 hw/block/nvme: add metadata offset helper
 hw/block/nvme: streamline namespace array indexing
 hw/block/nvme: remove num_namespaces member
 hw/block/nvme: remove irrelevant zone resource checks
 hw/block/nvme: move zoned constraints checks
 hw/nvme: move nvme emulation out of hw/block

meson.build   |   1 +
hw/block/nvme-dif.h   |  63 ---
hw/block/nvme-ns.h| 229 -
hw/block/nvme-subsys.h|  59 ---
hw/block/nvme.h   | 266 ---
hw/nvme/nvme.h| 547 ++
hw/nvme/trace.h   |   1 +
hw/{block/nvme.c => nvme/ctrl.c}  | 204 
hw/{block/nvme-dif.c => nvme/dif.c}   |  57 +--
hw/{block/nvme-ns.c => nvme/ns.c} | 104 ++--
hw/{block/nvme-subsys.c => nvme/subsys.c} |  13 +-
MAINTAINERS   |   2 +-
hw/Kconfig|   1 +
hw/block/Kconfig  |   5 -
hw/block/meson.build  |   1 -
hw/block/trace-events | 206 
hw/meson.build|   1 +
hw/nvme/Kconfig   |   4 +
hw/nvme/meson.build   |   1 +
hw/nvme/trace-events  | 204 
20 files changed, 928 insertions(+), 1041 deletions(-)
delete mode 100644 hw/block/nvme-dif.h
delete mode 100644 hw/block/nvme-ns.h
delete mode 100644 hw/block/nvme-subsys.h
delete mode 100644 hw/block/nvme.h
create mode 100644 hw/nvme/nvme.h
create mode 100644 hw/nvme/trace.h
rename hw/{block/nvme.c => nvme/ctrl.c} (98%)
rename hw/{block/nvme-dif.c => nvme/dif.c} (90%)
rename hw/{block/nvme-ns.c => nvme/ns.c} (87%)
rename hw/{block/nvme-subsys.c => nvme/subsys.c} (85%)
create mode 100644 hw/nvme/Kconfig
create mode 100644 hw/nvme/meson.build
create mode 100644 hw/nvme/trace-events

--
2.31.1




Applied to nvme-next.




[PATCH] hw/block/nvme: fix csi field for cns 0x00 and 0x11

2021-04-26 Thread Gollu Appalanaidu
As per TP 4056d (Namespace Types), the CSI field shouldn't be used for
CNS 0x00 and CNS 0x11, but it is being used for these two Identify
command CNS values. Fix that.

Signed-off-by: Gollu Appalanaidu 
---
 hw/nvme/ctrl.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2e7498a73e..1657b1d04a 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4244,11 +4244,16 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest *req, bool active)
         }
     }
 
-    if (c->csi == NVME_CSI_NVM && nvme_csi_has_nvm_support(ns)) {
-        return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
+    if (active && nvme_csi_has_nvm_support(ns)) {
+        goto out;
+    } else if (!active && ns->csi == NVME_CSI_NVM) {
+        goto out;
+    } else {
+        return NVME_INVALID_CMD_SET | NVME_DNR;
     }
 
-    return NVME_INVALID_CMD_SET | NVME_DNR;
+out:
+    return nvme_c2h(n, (uint8_t *)&ns->id_ns, sizeof(NvmeIdNs), req);
 }
 
 static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req)
-- 
2.17.1
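The patched condition can be read as a small predicate. Below is a hypothetical sketch of it: the names are mock names, and the CSI value follows the NVMe convention that the NVM command set is 0x0, but treat it as illustrative. It assumes active is true for CNS 0x00 and false for CNS 0x11. In that reading, the namespace's NVM support decides for CNS 0x00, the namespace's own CSI decides for CNS 0x11, and the command's CSI field is not consulted in either case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative command set identifier; NVM is 0x0 in the NVMe convention. */
#define MOCK_CSI_NVM 0x0

/*
 * Hypothetical predicate mirroring the patched nvme_identify_ns(): true
 * means "return the NVM Identify Namespace structure", false means
 * "fail with Invalid Command Set". The command's CSI field is not an input.
 */
static bool mock_identify_ns_ok(bool active, bool ns_has_nvm_support,
                                uint8_t ns_csi)
{
    if (active) {                     /* assumed: CNS 0x00 */
        return ns_has_nvm_support;
    }
    return ns_csi == MOCK_CSI_NVM;    /* assumed: CNS 0x11 */
}
```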




Re: [PATCH] qapi: deprecate drive-backup

2021-04-26 Thread Peter Krempa
On Fri, Apr 23, 2021 at 15:59:00 +0300, Vladimir Sementsov-Ogievskiy wrote:
> The modern way is to use blockdev-add + blockdev-backup, which provides a
> lot more control over how the target is opened.
> 
> As an example of drive-backup problems, consider the following:
> 
> A user of drive-backup expects the target to be opened in the same
> cache and aio mode as the source. The corresponding logic is in
> drive_backup_prepare(), where we take bs->open_flags of the source.
> 
> This works rather badly if the source was added by blockdev-add. Assume
> the source is a qcow2 image. On blockdev-add we should specify aio and
> cache options for the file child of the qcow2 node. What happens next:
> 
> drive_backup_prepare() looks at bs->open_flags of the qcow2 source node.
> But neither BDRV_O_NOCACHE nor BDRV_O_NATIVE_AIO is there: BDRV_O_NOCACHE
> is placed in bs->file->bs->open_flags, and BDRV_O_NATIVE_AIO is nowhere,
> as file-posix parses the options and simply sets s->use_linux_aio.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
> 
> Hi all! I remember I suggested deprecating drive-backup some time ago,
> and nobody complained. But that old patch was part of a series with
> other, more questionable deprecations, and it did not land.
> 
> Let's finally deprecate what should have been deprecated long ago.
> 
> We have now faced the problem described in the commit message in our
> downstream. There, I fixed it by simply enabling O_DIRECT and linux_aio
> unconditionally for the drive-backup target. But this really just shows
> that using drive-backup in the blockdev era is a bad idea. So let's
> motivate everyone (including Virtuozzo, of course) to move to the new
> interfaces and avoid problems with all that outdated option inheritance.
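A tiny, hypothetical model of the situation described in the commit message: the cache setting is recorded on the file child, and native AIO is driver state rather than an open flag, so copying the top node's open_flags loses both. MockBDS, MOCK_O_NOCACHE and mock_inherited_target_flags() are illustrative names only; QEMU's real structures and flag values are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; QEMU's real BDRV_O_NOCACHE value is not used. */
#define MOCK_O_NOCACHE 0x1

/* Hypothetical, heavily simplified block node. */
typedef struct MockBDS {
    int open_flags;
    bool driver_uses_linux_aio;  /* like file-posix's s->use_linux_aio */
    struct MockBDS *file;        /* protocol child, NULL for a leaf */
} MockBDS;

/* What the commit message says the target inherits: top-node flags only. */
static int mock_inherited_target_flags(const MockBDS *source)
{
    return source->open_flags;
}
```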

libvirt never used 'drive-backup' thus

Reviewed-by: Peter Krempa