Re: [PATCH v2 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-05-17 Thread Michael S. Tsirkin
On Mon, May 17, 2021 at 10:32:59AM +0200, Greg Kurz wrote:
> On Wed, 12 May 2021 17:05:53 +0100
> Stefan Hajnoczi  wrote:
> 
> > On Fri, May 07, 2021 at 06:59:01PM +0200, Greg Kurz wrote:
> > > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > > a serious slow down may be observed on setups with a big enough number
> > > of vCPUs.
> > > 
> > > Example with a pseries guest on a dual-socket POWER9 system (128 HW
> > > threads):
> > > 
> > > vCPUs   virtio-scsi   virtio-blk
> > > 
> > > 1       0m20.922s     0m21.346s
> > > 2       0m21.230s     0m20.350s
> > > 4       0m21.761s     0m20.997s
> > > 8       0m22.770s     0m20.051s
> > > 16      0m22.038s     0m19.994s
> > > 32      0m22.928s     0m20.803s
> > > 64      0m26.583s     0m22.953s
> > > 128     0m41.273s     0m32.333s
> > > 256     2m4.727s      1m16.924s
> > > 384     6m5.563s      3m26.186s
> > > 
> > > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > > the ioeventfds:
> > > 
> > >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> > >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> > >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> > >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > > 
> > > address_space_update_ioeventfds() is called when committing an MR
> > > transaction, i.e. for each ioeventfd with the current code base,
> > > and it internally loops on all ioeventfds:
> > > 
> > > static void address_space_update_ioeventfds(AddressSpace *as)
> > > {
> > > [...]
> > >     FOR_EACH_FLAT_RANGE(fr, view) {
> > >         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > > 
> > > This means that the setup of ioeventfds for these devices has
> > > quadratic time complexity.
> > > 
> > > This series simply changes the device models to extend the transaction
> > > to all virtqueues, as was already done in the past in the generic
> > > code with 710fccf80d78 ("virtio: improve virtio devices initialization
> > > time").
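
A toy cost model may make the quadratic behaviour concrete. The sketch below is not QEMU code: the function names are hypothetical, and it only assumes what the cover letter states, namely that every commit rescans all ioeventfds registered so far.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy cost model, not QEMU code: count how many ioeventfd entries are
 * visited while registering one host notifier per virtqueue.
 */

/* One MR transaction per virtqueue: the k-th commit rescans k entries. */
uint64_t toy_visits_commit_per_queue(uint64_t nqueues)
{
    uint64_t visits = 0;

    assert(nqueues <= UINT32_MAX); /* keeps the sum within 64 bits */
    for (uint64_t registered = 1; registered <= nqueues; registered++) {
        visits += registered;
    }
    return visits; /* n * (n + 1) / 2, i.e. quadratic in n */
}

/* One MR transaction around all virtqueues: a single commit, n entries. */
uint64_t toy_visits_single_commit(uint64_t nqueues)
{
    return nqueues;
}
```

Under this model, 384 virtqueues cost 73920 visits with one commit per queue, against 384 visits when a single transaction covers them all.
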
> > > 
> > > Only virtio-scsi and virtio-blk are covered here, but a similar change
> > > might also be beneficial to other device types such as vhost-scsi-pci,
> > > vhost-user-scsi-pci and vhost-user-blk-pci.
> > > 
> > > vCPUs   virtio-scsi   virtio-blk
> > > 
> > > 1       0m21.271s     0m22.076s
> > > 2       0m20.912s     0m19.716s
> > > 4       0m20.508s     0m19.310s
> > > 8       0m21.374s     0m20.273s
> > > 16      0m21.559s     0m21.374s
> > > 32      0m22.532s     0m21.271s
> > > 64      0m26.550s     0m22.007s
> > > 128     0m29.115s     0m27.446s
> > > 256     0m44.752s     0m41.004s
> > > 384     1m2.884s      0m58.023s
> > > 
> > > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > > which reported the issue for virtio-scsi-pci.
> > > 
> > > Changes since v1:
> > > - Add some comments (Stefan)
> > > - Drop optimization on the error path in patch 2 (Stefan)
> > > 
> > > Changes since RFC:
> > > 
> > > As suggested by Stefan, simplify the code by directly beginning and
> > > committing the memory transaction from the device model, without all
> > > the virtio specific proxying code and with no changes needed in the
> > > memory subsystem.
> > > 
> > > Greg Kurz (4):
> > >   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
> > >   virtio-blk: Configure all host notifiers in a single MR transaction
> > >   virtio-scsi: Set host notifiers and callbacks separately
> > >   virtio-scsi: Configure all host notifiers in a single MR transaction
> > > 
> > >  hw/block/dataplane/virtio-blk.c | 45 -
> > >  hw/scsi/virtio-scsi-dataplane.c | 72 -
> > >  2 files changed, 97 insertions(+), 20 deletions(-)
> > > 
> > > -- 
> > > 2.26.3
> > > 
> > 
> > Thanks, applied to my block tree:
> > https://gitlab.com/stefanha/qemu/commits/block
> > 
> 
> Hi Stefan,
> 
> It seems that Michael already merged the previous version of this
> patch set in his latest PR.
> 
> https://gitlab.com/qemu-project/qemu/-/commit/6005ee07c380cbde44292f5f6c96e7daa70f4f7d
> 
> It is thus missing the v1->v2 changes. Basically some comments to
> clarify the optimization we're doing with the MR transaction and
> the removal of the optimization on an error path.
> 
> The optimization on the error path isn't needed indeed but it
> doesn't hurt. No need to change that now that the patches are
> upstream.
> 
> I can post a follow-up patch to add the missing comments though.
> While here, I'd even add these comments in the generic
> virtio_device_*_ioeventfd_impl() calls as well, 

Re: [PATCH v2] floppy: remove dead code related to formatting

2021-05-17 Thread John Snow

On 4/27/21 10:28 PM, Alexander Bulekov wrote:

fdctrl_format_sector was added in
baca51faff ("updated floppy driver: formatting code, disk geometry auto detect 
(Jocelyn Mayer)")

The single callsite is guarded by a check:
fdctrl->data_state & FD_STATE_FORMAT

However, the only place where the FD_STATE_FORMAT flag is set (in
fdctrl_handle_format_track) is closely followed by the same flag being
unset, with no possibility to call fdctrl_format_sector in between.

This removes fdctrl_format_sector, the unnecessary setting/unsetting
of the FD_STATE_FORMAT flag, and the fdctrl_handle_format_track function
(which is just a stub).
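
The reachability argument above can be sketched with a toy model. This is not the QEMU code: the `Toy*` and `toy_*` names are hypothetical stand-ins, and the sketch only assumes what the commit message says, namely that the flag is set and cleared inside one handler with no data-phase callback in between.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_STATE_MULTI = 0x01, TOY_STATE_FORMAT = 0x02 };

struct ToyCtrl {
    unsigned data_state;
    int format_sector_calls; /* how often the guarded path was taken */
};

/* Stand-in for the command handler: sets the flag, then clears it again. */
void toy_handle_format_track(struct ToyCtrl *c)
{
    assert(c != NULL);
    c->data_state |= TOY_STATE_FORMAT;
    /* ... parameter decoding only, nothing here runs the data phase ... */
    c->data_state &= ~TOY_STATE_FORMAT;
}

/* Stand-in for the single call site, guarded by the FORMAT flag. */
void toy_write_data(struct ToyCtrl *c)
{
    assert(c != NULL);
    if (c->data_state & TOY_STATE_FORMAT) {
        c->format_sector_calls++; /* never reached in this model */
    }
}
```

Whatever order the two functions are called in, the guarded branch never runs, which is the sense in which the formatting routine is dead code.
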

Suggested-by: Hervé Poussineau 
Signed-off-by: Alexander Bulekov 
---



Herve, does it look good to you? I feel bad about deleting code out of a 
device that badly needs attention, but it seems like this code was 
probably not operating correctly to begin with and I don't have the time 
to figure out how to implement it correctly.



I ran through tests/qtest/fdc-test, and ran fdformat on a dummy disk -
nothing exploded, but since I don't use floppies very often, more eyes
definitely won't hurt. In particular, I'm not sure about the
fdctrl_handle_format_track delete - that function has side-effects on
both FDrive and FDCtrl, and it is certainly reachable. If deleting the
whole thing seems wrong, I'll roll back that change, and we can just
remove the unreachable code.



Yeah, I just had some reservations about allowing a stub to persist that 
touched state and didn't actually seem to invoke the routine it was 
meant to.


It's hard to audit the impact either way, and I don't have a good test 
suite to know what the ramifications are.



  hw/block/fdc.c | 97 --
  1 file changed, 97 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index a825c2acba..d851d23cc0 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -657,7 +657,6 @@ enum {
 
 enum {
     FD_STATE_MULTI  = 0x01,   /* multi track flag */
-    FD_STATE_FORMAT = 0x02,   /* format flag */
 };
 
 enum {
@@ -826,7 +825,6 @@ enum {
 };
 
 #define FD_MULTI_TRACK(state) ((state) & FD_STATE_MULTI)
-#define FD_FORMAT_CMD(state) ((state) & FD_STATE_FORMAT)
 
 struct FDCtrl {
     MemoryRegion iomem;
@@ -1942,67 +1940,6 @@ static uint32_t fdctrl_read_data(FDCtrl *fdctrl)
     return retval;
 }
 
-static void fdctrl_format_sector(FDCtrl *fdctrl)
-{
-    FDrive *cur_drv;
-    uint8_t kh, kt, ks;
-
-    SET_CUR_DRV(fdctrl, fdctrl->fifo[1] & FD_DOR_SELMASK);
-    cur_drv = get_cur_drv(fdctrl);
-    kt = fdctrl->fifo[6];
-    kh = fdctrl->fifo[7];
-    ks = fdctrl->fifo[8];
-    FLOPPY_DPRINTF("format sector at %d %d %02x %02x (%d)\n",
-                   GET_CUR_DRV(fdctrl), kh, kt, ks,
-                   fd_sector_calc(kh, kt, ks, cur_drv->last_sect,
-                                  NUM_SIDES(cur_drv)));
-    switch (fd_seek(cur_drv, kh, kt, ks, fdctrl->config & FD_CONFIG_EIS)) {
-    case 2:
-        /* sect too big */
-        fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, 0x00, 0x00);
-        fdctrl->fifo[3] = kt;
-        fdctrl->fifo[4] = kh;
-        fdctrl->fifo[5] = ks;
-        return;
-    case 3:
-        /* track too big */
-        fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, FD_SR1_EC, 0x00);
-        fdctrl->fifo[3] = kt;
-        fdctrl->fifo[4] = kh;
-        fdctrl->fifo[5] = ks;
-        return;
-    case 4:
-        /* No seek enabled */
-        fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM, 0x00, 0x00);
-        fdctrl->fifo[3] = kt;
-        fdctrl->fifo[4] = kh;
-        fdctrl->fifo[5] = ks;
-        return;
-    case 1:
-        fdctrl->status0 |= FD_SR0_SEEK;
-        break;
-    default:
-        break;
-    }
-    memset(fdctrl->fifo, 0, FD_SECTOR_LEN);
-    if (cur_drv->blk == NULL ||
-        blk_pwrite(cur_drv->blk, fd_offset(cur_drv), fdctrl->fifo,
-                   BDRV_SECTOR_SIZE, 0) < 0) {
-        FLOPPY_DPRINTF("error formatting sector %d\n", fd_sector(cur_drv));
-        fdctrl_stop_transfer(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK, 0x00, 0x00);
-    } else {
-        if (cur_drv->sect == cur_drv->last_sect) {
-            fdctrl->data_state &= ~FD_STATE_FORMAT;
-            /* Last sector done */
-            fdctrl_stop_transfer(fdctrl, 0x00, 0x00, 0x00);
-        } else {
-            /* More to do */
-            fdctrl->data_pos = 0;
-            fdctrl->data_len = 4;
-        }
-    }
-}
-
 static void fdctrl_handle_lock(FDCtrl *fdctrl, int direction)
 {
     fdctrl->lock = (fdctrl->fifo[0] & 0x80) ? 1 : 0;
@@ -2110,34 +2047,6 @@ static void fdctrl_handle_readid(FDCtrl *fdctrl, int direction)
              (NANOSECONDS_PER_SECOND / 50));
 }
 
-static void fdctrl_handle_format_track(FDCtrl *fdctrl, int direction)
-{
-    FDrive *cur_drv;
-
-    SET_CUR_DRV(fdctrl, fdctrl->fifo[1] & FD_DOR_SELMASK);
-    cur_drv = get_cur_drv(fdctrl);
-    fdctrl->data_state |= FD_STATE_FORMAT;
-    if (fdctrl->fifo[0] & 0x80)
- 

Re: [PATCH v2] fdc: fix floppy boot for Red Hat Linux 5.2

2021-05-17 Thread John Snow

On 4/27/21 2:10 PM, John Snow wrote:

The image size indicates it's an 81 track floppy disk image, which we
don't have a listing for in the geometry table. When you force the drive
type to 1.44MB, it guesses the reasonably close 18/80. When the drive
type is allowed to auto-detect or set to 2.88, it guesses a very
incorrect geometry.

auto, 144 and 288 drive types get the right geometry with the new entry
in the table.

Reported-by: Michael Tokarev 
Signed-off-by: John Snow 
Reviewed-by: Thomas Huth 

---

V2: I didn't actually stage this, so this is just a re-send to get a
fresh Message-ID to reference in the PR. Added Thomas's R-B.

  hw/block/fdc.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index a825c2acbae..0f0c716d878 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -122,6 +122,7 @@ static const FDFormat fd_formats[] = {
     /* First entry is default format */
     /* 1.44 MB 3"1/2 floppy disks */
     { FLOPPY_DRIVE_TYPE_144, 18, 80, 1, FDRIVE_RATE_500K, }, /* 3.5" 2880 */
+    { FLOPPY_DRIVE_TYPE_144, 18, 81, 1, FDRIVE_RATE_500K, },
     { FLOPPY_DRIVE_TYPE_144, 20, 80, 1, FDRIVE_RATE_500K, }, /* 3.5" 3200 */
     { FLOPPY_DRIVE_TYPE_144, 21, 80, 1, FDRIVE_RATE_500K, },
     { FLOPPY_DRIVE_TYPE_144, 21, 82, 1, FDRIVE_RATE_500K, },
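
To see why an 81-track image needs its own table row, here is a minimal sketch of an exact-size lookup. It is not the QEMU implementation: the `ToyFormat` type and `toy_*` helpers are hypothetical, and it assumes the table columns are sectors per track, tracks and highest head number, which matches the "2880" and "3200" sector counts in the comments above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for one geometry table row. */
struct ToyFormat {
    unsigned last_sect; /* sectors per track */
    unsigned max_track; /* number of tracks */
    unsigned max_head;  /* highest head number, 1 means two sides */
};

static const struct ToyFormat toy_formats[] = {
    { 18, 80, 1 }, /* 2880 sectors */
    { 18, 81, 1 }, /* 2916 sectors, the row added by the patch */
    { 20, 80, 1 }, /* 3200 sectors */
};

unsigned toy_total_sectors(const struct ToyFormat *f)
{
    assert(f != NULL);
    return (f->max_head + 1) * f->max_track * f->last_sect;
}

/* Return the index of the first row whose size matches exactly, or -1. */
int toy_match_format(unsigned nb_sectors)
{
    size_t n = sizeof(toy_formats) / sizeof(toy_formats[0]);

    for (size_t i = 0; i < n; i++) {
        if (toy_total_sectors(&toy_formats[i]) == nb_sectors) {
            return (int)i;
        }
    }
    return -1;
}
```

An 18-sector, 81-track, two-sided image holds 2916 sectors. Without the second row no entry matches exactly, so any fallback has to guess.
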



Staged on my floppy branch.

--js




Re: [PATCH v4 0/9] hw/block/fdc: Allow Kconfig-selecting ISA bus/SysBus floppy controllers

2021-05-17 Thread John Snow

On 5/17/21 4:50 PM, Philippe Mathieu-Daudé wrote:

On 5/17/21 9:19 PM, John Snow wrote:

On 5/17/21 2:39 PM, Philippe Mathieu-Daudé wrote:

Missing review: #1

Hi,

The floppy disc controllers pull in irrelevant devices (sysbus in
an ISA-only machine, ISA bus + isa devices on a sysbus-only machine).

This series cleans that up by extracting each device into its own file,
adding the corresponding Kconfig symbols: FDC_ISA and FDC_SYSBUS.

Since v3:
- Fix ISA_SUPERIO -> FDC Kconfig dependency (jsnow)

Since v2:
- rebased

Since v1:
- added missing "hw/block/block.h" header (jsnow)
- inlined hardware specific calls (Mark)
- added R-b/A-b tags

Regards,

Phil.

Philippe Mathieu-Daudé (9):
    hw/isa/Kconfig: Fix missing dependency ISA_SUPERIO -> FDC
    hw/block/fdc: Replace disabled fprintf() by trace event
    hw/block/fdc: Declare shared prototypes in fdc-internal.h
    hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c
    hw/block/fdc: Extract SysBus floppy controllers to fdc-sysbus.c
    hw/block/fdc: Add sysbus_fdc_init_drives() method
    hw/sparc/sun4m: Inline sun4m_fdctrl_init()
    hw/block/fdc-sysbus: Add 'dma-channel' property
    hw/mips/jazz: Inline fdctrl_init_sysbus()

   hw/block/fdc-internal.h | 156 +++
   include/hw/block/fdc.h  |   7 +-
   hw/block/fdc-isa.c  | 313 +
   hw/block/fdc-sysbus.c   | 224 +++
   hw/block/fdc.c  | 608 +---
   hw/mips/jazz.c  |  16 ++
   hw/sparc/sun4m.c    |  16 ++
   MAINTAINERS |   3 +
   hw/block/Kconfig    |   8 +
   hw/block/meson.build    |   2 +
   hw/block/trace-events   |   3 +
   hw/i386/Kconfig |   2 +-
   hw/isa/Kconfig  |   7 +-
   hw/mips/Kconfig |   2 +-
   hw/sparc/Kconfig    |   2 +-
   hw/sparc64/Kconfig  |   2 +-
   16 files changed, 759 insertions(+), 612 deletions(-)
   create mode 100644 hw/block/fdc-internal.h
   create mode 100644 hw/block/fdc-isa.c
   create mode 100644 hw/block/fdc-sysbus.c



Hi, tentatively staged:

https://gitlab.com/jsnow/qemu/-/commits/floppy/

pending CI:

https://gitlab.com/jsnow/qemu/-/pipelines/304308461


Not good enough:

qemu-system-sparc: ../hw/block/fdc.c:2356: fdctrl_realize_common:
Assertion `fdctrl->dma' failed.

Forget about it for your next pull request.



Yup, I see. Dropping it from the queue for now. Thanks!

--js




Re: [PATCH v4 0/9] hw/block/fdc: Allow Kconfig-selecting ISA bus/SysBus floppy controllers

2021-05-17 Thread Philippe Mathieu-Daudé
On 5/17/21 9:19 PM, John Snow wrote:
> On 5/17/21 2:39 PM, Philippe Mathieu-Daudé wrote:
>> [...]
> 
> Hi, tentatively staged:
> 
> https://gitlab.com/jsnow/qemu/-/commits/floppy/
> 
> pending CI:
> 
> https://gitlab.com/jsnow/qemu/-/pipelines/304308461

Not good enough:

qemu-system-sparc: ../hw/block/fdc.c:2356: fdctrl_realize_common:
Assertion `fdctrl->dma' failed.

Forget about it for your next pull request.




Re: [PATCH 08/21] block/backup: stricter backup_calculate_cluster_size()

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

17.05.2021 19:57, Max Reitz wrote:

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

No reason to tolerate bdrv_get_info() errors except for ENOTSUP. Let's
just error-out, it's simpler and safer.


Hm, doesn’t look that much simpler to me.  Not sure how much safer it is, 
because the point was that in the target_does_cow case, we would like a cluster 
size hint, but it isn’t necessary.  So if we don’t get one, regardless of the 
reason, we use the default cluster size.  I don’t know why ENOTSUP should be 
treated in a special way there.

So I don’t know.



I'm probably OK with dropping this for now, I don't feel strongly about it.
Still, I can share what brought me to this:

At first I thought that the cluster size should be easily available for any driver:

protocol drivers and format drivers without backing support can set it to 1 or to 
request_alignment, if they don't have a "cluster" in mind.

format drivers with backing support should of course provide the actual cluster size.

And I decided to just add a bs->cluster_size variable, set on driver open, to 
simplify the whole thing and make it clean. Then most of this detect-cluster-size 
function could simply be dropped.

But it turns out that there is one driver that has a good and rather tricky 
reason for ENOTSUP: vmdk can have several extents with different cluster sizes.

So I gave up on the refactoring and ended up with this one patch. It can simply 
be dropped, I am not really a fan of it.
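
The behaviour being debated can be sketched as two small decision functions. This is a simplified model, not the code in block/backup.c: the `toy_*` names are hypothetical, the 64 KiB default is an assumption, and so is taking the larger of the default and the reported size when the query succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CLUSTER_SIZE_DEFAULT (INT64_C(1) << 16) /* assumed 64 KiB */

static int64_t toy_pick(int64_t reported)
{
    assert(reported > 0);
    return reported > TOY_CLUSTER_SIZE_DEFAULT ? reported
                                               : TOY_CLUSTER_SIZE_DEFAULT;
}

/* Before the patch: with a COW target, any error falls back to the default. */
int64_t toy_cluster_size_old(int ret, bool target_does_cow, int64_t reported)
{
    if (ret == -ENOTSUP && !target_does_cow) {
        return TOY_CLUSTER_SIZE_DEFAULT; /* the real code also warns */
    } else if (ret < 0 && !target_does_cow) {
        return ret;                      /* fatal */
    } else if (ret < 0 && target_does_cow) {
        return TOY_CLUSTER_SIZE_DEFAULT; /* not fatal */
    }
    return toy_pick(reported);
}

/* After the patch: only -ENOTSUP is tolerated, any other error is fatal. */
int64_t toy_cluster_size_new(int ret, bool target_does_cow, int64_t reported)
{
    (void)target_does_cow; /* only decides whether a warning is printed */
    if (ret < 0 && ret != -ENOTSUP) {
        return ret;
    } else if (ret == -ENOTSUP) {
        return TOY_CLUSTER_SIZE_DEFAULT;
    }
    return toy_pick(reported);
}
```

The two versions differ only when the target does COW and the query fails with something other than ENOTSUP: the old logic carries on with the default, the new one aborts.
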




Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/backup.c | 14 +-
  1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index fe685e411b..fe7a1f1e37 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -367,7 +367,10 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
      * targets with a backing file, try to avoid COW if possible.
      */
     ret = bdrv_get_info(target, &bdi);
-    if (ret == -ENOTSUP && !target_does_cow) {
+    if (ret < 0 && ret != -ENOTSUP) {
+        error_setg_errno(errp, -ret, "Failed to get target info");
+        return ret;
+    } else if (ret == -ENOTSUP && !target_does_cow) {
         /* Cluster size is not defined */
         warn_report("The target block device doesn't provide "
                     "information about the block size and it doesn't have a "
@@ -376,14 +379,7 @@ static int64_t backup_calculate_cluster_size(BlockDriverState *target,
                     "this default, the backup may be unusable",
                     BACKUP_CLUSTER_SIZE_DEFAULT);
         return BACKUP_CLUSTER_SIZE_DEFAULT;
-    } else if (ret < 0 && !target_does_cow) {
-        error_setg_errno(errp, -ret,
-            "Couldn't determine the cluster size of the target image, "
-            "which has no backing file");
-        error_append_hint(errp,
-            "Aborting, since this may create an unusable destination image\n");
-        return ret;
-    } else if (ret < 0 && target_does_cow) {
+    } else if (ret == -ENOTSUP && target_does_cow) {
         /* Not fatal; just trudge on ahead. */
         return BACKUP_CLUSTER_SIZE_DEFAULT;
     }






--
Best regards,
Vladimir



Re: [PATCH 05/21] block: rename backup-top to copy-before-write

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

17.05.2021 19:05, Max Reitz wrote:

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

We are going to convert backup_top to a full-featured public filter,
which can be used separately from the backup job. Start by renaming it
from "how it is used" to "what it does".


Is this safe?  The name was externally visible in queries after all. (I’m not 
saying it is unsafe, I just don’t know and would like to know whether you’ve 
considered this already.)

(Regardless, renaming files and so on is fine, of course.)


Hmmm. I don't know.

It was visible, yes. But we've never documented it. And if someone depends on the name 
of the format of the filter automatically inserted during a backup job, that's a kind of 
"undocumented feature" use.

Another change is changing the child from backing to file in patch 11; from this point 
of view it's unsafe too. But it's even more justified than a good name: having all 
public filters behave similarly is a very good thing.

So it may be a bit risky, but I think a good interface is worth that risk. And we can 
always say "sorry guys, but that was not documented, we didn't promise anything".

But I'm OK with going on with "backup-top" and "backing", if someone has a strong 
opinion about this.




While updating comments in iotest 283, also drop or rephrase the remarks
about ".active", as this field is now dropped and the filter doesn't have
an "inactive" mode.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/{backup-top.h => copy-before-write.h} |  28 +++---
  block/backup.c  |  22 ++---
  block/{backup-top.c => copy-before-write.c} | 100 ++--
  MAINTAINERS |   4 +-
  block/meson.build   |   2 +-
  tests/qemu-iotests/283  |  35 +++
  tests/qemu-iotests/283.out  |   4 +-
  7 files changed, 95 insertions(+), 100 deletions(-)
  rename block/{backup-top.h => copy-before-write.h} (56%)
  rename block/{backup-top.c => copy-before-write.c} (62%)


[...]


diff --git a/block/backup-top.c b/block/copy-before-write.c
similarity index 62%
rename from block/backup-top.c
rename to block/copy-before-write.c
index 425e3778be..40e91832d7 100644
--- a/block/backup-top.c
+++ b/block/copy-before-write.c


[...]


@@ -32,25 +32,25 @@


[...]


-static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
-   uint64_t bytes, BdrvRequestFlags flags)
+static coroutine_fn int cbw_cbw(BlockDriverState *bs, uint64_t offset,
+    uint64_t bytes, BdrvRequestFlags flags)


I’m sure you noticed it, too, but cbw_cbw() is weird.  Perhaps cbw_do_cbw() at 
least?



OK. Maybe even cbw_do_copy_before_write()


--
Best regards,
Vladimir



Re: [PATCH v4 0/9] hw/block/fdc: Allow Kconfig-selecting ISA bus/SysBus floppy controllers

2021-05-17 Thread John Snow

On 5/17/21 2:39 PM, Philippe Mathieu-Daudé wrote:

[...]



Hi, tentatively staged:

https://gitlab.com/jsnow/qemu/-/commits/floppy/

pending CI:

https://gitlab.com/jsnow/qemu/-/pipelines/304308461

--js




Re: [PATCH v6 00/25] python: create installable package

2021-05-17 Thread John Snow

On 5/12/21 7:12 PM, John Snow wrote:

Based-on: <20210512214642.2803189-1-js...@redhat.com>
CI: https://gitlab.com/jsnow/qemu/-/pipelines/302010131
GitLab: https://gitlab.com/jsnow/qemu/-/tree/python-package-mk3
MR: https://gitlab.com/jsnow/qemu/-/merge_requests/4


Patchset updated and rebased on top of new linting pre-req series.
(Gitlab branch and MR rebased and updated.)

Based-on: <20210517184808.3562549-1-js...@redhat.com>


I invite you to leave review comments on my mock merge request on
gitlab, submitted against my own mirror. I will, as always, also respond
to feedback on-list.

ABOUT
=====

This series factors the python/qemu directory as an installable
package. It does not yet change the mechanics of how any other
python source in the tree consumes it, beyond the import
path -- some import statements change in a few places.

RATIONALE
=========

The primary motivation of this series is to formalize our
dependencies on mypy, flake8, isort, and pylint alongside versions that
are known to work. It does this using the setup.cfg and setup.py
files. It also adds explicitly pinned versions (using Pipfile.lock) of
these dependencies that should behave in a repeatable and known way for
developers and CI environments both. Lastly, it enables those CI checks
such that we can enforce Python coding quality checks via the CI tests.

An auxiliary motivation is that this package is formatted in such a way
that it COULD be uploaded to https://pypi.org/project/qemu and installed
independently of qemu.git with `pip install qemu`, but that button
remains *unpushed* and this series *will not* cause any such
releases. We have time to debate finer points like API guarantees and
versioning even after this series is merged.

Other bits of interest
----------------------

With the python tooling as a proper package, you can install this
package in editable or production mode to a virtual environment, your
local user environment, or your system packages. The primary benefit of
this is to gain access to QMP tooling regardless of CWD, without needing
to battle sys.path (and confounding other python analysis tools).

For example: when developing, you may go to qemu/python/ and run `make
venv` followed by `pipenv shell` to activate a virtual environment that
contains the qemu python packages. These packages will always reflect
the current version of the source files in the tree. When you are
finished, you can simply exit the shell (^d) to remove these packages
from your python environment.

When not developing, you could install a version of this package to your
environment outright to gain access to the QMP and QEMUMachine classes
for lightweight scripting and testing by using pip: "pip install
[--user] ."

TESTING THIS SERIES
===================

First of all, nothing should change. Without any intervention,
everything should behave exactly as it did before. The only new
information here comes from how to interact with and run the linters
that will be enforcing code quality standards in this subdirectory.

There are various invocations available that will test subtly different
combinations using subtly different environments. I am assuming some
light knowledge of Python environments and installing Python packages
here. If you have questions, I would be delighted to answer them.

To test the new tests, CD to ./python/ first, and then:

0. Try "make" or "make help" to get a sense of this series.

1. Try "make venv && pipenv shell" to get a venv with the package
installed to it in editable mode. Ctrl+d exits this venv shell. While
in this shell, any python script that uses "from qemu.[qmp|machine]
import ..." should work correctly regardless of where the script is,
or what your CWD is.

This will pull some packages from PyPI and install them into the
virtual environment, leaving your normal environment untouched.

You will need Python 3.6 and pipenv installed on your system to do
this step. For Fedora: "dnf install python36 pipenv" will do the
trick. If you don't have this, skip down to #4 and onwards.

2. Try "make check" while still in the shell to run the Python linters
using the venv built in the previous step. This will run avocado, which
will in turn execute mypy, flake8, isort and pylint with the correct
arguments.

3. Having exited the shell from above, try "make venv-check". This will
create and update the venv if needed, then run 'make check' within the
context of that shell. It should pass as long as the above did. You
should be able to run "make distclean" prior to running "make
venv-check" and have the entire process work start to finish.

4. Still outside of the venv, you may try running "make check". This
will not install anything, but unless you have the right Python
dependencies installed, these tests may fail for you. You might try
using "pip install --user .[devel]" to install the development packages
needed to run the tests 

[PATCH v4 5/9] hw/block/fdc: Extract SysBus floppy controllers to fdc-sysbus.c

2021-05-17 Thread Philippe Mathieu-Daudé
Some machines use floppy controllers via the ISA bus,
and don't need to pull in all the SysBus code.
Extract the SysBus specific code to a new unit: fdc-sysbus.c,
and add a new Kconfig symbol: "FDC_SYSBUS".

Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-sysbus.c | 252 ++
 hw/block/fdc.c| 220 
 MAINTAINERS   |   1 +
 hw/block/Kconfig  |   4 +
 hw/block/meson.build  |   1 +
 hw/block/trace-events |   2 +
 hw/mips/Kconfig   |   2 +-
 hw/sparc/Kconfig  |   2 +-
 8 files changed, 262 insertions(+), 222 deletions(-)
 create mode 100644 hw/block/fdc-sysbus.c

diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
new file mode 100644
index 00000000000..71755fd6ae4
--- /dev/null
+++ b/hw/block/fdc-sysbus.c
@@ -0,0 +1,252 @@
+/*
+ * QEMU Floppy disk emulator (Intel 82078)
+ *
+ * Copyright (c) 2003, 2007 Jocelyn Mayer
+ * Copyright (c) 2008 Hervé Poussineau
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "hw/sysbus.h"
+#include "hw/block/fdc.h"
+#include "migration/vmstate.h"
+#include "fdc-internal.h"
+#include "trace.h"
+
+#define TYPE_SYSBUS_FDC "base-sysbus-fdc"
+typedef struct FDCtrlSysBusClass FDCtrlSysBusClass;
+typedef struct FDCtrlSysBus FDCtrlSysBus;
+DECLARE_OBJ_CHECKERS(FDCtrlSysBus, FDCtrlSysBusClass,
+                     SYSBUS_FDC, TYPE_SYSBUS_FDC)
+
+struct FDCtrlSysBusClass {
+    /*< private >*/
+    SysBusDeviceClass parent_class;
+    /*< public >*/
+
+    bool use_strict_io;
+};
+
+struct FDCtrlSysBus {
+    /*< private >*/
+    SysBusDevice parent_obj;
+    /*< public >*/
+
+    struct FDCtrl state;
+};
+
+static uint64_t fdctrl_read_mem(void *opaque, hwaddr reg, unsigned size)
+{
+    return fdctrl_read(opaque, (uint32_t)reg);
+}
+
+static void fdctrl_write_mem(void *opaque, hwaddr reg,
+                             uint64_t value, unsigned size)
+{
+    fdctrl_write(opaque, (uint32_t)reg, value);
+}
+
+static const MemoryRegionOps fdctrl_mem_ops = {
+    .read = fdctrl_read_mem,
+    .write = fdctrl_write_mem,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static const MemoryRegionOps fdctrl_mem_strict_ops = {
+    .read = fdctrl_read_mem,
+    .write = fdctrl_write_mem,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 1,
+        .max_access_size = 1,
+    },
+};
+
+static void fdctrl_external_reset_sysbus(DeviceState *d)
+{
+FDCtrlSysBus *sys = SYSBUS_FDC(d);
+FDCtrl *s = >state;
+
+fdctrl_reset(s, 0);
+}
+
+static void fdctrl_handle_tc(void *opaque, int irq, int level)
+{
+trace_fdctrl_tc_pulse(level);
+}
+
+void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
+hwaddr mmio_base, DriveInfo **fds)
+{
+FDCtrl *fdctrl;
+DeviceState *dev;
+SysBusDevice *sbd;
+FDCtrlSysBus *sys;
+
+dev = qdev_new("sysbus-fdc");
+sys = SYSBUS_FDC(dev);
+fdctrl = &sys->state;
+fdctrl->dma_chann = dma_chann; /* FIXME */
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, mmio_base);
+
+fdctrl_init_drives(&sys->state.bus, fds);
+}
+
+void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
+   DriveInfo **fds, qemu_irq *fdc_tc)
+{
+DeviceState *dev;
+FDCtrlSysBus *sys;
+
+dev = qdev_new("sun-fdtwo");
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+sys = SYSBUS_FDC(dev);
+sysbus_connect_irq(SYS_BUS_DEVICE(sys), 0, irq);
+sysbus_mmio_map(SYS_BUS_DEVICE(sys), 0, io_base);
+*fdc_tc = qdev_get_gpio_in(dev, 0);
+
+fdctrl_init_drives(&sys->state.bus, fds);
+}
+
+static void sysbus_fdc_common_initfn(Object *obj)
+{
+DeviceState *dev = DEVICE(obj);
+ 

[PATCH v4 1/9] hw/isa/Kconfig: Fix missing dependency ISA_SUPERIO -> FDC

2021-05-17 Thread Philippe Mathieu-Daudé
isa_superio_realize() calls isa_fdc_init_drives(), which is defined
in hw/block/fdc.c, so ISA_SUPERIO needs to select the FDC symbol.

Reported-by: John Snow 
Fixes: c0ff3795143 ("Introduce a CONFIG_ISA_SUPERIO switch for isa-superio.c")
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/isa/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/isa/Kconfig b/hw/isa/Kconfig
index 55e0003ce40..7216f66a54a 100644
--- a/hw/isa/Kconfig
+++ b/hw/isa/Kconfig
@@ -17,6 +17,7 @@ config ISA_SUPERIO
 bool
 select ISA_BUS
 select PCKBD
+select FDC
 
 config PC87312
 bool
-- 
2.26.3




[PATCH v4 2/9] hw/block/fdc: Replace disabled fprintf() by trace event

2021-05-17 Thread Philippe Mathieu-Daudé
Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc.c| 7 +--
 hw/block/trace-events | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index a825c2acbae..1d3a0473678 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1242,12 +1242,7 @@ static void fdctrl_external_reset_isa(DeviceState *d)
 
 static void fdctrl_handle_tc(void *opaque, int irq, int level)
 {
-//FDCtrl *s = opaque;
-
-if (level) {
-// XXX
-FLOPPY_DPRINTF("TC pulsed\n");
-}
+trace_fdctrl_tc_pulse(level);
 }
 
 /* Change IRQ state */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index fa12e3a67a7..306989c193c 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -3,6 +3,7 @@
 # fdc.c
 fdc_ioport_read(uint8_t reg, uint8_t value) "read reg 0x%02x val 0x%02x"
 fdc_ioport_write(uint8_t reg, uint8_t value) "write reg 0x%02x val 0x%02x"
+fdctrl_tc_pulse(int level) "TC pulse: %u"
 
 # pflash_cfi01.c
 # pflash_cfi02.c
-- 
2.26.3




[PATCH v4 0/9] hw/block/fdc: Allow Kconfig-selecting ISA bus/SysBus floppy controllers

2021-05-17 Thread Philippe Mathieu-Daudé
Missing review: #1

Hi,

The floppy disc controller pulls in irrelevant devices (sysbus in
an ISA-only machine, ISA bus + ISA devices in a sysbus-only machine).

This series cleans that up by extracting each device into its own file,
adding the corresponding Kconfig symbols: FDC_ISA and FDC_SYSBUS.

Since v3:
- Fix ISA_SUPERIO -> FDC Kconfig dependency (jsnow)

Since v2:
- rebased

Since v1:
- added missing "hw/block/block.h" header (jsnow)
- inlined hardware specific calls (Mark)
- added R-b/A-b tags

Regards,

Phil.

Philippe Mathieu-Daudé (9):
  hw/isa/Kconfig: Fix missing dependency ISA_SUPERIO -> FDC
  hw/block/fdc: Replace disabled fprintf() by trace event
  hw/block/fdc: Declare shared prototypes in fdc-internal.h
  hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c
  hw/block/fdc: Extract SysBus floppy controllers to fdc-sysbus.c
  hw/block/fdc: Add sysbus_fdc_init_drives() method
  hw/sparc/sun4m: Inline sun4m_fdctrl_init()
  hw/block/fdc-sysbus: Add 'dma-channel' property
  hw/mips/jazz: Inline fdctrl_init_sysbus()

 hw/block/fdc-internal.h | 156 +++
 include/hw/block/fdc.h  |   7 +-
 hw/block/fdc-isa.c  | 313 +
 hw/block/fdc-sysbus.c   | 224 +++
 hw/block/fdc.c  | 608 +---
 hw/mips/jazz.c  |  16 ++
 hw/sparc/sun4m.c|  16 ++
 MAINTAINERS |   3 +
 hw/block/Kconfig|   8 +
 hw/block/meson.build|   2 +
 hw/block/trace-events   |   3 +
 hw/i386/Kconfig |   2 +-
 hw/isa/Kconfig  |   7 +-
 hw/mips/Kconfig |   2 +-
 hw/sparc/Kconfig|   2 +-
 hw/sparc64/Kconfig  |   2 +-
 16 files changed, 759 insertions(+), 612 deletions(-)
 create mode 100644 hw/block/fdc-internal.h
 create mode 100644 hw/block/fdc-isa.c
 create mode 100644 hw/block/fdc-sysbus.c

-- 
2.26.3





[PATCH v4 9/9] hw/mips/jazz: Inline fdctrl_init_sysbus()

2021-05-17 Thread Philippe Mathieu-Daudé
There is only one call site for fdctrl_init_sysbus(), and this
function is specific to the jazz machines, not part of the
SYSBUS_FDC API. Move it locally with the machine code, and
remove its declaration in "hw/block/fdc.h".

Suggested-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/block/fdc.h |  3 ---
 hw/block/fdc-sysbus.c  | 16 
 hw/mips/jazz.c | 16 
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index 06612218630..ac99d6bcaa0 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -1,7 +1,6 @@
 #ifndef HW_FDC_H
 #define HW_FDC_H
 
-#include "exec/hwaddr.h"
 #include "qapi/qapi-types-block.h"
 #include "hw/sysbus.h"
 
@@ -12,8 +11,6 @@
 
 void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds);
 void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds);
-void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
-hwaddr mmio_base, DriveInfo **fds);
 
 FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
 int cmos_get_fd_drive_type(FloppyDriveType fd0);
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 74c7c8f2e01..5c7e49bcc3f 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -103,22 +103,6 @@ void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds)
 fdctrl_init_drives(&fdc->state.bus, fds);
 }
 
-void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
-hwaddr mmio_base, DriveInfo **fds)
-{
-DeviceState *dev;
-SysBusDevice *sbd;
-
-dev = qdev_new("sysbus-fdc");
-qdev_prop_set_int32(dev, "dma-channel", dma_chann);
-sbd = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(sbd, &error_fatal);
-sysbus_connect_irq(sbd, 0, irq);
-sysbus_mmio_map(sbd, 0, mmio_base);
-
-sysbus_fdc_init_drives(sbd, fds);
-}
-
 static void sysbus_fdc_common_initfn(Object *obj)
 {
 DeviceState *dev = DEVICE(obj);
diff --git a/hw/mips/jazz.c b/hw/mips/jazz.c
index dba2088ed1a..13f26c5991f 100644
--- a/hw/mips/jazz.c
+++ b/hw/mips/jazz.c
@@ -143,6 +143,22 @@ static void mips_jazz_do_transaction_failed(CPUState *cs, hwaddr physaddr,
 }
 #endif /* CONFIG_TCG && !CONFIG_USER_ONLY */
 
+static void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
+   hwaddr mmio_base, DriveInfo **fds)
+{
+DeviceState *dev;
+SysBusDevice *sbd;
+
+dev = qdev_new("sysbus-fdc");
+qdev_prop_set_int32(dev, "dma-channel", dma_chann);
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, mmio_base);
+
+sysbus_fdc_init_drives(sbd, fds);
+}
+
 static void mips_jazz_init(MachineState *machine,
enum jazz_model_e jazz_model)
 {
-- 
2.26.3




Re: [PATCH v3 3/8] hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c

2021-05-17 Thread Philippe Mathieu-Daudé
On 5/17/21 8:19 PM, John Snow wrote:
> On 5/17/21 2:04 PM, John Snow wrote:
>> On 5/17/21 1:49 PM, Philippe Mathieu-Daudé wrote:
>>> Some machines use floppy controllers via the SysBus interface,
>>> and don't need to pull in all the ISA code.
>>> Extract the ISA specific code to a new unit: fdc-isa.c, and
>>> add a new Kconfig symbol: "FDC_ISA".
>>>
>>> Reviewed-by: John Snow 
>>> Acked-by: Mark Cave-Ayland 
>>> Reviewed-by: Mark Cave-Ayland 
>>> Signed-off-by: Philippe Mathieu-Daudé 
>>
>> Sorry, I'm seeing build failures on this for patch #03:
>>
>> ../../configure --enable-docs; and make -j13
>>
>> ...
>>
>> /usr/bin/ld: libcommon.fa.p/hw_isa_isa-superio.c.o: in function
>> `isa_superio_realize':
>> /home/jsnow/src/qemu/bin/git/../../hw/isa/isa-superio.c:132: undefined
>> reference to `isa_fdc_init_drives'
>> collect2: error: ld returned 1 exit status
>>
>>
> 
> It appears to show up if you enable the mips-softmmu target.

Sorry, fixed in v4...




[PATCH v4 8/9] hw/block/fdc-sysbus: Add 'dma-channel' property

2021-05-17 Thread Philippe Mathieu-Daudé
QDev properties to be set before the device is realized should
be exposed as a Property with a DEFINE_PROP_XXX() macro, then
accessed with the equivalent qdev_prop_set_xxx() API.

Do this with the FDCtrlSysBus 'dma-channel' property: convert
it to int32_t, default-initialize with DEFINE_PROP_INT32() and
use qdev_prop_set_int32() to set its value in fdctrl_init_sysbus().
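
The idea in the two paragraphs above can be sketched outside of QEMU as a toy
model. Everything below (ToyFdc, ToyProp, the toy_* functions) is hypothetical
and only mimics the shape of a property table: a named field with a declared
default, set by name before the device is realized. Rejecting a set after
realize is an assumption of this sketch, not a statement about QEMU's behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy device; all names here are hypothetical, not QEMU's. */
typedef struct ToyFdc {
    bool realized;
    int32_t dma_chann;
} ToyFdc;

/* One entry of a static property table: name, field offset, default. */
typedef struct ToyProp {
    const char *name;
    size_t offset;
    int32_t defval;
} ToyProp;

static const ToyProp toy_fdc_props[] = {
    { "dma-channel", offsetof(ToyFdc, dma_chann), -1 },
    { NULL, 0, 0 }, /* end-of-list marker */
};

/* Locate the int32_t field a property entry describes. */
static int32_t *toy_prop_field(ToyFdc *dev, const ToyProp *p)
{
    return (int32_t *)((char *)dev + p->offset);
}

/* Instance init: every property receives its declared default. */
void toy_fdc_init(ToyFdc *dev)
{
    memset(dev, 0, sizeof(*dev));
    for (const ToyProp *p = toy_fdc_props; p->name; p++) {
        *toy_prop_field(dev, p) = p->defval;
    }
}

/* Set by name; returns false for unknown names or once realized. */
bool toy_prop_set_int32(ToyFdc *dev, const char *name, int32_t value)
{
    if (dev->realized) {
        return false;
    }
    for (const ToyProp *p = toy_fdc_props; p->name; p++) {
        if (strcmp(p->name, name) == 0) {
            *toy_prop_field(dev, p) = value;
            return true;
        }
    }
    return false;
}

void toy_fdc_realize(ToyFdc *dev)
{
    dev->realized = true;
}

/* Walk through the create / set / realize order used by board code. */
void toy_fdc_demo(void)
{
    ToyFdc dev;

    toy_fdc_init(&dev);
    assert(dev.dma_chann == -1);               /* default from the table */
    assert(toy_prop_set_int32(&dev, "dma-channel", 2));
    toy_fdc_realize(&dev);
    assert(!toy_prop_set_int32(&dev, "dma-channel", 3));
    assert(dev.dma_chann == 2);                /* late set was rejected */
}
```

With a table like this the instance init function no longer needs an explicit
"dma_chann = -1" assignment, which mirrors the line this patch removes from
sysbus_fdc_common_initfn().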

Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-internal.h | 2 +-
 hw/block/fdc-sysbus.c   | 9 ++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/block/fdc-internal.h b/hw/block/fdc-internal.h
index 278de725e69..29b318f7525 100644
--- a/hw/block/fdc-internal.h
+++ b/hw/block/fdc-internal.h
@@ -96,7 +96,7 @@ struct FDCtrl {
 qemu_irq irq;
 /* Controller state */
 QEMUTimer *result_timer;
-int dma_chann;
+int32_t dma_chann;
 uint8_t phase;
 IsaDma *dma;
 /* Controller's identification */
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 8f94c2efb63..74c7c8f2e01 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -106,15 +106,11 @@ void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds)
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds)
 {
-FDCtrl *fdctrl;
 DeviceState *dev;
 SysBusDevice *sbd;
-FDCtrlSysBus *sys;
 
 dev = qdev_new("sysbus-fdc");
-sys = SYSBUS_FDC(dev);
-fdctrl = &sys->state;
-fdctrl->dma_chann = dma_chann; /* FIXME */
+qdev_prop_set_int32(dev, "dma-channel", dma_chann);
 sbd = SYS_BUS_DEVICE(dev);
 sysbus_realize_and_unref(sbd, &error_fatal);
 sysbus_connect_irq(sbd, 0, irq);
@@ -131,8 +127,6 @@ static void sysbus_fdc_common_initfn(Object *obj)
 FDCtrlSysBus *sys = SYSBUS_FDC(obj);
 FDCtrl *fdctrl = &sys->state;
 
-fdctrl->dma_chann = -1;
-
 qdev_set_legacy_instance_id(dev, 0 /* io */, 2); /* FIXME */
 
 memory_region_init_io(&fdctrl->iomem, obj,
@@ -173,6 +167,7 @@ static Property sysbus_fdc_properties[] = {
 DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
 FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
 FloppyDriveType),
+DEFINE_PROP_INT32("dma-channel", FDCtrlSysBus, state.dma_chann, -1),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.26.3




[PATCH v4 4/9] hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c

2021-05-17 Thread Philippe Mathieu-Daudé
Some machines use floppy controllers via the SysBus interface,
and don't need to pull in all the ISA code.
Extract the ISA specific code to a new unit: fdc-isa.c, and
add a new Kconfig symbol: "FDC_ISA".

Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-isa.c   | 313 +++
 hw/block/fdc.c   | 257 ---
 MAINTAINERS  |   1 +
 hw/block/Kconfig |   4 +
 hw/block/meson.build |   1 +
 hw/i386/Kconfig  |   2 +-
 hw/isa/Kconfig   |   8 +-
 hw/sparc64/Kconfig   |   2 +-
 8 files changed, 325 insertions(+), 263 deletions(-)
 create mode 100644 hw/block/fdc-isa.c

diff --git a/hw/block/fdc-isa.c b/hw/block/fdc-isa.c
new file mode 100644
index 000..97f3f9e5c0a
--- /dev/null
+++ b/hw/block/fdc-isa.c
@@ -0,0 +1,313 @@
+/*
+ * QEMU Floppy disk emulator (Intel 82078)
+ *
+ * Copyright (c) 2003, 2007 Jocelyn Mayer
+ * Copyright (c) 2008 Hervé Poussineau
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+/*
+ * The controller is used in Sun4m systems in a slightly different
+ * way. There are changes in DOR register and DMA is not available.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/block/fdc.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/timer.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/irq.h"
+#include "hw/isa/isa.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "migration/vmstate.h"
+#include "hw/block/block.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/blockdev.h"
+#include "sysemu/sysemu.h"
+#include "qemu/log.h"
+#include "qemu/main-loop.h"
+#include "qemu/module.h"
+#include "trace.h"
+#include "qom/object.h"
+#include "fdc-internal.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(FDCtrlISABus, ISA_FDC)
+
+struct FDCtrlISABus {
+ISADevice parent_obj;
+
+uint32_t iobase;
+uint32_t irq;
+uint32_t dma;
+struct FDCtrl state;
+int32_t bootindexA;
+int32_t bootindexB;
+};
+
+static void fdctrl_external_reset_isa(DeviceState *d)
+{
+FDCtrlISABus *isa = ISA_FDC(d);
+FDCtrl *s = &isa->state;
+
+fdctrl_reset(s, 0);
+}
+
+void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds)
+{
+fdctrl_init_drives(&ISA_FDC(fdc)->state.bus, fds);
+}
+
+static const MemoryRegionPortio fdc_portio_list[] = {
+{ 1, 5, 1, .read = fdctrl_read, .write = fdctrl_write },
+{ 7, 1, 1, .read = fdctrl_read, .write = fdctrl_write },
+PORTIO_END_OF_LIST(),
+};
+
+static void isabus_fdc_realize(DeviceState *dev, Error **errp)
+{
+ISADevice *isadev = ISA_DEVICE(dev);
+FDCtrlISABus *isa = ISA_FDC(dev);
+FDCtrl *fdctrl = &isa->state;
+Error *err = NULL;
+
+isa_register_portio_list(isadev, &fdctrl->portio_list,
+ isa->iobase, fdc_portio_list, fdctrl,
+ "fdc");
+
+isa_init_irq(isadev, &fdctrl->irq, isa->irq);
+fdctrl->dma_chann = isa->dma;
+if (fdctrl->dma_chann != -1) {
+fdctrl->dma = isa_get_dma(isa_bus_from_device(isadev), isa->dma);
+if (!fdctrl->dma) {
+error_setg(errp, "ISA controller does not support DMA");
+return;
+}
+}
+
+qdev_set_legacy_instance_id(dev, isa->iobase, 2);
+
+fdctrl_realize_common(dev, fdctrl, &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
+}
+
+FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i)
+{
+FDCtrlISABus *isa = ISA_FDC(fdc);
+
+return isa->state.drives[i].drive;
+}
+
+static void isa_fdc_get_drive_max_chs(FloppyDriveType type, uint8_t *maxc,
+  uint8_t *maxh, uint8_t *maxs)
+{
+const FDFormat *fdf;
+
+*maxc = *maxh = *maxs = 0;
+for (fdf = fd_formats; fdf->drive != FLOPPY_DRIVE_TYPE_NONE; fdf++) {
+if (fdf->drive != type) {
+continue;
+}
+ 

[PATCH v4 3/9] hw/block/fdc: Declare shared prototypes in fdc-internal.h

2021-05-17 Thread Philippe Mathieu-Daudé
We want to extract ISA/SysBus code from the generic fdc.c file.
First, declare the prototypes we will access from the new units
into a new local header: "fdc-internal.h".

Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-internal.h | 156 
 hw/block/fdc.c  | 126 +++-
 MAINTAINERS |   1 +
 3 files changed, 165 insertions(+), 118 deletions(-)
 create mode 100644 hw/block/fdc-internal.h

diff --git a/hw/block/fdc-internal.h b/hw/block/fdc-internal.h
new file mode 100644
index 000..278de725e69
--- /dev/null
+++ b/hw/block/fdc-internal.h
@@ -0,0 +1,156 @@
+/*
+ * QEMU Floppy disk emulator (Intel 82078)
+ *
+ * Copyright (c) 2003, 2007 Jocelyn Mayer
+ * Copyright (c) 2008 Hervé Poussineau
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef HW_BLOCK_FDC_INTERNAL_H
+#define HW_BLOCK_FDC_INTERNAL_H
+
+#include "exec/memory.h"
+#include "exec/ioport.h"
+#include "hw/block/block.h"
+#include "hw/block/fdc.h"
+#include "qapi/qapi-types-block.h"
+
+typedef struct FDCtrl FDCtrl;
+
+/* Floppy bus emulation */
+
+typedef struct FloppyBus {
+BusState bus;
+FDCtrl *fdc;
+} FloppyBus;
+
+/* Floppy disk drive emulation */
+
+typedef enum FDriveRate {
+FDRIVE_RATE_500K = 0x00,  /* 500 Kbps */
+FDRIVE_RATE_300K = 0x01,  /* 300 Kbps */
+FDRIVE_RATE_250K = 0x02,  /* 250 Kbps */
+FDRIVE_RATE_1M   = 0x03,  /*   1 Mbps */
+} FDriveRate;
+
+typedef enum FDriveSize {
+FDRIVE_SIZE_UNKNOWN,
+FDRIVE_SIZE_350,
+FDRIVE_SIZE_525,
+} FDriveSize;
+
+typedef struct FDFormat {
+FloppyDriveType drive;
+uint8_t last_sect;
+uint8_t max_track;
+uint8_t max_head;
+FDriveRate rate;
+} FDFormat;
+
+typedef enum FDiskFlags {
+FDISK_DBL_SIDES  = 0x01,
+} FDiskFlags;
+
+typedef struct FDrive {
+FDCtrl *fdctrl;
+BlockBackend *blk;
+BlockConf *conf;
+/* Drive status */
+FloppyDriveType drive;/* CMOS drive type*/
+uint8_t perpendicular;/* 2.88 MB access mode*/
+/* Position */
+uint8_t head;
+uint8_t track;
+uint8_t sect;
+/* Media */
+FloppyDriveType disk; /* Current disk type  */
+FDiskFlags flags;
+uint8_t last_sect;/* Nb sector per track*/
+uint8_t max_track;/* Nb of tracks   */
+uint16_t bps; /* Bytes per sector   */
+uint8_t ro;   /* Is read-only   */
+uint8_t media_changed;/* Is media changed   */
+uint8_t media_rate;   /* Data rate of medium*/
+
+bool media_validated; /* Have we validated the media? */
+} FDrive;
+
+struct FDCtrl {
+MemoryRegion iomem;
+qemu_irq irq;
+/* Controller state */
+QEMUTimer *result_timer;
+int dma_chann;
+uint8_t phase;
+IsaDma *dma;
+/* Controller's identification */
+uint8_t version;
+/* HW */
+uint8_t sra;
+uint8_t srb;
+uint8_t dor;
+uint8_t dor_vmstate; /* only used as temp during vmstate */
+uint8_t tdr;
+uint8_t dsr;
+uint8_t msr;
+uint8_t cur_drv;
+uint8_t status0;
+uint8_t status1;
+uint8_t status2;
+/* Command FIFO */
+uint8_t *fifo;
+int32_t fifo_size;
+uint32_t data_pos;
+uint32_t data_len;
+uint8_t data_state;
+uint8_t data_dir;
+uint8_t eot; /* last wanted sector */
+/* States kept only to be returned back */
+/* precompensation */
+uint8_t precomp_trk;
+uint8_t config;
+uint8_t lock;
+/* Power down config (also with status regB access mode */
+uint8_t pwrd;
+/* Floppy drives */
+FloppyBus bus;
+uint8_t num_floppies;
+FDrive drives[MAX_FD];
+struct {
+FloppyDriveType type;
+} qdev_for_drives[MAX_FD];
+int reset_sensei;
+FloppyDriveType fallback; /* type=auto failure fallback */
+/* 

Re: [PATCH v3 3/8] hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c

2021-05-17 Thread John Snow

On 5/17/21 2:04 PM, John Snow wrote:

On 5/17/21 1:49 PM, Philippe Mathieu-Daudé wrote:

Some machines use floppy controllers via the SysBus interface,
and don't need to pull in all the ISA code.
Extract the ISA specific code to a new unit: fdc-isa.c, and
add a new Kconfig symbol: "FDC_ISA".

Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 


Sorry, I'm seeing build failures on this for patch #03:

../../configure --enable-docs; and make -j13

...

/usr/bin/ld: libcommon.fa.p/hw_isa_isa-superio.c.o: in function 
`isa_superio_realize':
/home/jsnow/src/qemu/bin/git/../../hw/isa/isa-superio.c:132: undefined 
reference to `isa_fdc_init_drives'

collect2: error: ld returned 1 exit status




It appears to show up if you enable the mips-softmmu target.


--js





[PATCH v4 7/9] hw/sparc/sun4m: Inline sun4m_fdctrl_init()

2021-05-17 Thread Philippe Mathieu-Daudé
There is only one call site for sun4m_fdctrl_init(), and this
function is specific to the sun4m machines, not part of the
SYSBUS_FDC API. Move it locally with the machine code, and
remove its declaration in "hw/block/fdc.h".

Suggested-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/block/fdc.h |  2 --
 hw/block/fdc-sysbus.c  | 16 
 hw/sparc/sun4m.c   | 16 
 3 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index 52e45c53078..06612218630 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -14,8 +14,6 @@ void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds);
 void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds);
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds);
-void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
-   DriveInfo **fds, qemu_irq *fdc_tc);
 
 FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
 int cmos_get_fd_drive_type(FloppyDriveType fd0);
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 1163e53165d..8f94c2efb63 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -123,22 +123,6 @@ void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 sysbus_fdc_init_drives(sbd, fds);
 }
 
-void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
-   DriveInfo **fds, qemu_irq *fdc_tc)
-{
-DeviceState *dev;
-SysBusDevice *sbd;
-
-dev = qdev_new("sun-fdtwo");
-sbd = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(sbd, &error_fatal);
-sysbus_connect_irq(sbd, 0, irq);
-sysbus_mmio_map(sbd, 0, io_base);
-*fdc_tc = qdev_get_gpio_in(dev, 0);
-
-sysbus_fdc_init_drives(sbd, fds);
-}
-
 static void sysbus_fdc_common_initfn(Object *obj)
 {
 DeviceState *dev = DEVICE(obj);
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 42e139849ed..c08c650da72 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -816,6 +816,22 @@ static void dummy_fdc_tc(void *opaque, int irq, int level)
 {
 }
 
+static void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
+  DriveInfo **fds, qemu_irq *fdc_tc)
+{
+DeviceState *dev;
+SysBusDevice *sbd;
+
+dev = qdev_new("sun-fdtwo");
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, io_base);
+*fdc_tc = qdev_get_gpio_in(dev, 0);
+
+sysbus_fdc_init_drives(sbd, fds);
+}
+
 static void sun4m_hw_init(MachineState *machine)
 {
 const struct sun4m_hwdef *hwdef = SUN4M_MACHINE_GET_CLASS(machine)->hwdef;
-- 
2.26.3




[PATCH v4 6/9] hw/block/fdc: Add sysbus_fdc_init_drives() method

2021-05-17 Thread Philippe Mathieu-Daudé
FDCtrlSysBus's FDCtrl state is a private field. However it is
accessed by the public fdctrl_init_sysbus() and sun4m_fdctrl_init()
methods. To be able to move them out of fdc-sysbus.c, first add
the sysbus_fdc_init_drives() method and use it in these 2 functions.

Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/block/fdc.h |  2 ++
 hw/block/fdc-sysbus.c  | 23 ---
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index 1ecca7cac7f..52e45c53078 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -3,6 +3,7 @@
 
 #include "exec/hwaddr.h"
 #include "qapi/qapi-types-block.h"
+#include "hw/sysbus.h"
 
 /* fdc.c */
 #define MAX_FD 2
@@ -10,6 +11,7 @@
 #define TYPE_ISA_FDC "isa-fdc"
 
 void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds);
+void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds);
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds);
 void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 71755fd6ae4..1163e53165d 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -94,6 +94,15 @@ static void fdctrl_handle_tc(void *opaque, int irq, int level)
 trace_fdctrl_tc_pulse(level);
 }
 
+void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds)
+{
+FDCtrlSysBus *fdc;
+
+fdc = SYSBUS_FDC(dev);
+
+fdctrl_init_drives(&fdc->state.bus, fds);
+}
+
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds)
 {
@@ -111,23 +120,23 @@ void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 sysbus_connect_irq(sbd, 0, irq);
 sysbus_mmio_map(sbd, 0, mmio_base);
 
-fdctrl_init_drives(&sys->state.bus, fds);
+sysbus_fdc_init_drives(sbd, fds);
 }
 
 void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
DriveInfo **fds, qemu_irq *fdc_tc)
 {
 DeviceState *dev;
-FDCtrlSysBus *sys;
+SysBusDevice *sbd;
 
 dev = qdev_new("sun-fdtwo");
-sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-sys = SYSBUS_FDC(dev);
-sysbus_connect_irq(SYS_BUS_DEVICE(sys), 0, irq);
-sysbus_mmio_map(SYS_BUS_DEVICE(sys), 0, io_base);
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, io_base);
 *fdc_tc = qdev_get_gpio_in(dev, 0);
 
-fdctrl_init_drives(&sys->state.bus, fds);
+sysbus_fdc_init_drives(sbd, fds);
 }
 
 static void sysbus_fdc_common_initfn(Object *obj)
-- 
2.26.3




Re: [PATCH 04/21] qdev: allow setting drive property for realized device

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

17.05.2021 18:48, Max Reitz wrote:

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

We need the ability to insert filters above the top block node attached to
a block device. This can't be achieved with the blockdev-reopen command, so we
want to do it with the help of qom-set.

Intended usage:

1. blockdev-add, creating the filter, whose child is the top node A,
    attached to some guest block device.


Is a “not” missing here, i.e. “not attached to any guest block device”?  I 
would have thought one would create a filtered tree that is not in use by any 
frontend, so that the filter need not take any permissions.


node A is attached.

So, we have [blk] --root--> [A]

And want to insert a filter between blk and A.

We do

1.

[filter] --file--\
                 v
[blk] --root-->  [A]

2.

[blk] --root--> [filter] --file--> [A]




2. qom-set, to change the bs attached to the root blk from the original node
    to the newly created filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/core/qdev-properties-system.c | 30 ++
  1 file changed, 22 insertions(+), 8 deletions(-)


Looks good, just one question: (well, two, one was above)


diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 2760c21f11..7d97562654 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c


[...]


@@ -196,6 +209,7 @@ static void release_drive(Object *obj, const char *name, void *opaque)
  const PropertyInfo qdev_prop_drive = {
  .name  = "str",
  .description = "Node name or ID of a block device to use as a backend",
+    .realized_set_allowed = true,
  .get   = get_drive,
  .set   = set_drive,
  .release = release_drive,


Why not for qdev_prop_drive_iothread?



Hmm, the only reason is that I missed that part of the architecture; I'm new
to the qdev code. Will add it in the next version.


--
Best regards,
Vladimir



Re: [PATCH 01/21] block: introduce bdrv_replace_child_bs()

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

17.05.2021 18:51, Max Reitz wrote:

On 17.05.21 16:30, Vladimir Sementsov-Ogievskiy wrote:

17.05.2021 15:09, Max Reitz wrote:

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

Add function to transactionally replace bs inside BdrvChild.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block.h |  2 ++
  block.c   | 36 
  2 files changed, 38 insertions(+)


As you may guess, I know little about the rewritten replacing functions, so 
this is kind of difficult to review for me.  However, nothing looks out of 
place, and the function looks sufficiently similar to 
bdrv_replace_node_common() to make me happy.


diff --git a/include/block/block.h b/include/block/block.h
index 82185965ff..f9d5fcb108 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -361,6 +361,8 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
  Error **errp);
  int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
    Error **errp);
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp);
  BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options,
 int flags, Error **errp);
  int bdrv_drop_filter(BlockDriverState *bs, Error **errp);
diff --git a/block.c b/block.c
index 9ad725d205..755fa53d85 100644
--- a/block.c
+++ b/block.c
@@ -4961,6 +4961,42 @@ out:
  return ret;
  }
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp)
+{
+    int ret;
+    Transaction *tran = tran_new();
+    g_autoptr(GHashTable) found = NULL;
+    g_autoptr(GSList) refresh_list = NULL;
+    BlockDriverState *old_bs = child->bs;
+
+    if (old_bs) {


Hm.  Can child->bs be ever NULL?


Hmm. Most probably not :)

In some intermediate states we don't have bs in child, but it shouldn't be the 
place where bdrv_replace_child_bs is called.




+    bdrv_ref(old_bs);
+    bdrv_drained_begin(old_bs);
+    }
+    bdrv_drained_begin(new_bs);


(I was wondering why we couldn’t handle the new_bs == NULL case here to replace 
bdrv_remove_filter_or_cow_child(), but then I realized it’s probably because 
that’s kind of difficult, precisely because child->bs at least should generally 
be non-NULL.  Which is why bdrv_remove_filter_or_cow_child() needs to add its own 
transaction entry to handle the BdrvChild object and the pointer to it.

Hence me wondering whether we could assume child->bs not to be NULL.)


bdrv_remove_filter_or_cow_child() is a lower-level function: it does neither a
drained section nor a permission update. The new bdrv_replace_child_bs() is a
public function, which takes care of these things.




+
+    bdrv_replace_child(child, new_bs, tran);
+
+    found = g_hash_table_new(NULL, NULL);
+    if (old_bs) {
+    refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);
+    }
+    refresh_list = bdrv_topological_dfs(refresh_list, found, new_bs);
+
+    ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);


Speaking of bdrv_remove_filter_or_cow_child(): That function doesn’t refresh 
permissions.  I think it’s correct to do it here, so the following question 
doesn’t really concern this patch, but: Why don’t we do it there?

I guess it’s because we expect the node to go away anyway, so we don’t need to 
refresh the permissions.  And that assumption should hold true right now, given 
its callers.  But is that a safe assumption in general?  Would there be a 
problem if we refreshed permissions there? Or is not refreshing permissions 
just part of the function’s interface?



The caller of bdrv_remove_filter_or_cow_child() should take care of
permissions: bdrv_replace_node_common() does this, and
bdrv_set_backing_noperm() has "_noperm" in its name.


OK.  Makes me wonder why bdrv_remove_filter_or_cow_child() then doesn’t have 
_noperm in its name, or why its comment doesn’t explain this interface 
contract, but, well. :)


You are right that's unclear. I'll make the patch that cleans that up.




The main impact of the previous big rework of permissions is a new scheme for 
working with permission updates:

  - first do all graph modifications, not thinking about permissions
  - refresh permissions for the whole updated subgraph
  - if the refresh failed, roll back all the modifications (this is the main 
point of transactions here: they make this rollback possible)

So a lot of internal functions with a @tran argument don't update permissions. 
But of course, we should take care to update permissions after any graph 
modification.
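
As a rough illustration of that scheme, the three steps could look like this in C. This is only a minimal sketch under stated assumptions: the types and function names are hypothetical, it is not QEMU's Transaction API, and the boolean allows_parent merely stands in for a real permission check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT QEMU's BdrvChild/Transaction types. */
#define MAX_UNDO 8

typedef struct Node {
    bool allows_parent;   /* stand-in for "permissions can be satisfied" */
} Node;

typedef struct Child {
    Node *node;
} Child;

typedef struct UndoEntry {
    Child *child;
    Node *old_node;
} UndoEntry;

typedef struct Tran {
    UndoEntry undo[MAX_UNDO];
    size_t n;
} Tran;

/* Step 1: modify the graph only and record how to undo it. No permission work. */
static void replace_child_noperm(Child *c, Node *new_node, Tran *tran)
{
    assert(tran->n < MAX_UNDO);
    tran->undo[tran->n++] = (UndoEntry){ .child = c, .old_node = c->node };
    c->node = new_node;
}

/* Step 2: refresh "permissions" for every touched child. */
static bool refresh_perms(Child **children, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (children[i]->node && !children[i]->node->allows_parent) {
            return false;
        }
    }
    return true;
}

/* Step 3, failure path: undo the recorded modifications in reverse order. */
static void tran_abort(Tran *tran)
{
    while (tran->n > 0) {
        UndoEntry *e = &tran->undo[--tran->n];
        e->child->node = e->old_node;
    }
}

/* Step 3, success path: forget the undo records. */
static void tran_commit(Tran *tran)
{
    tran->n = 0;
}

/* Public-style wrapper: modify, refresh, then commit or roll back. */
int replace_child(Child *c, Node *new_node)
{
    Tran tran = { .n = 0 };
    Child *touched[] = { c };

    replace_child_noperm(c, new_node, &tran);
    if (!refresh_perms(touched, 1)) {
        tran_abort(&tran);
        return -1;
    }
    tran_commit(&tran);
    return 0;
}
```

The point of the split is that the "_noperm"-style helper can be chained several times inside one transaction, with a single permission refresh at the end.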


Ah, OK.  Makes sense, thanks.

Max




--
Best regards,
Vladimir



Re: [PATCH v3 3/8] hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c

2021-05-17 Thread John Snow

On 5/17/21 1:49 PM, Philippe Mathieu-Daudé wrote:

Some machines use floppy controllers via the SysBus interface,
and don't need to pull in all the ISA code.
Extract the ISA specific code to a new unit: fdc-isa.c, and
add a new Kconfig symbol: "FDC_ISA".

Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 


Sorry, I'm seeing build failures on this for patch #03:

../../configure --enable-docs; and make -j13

...

/usr/bin/ld: libcommon.fa.p/hw_isa_isa-superio.c.o: in function `isa_superio_realize':
/home/jsnow/src/qemu/bin/git/../../hw/isa/isa-superio.c:132: undefined reference to `isa_fdc_init_drives'

collect2: error: ld returned 1 exit status


--js




RE: [RFC PATCH] block/io.c: Flush parent for quorum in generic code

2021-05-17 Thread Zhang, Chen


> -----Original Message-----
> From: Stefan Hajnoczi 
> Sent: Thursday, May 13, 2021 10:26 PM
> To: Zhang, Chen 
> Cc: Kevin Wolf ; Max Reitz ; Fam
> Zheng ; qemu-dev ; qemu-
> block ; Zhang Chen ;
> Minghao Yuan 
> Subject: Re: [RFC PATCH] block/io.c: Flush parent for quorum in generic code
> 
> On Wed, May 12, 2021 at 03:49:57PM +0800, Zhang Chen wrote:
> > Fix the issue from this patch:
> > [PATCH] block: Flush all children in generic code From
> > 883833e29cb800b4d92b5d4736252f4004885191
> >
> > Quorum driver do not have the primary child.
> > It will caused guest block flush issue when use quorum and NBD.
> > The vm guest flushes failed,and then guest filesystem is shutdown.
> >
> > Signed-off-by: Zhang Chen 
> > Reported-by: Minghao Yuan 
> > ---
> >  block/io.c | 31 ++-
> >  1 file changed, 22 insertions(+), 9 deletions(-)
> ...
> > +flush_data:
> > +if (no_primary_child) {
> > +/* Flush parent */
> > +ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> > +} else {
> > +/* Flush childrens */
> > +ret = 0;
> > +QLIST_FOREACH(child, &bs->children, next) {
> > +if (child->perm & (BLK_PERM_WRITE |
> BLK_PERM_WRITE_UNCHANGED)) {
> > +int this_child_ret = bdrv_co_flush(child->bs);
> > +if (!ret) {
> > +ret = this_child_ret;
> > +}
> >  }
> >  }
> 
> I'm missing something:
> 
> The quorum driver has a valid bs->children list even though it does not have a
> primary child. Why does QLIST_FOREACH() loop fail for you?
> 

Yes, in most cases QLIST_FOREACH() works for me.
But it does not work when one of the children is disconnected: the original patch
changed the default behavior of the quorum driver when an error occurs.

For example:
The VM's quorum driver has two children, local disk1 and NBD disk2.
When the network goes down and NBD disk2 is disconnected, the current code reports 
"event": "QUORUM_REPORT_BAD" "type": "flush", "error": "Input/output error"
and
"event": "BLOCK_IO_ERROR" "#block008", "reason": "Input/output error"

The guest cannot even read/write the healthy local disk1. All VM I/O fails 
because of the NBD disk2 input/output error.
I think we do need to report the event that we lost a child (NBD disk2) at this 
time, but there is no need to fail the whole I/O path,
because we can fix it with x-blockdev-change and drive_add/drive_del for new 
children. 
Before the original patch (883833e2), the VM could still read/write the local disk1.
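
To make the difference concrete, here is a small sketch of two ways to combine per-child flush results. The function names and the threshold parameter are assumptions made for illustration only, and the real quorum driver's voting is more involved than this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * "Generic" policy (hypothetical helper): every child is flushed and the
 * first non-zero result becomes the overall result, so one broken child
 * fails the whole flush.
 */
int flush_first_error(const int *child_ret, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ret) {
            ret = child_ret[i];
        }
    }
    return ret;
}

/*
 * Threshold policy (hypothetical helper): the flush succeeds as long as
 * at least `threshold` children succeeded; otherwise the first error is
 * returned, or -EIO if there were simply too few children.
 */
int flush_threshold(const int *child_ret, size_t n, size_t threshold)
{
    size_t ok = 0;
    int first_err = 0;

    for (size_t i = 0; i < n; i++) {
        if (child_ret[i] == 0) {
            ok++;
        } else if (!first_err) {
            first_err = child_ret[i];
        }
    }
    if (ok >= threshold) {
        return 0;
    }
    return first_err ? first_err : -EIO;
}
```

With results { 0, -EIO } (local disk1 fine, NBD disk2 disconnected), the first policy fails the guest flush, while the second one with threshold 1 still succeeds.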

This patch is just an RFC version; please give me more comments on how to fix this issue.
 
Thanks
Chen


> Does this patch effectively skip bdrv_co_flush() calls on all quorum children?
> That seems wrong since children need to be flushed so that data is persisted.
> 

Yes, 

> Stefan


Re: [PATCH] fdc: check drive block device before usage (CVE-2021-20196)

2021-05-17 Thread Philippe Mathieu-Daudé
On 5/17/21 1:12 PM, P J P wrote:
> +-- On Sat, 15 May 2021, Philippe Mathieu-Daudé wrote --+
> | This patch misses the qtest companion with the reproducer
> | provided by Alexander.
> 
> Do we need a revised patch[-series] including a qtest? OR it can be done at 
> merge time?

Paolo usually asks for it and doesn't queue patches without a qtest when
a reproducer is available, but since it is a recent CVE, I suppose it
depends on the maintainer :)




[PATCH v3 8/8] hw/mips/jazz: Inline fdctrl_init_sysbus()

2021-05-17 Thread Philippe Mathieu-Daudé
There is only one call site for fdctrl_init_sysbus(), and this
function is specific to the jazz machines, not part of the
SYSBUS_FDC API. Move it locally with the machine code, and
remove its declaration in "hw/block/fdc.h".

Suggested-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/block/fdc.h |  3 ---
 hw/block/fdc-sysbus.c  | 16 
 hw/mips/jazz.c | 16 
 3 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index 06612218630..ac99d6bcaa0 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -1,7 +1,6 @@
 #ifndef HW_FDC_H
 #define HW_FDC_H
 
-#include "exec/hwaddr.h"
 #include "qapi/qapi-types-block.h"
 #include "hw/sysbus.h"
 
@@ -12,8 +11,6 @@
 
 void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds);
 void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds);
-void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
-hwaddr mmio_base, DriveInfo **fds);
 
 FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
 int cmos_get_fd_drive_type(FloppyDriveType fd0);
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 74c7c8f2e01..5c7e49bcc3f 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -103,22 +103,6 @@ void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds)
 fdctrl_init_drives(&fdc->state.bus, fds);
 }
 
-void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
-hwaddr mmio_base, DriveInfo **fds)
-{
-DeviceState *dev;
-SysBusDevice *sbd;
-
-dev = qdev_new("sysbus-fdc");
-qdev_prop_set_int32(dev, "dma-channel", dma_chann);
-sbd = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(sbd, &error_fatal);
-sysbus_connect_irq(sbd, 0, irq);
-sysbus_mmio_map(sbd, 0, mmio_base);
-
-sysbus_fdc_init_drives(sbd, fds);
-}
-
 static void sysbus_fdc_common_initfn(Object *obj)
 {
 DeviceState *dev = DEVICE(obj);
diff --git a/hw/mips/jazz.c b/hw/mips/jazz.c
index dba2088ed1a..13f26c5991f 100644
--- a/hw/mips/jazz.c
+++ b/hw/mips/jazz.c
@@ -143,6 +143,22 @@ static void mips_jazz_do_transaction_failed(CPUState *cs, hwaddr physaddr,
 }
 #endif /* CONFIG_TCG && !CONFIG_USER_ONLY */
 
+static void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
+   hwaddr mmio_base, DriveInfo **fds)
+{
+DeviceState *dev;
+SysBusDevice *sbd;
+
+dev = qdev_new("sysbus-fdc");
+qdev_prop_set_int32(dev, "dma-channel", dma_chann);
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, mmio_base);
+
+sysbus_fdc_init_drives(sbd, fds);
+}
+
 static void mips_jazz_init(MachineState *machine,
enum jazz_model_e jazz_model)
 {
-- 
2.26.3




[PATCH v3 7/8] hw/block/fdc-sysbus: Add 'dma-channel' property

2021-05-17 Thread Philippe Mathieu-Daudé
QDev properties to be set before the device is realized should
be exposed as a Property with a DEFINE_PROP_XXX() macro, then
accessed with the equivalent qdev_prop_set_xxx() API.

Do this with the FDCtrlSysBus 'dma-channel' property: convert
it to int32_t, default-initialize with DEFINE_PROP_INT32() and
use qdev_prop_set_int32() to set its value in fdctrl_init_sysbus().

Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-internal.h | 2 +-
 hw/block/fdc-sysbus.c   | 9 ++---
 2 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/hw/block/fdc-internal.h b/hw/block/fdc-internal.h
index 278de725e69..29b318f7525 100644
--- a/hw/block/fdc-internal.h
+++ b/hw/block/fdc-internal.h
@@ -96,7 +96,7 @@ struct FDCtrl {
 qemu_irq irq;
 /* Controller state */
 QEMUTimer *result_timer;
-int dma_chann;
+int32_t dma_chann;
 uint8_t phase;
 IsaDma *dma;
 /* Controller's identification */
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 8f94c2efb63..74c7c8f2e01 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -106,15 +106,11 @@ void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds)
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds)
 {
-FDCtrl *fdctrl;
 DeviceState *dev;
 SysBusDevice *sbd;
-FDCtrlSysBus *sys;
 
 dev = qdev_new("sysbus-fdc");
-sys = SYSBUS_FDC(dev);
-fdctrl = &sys->state;
-fdctrl->dma_chann = dma_chann; /* FIXME */
+qdev_prop_set_int32(dev, "dma-channel", dma_chann);
 sbd = SYS_BUS_DEVICE(dev);
 sysbus_realize_and_unref(sbd, &error_fatal);
 sysbus_connect_irq(sbd, 0, irq);
@@ -131,8 +127,6 @@ static void sysbus_fdc_common_initfn(Object *obj)
 FDCtrlSysBus *sys = SYSBUS_FDC(obj);
 FDCtrl *fdctrl = &sys->state;
 
-fdctrl->dma_chann = -1;
-
 qdev_set_legacy_instance_id(dev, 0 /* io */, 2); /* FIXME */
 
 memory_region_init_io(&fdctrl->iomem, obj,
@@ -173,6 +167,7 @@ static Property sysbus_fdc_properties[] = {
 DEFINE_PROP_SIGNED("fallback", FDCtrlSysBus, state.fallback,
 FLOPPY_DRIVE_TYPE_144, qdev_prop_fdc_drive_type,
 FloppyDriveType),
+DEFINE_PROP_INT32("dma-channel", FDCtrlSysBus, state.dma_chann, -1),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.26.3




[PATCH v3 4/8] hw/block/fdc: Extract SysBus floppy controllers to fdc-sysbus.c

2021-05-17 Thread Philippe Mathieu-Daudé
Some machines use floppy controllers via the ISA bus,
and don't need to pull in all the SysBus code.
Extract the SysBus specific code to a new unit: fdc-sysbus.c,
and add a new Kconfig symbol: "FDC_SYSBUS".

Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-sysbus.c | 252 ++
 hw/block/fdc.c| 220 
 MAINTAINERS   |   1 +
 hw/block/Kconfig  |   4 +
 hw/block/meson.build  |   1 +
 hw/block/trace-events |   2 +
 hw/mips/Kconfig   |   2 +-
 hw/sparc/Kconfig  |   2 +-
 8 files changed, 262 insertions(+), 222 deletions(-)
 create mode 100644 hw/block/fdc-sysbus.c

diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
new file mode 100644
index 000..71755fd6ae4
--- /dev/null
+++ b/hw/block/fdc-sysbus.c
@@ -0,0 +1,252 @@
+/*
+ * QEMU Floppy disk emulator (Intel 82078)
+ *
+ * Copyright (c) 2003, 2007 Jocelyn Mayer
+ * Copyright (c) 2008 Hervé Poussineau
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+#include "hw/sysbus.h"
+#include "hw/block/fdc.h"
+#include "migration/vmstate.h"
+#include "fdc-internal.h"
+#include "trace.h"
+
+#define TYPE_SYSBUS_FDC "base-sysbus-fdc"
+typedef struct FDCtrlSysBusClass FDCtrlSysBusClass;
+typedef struct FDCtrlSysBus FDCtrlSysBus;
+DECLARE_OBJ_CHECKERS(FDCtrlSysBus, FDCtrlSysBusClass,
+ SYSBUS_FDC, TYPE_SYSBUS_FDC)
+
+struct FDCtrlSysBusClass {
+/*< private >*/
+SysBusDeviceClass parent_class;
+/*< public >*/
+
+bool use_strict_io;
+};
+
+struct FDCtrlSysBus {
+/*< private >*/
+SysBusDevice parent_obj;
+/*< public >*/
+
+struct FDCtrl state;
+};
+
+static uint64_t fdctrl_read_mem(void *opaque, hwaddr reg, unsigned size)
+{
+return fdctrl_read(opaque, (uint32_t)reg);
+}
+
+static void fdctrl_write_mem(void *opaque, hwaddr reg,
+ uint64_t value, unsigned size)
+{
+fdctrl_write(opaque, (uint32_t)reg, value);
+}
+
+static const MemoryRegionOps fdctrl_mem_ops = {
+.read = fdctrl_read_mem,
+.write = fdctrl_write_mem,
+.endianness = DEVICE_NATIVE_ENDIAN,
+};
+
+static const MemoryRegionOps fdctrl_mem_strict_ops = {
+.read = fdctrl_read_mem,
+.write = fdctrl_write_mem,
+.endianness = DEVICE_NATIVE_ENDIAN,
+.valid = {
+.min_access_size = 1,
+.max_access_size = 1,
+},
+};
+
+static void fdctrl_external_reset_sysbus(DeviceState *d)
+{
+FDCtrlSysBus *sys = SYSBUS_FDC(d);
+FDCtrl *s = &sys->state;
+
+fdctrl_reset(s, 0);
+}
+
+static void fdctrl_handle_tc(void *opaque, int irq, int level)
+{
+trace_fdctrl_tc_pulse(level);
+}
+
+void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
+hwaddr mmio_base, DriveInfo **fds)
+{
+FDCtrl *fdctrl;
+DeviceState *dev;
+SysBusDevice *sbd;
+FDCtrlSysBus *sys;
+
+dev = qdev_new("sysbus-fdc");
+sys = SYSBUS_FDC(dev);
+fdctrl = &sys->state;
+fdctrl->dma_chann = dma_chann; /* FIXME */
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, mmio_base);
+
+fdctrl_init_drives(&sys->state.bus, fds);
+}
+
+void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
+   DriveInfo **fds, qemu_irq *fdc_tc)
+{
+DeviceState *dev;
+FDCtrlSysBus *sys;
+
+dev = qdev_new("sun-fdtwo");
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
+sys = SYSBUS_FDC(dev);
+sysbus_connect_irq(SYS_BUS_DEVICE(sys), 0, irq);
+sysbus_mmio_map(SYS_BUS_DEVICE(sys), 0, io_base);
+*fdc_tc = qdev_get_gpio_in(dev, 0);
+
+fdctrl_init_drives(&sys->state.bus, fds);
+}
+
+static void sysbus_fdc_common_initfn(Object *obj)
+{
+DeviceState *dev = DEVICE(obj);
+ 

[PATCH v3 6/8] hw/sparc/sun4m: Inline sun4m_fdctrl_init()

2021-05-17 Thread Philippe Mathieu-Daudé
There is only one call site for sun4m_fdctrl_init(), and this
function is specific to the sun4m machines, not part of the
SYSBUS_FDC API. Move it locally with the machine code, and
remove its declaration in "hw/block/fdc.h".

Suggested-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/block/fdc.h |  2 --
 hw/block/fdc-sysbus.c  | 16 
 hw/sparc/sun4m.c   | 16 
 3 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index 52e45c53078..06612218630 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -14,8 +14,6 @@ void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds);
 void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds);
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds);
-void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
-   DriveInfo **fds, qemu_irq *fdc_tc);
 
 FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i);
 int cmos_get_fd_drive_type(FloppyDriveType fd0);
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 1163e53165d..8f94c2efb63 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -123,22 +123,6 @@ void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 sysbus_fdc_init_drives(sbd, fds);
 }
 
-void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
-   DriveInfo **fds, qemu_irq *fdc_tc)
-{
-DeviceState *dev;
-SysBusDevice *sbd;
-
-dev = qdev_new("sun-fdtwo");
-sbd = SYS_BUS_DEVICE(dev);
-sysbus_realize_and_unref(sbd, &error_fatal);
-sysbus_connect_irq(sbd, 0, irq);
-sysbus_mmio_map(sbd, 0, io_base);
-*fdc_tc = qdev_get_gpio_in(dev, 0);
-
-sysbus_fdc_init_drives(sbd, fds);
-}
-
 static void sysbus_fdc_common_initfn(Object *obj)
 {
 DeviceState *dev = DEVICE(obj);
diff --git a/hw/sparc/sun4m.c b/hw/sparc/sun4m.c
index 42e139849ed..c08c650da72 100644
--- a/hw/sparc/sun4m.c
+++ b/hw/sparc/sun4m.c
@@ -816,6 +816,22 @@ static void dummy_fdc_tc(void *opaque, int irq, int level)
 {
 }
 
+static void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
+  DriveInfo **fds, qemu_irq *fdc_tc)
+{
+DeviceState *dev;
+SysBusDevice *sbd;
+
+dev = qdev_new("sun-fdtwo");
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, io_base);
+*fdc_tc = qdev_get_gpio_in(dev, 0);
+
+sysbus_fdc_init_drives(sbd, fds);
+}
+
 static void sun4m_hw_init(MachineState *machine)
 {
 const struct sun4m_hwdef *hwdef = SUN4M_MACHINE_GET_CLASS(machine)->hwdef;
-- 
2.26.3




[PATCH v3 3/8] hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c

2021-05-17 Thread Philippe Mathieu-Daudé
Some machines use floppy controllers via the SysBus interface,
and don't need to pull in all the ISA code.
Extract the ISA specific code to a new unit: fdc-isa.c, and
add a new Kconfig symbol: "FDC_ISA".

Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-isa.c   | 313 +++
 hw/block/fdc.c   | 257 ---
 MAINTAINERS  |   1 +
 hw/block/Kconfig |   4 +
 hw/block/meson.build |   1 +
 hw/i386/Kconfig  |   2 +-
 hw/isa/Kconfig   |   6 +-
 hw/sparc64/Kconfig   |   2 +-
 8 files changed, 324 insertions(+), 262 deletions(-)
 create mode 100644 hw/block/fdc-isa.c

diff --git a/hw/block/fdc-isa.c b/hw/block/fdc-isa.c
new file mode 100644
index 000..97f3f9e5c0a
--- /dev/null
+++ b/hw/block/fdc-isa.c
@@ -0,0 +1,313 @@
+/*
+ * QEMU Floppy disk emulator (Intel 82078)
+ *
+ * Copyright (c) 2003, 2007 Jocelyn Mayer
+ * Copyright (c) 2008 Hervé Poussineau
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+/*
+ * The controller is used in Sun4m systems in a slightly different
+ * way. There are changes in DOR register and DMA is not available.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/block/fdc.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/timer.h"
+#include "hw/acpi/aml-build.h"
+#include "hw/irq.h"
+#include "hw/isa/isa.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "migration/vmstate.h"
+#include "hw/block/block.h"
+#include "sysemu/block-backend.h"
+#include "sysemu/blockdev.h"
+#include "sysemu/sysemu.h"
+#include "qemu/log.h"
+#include "qemu/main-loop.h"
+#include "qemu/module.h"
+#include "trace.h"
+#include "qom/object.h"
+#include "fdc-internal.h"
+
+OBJECT_DECLARE_SIMPLE_TYPE(FDCtrlISABus, ISA_FDC)
+
+struct FDCtrlISABus {
+ISADevice parent_obj;
+
+uint32_t iobase;
+uint32_t irq;
+uint32_t dma;
+struct FDCtrl state;
+int32_t bootindexA;
+int32_t bootindexB;
+};
+
+static void fdctrl_external_reset_isa(DeviceState *d)
+{
+FDCtrlISABus *isa = ISA_FDC(d);
+FDCtrl *s = &isa->state;
+
+fdctrl_reset(s, 0);
+}
+
+void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds)
+{
+fdctrl_init_drives(&ISA_FDC(fdc)->state.bus, fds);
+}
+
+static const MemoryRegionPortio fdc_portio_list[] = {
+{ 1, 5, 1, .read = fdctrl_read, .write = fdctrl_write },
+{ 7, 1, 1, .read = fdctrl_read, .write = fdctrl_write },
+PORTIO_END_OF_LIST(),
+};
+
+static void isabus_fdc_realize(DeviceState *dev, Error **errp)
+{
+ISADevice *isadev = ISA_DEVICE(dev);
+FDCtrlISABus *isa = ISA_FDC(dev);
+FDCtrl *fdctrl = &isa->state;
+Error *err = NULL;
+
+isa_register_portio_list(isadev, &fdctrl->portio_list,
+ isa->iobase, fdc_portio_list, fdctrl,
+ "fdc");
+
+isa_init_irq(isadev, &fdctrl->irq, isa->irq);
+fdctrl->dma_chann = isa->dma;
+if (fdctrl->dma_chann != -1) {
+fdctrl->dma = isa_get_dma(isa_bus_from_device(isadev), isa->dma);
+if (!fdctrl->dma) {
+error_setg(errp, "ISA controller does not support DMA");
+return;
+}
+}
+
+qdev_set_legacy_instance_id(dev, isa->iobase, 2);
+
+fdctrl_realize_common(dev, fdctrl, &err);
+if (err != NULL) {
+error_propagate(errp, err);
+return;
+}
+}
+
+FloppyDriveType isa_fdc_get_drive_type(ISADevice *fdc, int i)
+{
+FDCtrlISABus *isa = ISA_FDC(fdc);
+
+return isa->state.drives[i].drive;
+}
+
+static void isa_fdc_get_drive_max_chs(FloppyDriveType type, uint8_t *maxc,
+  uint8_t *maxh, uint8_t *maxs)
+{
+const FDFormat *fdf;
+
+*maxc = *maxh = *maxs = 0;
+for (fdf = fd_formats; fdf->drive != FLOPPY_DRIVE_TYPE_NONE; fdf++) {
+if (fdf->drive != type) {
+continue;
+}
+ 

[PATCH v3 5/8] hw/block/fdc: Add sysbus_fdc_init_drives() method

2021-05-17 Thread Philippe Mathieu-Daudé
FDCtrlSysBus's FDCtrl state is a private field. However it is
accessed by the public fdctrl_init_sysbus() and sun4m_fdctrl_init()
methods. To be able to move them out of fdc-sysbus.c, first add
the sysbus_fdc_init_drives() method and use it in these 2 functions.

Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 include/hw/block/fdc.h |  2 ++
 hw/block/fdc-sysbus.c  | 23 ---
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/hw/block/fdc.h b/include/hw/block/fdc.h
index 1ecca7cac7f..52e45c53078 100644
--- a/include/hw/block/fdc.h
+++ b/include/hw/block/fdc.h
@@ -3,6 +3,7 @@
 
 #include "exec/hwaddr.h"
 #include "qapi/qapi-types-block.h"
+#include "hw/sysbus.h"
 
 /* fdc.c */
 #define MAX_FD 2
@@ -10,6 +11,7 @@
 #define TYPE_ISA_FDC "isa-fdc"
 
 void isa_fdc_init_drives(ISADevice *fdc, DriveInfo **fds);
+void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds);
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds);
 void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
diff --git a/hw/block/fdc-sysbus.c b/hw/block/fdc-sysbus.c
index 71755fd6ae4..1163e53165d 100644
--- a/hw/block/fdc-sysbus.c
+++ b/hw/block/fdc-sysbus.c
@@ -94,6 +94,15 @@ static void fdctrl_handle_tc(void *opaque, int irq, int level)
 trace_fdctrl_tc_pulse(level);
 }
 
+void sysbus_fdc_init_drives(SysBusDevice *dev, DriveInfo **fds)
+{
+FDCtrlSysBus *fdc;
+
+fdc = SYSBUS_FDC(dev);
+
+fdctrl_init_drives(&fdc->state.bus, fds);
+}
+
 void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 hwaddr mmio_base, DriveInfo **fds)
 {
@@ -111,23 +120,23 @@ void fdctrl_init_sysbus(qemu_irq irq, int dma_chann,
 sysbus_connect_irq(sbd, 0, irq);
 sysbus_mmio_map(sbd, 0, mmio_base);
 
-fdctrl_init_drives(&sys->state.bus, fds);
+sysbus_fdc_init_drives(sbd, fds);
 }
 
 void sun4m_fdctrl_init(qemu_irq irq, hwaddr io_base,
DriveInfo **fds, qemu_irq *fdc_tc)
 {
 DeviceState *dev;
-FDCtrlSysBus *sys;
+SysBusDevice *sbd;
 
 dev = qdev_new("sun-fdtwo");
-sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);
-sys = SYSBUS_FDC(dev);
-sysbus_connect_irq(SYS_BUS_DEVICE(sys), 0, irq);
-sysbus_mmio_map(SYS_BUS_DEVICE(sys), 0, io_base);
+sbd = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(sbd, &error_fatal);
+sysbus_connect_irq(sbd, 0, irq);
+sysbus_mmio_map(sbd, 0, io_base);
 *fdc_tc = qdev_get_gpio_in(dev, 0);
 
-fdctrl_init_drives(&sys->state.bus, fds);
+sysbus_fdc_init_drives(sbd, fds);
 }
 
 static void sysbus_fdc_common_initfn(Object *obj)
-- 
2.26.3




[PATCH v3 1/8] hw/block/fdc: Replace disabled fprintf() by trace event

2021-05-17 Thread Philippe Mathieu-Daudé
Reviewed-by: John Snow 
Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc.c| 7 +--
 hw/block/trace-events | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index a825c2acbae..1d3a0473678 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1242,12 +1242,7 @@ static void fdctrl_external_reset_isa(DeviceState *d)
 
 static void fdctrl_handle_tc(void *opaque, int irq, int level)
 {
-//FDCtrl *s = opaque;
-
-if (level) {
-// XXX
-FLOPPY_DPRINTF("TC pulsed\n");
-}
+trace_fdctrl_tc_pulse(level);
 }
 
 /* Change IRQ state */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index fa12e3a67a7..306989c193c 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -3,6 +3,7 @@
 # fdc.c
 fdc_ioport_read(uint8_t reg, uint8_t value) "read reg 0x%02x val 0x%02x"
 fdc_ioport_write(uint8_t reg, uint8_t value) "write reg 0x%02x val 0x%02x"
+fdctrl_tc_pulse(int level) "TC pulse: %u"
 
 # pflash_cfi01.c
 # pflash_cfi02.c
-- 
2.26.3




[PATCH v3 2/8] hw/block/fdc: Declare shared prototypes in fdc-internal.h

2021-05-17 Thread Philippe Mathieu-Daudé
We want to extract ISA/SysBus code from the generic fdc.c file.
First, declare the prototypes we will access from the new units
into a new local header: "fdc-internal.h".

Acked-by: Mark Cave-Ayland 
Reviewed-by: Mark Cave-Ayland 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/fdc-internal.h | 156 
 hw/block/fdc.c  | 126 +++-
 MAINTAINERS |   1 +
 3 files changed, 165 insertions(+), 118 deletions(-)
 create mode 100644 hw/block/fdc-internal.h

diff --git a/hw/block/fdc-internal.h b/hw/block/fdc-internal.h
new file mode 100644
index 000..278de725e69
--- /dev/null
+++ b/hw/block/fdc-internal.h
@@ -0,0 +1,156 @@
+/*
+ * QEMU Floppy disk emulator (Intel 82078)
+ *
+ * Copyright (c) 2003, 2007 Jocelyn Mayer
+ * Copyright (c) 2008 Hervé Poussineau
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef HW_BLOCK_FDC_INTERNAL_H
+#define HW_BLOCK_FDC_INTERNAL_H
+
+#include "exec/memory.h"
+#include "exec/ioport.h"
+#include "hw/block/block.h"
+#include "hw/block/fdc.h"
+#include "qapi/qapi-types-block.h"
+
+typedef struct FDCtrl FDCtrl;
+
+/* Floppy bus emulation */
+
+typedef struct FloppyBus {
+BusState bus;
+FDCtrl *fdc;
+} FloppyBus;
+
+/* Floppy disk drive emulation */
+
+typedef enum FDriveRate {
+FDRIVE_RATE_500K = 0x00,  /* 500 Kbps */
+FDRIVE_RATE_300K = 0x01,  /* 300 Kbps */
+FDRIVE_RATE_250K = 0x02,  /* 250 Kbps */
+FDRIVE_RATE_1M   = 0x03,  /*   1 Mbps */
+} FDriveRate;
+
+typedef enum FDriveSize {
+FDRIVE_SIZE_UNKNOWN,
+FDRIVE_SIZE_350,
+FDRIVE_SIZE_525,
+} FDriveSize;
+
+typedef struct FDFormat {
+FloppyDriveType drive;
+uint8_t last_sect;
+uint8_t max_track;
+uint8_t max_head;
+FDriveRate rate;
+} FDFormat;
+
+typedef enum FDiskFlags {
+FDISK_DBL_SIDES  = 0x01,
+} FDiskFlags;
+
+typedef struct FDrive {
+FDCtrl *fdctrl;
+BlockBackend *blk;
+BlockConf *conf;
+/* Drive status */
+FloppyDriveType drive;/* CMOS drive type*/
+uint8_t perpendicular;/* 2.88 MB access mode*/
+/* Position */
+uint8_t head;
+uint8_t track;
+uint8_t sect;
+/* Media */
+FloppyDriveType disk; /* Current disk type  */
+FDiskFlags flags;
+uint8_t last_sect;/* Nb sector per track*/
+uint8_t max_track;/* Nb of tracks   */
+uint16_t bps; /* Bytes per sector   */
+uint8_t ro;   /* Is read-only   */
+uint8_t media_changed;/* Is media changed   */
+uint8_t media_rate;   /* Data rate of medium*/
+
+bool media_validated; /* Have we validated the media? */
+} FDrive;
+
+struct FDCtrl {
+MemoryRegion iomem;
+qemu_irq irq;
+/* Controller state */
+QEMUTimer *result_timer;
+int dma_chann;
+uint8_t phase;
+IsaDma *dma;
+/* Controller's identification */
+uint8_t version;
+/* HW */
+uint8_t sra;
+uint8_t srb;
+uint8_t dor;
+uint8_t dor_vmstate; /* only used as temp during vmstate */
+uint8_t tdr;
+uint8_t dsr;
+uint8_t msr;
+uint8_t cur_drv;
+uint8_t status0;
+uint8_t status1;
+uint8_t status2;
+/* Command FIFO */
+uint8_t *fifo;
+int32_t fifo_size;
+uint32_t data_pos;
+uint32_t data_len;
+uint8_t data_state;
+uint8_t data_dir;
+uint8_t eot; /* last wanted sector */
+/* States kept only to be returned back */
+/* precompensation */
+uint8_t precomp_trk;
+uint8_t config;
+uint8_t lock;
+/* Power down config (also with status regB access mode */
+uint8_t pwrd;
+/* Floppy drives */
+FloppyBus bus;
+uint8_t num_floppies;
+FDrive drives[MAX_FD];
+struct {
+FloppyDriveType type;
+} qdev_for_drives[MAX_FD];
+int reset_sensei;
+FloppyDriveType fallback; /* type=auto failure fallback */
+/* 

[PATCH v3 0/8] hw/block/fdc: Allow Kconfig-selecting ISA bus/SysBus floppy controllers

2021-05-17 Thread Philippe Mathieu-Daudé
Series fully reviewed.

Hi,

The floppy disk controllers pull in irrelevant devices (SysBus in
an ISA-only machine, ISA bus + ISA devices on a SysBus-only machine).

This series cleans that up by extracting each device into its own file,
adding the corresponding Kconfig symbols: FDC_ISA and FDC_SYSBUS.

Since v2:
- rebased

Since v1:
- added missing "hw/block/block.h" header (jsnow)
- inlined hardware specific calls (Mark)
- added R-b/A-b tags

Regards,

Phil.

Philippe Mathieu-Daudé (8):
  hw/block/fdc: Replace disabled fprintf() by trace event
  hw/block/fdc: Declare shared prototypes in fdc-internal.h
  hw/block/fdc: Extract ISA floppy controllers to fdc-isa.c
  hw/block/fdc: Extract SysBus floppy controllers to fdc-sysbus.c
  hw/block/fdc: Add sysbus_fdc_init_drives() method
  hw/sparc/sun4m: Inline sun4m_fdctrl_init()
  hw/block/fdc-sysbus: Add 'dma-channel' property
  hw/mips/jazz: Inline fdctrl_init_sysbus()

 hw/block/fdc-internal.h | 156 +++
 include/hw/block/fdc.h  |   7 +-
 hw/block/fdc-isa.c  | 313 +
 hw/block/fdc-sysbus.c   | 224 +++
 hw/block/fdc.c  | 608 +---
 hw/mips/jazz.c  |  16 ++
 hw/sparc/sun4m.c|  16 ++
 MAINTAINERS |   3 +
 hw/block/Kconfig|   8 +
 hw/block/meson.build|   2 +
 hw/block/trace-events   |   3 +
 hw/i386/Kconfig |   2 +-
 hw/isa/Kconfig  |   6 +-
 hw/mips/Kconfig |   2 +-
 hw/sparc/Kconfig|   2 +-
 hw/sparc64/Kconfig  |   2 +-
 16 files changed, 758 insertions(+), 612 deletions(-)
 create mode 100644 hw/block/fdc-internal.h
 create mode 100644 hw/block/fdc-isa.c
 create mode 100644 hw/block/fdc-sysbus.c

-- 
2.26.3





Re: [PATCH] fdc: check drive block device before usage (CVE-2021-20196)

2021-05-17 Thread John Snow

On 5/17/21 7:12 AM, P J P wrote:

+-- On Sat, 15 May 2021, Philippe Mathieu-Daudé wrote --+
| This patch misses the qtest companion with the reproducer
| provided by Alexander.

Do we need a revised patch[-series] including a qtest, or can it be done at
merge time?

Thank you.
--
  - P J P
8685 545E B54C 486B C6EB 271E E285 8B5A F050 DE8D



Unknown, haven't dug into this patch and problem yet.

If you have the time to write a qtest reproducer, you can send it 
separately and I'll pick it up if everything looks correct.


Sorry for the FDC/ATA delays. Working on it.

(...Maintainers wanted!)

--js




Re: [PATCH 09/21] block/backup: move cluster size calculation to block-copy

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

The main consumer of cluster-size is block-copy. Let's calculate it
here instead of passing through backup-top.

We are going to publish the copy-before-write filter soon, so it will be
created through options. For now we don't want to add an explicit
option for cluster-size, so let's continue to calculate it automatically.
So, now is the time to get rid of the cluster_size argument for
bdrv_cbw_append().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/copy-before-write.h  |  1 -
  include/block/block-copy.h |  5 ++--
  block/backup.c | 58 ++
  block/block-copy.c | 47 +-
  block/copy-before-write.c  | 10 +++
  5 files changed, 62 insertions(+), 59 deletions(-)


Reviewed-by: Max Reitz 




Re: [PATCH 08/21] block/backup: stricter backup_calculate_cluster_size()

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

No reason to tolerate bdrv_get_info() errors except for ENOTSUP. Let's
just error-out, it's simpler and safer.


Hm, doesn’t look that much simpler to me.  Not sure how much safer it 
is, because the point was that in the target_does_cow case, we would 
like a cluster size hint, but it isn’t necessary.  So if we don’t get 
one, regardless of the reason, we use the default cluster size.  I don’t 
know why ENOTSUP should be treated in a special way there.


So I don’t know.

Max


Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/backup.c | 14 +-
  1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index fe685e411b..fe7a1f1e37 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -367,7 +367,10 @@ static int64_t 
backup_calculate_cluster_size(BlockDriverState *target,
   * targets with a backing file, try to avoid COW if possible.
   */
  ret = bdrv_get_info(target, &bdi);
-if (ret == -ENOTSUP && !target_does_cow) {
+if (ret < 0 && ret != -ENOTSUP) {
+error_setg_errno(errp, -ret, "Failed to get target info");
+return ret;
+} else if (ret == -ENOTSUP && !target_does_cow) {
  /* Cluster size is not defined */
  warn_report("The target block device doesn't provide "
  "information about the block size and it doesn't have a "
@@ -376,14 +379,7 @@ static int64_t 
backup_calculate_cluster_size(BlockDriverState *target,
  "this default, the backup may be unusable",
  BACKUP_CLUSTER_SIZE_DEFAULT);
  return BACKUP_CLUSTER_SIZE_DEFAULT;
-} else if (ret < 0 && !target_does_cow) {
-error_setg_errno(errp, -ret,
-"Couldn't determine the cluster size of the target image, "
-"which has no backing file");
-error_append_hint(errp,
-"Aborting, since this may create an unusable destination image\n");
-return ret;
-} else if (ret < 0 && target_does_cow) {
+} else if (ret == -ENOTSUP && target_does_cow) {
  /* Not fatal; just trudge on ahead. */
  return BACKUP_CLUSTER_SIZE_DEFAULT;
  }
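
For illustration, the behavioral difference discussed above can be modelled
as two small pure functions. This is only a sketch: cluster_size_old(),
cluster_size_new() and pick_size() are hypothetical names, the default value
and the "use the larger of default and reported size" success path are
assumptions, and nothing here calls real QEMU APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed placeholder for BACKUP_CLUSTER_SIZE_DEFAULT. */
#define CLUSTER_SIZE_DEFAULT ((int64_t)1 << 16)

/* Assumption: on success the larger of default and reported size is used. */
static int64_t pick_size(int64_t reported)
{
    return reported > CLUSTER_SIZE_DEFAULT ? reported : CLUSTER_SIZE_DEFAULT;
}

/*
 * Before the patch: with a COW-capable target any failure of the info
 * query falls back to the default; without one, only -ENOTSUP does.
 * 'ret' models the return value of bdrv_get_info().
 */
static int64_t cluster_size_old(int ret, bool target_does_cow, int64_t reported)
{
    if (ret == -ENOTSUP && !target_does_cow) {
        return CLUSTER_SIZE_DEFAULT;    /* the real code also warns here */
    } else if (ret < 0 && !target_does_cow) {
        return ret;                     /* fatal */
    } else if (ret < 0 && target_does_cow) {
        return CLUSTER_SIZE_DEFAULT;    /* not fatal */
    }
    return pick_size(reported);
}

/* After the patch: every error except -ENOTSUP is fatal. */
static int64_t cluster_size_new(int ret, bool target_does_cow, int64_t reported)
{
    (void)target_does_cow;  /* only decides whether the real code warns */
    if (ret < 0 && ret != -ENOTSUP) {
        return ret;
    } else if (ret == -ENOTSUP) {
        return CLUSTER_SIZE_DEFAULT;
    }
    return pick_size(reported);
}
```

With ret = -EIO and target_does_cow = true, the old variant falls back to the
default while the new one returns -EIO; that is the case the question about
treating ENOTSUP specially is concerned with.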






Re: [PATCH 00/10] Python: delint iotests, machine.py and console_socket.py

2021-05-17 Thread Emanuele Giuseppe Esposito




On 17/05/2021 18:11, John Snow wrote:

On 5/12/21 5:46 PM, John Snow wrote:

gitlab CI: https://gitlab.com/jsnow/qemu/-/pipelines/301924893
branch: 
https://gitlab.com/jsnow/qemu/-/commits/python-package-pre-cleanup


This series serves as a pre-requisite for packaging the python series
and getting the linters running via CI. The first patch fixes a linter
error we've had for a while now; the subsequent 9 fix a new warning that
was recently added to pylint 2.8.x.

If there's nobody opposed, I'll take it through my Python queue,
including the iotests bits.

John Snow (10):
   python/console_socket: avoid one-letter variable
   python/machine: use subprocess.DEVNULL instead of
 open(os.path.devnull)
   python/machine: use subprocess.run instead of subprocess.Popen
   python/console_socket: Add a pylint ignore
   python/machine: Disable pylint warning for open() in _pre_launch
   python/machine: disable warning for Popen in _launch()
   iotests: use subprocess.run where possible
   iotests: use 'with open()' where applicable
   iotests: silence spurious consider-using-with warnings
   iotests: ensure that QemuIoInteractive definitely closes

  python/qemu/console_socket.py    | 11 ---
  python/qemu/machine.py   | 28 ++--
  tests/qemu-iotests/iotests.py    | 55 +++-
  tests/qemu-iotests/testrunner.py |  1 +
  4 files changed, 57 insertions(+), 38 deletions(-)



The iotests stuff was handled by Emanuele Giuseppe Esposito instead, and 
-- I must admit -- better than I did. Dropping patches 7-10.


Yes, patch 7-9 + the #pylint: disable= in patch 10 are covered in
"qemu-iotests: fix pylint 2.8 consider-using-with error"
https://patchew.org/QEMU/20210510190449.65948-1-eespo...@redhat.com/
that is merged.

Just wanted to point that maybe you want to keep part of patch 10, if 
you think that it is important :)


Emanuele




Re: [PATCH 07/21] block-copy: always set BDRV_REQ_SERIALISING flag

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

It won't hurt in common case, so let's not bother with detecting image
fleecing.

Also, we want to simplify initialization interface of copy-before-write
filter as we are going to make it public.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/copy-before-write.h  |  2 +-
  include/block/block-copy.h |  3 +--
  block/backup.c | 20 +---
  block/block-copy.c | 29 ++---
  block/copy-before-write.c  |  4 ++--
  5 files changed, 31 insertions(+), 27 deletions(-)


Reviewed-by: Max Reitz 




Re: [PATCH 00/10] Python: delint iotests, machine.py and console_socket.py

2021-05-17 Thread John Snow

On 5/12/21 5:46 PM, John Snow wrote:

gitlab CI: https://gitlab.com/jsnow/qemu/-/pipelines/301924893
branch: https://gitlab.com/jsnow/qemu/-/commits/python-package-pre-cleanup

This series serves as a pre-requisite for packaging the python series
and getting the linters running via CI. The first patch fixes a linter
error we've had for a while now; the subsequent 9 fix a new warning that
was recently added to pylint 2.8.x.

If there's nobody opposed, I'll take it through my Python queue,
including the iotests bits.

John Snow (10):
   python/console_socket: avoid one-letter variable
   python/machine: use subprocess.DEVNULL instead of
 open(os.path.devnull)
   python/machine: use subprocess.run instead of subprocess.Popen
   python/console_socket: Add a pylint ignore
   python/machine: Disable pylint warning for open() in _pre_launch
   python/machine: disable warning for Popen in _launch()
   iotests: use subprocess.run where possible
   iotests: use 'with open()' where applicable
   iotests: silence spurious consider-using-with warnings
   iotests: ensure that QemuIoInteractive definitely closes

  python/qemu/console_socket.py| 11 ---
  python/qemu/machine.py   | 28 ++--
  tests/qemu-iotests/iotests.py| 55 +++-
  tests/qemu-iotests/testrunner.py |  1 +
  4 files changed, 57 insertions(+), 38 deletions(-)



The iotests stuff was handled by Emanuele Giuseppe Esposito instead, and 
-- I must admit -- better than I did. Dropping patches 7-10.


--js




Re: [PATCH 06/21] block/backup: drop support for copy_range

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

copy_range has not been the default behavior since 6a30f663d4c0b3c, and it's
now available only through the experimental x-perf argument, so it's OK to
drop it.

Even when backup is used to copy a disk to the same filesystem, and the
filesystem supports zero-copy copy_range, copy_range is probably not
what we want for backup: backup has the good property of making a copy of
the active disk with no impact on the active disk itself (unlike creating a
snapshot). If copy_range adds fs-level references instead of copying data,
and a COW operation occurs on the next guest write, it seems most likely
that the new block will be allocated for the original vm disk,
not for the backup disk. Thus, fragmentation of the original disk will
increase.


Good point.


We can simply add support back on demand. Now we want to publish the
copy-before-write filter, and instead of thinking about how to pass a
use-copy-range argument to block-copy (create an x-block-copy parameter
for the new public filter driver, or maybe set it by hand after filter
node creation?), let's just drop copy-range support in
backup for now.

After this patch copy-range support in block-copy becomes unused. Let's
keep it for a while, it won't hurt:

1. If there is a request for supporting copy_range in backup
(and/or in a new public copy-before-write filter), it will be easy
to satisfy it.

2. Probably, qemu-img convert will reuse block-copy, and qemu-img has
option to enable copy-range. qemu-img convert is not a backup, and
copy_range may be more reasonable for some cases in context of
qemu-img convert.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/copy-before-write.h | 1 -
  block/backup.c| 3 +--
  block/copy-before-write.c | 4 +---
  3 files changed, 2 insertions(+), 6 deletions(-)


Reviewed-by: Max Reitz 




Re: [PATCH 05/21] block: rename backup-top to copy-before-write

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

We are going to convert backup_top into a full-featured public filter,
which can be used separately from the backup job. Start by renaming it from
"how it is used" to "what it does".


Is this safe?  The name was externally visible in queries after all. 
(I’m not saying it is unsafe, I just don’t know and would like to know 
whether you’ve considered this already.)


(Regardless, renaming files and so on is fine, of course.)


While updating comments in iotest 283, also drop or rephrase the parts
about ".active", as this field is now dropped and the filter doesn't have
an "inactive" mode.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block/{backup-top.h => copy-before-write.h} |  28 +++---
  block/backup.c  |  22 ++---
  block/{backup-top.c => copy-before-write.c} | 100 ++--
  MAINTAINERS |   4 +-
  block/meson.build   |   2 +-
  tests/qemu-iotests/283  |  35 +++
  tests/qemu-iotests/283.out  |   4 +-
  7 files changed, 95 insertions(+), 100 deletions(-)
  rename block/{backup-top.h => copy-before-write.h} (56%)
  rename block/{backup-top.c => copy-before-write.c} (62%)


[...]


diff --git a/block/backup-top.c b/block/copy-before-write.c
similarity index 62%
rename from block/backup-top.c
rename to block/copy-before-write.c
index 425e3778be..40e91832d7 100644
--- a/block/backup-top.c
+++ b/block/copy-before-write.c


[...]


@@ -32,25 +32,25 @@


[...]


-static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
-   uint64_t bytes, BdrvRequestFlags flags)
+static coroutine_fn int cbw_cbw(BlockDriverState *bs, uint64_t offset,
+uint64_t bytes, BdrvRequestFlags flags)


I’m sure you noticed it, too, but cbw_cbw() is weird.  Perhaps 
cbw_do_cbw() at least?


Max




Re: [PATCH 01/21] block: introduce bdrv_replace_child_bs()

2021-05-17 Thread Max Reitz

On 17.05.21 16:30, Vladimir Sementsov-Ogievskiy wrote:

17.05.2021 15:09, Max Reitz wrote:

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

Add function to transactionally replace bs inside BdrvChild.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block.h |  2 ++
  block.c   | 36 
  2 files changed, 38 insertions(+)


As you may guess, I know little about the rewritten replacing 
functions, so this is kind of difficult to review for me.  However, 
nothing looks out of place, and the function looks sufficiently 
similar to bdrv_replace_node_common() to make me happy.



diff --git a/include/block/block.h b/include/block/block.h
index 82185965ff..f9d5fcb108 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -361,6 +361,8 @@ int bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top,

  Error **errp);
  int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
    Error **errp);
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp);
  BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict 
*node_options,

 int flags, Error **errp);
  int bdrv_drop_filter(BlockDriverState *bs, Error **errp);
diff --git a/block.c b/block.c
index 9ad725d205..755fa53d85 100644
--- a/block.c
+++ b/block.c
@@ -4961,6 +4961,42 @@ out:
  return ret;
  }
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp)
+{
+    int ret;
+    Transaction *tran = tran_new();
+    g_autoptr(GHashTable) found = NULL;
+    g_autoptr(GSList) refresh_list = NULL;
+    BlockDriverState *old_bs = child->bs;
+
+    if (old_bs) {


Hm.  Can child->bs be ever NULL?


Hmm. Most probably not :)

In some intermediate states we don't have bs in child, but it shouldn't 
be the place where bdrv_replace_child_bs is called.





+    bdrv_ref(old_bs);
+    bdrv_drained_begin(old_bs);
+    }
+    bdrv_drained_begin(new_bs);


(I was wondering why we couldn’t handle the new_bs == NULL case here 
to replace bdrv_remove_filter_or_cow_child(), but then I realized it’s 
probably because that’s kind of difficult, precisely because child->bs 
at least should generally be non-NULL.  Which is why 
bdrv_remove_filter_or_cow_child() needs to add its own transaction 
entry to handle the BdrvChild object and the pointer to it.


Hence me wondering whether we could assume child->bs not to be NULL.)


bdrv_remove_filter_or_cow_child() is a "lower level" function: it does
neither a drained section nor a permission update. The new
bdrv_replace_child_bs() is a public function, which takes care of these things.





+
+    bdrv_replace_child(child, new_bs, tran);
+
+    found = g_hash_table_new(NULL, NULL);
+    if (old_bs) {
+    refresh_list = bdrv_topological_dfs(refresh_list, found, 
old_bs);

+    }
+    refresh_list = bdrv_topological_dfs(refresh_list, found, new_bs);
+
+    ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);


Speaking of bdrv_remove_filter_or_cow_child(): That function doesn’t 
refresh permissions.  I think it’s correct to do it here, so the 
following question doesn’t really concern this patch, but: Why don’t 
we do it there?


I guess it’s because we expect the node to go away anyway, so we don’t 
need to refresh the permissions.  And that assumption should hold true 
right now, given its callers.  But is that a safe assumption in 
general?  Would there be a problem if we refreshed permissions there?  
Or is not refreshing permissions just part of the function’s interface?




The caller of bdrv_remove_filter_or_cow_child() should take care of
permissions: bdrv_replace_node_common() does this, and
bdrv_set_backing_noperm() has "_noperm" in the name.


OK.  Makes me wonder why bdrv_remove_filter_or_cow_child() then doesn’t 
have _noperm in its name, or why its comment doesn’t explain this 
interface contract, but, well. :)


The main impact of the previous big rework of permissions is the new scheme
for working with permission updates:


  - first do all graph modifications, not thinking about permissions
  - refresh permissions for the whole updated subgraph
  - if the refresh failed, roll back all the modifications (the main point
    of transactions here is that they make this rollback possible)


So a lot of internal functions with a @tran argument don't update
permissions. But of course, we should take care to update permissions after
any graph modification.


Ah, OK.  Makes sense, thanks.

Max
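
The three-step scheme described above (modify, refresh permissions, roll back
on failure) can be sketched with a tiny undo log. This is a self-contained
model, not QEMU code: Node, Child, Tran, tran_record(), tran_finish() and
replace_child() are all hypothetical names invented for the illustration, and
the permission check is reduced to a single "writable" flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniature graph: a parent edge pointing at one node. */
typedef struct Node {
    bool writable;
} Node;

typedef struct Child {
    Node *node;
    bool needs_write;       /* permission the parent requires */
} Child;

/* One undo-log entry; entries are rolled back newest-first. */
typedef struct Undo {
    void (*rollback)(void *opaque);
    void *opaque;
    struct Undo *next;
} Undo;

typedef struct Tran {
    Undo *log;
} Tran;

static void tran_record(Tran *t, void (*rollback)(void *), void *opaque)
{
    Undo *u = malloc(sizeof(*u));
    assert(u);
    u->rollback = rollback;
    u->opaque = opaque;
    u->next = t->log;
    t->log = u;
}

/* Commit (ret >= 0) or roll back (ret < 0), then free the log. */
static void tran_finish(Tran *t, int ret)
{
    while (t->log) {
        Undo *u = t->log;
        t->log = u->next;
        if (ret < 0) {
            u->rollback(u->opaque);
        }
        free(u->opaque);
        free(u);
    }
}

typedef struct ReplaceState {
    Child *child;
    Node *old_node;
} ReplaceState;

static void replace_rollback(void *opaque)
{
    ReplaceState *s = opaque;
    s->child->node = s->old_node;
}

/* Step 1: change the graph only; no permission checks here. */
static void replace_child_noperm(Child *child, Node *new_node, Tran *t)
{
    ReplaceState *s = malloc(sizeof(*s));
    assert(s);
    s->child = child;
    s->old_node = child->node;
    child->node = new_node;
    tran_record(t, replace_rollback, s);
}

/* Step 2: check permissions on the already-modified graph. */
static int refresh_perms(const Child *child)
{
    return (child->needs_write && !child->node->writable) ? -EPERM : 0;
}

/* Public entry point: modify, refresh, roll back on failure. */
static int replace_child(Child *child, Node *new_node)
{
    Tran t = { NULL };
    int ret;

    replace_child_noperm(child, new_node, &t);
    ret = refresh_perms(child);
    tran_finish(&t, ret);   /* step 3 */
    return ret;
}
```

The split between replace_child_noperm() and replace_child() mirrors the
naming convention mentioned in the thread: helpers that take a transaction do
not touch permissions, and the public wrapper is responsible for the refresh.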




Re: [PATCH 04/21] qdev: allow setting drive property for realized device

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

We need the ability to insert filters above the top block node attached to
a block device. This can't be achieved with the blockdev-reopen command, so
we want to do it with the help of qom-set.

Intended usage:

1. blockdev-add, creating the filter, whose child is the top node A,
attached to some guest block device.


Is a “not” missing here, i.e. “not attached to any guest block device”? 
 I would have thought one would create a filtered tree that is not in 
use by any frontend, so that the filter need not take any permissions.



2. qom-set, to change the bs attached to the root blk from the original node
to the newly created filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  hw/core/qdev-properties-system.c | 30 ++
  1 file changed, 22 insertions(+), 8 deletions(-)


Looks good, just one question: (well, two, one was above)


diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 2760c21f11..7d97562654 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c


[...]


@@ -196,6 +209,7 @@ static void release_drive(Object *obj, const char *name, 
void *opaque)
  const PropertyInfo qdev_prop_drive = {
  .name  = "str",
  .description = "Node name or ID of a block device to use as a backend",
+.realized_set_allowed = true,
  .get   = get_drive,
  .set   = set_drive,
  .release = release_drive,


Why not for qdev_prop_drive_iothread?

Max
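
As a rough sketch of what the new flag is meant to allow: a property write on
an already-realized device is refused unless the property's info opts in.
The types below are hypothetical, trimmed-down stand-ins (PropInfo, Device,
set_drive() here are not the QEMU definitions); only the condition itself
follows the check added in patch 03/21 of this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for PropertyInfo and DeviceState. */
typedef struct PropInfo {
    const char *name;
    bool realized_set_allowed;
} PropInfo;

typedef struct Device {
    bool realized;
    const char *drive;      /* node name of the attached backend */
} Device;

/* True if the property may be written in the device's current state. */
static bool prop_allow_set(const Device *dev, const PropInfo *info)
{
    return !dev->realized || info->realized_set_allowed;
}

/* Returns 0 on success, -1 if the write is refused. */
static int set_drive(Device *dev, const PropInfo *info, const char *node)
{
    if (!prop_allow_set(dev, info)) {
        return -1;
    }
    dev->drive = node;
    return 0;
}
```

In this model a realized device accepts a new "drive" value only through a
PropInfo with realized_set_allowed set, which corresponds to step 2 of the
intended usage (qom-set on a live device).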




[PATCH v3 5/5] blkdebug: protect rules and suspended_reqs with a lock

2021-05-17 Thread Emanuele Giuseppe Esposito
Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 53 
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index dffd869b32..cf8b088ce7 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -54,6 +54,7 @@ typedef struct BDRVBlkdebugState {
 /* For blkdebug_refresh_filename() */
 char *config_file;
 
+QemuMutex lock;
 QLIST_HEAD(, BlkdebugRule) rules[BLKDBG__MAX];
 QSIMPLEQ_HEAD(, BlkdebugRule) active_rules;
 QLIST_HEAD(, BlkdebugSuspendedReq) suspended_reqs;
@@ -245,7 +246,9 @@ static int add_rule(void *opaque, QemuOpts *opts, Error 
**errp)
 };
 
 /* Add the rule */
+qemu_mutex_lock(&s->lock);
 QLIST_INSERT_HEAD(&s->rules[event], rule, next);
+qemu_mutex_unlock(&s->lock);
 
 return 0;
 }
@@ -468,6 +471,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 int ret;
 uint64_t align;
 
+qemu_mutex_init(&s->lock);
 opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
 if (!qemu_opts_absorb_qdict(opts, options, errp)) {
 ret = -EINVAL;
@@ -568,6 +572,7 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = 0;
 out:
 if (ret < 0) {
+qemu_mutex_destroy(&s->lock);
 g_free(s->config_file);
 }
 qemu_opts_del(opts);
@@ -582,6 +587,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,
 int error;
 bool immediately;
 
+qemu_mutex_lock(&s->lock);
 QSIMPLEQ_FOREACH(rule, &s->active_rules, active_next) {
 uint64_t inject_offset = rule->options.inject.offset;
 
@@ -595,6 +601,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,
 }
 
 if (!rule || !rule->options.inject.error) {
+qemu_mutex_unlock(&s->lock);
 return 0;
 }
 
@@ -606,6 +613,7 @@ static int rule_check(BlockDriverState *bs, uint64_t 
offset, uint64_t bytes,
 remove_rule(rule);
 }
 
+qemu_mutex_unlock(&s->lock);
 if (!immediately) {
 aio_co_schedule(qemu_get_current_aio_context(), qemu_coroutine_self());
 qemu_coroutine_yield();
@@ -771,8 +779,10 @@ static void blkdebug_close(BlockDriverState *bs)
 }
 
 g_free(s->config_file);
+qemu_mutex_destroy(&s->lock);
 }
 
+/* Called with lock held.  */
 static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
 {
 BDRVBlkdebugState *s = bs->opaque;
@@ -791,6 +801,7 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 }
 }
 
+/* Called with lock held.  */
 static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
  int *action_count)
 {
@@ -829,9 +840,11 @@ static void blkdebug_debug_event(BlockDriverState *bs, 
BlkdebugEvent event)
 
 assert((int)event >= 0 && event < BLKDBG__MAX);
 
-s->new_state = s->state;
-QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) {
-process_rule(bs, rule, actions_count);
+WITH_QEMU_LOCK_GUARD(&s->lock) {
+s->new_state = s->state;
+QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) {
+process_rule(bs, rule, actions_count);
+}
 }
 
 while (actions_count[ACTION_SUSPEND] > 0) {
@@ -839,7 +852,9 @@ static void blkdebug_debug_event(BlockDriverState *bs, 
BlkdebugEvent event)
 actions_count[ACTION_SUSPEND]--;
 }
 
+qemu_mutex_lock(&s->lock);
 s->state = s->new_state;
+qemu_mutex_unlock(&s->lock);
 }
 
 static int blkdebug_debug_breakpoint(BlockDriverState *bs, const char *event,
@@ -862,11 +877,14 @@ static int blkdebug_debug_breakpoint(BlockDriverState 
*bs, const char *event,
 .options.suspend.tag = g_strdup(tag),
 };
 
+qemu_mutex_lock(&s->lock);
 QLIST_INSERT_HEAD(&s->rules[blkdebug_event], rule, next);
+qemu_mutex_unlock(&s->lock);
 
 return 0;
 }
 
+/* Called with lock held.  */
 static int resume_req_by_tag(BDRVBlkdebugState *s, const char *tag, bool all)
 {
 BlkdebugSuspendedReq *r;
@@ -884,7 +902,9 @@ retry:
 g_free(r->tag);
 g_free(r);
 
+qemu_mutex_unlock(&s->lock);
 qemu_coroutine_enter(co);
+qemu_mutex_lock(&s->lock);
 
 if (all) {
 goto retry;
@@ -898,8 +918,12 @@ retry:
 static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
 {
 BDRVBlkdebugState *s = bs->opaque;
+int rc;
 
-return resume_req_by_tag(s, tag, false);
+qemu_mutex_lock(&s->lock);
+rc = resume_req_by_tag(s, tag, false);
+qemu_mutex_unlock(&s->lock);
+return rc;
 }
 
 static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
@@ -909,17 +933,19 @@ static int 
blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
 BlkdebugRule *rule, *next;
 int i, ret = -ENOENT;
 
-for (i = 0; i < BLKDBG__MAX; i++) {
-QLIST_FOREACH_SAFE(rule, &s->rules[i], next, next) {
-  

[PATCH v3 3/5] blkdebug: track all actions

2021-05-17 Thread Emanuele Giuseppe Esposito
Add a counter for each action that a rule can trigger.
This is mainly used to keep track of how many qemu_coroutine_yield()
calls we need to perform after processing all rules in the list.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index e37f999254..388b5ed615 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -74,6 +74,7 @@ enum {
 ACTION_INJECT_ERROR,
 ACTION_SET_STATE,
 ACTION_SUSPEND,
+ACTION__MAX,
 };
 
 typedef struct BlkdebugRule {
@@ -791,22 +792,22 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 qemu_coroutine_yield();
 }
 
-static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
-bool injected)
+static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
+ int *action_count)
 {
 BDRVBlkdebugState *s = bs->opaque;
 
 /* Only process rules for the current state */
 if (rule->state && rule->state != s->state) {
-return injected;
+return;
 }
 
 /* Take the action */
+action_count[rule->action]++;
 switch (rule->action) {
 case ACTION_INJECT_ERROR:
-if (!injected) {
+if (action_count[ACTION_INJECT_ERROR] == 1) {
 QSIMPLEQ_INIT(&s->active_rules);
-injected = true;
 }
 QSIMPLEQ_INSERT_HEAD(&s->active_rules, rule, active_next);
 break;
@@ -819,21 +820,19 @@ static bool process_rule(BlockDriverState *bs, struct 
BlkdebugRule *rule,
 suspend_request(bs, rule);
 break;
 }
-return injected;
 }
 
 static void blkdebug_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 {
 BDRVBlkdebugState *s = bs->opaque;
 struct BlkdebugRule *rule, *next;
-bool injected;
+int actions_count[ACTION__MAX] = { 0 };
 
 assert((int)event >= 0 && event < BLKDBG__MAX);
 
-injected = false;
 s->new_state = s->state;
 QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) {
-injected = process_rule(bs, rule, injected);
+process_rule(bs, rule, actions_count);
 }
 s->state = s->new_state;
 }
-- 
2.30.2




[PATCH v3 4/5] blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE

2021-05-17 Thread Emanuele Giuseppe Esposito
That would be unsafe in case a rule other than the current one
is removed while the coroutine has yielded.
Keep FOREACH_SAFE because suspend_request deletes the current rule.

After this patch, *all* matching rules are deleted before suspending
the coroutine, rather than just one.
This doesn't affect the existing testcases.

Use actions_count to see how many yields to issue.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 388b5ed615..dffd869b32 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -789,7 +789,6 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 if (!qtest_enabled()) {
 printf("blkdebug: Suspended request '%s'\n", r->tag);
 }
-qemu_coroutine_yield();
 }
 
 static void process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
@@ -834,6 +833,12 @@ static void blkdebug_debug_event(BlockDriverState *bs, 
BlkdebugEvent event)
 QLIST_FOREACH_SAFE(rule, &s->rules[event], next, next) {
 process_rule(bs, rule, actions_count);
 }
+
+while (actions_count[ACTION_SUSPEND] > 0) {
+qemu_coroutine_yield();
+actions_count[ACTION_SUSPEND]--;
+}
+
 s->state = s->new_state;
 }
 
-- 
2.30.2
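
The idea of this patch, counting suspend actions while walking the rule list
and yielding only after the walk is finished, might look roughly like the
following in isolation. This is a sketch under stated assumptions: there are
no real coroutines here, fake_yield() merely counts calls in place of
qemu_coroutine_yield(), and Rule, ACT_* and debug_event() are hypothetical
names modelled on the diff above.

```c
#include <assert.h>
#include <stddef.h>

enum {
    ACT_INJECT_ERROR,
    ACT_SET_STATE,
    ACT_SUSPEND,
    ACT__MAX,
};

/* Hypothetical rule: only the action matters for this sketch. */
typedef struct Rule {
    int action;
} Rule;

/* Stand-in for qemu_coroutine_yield(): just counts invocations. */
static int yields;

static void fake_yield(void)
{
    yields++;
}

/* Record the action; nothing that could suspend happens in here. */
static void process_rule(const Rule *rule, int *action_count)
{
    assert(rule->action >= 0 && rule->action < ACT__MAX);
    action_count[rule->action]++;
}

static void debug_event(const Rule *rules, size_t n)
{
    int action_count[ACT__MAX] = { 0 };

    /* Phase 1: walk the whole list without ever yielding. */
    for (size_t i = 0; i < n; i++) {
        process_rule(&rules[i], action_count);
    }

    /* Phase 2: the list walk is over, so yielding is safe now. */
    while (action_count[ACT_SUSPEND] > 0) {
        fake_yield();
        action_count[ACT_SUSPEND]--;
    }
}
```

Because phase 2 never touches the rule list, whatever runs while the caller
is suspended may add or remove rules without invalidating an iterator.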




[PATCH v3 2/5] blkdebug: move post-resume handling to resume_req_by_tag

2021-05-17 Thread Emanuele Giuseppe Esposito
We want to move qemu_coroutine_yield() after the loop on rules,
because QLIST_FOREACH_SAFE is wrong if the rule list is modified
while the coroutine has yielded.  Therefore move the suspended
request to the heap and clean it up from the remove side.
All that is left is for blkdebug_debug_event to handle the
yielding.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 8f19d991fa..e37f999254 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -775,25 +775,20 @@ static void blkdebug_close(BlockDriverState *bs)
 static void suspend_request(BlockDriverState *bs, BlkdebugRule *rule)
 {
 BDRVBlkdebugState *s = bs->opaque;
-BlkdebugSuspendedReq r;
+BlkdebugSuspendedReq *r;
 
-r = (BlkdebugSuspendedReq) {
-.co = qemu_coroutine_self(),
-.tag= g_strdup(rule->options.suspend.tag),
-};
+r = g_new(BlkdebugSuspendedReq, 1);
+
+r->co = qemu_coroutine_self();
+r->tag= g_strdup(rule->options.suspend.tag);
 
 remove_rule(rule);
-QLIST_INSERT_HEAD(&s->suspended_reqs, &r, next);
+QLIST_INSERT_HEAD(&s->suspended_reqs, r, next);
 
 if (!qtest_enabled()) {
-printf("blkdebug: Suspended request '%s'\n", r.tag);
+printf("blkdebug: Suspended request '%s'\n", r->tag);
 }
 qemu_coroutine_yield();
-if (!qtest_enabled()) {
-printf("blkdebug: Resuming request '%s'\n", r.tag);
-}
-
-g_free(r.tag);
 }
 
 static bool process_rule(BlockDriverState *bs, struct BlkdebugRule *rule,
@@ -875,8 +870,18 @@ static int resume_req_by_tag(BDRVBlkdebugState *s, const 
char *tag, bool all)
 retry:
 QLIST_FOREACH(r, &s->suspended_reqs, next) {
 if (!strcmp(r->tag, tag)) {
+Coroutine *co = r->co;
+
+if (!qtest_enabled()) {
+printf("blkdebug: Resuming request '%s'\n", r->tag);
+}
+
 QLIST_REMOVE(r, next);
-qemu_coroutine_enter(r->co);
+g_free(r->tag);
+g_free(r);
+
+qemu_coroutine_enter(co);
+
 if (all) {
 goto retry;
 }
-- 
2.30.2




[PATCH v3 1/5] blkdebug: refactor removal of a suspended request

2021-05-17 Thread Emanuele Giuseppe Esposito
Extract to a separate function.  Do not rely on FOREACH_SAFE, which is
only "safe" if the *current* node is removed---not if another node is
removed.  Instead, just walk the entire list from the beginning when
asked to resume all suspended requests with a given tag.

Co-developed-by: Paolo Bonzini 
Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/blkdebug.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 2c0b9b0ee8..8f19d991fa 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -793,7 +793,6 @@ static void suspend_request(BlockDriverState *bs, 
BlkdebugRule *rule)
 printf("blkdebug: Resuming request '%s'\n", r.tag);
 }
 
-QLIST_REMOVE(&r, next);
 g_free(r.tag);
 }
 
@@ -869,25 +868,35 @@ static int blkdebug_debug_breakpoint(BlockDriverState 
*bs, const char *event,
 return 0;
 }
 
-static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
+static int resume_req_by_tag(BDRVBlkdebugState *s, const char *tag, bool all)
 {
-BDRVBlkdebugState *s = bs->opaque;
-BlkdebugSuspendedReq *r, *next;
+BlkdebugSuspendedReq *r;
 
-QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, next) {
+retry:
+QLIST_FOREACH(r, &s->suspended_reqs, next) {
 if (!strcmp(r->tag, tag)) {
+QLIST_REMOVE(r, next);
 qemu_coroutine_enter(r->co);
+if (all) {
+goto retry;
+}
 return 0;
 }
 }
 return -ENOENT;
 }
 
+static int blkdebug_debug_resume(BlockDriverState *bs, const char *tag)
+{
+BDRVBlkdebugState *s = bs->opaque;
+
+return resume_req_by_tag(s, tag, false);
+}
+
 static int blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
 const char *tag)
 {
 BDRVBlkdebugState *s = bs->opaque;
-BlkdebugSuspendedReq *r, *r_next;
 BlkdebugRule *rule, *next;
 int i, ret = -ENOENT;
 
@@ -900,11 +909,8 @@ static int 
blkdebug_debug_remove_breakpoint(BlockDriverState *bs,
 }
 }
 }
-QLIST_FOREACH_SAFE(r, &s->suspended_reqs, next, r_next) {
-if (!strcmp(r->tag, tag)) {
-qemu_coroutine_enter(r->co);
-ret = 0;
-}
+if (resume_req_by_tag(s, tag, true) == 0) {
+ret = 0;
 }
 return ret;
 }
-- 
2.30.2
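
To make the "restart the walk from the beginning" approach concrete, here is
a minimal stand-alone sketch with a plain singly linked list. It does not use
QEMU's QLIST macros or coroutines: Req, ReqList, req_take() and resume() are
hypothetical, and resume() deliberately removes another entry to imitate a
resumed coroutine that modifies the list behind the iterator's back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a suspended request. */
typedef struct Req {
    const char *tag;
    struct Req *next;
} Req;

typedef struct ReqList {
    Req *head;
    int resumed;            /* how many requests were resumed so far */
} ReqList;

static void req_push(ReqList *l, const char *tag)
{
    Req *r = malloc(sizeof(*r));
    assert(r);
    r->tag = tag;
    r->next = l->head;
    l->head = r;
}

/* Unlink the first request with a matching tag; return it or NULL. */
static Req *req_take(ReqList *l, const char *tag)
{
    for (Req **pp = &l->head; *pp; pp = &(*pp)->next) {
        if (!strcmp((*pp)->tag, tag)) {
            Req *r = *pp;
            *pp = r->next;
            return r;
        }
    }
    return NULL;
}

/*
 * Stand-in for entering the coroutine: the resumed code may remove
 * other entries, which is why a cached "next" pointer is unsafe.
 */
static void resume(ReqList *l, Req *r)
{
    free(req_take(l, "side-effect"));
    free(r);
    l->resumed++;
}

/* Returns 0 if at least one request was resumed, -ENOENT otherwise. */
static int resume_by_tag(ReqList *l, const char *tag, bool all)
{
    int ret = -ENOENT;
    Req *r;

    /* Every iteration starts a fresh search from the list head. */
    while ((r = req_take(l, tag)) != NULL) {
        resume(l, r);
        ret = 0;
        if (!all) {
            break;
        }
    }
    return ret;
}
```

The restart costs a rescan per resumed request, but it never dereferences a
pointer that was read before control left the function.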




[PATCH v3 0/5] blkdebug: fix racing condition when iterating on

2021-05-17 Thread Emanuele Giuseppe Esposito
When qemu_coroutine_enter is executed in a loop
(even QLIST_FOREACH_SAFE), the entered coroutine can modify the list,
for example by removing an element, causing problems when control
is given back to the caller, which continues iterating on the same list.

Patch 1 solves the issue in blkdebug_debug_resume by restarting
the list walk after every coroutine_enter if the list has to be fully iterated.
Patches 2,3,4 aim to fix blkdebug_debug_event by gathering
all actions that the rules make in a counter and invoking
the respective coroutine_yield only after processing all rules.

Patch 5 is somewhat independent of the others: it adds a lock to
protect rules and suspended_reqs. Right now everything works because
it's protected by the AioContext lock.
This is a preparation for the current proposal of removing the AioContext
lock and instead using finer-grained locks to allow multiple
iothreads to execute in the same block device.

Signed-off-by: Emanuele Giuseppe Esposito 
---
v2 -> v3
* Fix "yeld"->"yield" in patches 3-4 [Eric]
* Use lock guard instead of lock/unlock in patch 5 [Eric]

Emanuele Giuseppe Esposito (5):
  blkdebug: refactor removal of a suspended request
  blkdebug: move post-resume handling to resume_req_by_tag
  blkdebug: track all actions
  blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE
  blkdebug: protect rules and suspended_reqs with a lock

 block/blkdebug.c | 124 +++
 1 file changed, 83 insertions(+), 41 deletions(-)

-- 
2.30.2




Re: [PATCH 03/21] qdev-properties: PropertyInfo: add realized_set_allowed field

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

17.05.2021 15:40, Max Reitz wrote:

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

Add field, so property can declare support for setting the property
when device is realized. To be used in the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/hw/qdev-properties.h | 1 +
  hw/core/qdev-properties.c    | 6 +++---
  2 files changed, 4 insertions(+), 3 deletions(-)


Looks OK to me, although qdev isn’t my specialty.


Nor mine :) Thanks for looking anyway!




diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 0ef97d60ce..007e1f69f4 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -32,6 +32,7 @@ struct PropertyInfo {
  const char *name;
  const char *description;
  const QEnumLookup *enum_table;
+    bool realized_set_allowed;


I think a comment would be nice, though.



Agree, will add.




  int (*print)(Object *obj, Property *prop, char *dest, size_t len);
  void (*set_default_value)(ObjectProperty *op, const Property *prop);
  ObjectProperty *(*create)(ObjectClass *oc, const char *name,
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 50f40949f5..c34aac6ebc 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -26,11 +26,11 @@ void qdev_prop_set_after_realize(DeviceState *dev, const 
char *name,
  /* returns: true if property is allowed to be set, false otherwise */
  static bool qdev_prop_allow_set(Object *obj, const char *name,
-    Error **errp)
+    const PropertyInfo *info, Error **errp)
  {
  DeviceState *dev = DEVICE(obj);
-    if (dev->realized) {
+    if (dev->realized && !info->realized_set_allowed) {
  qdev_prop_set_after_realize(dev, name, errp);
  return false;
  }
@@ -79,7 +79,7 @@ static void field_prop_set(Object *obj, Visitor *v, const 
char *name,
  {
  Property *prop = opaque;
-    if (!qdev_prop_allow_set(obj, name, errp)) {
+    if (!qdev_prop_allow_set(obj, name, prop->info, errp)) {
  return;
  }






--
Best regards,
Vladimir



Re: [PATCH 01/21] block: introduce bdrv_replace_child_bs()

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

17.05.2021 15:09, Max Reitz wrote:

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

Add function to transactionally replace bs inside BdrvChild.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block.h |  2 ++
  block.c   | 36 
  2 files changed, 38 insertions(+)


As you may guess, I know little about the rewritten replacing functions, so 
this is kind of difficult to review for me.  However, nothing looks out of 
place, and the function looks sufficiently similar to 
bdrv_replace_node_common() to make me happy.


diff --git a/include/block/block.h b/include/block/block.h
index 82185965ff..f9d5fcb108 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -361,6 +361,8 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState 
*bs_top,
  Error **errp);
  int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
    Error **errp);
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp);
  BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options,
 int flags, Error **errp);
  int bdrv_drop_filter(BlockDriverState *bs, Error **errp);
diff --git a/block.c b/block.c
index 9ad725d205..755fa53d85 100644
--- a/block.c
+++ b/block.c
@@ -4961,6 +4961,42 @@ out:
  return ret;
  }
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp)
+{
+    int ret;
+    Transaction *tran = tran_new();
+    g_autoptr(GHashTable) found = NULL;
+    g_autoptr(GSList) refresh_list = NULL;
+    BlockDriverState *old_bs = child->bs;
+
+    if (old_bs) {


Hm.  Can child->bs be ever NULL?


Hmm. Most probably not :)

In some intermediate states the child has no bs, but bdrv_replace_child_bs()
shouldn't be called in those states.




+    bdrv_ref(old_bs);
+    bdrv_drained_begin(old_bs);
+    }
+    bdrv_drained_begin(new_bs);


(I was wondering why we couldn’t handle the new_bs == NULL case here to replace 
bdrv_remove_filter_or_cow_child(), but then I realized it’s probably because 
that’s kind of difficult, precisely because child->bs at least should generally 
be non-NULL.  Which is why bdrv_remove_filter_or_cow_child() needs to add its own 
transaction entry to handle the BdrvChild object and the pointer to it.

Hence me wondering whether we could assume child->bs not to be NULL.)


bdrv_remove_filter_or_cow_child() is a "lower-level" function: it neither begins a 
drained section nor updates permissions. The new bdrv_replace_child_bs() is a public 
function, which takes care of these things.




+
+    bdrv_replace_child(child, new_bs, tran);
+
+    found = g_hash_table_new(NULL, NULL);
+    if (old_bs) {
+    refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);
+    }
+    refresh_list = bdrv_topological_dfs(refresh_list, found, new_bs);
+
+    ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);


Speaking of bdrv_remove_filter_or_cow_child(): That function doesn’t refresh 
permissions.  I think it’s correct to do it here, so the following question 
doesn’t really concern this patch, but: Why don’t we do it there?

I guess it’s because we expect the node to go away anyway, so we don’t need to 
refresh the permissions.  And that assumption should hold true right now, given 
its callers.  But is that a safe assumption in general?  Would there be a 
problem if we refreshed permissions there?  Or is not refreshing permissions 
just part of the function’s interface?



The caller of bdrv_remove_filter_or_cow_child() should take care of permissions:
bdrv_replace_node_common() does this, and bdrv_set_backing_noperm() has "_noperm"
in its name.

The main result of the previous big rework of permissions is a new scheme for
updating permissions:

 - first do all graph modifications, without thinking about permissions
 - refresh permissions for the whole updated subgraph
 - if the refresh fails, roll back all the modifications (this is the main point of
the transactions here and there: they make this rollback possible)

So a lot of internal functions with a @tran argument don't update permissions.
But of course, we must take care to update permissions after any graph
modification.
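
The three steps above can be sketched with a toy model. This is only an
illustration under simplifying assumptions, not the QEMU implementation:
Node, Child, Tran and all *_sketch functions are hypothetical, and the
"permission" is reduced to a single writable flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UNDO 8

/* Hypothetical toy types; the real BdrvChild and Transaction are richer. */
typedef struct Node { bool writable; } Node;
typedef struct Child { Node *bs; bool wants_write; } Child;

typedef struct Undo { Child *child; Node *old_bs; } Undo;
typedef struct Tran { Undo undo[MAX_UNDO]; int n; } Tran;

/* Step 1: modify the graph, ignoring permissions, but record the undo. */
static void replace_child_sketch(Child *c, Node *new_bs, Tran *t)
{
    assert(t->n < MAX_UNDO);
    t->undo[t->n++] = (Undo){ c, c->bs };
    c->bs = new_bs;
}

/* Step 2: check permissions on the updated graph (0 = ok, -1 = conflict). */
static int refresh_perms_sketch(const Child *c)
{
    return (c->wants_write && !c->bs->writable) ? -1 : 0;
}

/* Step 3: keep the changes on success, undo them in reverse on failure. */
static void tran_finalize_sketch(Tran *t, int ret)
{
    if (ret < 0) {
        while (t->n > 0) {
            Undo *u = &t->undo[--t->n];
            u->child->bs = u->old_bs;
        }
    }
    t->n = 0;
}

/* Public-style wrapper: modify, refresh, then commit or roll back. */
static int replace_child_bs_sketch(Child *c, Node *new_bs)
{
    Tran t = { .n = 0 };
    int ret;

    replace_child_sketch(c, new_bs, &t);
    ret = refresh_perms_sketch(c);
    tran_finalize_sketch(&t, ret);
    return ret;
}
```

A parent that wants to write cannot be moved onto a read-only node: the call
returns -1 and the child still points at the old node. A read-only parent can
be moved, and the change is kept.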




+
+    tran_finalize(tran, ret);
+
+    if (old_bs) {
+    bdrv_drained_end(old_bs);
+    bdrv_unref(old_bs);
+    }
+    bdrv_drained_end(new_bs);
+
+    return ret;
+}
+
  static void bdrv_delete(BlockDriverState *bs)
  {
  assert(bdrv_op_blocker_is_empty(bs));






--
Best regards,
Vladimir



Re: [ANNOUNCE] libblkio v0.1.0 preview release

2021-05-17 Thread Stefan Hajnoczi
On Fri, May 14, 2021 at 05:55:13PM +0200, Kevin Wolf wrote:
> Am 13.05.2021 um 11:47 hat Stefan Hajnoczi geschrieben:
> > On Thu, May 06, 2021 at 12:33:24PM +0200, Kevin Wolf wrote:
> > > Am 06.05.2021 um 10:46 hat Stefan Hajnoczi geschrieben:
> > > > What do you think about this:
> > > > 
> > > > The blkio instance states are:
> > > > 
> > > >   created -> attached -> started -> destroyed
> > > > 
> > > > It is not possible to go backwards anymore, which simplifies driver
> > > > implementations and it probably won't be needed by applications.
> > > > 
> > > > The "initialized" state is renamed to "attached" to make it clearer that
> > > > this means the block device has been connected/opened. Also
> > > > "initialized" can be confused with "created".
> > > > 
> > > > The corresponding APIs are:
> > > > 
> > > > int blkio_create(const char *driver, struct blkio **bp, char **errmsg);
> > > > int blkio_attach(struct blkio *bp, char **errmsg);
> > > > int blkio_start(struct blkio *bp, char **errmsg);
> > > > void blkio_destroy(struct blkio **bp);
> > > > 
> > > > There is no way to query the state here, but that probably isn't
> > > > necessary since an application setting up the blkio instance must
> > > > already be aware of the state in order to configure it in the first
> > > > place.
> > > > 
> > > > One advantage of this approach is that it can support network drivers
> > > > where the attach and start operations can take a long time while regular
> > > > property accesses do not block.
> > > 
> > > I like this.
> > > 
> > > For properties, I think, each property will have a first state in which
> > > it becomes available and then it will be available in all later states,
> > > too.
> > > 
> > > Currently, apart from properties that are always read-only, we only have
> > > properties that are rw only in their first state and become read-only in
> > > later states. It might be reasonable to assume that properties will
> > > exist that can be rw in all later states, too.
> > > 
> > > In their first state, most properties only store the value into the
> > > config and it's the next state transition that actually makes use of
> > > them. Similarly, reading usually only reads the value from the config.
> > > So these parts can be automatically covered. Usually you would then only
> > > need a custom implementation for property updates after the fact. I
> > > think this could simplify the driver implementations a lot. I'll play
> > > with this a bit more.
> > 
> > Hi Kevin,
> > I posted a patch that introduces blkio_connect() and blkio_start():
> > https://gitlab.com/libblkio/libblkio/-/merge_requests/4
> 
> Assuming that you want review to happen on Gitlab, I added a few
> comments there.
> 
> I'm not sure if you saw it, but on Wednesday, I also created a merge
> request for some first changes to reduce the properties boilerplate in
> the iouring module that would otherwise be duplicated for every new
> driver. Not sure if everything is a good idea, but the first patch is
> almost certainly one.
> 
> (However, I just realised that the test failure is not the same as on
> main, so I degraded it to a draft now. It also conflicts with your merge
> request. Next thing to learn for me is how to respin a merge request on
> Gitlab... You may want to have a look anyway.)

Awesome, I will take a look, thanks. I need to tweak my GitLab
notification options :-).

You can force push to your topic branch to respin the merge request.

Stefan




Re: [PATCH v2 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-05-17 Thread Stefan Hajnoczi
On Mon, May 17, 2021 at 10:32:59AM +0200, Greg Kurz wrote:
> On Wed, 12 May 2021 17:05:53 +0100
> Stefan Hajnoczi  wrote:
> 
> > On Fri, May 07, 2021 at 06:59:01PM +0200, Greg Kurz wrote:
> > > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > > a serious slow down may be observed on setups with a big enough number
> > > of vCPUs.
> > > 
> > > Example with a pseries guest on a bi-POWER9 socket system (128 HW
> > > threads):
> > > 
> > >         virtio-scsi  virtio-blk
> > > 
> > > 1       0m20.922s    0m21.346s
> > > 2       0m21.230s    0m20.350s
> > > 4       0m21.761s    0m20.997s
> > > 8       0m22.770s    0m20.051s
> > > 16      0m22.038s    0m19.994s
> > > 32      0m22.928s    0m20.803s
> > > 64      0m26.583s    0m22.953s
> > > 128     0m41.273s    0m32.333s
> > > 256     2m4.727s     1m16.924s
> > > 384     6m5.563s     3m26.186s
> > > 
> > > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > > the ioeventfds:
> > > 
> > >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> > >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> > >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> > >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > > 
> > > address_space_update_ioeventfds() is called when committing an MR
> > > transaction, i.e. for each ioeventfd with the current code base,
> > > and it internally loops on all ioeventfds:
> > > 
> > > static void address_space_update_ioeventfds(AddressSpace *as)
> > > {
> > > [...]
> > >     FOR_EACH_FLAT_RANGE(fr, view) {
> > >         for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > > 
> > > This means that the setup of ioeventfds for these devices has
> > > quadratic time complexity.
> > > 
> > > This series simply changes the device models to extend the transaction
> > > to all virtqueues, like already done in the past in the generic
> > > code with 710fccf80d78 ("virtio: improve virtio devices initialization
> > > time").
> > > 
> > > Only virtio-scsi and virtio-blk are covered here, but a similar change
> > > might also be beneficial to other device types such as host-scsi-pci,
> > > vhost-user-scsi-pci and vhost-user-blk-pci.
> > > 
> > >         virtio-scsi  virtio-blk
> > > 
> > > 1       0m21.271s    0m22.076s
> > > 2       0m20.912s    0m19.716s
> > > 4       0m20.508s    0m19.310s
> > > 8       0m21.374s    0m20.273s
> > > 16      0m21.559s    0m21.374s
> > > 32      0m22.532s    0m21.271s
> > > 64      0m26.550s    0m22.007s
> > > 128     0m29.115s    0m27.446s
> > > 256     0m44.752s    0m41.004s
> > > 384     1m2.884s     0m58.023s
> > > 
> > > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > > which reported the issue for virtio-scsi-pci.
> > > 
> > > Changes since v1:
> > > - Add some comments (Stefan)
> > > - Drop optimization on the error path in patch 2 (Stefan)
> > > 
> > > Changes since RFC:
> > > 
> > > As suggested by Stefan, simplify the code by directly beginning and
> > > committing the memory transaction from the device model, without all
> > > the virtio specific proxying code and no changes needed in the memory
> > > subsystem.
> > > 
> > > Greg Kurz (4):
> > >   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
> > >   virtio-blk: Configure all host notifiers in a single MR transaction
> > >   virtio-scsi: Set host notifiers and callbacks separately
> > >   virtio-scsi: Configure all host notifiers in a single MR transaction
> > > 
> > >  hw/block/dataplane/virtio-blk.c | 45 -
> > >  hw/scsi/virtio-scsi-dataplane.c | 72 -
> > >  2 files changed, 97 insertions(+), 20 deletions(-)
> > > 
> > > -- 
> > > 2.26.3
> > > 
> > 
> > Thanks, applied to my block tree:
> > https://gitlab.com/stefanha/qemu/commits/block
> > 
> 
> Hi Stefan,
> 
> It seems that Michael already merged the previous version of this
> patch set with its latest PR.
> 
> https://gitlab.com/qemu-project/qemu/-/commit/6005ee07c380cbde44292f5f6c96e7daa70f4f7d
> 
> It is thus missing the v1->v2 changes. Basically some comments to
> clarify the optimization we're doing with the MR transaction and
> the removal of the optimization on an error path.
> 
> The optimization on the error path isn't needed indeed but it
> doesn't hurt. No need to change that now that the patches are
> upstream.
> 
> I can post a follow-up patch to add the missing comments though.
> While here, I'd even add these comments in the generic
> virtio_device_*_ioeventfd_impl() calls as well, 

Re: [PATCH] replication: move include out of root directory

2021-05-17 Thread Philippe Mathieu-Daudé
On 5/17/21 2:19 PM, Paolo Bonzini wrote:
> The replication.h file is included from migration/colo.c and 
> tests/unit/test-replication.c,
> so it should be in include/.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/replication.c  | 2 +-
>  replication.h => include/block/replication.h | 4 ++--
>  migration/colo.c | 2 +-
>  replication.c| 2 +-
>  tests/unit/test-replication.c| 2 +-
>  5 files changed, 6 insertions(+), 6 deletions(-)
>  rename replication.h => include/block/replication.h (98%)

Including the following hunk:
Reviewed-by: Philippe Mathieu-Daudé 

-- >8 --
diff --git a/MAINTAINERS b/MAINTAINERS
index 7877710e372..34c60d9284f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3257,8 +3257,9 @@ Replication
 M: Wen Congyang 
 M: Xie Changlong 
 S: Supported
-F: replication*
+F: replication.c
 F: block/replication.c
+F: include/block/replication.h
 F: tests/unit/test-replication.c
 F: docs/block-replication.txt

---




Re: [PULL 00/19] Block patches

2021-05-17 Thread Peter Maydell
On Fri, 14 May 2021 at 17:45, Max Reitz  wrote:
>
> The following changes since commit 96662996eda78c48aa4e76d8615c7eb72d80:
>
>   Merge remote-tracking branch 
> 'remotes/dgilbert/tags/pull-migration-20210513a' into staging (2021-05-14 
> 12:03:47 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/XanClic/qemu.git tags/pull-block-2021-05-14
>
> for you to fetch changes up to c61ebf362d0abf288ce266845519d5a550a1d89f:
>
>   write-threshold: deal with includes (2021-05-14 16:14:10 +0200)
>
> 
> Block patches:
> - drop block/io write notifiers
> - qemu-iotests enhancements to make debugging easier
> - rbd parsing fix
> - HMP qemu-io fix (for iothreads)
> - mirror job cancel relaxation (do not cancel in-flight requests when a
>   READY mirror job is canceled with force=false)
> - document qcow2's data_file and data_file_raw features
> - fix iotest 297 for pylint 2.8
> - block/copy-on-read refactoring
>
> 


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.1
for any user-visible changes.

-- PMM



Qemu block filter insertion/removal API

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

Hi all!

I'd like to be sure that we know where we are going.

In the blockdev era, where the QEMU user is aware of block nodes and all nodes have good 
names and are controlled by the user, we can use block filters efficiently.

We already have some useful filters: copy-on-read, throttling, compress. In my 
parallel series I make the backup-top filter public and useful without backup block 
jobs. But currently filters can only be inserted together with opening their child: 
we can specify filters on the QEMU command line, or a filter can be part of the block 
node chain created by blockdev-add.

Still, it would be good to insert/remove filters on demand.

Currently we are going to use x-blockdev-reopen for this. Still, it can't be used to 
insert a filter above a root node (as x-blockdev-reopen can only change block node options 
and their children). In my series "[PATCH 00/21] block: publish backup-top 
filter" I propose (as Kevin suggested) to modify qom-set so that it can set the drive 
option of a running device. That's not difficult, but it means that we have different 
scenarios for inserting/removing filters:

1. filter above root node X:

inserting:

  - do blockdev-add to add a filter (and specify X as its child)
  - do qom-set to set the new filter as the root node instead of X

removing

  - do qom-set to make X a root node again
  - do blockdev-del to drop a filter

2. filter between two block nodes P and X. (For example, X is a backing child 
of P)

inserting

  - do blockdev-add to add a filter (and specify X as its child)
  - do blockdev-reopen to set P.backing = filter

removing

  - do blockdev-reopen to set P.backing = X
  - do blockdev-del to drop a filter


And we'll probably want transaction support for all these things.


Is it OK? Or do we need some kind of additional blockdev-replace command that 
can replace one node with another, so that in both cases we will do

inserting:

  - blockdev-add filter

  - blockdev-replace (make all parents of X point to the new filter instead, 
except for the filter itself of course)

removing:

  - blockdev-replace (make all parents of the filter parents of X instead)

  - blockdev-del filter


It's simple to implement, and it seems to me that it is simpler to use. Any 
thoughts?
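
To make the intended blockdev-replace semantics concrete, here is a toy model
of the "repoint all parents except the filter itself" rule. It is only a
sketch: blockdev-replace is a proposal in this mail, and Node, Edge and
replace_node_sketch() are hypothetical names, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy graph: each edge is one parent -> child link. */
typedef struct Node { const char *name; } Node;
typedef struct Edge { Node *parent; Node *child; } Edge;

/*
 * Make every parent of @from point to @to instead. Edges owned by @to
 * are skipped, so a filter that has @from as its child keeps it and no
 * self-loop is created. Returns the number of edges that were moved.
 */
static int replace_node_sketch(Edge *edges, size_t n, Node *from, Node *to)
{
    int moved = 0;

    assert(from != to);
    for (size_t i = 0; i < n; i++) {
        if (edges[i].child == from && edges[i].parent != to) {
            edges[i].child = to;
            moved++;
        }
    }
    return moved;
}
```

Inserting is then "add the filter with X as its child, replace X by the
filter"; removing is "replace the filter by X, delete the filter". In both
directions the filter's own edge to X is left alone.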

--
Best regards,
Vladimir



Re: [PATCH 03/21] qdev-properties: PropertyInfo: add realized_set_allowed field

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

Add field, so property can declare support for setting the property
when device is realized. To be used in the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/hw/qdev-properties.h | 1 +
  hw/core/qdev-properties.c| 6 +++---
  2 files changed, 4 insertions(+), 3 deletions(-)


Looks OK to me, although qdev isn’t my specialty.


diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 0ef97d60ce..007e1f69f4 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -32,6 +32,7 @@ struct PropertyInfo {
  const char *name;
  const char *description;
  const QEnumLookup *enum_table;
+bool realized_set_allowed;


I think a comment would be nice, though.

Max


  int (*print)(Object *obj, Property *prop, char *dest, size_t len);
  void (*set_default_value)(ObjectProperty *op, const Property *prop);
  ObjectProperty *(*create)(ObjectClass *oc, const char *name,
diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index 50f40949f5..c34aac6ebc 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -26,11 +26,11 @@ void qdev_prop_set_after_realize(DeviceState *dev, const 
char *name,
  
  /* returns: true if property is allowed to be set, false otherwise */

  static bool qdev_prop_allow_set(Object *obj, const char *name,
-Error **errp)
+const PropertyInfo *info, Error **errp)
  {
  DeviceState *dev = DEVICE(obj);
  
-if (dev->realized) {

+if (dev->realized && !info->realized_set_allowed) {
  qdev_prop_set_after_realize(dev, name, errp);
  return false;
  }
@@ -79,7 +79,7 @@ static void field_prop_set(Object *obj, Visitor *v, const 
char *name,
  {
  Property *prop = opaque;
  
-if (!qdev_prop_allow_set(obj, name, errp)) {

+if (!qdev_prop_allow_set(obj, name, prop->info, errp)) {
  return;
  }
  






Re: [PATCH] Add missing coroutine_fn function signature to functions

2021-05-17 Thread cennedee
Focusing on a single file at a time now, this revised patch adds the
missing `coroutine_fn` function signature to definitions in 
scsi/qemu-pr-helper.c.
I intend to do more files in a separate patch series once I get the full flow of 
this.

Compared to my previous e-mail, I have also confirmed that this edit passes 
checkpatch.pl.

The following functions are affected.

do_sgio()
do_pr_in() --> do_sgio()
do_pr_out() --> do_sgio()
mpath_reconstruct_sense() --> do_sgio()
multipath_pr_out() --> mpath_reconstruct_sense() --> do_sgio()
multipath_pr_in() --> mpath_reconstruct_sense() --> do_sgio()
accept_client() --> prh_co_entry()
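
The way the marker propagates up a call chain like the ones listed above can
be shown with a tiny sketch. It assumes coroutine_fn expands to nothing and
only documents intent; the *_sketch functions are hypothetical stand-ins, not
the helpers from scsi/qemu-pr-helper.c.

```c
#include <assert.h>

/* Assumption: a no-op marker, so adding it changes no generated code. */
#ifndef coroutine_fn
#define coroutine_fn
#endif

/* Leaf: stands in for a function that may yield to the event loop. */
static int coroutine_fn do_sgio_sketch(int sz)
{
    assert(sz >= 0);
    return sz; /* a real implementation would yield here */
}

/*
 * Caller of a coroutine_fn: it may only run in coroutine context as
 * well, so it carries the same marker.
 */
static int coroutine_fn do_pr_in_sketch(int sz)
{
    return do_sgio_sketch(sz);
}
```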


From 5bdef14027457d412972131dace76c3cabcc45a0 Mon Sep 17 00:00:00 2001
From: Cenne Dee 
Date: Fri, 30 Apr 2021 15:52:28 -0400
Subject: [PATCH] Add missing coroutine_fn function signature to some _co()
 functions

Patch adds the signature for relevant functions ending with _co
or those that use them.

Signed-off-by: Cenne Dee 
---
 scsi/qemu-pr-helper.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/scsi/qemu-pr-helper.c b/scsi/qemu-pr-helper.c
index 7b9389b47b..7ed47c17c7 100644
--- a/scsi/qemu-pr-helper.c
+++ b/scsi/qemu-pr-helper.c
@@ -175,8 +175,8 @@ static int do_sgio_worker(void *opaque)
 return status;
 }

-static int do_sgio(int fd, const uint8_t *cdb, uint8_t *sense,
-uint8_t *buf, int *sz, int dir)
+static int coroutine_fn do_sgio(int fd, const uint8_t *cdb, uint8_t *sense,
+uint8_t *buf, int *sz, int dir)
 {
 ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
 int r;
@@ -318,7 +318,7 @@ static SCSISense mpath_generic_sense(int r)
 }
 }

-static int mpath_reconstruct_sense(int fd, int r, uint8_t *sense)
+static int coroutine_fn mpath_reconstruct_sense(int fd, int r, uint8_t *sense)
 {
 switch (r) {
 case MPATH_PR_SUCCESS:
@@ -370,8 +370,8 @@ static int mpath_reconstruct_sense(int fd, int r, uint8_t 
*sense)
 }
 }

-static int multipath_pr_in(int fd, const uint8_t *cdb, uint8_t *sense,
-   uint8_t *data, int sz)
+static int coroutine_fn multipath_pr_in(int fd, const uint8_t *cdb,
+uint8_t *sense, uint8_t *data, int sz)
 {
 int rq_servact = cdb[1];
 struct prin_resp resp;
@@ -425,8 +425,9 @@ static int multipath_pr_in(int fd, const uint8_t *cdb, 
uint8_t *sense,
 return mpath_reconstruct_sense(fd, r, sense);
 }

-static int multipath_pr_out(int fd, const uint8_t *cdb, uint8_t *sense,
-const uint8_t *param, int sz)
+static int coroutine_fn multipath_pr_out(int fd, const uint8_t *cdb,
+ uint8_t *sense, const uint8_t *param,
+ int sz)
 {
 int rq_servact = cdb[1];
 int rq_scope = cdb[2] >> 4;
@@ -543,8 +544,8 @@ static int multipath_pr_out(int fd, const uint8_t *cdb, 
uint8_t *sense,
 }
 #endif

-static int do_pr_in(int fd, const uint8_t *cdb, uint8_t *sense,
-uint8_t *data, int *resp_sz)
+static int coroutine_fn do_pr_in(int fd, const uint8_t *cdb, uint8_t *sense,
+ uint8_t *data, int *resp_sz)
 {
 #ifdef CONFIG_MPATH
 if (is_mpath(fd)) {
@@ -561,8 +562,8 @@ static int do_pr_in(int fd, const uint8_t *cdb, uint8_t 
*sense,
SG_DXFER_FROM_DEV);
 }

-static int do_pr_out(int fd, const uint8_t *cdb, uint8_t *sense,
- const uint8_t *param, int sz)
+static int coroutine_fn do_pr_out(int fd, const uint8_t *cdb, uint8_t *sense,
+  const uint8_t *param, int sz)
 {
 int resp_sz;

@@ -804,7 +805,8 @@ out:
 g_free(client);
 }

-static gboolean accept_client(QIOChannel *ioc, GIOCondition cond, gpointer 
opaque)
+static gboolean coroutine_fn accept_client(QIOChannel *ioc, GIOCondition cond,
+   gpointer opaque)
 {
 QIOChannelSocket *cioc;
 PRHelperClient *prh;
--
2.31.1




[PATCH] replication: move include out of root directory

2021-05-17 Thread Paolo Bonzini
The replication.h file is included from migration/colo.c and 
tests/unit/test-replication.c,
so it should be in include/.

Signed-off-by: Paolo Bonzini 
---
 block/replication.c  | 2 +-
 replication.h => include/block/replication.h | 4 ++--
 migration/colo.c | 2 +-
 replication.c| 2 +-
 tests/unit/test-replication.c| 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)
 rename replication.h => include/block/replication.h (98%)

diff --git a/block/replication.c b/block/replication.c
index 97be7ef4de..52163f2d1f 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -22,7 +22,7 @@
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
-#include "replication.h"
+#include "block/replication.h"
 
 typedef enum {
 BLOCK_REPLICATION_NONE, /* block replication is not started */
diff --git a/replication.h b/include/block/replication.h
similarity index 98%
rename from replication.h
rename to include/block/replication.h
index d49fc22cb9..21931b4f0c 100644
--- a/replication.h
+++ b/include/block/replication.h
@@ -23,7 +23,7 @@ typedef struct ReplicationOps ReplicationOps;
 typedef struct ReplicationState ReplicationState;
 
 /**
- * SECTION:replication.h
+ * SECTION:block/replication.h
  * @title:Base Replication System
  * @short_description: interfaces for handling replication
  *
@@ -32,7 +32,7 @@ typedef struct ReplicationState ReplicationState;
  * 
  *   How to use replication interfaces
  *   
- * #include "replication.h"
+ * #include "block/replication.h"
  *
  * typedef struct BDRVReplicationState {
  * ReplicationState *rs;
diff --git a/migration/colo.c b/migration/colo.c
index de27662cab..e498fdb125 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -28,7 +28,7 @@
 #include "migration/failover.h"
 #include "migration/ram.h"
 #ifdef CONFIG_REPLICATION
-#include "replication.h"
+#include "block/replication.h"
 #endif
 #include "net/colo-compare.h"
 #include "net/colo.h"
diff --git a/replication.c b/replication.c
index be3a42f9c9..4acd3f8004 100644
--- a/replication.c
+++ b/replication.c
@@ -14,7 +14,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
-#include "replication.h"
+#include "block/replication.h"
 
 static QLIST_HEAD(, ReplicationState) replication_states;
 
diff --git a/tests/unit/test-replication.c b/tests/unit/test-replication.c
index b067240add..afff908d77 100644
--- a/tests/unit/test-replication.c
+++ b/tests/unit/test-replication.c
@@ -14,7 +14,7 @@
 #include "qapi/qmp/qdict.h"
 #include "qemu/option.h"
 #include "qemu/main-loop.h"
-#include "replication.h"
+#include "block/replication.h"
 #include "block/block_int.h"
 #include "block/qdict.h"
 #include "sysemu/block-backend.h"
-- 
2.27.0




Re: [PATCH 02/21] block: introduce blk_replace_bs

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

Add function to change bs inside blk.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/sysemu/block-backend.h | 1 +
  block/block-backend.c  | 8 
  2 files changed, 9 insertions(+)

diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 880e903293..aec05ef0a0 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -98,6 +98,7 @@ BlockBackend *blk_by_public(BlockBackendPublic *public);
  BlockDriverState *blk_bs(BlockBackend *blk);
  void blk_remove_bs(BlockBackend *blk);
  int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp);
+int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp);
  bool bdrv_has_blk(BlockDriverState *bs);
  bool bdrv_is_root_node(BlockDriverState *bs);
  int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
diff --git a/block/block-backend.c b/block/block-backend.c
index de5496af66..b1abc6f3e6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -870,6 +870,14 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, 
Error **errp)
  return 0;
  }
  
+/*

+ * Change BlockDriverState associated with @blk.
+ */
+int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp)
+{
+return bdrv_replace_child_bs(blk->root, new_bs, errp);
+}


Reviewed-by: Max Reitz 

(Looks indeed like we don’t need to do any of the things that 
blk_insert_bs() and blk_remove_bs() do besides inserting and removing 
the node.)





Re: [PATCH 01/21] block: introduce bdrv_replace_child_bs()

2021-05-17 Thread Max Reitz

On 17.05.21 08:44, Vladimir Sementsov-Ogievskiy wrote:

Add function to transactionally replace bs inside BdrvChild.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  include/block/block.h |  2 ++
  block.c   | 36 
  2 files changed, 38 insertions(+)


As you may guess, I know little about the rewritten replacing functions, 
so this is kind of difficult to review for me.  However, nothing looks 
out of place, and the function looks sufficiently similar to 
bdrv_replace_node_common() to make me happy.



diff --git a/include/block/block.h b/include/block/block.h
index 82185965ff..f9d5fcb108 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -361,6 +361,8 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState 
*bs_top,
  Error **errp);
  int bdrv_replace_node(BlockDriverState *from, BlockDriverState *to,
Error **errp);
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
+  Error **errp);
  BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *node_options,
 int flags, Error **errp);
  int bdrv_drop_filter(BlockDriverState *bs, Error **errp);
diff --git a/block.c b/block.c
index 9ad725d205..755fa53d85 100644
--- a/block.c
+++ b/block.c
@@ -4961,6 +4961,42 @@ out:
  return ret;
  }
  
+int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,

+  Error **errp)
+{
+int ret;
+Transaction *tran = tran_new();
+g_autoptr(GHashTable) found = NULL;
+g_autoptr(GSList) refresh_list = NULL;
+BlockDriverState *old_bs = child->bs;
+
+if (old_bs) {


Hm.  Can child->bs be ever NULL?


+bdrv_ref(old_bs);
+bdrv_drained_begin(old_bs);
+}
+bdrv_drained_begin(new_bs);


(I was wondering why we couldn’t handle the new_bs == NULL case here to 
replace bdrv_remove_filter_or_cow_child(), but then I realized it’s 
probably because that’s kind of difficult, precisely because child->bs 
at least should generally be non-NULL.  Which is why 
bdrv_remove_filter_or_cow_child() needs to add its own transaction entry 
to handle the BdrvChild object and the pointer to it.


Hence me wondering whether we could assume child->bs not to be NULL.)


+
+bdrv_replace_child(child, new_bs, tran);
+
+found = g_hash_table_new(NULL, NULL);
+if (old_bs) {
+refresh_list = bdrv_topological_dfs(refresh_list, found, old_bs);
+}
+refresh_list = bdrv_topological_dfs(refresh_list, found, new_bs);
+
+ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);


Speaking of bdrv_remove_filter_or_cow_child(): That function doesn’t 
refresh permissions.  I think it’s correct to do it here, so the 
following question doesn’t really concern this patch, but: Why don’t we 
do it there?


I guess it’s because we expect the node to go away anyway, so we don’t 
need to refresh the permissions.  And that assumption should hold true 
right now, given its callers.  But is that a safe assumption in general? 
 Would there be a problem if we refreshed permissions there?  Or is not 
refreshing permissions just part of the function’s interface?


Max


+
+tran_finalize(tran, ret);
+
+if (old_bs) {
+bdrv_drained_end(old_bs);
+bdrv_unref(old_bs);
+}
+bdrv_drained_end(new_bs);
+
+return ret;
+}
+
  static void bdrv_delete(BlockDriverState *bs)
  {
  assert(bdrv_op_blocker_is_empty(bs));
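
The prepare-then-finalize pattern used by bdrv_replace_child_bs() above (tran_new(), steps registered while the graph is modified and permissions are refreshed, then tran_finalize(tran, ret)) can be sketched in isolation. This is a toy model under stated assumptions, not QEMU's Transaction implementation: the ToyTran type and all toy_* helpers are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy transaction: NOT QEMU's Transaction API. */
typedef struct ToyAction {
    void (*commit)(void *opaque);    /* run if the whole operation succeeded */
    void (*rollback)(void *opaque);  /* run to undo this step on failure */
    void *opaque;
    struct ToyAction *next;
} ToyAction;

typedef struct ToyTran {
    ToyAction *head;                 /* most recently added action first */
} ToyTran;

static ToyTran *toy_tran_new(void)
{
    ToyTran *t = calloc(1, sizeof(*t));
    assert(t);
    return t;
}

/* Register a step; newest first, so rollbacks run in reverse order. */
static void toy_tran_add(ToyTran *t, void (*commit)(void *),
                         void (*rollback)(void *), void *opaque)
{
    ToyAction *a = calloc(1, sizeof(*a));
    assert(a);
    a->commit = commit;
    a->rollback = rollback;
    a->opaque = opaque;
    a->next = t->head;
    t->head = a;
}

/* ret < 0 rolls every step back; otherwise every step is committed. */
static void toy_tran_finalize(ToyTran *t, int ret)
{
    ToyAction *a = t->head;

    while (a) {
        ToyAction *next = a->next;
        void (*cb)(void *) = ret < 0 ? a->rollback : a->commit;
        if (cb) {
            cb(a->opaque);
        }
        free(a);
        a = next;
    }
    free(t);
}

/* Example step: replace *slot with new_val, remembering the old value. */
typedef struct ToyReplace {
    int *slot;
    int old_val;
} ToyReplace;

static void toy_replace_rollback(void *opaque)
{
    ToyReplace *r = opaque;
    *r->slot = r->old_val;
}

static void toy_replace(ToyTran *t, ToyReplace *r, int *slot, int new_val)
{
    r->slot = slot;
    r->old_val = *slot;
    *slot = new_val;                 /* change takes effect immediately */
    toy_tran_add(t, NULL, toy_replace_rollback, r);
}
```

The change is applied eagerly and only undone if a later step (in the patch, the permission refresh) fails, which is why the caller can pass the final ret straight to the finalize call.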






RFC: Qemu backup interface plans

2021-05-17 Thread Vladimir Sementsov-Ogievskiy

Hi all!

I'd like to share and discuss some plans I have for the Qemu backup interface. 
(Actually, I presented most of this at KVM Forum long ago, but now I'm a 
lot closer to realization :)


I'd start with a reminder about image fleecing:

We have an image fleecing scheme to export a point-in-time state of the active
disk (iotest 222):


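                                 backup(sync=none)
                           ┌───────────────────────────┐
                           ▼                           │
┌────────────┐     ┌────────────────┐  backing  ┌─────────────┐
│ NBD export │ ─── │ temp qcow2 img │ ────────▶ │ active disk │
└────────────┘     └────────────────┘           └─────────────┘
                                                       ▲
┌────────────┐                                         │
│ guest blk  │ ────────────────────────────────────────┘
└────────────┘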


Actually, backup job inserts a backup-top filter, so in detail it looks
like:

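                                 backup(sync=none)
                           ┌───────────────────────────┐
                           ▼                           │
┌────────────┐     ┌────────────────┐  backing  ┌─────────────┐
│ NBD export │ ─── │ temp qcow2 img │ ────────▶ │ active disk │
└────────────┘     └────────────────┘           └─────────────┘
                           ▲                           ▲
                           │ target                    │
                           │                           │
┌────────────┐     ┌────────────────┐  backing         │
│ guest blk  │ ──▶ │   backup-top   │ ─────────────────┘
└────────────┘     └────────────────┘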
 


This scheme is also called external backup or pull backup. It allows some 
external tool to write the data to the actual backup, while Qemu only provides the data.

We also support incremental external backup: Qemu can manage dirty bitmaps in any way 
the user wants, and we can export bitmaps through the NBD protocol. So a client of the NBD export can 
get the bitmap and read only the "dirty" regions of the exported image.

What we lack in this scheme:

1. Handling a dirty bitmap in the backup-top filter: backup-top does a copy-before-write 
operation on any guest write, when actually we are interested only in "dirty" 
regions for incremental backup.

A probable solution would be to allow specifying a bitmap for the sync=none mode of 
backup, but I think what I propose below is better.

2. [actually it's a tricky part of 1]: the possibility to skip copy-before-write 
operations for regions that were already copied to the final backup. With a normal 
Qemu backup job, this is achieved by the fact that the block-copy state with its 
internal bitmap is shared between the backup job and the copy-before-write filter.

3. Not a real problem but fact: backup block-job does nothing in the scheme, 
the whole job is done by filter. So, it would be interesting to have a 
possibility to simply insert/remove the filter, and avoid block-job creation 
and managing at all for external backup. (and I'd like to send another RFC on 
how to insert/remove filters, let's not discuss it here).
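
Points 1 and 2 above can be made concrete with a small sketch. It assumes a hypothetical per-cluster bitmap in which a set bit means "the old contents of this cluster still have to be preserved for the backup": guest writes copy old data only for set bits and then clear them, and the backup side clears the bit for every cluster it has already copied. All names (ToyCbw, toy_*) are made up for illustration; this is not the backup-top or block-copy implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CLUSTERS     8
#define TOY_CLUSTER_SIZE 4

typedef struct ToyCbw {
    uint8_t active[TOY_CLUSTERS * TOY_CLUSTER_SIZE]; /* active disk */
    uint8_t temp[TOY_CLUSTERS * TOY_CLUSTER_SIZE];   /* fleecing target */
    bool need_copy[TOY_CLUSTERS];  /* set: old data must still be preserved */
    unsigned cbw_ops;              /* copy-before-write operations performed */
} ToyCbw;

/* Start a backup that only cares about the clusters marked dirty. */
static void toy_cbw_start(ToyCbw *s, const bool *dirty)
{
    memcpy(s->need_copy, dirty, sizeof(s->need_copy));
    s->cbw_ops = 0;
}

/* Guest write to one cluster: preserve the old data first if still needed. */
static void toy_guest_write(ToyCbw *s, unsigned cluster, uint8_t byte)
{
    size_t off = (size_t)cluster * TOY_CLUSTER_SIZE;

    assert(cluster < TOY_CLUSTERS);
    if (s->need_copy[cluster]) {
        memcpy(s->temp + off, s->active + off, TOY_CLUSTER_SIZE);
        s->need_copy[cluster] = false;
        s->cbw_ops++;
    }
    memset(s->active + off, byte, TOY_CLUSTER_SIZE);
}

/* The backup already copied this cluster: later writes need no CBW. */
static void toy_backup_done(ToyCbw *s, unsigned cluster)
{
    assert(cluster < TOY_CLUSTERS);
    s->need_copy[cluster] = false;
}
```

With such a shared bitmap, writes to clean clusters and to clusters that were already backed up go straight to the active disk, which is exactly the work that plain sync=none fleecing cannot avoid today.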


Next. Think about internal backup. It has one drawback too:
4. If target is remote with slow connection, copy-before-write operations will 
slow down guest writes appreciably.

It may be solved with the help of image fleecing: we create a temporary qcow2 image, 
set up the fleecing scheme, and instead of exporting the temp image through NBD we start 
a second backup with source = temporary image and target = the real backup 
target (NBD for example). Still, such a solution has the same problems [1, 2], 
and [3] becomes worse:

5. We'll have two jobs and two automatically inserted filters, when actually 
one filter and one job are enough (as first job is needed only to insert a 
filter, second job doesn't need a filter at all).

Note also that this (starting two backup jobs to make a push backup with 
fleecing) doesn't work now: op-blockers will forbid it. It's simple to fix 
(and in Virtuozzo we live with a downstream-only patch, which allows push backup 
with fleecing, based on starting two backup jobs). But I never sent a patch, 
as I have a better plan, which will solve all the listed problems.


So, what I propose:

1. We make backup-top filter public, so that it could be inserted/removed where 
user wants through QMP (how to properly insert/remove filter I'll post another 
RFC, as backup-top is not the only filter that can be usefully inserted 
somewhere). For this first step I've sent a series today:

  subject: [PATCH 00/21] block: publish backup-top filter
  id: <20210517064428.16223-1-vsement...@virtuozzo.com>
  patchew: 
https://patchew.org/QEMU/20210517064428.16223-1-vsement...@virtuozzo.com/

(note that one of the things in this series is the rename 
s/backup-top/copy-before-write/; still, I call it backup-top in this letter)

This solves [3]. [4, 5] are solved 

Re: [PATCH] fdc: check drive block device before usage (CVE-2021-20196)

2021-05-17 Thread P J P
+-- On Sat, 15 May 2021, Philippe Mathieu-Daudé wrote --+
| This patch misses the qtest companion with the reproducer
| provided by Alexander.

Do we need a revised patch[-series] including a qtest? Or can it be done at 
merge time?

Thank you.
--
 - P J P
8685 545E B54C 486B C6EB 271E E285 8B5A F050 DE8D

Re: [PULL 00/14] Block layer patches

2021-05-17 Thread Peter Maydell
On Sun, 16 May 2021 at 22:09, Philippe Mathieu-Daudé  wrote:
>
> On 5/14/21 6:31 PM, Kevin Wolf wrote:
> > The following changes since commit 96662996eda78c48aa4e76d8615c7eb72d80:
> >
> >   Merge remote-tracking branch 
> > 'remotes/dgilbert/tags/pull-migration-20210513a' into staging (2021-05-14 
> > 12:03:47 +0100)
> >
> > are available in the Git repository at:
> >
> >   git://repo.or.cz/qemu/kevin.git tags/for-upstream
> >
> > for you to fetch changes up to b773c9fb68ceff9a9692409d7afbc5d6865983c6:
> >
> >   vhost-user-blk: Check that num-queues is supported by backend (2021-05-14 
> > 18:04:27 +0200)
> >
> > 
> > Block layer patches
> >
> > - vhost-user-blk: Fix error handling during initialisation
> > - Add test cases for the vhost-user-blk export
> > - Fix leaked Transaction objects
> > - qcow2: Expose dirty bit in 'qemu-img info'
> >
> > 
> > Coiby Xu (1):
> >   test: new qTest case to test the vhost-user-blk-server
>
> Not sure if worth blocking the pull request, but this new test
> breaks builds using --disable-tools (therefore breaks bisection).

Since I hadn't got as far as applying it, I'll drop the pullreq so
that can be fixed.

-- PMM



[PATCH 0/3] adding ctrl list (cns 0x13) support and random fixes

2021-05-17 Thread Gollu Appalanaidu
This series will add the Identify Controller List (CNS 0x13) support
and NSID endian conversion fixes for CNS 0x12 and CNS 0x13.

Documentation fix for the '-detached' parameter.

Gollu Appalanaidu (3):
  hw/nvme/ctrl: add controller list cns 0x13
  hw/nvme/ctrl: fix endian conversion for nsid in ctrl list
  hw/nvme/ctrl: documentation fix

 hw/nvme/ctrl.c   | 28 +---
 hw/nvme/trace-events |  2 +-
 include/block/nvme.h |  1 +
 3 files changed, 19 insertions(+), 12 deletions(-)

-- 
2.17.1




[PATCH 3/3] hw/nvme/ctrl: documentation fix

2021-05-17 Thread Gollu Appalanaidu
In the documentation of the '-detached' param, "be" and "not" have each been
written twice in a row ("be be", "not not"); fix that.

Signed-off-by: Gollu Appalanaidu 
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 813a72c655..a3df26d0ce 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -114,7 +114,7 @@
  *   This parameter is only valid together with the `subsys` parameter. If left
  *   at the default value (`false/off`), the namespace will be attached to all
  *   controllers in the NVMe subsystem at boot-up. If set to `true/on`, the
- *   namespace will be be available in the subsystem not not attached to any
+ *   namespace will be available in the subsystem not attached to any
  *   controllers.
  *
  * Setting `zoned` to true selects Zoned Command Set at the namespace.
-- 
2.17.1




[PATCH 2/3] hw/nvme/ctrl: fix endian conversion for nsid in ctrl list

2021-05-17 Thread Gollu Appalanaidu
In the Identify Controller List handling for CNS 0x12 and 0x13 there is no
endian conversion for the nsid field; add it.

Signed-off-by: Gollu Appalanaidu 
---
 hw/nvme/ctrl.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index d08a3350e2..813a72c655 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4255,6 +4255,7 @@ static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, 
NvmeRequest *req,
 bool attached)
 {
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
+uint32_t nsid = le32_to_cpu(c->nsid);
 uint16_t min_id = le16_to_cpu(c->ctrlid);
 uint16_t list[NVME_CONTROLLER_LIST_SIZE] = {};
 uint16_t *ids = &list[1];
@@ -4265,11 +4266,11 @@ static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, 
NvmeRequest *req,
 trace_pci_nvme_identify_ctrl_list(c->cns, min_id);
 
 if (attached) {
-if (c->nsid == NVME_NSID_BROADCAST) {
+if (nsid == NVME_NSID_BROADCAST) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-ns = nvme_subsys_ns(n->subsys, c->nsid);
+ns = nvme_subsys_ns(n->subsys, nsid);
 if (!ns) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
@@ -4281,7 +4282,7 @@ static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, 
NvmeRequest *req,
 continue;
 }
 
-if (attached && !nvme_ns(ctrl, c->nsid)) {
+if (attached && !nvme_ns(ctrl, nsid)) {
 continue;
 }
 
-- 
2.17.1
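
Why the conversion matters: the NSID travels in the command as a little-endian 32-bit field, so using it unconverted is only correct on little-endian hosts. A portable sketch of what le32_to_cpu() achieves, using a hypothetical toy_le32_to_cpu() helper that decodes from raw bytes (an assumption for illustration; QEMU's real helper takes an integer value, not a byte pointer):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NSID_BROADCAST 0xffffffffu

/* Decode a little-endian 32-bit field independent of host byte order. */
static uint32_t toy_le32_to_cpu(const uint8_t *p)
{
    assert(p);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Returns nonzero if the wire-format NSID is the broadcast value. */
static int toy_nsid_is_broadcast(const uint8_t *wire_nsid)
{
    return toy_le32_to_cpu(wire_nsid) == TOY_NSID_BROADCAST;
}
```

The broadcast comparison happens to be byte-order invariant because the value is all ones; it is the namespace lookup by nsid that would pick the wrong namespace on a big-endian host without the conversion.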




[PATCH v2 5/6] coroutine-sleep: replace QemuCoSleepState pointer with struct in the API

2021-05-17 Thread Paolo Bonzini
Right now, users of qemu_co_sleep_ns_wakeable are simply passing
a pointer to QemuCoSleepState by reference to the function.  But
QemuCoSleepState really is just a Coroutine*; making the
content of the struct public is just as efficient and lets us
skip the user_state_pointer indirection.

Since the usage is changed, take the occasion to rename the
struct to QemuCoSleep.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Paolo Bonzini 
---
 block/block-copy.c  |  8 
 block/nbd.c | 10 -
 include/qemu/coroutine.h| 23 +++--
 util/qemu-coroutine-sleep.c | 41 -
 4 files changed, 39 insertions(+), 43 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index f896dc56f2..c2e5090412 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -50,7 +50,7 @@ typedef struct BlockCopyCallState {
 /* State */
 int ret;
 bool finished;
-QemuCoSleepState *sleep_state;
+QemuCoSleep sleep;
 bool cancelled;
 
 /* OUT parameters */
@@ -625,8 +625,8 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
 if (ns > 0) {
 block_copy_task_end(task, -EAGAIN);
 g_free(task);
-qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, ns,
-  &call_state->sleep_state);
+qemu_co_sleep_ns_wakeable(&call_state->sleep,
+  QEMU_CLOCK_REALTIME, ns);
 continue;
 }
 }
@@ -674,7 +674,7 @@ out:
 
 void block_copy_kick(BlockCopyCallState *call_state)
 {
-qemu_co_sleep_wake(call_state->sleep_state);
+qemu_co_sleep_wake(&call_state->sleep);
 }
 
 /*
diff --git a/block/nbd.c b/block/nbd.c
index 1c6315b168..616f9ae6c4 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -116,7 +116,7 @@ typedef struct BDRVNBDState {
 CoQueue free_sema;
 Coroutine *connection_co;
 Coroutine *teardown_co;
-QemuCoSleepState *connection_co_sleep_ns_state;
+QemuCoSleep reconnect_sleep;
 bool drained;
 bool wait_drained_end;
 int in_flight;
@@ -289,7 +289,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
 s->drained = true;
-qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
+qemu_co_sleep_wake(&s->reconnect_sleep);
 
 nbd_co_establish_connection_cancel(bs, false);
 
@@ -328,7 +328,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 
 s->state = NBD_CLIENT_QUIT;
 if (s->connection_co) {
-qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
+qemu_co_sleep_wake(&s->reconnect_sleep);
 nbd_co_establish_connection_cancel(bs, true);
 }
 if (qemu_in_coroutine()) {
@@ -685,8 +685,8 @@ static coroutine_fn void nbd_co_reconnect_loop(BDRVNBDState 
*s)
 }
 bdrv_inc_in_flight(s->bs);
 } else {
-qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
-  &s->connection_co_sleep_ns_state);
+qemu_co_sleep_ns_wakeable(&s->reconnect_sleep,
+  QEMU_CLOCK_REALTIME, timeout);
 if (s->drained) {
 continue;
 }
diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index c5d7742989..82c0671f80 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -291,21 +291,22 @@ void qemu_co_rwlock_wrlock(CoRwlock *lock);
  */
 void qemu_co_rwlock_unlock(CoRwlock *lock);
 
-typedef struct QemuCoSleepState QemuCoSleepState;
+typedef struct QemuCoSleep {
+Coroutine *to_wake;
+} QemuCoSleep;
 
 /**
- * Yield the coroutine for a given duration. During this yield, @sleep_state
- * is set to an opaque pointer, which may be used for
- * qemu_co_sleep_wake(). Be careful, the pointer is set back to zero when the
- * timer fires. Don't save the obtained value to other variables and don't call
- * qemu_co_sleep_wake from another aio context.
+ * Yield the coroutine for a given duration. Initializes @w so that,
+ * during this yield, it can be passed to qemu_co_sleep_wake() to
+ * terminate the sleep.
  */
-void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType type, int64_t ns,
-QemuCoSleepState **sleep_state);
+void coroutine_fn qemu_co_sleep_ns_wakeable(QemuCoSleep *w,
+QEMUClockType type, int64_t ns);
+
 static inline void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t 
ns)
 {
-QemuCoSleepState *unused = NULL;
-qemu_co_sleep_ns_wakeable(type, ns, &unused);
+QemuCoSleep w = { 0 };
+qemu_co_sleep_ns_wakeable(&w, type, ns);
 }
 
 /**
@@ -314,7 +315,7 @@ static inline void coroutine_fn 
qemu_co_sleep_ns(QEMUClockType type, int64_t ns)
  * qemu_co_sleep_ns() and should be checked to be non-NULL 

[PATCH 1/3] hw/nvme/ctrl: add controller list cns 0x13

2021-05-17 Thread Gollu Appalanaidu
Add the list of controller identifiers available in the NVM subsystem,
which may or may not be attached to namespaces.

Signed-off-by: Gollu Appalanaidu 
---
 hw/nvme/ctrl.c   | 25 +++--
 hw/nvme/trace-events |  2 +-
 include/block/nvme.h |  1 +
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 2e7498a73e..d08a3350e2 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -4251,7 +4251,8 @@ static uint16_t nvme_identify_ns(NvmeCtrl *n, NvmeRequest 
*req, bool active)
 return NVME_INVALID_CMD_SET | NVME_DNR;
 }
 
-static uint16_t nvme_identify_ns_attached_list(NvmeCtrl *n, NvmeRequest *req)
+static uint16_t nvme_identify_ctrl_list(NvmeCtrl *n, NvmeRequest *req,
+bool attached)
 {
 NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
 uint16_t min_id = le16_to_cpu(c->ctrlid);
@@ -4261,15 +4262,17 @@ static uint16_t nvme_identify_ns_attached_list(NvmeCtrl 
*n, NvmeRequest *req)
 NvmeCtrl *ctrl;
 int cntlid, nr_ids = 0;
 
-trace_pci_nvme_identify_ns_attached_list(min_id);
+trace_pci_nvme_identify_ctrl_list(c->cns, min_id);
 
-if (c->nsid == NVME_NSID_BROADCAST) {
-return NVME_INVALID_FIELD | NVME_DNR;
-}
+if (attached) {
+if (c->nsid == NVME_NSID_BROADCAST) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
 
-ns = nvme_subsys_ns(n->subsys, c->nsid);
-if (!ns) {
-return NVME_INVALID_FIELD | NVME_DNR;
+ns = nvme_subsys_ns(n->subsys, c->nsid);
+if (!ns) {
+return NVME_INVALID_FIELD | NVME_DNR;
+}
 }
 
 for (cntlid = min_id; cntlid < ARRAY_SIZE(n->subsys->ctrls); cntlid++) {
@@ -4278,7 +4281,7 @@ static uint16_t nvme_identify_ns_attached_list(NvmeCtrl 
*n, NvmeRequest *req)
 continue;
 }
 
-if (!nvme_ns(ctrl, c->nsid)) {
+if (attached && !nvme_ns(ctrl, c->nsid)) {
 continue;
 }
 
@@ -4493,7 +4496,9 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeRequest 
*req)
 case NVME_ID_CNS_NS_PRESENT:
 return nvme_identify_ns(n, req, false);
 case NVME_ID_CNS_NS_ATTACHED_CTRL_LIST:
-return nvme_identify_ns_attached_list(n, req);
+return nvme_identify_ctrl_list(n, req, true);
+case NVME_ID_CNS_CTRL_LIST:
+return nvme_identify_ctrl_list(n, req, false);
 case NVME_ID_CNS_CS_NS:
 return nvme_identify_ns_csi(n, req, true);
 case NVME_ID_CNS_CS_NS_PRESENT:
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index ea33d0ccc3..7ba3714671 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -55,7 +55,7 @@ pci_nvme_identify(uint16_t cid, uint8_t cns, uint16_t ctrlid, 
uint8_t csi) "cid
 pci_nvme_identify_ctrl(void) "identify controller"
 pci_nvme_identify_ctrl_csi(uint8_t csi) "identify controller, csi=0x%"PRIx8""
 pci_nvme_identify_ns(uint32_t ns) "nsid %"PRIu32""
-pci_nvme_identify_ns_attached_list(uint16_t cntid) "cntid=%"PRIu16""
+pci_nvme_identify_ctrl_list(uint8_t cns, uint16_t cntid) "cns 0x%"PRIx8" 
cntid=%"PRIu16""
 pci_nvme_identify_ns_csi(uint32_t ns, uint8_t csi) "nsid=%"PRIu32", 
csi=0x%"PRIx8""
 pci_nvme_identify_nslist(uint32_t ns) "nsid %"PRIu32""
 pci_nvme_identify_nslist_csi(uint16_t ns, uint8_t csi) "nsid=%"PRIu16", 
csi=0x%"PRIx8""
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 0ff9ce17a9..188ab460df 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -980,6 +980,7 @@ enum NvmeIdCns {
 NVME_ID_CNS_NS_PRESENT_LIST   = 0x10,
 NVME_ID_CNS_NS_PRESENT= 0x11,
 NVME_ID_CNS_NS_ATTACHED_CTRL_LIST = 0x12,
+NVME_ID_CNS_CTRL_LIST = 0x13,
 NVME_ID_CNS_CS_NS_PRESENT_LIST= 0x1a,
 NVME_ID_CNS_CS_NS_PRESENT = 0x1b,
 NVME_ID_CNS_IO_COMMAND_SET= 0x1c,
-- 
2.17.1
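
The shared handler above builds the same list layout for CNS 0x12 and CNS 0x13: entry 0 holds the number of identifiers, followed by the matching controller IDs starting from the requested one. A minimal sketch of that loop, with a hypothetical ToyCtrl array standing in for the subsystem's controller table (the names and the fixed table size are assumptions, not the hw/nvme data structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_CTRLS 8

typedef struct ToyCtrl {
    bool present;        /* controller exists in the subsystem */
    bool ns_attached;    /* the queried namespace is attached to it */
} ToyCtrl;

/*
 * Fill list[] (TOY_MAX_CTRLS + 1 entries): list[0] is the number of IDs,
 * list[1..] the matching controller IDs >= min_id in ascending order.
 * With attached == true only controllers that have the namespace attached
 * are reported (CNS 0x12 style); otherwise all present ones (CNS 0x13).
 */
static uint16_t toy_ctrl_list(const ToyCtrl *ctrls, uint16_t min_id,
                              bool attached, uint16_t *list)
{
    uint16_t *ids = &list[1];
    uint16_t nr_ids = 0;

    assert(ctrls && list);
    for (size_t cntlid = min_id; cntlid < TOY_MAX_CTRLS; cntlid++) {
        if (!ctrls[cntlid].present) {
            continue;
        }
        if (attached && !ctrls[cntlid].ns_attached) {
            continue;
        }
        ids[nr_ids++] = (uint16_t)cntlid;
    }
    list[0] = nr_ids;
    return nr_ids;
}
```

Passing the mode as a boolean keeps a single loop for both CNS values, which is the same design choice the patch makes with its attached parameter.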




[PATCH v2 6/6] coroutine-sleep: introduce qemu_co_sleep

2021-05-17 Thread Paolo Bonzini
Allow using QemuCoSleep to sleep forever until woken by qemu_co_sleep_wake.
This makes the logic of qemu_co_sleep_ns_wakeable easy to understand.

In the future we will introduce an API that can work even if the
sleep and wake happen from different threads.  For now, initializing
w->to_wake after timer_mod is fine because the timer can only fire in
the same AioContext.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Paolo Bonzini 
---
 include/qemu/coroutine.h|  5 +
 util/qemu-coroutine-sleep.c | 26 +++---
 2 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 82c0671f80..292e61aef0 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -303,6 +303,11 @@ typedef struct QemuCoSleep {
 void coroutine_fn qemu_co_sleep_ns_wakeable(QemuCoSleep *w,
 QEMUClockType type, int64_t ns);
 
+/**
+ * Yield the coroutine until the next call to qemu_co_sleep_wake.
+ */
+void coroutine_fn qemu_co_sleep(QemuCoSleep *w);
+
 static inline void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t 
ns)
 {
 QemuCoSleep w = { 0 };
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 89c3b758c5..571ab521ff 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -41,12 +41,9 @@ static void co_sleep_cb(void *opaque)
 qemu_co_sleep_wake(w);
 }
 
-void coroutine_fn qemu_co_sleep_ns_wakeable(QemuCoSleep *w,
-QEMUClockType type, int64_t ns)
+void coroutine_fn qemu_co_sleep(QemuCoSleep *w)
 {
 Coroutine *co = qemu_coroutine_self();
-AioContext *ctx = qemu_get_current_aio_context();
-QEMUTimer ts;
 
 const char *scheduled = qatomic_cmpxchg(&co->scheduled, NULL,
 qemu_co_sleep_ns__scheduled);
@@ -58,11 +55,26 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QemuCoSleep *w,
 }
 
 w->to_wake = co;
-aio_timer_init(ctx, &ts, type, SCALE_NS, co_sleep_cb, w),
-timer_mod(&ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
-timer_del(&ts);
 
 /* w->to_wake is cleared before resuming this coroutine.  */
 assert(w->to_wake == NULL);
 }
+
+void coroutine_fn qemu_co_sleep_ns_wakeable(QemuCoSleep *w,
+QEMUClockType type, int64_t ns)
+{
+AioContext *ctx = qemu_get_current_aio_context();
+QEMUTimer ts;
+
+aio_timer_init(ctx, &ts, type, SCALE_NS, co_sleep_cb, w);
+timer_mod(&ts, qemu_clock_get_ns(type) + ns);
+
+/*
+ * The timer will fire in the current AioContext, so the callback
+ * must happen after qemu_co_sleep yields and there is no race
+ * between timer_mod and qemu_co_sleep.
+ */
+qemu_co_sleep(w);
+timer_del(&ts);
+}
-- 
2.31.1




[PATCH v2 3/6] coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing

2021-05-17 Thread Paolo Bonzini
All callers of qemu_co_sleep_wake are checking whether they are passing
a NULL argument inside the pointer-to-pointer: do the check in
qemu_co_sleep_wake itself.

As a side effect, qemu_co_sleep_wake can be called more than once and
it will only wake the coroutine once; after the first time, the argument
will be set to NULL via *sleep_state->user_state_pointer.  However, this
would not be safe unless co_sleep_cb keeps using the QemuCoSleepState*
directly, so make it go through the pointer-to-pointer instead.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Paolo Bonzini 
---
 block/block-copy.c  |  4 +---
 block/nbd.c |  8 ++--
 util/qemu-coroutine-sleep.c | 21 -
 3 files changed, 15 insertions(+), 18 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 9b4af00614..f896dc56f2 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -674,9 +674,7 @@ out:
 
 void block_copy_kick(BlockCopyCallState *call_state)
 {
-if (call_state->sleep_state) {
-qemu_co_sleep_wake(call_state->sleep_state);
-}
+qemu_co_sleep_wake(call_state->sleep_state);
 }
 
 /*
diff --git a/block/nbd.c b/block/nbd.c
index 1d4668d42d..1c6315b168 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -289,9 +289,7 @@ static void coroutine_fn 
nbd_client_co_drain_begin(BlockDriverState *bs)
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
 
 s->drained = true;
-if (s->connection_co_sleep_ns_state) {
-qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
-}
+qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 
 nbd_co_establish_connection_cancel(bs, false);
 
@@ -330,9 +328,7 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 
 s->state = NBD_CLIENT_QUIT;
 if (s->connection_co) {
-if (s->connection_co_sleep_ns_state) {
-qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
-}
+qemu_co_sleep_wake(s->connection_co_sleep_ns_state);
 nbd_co_establish_connection_cancel(bs, true);
 }
 if (qemu_in_coroutine()) {
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 3f6f637e81..3ae2b5399a 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -27,19 +27,22 @@ struct QemuCoSleepState {
 
 void qemu_co_sleep_wake(QemuCoSleepState *sleep_state)
 {
-/* Write of schedule protected by barrier write in aio_co_schedule */
-const char *scheduled = qatomic_cmpxchg(&sleep_state->co->scheduled,
-   qemu_co_sleep_ns__scheduled, NULL);
+if (sleep_state) {
+/* Write of schedule protected by barrier write in aio_co_schedule */
+const char *scheduled = qatomic_cmpxchg(&sleep_state->co->scheduled,
+qemu_co_sleep_ns__scheduled, 
NULL);
 
-assert(scheduled == qemu_co_sleep_ns__scheduled);
-*sleep_state->user_state_pointer = NULL;
-timer_del(&sleep_state->ts);
-aio_co_wake(sleep_state->co);
+assert(scheduled == qemu_co_sleep_ns__scheduled);
+*sleep_state->user_state_pointer = NULL;
+timer_del(&sleep_state->ts);
+aio_co_wake(sleep_state->co);
+}
 }
 
 static void co_sleep_cb(void *opaque)
 {
-qemu_co_sleep_wake(opaque);
+QemuCoSleepState **sleep_state = opaque;
+qemu_co_sleep_wake(*sleep_state);
 }
 
 void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType type, int64_t ns,
@@ -60,7 +63,7 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
 abort();
 }
 
-aio_timer_init(ctx, &state.ts, type, SCALE_NS, co_sleep_cb, &state);
+aio_timer_init(ctx, &state.ts, type, SCALE_NS, co_sleep_cb, sleep_state);
 *sleep_state = &state;
 timer_mod(&state.ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
-- 
2.31.1





[PATCH v2 4/6] coroutine-sleep: move timer out of QemuCoSleepState

2021-05-17 Thread Paolo Bonzini
This simplification is enabled by the previous patch.  Now aio_co_wake
will only be called once, therefore we do not care about a spurious
firing of the timer after a qemu_co_sleep_wake.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Paolo Bonzini 
---
 util/qemu-coroutine-sleep.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 3ae2b5399a..1d25019620 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -21,7 +21,6 @@ static const char *qemu_co_sleep_ns__scheduled = 
"qemu_co_sleep_ns";
 
 struct QemuCoSleepState {
 Coroutine *co;
-QEMUTimer ts;
 QemuCoSleepState **user_state_pointer;
 };
 
@@ -34,7 +33,6 @@ void qemu_co_sleep_wake(QemuCoSleepState *sleep_state)
 
 assert(scheduled == qemu_co_sleep_ns__scheduled);
 *sleep_state->user_state_pointer = NULL;
-timer_del(&sleep_state->ts);
 aio_co_wake(sleep_state->co);
 }
 }
@@ -49,6 +47,7 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
 QemuCoSleepState **sleep_state)
 {
 AioContext *ctx = qemu_get_current_aio_context();
+QEMUTimer ts;
 QemuCoSleepState state = {
 .co = qemu_coroutine_self(),
 .user_state_pointer = sleep_state,
@@ -63,10 +62,11 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
 abort();
 }
 
-aio_timer_init(ctx, &state.ts, type, SCALE_NS, co_sleep_cb, sleep_state);
+aio_timer_init(ctx, &ts, type, SCALE_NS, co_sleep_cb, sleep_state);
 *sleep_state = &state;
-timer_mod(&state.ts, qemu_clock_get_ns(type) + ns);
+timer_mod(&ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
+timer_del(&ts);
 
 /* qemu_co_sleep_wake clears *sleep_state before resuming this coroutine.  
*/
 assert(*sleep_state == NULL);
-- 
2.31.1





[PATCH v2 2/6] coroutine-sleep: disallow NULL QemuCoSleepState** argument

2021-05-17 Thread Paolo Bonzini
Simplify the code by removing conditionals.  qemu_co_sleep_ns
can simply point the argument to an on-stack temporary.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Paolo Bonzini 
---
 include/qemu/coroutine.h|  5 +++--
 util/qemu-coroutine-sleep.c | 18 +-
 2 files changed, 8 insertions(+), 15 deletions(-)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index ce5b9c6851..c5d7742989 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -295,7 +295,7 @@ typedef struct QemuCoSleepState QemuCoSleepState;
 
 /**
  * Yield the coroutine for a given duration. During this yield, @sleep_state
- * (if not NULL) is set to an opaque pointer, which may be used for
+ * is set to an opaque pointer, which may be used for
  * qemu_co_sleep_wake(). Be careful, the pointer is set back to zero when the
  * timer fires. Don't save the obtained value to other variables and don't call
  * qemu_co_sleep_wake from another aio context.
@@ -304,7 +304,8 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
 QemuCoSleepState **sleep_state);
 static inline void coroutine_fn qemu_co_sleep_ns(QEMUClockType type, int64_t 
ns)
 {
-qemu_co_sleep_ns_wakeable(type, ns, NULL);
+QemuCoSleepState *unused = NULL;
+qemu_co_sleep_ns_wakeable(type, ns, &unused);
 }
 
 /**
diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index eec6e81f3f..3f6f637e81 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -32,9 +32,7 @@ void qemu_co_sleep_wake(QemuCoSleepState *sleep_state)
qemu_co_sleep_ns__scheduled, NULL);
 
 assert(scheduled == qemu_co_sleep_ns__scheduled);
-if (sleep_state->user_state_pointer) {
-*sleep_state->user_state_pointer = NULL;
-}
+*sleep_state->user_state_pointer = NULL;
 timer_del(&sleep_state->ts);
 aio_co_wake(sleep_state->co);
 }
@@ -63,16 +61,10 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
 }
 
 aio_timer_init(ctx, &state.ts, type, SCALE_NS, co_sleep_cb, &state);
-if (sleep_state) {
-*sleep_state = &state;
-}
+*sleep_state = &state;
 timer_mod(&state.ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
-if (sleep_state) {
-/*
- * Note that *sleep_state is cleared during qemu_co_sleep_wake
- * before resuming this coroutine.
- */
-assert(*sleep_state == NULL);
-}
+
+/* qemu_co_sleep_wake clears *sleep_state before resuming this coroutine.  
*/
+assert(*sleep_state == NULL);
 }
-- 
2.31.1





[PATCH v2 1/6] coroutine-sleep: use a stack-allocated timer

2021-05-17 Thread Paolo Bonzini
The lifetime of the timer is well-known (it cannot outlive
qemu_co_sleep_ns_wakeable, because it's deleted by the time the
coroutine resumes), so it is not necessary to place it on the heap.

Reviewed-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Paolo Bonzini 
---
 util/qemu-coroutine-sleep.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/util/qemu-coroutine-sleep.c b/util/qemu-coroutine-sleep.c
index 8c4dac4fd7..eec6e81f3f 100644
--- a/util/qemu-coroutine-sleep.c
+++ b/util/qemu-coroutine-sleep.c
@@ -21,7 +21,7 @@ static const char *qemu_co_sleep_ns__scheduled = 
"qemu_co_sleep_ns";
 
 struct QemuCoSleepState {
 Coroutine *co;
-QEMUTimer *ts;
+QEMUTimer ts;
 QemuCoSleepState **user_state_pointer;
 };
 
@@ -35,7 +35,7 @@ void qemu_co_sleep_wake(QemuCoSleepState *sleep_state)
 if (sleep_state->user_state_pointer) {
 *sleep_state->user_state_pointer = NULL;
 }
-timer_del(sleep_state->ts);
+timer_del(&sleep_state->ts);
 aio_co_wake(sleep_state->co);
 }
 
@@ -50,7 +50,6 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
 AioContext *ctx = qemu_get_current_aio_context();
 QemuCoSleepState state = {
 .co = qemu_coroutine_self(),
-.ts = aio_timer_new(ctx, type, SCALE_NS, co_sleep_cb, &state),
 .user_state_pointer = sleep_state,
 };
 
@@ -63,10 +62,11 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
 abort();
 }
 
+aio_timer_init(ctx, &state.ts, type, SCALE_NS, co_sleep_cb, &state);
 if (sleep_state) {
 *sleep_state = &state;
 }
-timer_mod(state.ts, qemu_clock_get_ns(type) + ns);
+timer_mod(&state.ts, qemu_clock_get_ns(type) + ns);
 qemu_coroutine_yield();
 if (sleep_state) {
 /*
@@ -75,5 +75,4 @@ void coroutine_fn qemu_co_sleep_ns_wakeable(QEMUClockType 
type, int64_t ns,
  */
 assert(*sleep_state == NULL);
 }
-timer_free(state.ts);
 }
-- 
2.31.1





[PATCH v2 0/6] coroutine: new sleep/wake API

2021-05-17 Thread Paolo Bonzini
This is a revamp of the qemu_co_sleep* API that makes it easier to
extend the API: the state that is needed to wake up a coroutine is now
part of the public API instead of hidden behind a pointer-to-pointer;
the API is made more extensible by pushing the rest of QemuCoSleepState
into local variables.

In the future, this will be extended to introduce a prepare/sleep/cancel
API similar to Linux's prepare_to_wait/schedule/finish_wait functions.
For now, this is just a nice refactoring.

Paolo

v1->v2: comment and commit message updates in patches 3, 5 and 6

Paolo Bonzini (6):
  coroutine-sleep: use a stack-allocated timer
  coroutine-sleep: disallow NULL QemuCoSleepState** argument
  coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing
  coroutine-sleep: move timer out of QemuCoSleepState
  coroutine-sleep: replace QemuCoSleepState pointer with struct in the
API
  coroutine-sleep: introduce qemu_co_sleep

 block/block-copy.c  | 10 ++---
 block/nbd.c | 14 +++
 include/qemu/coroutine.h| 27 -
 util/qemu-coroutine-sleep.c | 75 +++--
 4 files changed, 64 insertions(+), 62 deletions(-)

-- 
2.31.1
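
The end state of the series can be modelled without real coroutines. In this toy sketch the sleep state is just a pointer to the coroutine to wake, and waking clears it before resuming, so a timer callback and an explicit wake resume the coroutine only once between them. ToyCoroutine and the toy_* functions are hypothetical stand-ins; the real code resumes with aio_co_wake() and also checks the "scheduled" marker, both of which are left out here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyCoroutine {
    int wakeups;                 /* how many times it was resumed */
} ToyCoroutine;

/* Mirrors the shape of QemuCoSleep from patch 5: only who to wake. */
typedef struct ToyCoSleep {
    ToyCoroutine *to_wake;
} ToyCoSleep;

/* Going to sleep publishes the coroutine in the caller-owned struct. */
static void toy_co_sleep_begin(ToyCoSleep *w, ToyCoroutine *co)
{
    assert(w->to_wake == NULL);
    w->to_wake = co;
}

/* Wake at most once: clear to_wake first; no-op if nobody is sleeping. */
static void toy_co_sleep_wake(ToyCoSleep *w)
{
    ToyCoroutine *co = w->to_wake;

    w->to_wake = NULL;
    if (co) {
        co->wakeups++;           /* stands in for resuming the coroutine */
    }
}
```

Because the struct is owned by the caller and is never an opaque pointer that can dangle, callers such as block_copy_kick() can call the wake function unconditionally.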




[PULL 14/20] hw/block/nvme: cache lba and ms sizes

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

There is no need to look up the lba size and metadata size in the LBA
Format structure every time we want to use them. And we use them a lot.

Cache the values in the NvmeNamespace and update them if the namespace
is formatted.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h | 37 ---
 hw/block/nvme-dif.c | 45 ++-
 hw/block/nvme-ns.c  | 26 +
 hw/block/nvme.c | 47 ++---
 4 files changed, 56 insertions(+), 99 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index d9bee7e5a05c..dc065e57b509 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -109,6 +109,8 @@ typedef struct NvmeNamespace {
 int64_t  size;
 int64_t  mdata_offset;
 NvmeIdNs id_ns;
+NvmeLBAF lbaf;
+size_t   lbasz;
 const uint32_t *iocs;
 uint8_t  csi;
 uint16_t status;
@@ -146,36 +148,14 @@ static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 return 0;
 }
 
-static inline NvmeLBAF *nvme_ns_lbaf(NvmeNamespace *ns)
-{
-NvmeIdNs *id_ns = &ns->id_ns;
-return &id_ns->lbaf[NVME_ID_NS_FLBAS_INDEX(id_ns->flbas)];
-}
-
-static inline uint8_t nvme_ns_lbads(NvmeNamespace *ns)
-{
-return nvme_ns_lbaf(ns)->ds;
-}
-
-/* convert an LBA to the equivalent in bytes */
 static inline size_t nvme_l2b(NvmeNamespace *ns, uint64_t lba)
 {
-return lba << nvme_ns_lbads(ns);
-}
-
-static inline size_t nvme_lsize(NvmeNamespace *ns)
-{
-return 1 << nvme_ns_lbads(ns);
-}
-
-static inline uint16_t nvme_msize(NvmeNamespace *ns)
-{
-return nvme_ns_lbaf(ns)->ms;
+return lba << ns->lbaf.ds;
 }
 
 static inline size_t nvme_m2b(NvmeNamespace *ns, uint64_t lba)
 {
-return nvme_msize(ns) * lba;
+return ns->lbaf.ms * lba;
 }
 
 static inline bool nvme_ns_ext(NvmeNamespace *ns)
@@ -183,15 +163,6 @@ static inline bool nvme_ns_ext(NvmeNamespace *ns)
 return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas);
 }
 
-/* calculate the number of LBAs that the namespace can accomodate */
-static inline uint64_t nvme_ns_nlbas(NvmeNamespace *ns)
-{
-if (nvme_msize(ns)) {
-return ns->size / (nvme_lsize(ns) + nvme_msize(ns));
-}
-return ns->size >> nvme_ns_lbads(ns);
-}
-
 static inline NvmeZoneState nvme_get_zone_state(NvmeZone *zone)
 {
 return zone->d.zs >> 4;
diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index e269d275ebed..c72e43195abf 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -44,20 +44,18 @@ void nvme_dif_pract_generate_dif(NvmeNamespace *ns, uint8_t 
*buf, size_t len,
  uint32_t reftag)
 {
 uint8_t *end = buf + len;
-size_t lsize = nvme_lsize(ns);
-size_t msize = nvme_msize(ns);
 int16_t pil = 0;
 
 if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
-pil = nvme_msize(ns) - sizeof(NvmeDifTuple);
+pil = ns->lbaf.ms - sizeof(NvmeDifTuple);
 }
 
-trace_pci_nvme_dif_pract_generate_dif(len, lsize, lsize + pil, apptag,
-  reftag);
+trace_pci_nvme_dif_pract_generate_dif(len, ns->lbasz, ns->lbasz + pil,
+  apptag, reftag);
 
-for (; buf < end; buf += lsize, mbuf += msize) {
+for (; buf < end; buf += ns->lbasz, mbuf += ns->lbaf.ms) {
 NvmeDifTuple *dif = (NvmeDifTuple *)(mbuf + pil);
-uint16_t crc = crc_t10dif(0x0, buf, lsize);
+uint16_t crc = crc_t10dif(0x0, buf, ns->lbasz);
 
 if (pil) {
 crc = crc_t10dif(crc, mbuf, pil);
@@ -98,7 +96,7 @@ static uint16_t nvme_dif_prchk(NvmeNamespace *ns, 
NvmeDifTuple *dif,
 }
 
 if (ctrl & NVME_RW_PRINFO_PRCHK_GUARD) {
-uint16_t crc = crc_t10dif(0x0, buf, nvme_lsize(ns));
+uint16_t crc = crc_t10dif(0x0, buf, ns->lbasz);
 
 if (pil) {
 crc = crc_t10dif(crc, mbuf, pil);
@@ -137,8 +135,6 @@ uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, 
size_t len,
 uint16_t appmask, uint32_t reftag)
 {
 uint8_t *end = buf + len;
-size_t lsize = nvme_lsize(ns);
-size_t msize = nvme_msize(ns);
 int16_t pil = 0;
 uint16_t status;
 
@@ -148,12 +144,12 @@ uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, 
size_t len,
 }
 
 if (!(ns->id_ns.dps & NVME_ID_NS_DPS_FIRST_EIGHT)) {
-pil = nvme_msize(ns) - sizeof(NvmeDifTuple);
+pil = ns->lbaf.ms - sizeof(NvmeDifTuple);
 }
 
-trace_pci_nvme_dif_check(NVME_RW_PRINFO(ctrl), lsize + pil);
+trace_pci_nvme_dif_check(NVME_RW_PRINFO(ctrl), ns->lbasz + pil);
 
-for (; buf < end; buf += lsize, mbuf += msize) {
+for (; buf < end; buf += ns->lbasz, mbuf += ns->lbaf.ms) {
 NvmeDifTuple *dif = (NvmeDifTuple *)(mbuf + pil);
 
 status = nvme_dif_prchk(ns, dif, buf, mbuf, pil, ctrl, apptag,
@@ -176,20 +172,18 @@ uint16_t 

Re: making a qdev bus available from a (non-qtree?) device

2021-05-17 Thread Stefan Hajnoczi
On Mon, May 17, 2021 at 08:55:50AM +0200, Klaus Jensen wrote:
> On May 13 15:02, Stefan Hajnoczi wrote:
> > On Wed, May 12, 2021 at 02:02:50PM +0200, Markus Armbruster wrote:
> > > Klaus Jensen  writes:
> > > > I can then call `qdev_set_parent_bus()` and set the parent bus to the
> > > > bus created in the nvme-subsys device. This solves the problem since
> > > > the namespaces are not "garbage collected" when the nvme device is
> > > > removed, but it just feels wrong you know? Also, if possible, I'd of
> > > > course really like to retain the nice entries in `info qtree`.
> > > 
> > > I'm afraid I'm too ignorant on NVME to give useful advice.
> > > 
> > > Can you give us a brief primer on the aspects of physical NVME devices
> > > you'd like to model in QEMU?  What are "controllers", "namespaces", and
> > > "subsystems", and how do they work together?
> > > 
> > > Once we understand the relevant aspects of physical devices, we can
> > > discuss how to best model them in QEMU.
> > 
> > One specific question about the nature of devices vs subsystems vs
> > namespaces:
> > 
> > Does the device expose all the namespaces from one subsystem, or does it
> > need to be able to filter them (e.g. hide certain namespaces or present
> > a mix of namespaces from multiple subsystems)?
> > 
> 
> Subsystems are fully isolated. There is no interaction possible between
> different subsystems. Within a subsystem, all the "resources" (controllers
> and namespaces) are potentially "shared". That is, there may exist
> many-to-many relationships. A controller may have multiple namespaces
> attached and namespaces may be attached to multiple controllers.
> 
> > The status of the namespace as a DeviceState is a bit questionable since
> > the only possible parent it could have is a device, but multiple devices
> > want to use it. I understand why you're considering whether it should be
> > an --object...
> > 
> 
> When you say parent, I think you mean parent in terms of bus-device
> relationship? In that case, then the parent can actually be the subsystem,
> since if the namespace is not attached to any controllers, then it is just
> an entity/object in the subsystem that the controllers (the actual devices)
> may attach to[1].
> 
> Yes, the more I think about this and understand qdev I realize that it was a
> mistake to define nvme-ns to be a TYPE_DEVICE, since it does not act as a
> piece of virtual hardware. It is just an entity (object). The biggest
> mistake right now seems to be the bus_type use. It just worked wonderfully
> in the absence of subsystem support, but I feel that that choice is coming
> back to haunt me now. If we'd used a 'ctrl' link property we could just add
> a 'subsys' link property now and be happy.
> 
> Is there any way that we can "overload" the implicit "bus=" parameter to
> provide backwards compatibility (while basically changing it to function
> like a "link" parameter)?

I would consider adding new --object types and deprecating the devices
so they can be dropped in a future QEMU release. It may be necessary to
choose new names to avoid collisions with the existing ones.

Backwards compatibility might be tricky. One way might be to extract
most of the code from --device nvme-ns and move it into the new
--object, but leave the device to instantiate an object behind the
scenes? Then the device can still have its bus and translate that
relationship to --object somehow. I'm not sure, it depends on the
details of the code.

Stefan




[PULL 19/20] hw/block/nvme: move zoned constraints checks

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Validation of the max_active and max_open zoned parameters is
independent of any other state, so move it to the early
nvme_ns_check_constraints parameter checks.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.c | 52 +-
 1 file changed, 28 insertions(+), 24 deletions(-)
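
For readers skimming the diff, the moved checks can be sketched as a
standalone function. This is an illustration under assumptions:
ZonedParams and check_zoned_constraints are hypothetical names, and a
plain character buffer stands in for QEMU's Error reporting. The checks
depend only on the parameters themselves, which is why they can run
before any geometry is calculated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the zoned namespace parameters. */
typedef struct {
    bool     zoned;
    uint32_t max_active_zones;
    uint32_t max_open_zones;
    uint32_t zd_extension_size;
} ZonedParams;

/* Returns 0 on success, -1 on error with a message written to err. */
static int check_zoned_constraints(ZonedParams *p, char *err, size_t errlen)
{
    if (!p->zoned) {
        return 0; /* nothing to validate for non-zoned namespaces */
    }

    if (p->max_active_zones) {
        if (p->max_open_zones > p->max_active_zones) {
            snprintf(err, errlen,
                     "max_open_zones (%u) exceeds max_active_zones (%u)",
                     (unsigned)p->max_open_zones,
                     (unsigned)p->max_active_zones);
            return -1;
        }

        /* an unset open limit defaults to the active limit */
        if (!p->max_open_zones) {
            p->max_open_zones = p->max_active_zones;
        }
    }

    if (p->zd_extension_size) {
        if (p->zd_extension_size & 0x3f) {
            snprintf(err, errlen, "zone descriptor extension size must be "
                     "a multiple of 64B");
            return -1;
        }
        /* the size is stored in 64B units and must fit in 8 bits */
        if ((p->zd_extension_size >> 6) > 0xff) {
            snprintf(err, errlen,
                     "zone descriptor extension size is too large");
            return -1;
        }
    }

    return 0;
}
```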

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 008deb5e87d1..992e5a13f538 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -210,30 +210,6 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace 
*ns, Error **errp)
 return -1;
 }
 
-if (ns->params.max_active_zones) {
-if (ns->params.max_open_zones > ns->params.max_active_zones) {
-error_setg(errp, "max_open_zones (%u) exceeds max_active_zones 
(%u)",
-   ns->params.max_open_zones, ns->params.max_active_zones);
-return -1;
-}
-
-if (!ns->params.max_open_zones) {
-ns->params.max_open_zones = ns->params.max_active_zones;
-}
-}
-
-if (ns->params.zd_extension_size) {
-if (ns->params.zd_extension_size & 0x3f) {
-error_setg(errp,
-"zone descriptor extension size must be a multiple of 64B");
-return -1;
-}
-if ((ns->params.zd_extension_size >> 6) > 0xff) {
-error_setg(errp, "zone descriptor extension size is too large");
-return -1;
-}
-}
-
 return 0;
 }
 
@@ -403,6 +379,34 @@ static int nvme_ns_check_constraints(NvmeCtrl *n, 
NvmeNamespace *ns,
 }
 }
 
+if (ns->params.zoned) {
+if (ns->params.max_active_zones) {
+if (ns->params.max_open_zones > ns->params.max_active_zones) {
+error_setg(errp, "max_open_zones (%u) exceeds "
+   "max_active_zones (%u)", ns->params.max_open_zones,
+   ns->params.max_active_zones);
+return -1;
+}
+
+if (!ns->params.max_open_zones) {
+ns->params.max_open_zones = ns->params.max_active_zones;
+}
+}
+
+if (ns->params.zd_extension_size) {
+if (ns->params.zd_extension_size & 0x3f) {
+error_setg(errp, "zone descriptor extension size must be a "
+   "multiple of 64B");
+return -1;
+}
+if ((ns->params.zd_extension_size >> 6) > 0xff) {
+error_setg(errp,
+   "zone descriptor extension size is too large");
+return -1;
+}
+}
+}
+
 return 0;
 }
 
-- 
2.31.1




[PULL 18/20] hw/block/nvme: remove irrelevant zone resource checks

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

It is not an error to report more active/open zones supported than the
number of zones in the namespace.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-ns.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index b25838ac4fd4..008deb5e87d1 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -210,19 +210,6 @@ static int nvme_ns_zoned_check_calc_geometry(NvmeNamespace 
*ns, Error **errp)
 return -1;
 }
 
-if (ns->params.max_open_zones > ns->num_zones) {
-error_setg(errp,
-   "max_open_zones value %u exceeds the number of zones %u",
-   ns->params.max_open_zones, ns->num_zones);
-return -1;
-}
-if (ns->params.max_active_zones > ns->num_zones) {
-error_setg(errp,
-   "max_active_zones value %u exceeds the number of zones %u",
-   ns->params.max_active_zones, ns->num_zones);
-return -1;
-}
-
 if (ns->params.max_active_zones) {
 if (ns->params.max_open_zones > ns->params.max_active_zones) {
 error_setg(errp, "max_open_zones (%u) exceeds max_active_zones 
(%u)",
-- 
2.31.1




[PULL 16/20] hw/block/nvme: streamline namespace array indexing

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Streamline namespace array indexing such that both the subsystem and
controller namespaces arrays are 1-indexed.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h | 4 ++--
 hw/block/nvme.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)
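
A small sketch of the 1-indexed layout, assuming simplified and
hypothetical types (Ctrl, Namespace, ctrl_ns, ctrl_attach_ns,
ctrl_detach_ns are illustrative names only). The array gets one extra
slot so that a namespace ID can be used as the index directly; slot 0 is
never used because 0 is not a valid namespace ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NAMESPACES 256

/* Hypothetical, simplified namespace. */
typedef struct {
    uint32_t nsid;
} Namespace;

/* Hypothetical, simplified controller. */
typedef struct {
    /* slot 0 is unused so that a namespace ID indexes the array directly */
    Namespace *namespaces[MAX_NAMESPACES + 1];
} Ctrl;

/* Look up a namespace by ID; returns NULL if out of range or not attached. */
static inline Namespace *ctrl_ns(Ctrl *n, uint32_t nsid)
{
    if (!nsid || nsid > MAX_NAMESPACES) {
        return NULL;
    }
    return n->namespaces[nsid];
}

static inline void ctrl_attach_ns(Ctrl *n, Namespace *ns)
{
    assert(ns->nsid && ns->nsid <= MAX_NAMESPACES);
    n->namespaces[ns->nsid] = ns;
}

static inline void ctrl_detach_ns(Ctrl *n, uint32_t nsid)
{
    assert(nsid && nsid <= MAX_NAMESPACES);
    n->namespaces[nsid] = NULL;
}
```

The cost is one wasted pointer per array; the benefit is that no call
site needs to remember the "- 1" adjustment.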

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 9349d1c33ad7..ac3f0a886735 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -438,7 +438,7 @@ typedef struct NvmeCtrl {
 NvmeSubsystem   *subsys;
 
 NvmeNamespace   namespace;
-NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES];
+NvmeNamespace   *namespaces[NVME_MAX_NAMESPACES + 1];
 NvmeSQueue  **sq;
 NvmeCQueue  **cq;
 NvmeSQueue  admin_sq;
@@ -460,7 +460,7 @@ static inline NvmeNamespace *nvme_ns(NvmeCtrl *n, uint32_t 
nsid)
 return NULL;
 }
 
-return n->namespaces[nsid - 1];
+return n->namespaces[nsid];
 }
 
 static inline NvmeCQueue *nvme_cq(NvmeRequest *req)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 1db9a603f5c4..baf7b6714544 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4990,7 +4990,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 return NVME_NS_NOT_ATTACHED | NVME_DNR;
 }
 
-ctrl->namespaces[nsid - 1] = NULL;
+ctrl->namespaces[nsid] = NULL;
 ns->attached--;
 
 nvme_update_dmrsl(ctrl);
@@ -6163,7 +6163,7 @@ void nvme_attach_ns(NvmeCtrl *n, NvmeNamespace *ns)
 uint32_t nsid = ns->params.nsid;
 assert(nsid && nsid <= NVME_MAX_NAMESPACES);
 
-n->namespaces[nsid - 1] = ns;
+n->namespaces[nsid] = ns;
 ns->attached++;
 
 n->dmrsl = MIN_NON_ZERO(n->dmrsl,
-- 
2.31.1




[PULL 07/20] hw/block/nvme: rename __nvme_zrm_open

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Get rid of the (reserved) double underscore use. Rename the "generic"
zone open function to nvme_zrm_open_flags() and add a generic `int
flags` argument instead which allows more flags to be easily added in
the future. There is at least one TP under standardization that would
add an additional flag.

Cc: Philippe Mathieu-Daudé 
Cc: Thomas Huth 
Signed-off-by: Klaus Jensen 
Reviewed-by: Thomas Huth 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)
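
The shape of the change, replacing a bool parameter with a flags word,
can be sketched in isolation. This is a simplified illustration, not the
QEMU code: the names (ZRM_AUTO, Zone, zrm_open_flags, and so on) are
hypothetical, and the active/open resource accounting done by the real
function is left out entirely.

```c
/* Flags for zrm_open_flags(); more bits can be added without changing
 * the function signature. */
enum {
    ZRM_AUTO = 1 << 0, /* open implicitly, e.g. as a side effect of a write */
};

enum {
    ZRM_OK = 0,
    ZRM_INVALID_TRANSITION = 1,
};

typedef enum {
    ZONE_EMPTY,
    ZONE_CLOSED,
    ZONE_IMPLICITLY_OPEN,
    ZONE_EXPLICITLY_OPEN,
    ZONE_FULL,
} ZoneState;

typedef struct {
    ZoneState state;
} Zone;

static int zrm_open_flags(Zone *zone, int flags)
{
    switch (zone->state) {
    case ZONE_EMPTY:
    case ZONE_CLOSED:
        if (flags & ZRM_AUTO) {
            zone->state = ZONE_IMPLICITLY_OPEN;
            return ZRM_OK;
        }

        /* fallthrough */

    case ZONE_IMPLICITLY_OPEN:
        if (flags & ZRM_AUTO) {
            return ZRM_OK;
        }

        zone->state = ZONE_EXPLICITLY_OPEN;

        /* fallthrough */

    case ZONE_EXPLICITLY_OPEN:
        return ZRM_OK;

    default:
        return ZRM_INVALID_TRANSITION;
    }
}

/* Thin wrappers keep the call sites readable. */
static inline int zrm_auto(Zone *zone)
{
    return zrm_open_flags(zone, ZRM_AUTO);
}

static inline int zrm_open(Zone *zone)
{
    return zrm_open_flags(zone, 0);
}
```

Compared to a bool, a call such as zrm_open_flags(zone, ZRM_AUTO) is
self-describing, and a second flag can be OR-ed in later without
touching existing callers.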

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index baba949660f2..9e5ab4cacb06 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1694,8 +1694,12 @@ static void nvme_zrm_auto_transition_zone(NvmeNamespace 
*ns)
 }
 }
 
-static uint16_t __nvme_zrm_open(NvmeNamespace *ns, NvmeZone *zone,
-bool implicit)
+enum {
+NVME_ZRM_AUTO = 1 << 0,
+};
+
+static uint16_t nvme_zrm_open_flags(NvmeNamespace *ns, NvmeZone *zone,
+int flags)
 {
 int act = 0;
 uint16_t status;
@@ -1719,7 +1723,7 @@ static uint16_t __nvme_zrm_open(NvmeNamespace *ns, 
NvmeZone *zone,
 
 nvme_aor_inc_open(ns);
 
-if (implicit) {
+if (flags & NVME_ZRM_AUTO) {
 nvme_assign_zone_state(ns, zone, NVME_ZONE_STATE_IMPLICITLY_OPEN);
 return NVME_SUCCESS;
 }
@@ -1727,7 +1731,7 @@ static uint16_t __nvme_zrm_open(NvmeNamespace *ns, 
NvmeZone *zone,
 /* fallthrough */
 
 case NVME_ZONE_STATE_IMPLICITLY_OPEN:
-if (implicit) {
+if (flags & NVME_ZRM_AUTO) {
 return NVME_SUCCESS;
 }
 
@@ -1745,12 +1749,12 @@ static uint16_t __nvme_zrm_open(NvmeNamespace *ns, 
NvmeZone *zone,
 
 static inline uint16_t nvme_zrm_auto(NvmeNamespace *ns, NvmeZone *zone)
 {
-return __nvme_zrm_open(ns, zone, true);
+return nvme_zrm_open_flags(ns, zone, NVME_ZRM_AUTO);
 }
 
 static inline uint16_t nvme_zrm_open(NvmeNamespace *ns, NvmeZone *zone)
 {
-return __nvme_zrm_open(ns, zone, false);
+return nvme_zrm_open_flags(ns, zone, 0);
 }
 
 static void __nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
-- 
2.31.1




[PULL 05/20] hw/block/nvme: function formatting fix

2021-05-17 Thread Klaus Jensen
From: Gollu Appalanaidu 

The nvme_map_addr_pmr function arguments are not aligned; fix that.

Signed-off-by: Gollu Appalanaidu 
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 14c24f9b0866..79a087a41ce8 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -576,7 +576,7 @@ static uint16_t nvme_map_addr_cmb(NvmeCtrl *n, QEMUIOVector 
*iov, hwaddr addr,
 }
 
 static uint16_t nvme_map_addr_pmr(NvmeCtrl *n, QEMUIOVector *iov, hwaddr addr,
-size_t len)
+  size_t len)
 {
 if (!len) {
 return NVME_SUCCESS;
-- 
2.31.1




[PULL 17/20] hw/block/nvme: remove num_namespaces member

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

The NvmeCtrl num_namespaces member is just an indirection for the
NVME_MAX_NAMESPACES constant.

Remove the indirection.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h |  1 -
 hw/block/nvme.c | 30 +++---
 2 files changed, 15 insertions(+), 16 deletions(-)
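
As a rough illustration of what remains once the member is gone, the
validity check and the iteration pattern both depend only on a
compile-time constant. The names here (MAX_NAMESPACES, NSID_BROADCAST,
nsid_valid, count_attached) are simplified stand-ins chosen for this
sketch; 0xffffffff is the broadcast namespace ID defined by the NVMe
specification.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_NAMESPACES 256
#define NSID_BROADCAST 0xffffffffu

/* A namespace ID is valid if it is non-zero and either the broadcast
 * value or within the supported range. */
static inline bool nsid_valid(uint32_t nsid)
{
    return nsid &&
        (nsid == NSID_BROADCAST || nsid <= MAX_NAMESPACES);
}

/* Walk a 1-indexed slot array (MAX_NAMESPACES + 1 entries) and count
 * the occupied slots, skipping empty ones. */
static unsigned count_attached(const void *const slots[])
{
    unsigned count = 0;

    for (uint32_t i = 1; i <= MAX_NAMESPACES; i++) {
        if (!slots[i]) {
            continue;
        }
        count++;
    }

    return count;
}
```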

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index ac3f0a886735..fb028d81d16f 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -401,7 +401,6 @@ typedef struct NvmeCtrl {
 uint16_tcqe_size;
 uint16_tsqe_size;
 uint32_treg_size;
-uint32_tnum_namespaces;
 uint32_tmax_q_ents;
 uint8_t outstanding_aers;
 uint32_tirq_status;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index baf7b6714544..0bcaf7192f99 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -393,7 +393,8 @@ static int nvme_addr_write(NvmeCtrl *n, hwaddr addr, void 
*buf, int size)
 
 static bool nvme_nsid_valid(NvmeCtrl *n, uint32_t nsid)
 {
-return nsid && (nsid == NVME_NSID_BROADCAST || nsid <= n->num_namespaces);
+return nsid &&
+(nsid == NVME_NSID_BROADCAST || nsid <= NVME_MAX_NAMESPACES);
 }
 
 static int nvme_check_sqid(NvmeCtrl *n, uint16_t sqid)
@@ -2882,7 +2883,7 @@ static uint16_t nvme_flush(NvmeCtrl *n, NvmeRequest *req)
 /* 1-initialize; see comment in nvme_dsm */
 *num_flushes = 1;
 
-for (int i = 1; i <= n->num_namespaces; i++) {
+for (int i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -3850,7 +3851,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, 
uint32_t buf_len,
 } else {
 int i;
 
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -4347,7 +4348,7 @@ static uint16_t nvme_identify_nslist(NvmeCtrl *n, 
NvmeRequest *req,
 return NVME_INVALID_NSID | NVME_DNR;
 }
 
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 if (!active) {
@@ -4395,7 +4396,7 @@ static uint16_t nvme_identify_nslist_csi(NvmeCtrl *n, 
NvmeRequest *req,
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 if (!active) {
@@ -4661,7 +4662,7 @@ static uint16_t nvme_get_feature(NvmeCtrl *n, NvmeRequest 
*req)
 goto out;
 case NVME_VOLATILE_WRITE_CACHE:
 result = 0;
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -4808,7 +4809,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest 
*req)
 break;
 case NVME_ERROR_RECOVERY:
 if (nsid == NVME_NSID_BROADCAST) {
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 
 if (!ns) {
@@ -4829,7 +4830,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest 
*req)
 }
 break;
 case NVME_VOLATILE_WRITE_CACHE:
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -5122,7 +5123,7 @@ static uint16_t nvme_format(NvmeCtrl *n, NvmeRequest *req)
 req->status = status;
 }
 } else {
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -5233,7 +5234,7 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
 NvmeNamespace *ns;
 int i;
 
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -5275,7 +5276,7 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
 memory_region_msync(&n->pmr.dev->mr, 0, n->pmr.dev->size);
 }
 
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -5290,7 +5291,7 @@ static void nvme_select_iocs(NvmeCtrl *n)
 NvmeNamespace *ns;
 int i;
 
-for (i = 1; i <= n->num_namespaces; i++) {
+for (i = 1; i <= NVME_MAX_NAMESPACES; i++) {
 ns = nvme_ns(n, i);
 if (!ns) {
 continue;
@@ -5917,7 +5918,6 @@ static void nvme_check_constraints(NvmeCtrl *n, Error 
**errp)
 
 static void nvme_init_state(NvmeCtrl *n)
 {
-n->num_namespaces = NVME_MAX_NAMESPACES;
 /* add one to max_ioqpairs to account for 

[PULL 20/20] hw/nvme: move nvme emulation out of hw/block

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

With the introduction of the nvme-subsystem device we are really
cluttering up the hw/block directory.

As suggested by Philippe previously, move the nvme emulation to hw/nvme.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 meson.build   |   1 +
 hw/{block => nvme}/nvme.h |   6 +-
 hw/nvme/trace.h   |   1 +
 hw/{block/nvme.c => nvme/ctrl.c}  |   0
 hw/{block/nvme-dif.c => nvme/dif.c}   |   0
 hw/{block/nvme-ns.c => nvme/ns.c} |   0
 hw/{block/nvme-subsys.c => nvme/subsys.c} |   0
 MAINTAINERS   |   2 +-
 hw/Kconfig|   1 +
 hw/block/Kconfig  |   5 -
 hw/block/meson.build  |   1 -
 hw/block/trace-events | 206 --
 hw/meson.build|   1 +
 hw/nvme/Kconfig   |   4 +
 hw/nvme/meson.build   |   1 +
 hw/nvme/trace-events  | 204 +
 16 files changed, 217 insertions(+), 216 deletions(-)
 rename hw/{block => nvme}/nvme.h (99%)
 create mode 100644 hw/nvme/trace.h
 rename hw/{block/nvme.c => nvme/ctrl.c} (100%)
 rename hw/{block/nvme-dif.c => nvme/dif.c} (100%)
 rename hw/{block/nvme-ns.c => nvme/ns.c} (100%)
 rename hw/{block/nvme-subsys.c => nvme/subsys.c} (100%)
 create mode 100644 hw/nvme/Kconfig
 create mode 100644 hw/nvme/meson.build
 create mode 100644 hw/nvme/trace-events

diff --git a/meson.build b/meson.build
index 8e16e05c2ade..1559e8d873a7 100644
--- a/meson.build
+++ b/meson.build
@@ -1822,6 +1822,7 @@ if have_system
 'hw/misc/macio',
 'hw/net',
 'hw/net/can',
+'hw/nvme',
 'hw/nvram',
 'hw/pci',
 'hw/pci-host',
diff --git a/hw/block/nvme.h b/hw/nvme/nvme.h
similarity index 99%
rename from hw/block/nvme.h
rename to hw/nvme/nvme.h
index fb028d81d16f..81a35cda142b 100644
--- a/hw/block/nvme.h
+++ b/hw/nvme/nvme.h
@@ -15,8 +15,8 @@
  * This code is licensed under the GNU GPL v2 or later.
  */
 
-#ifndef HW_NVME_H
-#define HW_NVME_H
+#ifndef HW_NVME_INTERNAL_H
+#define HW_NVME_INTERNAL_H
 
 #include "qemu/uuid.h"
 #include "hw/pci/pci.h"
@@ -544,4 +544,4 @@ uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, 
size_t len,
 uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req);
 
 
-#endif /* HW_NVME_H */
+#endif /* HW_NVME_INTERNAL_H */
diff --git a/hw/nvme/trace.h b/hw/nvme/trace.h
new file mode 100644
index ..b398ea107f59
--- /dev/null
+++ b/hw/nvme/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_nvme.h"
diff --git a/hw/block/nvme.c b/hw/nvme/ctrl.c
similarity index 100%
rename from hw/block/nvme.c
rename to hw/nvme/ctrl.c
diff --git a/hw/block/nvme-dif.c b/hw/nvme/dif.c
similarity index 100%
rename from hw/block/nvme-dif.c
rename to hw/nvme/dif.c
diff --git a/hw/block/nvme-ns.c b/hw/nvme/ns.c
similarity index 100%
rename from hw/block/nvme-ns.c
rename to hw/nvme/ns.c
diff --git a/hw/block/nvme-subsys.c b/hw/nvme/subsys.c
similarity index 100%
rename from hw/block/nvme-subsys.c
rename to hw/nvme/subsys.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 78561a223f92..e3c2866393e2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1956,7 +1956,7 @@ M: Keith Busch 
 M: Klaus Jensen 
 L: qemu-block@nongnu.org
 S: Supported
-F: hw/block/nvme*
+F: hw/nvme/*
 F: include/block/nvme.h
 F: tests/qtest/nvme-test.c
 F: docs/system/nvme.rst
diff --git a/hw/Kconfig b/hw/Kconfig
index aa10357adff2..805860f56451 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -21,6 +21,7 @@ source mem/Kconfig
 source misc/Kconfig
 source net/Kconfig
 source nubus/Kconfig
+source nvme/Kconfig
 source nvram/Kconfig
 source pci-bridge/Kconfig
 source pci-host/Kconfig
diff --git a/hw/block/Kconfig b/hw/block/Kconfig
index 4fcd15216684..295441e64ab4 100644
--- a/hw/block/Kconfig
+++ b/hw/block/Kconfig
@@ -25,11 +25,6 @@ config ONENAND
 config TC58128
 bool
 
-config NVME_PCI
-bool
-default y if PCI_DEVICES
-depends on PCI
-
 config VIRTIO_BLK
 bool
 default y
diff --git a/hw/block/meson.build b/hw/block/meson.build
index 5b4a7699f98f..8b0de54db1fc 100644
--- a/hw/block/meson.build
+++ b/hw/block/meson.build
@@ -13,7 +13,6 @@ softmmu_ss.add(when: 'CONFIG_SSI_M25P80', if_true: 
files('m25p80.c'))
 softmmu_ss.add(when: 'CONFIG_SWIM', if_true: files('swim.c'))
 softmmu_ss.add(when: 'CONFIG_XEN', if_true: files('xen-block.c'))
 softmmu_ss.add(when: 'CONFIG_TC58128', if_true: files('tc58128.c'))
-softmmu_ss.add(when: 'CONFIG_NVME_PCI', if_true: files('nvme.c', 'nvme-ns.c', 
'nvme-subsys.c', 'nvme-dif.c'))
 
 specific_ss.add(when: 'CONFIG_VIRTIO_BLK', if_true: files('virtio-blk.c'))
 specific_ss.add(when: 'CONFIG_VHOST_USER_BLK', if_true: 
files('vhost-user-blk.c'))
diff --git a/hw/block/trace-events b/hw/block/trace-events
index fa12e3a67a75..646917d045f7 100644
--- a/hw/block/trace-events

[PULL 03/20] hw/block/nvme: consider metadata read aio return value in compare

2021-05-17 Thread Klaus Jensen
From: Gollu Appalanaidu 

Currently, the return value of the metadata aio read (blk_aio_preadv) in
the compare command is ignored. Check it and complete the block accounting.

Signed-off-by: Gollu Appalanaidu 
Fixes: 0a384f923f51 ("hw/block/nvme: add compare command")
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 11 +++
 1 file changed, 11 insertions(+)
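
The pattern being fixed can be sketched outside of QEMU. Everything
below is hypothetical (AcctStats, Request, compare_mdata_cb and the
status values are illustrative, not the block layer API): an AIO
completion callback has to look at ret before touching the data, and
every path has to close out the accounting as either failed or done.

```c
#include <stdint.h>

/* Hypothetical stand-in for block accounting statistics. */
typedef struct {
    unsigned done;
    unsigned failed;
} AcctStats;

enum {
    STATUS_SUCCESS = 0,
    STATUS_IO_ERROR = 1, /* illustrative value, not an NVMe status code */
};

/* Hypothetical stand-in for an in-flight request. */
typedef struct {
    AcctStats *stats;
    uint16_t   status;
} Request;

/* Completion callback for the metadata read; ret is 0 on success and a
 * negative errno-style value on failure. */
static void compare_mdata_cb(void *opaque, int ret)
{
    Request *req = opaque;

    if (ret) {
        /* the read failed: account for it and skip the comparison */
        req->stats->failed++;
        req->status = STATUS_IO_ERROR;
        goto out;
    }

    /* ... compare the metadata that was read against the host buffer ... */

    req->stats->done++;

out:
    /* release buffers and complete the request here */
    return;
}
```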

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index cd594280a7f9..67abc9eb2c24 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2369,10 +2369,19 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 uint32_t reftag = le32_to_cpu(rw->reftag);
 struct nvme_compare_ctx *ctx = req->opaque;
 g_autofree uint8_t *buf = NULL;
+BlockBackend *blk = ns->blkconf.blk;
+BlockAcctCookie *acct = &req->acct;
+BlockAcctStats *stats = blk_get_stats(blk);
 uint16_t status = NVME_SUCCESS;
 
 trace_pci_nvme_compare_mdata_cb(nvme_cid(req));
 
+if (ret) {
+block_acct_failed(stats, acct);
+nvme_aio_err(req, ret);
+goto out;
+}
+
 buf = g_malloc(ctx->mdata.iov.size);
 
 status = nvme_bounce_mdata(n, buf, ctx->mdata.iov.size,
@@ -2421,6 +2430,8 @@ static void nvme_compare_mdata_cb(void *opaque, int ret)
 goto out;
 }
 
+block_acct_done(stats, acct);
+
 out:
 qemu_iovec_destroy(&ctx->data.iov);
 g_free(ctx->data.bounce);
-- 
2.31.1




[PULL 12/20] hw/block/nvme: remove non-shared defines from header file

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Remove non-shared defines from the shared header.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h| 2 --
 hw/block/nvme-ns.c | 1 +
 hw/block/nvme.c| 1 +
 3 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index d9374d3e33e0..2c4e7b90fa54 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -24,8 +24,6 @@
 
 #include "block/nvme.h"
 
-#define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
-#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 #define NVME_MAX_CONTROLLERS 32
 #define NVME_MAX_NAMESPACES  256
 
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index d91bf7bb..93aaf6de02af 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -23,6 +23,7 @@
 #include "trace.h"
 
 #define MIN_DISCARD_GRANULARITY (4 * KiB)
+#define NVME_DEFAULT_ZONE_SIZE   (128 * MiB)
 
 void nvme_ns_init_format(NvmeNamespace *ns)
 {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e152c61adb76..e7bd22b8b2ed 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -168,6 +168,7 @@
 #define NVME_TEMPERATURE_WARNING 0x157
 #define NVME_TEMPERATURE_CRITICAL 0x175
 #define NVME_NUM_FW_SLOTS 1
+#define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
 
 #define NVME_GUEST_ERR(trace, fmt, ...) \
 do { \
-- 
2.31.1




[PULL 13/20] hw/block/nvme: replace nvme_ns_status

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

The inline nvme_ns_status() helper only has a single call site. Remove
it from the header file and inline it for real.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h |  5 -
 hw/block/nvme.c | 15 ---
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 2c4e7b90fa54..d9bee7e5a05c 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -137,11 +137,6 @@ typedef struct NvmeNamespace {
 } features;
 } NvmeNamespace;
 
-static inline uint16_t nvme_ns_status(NvmeNamespace *ns)
-{
-return ns->status;
-}
-
 static inline uint32_t nvme_nsid(NvmeNamespace *ns)
 {
 if (ns) {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index e7bd22b8b2ed..710af6a7147c 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -3609,8 +3609,8 @@ static uint16_t nvme_zone_mgmt_recv(NvmeCtrl *n, 
NvmeRequest *req)
 
 static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest *req)
 {
+NvmeNamespace *ns;
 uint32_t nsid = le32_to_cpu(req->cmd.nsid);
-uint16_t status;
 
 trace_pci_nvme_io_cmd(nvme_cid(req), nsid, nvme_sqid(req),
   req->cmd.opcode, nvme_io_opc_str(req->cmd.opcode));
@@ -3642,21 +3642,22 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return nvme_flush(n, req);
 }
 
-req->ns = nvme_ns(n, nsid);
-if (unlikely(!req->ns)) {
+ns = nvme_ns(n, nsid);
+if (unlikely(!ns)) {
 return NVME_INVALID_FIELD | NVME_DNR;
 }
 
-if (!(req->ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
+if (!(ns->iocs[req->cmd.opcode] & NVME_CMD_EFF_CSUPP)) {
 trace_pci_nvme_err_invalid_opc(req->cmd.opcode);
 return NVME_INVALID_OPCODE | NVME_DNR;
 }
 
-status = nvme_ns_status(req->ns);
-if (unlikely(status)) {
-return status;
+if (ns->status) {
+return ns->status;
 }
 
+req->ns = ns;
+
 switch (req->cmd.opcode) {
 case NVME_CMD_WRITE_ZEROES:
 return nvme_write_zeroes(n, req);
-- 
2.31.1




[PULL 10/20] hw/block/nvme: consolidate header files

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

In preparation for moving the nvme device into its own subtree, merge
the header files into one.

Also add missing copyright notice and add list of authors with
substantial contributions.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-dif.h|  63 ---
 hw/block/nvme-ns.h | 229 
 hw/block/nvme-subsys.h |  59 ---
 hw/block/nvme.h| 383 +
 hw/block/nvme-dif.c|   1 -
 hw/block/nvme-ns.c |   1 -
 hw/block/nvme-subsys.c |   1 -
 hw/block/nvme.c|   2 -
 8 files changed, 348 insertions(+), 391 deletions(-)
 delete mode 100644 hw/block/nvme-dif.h
 delete mode 100644 hw/block/nvme-ns.h
 delete mode 100644 hw/block/nvme-subsys.h

diff --git a/hw/block/nvme-dif.h b/hw/block/nvme-dif.h
deleted file mode 100644
index 524faffbd7a0..
--- a/hw/block/nvme-dif.h
+++ /dev/null
@@ -1,63 +0,0 @@
-/*
- * QEMU NVM Express End-to-End Data Protection support
- *
- * Copyright (c) 2021 Samsung Electronics Co., Ltd.
- *
- * Authors:
- *   Klaus Jensen   
- *   Gollu Appalanaidu  
- */
-
-#ifndef HW_NVME_DIF_H
-#define HW_NVME_DIF_H
-
-/* from Linux kernel (crypto/crct10dif_common.c) */
-static const uint16_t t10_dif_crc_table[256] = {
-0x0000, 0x8BB7, 0x9CD9, 0x176E, 0xB205, 0x39B2, 0x2EDC, 0xA56B,
-0xEFBD, 0x640A, 0x7364, 0xF8D3, 0x5DB8, 0xD60F, 0xC161, 0x4AD6,
-0x54CD, 0xDF7A, 0xC814, 0x43A3, 0xE6C8, 0x6D7F, 0x7A11, 0xF1A6,
-0xBB70, 0x30C7, 0x27A9, 0xAC1E, 0x0975, 0x82C2, 0x95AC, 0x1E1B,
-0xA99A, 0x222D, 0x3543, 0xBEF4, 0x1B9F, 0x9028, 0x8746, 0x0CF1,
-0x4627, 0xCD90, 0xDAFE, 0x5149, 0xF422, 0x7F95, 0x68FB, 0xE34C,
-0xFD57, 0x76E0, 0x618E, 0xEA39, 0x4F52, 0xC4E5, 0xD38B, 0x583C,
-0x12EA, 0x995D, 0x8E33, 0x0584, 0xA0EF, 0x2B58, 0x3C36, 0xB781,
-0xD883, 0x5334, 0x445A, 0xCFED, 0x6A86, 0xE131, 0xF65F, 0x7DE8,
-0x373E, 0xBC89, 0xABE7, 0x2050, 0x853B, 0x0E8C, 0x19E2, 0x9255,
-0x8C4E, 0x07F9, 0x1097, 0x9B20, 0x3E4B, 0xB5FC, 0xA292, 0x2925,
-0x63F3, 0xE844, 0xFF2A, 0x749D, 0xD1F6, 0x5A41, 0x4D2F, 0xC698,
-0x7119, 0xFAAE, 0xEDC0, 0x6677, 0xC31C, 0x48AB, 0x5FC5, 0xD472,
-0x9EA4, 0x1513, 0x027D, 0x89CA, 0x2CA1, 0xA716, 0xB078, 0x3BCF,
-0x25D4, 0xAE63, 0xB90D, 0x32BA, 0x97D1, 0x1C66, 0x0B08, 0x80BF,
-0xCA69, 0x41DE, 0x56B0, 0xDD07, 0x786C, 0xF3DB, 0xE4B5, 0x6F02,
-0x3AB1, 0xB106, 0xA668, 0x2DDF, 0x88B4, 0x0303, 0x146D, 0x9FDA,
-0xD50C, 0x5EBB, 0x49D5, 0xC262, 0x6709, 0xECBE, 0xFBD0, 0x7067,
-0x6E7C, 0xE5CB, 0xF2A5, 0x7912, 0xDC79, 0x57CE, 0x40A0, 0xCB17,
-0x81C1, 0x0A76, 0x1D18, 0x96AF, 0x33C4, 0xB873, 0xAF1D, 0x24AA,
-0x932B, 0x189C, 0x0FF2, 0x8445, 0x212E, 0xAA99, 0xBDF7, 0x3640,
-0x7C96, 0xF721, 0xE04F, 0x6BF8, 0xCE93, 0x4524, 0x524A, 0xD9FD,
-0xC7E6, 0x4C51, 0x5B3F, 0xD088, 0x75E3, 0xFE54, 0xE93A, 0x628D,
-0x285B, 0xA3EC, 0xB482, 0x3F35, 0x9A5E, 0x11E9, 0x0687, 0x8D30,
-0xE232, 0x6985, 0x7EEB, 0xF55C, 0x5037, 0xDB80, 0xCCEE, 0x4759,
-0x0D8F, 0x8638, 0x9156, 0x1AE1, 0xBF8A, 0x343D, 0x2353, 0xA8E4,
-0xB6FF, 0x3D48, 0x2A26, 0xA191, 0x04FA, 0x8F4D, 0x9823, 0x1394,
-0x5942, 0xD2F5, 0xC59B, 0x4E2C, 0xEB47, 0x60F0, 0x779E, 0xFC29,
-0x4BA8, 0xC01F, 0xD771, 0x5CC6, 0xF9AD, 0x721A, 0x6574, 0xEEC3,
-0xA415, 0x2FA2, 0x38CC, 0xB37B, 0x1610, 0x9DA7, 0x8AC9, 0x017E,
-0x1F65, 0x94D2, 0x83BC, 0x080B, 0xAD60, 0x26D7, 0x31B9, 0xBA0E,
-0xF0D8, 0x7B6F, 0x6C01, 0xE7B6, 0x42DD, 0xC96A, 0xDE04, 0x55B3
-};
-
-uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint16_t ctrl, uint64_t slba,
-   uint32_t reftag);
-uint16_t nvme_dif_mangle_mdata(NvmeNamespace *ns, uint8_t *mbuf, size_t mlen,
-   uint64_t slba);
-void nvme_dif_pract_generate_dif(NvmeNamespace *ns, uint8_t *buf, size_t len,
- uint8_t *mbuf, size_t mlen, uint16_t apptag,
- uint32_t reftag);
-uint16_t nvme_dif_check(NvmeNamespace *ns, uint8_t *buf, size_t len,
-uint8_t *mbuf, size_t mlen, uint16_t ctrl,
-uint64_t slba, uint16_t apptag,
-uint16_t appmask, uint32_t reftag);
-uint16_t nvme_dif_rw(NvmeCtrl *n, NvmeRequest *req);
-
-#endif /* HW_NVME_DIF_H */
diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
deleted file mode 100644
index fb0a41f912e7..
--- a/hw/block/nvme-ns.h
+++ /dev/null
@@ -1,229 +0,0 @@
-/*
- * QEMU NVM Express Virtual Namespace
- *
- * Copyright (c) 2019 CNEX Labs
- * Copyright (c) 2020 Samsung Electronics
- *
- * Authors:
- *  Klaus Jensen  
- *
- * This work is licensed under the terms of the GNU GPL, version 2. See the
- * COPYING file in the top-level directory.
- *
- */
-
-#ifndef NVME_NS_H
-#define NVME_NS_H
-
-#include "qemu/uuid.h"
-
-#define TYPE_NVME_NS "nvme-ns"
-#define NVME_NS(obj) \
-OBJECT_CHECK(NvmeNamespace, (obj), TYPE_NVME_NS)
-
-typedef struct NvmeZone {
-

[PULL 11/20] hw/block/nvme: cleanup includes

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Clean up includes.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme-dif.c|  7 +++
 hw/block/nvme-ns.c | 11 ++-
 hw/block/nvme-subsys.c | 11 +--
 hw/block/nvme.c| 22 +-
 4 files changed, 15 insertions(+), 36 deletions(-)

diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index 25e5a90854fa..e269d275ebed 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -9,12 +9,11 @@
  */
 
 #include "qemu/osdep.h"
-#include "hw/block/block.h"
-#include "sysemu/dma.h"
-#include "sysemu/block-backend.h"
 #include "qapi/error.h"
-#include "trace.h"
+#include "sysemu/block-backend.h"
+
 #include "nvme.h"
+#include "trace.h"
 
 uint16_t nvme_check_prinfo(NvmeNamespace *ns, uint16_t ctrl, uint64_t slba,
uint32_t reftag)
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 4d7103e78ff8..d91bf7bb 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -14,20 +14,13 @@
 
 #include "qemu/osdep.h"
 #include "qemu/units.h"
-#include "qemu/cutils.h"
-#include "qemu/log.h"
 #include "qemu/error-report.h"
-#include "hw/block/block.h"
-#include "hw/pci/pci.h"
+#include "qapi/error.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
-#include "qapi/error.h"
 
-#include "hw/qdev-properties.h"
-#include "hw/qdev-core.h"
-
-#include "trace.h"
 #include "nvme.h"
+#include "trace.h"
 
 #define MIN_DISCARD_GRANULARITY (4 * KiB)
 
diff --git a/hw/block/nvme-subsys.c b/hw/block/nvme-subsys.c
index 3c404e3fcb78..192223d17ca1 100644
--- a/hw/block/nvme-subsys.c
+++ b/hw/block/nvme-subsys.c
@@ -6,18 +6,9 @@
  * This code is licensed under the GNU GPL v2.  Refer COPYING.
  */
 
-#include "qemu/units.h"
 #include "qemu/osdep.h"
-#include "qemu/uuid.h"
-#include "qemu/iov.h"
-#include "qemu/cutils.h"
 #include "qapi/error.h"
-#include "hw/qdev-properties.h"
-#include "hw/qdev-core.h"
-#include "hw/block/block.h"
-#include "block/aio.h"
-#include "block/accounting.h"
-#include "hw/pci/pci.h"
+
 #include "nvme.h"
 
 int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 29f80d543903..e152c61adb76 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -144,24 +144,20 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/units.h"
+#include "qemu/cutils.h"
 #include "qemu/error-report.h"
-#include "hw/block/block.h"
-#include "hw/pci/msix.h"
-#include "hw/pci/pci.h"
-#include "hw/qdev-properties.h"
-#include "migration/vmstate.h"
-#include "sysemu/sysemu.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
-#include "sysemu/hostmem.h"
+#include "sysemu/sysemu.h"
 #include "sysemu/block-backend.h"
-#include "exec/memory.h"
-#include "qemu/log.h"
-#include "qemu/module.h"
-#include "qemu/cutils.h"
-#include "trace.h"
+#include "sysemu/hostmem.h"
+#include "hw/pci/msix.h"
+#include "migration/vmstate.h"
+
 #include "nvme.h"
+#include "trace.h"
 
 #define NVME_MAX_IOQPAIRS 0x
 #define NVME_DB_SIZE  4
-- 
2.31.1




[PULL 15/20] hw/block/nvme: add metadata offset helper

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Add an nvme_moff() helper.

Signed-off-by: Klaus Jensen 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.h |  7 ++-
 hw/block/nvme-dif.c |  4 ++--
 hw/block/nvme-ns.c  |  2 +-
 hw/block/nvme.c | 12 ++--
 4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index dc065e57b509..9349d1c33ad7 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -107,7 +107,7 @@ typedef struct NvmeNamespace {
 BlockConf blkconf;
 int32_t  bootindex;
 int64_t  size;
-int64_t  mdata_offset;
+int64_t  moff;
 NvmeIdNs id_ns;
 NvmeLBAF lbaf;
 size_t   lbasz;
@@ -158,6 +158,11 @@ static inline size_t nvme_m2b(NvmeNamespace *ns, uint64_t 
lba)
 return ns->lbaf.ms * lba;
 }
 
+static inline int64_t nvme_moff(NvmeNamespace *ns, uint64_t lba)
+{
+return ns->moff + nvme_m2b(ns, lba);
+}
+
 static inline bool nvme_ns_ext(NvmeNamespace *ns)
 {
 return !!NVME_ID_NS_FLBAS_EXTENDED(ns->id_ns.flbas);
diff --git a/hw/block/nvme-dif.c b/hw/block/nvme-dif.c
index c72e43195abf..88efcbe9bd60 100644
--- a/hw/block/nvme-dif.c
+++ b/hw/block/nvme-dif.c
@@ -306,7 +306,7 @@ static void nvme_dif_rw_mdata_in_cb(void *opaque, int ret)
 uint64_t slba = le64_to_cpu(rw->slba);
 uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
 size_t mlen = nvme_m2b(ns, nlb);
-uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+uint64_t offset = nvme_moff(ns, slba);
 BlockBackend *blk = ns->blkconf.blk;
 
 trace_pci_nvme_dif_rw_mdata_in_cb(nvme_cid(req), blk_name(blk));
@@ -335,7 +335,7 @@ static void nvme_dif_rw_mdata_out_cb(void *opaque, int ret)
 NvmeNamespace *ns = req->ns;
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 uint64_t slba = le64_to_cpu(rw->slba);
-uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+uint64_t offset = nvme_moff(ns, slba);
 BlockBackend *blk = ns->blkconf.blk;
 
 trace_pci_nvme_dif_rw_mdata_out_cb(nvme_cid(req), blk_name(blk));
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index b9369460b965..b25838ac4fd4 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -42,7 +42,7 @@ void nvme_ns_init_format(NvmeNamespace *ns)
 id_ns->ncap = id_ns->nsze;
 id_ns->nuse = id_ns->ncap;
 
-ns->mdata_offset = (int64_t)nlbas << ns->lbaf.ds;
+ns->moff = (int64_t)nlbas << ns->lbaf.ds;
 
 npdg = ns->blkconf.discard_granularity / ns->lbasz;
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9153d5d91363..1db9a603f5c4 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1839,7 +1839,7 @@ static void nvme_rw_cb(void *opaque, int ret)
 NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
 uint64_t slba = le64_to_cpu(rw->slba);
 uint32_t nlb = (uint32_t)le16_to_cpu(rw->nlb) + 1;
-uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+uint64_t offset = nvme_moff(ns, slba);
 
 if (req->cmd.opcode == NVME_CMD_WRITE_ZEROES) {
 size_t mlen = nvme_m2b(ns, nlb);
@@ -2005,7 +2005,7 @@ static void nvme_verify_mdata_in_cb(void *opaque, int ret)
 uint64_t slba = le64_to_cpu(rw->slba);
 uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
 size_t mlen = nvme_m2b(ns, nlb);
-uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+uint64_t offset = nvme_moff(ns, slba);
 BlockBackend *blk = ns->blkconf.blk;
 
 trace_pci_nvme_verify_mdata_in_cb(nvme_cid(req), blk_name(blk));
@@ -2108,7 +2108,7 @@ static void nvme_aio_zone_reset_cb(void *opaque, int ret)
 }
 
 if (ns->lbaf.ms) {
-int64_t offset = ns->mdata_offset + nvme_m2b(ns, zone->d.zslba);
+int64_t offset = nvme_moff(ns, zone->d.zslba);
 
 blk_aio_pwrite_zeroes(ns->blkconf.blk, offset,
   nvme_m2b(ns, ns->zone_size), BDRV_REQ_MAY_UNMAP,
@@ -2179,7 +2179,7 @@ static void nvme_copy_cb(void *opaque, int ret)
 if (ns->lbaf.ms) {
 NvmeCopyCmd *copy = (NvmeCopyCmd *)&req->cmd;
 uint64_t sdlba = le64_to_cpu(copy->sdlba);
-int64_t offset = ns->mdata_offset + nvme_m2b(ns, sdlba);
+int64_t offset = nvme_moff(ns, sdlba);
 
 qemu_iovec_reset(&req->sg.iov);
 qemu_iovec_add(&req->sg.iov, ctx->mbounce, nvme_m2b(ns, ctx->nlb));
@@ -2485,7 +2485,7 @@ static void nvme_compare_data_cb(void *opaque, int ret)
 uint64_t slba = le64_to_cpu(rw->slba);
 uint32_t nlb = le16_to_cpu(rw->nlb) + 1;
 size_t mlen = nvme_m2b(ns, nlb);
-uint64_t offset = ns->mdata_offset + nvme_m2b(ns, slba);
+uint64_t offset = nvme_moff(ns, slba);
 
 ctx->mdata.bounce = g_malloc(mlen);
 
@@ -2762,7 +2762,7 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 
 if (ns->lbaf.ms) {
 len = nvme_m2b(ns, nlb);
-offset = ns->mdata_offset + nvme_m2b(ns, slba);
+offset = nvme_moff(ns, slba);
 
 in_ctx = g_new(struct nvme_copy_in_ctx, 1);
 in_ctx->req = req;
-- 
2.31.1
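
[Archive note] The offset arithmetic that nvme_moff() centralizes can be
sketched outside of QEMU as follows. This is only an illustration: the
DemoNamespace struct and the demo_* names are hypothetical stand-ins for
NvmeNamespace, nvme_m2b(), nvme_moff() and nvme_ns_init_format(), reduced to
the fields the patch touches. It assumes, as the patch suggests, that metadata
is stored after all logical block data in the backing image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for NvmeNamespace. */
typedef struct DemoNamespace {
    int64_t  moff; /* byte offset at which the metadata area starts */
    uint16_t ms;   /* metadata bytes per logical block */
    uint8_t  ds;   /* log2 of the logical block data size */
} DemoNamespace;

/* Metadata bytes covered by 'lba' logical blocks (mirrors nvme_m2b()). */
static inline size_t demo_m2b(const DemoNamespace *ns, uint64_t lba)
{
    return (size_t)ns->ms * lba;
}

/* Byte offset of the metadata belonging to 'lba' (mirrors nvme_moff()). */
static inline int64_t demo_moff(const DemoNamespace *ns, uint64_t lba)
{
    return ns->moff + (int64_t)demo_m2b(ns, lba);
}

/* The metadata area begins right after 'nlbas' blocks of data. */
static void demo_init_format(DemoNamespace *ns, uint64_t nlbas)
{
    ns->moff = (int64_t)nlbas << ns->ds;
}

/* Small self-check: 1024 blocks of 512 bytes, 8 metadata bytes each. */
static void demo_moff_selfcheck(void)
{
    DemoNamespace ns = { .ms = 8, .ds = 9 };

    demo_init_format(&ns, 1024);
    assert(ns.moff == 1024 * 512);
    assert(demo_moff(&ns, 0) == ns.moff);
}
```

With the helper in place, every call site computes the same expression, so a
later change to the metadata layout would only need to touch one function.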




[PULL 09/20] hw/block/nvme: rename __nvme_select_ns_iocs

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Get rid of the (reserved) double underscore use.

Cc: Philippe Mathieu-Daudé 
Cc: Thomas Huth 
Signed-off-by: Klaus Jensen 
Reviewed-by: Thomas Huth 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 47 +++
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index acbfa3f890dc..f0cfca869875 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -4928,7 +4928,25 @@ static void nvme_update_dmrsl(NvmeCtrl *n)
 }
 }
 
-static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns);
+static void nvme_select_iocs_ns(NvmeCtrl *n, NvmeNamespace *ns)
+{
+ns->iocs = nvme_cse_iocs_none;
+switch (ns->csi) {
+case NVME_CSI_NVM:
+if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+case NVME_CSI_ZONED:
+if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
+ns->iocs = nvme_cse_iocs_zoned;
+} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
+ns->iocs = nvme_cse_iocs_nvm;
+}
+break;
+}
+}
+
 static uint16_t nvme_ns_attachment(NvmeCtrl *n, NvmeRequest *req)
 {
 NvmeNamespace *ns;
@@ -4979,7 +4997,7 @@ static uint16_t nvme_ns_attachment(NvmeCtrl *n, 
NvmeRequest *req)
 }
 
 nvme_attach_ns(ctrl, ns);
-__nvme_select_ns_iocs(ctrl, ns);
+nvme_select_iocs_ns(ctrl, ns);
 } else {
 if (!nvme_ns(ctrl, nsid)) {
 return NVME_NS_NOT_ATTACHED | NVME_DNR;
@@ -5280,26 +5298,7 @@ static void nvme_ctrl_shutdown(NvmeCtrl *n)
 }
 }
 
-static void __nvme_select_ns_iocs(NvmeCtrl *n, NvmeNamespace *ns)
-{
-ns->iocs = nvme_cse_iocs_none;
-switch (ns->csi) {
-case NVME_CSI_NVM:
-if (NVME_CC_CSS(n->bar.cc) != NVME_CC_CSS_ADMIN_ONLY) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-case NVME_CSI_ZONED:
-if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_CSI) {
-ns->iocs = nvme_cse_iocs_zoned;
-} else if (NVME_CC_CSS(n->bar.cc) == NVME_CC_CSS_NVM) {
-ns->iocs = nvme_cse_iocs_nvm;
-}
-break;
-}
-}
-
-static void nvme_select_ns_iocs(NvmeCtrl *n)
+static void nvme_select_iocs(NvmeCtrl *n)
 {
 NvmeNamespace *ns;
 int i;
@@ -5310,7 +5309,7 @@ static void nvme_select_ns_iocs(NvmeCtrl *n)
 continue;
 }
 
-__nvme_select_ns_iocs(n, ns);
+nvme_select_iocs_ns(n, ns);
 }
 }
 
@@ -5412,7 +5411,7 @@ static int nvme_start_ctrl(NvmeCtrl *n)
 
 QTAILQ_INIT(&n->aer_queue);
 
-nvme_select_ns_iocs(n);
+nvme_select_iocs(n);
 
 return 0;
 }
-- 
2.31.1




[PULL 08/20] hw/block/nvme: rename __nvme_advance_zone_wp

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Get rid of the (reserved) double underscore use.

Cc: Philippe Mathieu-Daudé 
Cc: Thomas Huth 
Signed-off-by: Klaus Jensen 
Reviewed-by: Thomas Huth 
Reviewed-by: Keith Busch 
---
 hw/block/nvme.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 9e5ab4cacb06..acbfa3f890dc 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1757,8 +1757,8 @@ static inline uint16_t nvme_zrm_open(NvmeNamespace *ns, 
NvmeZone *zone)
 return nvme_zrm_open_flags(ns, zone, 0);
 }
 
-static void __nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
-   uint32_t nlb)
+static void nvme_advance_zone_wp(NvmeNamespace *ns, NvmeZone *zone,
+ uint32_t nlb)
 {
 zone->d.wp += nlb;
 
@@ -1778,7 +1778,7 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, 
NvmeRequest *req)
 nlb = le16_to_cpu(rw->nlb) + 1;
 zone = nvme_get_zone_by_slba(ns, slba);
 
-__nvme_advance_zone_wp(ns, zone, nlb);
+nvme_advance_zone_wp(ns, zone, nlb);
 }
 
 static inline bool nvme_is_write(NvmeRequest *req)
@@ -2167,7 +2167,7 @@ out:
 uint64_t sdlba = le64_to_cpu(copy->sdlba);
 NvmeZone *zone = nvme_get_zone_by_slba(ns, sdlba);
 
-__nvme_advance_zone_wp(ns, zone, ctx->nlb);
+nvme_advance_zone_wp(ns, zone, ctx->nlb);
 }
 
 g_free(ctx->bounce);
-- 
2.31.1




[PULL 06/20] hw/block/nvme: align with existing style

2021-05-17 Thread Klaus Jensen
From: Gollu Appalanaidu 

While QEMU coding style prefers lowercase hexadecimals in constants, the
NVMe subsystem uses the format from the NVMe specifications in comments,
i.e. 'h' suffix instead of '0x' prefix.

Fix this up across the code base.

Signed-off-by: Gollu Appalanaidu 
[k.jensen: updated message; added conversion in a couple of missing comments]
Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 10 +++
 hw/block/nvme-ns.c   |  2 +-
 hw/block/nvme.c  | 67 +---
 3 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index e7fc119adb24..0ff9ce17a99e 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -848,8 +848,8 @@ enum NvmeStatusCodes {
 NVME_FW_REQ_SUSYSTEM_RESET  = 0x0110,
 NVME_NS_ALREADY_ATTACHED= 0x0118,
 NVME_NS_PRIVATE = 0x0119,
-NVME_NS_NOT_ATTACHED= 0x011A,
-NVME_NS_CTRL_LIST_INVALID   = 0x011C,
+NVME_NS_NOT_ATTACHED= 0x011a,
+NVME_NS_CTRL_LIST_INVALID   = 0x011c,
 NVME_CONFLICTING_ATTRS  = 0x0180,
 NVME_INVALID_PROT_INFO  = 0x0181,
 NVME_WRITE_TO_RO= 0x0182,
@@ -1409,9 +1409,9 @@ typedef enum NvmeZoneState {
 NVME_ZONE_STATE_IMPLICITLY_OPEN  = 0x02,
 NVME_ZONE_STATE_EXPLICITLY_OPEN  = 0x03,
 NVME_ZONE_STATE_CLOSED   = 0x04,
-NVME_ZONE_STATE_READ_ONLY= 0x0D,
-NVME_ZONE_STATE_FULL = 0x0E,
-NVME_ZONE_STATE_OFFLINE  = 0x0F,
+NVME_ZONE_STATE_READ_ONLY= 0x0d,
+NVME_ZONE_STATE_FULL = 0x0e,
+NVME_ZONE_STATE_OFFLINE  = 0x0f,
 } NvmeZoneState;
 
 static inline void _nvme_check_size(void)
diff --git a/hw/block/nvme-ns.c b/hw/block/nvme-ns.c
index 7bb618f18209..a0895614d9c3 100644
--- a/hw/block/nvme-ns.c
+++ b/hw/block/nvme-ns.c
@@ -303,7 +303,7 @@ static void nvme_ns_init_zoned(NvmeNamespace *ns)
 
 id_ns_z = g_malloc0(sizeof(NvmeIdNsZoned));
 
-/* MAR/MOR are zeroes-based, 0x means no limit */
+/* MAR/MOR are zeroes-based, Fh means no limit */
 id_ns_z->mar = cpu_to_le32(ns->params.max_active_zones - 1);
 id_ns_z->mor = cpu_to_le32(ns->params.max_open_zones - 1);
 id_ns_z->zoc = 0;
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 79a087a41ce8..baba949660f2 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -12,10 +12,19 @@
  * Reference Specs: http://www.nvmexpress.org, 1.4, 1.3, 1.2, 1.1, 1.0e
  *
  *  https://nvmexpress.org/developers/nvme-specification/
- */
-
-/**
- * Usage: add options:
+ *
+ *
+ * Notes on coding style
+ * -
+ * While QEMU coding style prefers lowercase hexadecimals in constants, the
+ * NVMe subsystem uses the format from the NVMe specifications in the comments
+ * (i.e. 'h' suffix instead of '0x' prefix).
+ *
+ * Usage
+ * -
+ * See docs/system/nvme.rst for extensive documentation.
+ *
+ * Add options:
  *  -drive file=,if=none,id=
  *  -device nvme-subsys,id=,nqn=
  *  -device nvme,serial=,id=, \
@@ -3613,18 +3622,18 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 
 /*
  * In the base NVM command set, Flush may apply to all namespaces
- * (indicated by NSID being set to 0x). But if that feature is used
+ * (indicated by NSID being set to h). But if that feature is used
  * along with TP 4056 (Namespace Types), it may be pretty screwed up.
  *
- * If NSID is indeed set to 0x, we simply cannot associate the
+ * If NSID is indeed set to h, we simply cannot associate the
  * opcode with a specific command since we cannot determine a unique I/O
- * command set. Opcode 0x0 could have any other meaning than something
+ * command set. Opcode 0h could have any other meaning than something
  * equivalent to flushing and say it DOES have completely different
- * semantics in some other command set - does an NSID of 0x then
+ * semantics in some other command set - does an NSID of h then
  * mean "for all namespaces, apply whatever command set specific command
- * that uses the 0x0 opcode?" Or does it mean "for all namespaces, apply
- * whatever command that uses the 0x0 opcode if, and only if, it allows
- * NSID to be 0x"?
+ * that uses the 0h opcode?" Or does it mean "for all namespaces, apply
+ * whatever command that uses the 0h opcode if, and only if, it allows NSID
+ * to be h"?
  *
  * Anyway (and luckily), for now, we do not care about this since the
  * device only supports namespace types that includes the NVM Flush command
@@ -3940,7 +3949,7 @@ static uint16_t nvme_changed_nslist(NvmeCtrl *n, uint8_t 
rae, uint32_t buf_len,
 NVME_CHANGED_NSID_SIZE) {
 /*
  * If more than 1024 namespaces, the first entry in the log page should
- * be set to 0x and the others to 0 as 

[PULL 04/20] hw/block/nvme: fix io-command set profile feature

2021-05-17 Thread Klaus Jensen
From: Gollu Appalanaidu 

Currently the IO Command Set Profile feature is supported, but the feature
support flag is not set. Further, this feature is changeable. Fix that.

Additionally, remove filling default value of the CQE result with zero,
since it will fall back to the default case anyway.

Signed-off-by: Gollu Appalanaidu 
[k.jensen: fix up commit message]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 67abc9eb2c24..14c24f9b0866 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -185,6 +185,7 @@ static const bool nvme_feature_support[NVME_FID_MAX] = {
 [NVME_WRITE_ATOMICITY]  = true,
 [NVME_ASYNCHRONOUS_EVENT_CONF]  = true,
 [NVME_TIMESTAMP]= true,
+[NVME_COMMAND_SET_PROFILE]  = true,
 };
 
 static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
@@ -194,6 +195,7 @@ static const uint32_t nvme_feature_cap[NVME_FID_MAX] = {
 [NVME_NUMBER_OF_QUEUES] = NVME_FEAT_CAP_CHANGE,
 [NVME_ASYNCHRONOUS_EVENT_CONF]  = NVME_FEAT_CAP_CHANGE,
 [NVME_TIMESTAMP]= NVME_FEAT_CAP_CHANGE,
+[NVME_COMMAND_SET_PROFILE]  = NVME_FEAT_CAP_CHANGE,
 };
 
 static const uint32_t nvme_cse_acs[256] = {
@@ -4711,9 +4713,6 @@ defaults:
 result |= NVME_INTVC_NOCOALESCING;
 }
 break;
-case NVME_COMMAND_SET_PROFILE:
-result = 0;
-break;
 default:
 result = nvme_feature_default[fid];
 break;
-- 
2.31.1
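
[Archive note] The table-driven pattern this patch relies on can be sketched
as below. Everything here is a hedged illustration, not the QEMU code: the
DEMO_* identifiers, the numeric feature identifiers, the capability bit value
and the demo_get_feature() function are assumptions made for the example. The
point is only that once a feature is flagged in the support table, a Get
Features request for it can fall through to the generic default case, which
is why the explicit "result = 0" case becomes redundant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative feature identifiers (assumed values for this sketch). */
enum {
    DEMO_TIMESTAMP           = 0x0e,
    DEMO_COMMAND_SET_PROFILE = 0x19,
    DEMO_FID_MAX             = 0x100,
};

/* Illustrative capability bit: the feature value may be changed. */
#define DEMO_FEAT_CAP_CHANGE (1u << 2)

/* Designated initializers: unlisted entries default to false / 0. */
static const bool demo_feature_support[DEMO_FID_MAX] = {
    [DEMO_TIMESTAMP]           = true,
    [DEMO_COMMAND_SET_PROFILE] = true,
};

static const uint32_t demo_feature_cap[DEMO_FID_MAX] = {
    [DEMO_TIMESTAMP]           = DEMO_FEAT_CAP_CHANGE,
    [DEMO_COMMAND_SET_PROFILE] = DEMO_FEAT_CAP_CHANGE,
};

/* No feature in this sketch has a non-zero default value. */
static const uint32_t demo_feature_default[DEMO_FID_MAX] = { 0 };

/* Returns 0 and stores the value, or -1 if the feature is unsupported. */
static int demo_get_feature(uint8_t fid, uint32_t *result)
{
    if (!demo_feature_support[fid]) {
        return -1;
    }

    switch (fid) {
    default:
        /* Features without special handling report their default value. */
        *result = demo_feature_default[fid];
        break;
    }

    return 0;
}

static bool demo_feature_changeable(uint8_t fid)
{
    return (demo_feature_cap[fid] & DEMO_FEAT_CAP_CHANGE) != 0;
}

static void demo_feature_selfcheck(void)
{
    uint32_t v = 1;

    assert(demo_get_feature(DEMO_COMMAND_SET_PROFILE, &v) == 0 && v == 0);
    assert(demo_feature_changeable(DEMO_COMMAND_SET_PROFILE));
}
```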




[PULL 01/20] hw/block/nvme: remove redundant invalid_lba_range trace

2021-05-17 Thread Klaus Jensen
From: Gollu Appalanaidu 

Currently pci_nvme_err_invalid_lba_range trace is called individually at
each nvme_check_bounds() call site.

Move the trace event to nvme_check_bounds() and remove the redundant
events.

Signed-off-by: Gollu Appalanaidu 
Reviewed-by: Philippe Mathieu-Daudé 
[k.jensen: commit message fixup]
Signed-off-by: Klaus Jensen 
---
 hw/block/nvme.c | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 5fe082ec34c5..cd594280a7f9 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1426,6 +1426,7 @@ static inline uint16_t nvme_check_bounds(NvmeNamespace 
*ns, uint64_t slba,
 uint64_t nsze = le64_to_cpu(ns->id_ns.nsze);
 
 if (unlikely(UINT64_MAX - slba < nlb || slba + nlb > nsze)) {
+trace_pci_nvme_err_invalid_lba_range(slba, nlb, nsze);
 return NVME_LBA_RANGE | NVME_DNR;
 }
 
@@ -2268,7 +2269,6 @@ static void nvme_copy_in_complete(NvmeRequest *req)
 
 status = nvme_check_bounds(ns, sdlba, ctx->nlb);
 if (status) {
-trace_pci_nvme_err_invalid_lba_range(sdlba, ctx->nlb, ns->id_ns.nsze);
 goto invalid;
 }
 
@@ -2530,8 +2530,6 @@ static uint16_t nvme_dsm(NvmeCtrl *n, NvmeRequest *req)
 uint32_t nlb = le32_to_cpu(range[i].nlb);
 
 if (nvme_check_bounds(ns, slba, nlb)) {
-trace_pci_nvme_err_invalid_lba_range(slba, nlb,
- ns->id_ns.nsze);
 continue;
 }
 
@@ -2604,7 +2602,6 @@ static uint16_t nvme_verify(NvmeCtrl *n, NvmeRequest *req)
 
 status = nvme_check_bounds(ns, slba, nlb);
 if (status) {
-trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
 return status;
 }
 
@@ -2689,7 +2686,6 @@ static uint16_t nvme_copy(NvmeCtrl *n, NvmeRequest *req)
 
 status = nvme_check_bounds(ns, slba, _nlb);
 if (status) {
-trace_pci_nvme_err_invalid_lba_range(slba, _nlb, ns->id_ns.nsze);
 goto out;
 }
 
@@ -2818,7 +2814,6 @@ static uint16_t nvme_compare(NvmeCtrl *n, NvmeRequest 
*req)
 
 status = nvme_check_bounds(ns, slba, nlb);
 if (status) {
-trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
 return status;
 }
 
@@ -2938,7 +2933,6 @@ static uint16_t nvme_read(NvmeCtrl *n, NvmeRequest *req)
 
 status = nvme_check_bounds(ns, slba, nlb);
 if (status) {
-trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
 goto invalid;
 }
 
@@ -3018,7 +3012,6 @@ static uint16_t nvme_do_write(NvmeCtrl *n, NvmeRequest 
*req, bool append,
 
 status = nvme_check_bounds(ns, slba, nlb);
 if (status) {
-trace_pci_nvme_err_invalid_lba_range(slba, nlb, ns->id_ns.nsze);
 goto invalid;
 }
 
-- 
2.31.1
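
[Archive note] The condition in nvme_check_bounds() combines an overflow guard
with the actual range check. A minimal standalone sketch of that condition,
assuming nothing beyond what the hunk shows, could look like this. The name
lba_range_invalid() is hypothetical, and the sketch returns a plain bool where
the real function returns an NVMe status code and emits the trace event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the range [slba, slba + nlb) does not fit in a
 * namespace of 'nsze' logical blocks.
 */
static bool lba_range_invalid(uint64_t slba, uint32_t nlb, uint64_t nsze)
{
    /* Guard first: slba + nlb would wrap around UINT64_MAX. */
    if (UINT64_MAX - slba < nlb) {
        return true;
    }

    /* The addition is now safe; reject ranges that run past the end. */
    return slba + nlb > nsze;
}

static void lba_range_selfcheck(void)
{
    assert(!lba_range_invalid(0, 8, 8));           /* exactly fits */
    assert(lba_range_invalid(1, 8, 8));            /* one block too far */
    assert(lba_range_invalid(UINT64_MAX, 1, 100)); /* would overflow */
}
```

Because the check and its failure are in one place, tracing the failure inside
the function (as the patch does) reports every rejected range without each
caller having to repeat the trace call.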




[PULL 00/20] emulated nvme updates

2021-05-17 Thread Klaus Jensen
From: Klaus Jensen 

Hi Peter,

The following changes since commit 6005ee07c380cbde44292f5f6c96e7daa70f4f7d:

  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging 
(2021-05-16 17:22:46 +0100)

are available in the Git repository at:

  git://git.infradead.org/qemu-nvme.git tags/nvme-next-pull-request

for you to fetch changes up to 88eea45c536470cd3c43440cbb1cd4d3b9fa519c:

  hw/nvme: move nvme emulation out of hw/block (2021-05-17 09:19:00 +0200)


emulated nvme updates

* various fixes (Gollu Appalanaidu)
* refactoring (me)
* move to hw/nvme from hw/block (me)



Gollu Appalanaidu (6):
  hw/block/nvme: remove redundant invalid_lba_range trace
  hw/block/nvme: rename reserved fields declarations
  hw/block/nvme: consider metadata read aio return value in compare
  hw/block/nvme: fix io-command set profile feature
  hw/block/nvme: function formatting fix
  hw/block/nvme: align with existing style

Klaus Jensen (14):
  hw/block/nvme: rename __nvme_zrm_open
  hw/block/nvme: rename __nvme_advance_zone_wp
  hw/block/nvme: rename __nvme_select_ns_iocs
  hw/block/nvme: consolidate header files
  hw/block/nvme: cleanup includes
  hw/block/nvme: remove non-shared defines from header file
  hw/block/nvme: replace nvme_ns_status
  hw/block/nvme: cache lba and ms sizes
  hw/block/nvme: add metadata offset helper
  hw/block/nvme: streamline namespace array indexing
  hw/block/nvme: remove num_namespaces member
  hw/block/nvme: remove irrelevant zone resource checks
  hw/block/nvme: move zoned constraints checks
  hw/nvme: move nvme emulation out of hw/block

 meson.build   |   1 +
 hw/block/nvme-dif.h   |  63 ---
 hw/block/nvme-ns.h| 229 -
 hw/block/nvme-subsys.h|  59 ---
 hw/block/nvme.h   | 266 ---
 hw/nvme/nvme.h| 547 ++
 hw/nvme/trace.h   |   1 +
 include/block/nvme.h  |  12 +-
 hw/{block/nvme.c => nvme/ctrl.c}  | 298 ++--
 hw/{block/nvme-dif.c => nvme/dif.c}   |  57 +--
 hw/{block/nvme-ns.c => nvme/ns.c} | 106 ++---
 hw/{block/nvme-subsys.c => nvme/subsys.c} |  12 +-
 MAINTAINERS   |   2 +-
 hw/Kconfig|   1 +
 hw/block/Kconfig  |   5 -
 hw/block/meson.build  |   1 -
 hw/block/trace-events | 206 
 hw/meson.build|   1 +
 hw/nvme/Kconfig   |   4 +
 hw/nvme/meson.build   |   1 +
 hw/nvme/trace-events  | 204 
 21 files changed, 988 insertions(+), 1088 deletions(-)
 delete mode 100644 hw/block/nvme-dif.h
 delete mode 100644 hw/block/nvme-ns.h
 delete mode 100644 hw/block/nvme-subsys.h
 delete mode 100644 hw/block/nvme.h
 create mode 100644 hw/nvme/nvme.h
 create mode 100644 hw/nvme/trace.h
 rename hw/{block/nvme.c => nvme/ctrl.c} (96%)
 rename hw/{block/nvme-dif.c => nvme/dif.c} (90%)
 rename hw/{block/nvme-ns.c => nvme/ns.c} (87%)
 rename hw/{block/nvme-subsys.c => nvme/subsys.c} (86%)
 create mode 100644 hw/nvme/Kconfig
 create mode 100644 hw/nvme/meson.build
 create mode 100644 hw/nvme/trace-events

-- 
2.31.1




[PULL 02/20] hw/block/nvme: rename reserved fields declarations

2021-05-17 Thread Klaus Jensen
From: Gollu Appalanaidu 

Align the 'rsvd1' reserved field declaration in NvmeBar with existing
style.

Signed-off-by: Gollu Appalanaidu 
[k.jensen: minor commit message fixup]
Signed-off-by: Klaus Jensen 
---
 include/block/nvme.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 4ac926fbc687..e7fc119adb24 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -7,7 +7,7 @@ typedef struct QEMU_PACKED NvmeBar {
 uint32_t intms;
 uint32_t intmc;
 uint32_t cc;
-uint32_t rsvd1;
+uint8_t  rsvd24[4];
 uint32_t csts;
 uint32_t nssrc;
 uint32_t aqa;
-- 
2.31.1
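
[Archive note] The new name appears to follow a convention of naming reserved
fields after their byte offset in the structure. That reading is an
inference, not something the commit message states. The sketch below checks
the offset for a reduced, hypothetical DemoBar struct. The two leading
registers (cap, vs) are not visible in the hunk and are assumed here from the
NVMe register map, where the controller capabilities register is 8 bytes wide
and is followed by the 4-byte version register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the register layout, first 32 bytes only. */
typedef struct DemoBar {
    uint64_t cap;       /* assumed: not shown in the hunk */
    uint32_t vs;        /* assumed: not shown in the hunk */
    uint32_t intms;
    uint32_t intmc;
    uint32_t cc;
    uint8_t  rsvd24[4]; /* reserved bytes at offset 24 (18h) */
    uint32_t csts;
} DemoBar;

/* Compile-time checks that the name matches the layout. */
_Static_assert(offsetof(DemoBar, rsvd24) == 24, "rsvd24 must sit at byte 24");
_Static_assert(offsetof(DemoBar, csts) == 28, "csts must follow the gap");

static void demo_bar_selfcheck(void)
{
    assert(sizeof(((DemoBar *)0)->rsvd24) == 4);
}
```

A byte array also makes it obvious that the field carries no value of its own,
whereas a uint32_t named rsvd1 says nothing about where the gap is.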




Re: [PATCH v2 0/4] virtio: Improve boot time of virtio-scsi-pci and virtio-blk-pci

2021-05-17 Thread Greg Kurz
On Wed, 12 May 2021 17:05:53 +0100
Stefan Hajnoczi  wrote:

> On Fri, May 07, 2021 at 06:59:01PM +0200, Greg Kurz wrote:
> > Now that virtio-scsi-pci and virtio-blk-pci map 1 virtqueue per vCPU,
> > a serious slow down may be observed on setups with a big enough number
> > of vCPUs.
> > 
> > Example with a pseries guest on a bi-POWER9 socket system (128 HW threads):
> > 
> >   virtio-scsi  virtio-blk
> > 
> > 1   0m20.922s   0m21.346s
> > 2   0m21.230s   0m20.350s
> > 4   0m21.761s   0m20.997s
> > 8   0m22.770s   0m20.051s
> > 16  0m22.038s   0m19.994s
> > 32  0m22.928s   0m20.803s
> > 64  0m26.583s   0m22.953s
> > 128 0m41.273s   0m32.333s
> > 256 2m4.727s    1m16.924s
> > 384 6m5.563s    3m26.186s
> > 
> > Both perf and gprof indicate that QEMU is hogging CPUs when setting up
> > the ioeventfds:
> > 
> >  67.88%  swapper         [kernel.kallsyms]  [k] power_pmu_enable
> >   9.47%  qemu-kvm        [kernel.kallsyms]  [k] smp_call_function_single
> >   8.64%  qemu-kvm        [kernel.kallsyms]  [k] power_pmu_enable
> > =>2.79%  qemu-kvm        qemu-kvm           [.] memory_region_ioeventfd_before
> > =>2.12%  qemu-kvm        qemu-kvm           [.] address_space_update_ioeventfds
> >   0.56%  kworker/8:0-mm  [kernel.kallsyms]  [k] smp_call_function_single
> > 
> > address_space_update_ioeventfds() is called when committing an MR
> > transaction, i.e. for each ioeventfd with the current code base,
> > and it internally loops on all ioeventfds:
> > 
> > static void address_space_update_ioeventfds(AddressSpace *as)
> > {
> > [...]
> > FOR_EACH_FLAT_RANGE(fr, view) {
> > for (i = 0; i < fr->mr->ioeventfd_nb; ++i) {
> > 
> > This means that the setup of ioeventfds for these devices has
> > quadratic time complexity.
> > 
> > This series simply changes the device models to extend the transaction
> > to all virtqueues, like already done in the past in the generic
> > code with 710fccf80d78 ("virtio: improve virtio devices initialization
> > time").
> > 
> > Only virtio-scsi and virtio-blk are covered here, but a similar change
> > might also be beneficial to other device types such as vhost-scsi-pci,
> > vhost-user-scsi-pci and vhost-user-blk-pci.
> > 
> >   virtio-scsi  virtio-blk
> > 
> > 1   0m21.271s   0m22.076s
> > 2   0m20.912s   0m19.716s
> > 4   0m20.508s   0m19.310s
> > 8   0m21.374s   0m20.273s
> > 16  0m21.559s   0m21.374s
> > 32  0m22.532s   0m21.271s
> > 64  0m26.550s   0m22.007s
> > 128 0m29.115s   0m27.446s
> > 256 0m44.752s   0m41.004s
> > 384 1m2.884s    0m58.023s
> > 
> > This should fix https://bugzilla.redhat.com/show_bug.cgi?id=1927108
> > which reported the issue for virtio-scsi-pci.
> > 
> > Changes since v1:
> > - Add some comments (Stefan)
> > - Drop optimization on the error path in patch 2 (Stefan)
> > 
> > Changes since RFC:
> > 
> > As suggested by Stefan, simplify the code by directly beginning and
> > committing the memory transaction from the device model, without all
> > the virtio specific proxying code and no changes needed in the memory
> > subsystem.
> > 
> > Greg Kurz (4):
> >   virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
> >   virtio-blk: Configure all host notifiers in a single MR transaction
> >   virtio-scsi: Set host notifiers and callbacks separately
> >   virtio-scsi: Configure all host notifiers in a single MR transaction
> > 
> >  hw/block/dataplane/virtio-blk.c | 45 -
> >  hw/scsi/virtio-scsi-dataplane.c | 72 -
> >  2 files changed, 97 insertions(+), 20 deletions(-)
> > 
> > -- 
> > 2.26.3
> > 
> 
> Thanks, applied to my block tree:
> https://gitlab.com/stefanha/qemu/commits/block
> 

Hi Stefan,

It seems that Michael already merged the previous version of this
patch set with its latest PR.

https://gitlab.com/qemu-project/qemu/-/commit/6005ee07c380cbde44292f5f6c96e7daa70f4f7d

It is thus missing the v1->v2 changes. Basically some comments to
clarify the optimization we're doing with the MR transaction and
the removal of the optimization on an error path.

The optimization on the error path isn't needed indeed but it
doesn't hurt. No need to change that now that the patches are
upstream.

I can post a follow-up patch to add the missing comments though.
While here, I'd even add these comments in the generic
virtio_device_*_ioeventfd_impl() calls as well, since they already
have the very same optimization.

Anyway, I guess you can drop the patches from your tree.

Cheers,

--
Greg

> Stefan
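
[Archive note] The quadratic behaviour described in the quoted cover letter
can be illustrated with a toy cost model. This is not QEMU code and does not
call the memory API. It only assumes, as the cover letter states, that each
transaction commit walks every ioeventfd registered so far. The function names
and the unit cost are made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Toy cost of one commit: it visits every ioeventfd registered so far. */
static uint64_t commit_cost(uint64_t registered)
{
    return registered;
}

/* One commit per virtqueue: 1 + 2 + ... + nvqs visits in total. */
static uint64_t cost_commit_per_notifier(uint64_t nvqs)
{
    uint64_t total = 0;

    for (uint64_t i = 1; i <= nvqs; i++) {
        total += commit_cost(i);
    }
    return total;
}

/* All virtqueues registered inside one transaction: a single commit. */
static uint64_t cost_single_transaction(uint64_t nvqs)
{
    return commit_cost(nvqs);
}

static void cost_model_selfcheck(void)
{
    /* For n queues the per-notifier total is n * (n + 1) / 2. */
    assert(cost_commit_per_notifier(4) == 10);
    assert(cost_single_transaction(4) == 4);
}
```

In this model, 384 virtqueues cost 73920 visits with one commit per notifier
and 384 visits with a single transaction. The measured boot times depend on
much more than this count, so the model only shows the shape of the growth.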


