Re: [PATCH v3 00/14] scsi: add quirks and features to support m68k Macs

2022-07-13 Thread Mark Cave-Ayland

On 12/07/2022 15:48, Paolo Bonzini wrote:


Queued, thanks (I was on vacation last week).

I am a bit scared about the mode_select_truncated quirk.  My reading
of the code is that the MODE SELECT would fail anyway because the
page length does not match in scsi_disk_check_mode_select:

 len = mode_sense_page(s, page, &p, 0);
 if (len < 0 || len != expected_len) {
 return -1;
 }

Is that correct?  If not, I'm not sure where I am wrong.  If so,
I wonder if it is enough for the quirk to do just a "goto invalid_param;"
in place of invalid_param_len.


(goes and looks)

The truncated MODE SELECT request in question seems to be used only by the initial 
A/UX installation image and looks like this:


scsi_req_parsed target 3 lun 0 tag 0 command 21 dir 2 length 20
scsi_req_parsed_lba target 3 lun 0 tag 0 command 21 lba 0
scsi_req_alloc target 3 lun 0 tag 0
scsi_disk_new_request Command: lun=0 tag=0x0 data= 0x15 0x00 0x00 0x00 0x14 0x00
scsi_disk_emulate_command_MODE_SELECT Mode Select(6) (len 20)
scsi_req_continue target 3 lun 0 tag 0
scsi_disk_emulate_write_data Write buf_len=20
scsi_req_data target 3 lun 0 tag 0 len 20
scsi_req_continue target 3 lun 0 tag 0

This includes an 8 byte block descriptor which is used to set the CDROM sector size 
to 512 bytes and so the request content looks like:


4 bytes hdr_len for MODE SELECT(6)
8 bytes bd_len for the block descriptor
8 bytes of data for page 0 (MODE_PAGE_R_W_ERROR) data with page_len = 10

Stepping through mode_select_pages() in the debugger shows that since page_len is set 
correctly to 10 bytes (which matches the expected length in mode_sense_page()) the 
above check succeeds.


I suspect what happened is that the original author built the MODE_PAGE_R_W_ERROR 
page data correctly but miscalculated the length of the request in the CDB. Given 
that the truncated request is seemingly accepted on real hardware, no one actually 
noticed until now :)



ATB,

Mark.



Re: [PATCH 00/12] hw/nvme: misc fixes and updates

2022-07-13 Thread Klaus Jensen
On Jun 23 23:18, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> This series includes a couple of misc fixes as well as some cleanup
> pertaining to the aio handling in flush, dsm, copy and zone reset. As
> Jinhao gets around to iothread stuff, it might come in handy to have
> this stuff cleaned up a bit.
> 
> Dmitry's fix (nvme-next commit "hw/nvme: add missing return statement")
> for dsm prompted me to audit the flush, dsm, zone reset and copy code
> and that resulted in the discovery of some bugs and some general clean
> up.
> 
> Klaus Jensen (12):
>   hw/nvme: fix incorrect use of errp/local_err
>   hw/nvme: remove redundant passing of PCIDevice
>   hw/nvme: cleanup error reporting in nvme_init_pci()
>   hw/nvme: fix numzrwa handling
>   hw/nvme: fix accidental reintroduction of redundant code
>   hw/nvme: fix cancellation of format operations
>   hw/nvme: fix flush cancel
>   hw/nvme: rework flush bh scheduling
>   hw/nvme: improve cancellation handling in zone reset
>   hw/nvme: improve cancellation handling in dsm
>   hw/nvme: simplify copy command error handling
>   hw/nvme: align logic of format with flush
> 
>  hw/nvme/ctrl.c | 252 +++--
>  hw/nvme/ns.c   |   4 +-
>  2 files changed, 119 insertions(+), 137 deletions(-)
> 
> -- 
> 2.36.1
> 

Ping,

We are coming up on the 7.1 soft-freeze. Most of the above are fixes, so
they can go in after, but I'd like to get a couple of the other non-fix
patches in if possible ;)


signature.asc
Description: PGP signature


[PATCH] hw/nvme: add trace events for ioeventfd

2022-07-13 Thread Klaus Jensen
From: Klaus Jensen 

While testing Jinhao's ioeventfd patch I found it useful to have a couple of
additional trace events, since we no longer see the mmio events.

Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c   | 8 
 hw/nvme/trace-events | 4 
 2 files changed, 12 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 533ad14e7a61..09725ec49c5d 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1346,6 +1346,8 @@ static void nvme_post_cqes(void *opaque)
 bool pending = cq->head != cq->tail;
 int ret;
 
+trace_pci_nvme_post_cqes(cq->cqid);
+
 QTAILQ_FOREACH_SAFE(req, &cq->req_list, entry, next) {
 NvmeSQueue *sq;
 hwaddr addr;
@@ -4238,6 +4240,8 @@ static void nvme_cq_notifier(EventNotifier *e)
 NvmeCQueue *cq = container_of(e, NvmeCQueue, notifier);
 NvmeCtrl *n = cq->ctrl;
 
+trace_pci_nvme_cq_notify(cq->cqid);
+
 event_notifier_test_and_clear(&cq->notifier);
 
 nvme_update_cq_head(cq);
@@ -4275,6 +4279,8 @@ static void nvme_sq_notifier(EventNotifier *e)
 {
 NvmeSQueue *sq = container_of(e, NvmeSQueue, notifier);
 
+trace_pci_nvme_sq_notify(sq->sqid);
+
 event_notifier_test_and_clear(&sq->notifier);
 
 nvme_process_sq(sq);
@@ -6240,6 +6246,8 @@ static void nvme_process_sq(void *opaque)
 NvmeCtrl *n = sq->ctrl;
 NvmeCQueue *cq = n->cq[sq->cqid];
 
+trace_pci_nvme_process_sq(sq->sqid);
+
 uint16_t status;
 hwaddr addr;
 NvmeCmd cmd;
diff --git a/hw/nvme/trace-events b/hw/nvme/trace-events
index fccb79f48973..45dd708bd2fa 100644
--- a/hw/nvme/trace-events
+++ b/hw/nvme/trace-events
@@ -104,6 +104,10 @@ pci_nvme_mmio_shutdown_set(void) "shutdown bit set"
 pci_nvme_mmio_shutdown_cleared(void) "shutdown bit cleared"
 pci_nvme_shadow_doorbell_cq(uint16_t cqid, uint16_t new_shadow_doorbell) "cqid %"PRIu16" new_shadow_doorbell %"PRIu16""
 pci_nvme_shadow_doorbell_sq(uint16_t sqid, uint16_t new_shadow_doorbell) "sqid %"PRIu16" new_shadow_doorbell %"PRIu16""
+pci_nvme_sq_notify(uint16_t sqid) "sqid %"PRIu16""
+pci_nvme_cq_notify(uint16_t cqid) "cqid %"PRIu16""
+pci_nvme_process_sq(uint16_t sqid) "sqid %"PRIu16""
+pci_nvme_post_cqes(uint16_t cqid) "cqid %"PRIu16""
 pci_nvme_open_zone(uint64_t slba, uint32_t zone_idx, int all) "open zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
 pci_nvme_close_zone(uint64_t slba, uint32_t zone_idx, int all) "close zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
 pci_nvme_finish_zone(uint64_t slba, uint32_t zone_idx, int all) "finish zone, slba=%"PRIu64", idx=%"PRIu32", all=%"PRIi32""
-- 
2.36.1




Re: [PULL 00/35] Block patches

2022-07-13 Thread Peter Maydell
On Tue, 12 Jul 2022 at 19:10, Hanna Reitz  wrote:
>
> The following changes since commit 9548cbed4253e38570d29b8cff0bf77c998f:
>
>   iotests/copy-before-write: specify required_fmts (2022-07-12 13:21:02 +0530)
>
> are available in the Git repository at:
>
>   https://gitlab.com/hreitz/qemu.git tags/pull-block-2022-07-12
>
> for you to fetch changes up to 85c4bf8aa6c93c24876e8870ae7cf8ab2e5a96cf:
>
>   vl: Unlink absolute PID file path (2022-07-12 14:31:15 +0200)
>
> 
> Block patches:
> - Refactoring for non-coroutine variants of bdrv/blk_co_* functions:
>   Auto-generate more of them with the block coroutine wrapper generator
>   script
> - iotest fixes
> - Both for the storage daemon and the system emulator: Fix PID file
>   handling when daemonizing (store the absolute path and delete that on
>   exit, which is necessary because daemonizing will change the working
>   directory to /)
>
> 



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.1
for any user-visible changes.

-- PMM



Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Keith Busch
On Wed, Jul 13, 2022 at 09:11:41PM +0200, Mauricio Sandt wrote:
> On 13/07/2022 20:48, Keith Busch wrote:
> > I guess I'm missing the bigger picture here. You are supposed to be able to
> > retrieve these fields with ioctl's, so not sure what this has to do with
> > malware. Why does the firmware revision matter to this program?
> Oh I'm sorry, I forgot to explain properly. Malware usually checks if it is
> being run in a sandbox environment like a VM, and if it detects such a
> sandbox, it doesn't run or doesn't unleash its full potential. This makes my
> life as a researcher much harder.
> 
> Hiding the VM by overriding the model, firmware, and nqn strings to either
> random values or names of existing hardware in the hypervisor is a much
> cleaner solution than intercepting the IOCTLs in the VM and changing the
> result with a kernel driver.

IIUC, this program is trying to avoid being studied, and uses indicators like
nvme firmware to help determine if it is running in such an environment. If so,
I suspect defeating all possible indicators will be a fun and time consuming
process. :)



Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Mauricio Sandt

On 13/07/2022 20:48, Keith Busch wrote:

I guess I'm missing the bigger picture here. You are supposed to be able to
retrieve these fields with ioctl's, so not sure what this has to do with
malware. Why does the firmware revision matter to this program?

Oh I'm sorry, I forgot to explain properly. Malware usually checks if it is
being run in a sandbox environment like a VM, and if it detects such a
sandbox, it doesn't run or doesn't unleash its full potential. This makes my
life as a researcher much harder.

Hiding the VM by overriding the model, firmware, and nqn strings to either
random values or names of existing hardware in the hypervisor is a much
cleaner solution than intercepting the IOCTLs in the VM and changing the
result with a kernel driver.



Older qemu nvme wasn't reliably generating unique identifiers, so linux quirks
them to ignore. They are reliable now, so the quirk can be changed to firmware
specific when the version number updates with the next release.

If I understand you correctly, that means that changing the model and
firmware version to something that doesn't identify the device as a
qemu device would work just fine provided that you're running a recent
qemu version?



Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Keith Busch
On Wed, Jul 13, 2022 at 08:06:26PM +0200, Mauricio Sandt wrote:
> My specific use case that required this patch is a piece of malware that used
> several IOCTLs to read model, firmware, and nqn from the NVMe attached to the
> VM. Modifying that info at the hypervisor level was a much better approach
> than loading an (unsigned) driver into windows to intercept said IOCTLs.
> Having this patch in the official qemu version would help me a lot in my test
> lab, and I'm pretty sure it would also help other people.

I guess I'm missing the bigger picture here. You are supposed to be able to
retrieve these fields with ioctl's, so not sure what this has to do with
malware. Why does the firmware revision matter to this program?
 
> I guess there could be a warning about potential problems with drivers in the
> description for model, firmware, and nqn, but I haven't experienced any
> issues when changing the values (for all of them). If you encountered any
> problems, how did they manifest?

Older qemu nvme wasn't reliably generating unique identifiers, so linux quirks
them to ignore. They are reliable now, so the quirk can be changed to firmware
specific when the version number updates with the next release.



Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Mauricio Sandt

I want to argue the other way around: why shouldn't those values be tunable
by the user? You are right that, if misconfigured, this could potentially
break stuff on the driver side, but unless you manually set values for model
and firmware, the default is used (just like it is now), so this patch
wouldn't break any existing setups.

In my opinion, having more freedom in the values used for model names and
similar is much better than hardcoding them. The previously hardcoded values
should remain the default if no custom value is provided.

My specific use case that required this patch is a piece of malware that used
several IOCTLs to read model, firmware, and nqn from the NVMe attached to
the VM. Modifying that info at the hypervisor level was a much better
approach than loading an (unsigned) driver into Windows to intercept said
IOCTLs. Having this patch in the official qemu version would help me a lot
in my test lab, and I'm pretty sure it would also help other people.

I guess there could be a warning about potential problems with drivers in
the description for model, firmware, and nqn, but I haven't experienced any
issues when changing the values (for all of them). If you encountered any
problems, how did they manifest?

On 13/07/2022 19:03, Keith Busch wrote:

On Sun, Jun 12, 2022 at 12:35:09AM +0200, Mauricio Sandt wrote:

This small patch is the result of some recent malware research I did
in a QEMU VM. The malware used multiple ways of querying info from
the VM disk and I needed a clean way to change those values from the
hypervisor.

I believe this functionality could be useful to more people from multiple
fields; sometimes you just want to change some default values, and having
them hardcoded in the source code makes that much harder.

This patch adds three config parameters to the nvme device, all of them
are optional to not break anything. If any of them are not specified,
the previous (before this patch) default is used.

-model - This takes a string and sets it as the device's model name.
If you don't specify this parameter, the default is "QEMU NVMe Ctrl".

-firmware - The firmware version string, max 8 ascii characters.
The default is whatever `QEMU_VERSION` evaluates to.

-nqn_override - Allows setting a custom nqn on the nvme device.
Only used if there is no subsystem. This string should be in the same
format as the default "nqn.2019-08.org.qemu:...", but there is no
validation for that. It's up to the user to provide a valid string.

I guess the nqn can be user tunable just like it is when used with subsystems,
but what's the point of messing with model and firmware? That could mess with
host drivers' ability to detect what quirks it needs to apply to specific
instances of this virtual controller.





Re: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Keith Busch
On Sun, Jun 12, 2022 at 12:35:09AM +0200, Mauricio Sandt wrote:
> This small patch is the result of some recent malware research I did
> in a QEMU VM. The malware used multiple ways of querying info from
> the VM disk and I needed a clean way to change those values from the
> hypervisor.
> 
> I believe this functionality could be useful to more people from multiple
> fields; sometimes you just want to change some default values, and having
> them hardcoded in the source code makes that much harder.
> 
> This patch adds three config parameters to the nvme device, all of them
> are optional to not break anything. If any of them are not specified,
> the previous (before this patch) default is used.
> 
> -model - This takes a string and sets it as the device's model name.
> If you don't specify this parameter, the default is "QEMU NVMe Ctrl".
> 
> -firmware - The firmware version string, max 8 ascii characters.
> The default is whatever `QEMU_VERSION` evaluates to.
> 
> -nqn_override - Allows setting a custom nqn on the nvme device.
> Only used if there is no subsystem. This string should be in the same
> format as the default "nqn.2019-08.org.qemu:...", but there is no
> validation for that. It's up to the user to provide a valid string.

I guess the nqn can be user tunable just like it is when used with subsystems,
but what's the point of messing with model and firmware? That could mess with
host drivers' ability to detect what quirks it needs to apply to specific
instances of this virtual controller.



Ping: [PATCH] hw/nvme: Add options to override hardcoded values

2022-07-13 Thread Mauricio Sandt

https://patchew.org/QEMU/20220611223509.32280-1-mauri...@mailbox.org/
https://lore.kernel.org/qemu-devel/20220611223509.32280-1-mauri...@mailbox.org/

On 12/06/2022 00:35, Mauricio Sandt wrote:

This small patch is the result of some recent malware research I did
in a QEMU VM. The malware used multiple ways of querying info from
the VM disk and I needed a clean way to change those values from the
hypervisor.

I believe this functionality could be useful to more people from multiple
fields; sometimes you just want to change some default values, and having
them hardcoded in the source code makes that much harder.

This patch adds three config parameters to the nvme device, all of them
are optional to not break anything. If any of them are not specified,
the previous (before this patch) default is used.

-model - This takes a string and sets it as the device's model name.
If you don't specify this parameter, the default is "QEMU NVMe Ctrl".

-firmware - The firmware version string, max 8 ascii characters.
The default is whatever `QEMU_VERSION` evaluates to.

-nqn_override - Allows setting a custom nqn on the nvme device.
Only used if there is no subsystem. This string should be in the same
format as the default "nqn.2019-08.org.qemu:...", but there is no
validation for that. It's up to the user to provide a valid string.

Signed-off-by: Mauricio Sandt
---
  hw/nvme/ctrl.c | 16 +---
  hw/nvme/nvme.h |  3 +++
  2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index 1e6e0fcad9..0e67217a63 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -6697,8 +6697,13 @@ static void nvme_init_subnqn(NvmeCtrl *n)
  NvmeIdCtrl *id = &n->id_ctrl;
  
  if (!subsys) {

-snprintf((char *)id->subnqn, sizeof(id->subnqn),
+if (n->params.nqn_override) {
+snprintf((char *)id->subnqn, sizeof(id->subnqn),
+ "%s", n->params.nqn_override);
+} else {
+snprintf((char *)id->subnqn, sizeof(id->subnqn),
   "nqn.2019-08.org.qemu:%s", n->params.serial);
+}
  } else {
  pstrcpy((char *)id->subnqn, sizeof(id->subnqn), (char*)subsys->subnqn);
  }
@@ -6712,8 +6717,10 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
  
  id->vid = cpu_to_le16(pci_get_word(pci_conf + PCI_VENDOR_ID));

  id->ssvid = cpu_to_le16(pci_get_word(pci_conf + PCI_SUBSYSTEM_VENDOR_ID));
-strpadcpy((char *)id->mn, sizeof(id->mn), "QEMU NVMe Ctrl", ' ');
-strpadcpy((char *)id->fr, sizeof(id->fr), QEMU_VERSION, ' ');
+strpadcpy((char *)id->mn, sizeof(id->mn),
+n->params.model ? n->params.model : "QEMU NVMe Ctrl", ' ');
+strpadcpy((char *)id->fr, sizeof(id->fr),
+n->params.firmware ? n->params.firmware : QEMU_VERSION, ' ');
  strpadcpy((char *)id->sn, sizeof(id->sn), n->params.serial, ' ');
  
  id->cntlid = cpu_to_le16(n->cntlid);

@@ -6913,6 +6920,9 @@ static Property nvme_props[] = {
  DEFINE_PROP_LINK("subsys", NvmeCtrl, subsys, TYPE_NVME_SUBSYS,
   NvmeSubsystem *),
  DEFINE_PROP_STRING("serial", NvmeCtrl, params.serial),
+DEFINE_PROP_STRING("model", NvmeCtrl, params.model),
+DEFINE_PROP_STRING("nqn_override", NvmeCtrl, params.nqn_override),
+DEFINE_PROP_STRING("firmware", NvmeCtrl, params.firmware),
  DEFINE_PROP_UINT32("cmb_size_mb", NvmeCtrl, params.cmb_size_mb, 0),
  DEFINE_PROP_UINT32("num_queues", NvmeCtrl, params.num_queues, 0),
  DEFINE_PROP_UINT32("max_ioqpairs", NvmeCtrl, params.max_ioqpairs, 64),
diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index e41771604f..45bcf3e02e 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -394,6 +394,9 @@ typedef struct NvmeCQueue {
  
  typedef struct NvmeParams {

  char *serial;
+char *model;
+char *firmware;
+char *nqn_override;
  uint32_t num_queues; /* deprecated since 5.1 */
  uint32_t max_ioqpairs;
  uint16_t msix_qsize;


Re: [RFC v3 3/8] block: pass size to bdrv_unregister_buf()

2022-07-13 Thread Hanna Reitz

On 08.07.22 06:17, Stefan Hajnoczi wrote:

The only implementor of bdrv_register_buf() is block/nvme.c, where the
size is not needed when unregistering a buffer. This is because
util/vfio-helpers.c can look up mappings by address.

Future block drivers that implement bdrv_register_buf() may not be able
to do their job given only the buffer address. Add a size argument to
bdrv_unregister_buf().

Also document the assumptions about
bdrv_register_buf()/bdrv_unregister_buf() calls. The same 
values that were given to bdrv_register_buf() must be given to
bdrv_unregister_buf().

gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local
variable might be uninitialized, so it's necessary to silence the
compiler.

Signed-off-by: Stefan Hajnoczi 
---
  include/block/block-global-state.h  | 5 -
  include/block/block_int-common.h| 2 +-
  include/sysemu/block-backend-global-state.h | 2 +-
  block/block-backend.c   | 4 ++--
  block/io.c  | 6 +++---
  block/nvme.c| 2 +-
  qemu-img.c  | 4 ++--
  7 files changed, 14 insertions(+), 11 deletions(-)


Reviewed-by: Hanna Reitz 




Re: [RFC v3 1/8] blkio: add io_uring block driver using libblkio

2022-07-13 Thread Hanna Reitz

On 08.07.22 06:17, Stefan Hajnoczi wrote:

libblkio (https://gitlab.com/libblkio/libblkio/) is a library for
high-performance disk I/O. It currently supports io_uring and
virtio-blk-vhost-vdpa with additional drivers under development.

One of the reasons for developing libblkio is that other applications
besides QEMU can use it. This will be particularly useful for
vhost-user-blk which applications may wish to use for connecting to
qemu-storage-daemon.

libblkio also gives us an opportunity to develop in Rust behind a C API
that is easy to consume from QEMU.

This commit adds io_uring and virtio-blk-vhost-vdpa BlockDrivers to QEMU
using libblkio. It will be easy to add other libblkio drivers since they
will share the majority of code.

For now I/O buffers are copied through bounce buffers if the libblkio
driver requires it. Later commits add an optimization for
pre-registering guest RAM to avoid bounce buffers.

The syntax is:

   --blockdev io_uring,node-name=drive0,filename=test.img,readonly=on|off,cache.direct=on|off

and:

   --blockdev virtio-blk-vhost-vdpa,node-name=drive0,path=/dev/vdpa...,readonly=on|off

Signed-off-by: Stefan Hajnoczi 
---
  MAINTAINERS   |   6 +
  meson_options.txt |   2 +
  qapi/block-core.json  |  37 +-
  meson.build   |   9 +
  block/blkio.c | 659 ++
  tests/qtest/modules-test.c|   3 +
  block/meson.build |   1 +
  scripts/meson-buildoptions.sh |   3 +
  8 files changed, 718 insertions(+), 2 deletions(-)
  create mode 100644 block/blkio.c


[...]


diff --git a/block/blkio.c b/block/blkio.c
new file mode 100644
index 00..7fbdbd7fae
--- /dev/null
+++ b/block/blkio.c
@@ -0,0 +1,659 @@


Not sure whether it’s necessary, but I would have expected a copyright 
header here.



+#include "qemu/osdep.h"
+#include 
+#include "block/block_int.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qemu/module.h"
+
+typedef struct BlkAIOCB {
+BlockAIOCB common;
+struct blkio_mem_region mem_region;
+QEMUIOVector qiov;
+struct iovec bounce_iov;
+} BlkioAIOCB;
+
+typedef struct {
+/* Protects ->blkio and request submission on ->blkioq */
+QemuMutex lock;
+
+struct blkio *blkio;
+struct blkioq *blkioq; /* this could be multi-queue in the future */
+int completion_fd;
+
+/* Polling fetches the next completion into this field */
+struct blkio_completion poll_completion;
+
+/* The value of the "mem-region-alignment" property */
+size_t mem_region_alignment;
+
+/* Can we skip adding/deleting blkio_mem_regions? */
+bool needs_mem_regions;
+} BDRVBlkioState;
+
+static void blkio_aiocb_complete(BlkioAIOCB *acb, int ret)
+{
+/* Copy bounce buffer back to qiov */
+if (acb->qiov.niov > 0) {
+qemu_iovec_from_buf(&acb->qiov, 0,
+acb->bounce_iov.iov_base,
+acb->bounce_iov.iov_len);
+qemu_iovec_destroy(&acb->qiov);
+}
+
+acb->common.cb(acb->common.opaque, ret);
+
+if (acb->mem_region.len > 0) {
+BDRVBlkioState *s = acb->common.bs->opaque;
+
+WITH_QEMU_LOCK_GUARD(&s->lock) {
+blkio_free_mem_region(s->blkio, &acb->mem_region);
+}
+}
+
+qemu_aio_unref(&acb->common);
+}
+
+/*
+ * Only the thread that calls aio_poll() invokes fd and poll handlers.
+ * Therefore locks are not necessary except when accessing s->blkio.
+ *
+ * No locking is performed around blkioq_get_completions() although other
+ * threads may submit I/O requests on s->blkioq. We're assuming there is no
+ * interference between blkioq_get_completions() and other s->blkioq APIs.
+ */
+
+static void blkio_completion_fd_read(void *opaque)
+{
+BlockDriverState *bs = opaque;
+BDRVBlkioState *s = bs->opaque;
+struct blkio_completion completion;
+uint64_t val;
+ssize_t ret __attribute__((unused));


I’d prefer a `(void)ret;` over this attribute, not least because that 
line would give a nice opportunity to explain in a short comment why we 
ignore this return value that the compiler tells us not to ignore, but 
if you don’t, then this’ll be fine.



+
+/* Polling may have already fetched a completion */
+if (s->poll_completion.user_data != NULL) {
+completion = s->poll_completion;
+
+/* Clear it in case blkio_aiocb_complete() has a nested event loop */
+s->poll_completion.user_data = NULL;
+
+blkio_aiocb_complete(completion.user_data, completion.ret);
+}
+
+/* Reset completion fd status */
+ret = read(s->completion_fd, &val, sizeof(val));
+
+/*
+ * Reading one completion at a time makes nested event loop re-entrancy
+ * simple. Change this loop to get multiple completions in one go if it
+ * becomes a performance bottleneck.
+ */
+while (blkioq_do_io(s->blkioq, &completion, 0, 1, NULL) == 1) {
+blkio_aiocb_complete(completion.user_data, complet