Re: x-blockdev-reopen & block-dirty-bitmaps

2020-02-14 Thread John Snow



On 2/14/20 3:19 PM, Kevin Wolf wrote:
> Am 14.02.2020 um 19:54 hat John Snow geschrieben:
>> Hi, what work remains to make this a stable interface, is it known?
>>
>> We're having a problem with bitmaps where in some cases libvirt wants to
>> commit an image back down to a base image but -- for various reasons --
>> the bitmap was left enabled in the backing image, so it would accrue new
>> writes during the commit.
>>
>> Normally, when creating a snapshot using blockdev-snapshot, the backing
>> file becomes RO and all of the bitmaps become RO too.
>>
>> The goal is to be able to disable (or enable) bitmaps from a backing
>> file before (or atomically just before) a commit operation to allow
>> libvirt greater control on snapshot commit.
>>
>> Now, in my own testing, we can reopen a backing file just fine, delete
>> or disable a bitmap and be done with it -- but the interface isn't
>> stable, so libvirt will be reluctant to use such tricks.
>>
>> Probably a loaded question, but:
>>
>> - What's needed to make the interface stable?
>> - Are there known problem points?
>> - Any suggestions for workarounds in the meantime?
> 
> I think I've asked this before, but I don't remember the answer...
> 
> What would be the problem with letting the enable/disable command
> temporarily reopen the backing file read-write, like the commit job [1]
> does?
> 

I guess no problem? I wouldn't want it to do this automatically, but
perhaps we could add a "force=True" bool where it tries to do just this.

(And once reopen works properly we could deprecate this workaround again.)

> Kevin
> 
> [1] I mean, I know that this code is wrong strictly speaking because we
> really should be counting read-write users [2] rather than blindly
> making the node read-only at the end of the operation - but somehow
> it seems to work in practice for commit jobs.
> 
> [2] Counting really means just looking at parent BdrvChild links that
> have WRITE permissions. I guess doing it right would mean getting
> rid of BlockDriverState.read_only (which is redundant) and using
> only permissions.
> 

OK, sounds good. I'll make a mockup that tries to accurately detect
read-only-ness and reverts changes only if it made any to begin with.

--js




Re: x-blockdev-reopen & block-dirty-bitmaps

2020-02-14 Thread Kevin Wolf
Am 14.02.2020 um 19:54 hat John Snow geschrieben:
> Hi, what work remains to make this a stable interface, is it known?
> 
> We're having a problem with bitmaps where in some cases libvirt wants to
> commit an image back down to a base image but -- for various reasons --
> the bitmap was left enabled in the backing image, so it would accrue new
> writes during the commit.
> 
> Normally, when creating a snapshot using blockdev-snapshot, the backing
> file becomes RO and all of the bitmaps become RO too.
> 
> The goal is to be able to disable (or enable) bitmaps from a backing
> file before (or atomically just before) a commit operation to allow
> libvirt greater control on snapshot commit.
> 
> Now, in my own testing, we can reopen a backing file just fine, delete
> or disable a bitmap and be done with it -- but the interface isn't
> stable, so libvirt will be reluctant to use such tricks.
> 
> Probably a loaded question, but:
> 
> - What's needed to make the interface stable?
> - Are there known problem points?
> - Any suggestions for workarounds in the meantime?

I think I've asked this before, but I don't remember the answer...

What would be the problem with letting the enable/disable command
temporarily reopen the backing file read-write, like the commit job [1]
does?

Kevin

[1] I mean, I know that this code is wrong strictly speaking because we
really should be counting read-write users [2] rather than blindly
making the node read-only at the end of the operation - but somehow
it seems to work in practice for commit jobs.

[2] Counting really means just looking at parent BdrvChild links that
have WRITE permissions. I guess doing it right would mean getting
rid of BlockDriverState.read_only (which is redundant) and using
only permissions.




[PATCH 6/7] commit: Expose on-error option in QMP

2020-02-14 Thread Kevin Wolf
Now that the error handling in the common block job is fixed, we can
expose the on-error option in QMP instead of hard-coding it as 'report'
in qmp_block_commit().

This fulfills the promise that the old comment in that function made,
even if a bit later than expected: "This will be part of the QMP
command, if/when the BlockdevOnError change for blkmirror makes it in".

Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 4 
 blockdev.c   | 8 
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 395d205fa8..c69aece46e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1659,6 +1659,9 @@
 #
 # @speed:  the maximum speed, in bytes per second
 #
+# @on-error: the action to take on an error. 'ignore' means that the request
+#should be retried. (default: report; Since: 5.0)
+#
 # @filter-node-name: the node name that should be assigned to the
 #filter driver that the commit job inserts into the graph
 #above @top. If this option is not given, a node name is
@@ -1695,6 +1698,7 @@
   'data': { '*job-id': 'str', 'device': 'str', '*base-node': 'str',
 '*base': 'str', '*top-node': 'str', '*top': 'str',
 '*backing-file': 'str', '*speed': 'int',
+'*on-error': 'BlockdevOnError',
 '*filter-node-name': 'str',
 '*auto-finalize': 'bool', '*auto-dismiss': 'bool' } }
 
diff --git a/blockdev.c b/blockdev.c
index c6a727cca9..374189a426 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3471,6 +3471,7 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
   bool has_top, const char *top,
   bool has_backing_file, const char *backing_file,
   bool has_speed, int64_t speed,
+  bool has_on_error, BlockdevOnError on_error,
   bool has_filter_node_name, const char *filter_node_name,
   bool has_auto_finalize, bool auto_finalize,
   bool has_auto_dismiss, bool auto_dismiss,
@@ -3481,15 +3482,14 @@ void qmp_block_commit(bool has_job_id, const char 
*job_id, const char *device,
 BlockDriverState *base_bs, *top_bs;
 AioContext *aio_context;
 Error *local_err = NULL;
-/* This will be part of the QMP command, if/when the
- * BlockdevOnError change for blkmirror makes it in
- */
-BlockdevOnError on_error = BLOCKDEV_ON_ERROR_REPORT;
 int job_flags = JOB_DEFAULT;
 
 if (!has_speed) {
 speed = 0;
 }
+if (!has_on_error) {
+on_error = BLOCKDEV_ON_ERROR_REPORT;
+}
 if (!has_filter_node_name) {
 filter_node_name = NULL;
 }
-- 
2.20.1




[PATCH 4/7] commit: Inline commit_populate()

2020-02-14 Thread Kevin Wolf
commit_populate() is a very short function and only called in a single
place. Its return value doesn't tell us whether an error happened while
reading or writing, which would be necessary for sending the right data
in the BLOCK_JOB_ERROR QMP event.

Signed-off-by: Kevin Wolf 
---
 block/commit.c | 28 ++--
 1 file changed, 6 insertions(+), 22 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 8189f079d2..2992a1012f 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -43,27 +43,6 @@ typedef struct CommitBlockJob {
 char *backing_file_str;
 } CommitBlockJob;
 
-static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
-int64_t offset, uint64_t bytes,
-void *buf)
-{
-int ret = 0;
-
-assert(bytes < SIZE_MAX);
-
-ret = blk_co_pread(bs, offset, bytes, buf, 0);
-if (ret < 0) {
-return ret;
-}
-
-ret = blk_co_pwrite(base, offset, bytes, buf, 0);
-if (ret < 0) {
-return ret;
-}
-
-return 0;
-}
-
 static int commit_prepare(Job *job)
 {
 CommitBlockJob *s = container_of(job, CommitBlockJob, common.job);
@@ -178,7 +157,12 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 copy = (ret == 1);
 trace_commit_one_iteration(s, offset, n, ret);
 if (copy) {
-ret = commit_populate(s->top, s->base, offset, n, buf);
+assert(n < SIZE_MAX);
+
+ret = blk_co_pread(s->top, offset, n, buf, 0);
+if (ret >= 0) {
+ret = blk_co_pwrite(s->base, offset, n, buf, 0);
+}
 }
 if (ret < 0) {
 BlockErrorAction action =
-- 
2.20.1




[PATCH 2/7] commit: Remove unused bytes_written

2020-02-14 Thread Kevin Wolf
The bytes_written variable is only ever written to, it serves no
purpose. This has actually been the case since the commit job was first
introduced in commit 747ff602636.

Signed-off-by: Kevin Wolf 
---
 block/commit.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 23c90b3b91..cce898a4f3 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -140,7 +140,6 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 int ret = 0;
 int64_t n = 0; /* bytes */
 void *buf = NULL;
-int bytes_written = 0;
 int64_t len, base_len;
 
 ret = len = blk_getlength(s->top);
@@ -180,7 +179,6 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 trace_commit_one_iteration(s, offset, n, ret);
 if (copy) {
 ret = commit_populate(s->top, s->base, offset, n, buf);
-bytes_written += n;
 }
 if (ret < 0) {
 BlockErrorAction action =
-- 
2.20.1




[PATCH 5/7] commit: Fix is_read for block_job_error_action()

2020-02-14 Thread Kevin Wolf
block_job_error_action() needs to know if reading from the top node or
writing to the base node failed so that it can set the right 'operation'
in the BLOCK_JOB_ERROR QMP event.

Signed-off-by: Kevin Wolf 
---
 block/commit.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/commit.c b/block/commit.c
index 2992a1012f..8e672799af 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -143,6 +143,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 
 for (offset = 0; offset < len; offset += n) {
 bool copy;
+bool error_in_source = true;
 
 /* Note that even when no rate limit is applied we need to yield
  * with no pending I/O here so that bdrv_drain_all() returns.
@@ -162,11 +163,15 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 ret = blk_co_pread(s->top, offset, n, buf, 0);
 if (ret >= 0) {
 ret = blk_co_pwrite(s->base, offset, n, buf, 0);
+if (ret < 0) {
+error_in_source = false;
+}
 }
 }
 if (ret < 0) {
 BlockErrorAction action =
-block_job_error_action(&s->common, s->on_error, false, -ret);
+block_job_error_action(&s->common, s->on_error,
+   error_in_source, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT) {
 goto out;
 } else {
-- 
2.20.1




[PATCH 7/7] iotests: Test error handling policies with block-commit

2020-02-14 Thread Kevin Wolf
This tests both read failure (from the top node) and write failure (to
the base node) for on-error=report/stop/ignore.

As block-commit actually starts two different types of block jobs
(mirror.c for committing the active later, commit.c for intermediate
layers), all tests are run for both cases.

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/040 | 283 +
 tests/qemu-iotests/040.out |   4 +-
 2 files changed, 285 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 2e7ee0e84f..32c82b4ec6 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -430,6 +430,289 @@ class TestReopenOverlay(ImageCommitTestCase):
 def test_reopen_overlay(self):
 self.run_commit_test(self.img1, self.img0)
 
+class TestErrorHandling(iotests.QMPTestCase):
+image_len = 2 * 1024 * 1024
+
+def setUp(self):
+iotests.create_image(backing_img, self.image_len)
+qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
backing_img, mid_img)
+qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
mid_img, test_img)
+
+qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x11 0 512k', mid_img)
+qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0x22 0 512k', test_img)
+
+self.vm = iotests.VM()
+self.vm.launch()
+
+self.blkdebug_file = iotests.file_path("blkdebug.conf")
+
+def tearDown(self):
+self.vm.shutdown()
+os.remove(test_img)
+os.remove(mid_img)
+os.remove(backing_img)
+
+def blockdev_add(self, **kwargs):
+result = self.vm.qmp('blockdev-add', **kwargs)
+self.assert_qmp(result, 'return', {})
+
+def add_block_nodes(self, base_debug=None, mid_debug=None, top_debug=None):
+self.blockdev_add(node_name='base-file', driver='file',
+  filename=backing_img)
+self.blockdev_add(node_name='mid-file', driver='file',
+  filename=mid_img)
+self.blockdev_add(node_name='top-file', driver='file',
+  filename=test_img)
+
+if base_debug:
+self.blockdev_add(node_name='base-dbg', driver='blkdebug',
+  image='base-file', inject_error=base_debug)
+if mid_debug:
+self.blockdev_add(node_name='mid-dbg', driver='blkdebug',
+  image='mid-file', inject_error=mid_debug)
+if top_debug:
+self.blockdev_add(node_name='top-dbg', driver='blkdebug',
+  image='top-file', inject_error=top_debug)
+
+self.blockdev_add(node_name='base-fmt', driver='raw',
+  file=('base-dbg' if base_debug else 'base-file'))
+self.blockdev_add(node_name='mid-fmt', driver=iotests.imgfmt,
+  file=('mid-dbg' if mid_debug else 'mid-file'),
+  backing='base-fmt')
+self.blockdev_add(node_name='top-fmt', driver=iotests.imgfmt,
+  file=('top-dbg' if top_debug else 'top-file'),
+  backing='mid-fmt')
+
+def run_job(self, expected_events, error_pauses_job=False):
+match_device = {'data': {'device': 'job0'}}
+events = [
+('BLOCK_JOB_COMPLETED', match_device),
+('BLOCK_JOB_CANCELLED', match_device),
+('BLOCK_JOB_ERROR', match_device),
+('BLOCK_JOB_READY', match_device),
+]
+
+completed = False
+log = []
+while not completed:
+ev = self.vm.events_wait(events, timeout=5.0)
+if ev['event'] == 'BLOCK_JOB_COMPLETED':
+completed = True
+elif ev['event'] == 'BLOCK_JOB_ERROR':
+if error_pauses_job:
+result = self.vm.qmp('block-job-resume', device='job0')
+self.assert_qmp(result, 'return', {})
+elif ev['event'] == 'BLOCK_JOB_READY':
+result = self.vm.qmp('block-job-complete', device='job0')
+self.assert_qmp(result, 'return', {})
+else:
+self.fail("Unexpected event: %s" % ev)
+log.append(iotests.filter_qmp_event(ev))
+
+self.maxDiff = None
+self.assertEqual(expected_events, log)
+
+def event_error(self, op, action):
+return {
+'event': 'BLOCK_JOB_ERROR',
+'data': {'action': action, 'device': 'job0', 'operation': op},
+'timestamp': {'microseconds': 'USECS', 'seconds': 'SECS'}
+}
+
+def event_ready(self):
+return {
+'event': 'BLOCK_JOB_READY',
+'data': {'device': 'job0',
+ 'len': 524288,
+ 'offset': 524288,
+ 'speed': 0,
+ 'type': 'commit'},
+'timestamp': {'microseconds': 'USECS', 'seconds': 'SECS'},
+}

[PATCH 3/7] commit: Fix argument order for block_job_error_action()

2020-02-14 Thread Kevin Wolf
The block_job_error_action() error call in the commit job gives the
on_err and is_read arguments in the wrong order. Fix this.

(Of course, hard-coded is_read = false is wrong, too, but that's a
separate problem for a separate patch.)

Signed-off-by: Kevin Wolf 
---
 block/commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/commit.c b/block/commit.c
index cce898a4f3..8189f079d2 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -182,7 +182,7 @@ static int coroutine_fn commit_run(Job *job, Error **errp)
 }
 if (ret < 0) {
 BlockErrorAction action =
-block_job_error_action(&s->common, false, s->on_error, -ret);
+block_job_error_action(&s->common, s->on_error, false, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT) {
 goto out;
 } else {
-- 
2.20.1




[PATCH 0/7] commit: Expose on-error option in QMP

2020-02-14 Thread Kevin Wolf
All other block jobs already provide a QMP option to control the error
handling policy. It's time to add it to commit, too.

Peter, I guess libvirt wants to expose this?

Nir, as you would probably assume, this is motivated by the recent
finding that VDSM has to preallocate the theoretical maximum for commit
operations because QEMU doesn't provide a way to reliably resize the
base image dynamically. If you ever want to improve that code, this will
enable you to do so from the QEMU side.

Kevin Wolf (7):
  qapi: Document meaning of 'ignore' BlockdevOnError for jobs
  commit: Remove unused bytes_written
  commit: Fix argument order for block_job_error_action()
  commit: Inline commit_populate()
  commit: Fix is_read for block_job_error_action()
  commit: Expose on-error option in QMP
  iotests: Test error handling policies with block-commit

 qapi/block-core.json   |   9 +-
 block/commit.c |  37 ++---
 blockdev.c |   8 +-
 tests/qemu-iotests/040 | 283 +
 tests/qemu-iotests/040.out |   4 +-
 5 files changed, 309 insertions(+), 32 deletions(-)

-- 
2.20.1




[PATCH 1/7] qapi: Document meaning of 'ignore' BlockdevOnError for jobs

2020-02-14 Thread Kevin Wolf
It is not obvious what 'ignore' actually means for block jobs: It could
be continuing the job and returning success in the end despite the error
(no block job does this). It could also mean continuing and returning
failure in the end (this is what stream does). And it can mean retrying
the failed request later (this is what backup, commit and mirror do).

This (somewhat inconsistent) behaviour was introduced and described for
stream and mirror in commit ae586d6158. backup and commit were
introduced later and use the same model as mirror.

Signed-off-by: Kevin Wolf 
---
 qapi/block-core.json | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index ef94a29686..395d205fa8 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1164,7 +1164,10 @@
 #  for jobs, cancel the job
 #
 # @ignore: ignore the error, only report a QMP event (BLOCK_IO_ERROR
-#  or BLOCK_JOB_ERROR)
+#  or BLOCK_JOB_ERROR). The backup, mirror and commit block jobs retry
+#  the failing request later and may still complete successfully. The
+#  stream block job continues to stream and will complete with an
+#  error.
 #
 # @enospc: same as @stop on ENOSPC, same as @report otherwise.
 #
-- 
2.20.1




x-blockdev-reopen & block-dirty-bitmaps

2020-02-14 Thread John Snow
Hi, what work remains to make this a stable interface, is it known?

We're having a problem with bitmaps where in some cases libvirt wants to
commit an image back down to a base image but -- for various reasons --
the bitmap was left enabled in the backing image, so it would accrue new
writes during the commit.

Normally, when creating a snapshot using blockdev-snapshot, the backing
file becomes RO and all of the bitmaps become RO too.

The goal is to be able to disable (or enable) bitmaps from a backing
file before (or atomically just before) a commit operation to allow
libvirt greater control on snapshot commit.

Now, in my own testing, we can reopen a backing file just fine, delete
or disable a bitmap and be done with it -- but the interface isn't
stable, so libvirt will be reluctant to use such tricks.

Probably a loaded question, but:

- What's needed to make the interface stable?
- Are there known problem points?
- Any suggestions for workarounds in the meantime?


I'm wary of workarounds, because I'd rather do it the right way.

That said, here's an absolutely awful workaround I thought of that I
think everyone will hate:

- Allow bitmap commands like "enable" and "disable" to be "queued" when
applied to readonly bitmaps ...
- On reopen, apply queued bitmap commands.

Eh, this is bad because it's basically creating a command that we will
immediately deprecate as soon as x-blockdev-reopen is formalized.

All ears for better solutions, though.

--js




[PATCH 4/5] aio-posix: make AioHandler deletion O(1)

2020-02-14 Thread Stefan Hajnoczi
It is not necessary to scan all AioHandlers for deletion.  Keep a list
of deleted handlers instead of scanning the full list of all handlers.

The AioHandler->deleted field can be dropped.  Let's check if the
handler has been inserted into the deleted list instead.  Add a new
QLIST_IS_INSERTED() API for this check.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/aio.h  |  6 -
 include/qemu/queue.h |  3 +++
 util/aio-posix.c | 53 +---
 3 files changed, 43 insertions(+), 19 deletions(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 7ba9bd7874..1a0de1508c 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -42,6 +42,7 @@ void qemu_aio_unref(void *p);
 void qemu_aio_ref(void *p);
 
 typedef struct AioHandler AioHandler;
+typedef QLIST_HEAD(, AioHandler) AioHandlerList;
 typedef void QEMUBHFunc(void *opaque);
 typedef bool AioPollFn(void *opaque);
 typedef void IOHandler(void *opaque);
@@ -58,7 +59,10 @@ struct AioContext {
 QemuRecMutex lock;
 
 /* The list of registered AIO handlers.  Protected by ctx->list_lock. */
-QLIST_HEAD(, AioHandler) aio_handlers;
+AioHandlerList aio_handlers;
+
+/* The list of AIO handlers to be deleted.  Protected by ctx->list_lock. */
+AioHandlerList deleted_aio_handlers;
 
 /* Used to avoid unnecessary event_notifier_set calls in aio_notify;
  * accessed with atomic primitives.  If this field is 0, everything
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index a276363372..699a8a0568 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -158,6 +158,9 @@ struct {
\
 }   \
 } while (/*CONSTCOND*/0)
 
+/* Is elm in a list? */
+#define QLIST_IS_INSERTED(elm, field) ((elm)->field.le_prev != NULL)
+
 #define QLIST_FOREACH(var, head, field) \
 for ((var) = ((head)->lh_first);\
 (var);  \
diff --git a/util/aio-posix.c b/util/aio-posix.c
index b21bcd8e97..3a98a2acb9 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -31,10 +31,10 @@ struct AioHandler
 AioPollFn *io_poll;
 IOHandler *io_poll_begin;
 IOHandler *io_poll_end;
-int deleted;
 void *opaque;
 bool is_external;
 QLIST_ENTRY(AioHandler) node;
+QLIST_ENTRY(AioHandler) node_deleted;
 };
 
 #ifdef CONFIG_EPOLL_CREATE1
@@ -67,7 +67,7 @@ static bool aio_epoll_try_enable(AioContext *ctx)
 
 QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 int r;
-if (node->deleted || !node->pfd.events) {
+if (QLIST_IS_INSERTED(node, node_deleted) || !node->pfd.events) {
 continue;
 }
 event.events = epoll_events_from_pfd(node->pfd.events);
@@ -195,9 +195,11 @@ static AioHandler *find_aio_handler(AioContext *ctx, int 
fd)
 AioHandler *node;
 
 QLIST_FOREACH(node, &ctx->aio_handlers, node) {
-if (node->pfd.fd == fd)
-if (!node->deleted)
+if (node->pfd.fd == fd) {
+if (!QLIST_IS_INSERTED(node, node_deleted)) {
 return node;
+}
+}
 }
 
 return NULL;
@@ -216,7 +218,7 @@ static bool aio_remove_fd_handler(AioContext *ctx, 
AioHandler *node)
 
 /* If a read is in progress, just mark the node as deleted */
 if (qemu_lockcnt_count(&ctx->list_lock)) {
-node->deleted = 1;
+QLIST_INSERT_HEAD_RCU(&ctx->deleted_aio_handlers, node, node_deleted);
 node->pfd.revents = 0;
 return false;
 }
@@ -358,7 +360,7 @@ static void poll_set_started(AioContext *ctx, bool started)
 QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
 IOHandler *fn;
 
-if (node->deleted) {
+if (QLIST_IS_INSERTED(node, node_deleted)) {
 continue;
 }
 
@@ -415,6 +417,26 @@ bool aio_pending(AioContext *ctx)
 return result;
 }
 
+static void aio_free_deleted_handlers(AioContext *ctx)
+{
+AioHandler *node;
+
+if (QLIST_EMPTY_RCU(&ctx->deleted_aio_handlers)) {
+return;
+}
+if (!qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
+return; /* we are nested, let the parent do the freeing */
+}
+
+while ((node = QLIST_FIRST_RCU(&ctx->deleted_aio_handlers))) {
+QLIST_REMOVE(node, node);
+QLIST_REMOVE(node, node_deleted);
+g_free(node);
+}
+
+qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
+}
+
 static bool aio_dispatch_handlers(AioContext *ctx)
 {
 AioHandler *node, *tmp;
@@ -426,7 +448,7 @@ static bool aio_dispatch_handlers(AioContext *ctx)
 revents = node->pfd.revents & node->pfd.events;
 node->pfd.revents = 0;
 
-if (!node->deleted &&
+if (!QLIST_IS_INSERTED(node, node_deleted) &&
 (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&

[PATCH 5/5] aio-posix: make AioHandler dispatch O(1) with epoll

2020-02-14 Thread Stefan Hajnoczi
File descriptor monitoring is O(1) with epoll(7), but
aio_dispatch_handlers() still scans all AioHandlers instead of
dispatching just those that are ready.  This makes aio_poll() O(n) with
respect to the total number of registered handlers.

Add a local ready_list to aio_poll() so that each nested aio_poll()
builds a list of handlers ready to be dispatched.  Since file descriptor
polling is level-triggered, nested aio_poll() calls also see fds that
were ready in the parent but not yet dispatched.  This guarantees that
nested aio_poll() invocations will dispatch all fds, even those that
became ready before the nested invocation.

Since only handlers ready to be dispatched are placed onto the
ready_list, the new aio_dispatch_ready_handlers() function provides O(1)
dispatch.

Note that AioContext polling is still O(n) and currently cannot be fully
disabled.  This still needs to be fixed before aio_poll() is fully O(1).

Signed-off-by: Stefan Hajnoczi 
---
 util/aio-posix.c | 106 +--
 1 file changed, 76 insertions(+), 30 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 3a98a2acb9..dc33ca08a6 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -34,6 +34,7 @@ struct AioHandler
 void *opaque;
 bool is_external;
 QLIST_ENTRY(AioHandler) node;
+QLIST_ENTRY(AioHandler) node_ready; /* only used during aio_poll() */
 QLIST_ENTRY(AioHandler) node_deleted;
 };
 
@@ -104,7 +105,18 @@ static void aio_epoll_update(AioContext *ctx, AioHandler 
*node, bool is_new)
 }
 }
 
-static int aio_epoll(AioContext *ctx, int64_t timeout)
+/* Add a handler to a ready list */
+static void add_ready_handler(AioHandlerList *ready_list,
+  AioHandler *node,
+  int revents)
+{
+QLIST_SAFE_REMOVE(node, node_ready); /* remove from nested parent's list */
+node->pfd.revents = revents;
+QLIST_INSERT_HEAD(ready_list, node, node_ready);
+}
+
+static int aio_epoll(AioContext *ctx, AioHandlerList *ready_list,
+ int64_t timeout)
 {
 GPollFD pfd = {
 .fd = ctx->epollfd,
@@ -129,11 +141,13 @@ static int aio_epoll(AioContext *ctx, int64_t timeout)
 }
 for (i = 0; i < ret; i++) {
 int ev = events[i].events;
+int revents = (ev & EPOLLIN ? G_IO_IN : 0) |
+  (ev & EPOLLOUT ? G_IO_OUT : 0) |
+  (ev & EPOLLHUP ? G_IO_HUP : 0) |
+  (ev & EPOLLERR ? G_IO_ERR : 0);
+
 node = events[i].data.ptr;
-node->pfd.revents = (ev & EPOLLIN ? G_IO_IN : 0) |
-(ev & EPOLLOUT ? G_IO_OUT : 0) |
-(ev & EPOLLHUP ? G_IO_HUP : 0) |
-(ev & EPOLLERR ? G_IO_ERR : 0);
+add_ready_handler(ready_list, node, revents);
 }
 }
 out:
@@ -437,36 +451,63 @@ static void aio_free_deleted_handlers(AioContext *ctx)
 qemu_lockcnt_inc_and_unlock(&ctx->list_lock);
 }
 
-static bool aio_dispatch_handlers(AioContext *ctx)
+static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node)
 {
-AioHandler *node, *tmp;
 bool progress = false;
+int revents;
 
-QLIST_FOREACH_SAFE_RCU(node, &ctx->aio_handlers, node, tmp) {
-int revents;
+revents = node->pfd.revents & node->pfd.events;
+node->pfd.revents = 0;
 
-revents = node->pfd.revents & node->pfd.events;
-node->pfd.revents = 0;
+if (!QLIST_IS_INSERTED(node, node_deleted) &&
+(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
+aio_node_check(ctx, node->is_external) &&
+node->io_read) {
+node->io_read(node->opaque);
 
-if (!QLIST_IS_INSERTED(node, node_deleted) &&
-(revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) &&
-aio_node_check(ctx, node->is_external) &&
-node->io_read) {
-node->io_read(node->opaque);
-
-/* aio_notify() does not count as progress */
-if (node->opaque != &ctx->notifier) {
-progress = true;
-}
-}
-if (!QLIST_IS_INSERTED(node, node_deleted) &&
-(revents & (G_IO_OUT | G_IO_ERR)) &&
-aio_node_check(ctx, node->is_external) &&
-node->io_write) {
-node->io_write(node->opaque);
+/* aio_notify() does not count as progress */
+if (node->opaque != &ctx->notifier) {
 progress = true;
 }
 }
+if (!QLIST_IS_INSERTED(node, node_deleted) &&
+(revents & (G_IO_OUT | G_IO_ERR)) &&
+aio_node_check(ctx, node->is_external) &&
+node->io_write) {
+node->io_write(node->opaque);
+progress = true;
+}
+
+return progress;
+}
+
+/*
+ * If we have a list of ready handlers then this is more efficient than
+ * scanning all handlers with aio_dispatch_handlers().
+ */
+static bool aio_dispatch_ready_handlers(AioContext *ctx,
+  

[PATCH 2/5] aio-posix: don't pass ns timeout to epoll_wait()

2020-02-14 Thread Stefan Hajnoczi
Don't pass the nanosecond timeout into epoll_wait(), which expects
milliseconds.

The epoll_wait() timeout value does not matter if qemu_poll_ns()
determined that the poll fd is ready, but passing a value in the wrong
units is still ugly.  Pass a 0 timeout to epoll_wait() instead.

Signed-off-by: Stefan Hajnoczi 
---
 util/aio-posix.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 31a8e03ca7..b21bcd8e97 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -116,6 +116,9 @@ static int aio_epoll(AioContext *ctx, int64_t timeout)
 
 if (timeout > 0) {
 ret = qemu_poll_ns(&pfd, 1, timeout);
+if (ret > 0) {
+timeout = 0;
+}
 }
 if (timeout <= 0 || ret > 0) {
 ret = epoll_wait(ctx->epollfd, events,
-- 
2.24.1



[PATCH 3/5] qemu/queue.h: add QLIST_SAFE_REMOVE()

2020-02-14 Thread Stefan Hajnoczi
QLIST_REMOVE() assumes the element is in a list.  It also leaves the
element's linked list pointers dangling.

Introduce a safe version of QLIST_REMOVE() and convert open-coded
instances of this pattern.

Signed-off-by: Stefan Hajnoczi 
---
 block.c  |  5 +
 chardev/spice.c  |  4 +---
 include/qemu/queue.h | 14 ++
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 9c810534d6..484e01d042 100644
--- a/block.c
+++ b/block.c
@@ -2499,10 +2499,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
 static void bdrv_detach_child(BdrvChild *child)
 {
-if (child->next.le_prev) {
-QLIST_REMOVE(child, next);
-child->next.le_prev = NULL;
-}
+QLIST_SAFE_REMOVE(child, next);
 
 bdrv_replace_child(child, NULL);
 
diff --git a/chardev/spice.c b/chardev/spice.c
index 241e2b7770..bf7ea1e294 100644
--- a/chardev/spice.c
+++ b/chardev/spice.c
@@ -216,9 +216,7 @@ static void char_spice_finalize(Object *obj)
 
 vmc_unregister_interface(s);
 
-if (s->next.le_prev) {
-QLIST_REMOVE(s, next);
-}
+QLIST_SAFE_REMOVE(s, next);
 
 g_free((char *)s->sin.subtype);
 g_free((char *)s->sin.portname);
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 19425f973f..a276363372 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -144,6 +144,20 @@ struct {   
 \
 *(elm)->field.le_prev = (elm)->field.le_next;   \
 } while (/*CONSTCOND*/0)
 
+/*
+ * Like QLIST_REMOVE() but safe to call when elm is not in a list
+ */
+#define QLIST_SAFE_REMOVE(elm, field) do {  \
+if ((elm)->field.le_prev != NULL) { \
+if ((elm)->field.le_next != NULL)   \
+(elm)->field.le_next->field.le_prev =   \
+(elm)->field.le_prev;   \
+*(elm)->field.le_prev = (elm)->field.le_next;   \
+(elm)->field.le_next = NULL;\
+(elm)->field.le_prev = NULL;\
+}   \
+} while (/*CONSTCOND*/0)
+
 #define QLIST_FOREACH(var, head, field) \
 for ((var) = ((head)->lh_first);\
 (var);  \
-- 
2.24.1



[PATCH 1/5] aio-posix: fix use after leaving scope in aio_poll()

2020-02-14 Thread Stefan Hajnoczi
epoll_handler is a stack variable and must not be accessed after it goes
out of scope:

  if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
  AioHandler epoll_handler;
  ...
  add_pollfd(&epoll_handler);
  ret = aio_epoll(ctx, pollfds, npfd, timeout);
  } ...

  ...

  /* if we have any readable fds, dispatch event */
  if (ret > 0) {
  for (i = 0; i < npfd; i++) {
  nodes[i]->pfd.revents = pollfds[i].revents;
  }
  }

nodes[0] is &epoll_handler, which has already gone out of scope.

There is no need to use pollfds[] for epoll.  We don't need an
AioHandler for the epoll fd.

Signed-off-by: Stefan Hajnoczi 
---
 util/aio-posix.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index a4977f538e..31a8e03ca7 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -104,17 +104,18 @@ static void aio_epoll_update(AioContext *ctx, AioHandler 
*node, bool is_new)
 }
 }
 
-static int aio_epoll(AioContext *ctx, GPollFD *pfds,
- unsigned npfd, int64_t timeout)
+static int aio_epoll(AioContext *ctx, int64_t timeout)
 {
+GPollFD pfd = {
+.fd = ctx->epollfd,
+.events = G_IO_IN | G_IO_OUT | G_IO_HUP | G_IO_ERR,
+};
 AioHandler *node;
 int i, ret = 0;
 struct epoll_event events[128];
 
-assert(npfd == 1);
-assert(pfds[0].fd == ctx->epollfd);
 if (timeout > 0) {
-ret = qemu_poll_ns(pfds, npfd, timeout);
+ret = qemu_poll_ns(&pfd, 1, timeout);
 }
 if (timeout <= 0 || ret > 0) {
 ret = epoll_wait(ctx->epollfd, events,
@@ -658,13 +659,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 /* wait until next event */
 if (aio_epoll_check_poll(ctx, pollfds, npfd, timeout)) {
-AioHandler epoll_handler;
-
-epoll_handler.pfd.fd = ctx->epollfd;
-epoll_handler.pfd.events = G_IO_IN | G_IO_OUT | G_IO_HUP | 
G_IO_ERR;
-npfd = 0;
-add_pollfd(&epoll_handler);
-ret = aio_epoll(ctx, pollfds, npfd, timeout);
+npfd = 0; /* pollfds[] is not being used */
+ret = aio_epoll(ctx, timeout);
 } else  {
 ret = qemu_poll_ns(pollfds, npfd, timeout);
 }
-- 
2.24.1



[PATCH 0/5] aio-posix: towards an O(1) event loop

2020-02-14 Thread Stefan Hajnoczi
This patch series makes AioHandler deletion and dispatch O(1) with respect to
the total number of registered handlers.  The event loop has scalability
problems when many AioHandlers are registered because it is O(n).  Linux
epoll(7) is used to avoid scanning over all pollfds but parts of the code still
scan all AioHandlers.

This series reduces QEMU CPU utilization and therefore increases IOPS,
especially for guests that have many devices.  It was tested with 32 vCPUs, 2
virtio-blk,num-queues=1,iothread=iothread1, and 99
virtio-blk,num-queues=32,iothread=iothread1 devices.  Using an IOThread is
necessary because this series does not improve the glib main loop, a non-goal
since the glib API is inherently O(n).

AioContext polling remains O(n) and will be addressed in a separate patch
series.  This patch series increases IOPS from 260k to 300k when AioContext
polling is commented out
(rw=randread,bs=4k,iodepth=32,ioengine=libaio,direct=1).

Stefan Hajnoczi (5):
  aio-posix: fix use after leaving scope in aio_poll()
  aio-posix: don't pass ns timeout to epoll_wait()
  qemu/queue.h: add QLIST_SAFE_REMOVE()
  aio-posix: make AioHandler deletion O(1)
  aio-posix: make AioHandler dispatch O(1) with epoll

 block.c  |   5 +-
 chardev/spice.c  |   4 +-
 include/block/aio.h  |   6 +-
 include/qemu/queue.h |  17 +
 util/aio-posix.c | 172 +--
 5 files changed, 141 insertions(+), 63 deletions(-)

-- 
2.24.1



Re: [PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Michal Privoznik

On 2/14/20 4:49 PM, Alex Williamson wrote:

On Fri, 14 Feb 2020 10:19:50 +0100
Michal Privoznik  wrote:



Do you guys want me to file a bug?


Probably a good idea.  Thanks,


Since I'm reproducing this against upstream, I've reported it here:

https://bugs.launchpad.net/qemu/+bug/186

Michal




Re: [PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Alex Williamson
On Fri, 14 Feb 2020 10:19:50 +0100
Michal Privoznik  wrote:

> On 2/14/20 9:47 AM, Michal Privoznik wrote:
> > In a few places we report errno formatted as a negative integer.
> > This is not as user friendly as it can be. Use strerror() and/or
> > error_setg_errno() instead.
> > 
> > Signed-off-by: Michal Privoznik 
> > ---
> >   hw/vfio/common.c| 2 +-
> >   util/vfio-helpers.c | 6 +++---
> >   2 files changed, 4 insertions(+), 4 deletions(-)
> >   
> 
> BTW the reason I've noticed these is because I'm seeing some errors when 
> assigning my NVMe disk to qemu. This is the full command line:
> 
> 
> /home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 \
> -name guest=fedora,debug-threads=on \
> -S \
> -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes
>  
> \
> -machine pc-i440fx-4.1,accel=kvm,usb=off,dump-guest-core=off \
> -cpu host \
> -m size=4194304k,slots=16,maxmem=1099511627776k \
> -overcommit mem-lock=off \
> -smp 4,sockets=1,dies=1,cores=2,threads=2 \
> -object iothread,id=iothread1 \
> -object iothread,id=iothread2 \
> -object iothread,id=iothread3 \
> -object iothread,id=iothread4 \
> -mem-prealloc \
> -mem-path /hugepages2M/libvirt/qemu/2-fedora \
> -numa node,nodeid=0,cpus=0,mem=4096 \
> -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
> -no-user-config \
> -nodefaults \
> -chardev socket,id=charmonitor,fd=31,server,nowait \
> -mon chardev=charmonitor,id=monitor,mode=control \
> -rtc base=utc \
> -no-shutdown \
> -global PIIX4_PM.disable_s3=0 \
> -global PIIX4_PM.disable_s4=0 \
> -boot menu=on,strict=on \
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
> -blockdev 
> '{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}'
>  
> \
> -blockdev 
> '{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}'
>  
> \
> -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-2-format,id=scsi0-0-0-0,bootindex=1
>  
> \
> -blockdev 
> '{"driver":"nvme","device":":02:00.0","namespace":1,"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
>  
> \
> -blockdev 
> '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}'
>  
> \
> -device 
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0
>  
> \
> -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
> -device 
> virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3
>  
> \
> -chardev pty,id=charserial0 \
> -device isa-serial,chardev=charserial0,id=serial0 \
> -chardev socket,id=charchannel0,fd=35,server,nowait \
> -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>  
> \
> -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
> -device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
> -sandbox 
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> -msg timestamp=on
> 
> And these are the errors I see:
> 
> 2020-02-14T09:06:18.183167Z qemu-system-x86_64: VFIO_MAP_DMA failed: 
> Invalid argument
> 2020-02-14T09:10:49.753767Z qemu-system-x86_64: VFIO_MAP_DMA failed: 
> Cannot allocate memory
> 2020-02-14T09:11:04.530344Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
> space left on device
> 2020-02-14T09:11:04.531087Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
> space left on device
> 2020-02-14T09:11:04.531230Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
> space left on device

I feel obligated to reply given the VFIO issues, but TBH I've never
used the nvme-vfio driver and don't consider myself the maintainer of
the vfio-helpers.  I'd guess the latter ENOSPC errors indicate we've
exhausted the number of separate DMA mappings that a vfio user is
allowed to create.  We had to put a cap on this to prevent malicious
users and it was set to 64K, which seemed like orders of magnitude more
than we use for device assignment, but perhaps not enough for this
driver.  There's a dma_entry_limit module option on the
vfio_iommu_type1 module that can be bumped up to see if it resolves
this issue.  The initial EINVAL and ENOMEM errors might however
indicate the driver has already gone into the weeds before we get to
the latter problem though.

> I'm doing nothing with the disk inside the guest, but:
> 
># dd if=/dev/vda of=/dev/null status=progress
> 
> (the disk appears as /dev/vda in the guest). Surprisingly, I do not see 
> these errors when I use the traditional PCI assignment (-device 
> vfio-pci).

Not really surprising, entirely different models of using the d

Re: [PATCH 0/3] qcow2: Fix write/copy error path with data file

2020-02-14 Thread Kevin Wolf
Am 11.02.2020 um 10:48 hat Kevin Wolf geschrieben:
> Kevin Wolf (3):
>   qcow2: update_refcount(): Reset old_table_index after
> qcow2_cache_put()
>   qcow2: Fix qcow2_alloc_cluster_abort() for external data file
>   iotests: Test copy offloading with external data file

Applied to the block branch.

Kevin




Re: [PATCH] block/vvfat: Do not unref qcow on closing backing bdrv

2020-02-14 Thread Kevin Wolf
Am 09.02.2020 um 18:51 hat Hikaru Nishida geschrieben:
> Before this commit, BDRVVVFATState.qcow is unrefed in write_target_close
> on closing backing bdrv of vvfat. However, qcow bdrv is opend as a child
> of vvfat in enable_write_target() so it will be also unrefed on closing
> vvfat itself. This causes use-after-free of qcow on freeing vvfat which
> has backing bdrv and qcow bdrv as children in this order because
> bdrv_close(vvfat) tries to free qcow bdrv after freeing backing bdrv
> as QLIST_FOREACH_SAFE() loop keeps next pointer, but BdrvChild of qcow
> is already freed in bdrv_close(backing bdrv).
> 
> Signed-off-by: Hikaru Nishida 

Thanks, applied to the block branch.

Kevin




Re: [PATCH] qcow2: Fix alignment checks in encrypted images

2020-02-14 Thread Kevin Wolf
Am 13.02.2020 um 18:16 hat Alberto Garcia geschrieben:
> I/O requests to encrypted media should be aligned to the sector size
> used by the underlying encryption method, not to BDRV_SECTOR_SIZE.
> Fortunately this doesn't break anything at the moment because
> both existing QCRYPTO_BLOCK_*_SECTOR_SIZE have the same value as
> BDRV_SECTOR_SIZE.
> 
> The checks in qcow2_co_preadv_encrypted() are also unnecessary because
> they are repeated immediately afterwards in qcow2_co_encdec().
> 
> Signed-off-by: Alberto Garcia 

Thanks, applied to the block branch.

Kevin




Re: [PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Philippe Mathieu-Daudé
On Fri, Feb 14, 2020 at 10:20 AM Michal Privoznik  wrote:
> On 2/14/20 9:47 AM, Michal Privoznik wrote:
> > In a few places we report errno formatted as a negative integer.
> > This is not as user friendly as it can be. Use strerror() and/or
> > error_setg_errno() instead.
> >
> > Signed-off-by: Michal Privoznik 
> > ---
> >   hw/vfio/common.c| 2 +-
> >   util/vfio-helpers.c | 6 +++---
> >   2 files changed, 4 insertions(+), 4 deletions(-)
> >
>
> BTW the reason I've noticed these is because I'm seeing some errors when
> assigning my NVMe disk to qemu. This is the full command line:
>
>
> /home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 \
> -name guest=fedora,debug-threads=on \
> -S \
> -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes
> \
> -machine pc-i440fx-4.1,accel=kvm,usb=off,dump-guest-core=off \
> -cpu host \
> -m size=4194304k,slots=16,maxmem=1099511627776k \
> -overcommit mem-lock=off \
> -smp 4,sockets=1,dies=1,cores=2,threads=2 \
> -object iothread,id=iothread1 \
> -object iothread,id=iothread2 \
> -object iothread,id=iothread3 \
> -object iothread,id=iothread4 \
> -mem-prealloc \
> -mem-path /hugepages2M/libvirt/qemu/2-fedora \
> -numa node,nodeid=0,cpus=0,mem=4096 \
> -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
> -no-user-config \
> -nodefaults \
> -chardev socket,id=charmonitor,fd=31,server,nowait \
> -mon chardev=charmonitor,id=monitor,mode=control \
> -rtc base=utc \
> -no-shutdown \
> -global PIIX4_PM.disable_s3=0 \
> -global PIIX4_PM.disable_s4=0 \
> -boot menu=on,strict=on \
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
> -blockdev
> '{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}'
> \
> -blockdev
> '{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}'
> \
> -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-2-format,id=scsi0-0-0-0,bootindex=1
> \
> -blockdev
> '{"driver":"nvme","device":":02:00.0","namespace":1,"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
> \
> -blockdev
> '{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}'
> \
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0
> \
> -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
> -device
> virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3
> \
> -chardev pty,id=charserial0 \
> -device isa-serial,chardev=charserial0,id=serial0 \
> -chardev socket,id=charchannel0,fd=35,server,nowait \
> -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
> \
> -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
> -device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
> -sandbox
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> -msg timestamp=on
>
> And these are the errors I see:
>
> 2020-02-14T09:06:18.183167Z qemu-system-x86_64: VFIO_MAP_DMA failed:
> Invalid argument
> 2020-02-14T09:10:49.753767Z qemu-system-x86_64: VFIO_MAP_DMA failed:
> Cannot allocate memory
> 2020-02-14T09:11:04.530344Z qemu-system-x86_64: VFIO_MAP_DMA failed: No
> space left on device
> 2020-02-14T09:11:04.531087Z qemu-system-x86_64: VFIO_MAP_DMA failed: No
> space left on device
> 2020-02-14T09:11:04.531230Z qemu-system-x86_64: VFIO_MAP_DMA failed: No
> space left on device
>
>
> I'm doing nothing with the disk inside the guest, but:
>
># dd if=/dev/vda of=/dev/null status=progress
>
> (the disk appears as /dev/vda in the guest). Surprisingly, I do not see
> these errors when I use the traditional PCI assignment (-device
> vfio-pci). My versions of kernel and qemu:
>
> moe ~ # uname -r
> 5.4.15-gentoo
> moe ~ # /home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64
> --version
> QEMU emulator version 4.2.50 (v4.2.0-1439-g5d6542bea7-dirty)
> Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
>
> Do you guys want me to file a bug?

As you already have all the information, and it is a simple
copy/paste, I'd say "yes"




Re: [PATCH] Report stringified errno in VFIO related errors

2020-02-14 Thread Michal Privoznik

On 2/14/20 9:47 AM, Michal Privoznik wrote:

In a few places we report errno formatted as a negative integer.
This is not as user friendly as it can be. Use strerror() and/or
error_setg_errno() instead.

Signed-off-by: Michal Privoznik 
---
  hw/vfio/common.c| 2 +-
  util/vfio-helpers.c | 6 +++---
  2 files changed, 4 insertions(+), 4 deletions(-)



BTW the reason I've noticed these is because I'm seeing some errors when 
assigning my NVMe disk to qemu. This is the full command line:



/home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 \
-name guest=fedora,debug-threads=on \
-S \
-object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes 
\

-machine pc-i440fx-4.1,accel=kvm,usb=off,dump-guest-core=off \
-cpu host \
-m size=4194304k,slots=16,maxmem=1099511627776k \
-overcommit mem-lock=off \
-smp 4,sockets=1,dies=1,cores=2,threads=2 \
-object iothread,id=iothread1 \
-object iothread,id=iothread2 \
-object iothread,id=iothread3 \
-object iothread,id=iothread4 \
-mem-prealloc \
-mem-path /hugepages2M/libvirt/qemu/2-fedora \
-numa node,nodeid=0,cpus=0,mem=4096 \
-uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=31,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-global PIIX4_PM.disable_s3=0 \
-global PIIX4_PM.disable_s4=0 \
-boot menu=on,strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
-blockdev 
'{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' 
\
-blockdev 
'{"node-name":"libvirt-2-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-2-storage","backing":null}' 
\
-device 
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-2-format,id=scsi0-0-0-0,bootindex=1 
\
-blockdev 
'{"driver":"nvme","device":":02:00.0","namespace":1,"node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' 
\
-blockdev 
'{"node-name":"libvirt-1-format","read-only":false,"driver":"raw","file":"libvirt-1-storage"}' 
\
-device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0 
\

-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=34 \
-device 
virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3 
\

-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=35,server,nowait \
-device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 
\

-spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on \
-device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 \
-sandbox 
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \

-msg timestamp=on

And these are the errors I see:

2020-02-14T09:06:18.183167Z qemu-system-x86_64: VFIO_MAP_DMA failed: 
Invalid argument
2020-02-14T09:10:49.753767Z qemu-system-x86_64: VFIO_MAP_DMA failed: 
Cannot allocate memory
2020-02-14T09:11:04.530344Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
space left on device
2020-02-14T09:11:04.531087Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
space left on device
2020-02-14T09:11:04.531230Z qemu-system-x86_64: VFIO_MAP_DMA failed: No 
space left on device



I'm doing nothing with the disk inside the guest, but:

  # dd if=/dev/vda of=/dev/null status=progress

(the disk appears as /dev/vda in the guest). Surprisingly, I do not see 
these errors when I use the traditional PCI assignment (-device 
vfio-pci). My versions of kernel and qemu:


moe ~ # uname -r
5.4.15-gentoo
moe ~ # /home/zippy/work/qemu/qemu.git/x86_64-softmmu/qemu-system-x86_64 
--version

QEMU emulator version 4.2.50 (v4.2.0-1439-g5d6542bea7-dirty)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

Do you guys want me to file a bug?

Michal