Re: [Qemu-block] [PATCH v2 1/1] qemu/migration: fix the double free problem on from_src_file

2017-06-07 Thread QingFeng Hao



在 2017/6/7 20:18, Dr. David Alan Gilbert 写道:

* QingFeng Hao (ha...@linux.vnet.ibm.com) wrote:


在 2017/6/6 20:49, Kevin Wolf 写道:

Am 06.06.2017 um 07:24 hat QingFeng Hao geschrieben:




I can't tell for postcopy_ram_listen_thread() - commit 660819b didn't
seem to remove a qemu_fclose() call there, but I can't see one left
behind either. Was the file leaked before commit 660819b or am I
missing something?

I don't think so because loadvm_postcopy_handle_listen creates thread
postcopy_ram_listen_thread
and passes mis->from_src_file as its arg, which will be closed by
migration_incoming_state_destroy.
What confuses me is in the series function calls of qemu_loadvm_state_main
etc, argument f looks
to be redundant as mis already contains from_src_file which equals to f.

In postcopy qemu_loadvm_state_main is called with two different file
arguments but the same mis argument;  see loadvm_handle_cmd_packaged for
the other case where it's called on a packaged-file blob.

yes, you are right, I missed that one. :)



Furthermore, mis may be
also redundant as it can be got via migration_incoming_get_current. Thanks!

We keep changing our minds about the preferred style.  Sometimes we
think it's best to pass the pointer, sometimes we think it's best
to call get_current.

Got it. Thanks!


Dave


Kevin


--
Regards
QingFeng Hao


--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



--
Regards
QingFeng Hao




Re: [Qemu-block] [PATCH v2 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

2017-06-07 Thread Eric Blake
On 06/07/2017 09:08 AM, Alberto Garcia wrote:
> Instead of calling perform_cow() twice with a different COW region
> each time, call it just once and make perform_cow() handle both
> regions.
> 
> This patch simply moves code around. The next one will do the actual
> reordering of the COW operations.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 38 +++---
>  1 file changed, 23 insertions(+), 15 deletions(-)

>  qemu_co_mutex_unlock(>lock);
> -ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, 
> r->nb_bytes);
> +ret = do_perform_cow(bs, m->offset, m->alloc_offset,
> + start->offset, start->nb_bytes);
> +if (ret < 0) {
> +goto fail;
> +}
> +
> +ret = do_perform_cow(bs, m->offset, m->alloc_offset,
> + end->offset, end->nb_bytes);
> +
> +fail:

Since you reach this label even on success, it feels a bit awkward. I
probably would have avoided the goto and used fewer lines by refactoring
the logic as:

ret = do_perform_cow(..., start->...);
if (ret >= 0) {
ret = do_perform_cow(..., end->...);
}

But that doesn't affect correctness, so whether or not you redo the
logic in the shorter fashion:

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PULL 6/8] qemu-iotests: Test automatic commit job cancel on hot unplug

2017-06-07 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: John Snow 
---
 tests/qemu-iotests/040 | 35 +--
 tests/qemu-iotests/040.out |  4 ++--
 2 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index 5bdaf3d..9d381d9 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -70,7 +70,9 @@ class ImageCommitTestCase(iotests.QMPTestCase):
 self.wait_for_complete()
 
 class TestSingleDrive(ImageCommitTestCase):
-image_len = 1 * 1024 * 1024
+# Need some space after the copied data so that throttling is effective in
+# tests that use it rather than just completing the job immediately
+image_len = 2 * 1024 * 1024
 test_len = 1 * 1024 * 256
 
 def setUp(self):
@@ -79,7 +81,9 @@ class TestSingleDrive(ImageCommitTestCase):
 qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % 
mid_img, test_img)
 qemu_io('-f', 'raw', '-c', 'write -P 0xab 0 524288', backing_img)
 qemu_io('-f', iotests.imgfmt, '-c', 'write -P 0xef 524288 524288', 
mid_img)
-self.vm = iotests.VM().add_drive(test_img)
+self.vm = iotests.VM().add_drive(test_img, interface="none")
+self.vm.add_device("virtio-scsi-pci")
+self.vm.add_device("scsi-hd,id=scsi0,drive=drive0")
 self.vm.launch()
 
 def tearDown(self):
@@ -131,6 +135,33 @@ class TestSingleDrive(ImageCommitTestCase):
 self.assert_qmp(result, 'error/class', 'GenericError')
 self.assert_qmp(result, 'error/desc', 'Base \'%s\' not found' % 
mid_img)
 
+# When the job is running on a BB that is automatically deleted on hot
+# unplug, the job is cancelled when the device disappears
+def test_hot_unplug(self):
+if self.image_len == 0:
+return
+
+self.assert_no_active_block_jobs()
+result = self.vm.qmp('block-commit', device='drive0', top=mid_img,
+ base=backing_img, speed=(self.image_len / 4))
+self.assert_qmp(result, 'return', {})
+result = self.vm.qmp('device_del', id='scsi0')
+self.assert_qmp(result, 'return', {})
+
+cancelled = False
+deleted = False
+while not cancelled or not deleted:
+for event in self.vm.get_qmp_events(wait=True):
+if event['event'] == 'DEVICE_DELETED':
+self.assert_qmp(event, 'data/device', 'scsi0')
+deleted = True
+elif event['event'] == 'BLOCK_JOB_CANCELLED':
+self.assert_qmp(event, 'data/device', 'drive0')
+cancelled = True
+else:
+self.fail("Unexpected event %s" % (event['event']))
+
+self.assert_no_active_block_jobs()
 
 class TestRelativePaths(ImageCommitTestCase):
 image_len = 1 * 1024 * 1024
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
index 4fd1c2d..6d9bee1 100644
--- a/tests/qemu-iotests/040.out
+++ b/tests/qemu-iotests/040.out
@@ -1,5 +1,5 @@
-.
+...
 --
-Ran 25 tests
+Ran 27 tests
 
 OK
-- 
1.8.3.1




[Qemu-block] [PULL 8/8] block: fix external snapshot abort permission error

2017-06-07 Thread Kevin Wolf
From: Jeff Cody 

In external_snapshot_abort(), we try to undo what was done in
external_snapshot_prepare() calling bdrv_replace_node() to swap the
nodes back.  However, we receive a permissions error as writers are
blocked on the old node, which is now the new node backing file.

An easy fix (initially suggested by Kevin Wolf) is to call
bdrv_set_backing_hd() on the new node, to set the backing node to NULL.

Signed-off-by: Jeff Cody 
Reviewed-by: Eric Blake 
Signed-off-by: Kevin Wolf 
---
 blockdev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 892d768..6472548 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1803,7 +1803,11 @@ static void external_snapshot_abort(BlkActionState 
*common)
  DO_UPCAST(ExternalSnapshotState, common, common);
 if (state->new_bs) {
 if (state->overlay_appended) {
+bdrv_ref(state->old_bs);   /* we can't let bdrv_set_backind_hd()
+  close state->old_bs; we need it */
+bdrv_set_backing_hd(state->new_bs, NULL, _abort);
 bdrv_replace_node(state->new_bs, state->old_bs, _abort);
+bdrv_unref(state->old_bs); /* bdrv_replace_node() ref'ed old_bs */
 }
 }
 }
-- 
1.8.3.1




[Qemu-block] [PULL 4/8] qemu-iotests: Block migration test

2017-06-07 Thread Kevin Wolf
Signed-off-by: Kevin Wolf 
Reviewed-by: Jeff Cody 
Reviewed-by: Eric Blake 
---
 tests/qemu-iotests/183 | 140 +
 tests/qemu-iotests/183.out |  46 +++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 187 insertions(+)
 create mode 100755 tests/qemu-iotests/183
 create mode 100644 tests/qemu-iotests/183.out

diff --git a/tests/qemu-iotests/183 b/tests/qemu-iotests/183
new file mode 100755
index 000..20268ff
--- /dev/null
+++ b/tests/qemu-iotests/183
@@ -0,0 +1,140 @@
+#!/bin/bash
+#
+# Test old-style block migration (migrate -b)
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=kw...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1 # failure is the default!
+
+MIG_SOCKET="${TEST_DIR}/migrate"
+
+_cleanup()
+{
+rm -f "${MIG_SOCKET}"
+rm -f "${TEST_IMG}.dest"
+_cleanup_test_img
+_cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 raw qed dmg quorum
+_supported_proto file
+_supported_os Linux
+
+size=64M
+_make_test_img $size
+TEST_IMG="${TEST_IMG}.dest" _make_test_img $size
+
+echo
+echo === Starting VMs ===
+echo
+
+qemu_comm_method="qmp"
+
+_launch_qemu \
+-drive file="${TEST_IMG}",cache=$CACHEMODE,driver=$IMGFMT,id=disk
+src=$QEMU_HANDLE
+_send_qemu_cmd $src "{ 'execute': 'qmp_capabilities' }" 'return'
+
+_launch_qemu \
+-drive file="${TEST_IMG}.dest",cache=$CACHEMODE,driver=$IMGFMT,id=disk \
+-incoming "unix:${MIG_SOCKET}"
+dest=$QEMU_HANDLE
+_send_qemu_cmd $dest "{ 'execute': 'qmp_capabilities' }" 'return'
+
+echo
+echo === Write something on the source ===
+echo
+
+_send_qemu_cmd $src \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io disk \"write -P 0x55 0 64k\"' } }" \
+'return'
+_send_qemu_cmd $src \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io disk \"read -P 0x55 0 64k\"' } }" \
+'return'
+
+echo
+echo === Do block migration to destination ===
+echo
+
+reply="$(_send_qemu_cmd $src \
+"{ 'execute': 'migrate',
+   'arguments': { 'uri': 'unix:${MIG_SOCKET}', 'blk': true } }" \
+'return\|error')"
+echo "$reply"
+if echo "$reply" | grep "compiled without old-style" > /dev/null; then
+_notrun "migrate -b support not compiled in"
+fi
+
+QEMU_COMM_TIMEOUT=0.1 qemu_cmd_repeat=50 silent=yes \
+_send_qemu_cmd $src "{ 'execute': 'query-migrate' }" '"status": 
"completed"'
+_send_qemu_cmd $src "{ 'execute': 'query-status' }" "return"
+
+echo
+echo === Do some I/O on the destination ===
+echo
+
+# It is important that we use the BlockBackend of the guest device here instead
+# of the node name, which would create a new BlockBackend and not test whether
+# the guest has the necessary permissions to access the image now
+silent=yes _send_qemu_cmd $dest "" "100 %"
+_send_qemu_cmd $dest "{ 'execute': 'query-status' }" "return"
+_send_qemu_cmd $dest \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io disk \"read -P 0x55 0 64k\"' } }" \
+'return'
+_send_qemu_cmd $dest \
+"{ 'execute': 'human-monitor-command',
+   'arguments': { 'command-line':
+  'qemu-io disk \"write -P 0x66 1M 64k\"' } }" \
+'return'
+
+echo
+echo === Shut down and check image ===
+echo
+
+_send_qemu_cmd $src '{"execute":"quit"}' 'return'
+_send_qemu_cmd $dest '{"execute":"quit"}' 'return'
+wait=1 _cleanup_qemu
+
+_check_test_img
+TEST_IMG="${TEST_IMG}.dest" _check_test_img
+
+$QEMU_IO -c 'write -P 0x66 1M 64k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IMG compare "$TEST_IMG.dest" "$TEST_IMG"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/183.out b/tests/qemu-iotests/183.out
new file mode 100644
index 000..103fdc7
--- /dev/null
+++ b/tests/qemu-iotests/183.out
@@ -0,0 +1,46 @@
+QA output created by 183
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+Formatting 'TEST_DIR/t.IMGFMT.dest', fmt=IMGFMT size=67108864
+
+=== Starting VMs ===
+
+{"return": {}}

[Qemu-block] [PULL 3/8] migration/block: Clean up BBs in block_save_complete()

2017-06-07 Thread Kevin Wolf
We need to release any block migrations BlockBackends on the source
before successfully completing the migration because otherwise
inactivating the images will fail (inactivation only tolerates device
BBs).

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
---
 migration/block.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 4d8c2e9..114cedb 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -674,16 +674,14 @@ static int64_t get_remaining_dirty(void)
 return dirty << BDRV_SECTOR_BITS;
 }
 
-/* Called with iothread lock taken.  */
 
-static void block_migration_cleanup(void *opaque)
+
+/* Called with iothread lock taken.  */
+static void block_migration_cleanup_bmds(void)
 {
 BlkMigDevState *bmds;
-BlkMigBlock *blk;
 AioContext *ctx;
 
-bdrv_drain_all();
-
 unset_dirty_tracking();
 
 while ((bmds = QSIMPLEQ_FIRST(_mig_state.bmds_list)) != NULL) {
@@ -701,6 +699,16 @@ static void block_migration_cleanup(void *opaque)
 g_free(bmds->aio_bitmap);
 g_free(bmds);
 }
+}
+
+/* Called with iothread lock taken.  */
+static void block_migration_cleanup(void *opaque)
+{
+BlkMigBlock *blk;
+
+bdrv_drain_all();
+
+block_migration_cleanup_bmds();
 
 blk_mig_lock();
 while ((blk = QSIMPLEQ_FIRST(_mig_state.blk_list)) != NULL) {
@@ -844,6 +852,10 @@ static int block_save_complete(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, BLK_MIG_FLAG_EOS);
 
+/* Make sure that our BlockBackends are gone, so that the block driver
+ * nodes can be inactivated. */
+block_migration_cleanup_bmds();
+
 return 0;
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 7/8] block/qcow.c: Fix memory leak in qcow_create()

2017-06-07 Thread Kevin Wolf
From: Peter Maydell 

Coverity points out that the code path in qcow_create() for
the magic "fat:" backing file name leaks the memory used to
store the filename (CID 1307771). Free the memory before
we overwrite the pointer.

Signed-off-by: Peter Maydell 
Reviewed-by: Eric Blake 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Kevin Wolf 
---
 block/qcow.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/qcow.c b/block/qcow.c
index 95ab123..7bd94dc 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -852,6 +852,7 @@ static int qcow_create(const char *filename, QemuOpts 
*opts, Error **errp)
 header_size += backing_filename_len;
 } else {
 /* special backing file for vvfat */
+g_free(backing_file);
 backing_file = NULL;
 }
 header.cluster_bits = 9; /* 512 byte cluster to avoid copying
-- 
1.8.3.1




[Qemu-block] [PULL 5/8] commit: Fix use after free in completion

2017-06-07 Thread Kevin Wolf
The final bdrv_set_backing_hd() could be working on already freed nodes
because the commit job drops its references (through BlockBackends) to
both overlay_bs and top already a bit earlier.

One way to trigger the bug is hot unplugging a disk for which
blockdev_mark_auto_del() cancels the block job.

Fix this by taking BDS-level references while we're still using the
nodes.

Signed-off-by: Kevin Wolf 
Reviewed-by: John Snow 
---
 block/commit.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/commit.c b/block/commit.c
index a3028b2..af6fa68 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -89,6 +89,10 @@ static void commit_complete(BlockJob *job, void *opaque)
 int ret = data->ret;
 bool remove_commit_top_bs = false;
 
+/* Make sure overlay_bs and top stay around until bdrv_set_backing_hd() */
+bdrv_ref(top);
+bdrv_ref(overlay_bs);
+
 /* Remove base node parent that still uses BLK_PERM_WRITE/RESIZE before
  * the normal backing chain can be restored. */
 blk_unref(s->base);
@@ -124,6 +128,9 @@ static void commit_complete(BlockJob *job, void *opaque)
 if (remove_commit_top_bs) {
 bdrv_set_backing_hd(overlay_bs, top, _abort);
 }
+
+bdrv_unref(overlay_bs);
+bdrv_unref(top);
 }
 
 static void coroutine_fn commit_run(void *opaque)
-- 
1.8.3.1




[Qemu-block] [PULL 1/8] block: Fix anonymous BBs in blk_root_inactivate()

2017-06-07 Thread Kevin Wolf
blk->name isn't an array, but a pointer that can be NULL. Checking for
an anonymous BB must involve a NULL check first, otherwise we get
crashes.

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Juan Quintela 
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f3a6008..7d7f369 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -168,7 +168,7 @@ static int blk_root_inactivate(BdrvChild *child)
  * this point because the VM is stopped) and unattached monitor-owned
  * BlockBackends. If there is still any other user like a block job, then
  * we simply can't inactivate the image. */
-if (!blk->dev && !blk->name[0]) {
+if (!blk->dev && !blk_name(blk)[0]) {
 return -EPERM;
 }
 
-- 
1.8.3.1




[Qemu-block] [PULL 2/8] migration: Inactivate images after .save_live_complete_precopy()

2017-06-07 Thread Kevin Wolf
Block migration may still access the image during its
.save_live_complete_precopy() implementation, so we should only
inactivate the image afterwards.

Another reason for the change is that inactivating an image fails when
there is still a non-device BlockBackend using it, which includes the
BBs used by block migration. We want to give block migration a chance to
release the BBs before trying to inactivate the image (this will be done
in another patch).

Signed-off-by: Kevin Wolf 
Reviewed-by: Fam Zheng 
Reviewed-by: Juan Quintela 
Reviewed-by: Eric Blake 
Reviewed-by: Jeff Cody 
---
 migration/migration.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6f0705a..fc95acb 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1825,17 +1825,19 @@ static void migration_completion(MigrationState *s, int 
current_active_state,
 
 if (!ret) {
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+if (ret >= 0) {
+qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
+qemu_savevm_state_complete_precopy(s->to_dst_file, false);
+}
 /*
  * Don't mark the image with BDRV_O_INACTIVE flag if
  * we will go into COLO stage later.
  */
 if (ret >= 0 && !migrate_colo_enabled()) {
 ret = bdrv_inactivate_all();
-}
-if (ret >= 0) {
-qemu_file_set_rate_limit(s->to_dst_file, INT64_MAX);
-qemu_savevm_state_complete_precopy(s->to_dst_file, false);
-s->block_inactive = true;
+if (ret >= 0) {
+s->block_inactive = true;
+}
 }
 }
 qemu_mutex_unlock_iothread();
-- 
1.8.3.1




[Qemu-block] [PULL 0/8] Block layer patches

2017-06-07 Thread Kevin Wolf
The following changes since commit b55a69fe5f0a504dac6359bb7e99a72b130c3661:

  Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170607' 
into staging (2017-06-07 15:06:42 +0100)

are available in the git repository at:


  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 2cb8f869e43406ce829ebe3b16fecc78d367dc7f:

  block: fix external snapshot abort permission error (2017-06-07 18:51:51 
+0200)


Block layer patches


Jeff Cody (1):
  block: fix external snapshot abort permission error

Kevin Wolf (6):
  block: Fix anonymous BBs in blk_root_inactivate()
  migration: Inactivate images after .save_live_complete_precopy()
  migration/block: Clean up BBs in block_save_complete()
  qemu-iotests: Block migration test
  commit: Fix use after free in completion
  qemu-iotests: Test automatic commit job cancel on hot unplug

Peter Maydell (1):
  block/qcow.c: Fix memory leak in qcow_create()

 block/block-backend.c  |   2 +-
 block/commit.c |   7 +++
 block/qcow.c   |   1 +
 blockdev.c |   4 ++
 migration/block.c  |  22 +--
 migration/migration.c  |  12 ++--
 tests/qemu-iotests/040 |  35 +++-
 tests/qemu-iotests/040.out |   4 +-
 tests/qemu-iotests/183 | 140 +
 tests/qemu-iotests/183.out |  46 +++
 tests/qemu-iotests/group   |   1 +
 11 files changed, 259 insertions(+), 15 deletions(-)
 create mode 100755 tests/qemu-iotests/183
 create mode 100644 tests/qemu-iotests/183.out



Re: [Qemu-block] [PATCH v8 00/20] Convert QCow[2] to QCryptoBlock & add LUKS support

2017-06-07 Thread Max Reitz
On 2017-06-01 19:27, Daniel P. Berrange wrote:
> Previously posted:
> 
>  v1: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg00201.html
>  v2: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05147.html
>  v3: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg05671.html
>  v4: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg02293.html
>  v5: https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg04653.html
>  v6: https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg04561.html
>  v7: https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg05818.html
> 
> This series is a continuation of previous work to support LUKS in
> QEMU. The existing merged code supports LUKS as a standalone
> driver which can be layered over/under any other QEMU block device
> driver. This works well when using LUKS over protocol drivers (file,
> rbd, iscsi, etc, etc), but has some downsides when combined with
> format drivers like qcow2.
> 
> If you layer LUKS under qcow2 (eg qcow2 -> luks -> file) then you
> cannot get any information about the qcow2 file without first
> decrypting it, as both the header and payload are encrypted.
> 
> If you layer LUKS over qcow2 (eg luks -> qcow2 -> file) then you
> cannot distinguish between a qcow2 file where the guest has done
> LUKS encryption from a qcow2 file which qemu has done encryption.
> More seriously, when encrypting sectors the guest virtual sector
> is used as the input for deriving the initialization vectors.
> When internal snapshots are used, this means that multiple sectors
> in the qcow2 file may be encrypted with the same initialization
> vector. This is a security weakness when combined with certain
> cryptographic modes.
> 
> Integrating LUKS natively into qcow2 allows us to combine the
> best aspects of both layering strategies above. In particular
> the header remains unecrypted, but initialization vectors are
> generated using physical sector numbers preserving security
> when snapshots are used. This is a change from previous postings
> of this work, where the IVs were (incorrectly) generated based
> on the virtual disk sector.
> 
> In a previous posting of this work, Fam had suggested that we
> do integration by layering luks over qcow2, but having QEMU
> block layer automatically create the luks driver above qcow2
> based on the qcow2 header crypt_method field. This is not
> possible though, because such a scheme would suffer from the
> problem of IVs being generated from the virtual disk sector
> instead of physical disk sector. So having LUKS specific
> code in the qcow2 block driver is unavoidable. In comparison
> to the previous posting though, the amount of code in qcow2.c
> has been reduced by allowing re-use of code from block/crypto.c
> for handling QemuOpts -> QAPI conversion. So extra lines of
> code in qcow2 to support LUKS is < 200.
> 
> I have also split the changes to qcow2 up into 2 patches. The
> first patch simply introduces use of the QCryptoBlock framework
> to qcow2 for the existing (deprecated) AES-CBC encryption method.
> The second patch wires up the LUKS support for qcow2. This makes
> it clearer which parts of the changes are related to plain code
> refactoring, vs enabling the new features. Specifically we can
> now see that the LUKS enablement in qcow2 has this footprint:

I'm trusting Eric and Berto, so I haven't looked at all patches in
depth. But I did have a look at all, and found only some minor stuff to
complain about. I didn't send an R-b for patches I did not find
anything, because (as I said) I didn't look at them sufficiently that
I'd give an R-b (which I didn't because Eric's and Berto's were already
present).

I'm just writing this so you know I'm not planning to write any
complaints for the patches I did not reply to. :-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v8 19/20] qcow2: report encryption specific image information

2017-06-07 Thread Max Reitz
On 2017-06-01 19:27, Daniel P. Berrange wrote:
> Currently 'qemu-img info' reports a simple "encrypted: yes"
> field. This is not very useful now that qcow2 can support
> multiple encryption formats. Users want to know which format
> is in use and some data related to it.
> 
> Wire up usage of the qcrypto_block_get_info() method so that
> 'qemu-img info' can report about the encryption format
> and parameters in use
> 
>   $ qemu-img create \
>   --object secret,id=sec0,data=123456 \
>   -o encrypt.format=luks,encrypt.key-secret=sec0 \
>   -f qcow2 demo.qcow2 1G
>   Formatting 'demo.qcow2', fmt=qcow2 size=1073741824 \
>   encryption=off encrypt.format=luks encrypt.key-secret=sec0 \
>   cluster_size=65536 lazy_refcounts=off refcount_bits=16
> 
>   $ qemu-img info demo.qcow2
>   image: demo.qcow2
>   file format: qcow2
>   virtual size: 1.0G (1073741824 bytes)
>   disk size: 480K
>   encrypted: yes
>   cluster_size: 65536
>   Format specific information:
>   compat: 1.1
>   lazy refcounts: false
>   refcount bits: 16
>   encrypt:
>   ivgen alg: plain64
>   hash alg: sha256
>   cipher alg: aes-256
>   uuid: 3fa930c4-58c8-4ef7-b3c5-314bb5af21f3
>   format: luks
>   cipher mode: xts
>   slots:
>   [0]:
>   active: true
>   iters: 1839058
>   key offset: 4096
>   stripes: 4000
>   [1]:
>   active: false
>   key offset: 262144
>   [2]:
>   active: false
>   key offset: 520192
>   [3]:
>   active: false
>   key offset: 778240
>   [4]:
>   active: false
>   key offset: 1036288
>   [5]:
>   active: false
>   key offset: 1294336
>   [6]:
>   active: false
>   key offset: 1552384
>   [7]:
>   active: false
>   key offset: 1810432
>   payload offset: 2068480
>   master key iters: 438487
>   corrupt: false
> 
> With the legacy "AES" encryption we just report the format
> name
> 
>   $ qemu-img create \
>   --object secret,id=sec0,data=123456 \
>   -o encrypt.format=aes,encrypt.key-secret=sec0 \
>   -f qcow2 demo.qcow2 1G
>   Formatting 'demo.qcow2', fmt=qcow2 size=1073741824 \
>   encryption=off encrypt.format=aes encrypt.key-secret=sec0 \
>   cluster_size=65536 lazy_refcounts=off refcount_bits=16
> 
>   $ ./qemu-img info demo.qcow2
>   image: demo.qcow2
>   file format: qcow2
>   virtual size: 1.0G (1073741824 bytes)
>   disk size: 196K
>   encrypted: yes
>   cluster_size: 65536
>   Format specific information:
>   compat: 1.1
>   lazy refcounts: false
>   refcount bits: 16
>   encrypt:
>   format: aes
>   corrupt: false
> 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/qcow2.c| 32 +++-
>  qapi/block-core.json | 27 ++-
>  2 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 58da658..a8a23af 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -3224,6 +3230,30 @@ static ImageInfoSpecific 
> *qcow2_get_specific_info(BlockDriverState *bs)
>  assert(false);
>  }
>  
> +if (encrypt_info) {
> +ImageInfoSpecificQCow2Encryption *qencrypt =
> +g_new(ImageInfoSpecificQCow2Encryption, 1);
> +switch (encrypt_info->format) {
> +case Q_CRYPTO_BLOCK_FORMAT_QCOW:
> +qencrypt->format = BLOCKDEV_QCOW2_ENCRYPTION_FORMAT_AES;
> +qencrypt->u.aes = encrypt_info->u.qcow;
> +break;
> +case Q_CRYPTO_BLOCK_FORMAT_LUKS:
> +qencrypt->format = BLOCKDEV_QCOW2_ENCRYPTION_FORMAT_LUKS;
> +qencrypt->u.luks = encrypt_info->u.luks;
> +break;
> +default:
> +assert(false);

I'd rather like this to be either a plain abort() or a
g_asert_not_reached(); the latter is more expressive, and the former
will work even with NDEBUG.

I know we already have an assert(false) in this function, but I'd assert
this is just wrong.

With this changed (or with me convinced that we should just use
assert(false)):

Reviewed-by: Max Reitz 

> +}
> +/* Since we did shallow copy above, erase any pointers
> + * in the original info */
> +memset(_info->u, 0, sizeof(encrypt_info->u));
> +qapi_free_QCryptoBlockInfo(encrypt_info);
> +
> +spec_info->u.qcow2.data->has_encrypt = true;
> +spec_info->u.qcow2.data->encrypt = qencrypt;
> +}
> +
>  return spec_info;
>  }




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v8 13/20] qcow2: add support for LUKS encryption format

2017-06-07 Thread Max Reitz
On 2017-06-01 19:27, Daniel P. Berrange wrote:
> This adds support for using LUKS as an encryption format
> with the qcow2 file, using the new encrypt.format parameter
> to request "luks" format. e.g.
> 
>   # qemu-img create --object secret,data=123456,id=sec0 \
>-f qcow2 -o encrypt.format=luks,encrypt.key-secret=sec0 \
>test.qcow2 10G
> 
> The legacy "encryption=on" parameter still results in
> creation of the old qcow2 AES format (and is equivalent
> to the new 'encryption-format=aes'). e.g. the following are
> equivalent:
> 
>   # qemu-img create --object secret,data=123456,id=sec0 \
>-f qcow2 -o encryption=on,encrypt.key-secret=sec0 \
>test.qcow2 10G
> 
>  # qemu-img create --object secret,data=123456,id=sec0 \
>-f qcow2 -o encryption-format=aes,encrypt.key-secret=sec0 \
>test.qcow2 10G
> 
> With the LUKS format it is necessary to store the LUKS
> partition header and key material in the QCow2 file. This
> data can be many MB in size, so cannot go into the QCow2
> header region directly. Thus the spec defines a FDE
> (Full Disk Encryption) header extension that specifies
> the offset of a set of clusters to hold the FDE headers,
> as well as the length of that region. The LUKS header is
> thus stored in these extra allocated clusters before the
> main image payload.
> 
> Aside from all the cryptographic differences implied by
> use of the LUKS format, there is one further key difference
> between the use of legacy AES and LUKS encryption in qcow2.
> For LUKS, the initialiazation vectors are generated using
> the host physical sector as the input, rather than the
> guest virtual sector. This guarantees unique initialization
> vectors for all sectors when qcow2 internal snapshots are
> used, thus giving stronger protection against watermarking
> attacks.
> 
> Reviewed-by: Eric Blake 
> Reviewed-by: Alberto Garcia 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/qcow2-cluster.c  |   4 +-
>  block/qcow2-refcount.c |  10 ++
>  block/qcow2.c  | 267 ++--
>  block/qcow2.h  |   9 ++
>  qapi/block-core.json   |   5 +-
>  tests/qemu-iotests/082.out | 270 
> -
>  6 files changed, 477 insertions(+), 88 deletions(-)

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index 38c0420..30d0343 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2072,22 +2253,30 @@ static int qcow2_set_up_encryption(BlockDriverState 
> *bs, const char *encryptfmt,
>  qdict_extract_subqdict(options, , "encrypt.");
>  QDECREF(options);
>  
> -if (!g_str_equal(encryptfmt, "aes")) {
> -error_setg(errp, "Unknown encryption format '%s', expected 'aes'",
> -   encryptfmt);
> -ret = -EINVAL;
> -goto out;
> +int fmt = qcow2_crypt_method_from_format(encryptfmt);

Because I love nit picking: Variable declarations should be at the start
of the surrounding block.

Max

> +
> +switch (fmt) {
> +case QCOW_CRYPT_LUKS:
> +cryptoopts = block_crypto_create_opts_init(
> +Q_CRYPTO_BLOCK_FORMAT_LUKS, encryptopts, errp);
> +break;
> +case QCOW_CRYPT_AES:
> +cryptoopts = block_crypto_create_opts_init(
> +Q_CRYPTO_BLOCK_FORMAT_QCOW, encryptopts, errp);
> +break;
> +default:
> +error_setg(errp, "Unknown encryption format '%s'", encryptfmt);
> +break;
>  }
> -cryptoopts = block_crypto_create_opts_init(
> -Q_CRYPTO_BLOCK_FORMAT_QCOW, encryptopts, errp);
>  if (!cryptoopts) {
>  ret = -EINVAL;
>  goto out;
>  }
> -s->crypt_method_header = QCOW_CRYPT_AES;
> +s->crypt_method_header = fmt;
>  
>  crypto = qcrypto_block_create(cryptoopts,
> -  NULL, NULL,
> +  qcow2_crypto_hdr_init_func,
> +  qcow2_crypto_hdr_write_func,
>bs, errp);
>  if (!crypto) {
>  ret = -EINVAL;



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH] block: fix external snapshot abort permission error

2017-06-07 Thread Kevin Wolf
Am 07.06.2017 um 15:55 hat Jeff Cody geschrieben:
> In external_snapshot_abort(), we try to undo what was done in
> external_snapshot_prepare() calling bdrv_replace_node() to swap the
> nodes back.  However, we receive a permissions error as writers are
> blocked on the old node, which is now the new node backing file.
> 
> An easy fix (initially suggested by Kevin Wolf) is to call
> bdrv_set_backing_hd() on the new node, to set the backing node to NULL.
> 
> Signed-off-by: Jeff Cody 

Thanks, applied to the block branch.

Kevin



Re: [Qemu-block] [PATCH v8 09/20] qcow: convert QCow to use QCryptoBlock for encryption

2017-06-07 Thread Max Reitz
On 2017-06-01 19:27, Daniel P. Berrange wrote:
> This converts the qcow driver to make use of the QCryptoBlock
> APIs for encrypting image content. This is only wired up to
> permit use of the legacy QCow encryption format. Users who wish
> to have the strong LUKS format should switch to qcow2 instead.
> 
> With this change it is now required to use the QCryptoSecret
> object for providing passwords, instead of the current block
> password APIs / interactive prompting.
> 

Beware, nit picks incoming:

>   $QEMU \
> -object secret,id=sec0,filename=/home/berrange/encrypted.pw \> -drive 
> file=/home/berrange/encrypted.qcow,encrypt.format=qcow,\

encrypt.format should be "aes".

>encrypt.key-secret=sec0

This doesn't work at all, though, because:

Use of AES-CBC encrypted qcow images is no longer supported in system
emulators
You can use 'qemu-img convert' to convert your image to an alternative
supported format, such as unencrypted qcow, or raw with the LUKS format
instead.

> 
> Likewise when creating images with the legacy AES-CBC format
> 
>   qemu-img create -f qcow \
> -object secret,id=sec0,filename=/home/berrange/encrypted.pw \

Should be --object.

> -o encrypt.format=aes,encrypt.key-secret=sec0 \
> /home/berrange/encrypted.qcow

There should be a size here to make it work.

The patch itself does look good to me, though.

Max

> 
> Reviewed-by: Alberto Garcia 
> Reviewed-by: Eric Blake 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/crypto.c   |  10 +++
>  block/crypto.h   |  20 --
>  block/qcow.c | 198 
> +--
>  qapi/block-core.json |  38 +-
>  4 files changed, 158 insertions(+), 108 deletions(-)



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v8 07/20] block: deprecate "encryption=on" in favor of "encrypt.format=aes"

2017-06-07 Thread Max Reitz
On 2017-06-01 19:27, Daniel P. Berrange wrote:
> Historically the qcow & qcow2 image formats supported a property
> "encryption=on" to enable their built-in AES encryption. We'll
> soon be supporting LUKS for qcow2, so need a more general purpose
> way to enable encryption, with a choice of formats.
> 
> This introduces an "encrypt.format" option, which will later be
> joined by a number of other "encrypt.XXX" options. The use of
> a "encrypt." prefix instead of "encrypt-" is done to facilitate
> mapping to a nested QAPI schema at later date.
> 
> e.g. the preferred syntax is now
> 
>   qemu-img create -f qcow2 -o encrypt.format=aes demo.qcow2
> 
> Reviewed-by: Eric Blake 
> Reviewed-by: Alberto Garcia 
> Signed-off-by: Daniel P. Berrange 
> ---
>  block/qcow.c   | 30 ++---
>  block/qcow2.c  | 33 +++
>  include/block/block_int.h  |  2 +-
>  qemu-img.c |  4 ++-
>  tests/qemu-iotests/082.out | 81 
> ++
>  5 files changed, 110 insertions(+), 40 deletions(-)
> 
> diff --git a/block/qcow.c b/block/qcow.c
> index 6738bc7..42f83b2 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c

[...]

> @@ -818,8 +818,16 @@ static int qcow_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  }
>  
>  backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
> -if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
> -flags |= BLOCK_FLAG_ENCRYPT;
> +encryptfmt = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT_FORMAT);
> +if (encryptfmt) {
> +if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {

You should probably just use qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT)
here, because otherwise you can do this:

$ ./qemu-img create -f qcow -o encryption=off,encrypt.format=aes \
foo.qcow 64M
Formatting 'foo.qcow', fmt=qcow size=67108864 encryption=off
encrypt.format=aes

> +error_setg(errp, "Options " BLOCK_OPT_ENCRYPT " and "
> +   BLOCK_OPT_ENCRYPT_FORMAT " are mutually exclusive");
> +ret = -EINVAL;
> +goto cleanup;
> +}
> +} else if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
> +encryptfmt = "aes";
>  }
>  
>  ret = bdrv_create_file(filename, opts, _err);

[...]

> diff --git a/block/qcow2.c b/block/qcow2.c
> index b3ba5da..19dfcd1 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c

[...]

> @@ -2366,8 +2373,16 @@ static int qcow2_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  BDRV_SECTOR_SIZE);
>  backing_file = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
>  backing_fmt = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FMT);
> -if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
> -flags |= BLOCK_FLAG_ENCRYPT;
> +encryptfmt = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT_FORMAT);
> +if (encryptfmt) {
> +if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {

Same here.

With these places fixed:

Reviewed-by: Max Reitz 

> +error_setg(errp, "Options " BLOCK_OPT_ENCRYPT " and "
> +   BLOCK_OPT_ENCRYPT_FORMAT " are mutually exclusive");
> +ret = -EINVAL;
> +goto finish;
> +}
> +} else if (qemu_opt_get_bool_del(opts, BLOCK_OPT_ENCRYPT, false)) {
> +encryptfmt = "aes";
>  }
>  cluster_size = qemu_opt_get_size_del(opts, BLOCK_OPT_CLUSTER_SIZE,
>   DEFAULT_CLUSTER_SIZE);



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH v2 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read, write}()

2017-06-07 Thread Manos Pitsidianakis

On Wed, Jun 07, 2017 at 05:08:27PM +0300, Alberto Garcia wrote:

Instead of passing a single buffer pointer to do_perform_cow_write(),
pass a QEMUIOVector. This will allow us to merge the write requests
for the COW regions and the actual data into a single one.

Although do_perform_cow_read() does not strictly need to change its
API, we're doing it here as well for consistency.

Signed-off-by: Alberto Garcia 


Reviewed-by: Manos Pitsidianakis 


signature.asc
Description: PGP signature


Re: [Qemu-block] [PATCH v2 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion

2017-06-07 Thread Eric Blake
On 06/07/2017 09:08 AM, Alberto Garcia wrote:
> Qcow2COWRegion has two attributes:
> 
> - The offset of the COW region from the start of the first cluster
>   touched by the I/O request. Since it's always going to be positive
>   and the maximum request size is at most INT_MAX, we can use a
>   regular unsigned int to store this offset.

I don't know if we will ever get to the point that we allow a 64-bit
request at the block layer (and then the block layer guarantees it is
split down to the driver's limits, which works when a driver is still
bound by 32-bit limits).  But we ALSO know that a cluster is at most 2M
(in our current implementation of qcow2), so the offset of where the COW
region starts in relation to the start of a cluster is < 2M.

> 
> - The size of the COW region in bytes. This is guaranteed to be >= 0,
>   so we should use an unsigned type instead.

And likewise, since a COW region is a sub-cluster, and clusters are
bounded at 2M, we also have a sub-int upper bound on the size of the region.

> 
> In x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
> It will also help keep some assertions simpler now that we know that
> there are no negative numbers.
> 
> The prototype of do_perform_cow() is also updated to reflect these
> changes.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 4 ++--
>  block/qcow2.h | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index d1c419f52b..a86c5a75a9 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -406,8 +406,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
> sector_num,
>  static int coroutine_fn do_perform_cow(BlockDriverState *bs,
> uint64_t src_cluster_offset,
> uint64_t cluster_offset,
> -   int offset_in_cluster,
> -   int bytes)
> +   unsigned offset_in_cluster,
> +   unsigned bytes)

I don't know if the code base has a strong preference for 'unsigned int'
over 'unsigned', but it doesn't bother me.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 0/2] Add reduce image for qcow2

2017-06-07 Thread Kevin Wolf
Am 07.06.2017 um 15:37 hat Max Reitz geschrieben:
> On 2017-06-01 13:11, Denis V. Lunev wrote:
> > On 06/01/2017 12:12 PM, Kevin Wolf wrote:
> >> Am 31.05.2017 um 17:03 hat Eric Blake geschrieben:
> >>> On 05/31/2017 09:43 AM, Pavel Butsykin wrote:
>  This patch adds the reduction of the image file for qcow2. As a result, 
>  this
>  allows us to reduce the virtual image size and free up space on the disk 
>  without
>  copying the image. Image can be fragmented and reduction is done by 
>  punching
>  holes in the image file.
> 
>  # ./qemu-img create -f qcow2 -o size=4G image.qcow2
>  Formatting 'image.qcow2', fmt=qcow2 size=4294967296 encryption=off 
>  cluster_size=65536 lazy_refcounts=off refcount_bits=16
> 
>  # ./qemu-io -c "write -P 0x22 0 1G" image.qcow2
> >>> So this is 1G of guest-visible data...
> >>>
>  # ./qemu-img resize image.qcow2 128M
>  Image resized.
> >>> ...and you are truncating the image by throwing away guest-visible
> >>> content, with no warning or double-checking (such as requiring a -f
> >>> force parameter or something) about the data loss.  Shrinking images is
> >>> something we should allow, but it feels like is a rare enough operation
> >>> that you don't want it to be casually easy to throw away data.
> >>>
> >>> Is it feasible to require that a shrink operation will not be performed
> >>> unless all clusters being eliminated have been previously discarded (or
> >>> maybe written to zero), as an assurance that the guest did not care
> >>> about the tail of the image?
> >> I think that ship has sailed long ago because raw images have always
> >> supported shrinking images without any special checks or options. We
> >> want consistency between raw and qcow2, and we also need to provide
> >> backwards compatibility.
> >>
> >> The only thing I can imagine we could do now is to introduce a --shrink
> >> option in qemu-img, print a deprecation warning for shrinking without
> >> this option, and then make it mandatory in a few years.
> 
> Do I hear a "3.0"?

You do.

> >> If we want to distinguish based on the block driver, so that we can
> >> require --shrink for all drivers that didn't support shrinking until
> >> now, we'd have to check the .bdrv_truncate() implementations of all
> >> drivers to see whether it already allowed shrinking.
> 
> We could have an ugly raw-only hack directly in qemu-img (and
> qmp_block_resize) until 3.0.
> 
> I'm really concerned about someone mistyping something and accidentally
> dropping a digit.

Me too. But still we can't break working command lines (at least before
3.0).

I'm okay with temporary hacks in qemu-img, but did you check whether
it's really raw-only? We know that raw can shrink currently and qcow2
can't, but there are 12 drivers implementing .bdrv_truncate, not only
two.

> > Will the solution proposed by Pavel in the answer to Max work:
> > - if there are no data blocks in the truncated space, truncate is allowed
> >   without additional options
> > - if there are data blocks in the truncated space, truncate is allowed
> >   with the flag --force or error is generated
> 
> I'd be OK with that, but I'm not sure we really need it.

Agreed. It would add a lot of complexity for little use. As I wrote in a
previous mail: I don't even know how I would discard the unpartitioned
space after shrinking my guest filesystem.

> It would be nice to have for convenience, but I don't think it solves
> the backwards-compatibility problem (as said by Kevin), and I'm not
> sure it makes things that much more convenient: Just specifying
> --force is easier than to manually trim the device in the guest (and
> then maybe having to specify --force anyway because you didn't do it
> right).

I think it should be specifically --shrink rather than an unspecific
--force that could mean anything.

Kevin


pgpdswo2NZQfp.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH v2 1/7] qcow2: Remove unused Error variable in do_perform_cow()

2017-06-07 Thread Eric Blake
On 06/07/2017 09:08 AM, Alberto Garcia wrote:
> We are using the return value of qcow2_encrypt_sectors() to detect
> problems but we are throwing away the returned Error since we have no
> way to report it to the user. Therefore we can simply get rid of the
> local Error variable and pass NULL instead.
> 
> Alternatively we could try to figure out a way to pass the original
> error instead of simply returning -EIO, but that would be more
> invasive, so let's keep the current approach.
> 
> Signed-off-by: Alberto Garcia 
> ---
>  block/qcow2-cluster.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] block: fix external snapshot abort permission error

2017-06-07 Thread Eric Blake
On 06/07/2017 08:55 AM, Jeff Cody wrote:
> In external_snapshot_abort(), we try to undo what was done in
> external_snapshot_prepare() calling bdrv_replace_node() to swap the
> nodes back.  However, we receive a permissions error as writers are
> blocked on the old node, which is now the new node backing file.
> 
> An easy fix (initially suggested by Kevin Wolf) is to call
> bdrv_set_backing_hd() on the new node, to set the backing node to NULL.
> 
> Signed-off-by: Jeff Cody 
> ---
>  blockdev.c | 4 
>  1 file changed, 4 insertions(+)
> 

Reviewed-by: Eric Blake 

> diff --git a/blockdev.c b/blockdev.c
> index 892d768..6472548 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -1803,7 +1803,11 @@ static void external_snapshot_abort(BlkActionState 
> *common)
>   DO_UPCAST(ExternalSnapshotState, common, 
> common);
>  if (state->new_bs) {
>  if (state->overlay_appended) {
> +bdrv_ref(state->old_bs);   /* we can't let bdrv_set_backind_hd()
> +  close state->old_bs; we need it */
> +bdrv_set_backing_hd(state->new_bs, NULL, _abort);
>  bdrv_replace_node(state->new_bs, state->old_bs, _abort);
> +bdrv_unref(state->old_bs); /* bdrv_replace_node() ref'ed old_bs 
> */
>  }
>  }
>  }
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH v2 4/7] qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()

2017-06-07 Thread Alberto Garcia
This patch splits do_perform_cow() into three separate functions to
read, encrypt and write the COW regions.

perform_cow() can now read both regions first, then encrypt them and
finally write them to disk. The memory allocation is also done in
this function now, using one single buffer large enough to hold both
regions.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 114 +-
 1 file changed, 84 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 4c03639a72..af43e6a34f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -403,34 +403,26 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 return 0;
 }
 
-static int coroutine_fn do_perform_cow(BlockDriverState *bs,
-   uint64_t src_cluster_offset,
-   uint64_t cluster_offset,
-   unsigned offset_in_cluster,
-   unsigned bytes)
+static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
+uint64_t src_cluster_offset,
+unsigned offset_in_cluster,
+uint8_t *buffer,
+unsigned bytes)
 {
-BDRVQcow2State *s = bs->opaque;
 QEMUIOVector qiov;
-struct iovec iov;
+struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
 int ret;
 
 if (bytes == 0) {
 return 0;
 }
 
-iov.iov_len = bytes;
-iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
-if (iov.iov_base == NULL) {
-return -ENOMEM;
-}
-
 qemu_iovec_init_external(, , 1);
 
 BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
 if (!bs->drv) {
-ret = -ENOMEDIUM;
-goto out;
+return -ENOMEDIUM;
 }
 
 /* Call .bdrv_co_readv() directly instead of using the public block-layer
@@ -440,39 +432,63 @@ static int coroutine_fn do_perform_cow(BlockDriverState 
*bs,
 ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
   bytes, , 0);
 if (ret < 0) {
-goto out;
+return ret;
 }
 
-if (bs->encrypted) {
+return 0;
+}
+
+static bool coroutine_fn do_perform_cow_encrypt(BlockDriverState *bs,
+uint64_t src_cluster_offset,
+unsigned offset_in_cluster,
+uint8_t *buffer,
+unsigned bytes)
+{
+if (bytes && bs->encrypted) {
+BDRVQcow2State *s = bs->opaque;
 int64_t sector = (src_cluster_offset + offset_in_cluster)
  >> BDRV_SECTOR_BITS;
 assert(s->cipher);
 assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
 assert((bytes & ~BDRV_SECTOR_MASK) == 0);
-if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
+if (qcow2_encrypt_sectors(s, sector, buffer, buffer,
   bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
-ret = -EIO;
-goto out;
+return false;
 }
 }
+return true;
+}
+
+static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
+ uint64_t cluster_offset,
+ unsigned offset_in_cluster,
+ uint8_t *buffer,
+ unsigned bytes)
+{
+QEMUIOVector qiov;
+struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
+int ret;
+
+if (bytes == 0) {
+return 0;
+}
+
+qemu_iovec_init_external(, , 1);
 
 ret = qcow2_pre_write_overlap_check(bs, 0,
 cluster_offset + offset_in_cluster, bytes);
 if (ret < 0) {
-goto out;
+return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
 ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
   bytes, , 0);
 if (ret < 0) {
-goto out;
+return ret;
 }
 
-ret = 0;
-out:
-qemu_vfree(iov.iov_base);
-return ret;
+return 0;
 }
 
 
@@ -760,22 +776,59 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta 
*m)
 BDRVQcow2State *s = bs->opaque;
 Qcow2COWRegion *start = >cow_start;
 Qcow2COWRegion *end = >cow_end;
+unsigned buffer_size = start->nb_bytes + end->nb_bytes;
+uint8_t *start_buffer, *end_buffer;
 int ret;
 
+assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+
 if (start->nb_bytes == 0 && end->nb_bytes == 0) {
 return 0;
 }
 
+/* Reserve a buffer large enough to store the data from both the
+ * start and 

[Qemu-block] [PATCH v2 3/7] qcow2: Make perform_cow() call do_perform_cow() twice

2017-06-07 Thread Alberto Garcia
Instead of calling perform_cow() twice with a different COW region
each time, call it just once and make perform_cow() handle both
regions.

This patch simply moves code around. The next one will do the actual
reordering of the COW operations.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 38 +++---
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a86c5a75a9..4c03639a72 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -414,6 +414,10 @@ static int coroutine_fn do_perform_cow(BlockDriverState 
*bs,
 struct iovec iov;
 int ret;
 
+if (bytes == 0) {
+return 0;
+}
+
 iov.iov_len = bytes;
 iov.iov_base = qemu_try_blockalign(bs, iov.iov_len);
 if (iov.iov_base == NULL) {
@@ -751,31 +755,40 @@ uint64_t 
qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
 return cluster_offset;
 }
 
-static int perform_cow(BlockDriverState *bs, QCowL2Meta *m, Qcow2COWRegion *r)
+static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
 {
 BDRVQcow2State *s = bs->opaque;
+Qcow2COWRegion *start = >cow_start;
+Qcow2COWRegion *end = >cow_end;
 int ret;
 
-if (r->nb_bytes == 0) {
+if (start->nb_bytes == 0 && end->nb_bytes == 0) {
 return 0;
 }
 
 qemu_co_mutex_unlock(>lock);
-ret = do_perform_cow(bs, m->offset, m->alloc_offset, r->offset, 
r->nb_bytes);
+ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+ start->offset, start->nb_bytes);
+if (ret < 0) {
+goto fail;
+}
+
+ret = do_perform_cow(bs, m->offset, m->alloc_offset,
+ end->offset, end->nb_bytes);
+
+fail:
 qemu_co_mutex_lock(>lock);
 
-if (ret < 0) {
-return ret;
-}
-
 /*
  * Before we update the L2 table to actually point to the new cluster, we
  * need to be sure that the refcounts have been increased and COW was
  * handled.
  */
-qcow2_cache_depends_on_flush(s->l2_table_cache);
+if (ret == 0) {
+qcow2_cache_depends_on_flush(s->l2_table_cache);
+}
 
-return 0;
+return ret;
 }
 
 int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
@@ -795,12 +808,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, 
QCowL2Meta *m)
 }
 
 /* copy content of unmodified sectors */
-ret = perform_cow(bs, m, >cow_start);
-if (ret < 0) {
-goto err;
-}
-
-ret = perform_cow(bs, m, >cow_end);
+ret = perform_cow(bs, m);
 if (ret < 0) {
 goto err;
 }
-- 
2.11.0




[Qemu-block] [PATCH v2 1/7] qcow2: Remove unused Error variable in do_perform_cow()

2017-06-07 Thread Alberto Garcia
We are using the return value of qcow2_encrypt_sectors() to detect
problems but we are throwing away the returned Error since we have no
way to report it to the user. Therefore we can simply get rid of the
local Error variable and pass NULL instead.

Alternatively we could try to figure out a way to pass the original
error instead of simply returning -EIO, but that would be more
invasive, so let's keep the current approach.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d779ea19cf..d1c419f52b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -440,16 +440,14 @@ static int coroutine_fn do_perform_cow(BlockDriverState 
*bs,
 }
 
 if (bs->encrypted) {
-Error *err = NULL;
 int64_t sector = (src_cluster_offset + offset_in_cluster)
  >> BDRV_SECTOR_BITS;
 assert(s->cipher);
 assert((offset_in_cluster & ~BDRV_SECTOR_MASK) == 0);
 assert((bytes & ~BDRV_SECTOR_MASK) == 0);
 if (qcow2_encrypt_sectors(s, sector, iov.iov_base, iov.iov_base,
-  bytes >> BDRV_SECTOR_BITS, true, ) < 0) {
+  bytes >> BDRV_SECTOR_BITS, true, NULL) < 0) {
 ret = -EIO;
-error_free(err);
 goto out;
 }
 }
-- 
2.11.0




[Qemu-block] [PATCH v2 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-06-07 Thread Alberto Garcia
Hi all,

here's a patch series that rewrites the copy-on-write code in the
qcow2 driver to reduce the number of I/O operations.

This is version v2, please refer to the original e-mail for a complete
description:

https://lists.gnu.org/archive/html/qemu-block/2017-05/msg00882.html

Regards,

Berto

Changes:

v2:
- Patch 1: Update commit message [Eric]
- Patch 7: Make sure that the number of iovs does not exceed IOV_MAX [Anton]
- Patch 7: Don't add zero-length buffers to the qiov in perform_cow()

v1: https://lists.gnu.org/archive/html/qemu-block/2017-05/msg00882.html
- Initial version

Output of git-backport-diff against v1:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/7:[down] 'qcow2: Remove unused Error variable in do_perform_cow()'
002/7:[] [--] 'qcow2: Use unsigned int for both members of Qcow2COWRegion'
003/7:[] [--] 'qcow2: Make perform_cow() call do_perform_cow() twice'
004/7:[] [--] 'qcow2: Split do_perform_cow() into _read(), _encrypt() and 
_write()'
005/7:[] [--] 'qcow2: Allow reading both COW regions with only one request'
006/7:[] [--] 'qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()'
007/7:[0014] [FC] 'qcow2: Merge the writing of the COW regions with the guest 
data'

Alberto Garcia (7):
  qcow2: Remove unused Error variable in do_perform_cow()
  qcow2: Use unsigned int for both members of Qcow2COWRegion
  qcow2: Make perform_cow() call do_perform_cow() twice
  qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
  qcow2: Allow reading both COW regions with only one request
  qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
  qcow2: Merge the writing of the COW regions with the guest data

 block/qcow2-cluster.c | 192 +-
 block/qcow2.c |  64 ++---
 block/qcow2.h |  11 ++-
 3 files changed, 207 insertions(+), 60 deletions(-)

-- 
2.11.0




[Qemu-block] [PATCH v2 2/7] qcow2: Use unsigned int for both members of Qcow2COWRegion

2017-06-07 Thread Alberto Garcia
Qcow2COWRegion has two attributes:

- The offset of the COW region from the start of the first cluster
  touched by the I/O request. Since it's always going to be positive
  and the maximum request size is at most INT_MAX, we can use a
  regular unsigned int to store this offset.

- The size of the COW region in bytes. This is guaranteed to be >= 0,
  so we should use an unsigned type instead.

In x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
It will also help keep some assertions simpler now that we know that
there are no negative numbers.

The prototype of do_perform_cow() is also updated to reflect these
changes.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 4 ++--
 block/qcow2.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index d1c419f52b..a86c5a75a9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -406,8 +406,8 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 static int coroutine_fn do_perform_cow(BlockDriverState *bs,
uint64_t src_cluster_offset,
uint64_t cluster_offset,
-   int offset_in_cluster,
-   int bytes)
+   unsigned offset_in_cluster,
+   unsigned bytes)
 {
 BDRVQcow2State *s = bs->opaque;
 QEMUIOVector qiov;
diff --git a/block/qcow2.h b/block/qcow2.h
index 1801dc30dc..c26ee0a33d 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -301,10 +301,10 @@ typedef struct Qcow2COWRegion {
  * Offset of the COW region in bytes from the start of the first cluster
  * touched by the request.
  */
-uint64_toffset;
+unsignedoffset;
 
 /** Number of bytes to copy */
-int nb_bytes;
+unsignednb_bytes;
 } Qcow2COWRegion;
 
 /**
-- 
2.11.0




[Qemu-block] [PATCH v2 5/7] qcow2: Allow reading both COW regions with only one request

2017-06-07 Thread Alberto Garcia
Reading both COW regions requires two separate requests, but it's
perfectly possible to merge them and perform only one. This generally
improves performance, particularly on rotating disk drives. The
downside is that the data in the middle region is read but discarded.

This patch takes a conservative approach and only merges reads when
the size of the middle region is <= 16KB.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 41 ++---
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index af43e6a34f..8f6bc3d0b9 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -777,34 +777,53 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta 
*m)
 Qcow2COWRegion *start = >cow_start;
 Qcow2COWRegion *end = >cow_end;
 unsigned buffer_size = start->nb_bytes + end->nb_bytes;
+unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
+bool merge_reads = false;
 uint8_t *start_buffer, *end_buffer;
 int ret;
 
 assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
+assert(buffer_size <= UINT_MAX - data_bytes);
+assert(start->offset + start->nb_bytes <= end->offset);
 
 if (start->nb_bytes == 0 && end->nb_bytes == 0) {
 return 0;
 }
 
-/* Reserve a buffer large enough to store the data from both the
- * start and end COW regions */
+/* If we have to read both the start and end COW regions and the
+ * middle region is not too large then perform just one read
+ * operation */
+if (start->nb_bytes && end->nb_bytes && data_bytes <= 16384) {
+merge_reads = true;
+buffer_size += data_bytes;
+}
+
+/* Reserve a buffer large enough to store all the data that we're
+ * going to read */
 start_buffer = qemu_try_blockalign(bs, buffer_size);
 if (start_buffer == NULL) {
 return -ENOMEM;
 }
 /* The part of the buffer where the end region is located */
-end_buffer = start_buffer + start->nb_bytes;
+end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
 qemu_co_mutex_unlock(>lock);
-/* First we read the existing data from both COW regions */
-ret = do_perform_cow_read(bs, m->offset, start->offset,
-  start_buffer, start->nb_bytes);
-if (ret < 0) {
-goto fail;
-}
+/* First we read the existing data from both COW regions. We
+ * either read the whole region in one go, or the start and end
+ * regions separately. */
+if (merge_reads) {
+ret = do_perform_cow_read(bs, m->offset, start->offset,
+  start_buffer, buffer_size);
+} else {
+ret = do_perform_cow_read(bs, m->offset, start->offset,
+  start_buffer, start->nb_bytes);
+if (ret < 0) {
+goto fail;
+}
 
-ret = do_perform_cow_read(bs, m->offset, end->offset,
-  end_buffer, end->nb_bytes);
+ret = do_perform_cow_read(bs, m->offset, end->offset,
+  end_buffer, end->nb_bytes);
+}
 if (ret < 0) {
 goto fail;
 }
-- 
2.11.0




[Qemu-block] [PATCH v2 6/7] qcow2: Pass a QEMUIOVector to do_perform_cow_{read, write}()

2017-06-07 Thread Alberto Garcia
Instead of passing a single buffer pointer to do_perform_cow_write(),
pass a QEMUIOVector. This will allow us to merge the write requests
for the COW regions and the actual data into a single one.

Although do_perform_cow_read() does not strictly need to change its
API, we're doing it here as well for consistency.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 51 ---
 1 file changed, 24 insertions(+), 27 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 8f6bc3d0b9..71609ff7a2 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -406,19 +406,14 @@ int qcow2_encrypt_sectors(BDRVQcow2State *s, int64_t 
sector_num,
 static int coroutine_fn do_perform_cow_read(BlockDriverState *bs,
 uint64_t src_cluster_offset,
 unsigned offset_in_cluster,
-uint8_t *buffer,
-unsigned bytes)
+QEMUIOVector *qiov)
 {
-QEMUIOVector qiov;
-struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
 int ret;
 
-if (bytes == 0) {
+if (qiov->size == 0) {
 return 0;
 }
 
-qemu_iovec_init_external(, , 1);
-
 BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
 
 if (!bs->drv) {
@@ -430,7 +425,7 @@ static int coroutine_fn 
do_perform_cow_read(BlockDriverState *bs,
  * which can lead to deadlock when block layer copy-on-read is enabled.
  */
 ret = bs->drv->bdrv_co_preadv(bs, src_cluster_offset + offset_in_cluster,
-  bytes, , 0);
+  qiov->size, qiov, 0);
 if (ret < 0) {
 return ret;
 }
@@ -462,28 +457,23 @@ static bool coroutine_fn 
do_perform_cow_encrypt(BlockDriverState *bs,
 static int coroutine_fn do_perform_cow_write(BlockDriverState *bs,
  uint64_t cluster_offset,
  unsigned offset_in_cluster,
- uint8_t *buffer,
- unsigned bytes)
+ QEMUIOVector *qiov)
 {
-QEMUIOVector qiov;
-struct iovec iov = { .iov_base = buffer, .iov_len = bytes };
 int ret;
 
-if (bytes == 0) {
+if (qiov->size == 0) {
 return 0;
 }
 
-qemu_iovec_init_external(, , 1);
-
 ret = qcow2_pre_write_overlap_check(bs, 0,
-cluster_offset + offset_in_cluster, bytes);
+cluster_offset + offset_in_cluster, qiov->size);
 if (ret < 0) {
 return ret;
 }
 
 BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
 ret = bdrv_co_pwritev(bs->file, cluster_offset + offset_in_cluster,
-  bytes, , 0);
+  qiov->size, qiov, 0);
 if (ret < 0) {
 return ret;
 }
@@ -780,6 +770,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
 unsigned data_bytes = end->offset - (start->offset + start->nb_bytes);
 bool merge_reads = false;
 uint8_t *start_buffer, *end_buffer;
+QEMUIOVector qiov;
 int ret;
 
 assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
@@ -807,22 +798,25 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta 
*m)
 /* The part of the buffer where the end region is located */
 end_buffer = start_buffer + buffer_size - end->nb_bytes;
 
+qemu_iovec_init(, 3);
+
 qemu_co_mutex_unlock(>lock);
 /* First we read the existing data from both COW regions. We
  * either read the whole region in one go, or the start and end
  * regions separately. */
 if (merge_reads) {
-ret = do_perform_cow_read(bs, m->offset, start->offset,
-  start_buffer, buffer_size);
+qemu_iovec_add(, start_buffer, buffer_size);
+ret = do_perform_cow_read(bs, m->offset, start->offset, );
 } else {
-ret = do_perform_cow_read(bs, m->offset, start->offset,
-  start_buffer, start->nb_bytes);
+qemu_iovec_add(, start_buffer, start->nb_bytes);
+ret = do_perform_cow_read(bs, m->offset, start->offset, );
 if (ret < 0) {
 goto fail;
 }
 
-ret = do_perform_cow_read(bs, m->offset, end->offset,
-  end_buffer, end->nb_bytes);
+qemu_iovec_reset();
+qemu_iovec_add(, end_buffer, end->nb_bytes);
+ret = do_perform_cow_read(bs, m->offset, end->offset, );
 }
 if (ret < 0) {
 goto fail;
@@ -840,14 +834,16 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta 
*m)
 }
 
 /* And now we can write everything */
-ret = do_perform_cow_write(bs, m->alloc_offset, start->offset,
-   start_buffer, 

[Qemu-block] [PATCH v2 7/7] qcow2: Merge the writing of the COW regions with the guest data

2017-06-07 Thread Alberto Garcia
If the guest tries to write data that results on the allocation of a
new cluster, instead of writing the guest data first and then the data
from the COW regions, write everything together using one single I/O
operation.

This can improve the write performance by 25% or more, depending on
several factors such as the media type, the cluster size and the I/O
request size.

Signed-off-by: Alberto Garcia 
---
 block/qcow2-cluster.c | 38 ++
 block/qcow2.c | 64 +++
 block/qcow2.h |  7 ++
 3 files changed, 90 insertions(+), 19 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 71609ff7a2..8e789e790b 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -776,6 +776,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
 assert(start->nb_bytes <= UINT_MAX - end->nb_bytes);
 assert(buffer_size <= UINT_MAX - data_bytes);
 assert(start->offset + start->nb_bytes <= end->offset);
+assert(!m->data_qiov || m->data_qiov->size == data_bytes);
 
 if (start->nb_bytes == 0 && end->nb_bytes == 0) {
 return 0;
@@ -833,17 +834,36 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta 
*m)
 }
 }
 
-/* And now we can write everything */
-qemu_iovec_reset();
-qemu_iovec_add(, start_buffer, start->nb_bytes);
-ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, );
-if (ret < 0) {
-goto fail;
+/* And now we can write everything. If we have the guest data we
+ * can write everything in one single operation */
+if (m->data_qiov) {
+qemu_iovec_reset();
+if (start->nb_bytes) {
+qemu_iovec_add(, start_buffer, start->nb_bytes);
+}
+qemu_iovec_concat(, m->data_qiov, 0, data_bytes);
+if (end->nb_bytes) {
+qemu_iovec_add(, end_buffer, end->nb_bytes);
+}
+/* NOTE: we have a write_aio blkdebug event here followed by
+ * a cow_write one in do_perform_cow_write(), but there's only
+ * one single I/O operation */
+BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
+ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, );
+} else {
+/* If there's no guest data then write both COW regions separately */
+qemu_iovec_reset();
+qemu_iovec_add(, start_buffer, start->nb_bytes);
+ret = do_perform_cow_write(bs, m->alloc_offset, start->offset, );
+if (ret < 0) {
+goto fail;
+}
+
+qemu_iovec_reset();
+qemu_iovec_add(, end_buffer, end->nb_bytes);
+ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, );
 }
 
-qemu_iovec_reset();
-qemu_iovec_add(, end_buffer, end->nb_bytes);
-ret = do_perform_cow_write(bs, m->alloc_offset, end->offset, );
 fail:
 qemu_co_mutex_lock(>lock);
 
diff --git a/block/qcow2.c b/block/qcow2.c
index b3ba5daa93..89be2083d4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1575,6 +1575,44 @@ fail:
 return ret;
 }
 
+/* Check if it's possible to merge a write request with the writing of
+ * the data from the COW regions */
+static bool can_merge_cow(uint64_t offset, unsigned bytes,
+  QEMUIOVector *hd_qiov, QCowL2Meta *l2meta)
+{
+QCowL2Meta *m;
+
+for (m = l2meta; m != NULL; m = m->next) {
+/* If both COW regions are empty then there's nothing to merge */
+if (m->cow_start.nb_bytes == 0 && m->cow_end.nb_bytes == 0) {
+continue;
+}
+
+/* The data (middle) region must be immediately after the
+ * start region */
+if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
+continue;
+}
+
+/* The end region must be immediately after the data (middle)
+ * region */
+if (m->offset + m->cow_end.offset != offset + bytes) {
+continue;
+}
+
+/* Make sure that adding both COW regions to the QEMUIOVector
+ * does not exceed IOV_MAX */
+if (hd_qiov->niov > IOV_MAX - 2) {
+continue;
+}
+
+m->data_qiov = hd_qiov;
+return true;
+}
+
+return false;
+}
+
 static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
  uint64_t bytes, QEMUIOVector *qiov,
  int flags)
@@ -1657,16 +1695,22 @@ static coroutine_fn int 
qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
 goto fail;
 }
 
-qemu_co_mutex_unlock(>lock);
-BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
-trace_qcow2_writev_data(qemu_coroutine_self(),
-cluster_offset + offset_in_cluster);
-ret = bdrv_co_pwritev(bs->file,
-  cluster_offset + offset_in_cluster,
-  

[Qemu-block] [PATCH] block: fix external snapshot abort permission error

2017-06-07 Thread Jeff Cody
In external_snapshot_abort(), we try to undo what was done in
external_snapshot_prepare() calling bdrv_replace_node() to swap the
nodes back.  However, we receive a permissions error as writers are
blocked on the old node, which is now the new node backing file.

An easy fix (initially suggested by Kevin Wolf) is to call
bdrv_set_backing_hd() on the new node, to set the backing node to NULL.

Signed-off-by: Jeff Cody 
---
 blockdev.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 892d768..6472548 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1803,7 +1803,11 @@ static void external_snapshot_abort(BlkActionState 
*common)
  DO_UPCAST(ExternalSnapshotState, common, common);
 if (state->new_bs) {
 if (state->overlay_appended) {
+bdrv_ref(state->old_bs);   /* we can't let bdrv_set_backind_hd()
+  close state->old_bs; we need it */
+bdrv_set_backing_hd(state->new_bs, NULL, _abort);
 bdrv_replace_node(state->new_bs, state->old_bs, _abort);
+bdrv_unref(state->old_bs); /* bdrv_replace_node() ref'ed old_bs */
 }
 }
 }
-- 
2.9.3




Re: [Qemu-block] [Qemu-devel] [PATCH 09/25] block/dirty-bitmap: add readonly field to BdrvDirtyBitmap

2017-06-07 Thread Max Reitz
On 2017-06-02 00:29, John Snow wrote:
> 
> 
> On 05/31/2017 09:43 AM, Max Reitz wrote:
>> On 2017-05-30 08:50, Vladimir Sementsov-Ogievskiy wrote:
>>> Thank you for this scenario. Hmm.
>>>
>>> So, as I need guarantee that image and bitmap are unchanged,
>>> bdrv_set_dirty should return error and fail the whole write. Ok?
>>
>> I don't know. That would mean that you couldn't commit to an image that
>> has a persistent auto-loading bitmap, which doesn't seem very nice to me.
>>
>> I'm not quite sure what to do myself. So first I'd definitely want the
>> commit operation to succeed. That means we'd have to automatically make
>> the bitmap non-readonly once we write to it. The "readonly" flag would
>> then be an "unchanged" flag, rather, to signify that the bitmap has not
>> been changed since it was loaded, which means that it does not need to
>> be written back to the image file.
>>
>> Now the issue remains that if you modify a persistent bitmap that is
>> stored in an image file that is opened RO when it's closed, you won't be
>> able to write the modifications back.
>>> So in addition, I guess we'd need to "flush" all persistent bitmaps
>> (that is, write all modifications back to the file and set the
>> "unchanged" flag (you could also call it "dirty" and then mean the
>> opposite) for each bitmap) not only when the image is closed or
>> invalidated, but also when it is reopened read-only.
>>
> 
> Makes sense.
> 
>> (block-commit reopens the backing BDS R/W, then writes to them, thus
>> modifying the dirty bitmaps, and finally reopens the BDS as read-only;
>> before that happens, we will have to flush the modified bitmap data.)
>>
> 
> OK, so it would perhaps be enough to toggle the RO flag on/off when
> nodes get reopened. When they get reopened RO, we'd need to flush at
> that point.
> 
> (Right?)

Right.

> Of course, a changed flag makes this a little moot as it is probably
> more flexible; but there is something slightly attractive about the more
> rigid form.
> 
> (Hmm, for the purposes of periodic flushing, we may want a changed flag
> anyway...)

I don't mind either way. I like a dirty flag because this is a common
concept (we basically cache the persistent bitmaps in RAM, so it's
natural for them to have a dirty/clean state); but I also like keeping
the read-only flag because it's probably simpler to implement (and makes
just as much sense).

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v5 5/5] iotests: chown LUKS device before qemu-io launches

2017-06-07 Thread Max Reitz
On 2017-06-01 10:40, Daniel P. Berrange wrote:
> On Wed, May 31, 2017 at 06:15:27PM +0200, Max Reitz wrote:
>> On 2017-05-09 19:33, Daniel P. Berrange wrote:
>>> On some distros, whenever you close a block device file
>>> descriptor there is a udev rule that resets the file
>>> permissions. This can race with the test script when
>>> we run qemu-io multiple times against the same block
>>> device. Occasionally the second qemu-io invocation
>>> will find udev has reset the permissions causing failure.
>>>
>>> Reviewed-by: Eric Blake 
>>> Signed-off-by: Daniel P. Berrange 
>>> ---
>>>  tests/qemu-iotests/149 |  13 +-
>>>  tests/qemu-iotests/149.out | 344 
>>> ++---
>>>  2 files changed, 178 insertions(+), 179 deletions(-)
>>>
>>> diff --git a/tests/qemu-iotests/149 b/tests/qemu-iotests/149
>>> index 5faf585..bc628ce 100755
>>> --- a/tests/qemu-iotests/149
>>> +++ b/tests/qemu-iotests/149
>>> @@ -23,6 +23,7 @@
>>>  import subprocess
>>>  import os
>>>  import os.path
>>> +import time
>>
>> Why?
> 
> That's left over cruft from debugging this test. I'll kill it.
> 
>> Rest looks good.
> 
> Does that count for "reviewed-by" if i kill the import ?

Yes, it does. :-)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2 1/1] qemu/migration: fix the double free problem on from_src_file

2017-06-07 Thread Dr. David Alan Gilbert
* QingFeng Hao (ha...@linux.vnet.ibm.com) wrote:
> 
> 
> 在 2017/6/6 20:49, Kevin Wolf 写道:
> > Am 06.06.2017 um 07:24 hat QingFeng Hao geschrieben:



> > I can't tell for postcopy_ram_listen_thread() - commit 660819b didn't
> > seem to remove a qemu_fclose() call there, but I can't see one left
> > behind either. Was the file leaked before commit 660819b or am I
> > missing something?
> I don't think so because loadvm_postcopy_handle_listen creates thread
> postcopy_ram_listen_thread
> and passes mis->from_src_file as its arg, which will be closed by
> migration_incoming_state_destroy.
> What confuses me is in the series function calls of qemu_loadvm_state_main
> etc, argument f looks
> to be redundant as mis already contains from_src_file which equals to f.

In postcopy qemu_loadvm_state_main is called with two different file
arguments but the same mis argument;  see loadvm_handle_cmd_packaged for
the other case where it's called on a packaged-file blob.

> Furthermore, mis may be
> also redundant as it can be got via migration_incoming_get_current. Thanks!

We keep changing our minds about the preferred style.  Sometimes we
think it's best to pass the pointer, sometimes we think it's best
to call get_current.

Dave

> > 
> > Kevin
> > 
> 
> -- 
> Regards
> QingFeng Hao
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [Qemu-devel] [PATCH v3 06/16] backup: Request BLK_PERM_AIO_CONTEXT_CHANGE on target

2017-06-07 Thread Fam Zheng
On Wed, 05/24 10:52, Fam Zheng wrote:
> What's done in the source's context change notifier is moving the
> target's context to follow the new one, so we request this permission
> here.
> 
> Signed-off-by: Fam Zheng 
> ---
>  block/backup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/backup.c b/block/backup.c
> index a4fb288..546c5c5 100644
> --- a/block/backup.c
> +++ b/block/backup.c
> @@ -636,7 +636,7 @@ BlockJob *backup_job_create(const char *job_id, 
> BlockDriverState *bs,
>  }
>  
>  /* The target must match the source in size, so no resize here either */
> -job->target = blk_new(BLK_PERM_WRITE,
> +job->target = blk_new(BLK_PERM_WRITE | BLK_PERM_AIO_CONTEXT_CHANGE,
>BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE |
>BLK_PERM_WRITE_UNCHANGED | BLK_PERM_GRAPH_MOD);
>  ret = blk_insert_bs(job->target, target, errp);

Since fc0932fdc we reuse @bs as the backing of @target if sync=none, and this
hunk causes problem: the device's BB probably doesn't permit
BLK_PERM_AIO_CONTEXT_CHANGE.

Still scratching my head on this.

Fam



Re: [Qemu-block] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-06-07 Thread Alberto Garcia
On Wed 07 Jun 2017 01:59:58 PM CEST, Kevin Wolf wrote:
> Am 07.06.2017 um 13:44 hat Alberto Garcia geschrieben:
>> ping
>
> You wanted to address two or three things in the next version, so I
> assumed that this version shouldn't be merged.

Right, I had a couple of minor changes, but the core of the series is
going to remain the same and can already be reviewed.

But of course I can just send the second version with my changes, I'll
do it then.

Berto



Re: [Qemu-block] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-06-07 Thread Kevin Wolf
Am 07.06.2017 um 13:44 hat Alberto Garcia geschrieben:
> ping

You wanted to address two or three things in the next version, so I
assumed that this version shouldn't be merged.

Kevin



Re: [Qemu-block] [PATCH 0/7] qcow2: Reduce the number of I/O ops when doing COW

2017-06-07 Thread Alberto Garcia
ping

On Tue, May 23, 2017 at 01:22:55PM +0200, Alberto Garcia wrote:
> Hi all,
> 
> here's a patch series that rewrites the copy-on-write code in the
> qcow2 driver to reduce the number of I/O operations.
> 
> The situation is that when a guest sends a write request and QEMU
> needs to allocate new cluster(s) in a qcow2 file, the unwritten
> regions of the new cluster(s) need to be filled with the existing data
> (e.g. from the backing image) or with zeroes.
> 
> The whole process can require up to 5 I/O operations:
> 
> 1) Write the data from the actual write request.
> 2) Read the existing data located before the guest data.
> 3) Write that data to the new clusters.
> 4) Read the existing data located after the guest data.
> 5) Write that data to the new clusters.
> 
> This series reduces that to only two operations:
> 
> 1) Read the existing data from the original clusters
> 2) Write the updated data (=original + guest request) to the new clusters
> 
> Step (1) implies that there's data that will be read but will be
> immediately discarded (because it's overwritten by the guest
> request). I haven't really detected any big performance problems
> because of that, but I decided to be conservative and my code includes
> a simple heuristic that keeps the old behavior if the amount of data
> to be discarded is higher than 16KB.
> 
> I've been testing this series in several scenarios, with different
> cluster sizes (32K, 64K, 1MB) and request sizes (from 4 up to 512KB),
> and both with an SSD and a rotating HDD. The results vary depending on
> the case, with an average increase of 60% in the number of IOPS in the
> HDD case, and 15% in the SSD case. In some cases there are really no
> big differences and the results are similar before and after this
> patch.
> 
> Further work for the future includes detecting when the data that
> needs to be written consists on zeroes (i.e. allocating a new cluster
> with no backing image) and optimizing that case, but let's start with
> this.
> 
> Regards,
> 
> Berto
> 
> Alberto Garcia (7):
>   qcow2: Remove unused Error in do_perform_cow()
>   qcow2: Use unsigned int for both members of Qcow2COWRegion
>   qcow2: Make perform_cow() call do_perform_cow() twice
>   qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
>   qcow2: Allow reading both COW regions with only one request
>   qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
>   qcow2: Merge the writing of the COW regions with the guest data
> 
>  block/qcow2-cluster.c | 188 
> +-
>  block/qcow2.c |  58 +---
>  block/qcow2.h |  11 ++-
>  3 files changed, 197 insertions(+), 60 deletions(-)
> 
> -- 
> 2.11.0



Re: [Qemu-block] Virtio-BLK/SCSI write requests and data payload checksums

2017-06-07 Thread Paolo Bonzini
> > Yes, it's ugly but it's legal.  It probably doesn't happen on real hardware
> > that computes the checksum after or during DMA and has some kind of buffer
> > inside the board.  But on virt there is only one copy until we reach the
> > actual physical hardware.
> 
> okay, so it's no bug. and there is no way to detect that the page is dirty
> (again). so i need a bounce buffer if i have data checksums?

I think so, yes.

Paolo



Re: [Qemu-block] [PATCH] blockjob: cancel blockjobs before stopping all iothreads

2017-06-07 Thread Alberto Garcia
On Sat 03 Jun 2017 07:48:37 AM CEST, sochin.jiang wrote:

> --- a/block.c
> +++ b/block.c
> @@ -3084,9 +3084,16 @@ static void bdrv_close(BlockDriverState *bs)
>  bdrv_drained_end(bs);
>  }
>  
> +void bdrv_cancel_all(void)
> +{
> +if (!block_jobs_is_empty()) {
> +block_job_cancel_sync_all();
> +}
> +}

Why do you need this function at all? block_job_cancel_sync_all() is
already doing nothing when the block_jobs list is empty.

Berto



Re: [Qemu-block] Virtio-BLK/SCSI write requests and data payload checksums

2017-06-07 Thread Peter Lieven

> Am 07.06.2017 um 01:17 schrieb Paolo Bonzini :
> 
> 
> 
> - Original Message -
>> From: "Peter Lieven" 
>> To: "qemu block" 
>> Cc: "Paolo Bonzini" , stefa...@redhat.com, 
>> kw...@redhat.com, "Max Reitz" ,
>> "Michael S. Tsirkin" 
>> Sent: Tuesday, June 6, 2017 10:13:51 PM
>> Subject: Virtio-BLK/SCSI write requests and data payload checksums
>> 
>> Hi,
>> 
>> 
>> ich have spend several hours debugging a strange checksum error issue and
>> finally found the cause, but I am totally unsure if whats happening
>> is correct or not.
>> 
>> Imagine a Protocol like iSCSI which has a Data Digest and which receives its
>> data via zero copy straight from the guest kernel through Qemu and
>> 
>> then out of the socket to the network. It calculates the CRC32C data digest
>> from the buffer it gets from Qemu. It totally relies thereby that the buffer
>> is not mangled after the write request has been submitted. But it seems this
>> assumption is not true.
>> 
>> [...] the contents of the buffer submitted to Qemu change while Qemu 
>> processes
>> the write.
>> In the end several write requests follow and the /etc/network/interfaces file
>> has the right content, but
>> is this at all legal?
> 
> Yes, it's ugly but it's legal.  It probably doesn't happen on real hardware
> that computes the checksum after or during DMA and has some kind of buffer
> inside the board.  But on virt there is only one copy until we reach the 
> actual
> physical hardware.

okay, so it's no bug. and there is no way to detect that the page is dirty 
(again). so i need a bounce buffer if i have data checksums?

Peter

> 
> Paolo
> 
>> If yes, checksum calculation of payload and zero copy
>> would not work together.
> 




Re: [Qemu-block] [Qemu-devel] [PATCH v3 11/16] virtio-scsi: Request BLK_PERM_AIO_CONTEXT_CHANGE for dataplane

2017-06-07 Thread Fam Zheng
On Wed, 05/24 10:52, Fam Zheng wrote:
> blk_set_aio_context is audited by perm API, so follow the protocol and
> request for permission first.
> 
> Signed-off-by: Fam Zheng 
> ---
>  hw/scsi/virtio-scsi.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 46a3e3f..074e235 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -794,6 +794,10 @@ static void virtio_scsi_hotplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>  return;
>  }
>  virtio_scsi_acquire(s);
> +if (!blk_request_perm(sd->conf.blk, BLK_PERM_AIO_CONTEXT_CHANGE, 
> errp)) {

Inversed condition, should be s/!//.

Fam

> +virtio_scsi_release(s);
> +return;
> +}
>  blk_set_aio_context(sd->conf.blk, s->ctx);
>  virtio_scsi_release(s);
>  
> -- 
> 2.9.4
> 
> 



[Qemu-block] [PATCH] iscsi: use scsi_create_task()

2017-06-07 Thread Marc-André Lureau
The function does the same initialization, and matches with
scsi_free_scsi_task() usage, and qemu doesn't need to know the
allocator details.

Signed-off-by: Marc-André Lureau 
---
 block/iscsi.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 5daa201181..a6bcd8eb13 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1013,6 +1013,7 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 struct iscsi_context *iscsi = iscsilun->iscsi;
 struct iscsi_data data;
 IscsiAIOCB *acb;
+int xfer_dir;
 
 acb = qemu_aio_get(_aiocb_info, bs, cb, opaque);
 
@@ -1034,30 +1035,26 @@ static BlockAIOCB *iscsi_aio_ioctl(BlockDriverState *bs,
 return NULL;
 }
 
-acb->task = malloc(sizeof(struct scsi_task));
-if (acb->task == NULL) {
-error_report("iSCSI: Failed to allocate task for scsi command. %s",
- iscsi_get_error(iscsi));
-qemu_aio_unref(acb);
-return NULL;
-}
-memset(acb->task, 0, sizeof(struct scsi_task));
-
 switch (acb->ioh->dxfer_direction) {
 case SG_DXFER_TO_DEV:
-acb->task->xfer_dir = SCSI_XFER_WRITE;
+xfer_dir = SCSI_XFER_WRITE;
 break;
 case SG_DXFER_FROM_DEV:
-acb->task->xfer_dir = SCSI_XFER_READ;
+xfer_dir = SCSI_XFER_READ;
 break;
 default:
-acb->task->xfer_dir = SCSI_XFER_NONE;
+xfer_dir = SCSI_XFER_NONE;
 break;
 }
 
-acb->task->cdb_size = acb->ioh->cmd_len;
-memcpy(>task->cdb[0], acb->ioh->cmdp, acb->ioh->cmd_len);
-acb->task->expxferlen = acb->ioh->dxfer_len;
+acb->task = scsi_create_task(acb->ioh->cmd_len, acb->ioh->cmdp,
+ xfer_dir, acb->ioh->dxfer_len);
+if (acb->task == NULL) {
+error_report("iSCSI: Failed to allocate task for scsi command. %s",
+ iscsi_get_error(iscsi));
+qemu_aio_unref(acb);
+return NULL;
+}
 
 data.size = 0;
 qemu_mutex_lock(>mutex);
-- 
2.13.0.91.g00982b8dd