[Qemu-block] [RFC 0/5] block: Generic file truncation/creation fallbacks

2019-07-11 Thread Max Reitz
Hi,

Some protocol drivers do not really support file truncation but still
implement .bdrv_co_truncate(): They just don’t do anything when asked to
shrink a file.  This is reflected by qemu-img, which warns if you resize
a file and it has the exact same length afterwards as it had before.

We can just do that generically.  There is no reason for some protocol
drivers to act ashamed and pretend nobody notices.  The only thing we
have to take care of is to zero everything in the first sector past the
desired EOF, so that format probing won’t go wrong.

(RFC: Is it really OK to just do this for all block drivers?)

Similarly, we can add a fallback file creation path: If a block driver
does not support creating a file but it already exists, we can just use
it.  All we have to do is truncate it to the desired size.


This is an RFC because it feels weird and I don’t want people to
associate me with weird stuff too closely.

Well, patch 1 isn’t really an RFC.  It’s just a fix.


I was inspired to this series by Maxim’s patch “block/nvme: add support
for image creation” (from his “Few fixes for userspace NVME driver”
series).


Max Reitz (5):
  block/nbd: Fix hang in .bdrv_close()
  block: Generic truncation fallback
  block: Fall back to fallback truncate function
  block: Generic file creation fallback
  iotests: Add test for fallback truncate/create

 block.c| 77 --
 block/file-posix.c | 21 +--
 block/io.c | 69 --
 block/nbd.c| 12 +-
 block/sheepdog.c   |  2 +-
 tests/qemu-iotests/259 | 71 +++
 tests/qemu-iotests/259.out | 20 ++
 tests/qemu-iotests/group   |  1 +
 8 files changed, 235 insertions(+), 38 deletions(-)
 create mode 100755 tests/qemu-iotests/259
 create mode 100644 tests/qemu-iotests/259.out

-- 
2.21.0




[Qemu-block] [RFC 1/5] block/nbd: Fix hang in .bdrv_close()

2019-07-11 Thread Max Reitz
When nbd_close() is called from a coroutine, the connection_co never
gets to run, and thus nbd_teardown_connection() hangs.

This is because aio_co_enter() only puts the connection_co into the main
coroutine's wake-up queue, so this main coroutine needs to yield and
reschedule itself to let the connection_co run.

Signed-off-by: Max Reitz 
---
 block/nbd.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index 81edabbf35..b83b6cd43e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -135,7 +135,17 @@ static void nbd_teardown_connection(BlockDriverState *bs)
 qio_channel_shutdown(s->ioc,
  QIO_CHANNEL_SHUTDOWN_BOTH,
  NULL);
-BDRV_POLL_WHILE(bs, s->connection_co);
+
+if (qemu_in_coroutine()) {
+/* Let our caller poll and just yield until connection_co is done */
+while (s->connection_co) {
+aio_co_schedule(qemu_get_current_aio_context(),
+qemu_coroutine_self());
+qemu_coroutine_yield();
+}
+} else {
+BDRV_POLL_WHILE(bs, s->connection_co);
+}
 
 nbd_client_detach_aio_context(bs);
 object_unref(OBJECT(s->sioc));
-- 
2.21.0




[Qemu-block] [RFC 5/5] iotests: Add test for fallback truncate/create

2019-07-11 Thread Max Reitz
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/259 | 71 ++
 tests/qemu-iotests/259.out | 20 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 92 insertions(+)
 create mode 100755 tests/qemu-iotests/259
 create mode 100644 tests/qemu-iotests/259.out

diff --git a/tests/qemu-iotests/259 b/tests/qemu-iotests/259
new file mode 100755
index 00..6e3941378f
--- /dev/null
+++ b/tests/qemu-iotests/259
@@ -0,0 +1,71 @@
+#!/usr/bin/env bash
+#
+# Test generic image creation and truncation fallback (by using NBD)
+#
+# Copyright (C) 2019 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=mre...@redhat.com
+
+seq=$(basename $0)
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+_cleanup()
+{
+_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto nbd
+_supported_os Linux
+
+
+_make_test_img 64M
+
+echo
+echo '--- Testing no-op shrinking ---'
+
+$QEMU_IMG resize -f raw --shrink "$TEST_IMG" 32M
+
+echo
+echo '--- Testing non-working growing ---'
+
+$QEMU_IMG resize -f raw "$TEST_IMG" 128M
+
+echo
+echo '--- Testing creation ---'
+
+$QEMU_IMG create -f qcow2 "$TEST_IMG" 64M | _filter_img_create
+$QEMU_IMG info "$TEST_IMG" | _filter_img_info
+
+echo
+echo '--- Testing creation for which the node would need to grow ---'
+
+$QEMU_IMG create -f qcow2 -o preallocation=metadata "$TEST_IMG" 64M 2>&1 \
+| _filter_img_create
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/259.out b/tests/qemu-iotests/259.out
new file mode 100644
index 00..1e4b11055f
--- /dev/null
+++ b/tests/qemu-iotests/259.out
@@ -0,0 +1,20 @@
+QA output created by 259
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+
+--- Testing no-op shrinking ---
+qemu-img: Image was not resized; resizing may not be supported for this image
+
+--- Testing non-working growing ---
+qemu-img: Cannot grow this nbd node
+
+--- Testing creation ---
+Formatting 'TEST_DIR/t.IMGFMT', fmt=qcow2 size=67108864
+image: TEST_DIR/t.IMGFMT
+file format: qcow2
+virtual size: 64 MiB (67108864 bytes)
+disk size: unavailable
+
+--- Testing creation for which the node would need to grow ---
+qemu-img: TEST_DIR/t.IMGFMT: Could not resize image: Cannot grow this nbd node
+Formatting 'TEST_DIR/t.IMGFMT', fmt=qcow2 size=67108864 preallocation=metadata
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index b34c8e3c0c..80e7603174 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -269,3 +269,4 @@
 254 rw auto backing quick
 255 rw auto quick
 256 rw auto quick
+259 rw auto quick
-- 
2.21.0




[Qemu-block] [RFC 4/5] block: Generic file creation fallback

2019-07-11 Thread Max Reitz
If a protocol driver does not support image creation, we can see whether
maybe the file exists already.  If so, just truncating it will be
sufficient.

Signed-off-by: Max Reitz 
---
 block.c | 77 -
 1 file changed, 65 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index c139540f2b..8fb8e4dfda 100644
--- a/block.c
+++ b/block.c
@@ -531,20 +531,57 @@ out:
 return ret;
 }
 
-int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
+static int bdrv_create_file_fallback(const char *filename, BlockDriver *drv,
+ QemuOpts *opts, Error **errp)
 {
-BlockDriver *drv;
+BlockBackend *blk;
+QDict *options = qdict_new();
+int64_t size = 0;
+char *buf = NULL;
+PreallocMode prealloc;
 Error *local_err = NULL;
 int ret;
 
+size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
+buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+prealloc = qapi_enum_parse(_lookup, buf,
+   PREALLOC_MODE_OFF, _err);
+g_free(buf);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+qdict_put_str(options, "driver", drv->format_name);
+
+blk = blk_new_open(filename, NULL, options,
+   BDRV_O_RDWR | BDRV_O_RESIZE, errp);
+if (!blk) {
+error_prepend(errp, "Protocol driver '%s' does not support "
+  "image creation, and opening the image failed: ",
+  drv->format_name);
+return -EINVAL;
+}
+
+ret = blk_truncate(blk, size, prealloc, errp);
+blk_unref(blk);
+return ret;
+}
+
+int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp)
+{
+BlockDriver *drv;
+
 drv = bdrv_find_protocol(filename, true, errp);
 if (drv == NULL) {
 return -ENOENT;
 }
 
-ret = bdrv_create(drv, filename, opts, _err);
-error_propagate(errp, local_err);
-return ret;
+if (drv->bdrv_co_create_opts) {
+return bdrv_create(drv, filename, opts, errp);
+} else {
+return bdrv_create_file_fallback(filename, drv, opts, errp);
+}
 }
 
 /**
@@ -1420,6 +1457,24 @@ QemuOptsList bdrv_runtime_opts = {
 },
 };
 
+static QemuOptsList fallback_create_opts = {
+.name = "fallback-create-opts",
+.head = QTAILQ_HEAD_INITIALIZER(fallback_create_opts.head),
+.desc = {
+{
+.name = BLOCK_OPT_SIZE,
+.type = QEMU_OPT_SIZE,
+.help = "Virtual disk size"
+},
+{
+.name = BLOCK_OPT_PREALLOC,
+.type = QEMU_OPT_STRING,
+.help = "Preallocation mode (allowed values: off)"
+},
+{ /* end of list */ }
+}
+};
+
 /*
  * Common part for opening disk images and files
  *
@@ -5681,14 +5736,12 @@ void bdrv_img_create(const char *filename, const char 
*fmt,
 return;
 }
 
-if (!proto_drv->create_opts) {
-error_setg(errp, "Protocol driver '%s' does not support image 
creation",
-   proto_drv->format_name);
-return;
-}
-
 create_opts = qemu_opts_append(create_opts, drv->create_opts);
-create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
+if (proto_drv->create_opts) {
+create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
+} else {
+create_opts = qemu_opts_append(create_opts, _create_opts);
+}
 
 /* Create parameter list with default values */
 opts = qemu_opts_create(create_opts, NULL, 0, _abort);
-- 
2.21.0




[Qemu-block] [RFC 3/5] block: Fall back to fallback truncate function

2019-07-11 Thread Max Reitz
file-posix does not need to basically duplicate our fallback truncate
implementation; and sheepdog can fall back to it for "shrinking" files.

Signed-off-by: Max Reitz 
---
 block/file-posix.c | 21 +
 block/sheepdog.c   |  2 +-
 2 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index ab05b51a66..bcddfc7fbe 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2031,23 +2031,7 @@ static int coroutine_fn raw_co_truncate(BlockDriverState 
*bs, int64_t offset,
 return raw_regular_truncate(bs, s->fd, offset, prealloc, errp);
 }
 
-if (prealloc != PREALLOC_MODE_OFF) {
-error_setg(errp, "Preallocation mode '%s' unsupported for this "
-   "non-regular file", PreallocMode_str(prealloc));
-return -ENOTSUP;
-}
-
-if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
-if (offset > raw_getlength(bs)) {
-error_setg(errp, "Cannot grow device files");
-return -EINVAL;
-}
-} else {
-error_setg(errp, "Resizing this file is not supported");
-return -ENOTSUP;
-}
-
-return 0;
+return -ENOTSUP;
 }
 
 #ifdef __OpenBSD__
@@ -3413,7 +3397,6 @@ static BlockDriver bdrv_host_device = {
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-.bdrv_co_truncate   = raw_co_truncate,
 .bdrv_getlength= raw_getlength,
 .bdrv_get_info = raw_get_info,
 .bdrv_get_allocated_file_size
@@ -3537,7 +3520,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-.bdrv_co_truncate= raw_co_truncate,
 .bdrv_getlength  = raw_getlength,
 .has_variable_length = true,
 .bdrv_get_allocated_file_size
@@ -3669,7 +3651,6 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
 
-.bdrv_co_truncate= raw_co_truncate,
 .bdrv_getlength  = raw_getlength,
 .has_variable_length = true,
 .bdrv_get_allocated_file_size
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6f402e5d4d..4af4961cb7 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2301,7 +2301,7 @@ static int coroutine_fn sd_co_truncate(BlockDriverState 
*bs, int64_t offset,
 max_vdi_size = (UINT64_C(1) << s->inode.block_size_shift) * MAX_DATA_OBJS;
 if (offset < old_size) {
 error_setg(errp, "shrinking is not supported");
-return -EINVAL;
+return -ENOTSUP;
 } else if (offset > max_vdi_size) {
 error_setg(errp, "too big image size");
 return -EINVAL;
-- 
2.21.0




[Qemu-block] [RFC 2/5] block: Generic truncation fallback

2019-07-11 Thread Max Reitz
If a protocol driver does not support truncation, we call fall back to
effectively not doing anything if the new size is less than the actual
file size.  This is what we have been doing for some host device drivers
already.

The only caveat is that we have to zero out everything in the first
sector that lies beyond the new "EOF" so we do not get any surprises
with format probing.

Signed-off-by: Max Reitz 
---
 block/io.c | 69 ++
 1 file changed, 65 insertions(+), 4 deletions(-)

diff --git a/block/io.c b/block/io.c
index 24a18759fd..382728fa9a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3064,6 +3064,57 @@ static void bdrv_parent_cb_resize(BlockDriverState *bs)
 }
 }
 
+static int coroutine_fn bdrv_co_truncate_fallback(BdrvChild *child,
+  int64_t offset,
+  PreallocMode prealloc,
+  Error **errp)
+{
+BlockDriverState *bs = child->bs;
+int64_t cur_size = bdrv_getlength(bs);
+
+if (cur_size < 0) {
+error_setg_errno(errp, -cur_size,
+ "Failed to inquire current file size");
+return cur_size;
+}
+
+if (prealloc != PREALLOC_MODE_OFF) {
+error_setg(errp, "Unsupported preallocation mode: %s",
+   PreallocMode_str(prealloc));
+return -ENOTSUP;
+}
+
+if (offset > cur_size) {
+error_setg(errp, "Cannot grow this %s node", bs->drv->format_name);
+return -ENOTSUP;
+}
+
+/*
+ * Overwrite first "post-EOF" parts of the first sector with
+ * zeroes so raw images will not be misprobed
+ */
+if (offset < BDRV_SECTOR_SIZE && offset < cur_size) {
+int64_t fill_len = MIN(BDRV_SECTOR_SIZE - offset, cur_size - offset);
+int ret;
+
+if (!(child->perm & BLK_PERM_WRITE)) {
+error_setg(errp, "Cannot write to this node to clear the file past 
"
+   "the truncated EOF");
+return -EPERM;
+}
+
+ret = bdrv_co_pwrite_zeroes(child, offset, fill_len, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret,
+ "Failed to clear file past the truncated EOF");
+return ret;
+}
+}
+
+return 0;
+}
+
+
 /**
  * Truncate file to 'offset' bytes (needed only for file protocols)
  */
@@ -3074,6 +3125,7 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset,
 BlockDriver *drv = bs->drv;
 BdrvTrackedRequest req;
 int64_t old_size, new_bytes;
+Error *local_err = NULL;
 int ret;
 
 
@@ -3127,15 +3179,24 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset,
 ret = bdrv_co_truncate(bs->file, offset, prealloc, errp);
 goto out;
 }
-error_setg(errp, "Image format driver does not support resize");
+error_setg(_err, "Image format driver does not support resize");
 ret = -ENOTSUP;
-goto out;
+} else {
+ret = drv->bdrv_co_truncate(bs, offset, prealloc, _err);
 }
 
-ret = drv->bdrv_co_truncate(bs, offset, prealloc, errp);
-if (ret < 0) {
+if (ret == -ENOTSUP && drv->bdrv_file_open) {
+error_free(local_err);
+
+ret = bdrv_co_truncate_fallback(child, offset, prealloc, errp);
+if (ret < 0) {
+goto out;
+}
+} else if (ret < 0) {
+error_propagate(errp, local_err);
 goto out;
 }
+
 ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not refresh total sector count");
-- 
2.21.0




Re: [Qemu-block] [PATCH 8/8] iotests/257: test traditional sync modes

2019-07-11 Thread John Snow



On 7/11/19 8:37 AM, Max Reitz wrote:
> On 11.07.19 05:21, John Snow wrote:
>>
>> On 7/10/19 4:46 PM, Max Reitz wrote:
>>> On 10.07.19 21:00, John Snow wrote:
 On 7/10/19 1:14 PM, Max Reitz wrote:
> On 10.07.19 03:05, John Snow wrote:
>
> Hm.  How useful is bitmap support for 'top' then, anyway?  That means
> that if you want to resume a top backup, you always have to resume it
> like it was a full backup.  Which sounds kind of useless.
>
> Max
>

 Good point!

 I think this can be fixed by doing an initialization pass of the
 copy_bitmap when sync=top to set only the allocated regions in the bitmap.

 This means that the write notifier won't copy out regions that are
 written to that weren't already in the top layer. I believe this is
 actually a bugfix; the data we'd copy out in such cases is actually in
 the backing layer and shouldn't be copied with sync=top.
>>>
>>> Now that you mention it...  I didn’t realize that.  Yes, you’re right.
>>>
 So this would have two effects:
 (1) sync=top gets a little more judicious about what it copies out on
 sync=top, and
 (2) the bitmap return value is more meaningful again.

>>
>> This might be harder than I first thought.
>>
>> initializing the copy_bitmap generally happens before we install the
>> write notifier, which means that it occurs before the first yield.
>>
>> However, checking the allocation status can potentially be very slow,
>> can't it? I can't just hog the thread while I check.
> 
> I was thinking about that myself.  It isn’t that bad, because you aren’t
> doing the full block_status dance but just checking allocation status,
> which is reasonably quick (it just needs to look at the image format
> metadata, it doesn’t go down to the protocol layer).
> 
> But it’s probably not so good to halt the monitor for this, yes.
> 
>> There are ways to cooperatively process write notifier interrupts and
>> continue to check allocated status once we enter the main loop, but the
>> problem there becomes: if we fail early, how can we tell if the backup
>> is worth resuming?
>>
>> We might not have reached a convergence point for the copy_bitmap before
>> we failed, and still have a lot of extra bits set.
> 
> Is that so bad?
> 
>> I suppose at least in the case where we aren't trying to save the
>> copy_bitmap and need it to mean something specific, this is a reasonable
>> approach to fixing sync=TOP.
>>
>> As far as resume is concerned, I don't think I have good ideas. I could
>> emit an event or something if you're using sync=top with a bitmap for
>> output, but that feels *so* specialized for a niche(?) command that I
>> don't know if it's worth pursuing.
>>
>> (Plus, even then, what do you do if it fails before you see that event?
>> You just have to give up on what we copied out? That seems like a waste
>> and not the point of this exercise.)
> 
> Before that event, the bitmap can still be usable, as long as all
> “unknown” areas are set to dirty.  Sure, your resumed backup will then
> copy too much data.  But who cares.
> 
> So I don’t think you even need an event.
> 
>> The only way I can think of at all to get a retry on sync=top is to take
>> an always policy, and to allow a special invocation with something like
>> mode=bitmap+top:
> 
> Yes, that was my first idea, too.  But I didn’t even write about it,
> because of...
> 
>> "Assume we need to copy anything set in the bitmap, unless it's not in
>> the top layer, and then skip it."
>>
>> Which seems awful, because it would be a specialty mode for the
>> exclusive purpose of re-trying sync=top backups.
> 
> ...exactly this.
> 
>> Meh.
> 
> I don’t think it’s all that bad.
> 

No, it's just not ideal and it's something I'd have to defend in a
patch. It's a caveat that would need documenting.

"Hey, depending on how far the job got before it failed, you might not
want to resume it because it may not have finished determining which
segments held allocated data. There's no way to tell if this happened or
not."

It makes the decision making process by e.g. libvirt harder, though
there are still some heuristics you could use, like:

- Is the bitmap count less than the size of the top image?
- Is it bigger?

And that might be good enough when deciding how to proceed. I suppose if
we want to give more precise mechanisms for this we'd always be within
our right to continue refining it and just document that it MIGHT have
extra bits set.

--js



Re: [Qemu-block] [PATCH-for-4.1 v4 2/5] hw/block/pflash_cfi01: Use the correct READ_ARRAY value

2019-07-11 Thread Dr. David Alan Gilbert
* Philippe Mathieu-Daudé (phi...@redhat.com) wrote:
> In the "Read Array Flowchart" the command has a value of 0xFF.
> 
> In the document [*] the "Read Array Flowchart", the READ_ARRAY
> command has a value of 0xff.
> 
> Use the correct value in the pflash model.
> 
> There is no change of behavior in the guest, because:
> - when the guest were sending 0xFF, the reset_flash label
>   was setting the command value as 0x00
> - 0x00 was used internally for READ_ARRAY
> 
> To keep migration with older versions behaving correctly, we
> decide to always migrate the READ_ARRAY as 0x00.
> 
> [*] "Common Flash Interface (CFI) and Command Sets"
> (Intel Application Note 646)
> Appendix B "Basic Command Set"
> 
> Reviewed-by: John Snow 
> Reviewed-by: Alistair Francis 
> Regression-tested-by: Laszlo Ersek 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v3: Handle migrating the 'cmd' field.
> v4: Handle migrating to older QEMU (Dave)
> 
> Since Laszlo stated he did not test migration [*], I'm keeping his
> test tag, because the change with v2 has no impact in the tests
> he ran.
> 
> Likewise I'm keeping John and Alistair tags, but I'd like an extra
> review for the migration change, thanks!
> 
> [*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html
> ---
>  hw/block/pflash_cfi01.c | 57 ++---
>  1 file changed, 48 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
> index 9e34fd4e82..85bb2132c0 100644
> --- a/hw/block/pflash_cfi01.c
> +++ b/hw/block/pflash_cfi01.c
> @@ -96,6 +96,37 @@ struct PFlashCFI01 {
>  bool old_multiple_chip_handling;
>  };
>  
> +static int pflash_pre_save(void *opaque)
> +{
> +PFlashCFI01 *s = opaque;
> +
> +/*
> + * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the
> + * READ_ARRAY command. To preserve migrating to these older version,
> + * always migrate the READ_ARRAY command as 0x00.
> + */
> +if (s->cmd == 0xff) {
> +s->cmd = 0x00;
> +}
> +
> +return 0;
> +}
> +
> +static int pflash_post_save(void *opaque)
> +{
> +PFlashCFI01 *s = opaque;
> +
> +/*
> + * If migration failed, the guest will continue to run.
> + * Restore the correct READ_ARRAY value.
> + */
> +if (s->cmd == 0x00) {
> +s->cmd = 0xff;
> +}

OK, from a migration point of view I think we're OK, as long
as you never have a valid situation where cmd was 0x00 and
it's now suddenly going to become 0xff.

Dave

> +return 0;
> +}
> +
>  static int pflash_post_load(void *opaque, int version_id);
>  
>  static const VMStateDescription vmstate_pflash = {
> @@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = {
>  .version_id = 1,
>  .minimum_version_id = 1,
>  .post_load = pflash_post_load,
> +.pre_save = pflash_pre_save,
> +.post_save = pflash_post_save,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT8(wcycle, PFlashCFI01),
>  VMSTATE_UINT8(cmd, PFlashCFI01),
> @@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr 
> offset,
>  /* This should never happen : reset state & treat it as a read */
>  DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd);
>  pfl->wcycle = 0;
> -pfl->cmd = 0;
> +pfl->cmd = 0xff;
>  /* fall through to read code */
> -case 0x00:
> -/* Flash area read */
> +case 0xff: /* Read Array */
>  ret = pflash_data_read(pfl, offset, width, be);
>  break;
>  case 0x10: /* Single byte program */
> @@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
>  case 0:
>  /* read mode */
>  switch (cmd) {
> -case 0x00: /* ??? */
> -goto reset_flash;
>  case 0x10: /* Single Byte Program */
>  case 0x40: /* Single Byte Program */
>  DPRINTF("%s: Single Byte Program\n", __func__);
> @@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
>  if (cmd == 0xd0) { /* confirm */
>  pfl->wcycle = 0;
>  pfl->status |= 0x80;
> -} else if (cmd == 0xff) { /* read array mode */
> +} else if (cmd == 0xff) { /* Read Array */
>  goto reset_flash;
>  } else
>  goto error_flash;
> @@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
>  } else if (cmd == 0x01) {
>  pfl->wcycle = 0;
>  pfl->status |= 0x80;
> -} else if (cmd == 0xff) {
> +} else if (cmd == 0xff) { /* read array mode */
>  goto reset_flash;
>  } else {
>  DPRINTF("%s: Unknown (un)locking command\n", __func__);
> @@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
>  trace_pflash_reset();
>  

Re: [Qemu-block] [PATCH v3] LUKS: support preallocation

2019-07-11 Thread Stefano Garzarella
On Thu, Jul 11, 2019 at 06:09:40PM +0300, Maxim Levitsky wrote:
> preallocation=off and preallocation=metadata
> both allocate luks header only, and preallocation=falloc/full
> is passed to underlying file.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> 
> Signed-off-by: Maxim Levitsky 
> 
> ---
> 
> Note that QMP support was only compile tested, since I am still learning
> on how to use it.
> 
> If there is some library/script/etc which makes it more high level,
> I would more that glad to hear about it. So far I used the qmp-shell
> 
> Also can I use qmp's blockdev-create outside a vm running?
> 
>  block/crypto.c   | 29 ++---
>  qapi/block-core.json |  5 -
>  2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/block/crypto.c b/block/crypto.c
> index 8237424ae6..034a645652 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
> @@ -74,6 +74,7 @@ static ssize_t block_crypto_read_func(QCryptoBlock *block,
>  struct BlockCryptoCreateData {
>  BlockBackend *blk;
>  uint64_t size;
> +PreallocMode prealloc;
>  };
>  
>  
> @@ -112,7 +113,7 @@ static ssize_t block_crypto_init_func(QCryptoBlock *block,
>   * available to the guest, so we must take account of that
>   * which will be used by the crypto header
>   */
> -return blk_truncate(data->blk, data->size + headerlen, PREALLOC_MODE_OFF,
> +return blk_truncate(data->blk, data->size + headerlen, data->prealloc,
>  errp);
>  }
>  
> @@ -251,6 +252,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
> format,
>  static int block_crypto_co_create_generic(BlockDriverState *bs,
>int64_t size,
>QCryptoBlockCreateOptions *opts,
> +  PreallocMode prealloc,
>Error **errp)
>  {
>  int ret;
> @@ -266,9 +268,14 @@ static int 
> block_crypto_co_create_generic(BlockDriverState *bs,
>  goto cleanup;
>  }
>  
> +if (prealloc == PREALLOC_MODE_METADATA) {
> +prealloc = PREALLOC_MODE_OFF;
> +}
> +
>  data = (struct BlockCryptoCreateData) {
>  .blk = blk,
>  .size = size,
> +.prealloc = prealloc,
>  };
>  
>  crypto = qcrypto_block_create(opts, NULL,
> @@ -500,6 +507,7 @@ block_crypto_co_create_luks(BlockdevCreateOptions 
> *create_options, Error **errp)
>  BlockdevCreateOptionsLUKS *luks_opts;
>  BlockDriverState *bs = NULL;
>  QCryptoBlockCreateOptions create_opts;
> +PreallocMode preallocation = PREALLOC_MODE_OFF;
>  int ret;
>  
>  assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
> @@ -515,8 +523,11 @@ block_crypto_co_create_luks(BlockdevCreateOptions 
> *create_options, Error **errp)
>  .u.luks = *qapi_BlockdevCreateOptionsLUKS_base(luks_opts),
>  };
>  
> +if (luks_opts->has_preallocation)
> +preallocation = luks_opts->preallocation;
> +
>  ret = block_crypto_co_create_generic(bs, luks_opts->size, _opts,
> - errp);
> + preallocation, errp);
>  if (ret < 0) {
>  goto fail;
>  }
> @@ -534,12 +545,24 @@ static int coroutine_fn 
> block_crypto_co_create_opts_luks(const char *filename,
>  QCryptoBlockCreateOptions *create_opts = NULL;
>  BlockDriverState *bs = NULL;
>  QDict *cryptoopts;
> +PreallocMode prealloc;
> +char *buf = NULL;
>  int64_t size;
>  int ret;
> +Error *local_err = NULL;
>  
>  /* Parse options */
>  size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
>  
> +buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
> +prealloc = qapi_enum_parse(_lookup, buf,
> +   PREALLOC_MODE_OFF, _err);
> +g_free(buf);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return -EINVAL;
> +}
> +
>  cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
>   _crypto_create_opts_luks,
>   true);
> @@ -565,7 +588,7 @@ static int coroutine_fn 
> block_crypto_co_create_opts_luks(const char *filename,
>  }
>  
>  /* Create format layer */
> -ret = block_crypto_co_create_generic(bs, size, create_opts, errp);
> +ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, 
> errp);
>  if (ret < 0) {
>  goto fail;
>  }
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 0d43d4f37c..ebcfc9f903 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -4205,13 +4205,16 @@
>  #
>  # @file Node to create the image format on
>  # @size Size of the virtual disk in bytes
> +# @preallocationPreallocation mode for the new image (default: off;
> +#   allowed values: 

[Qemu-block] [PATCH-for-4.1 v4 5/5] hw/block/pflash_cfi01: Add the DeviceReset() handler

2019-07-11 Thread Philippe Mathieu-Daudé
A "system reset" sets the device state machine in READ_ARRAY mode
and, after some delay, set the SR.7 READY bit.

We do not model timings, so we set the SR.7 bit directly.

The TYPE_DEVICE interface provides a DeviceReset handler.
This pflash device is a subclass of TYPE_SYS_BUS_DEVICE (which
is a subclass of TYPE_DEVICE).
SYS_BUS devices are automatically plugged into the 'main system
bus', which is the root of the qbus tree.
Devices in the qbus tree are guaranteed to have their reset()
handler called after realize() and before we try to run the guest.

To avoid incoherent states when the machine resets (see but report
below), factor out the reset code into pflash_cfi01_system_reset,
and register the method as a device reset callback.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1678713
Reported-by: Laszlo Ersek 
Reviewed-by: John Snow 
Regression-tested-by: Laszlo Ersek 
Reviewed-by: Alistair Francis 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3: reword description
---
 hw/block/pflash_cfi01.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index aa2126f6dc..dd2ace92a8 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -795,8 +795,6 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 pfl->max_device_width = pfl->device_width;
 }
 
-pflash_mode_read_array(pfl);
-pfl->status = 0x80; /* WSM ready */
 /* Hardcoded CFI table */
 /* Standard "QRY" string */
 pfl->cfi_table[0x10] = 'Q';
@@ -884,6 +882,18 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 pfl->cfi_table[0x3f] = 0x01; /* Number of protection fields */
 }
 
+static void pflash_cfi01_system_reset(DeviceState *dev)
+{
+PFlashCFI01 *pfl = PFLASH_CFI01(dev);
+
+pflash_mode_read_array(pfl);
+/*
+ * The WSM ready timer occurs at most 150ns after system reset.
+ * This model deliberately ignores this delay.
+ */
+pfl->status = 0x80;
+}
+
 static Property pflash_cfi01_properties[] = {
 DEFINE_PROP_DRIVE("drive", PFlashCFI01, blk),
 /* num-blocks is the number of blocks actually visible to the guest,
@@ -928,6 +938,7 @@ static void pflash_cfi01_class_init(ObjectClass *klass, 
void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
 
+dc->reset = pflash_cfi01_system_reset;
 dc->realize = pflash_cfi01_realize;
 dc->props = pflash_cfi01_properties;
 dc->vmsd = _pflash;
-- 
2.20.1




[Qemu-block] [PATCH-for-4.1 v4 3/5] hw/block/pflash_cfi01: Extract pflash_mode_read_array()

2019-07-11 Thread Philippe Mathieu-Daudé
The same pattern is used when setting the flash in READ_ARRAY mode:
- Set the state machine command to READ_ARRAY
- Reset the write_cycle counter
- Reset the memory region in ROMD

Refactor the current code by extracting this pattern.
It is used twice:
- On a write access (on command failure, error, or explicitly asked)
- When the device is initialized. Here the ROMD mode is hidden
  by the memory_region_init_rom_device() call.

Rename the 'reset_flash' as 'mode_read_array' to make explicit we
do not reset the device, we simply set its internal state machine
in the READ_ARRAY mode. We do not reset the status register error
bits, as a device reset would do.

Reviewed-by: John Snow 
Reviewed-by: Alistair Francis 
Regression-tested-by: Laszlo Ersek 
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/pflash_cfi01.c | 36 
 hw/block/trace-events   |  1 +
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 85bb2132c0..6c3fefcd2d 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -145,6 +145,14 @@ static const VMStateDescription vmstate_pflash = {
 }
 };
 
+static void pflash_mode_read_array(PFlashCFI01 *pfl)
+{
+trace_pflash_mode_read_array();
+pfl->cmd = 0xff; /* Read Array */
+pfl->wcycle = 0;
+memory_region_rom_device_set_romd(>mem, true);
+}
+
 /* Perform a CFI query based on the bank width of the flash.
  * If this code is called we know we have a device_width set for
  * this flash.
@@ -502,7 +510,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 case 0x50: /* Clear status bits */
 DPRINTF("%s: Clear status bits\n", __func__);
 pfl->status = 0x0;
-goto reset_flash;
+goto mode_read_array;
 case 0x60: /* Block (un)lock */
 DPRINTF("%s: Block unlock\n", __func__);
 break;
@@ -527,10 +535,10 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 break;
 case 0xf0: /* Probe for AMD flash */
 DPRINTF("%s: Probe for AMD flash\n", __func__);
-goto reset_flash;
+goto mode_read_array;
 case 0xff: /* Read array mode */
 DPRINTF("%s: Read array mode\n", __func__);
-goto reset_flash;
+goto mode_read_array;
 default:
 goto error_flash;
 }
@@ -557,7 +565,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 pfl->wcycle = 0;
 pfl->status |= 0x80;
 } else if (cmd == 0xff) { /* Read Array */
-goto reset_flash;
+goto mode_read_array;
 } else
 goto error_flash;
 
@@ -584,15 +592,15 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 pfl->wcycle = 0;
 pfl->status |= 0x80;
 } else if (cmd == 0xff) { /* read array mode */
-goto reset_flash;
+goto mode_read_array;
 } else {
 DPRINTF("%s: Unknown (un)locking command\n", __func__);
-goto reset_flash;
+goto mode_read_array;
 }
 break;
 case 0x98:
 if (cmd == 0xff) {
-goto reset_flash;
+goto mode_read_array;
 } else {
 DPRINTF("%s: leaving query mode\n", __func__);
 }
@@ -652,7 +660,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 " the data is already written to storage!\n"
 "Flash device reset into READ mode.\n",
 __func__);
-goto reset_flash;
+goto mode_read_array;
 }
 break;
 default:
@@ -662,7 +670,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 default:
 /* Should never happen */
 DPRINTF("%s: invalid write state\n",  __func__);
-goto reset_flash;
+goto mode_read_array;
 }
 return;
 
@@ -671,11 +679,8 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
   "(offset " TARGET_FMT_plx ", wcycle 0x%x cmd 0x%x value 
0x%x)"
   "\n", __func__, offset, pfl->wcycle, pfl->cmd, value);
 
- reset_flash:
-trace_pflash_reset();
-memory_region_rom_device_set_romd(>mem, true);
-pfl->wcycle = 0;
-pfl->cmd = 0xff;
+ mode_read_array:
+pflash_mode_read_array(pfl);
 }
 
 
@@ -790,8 +795,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 pfl->max_device_width = pfl->device_width;
 }
 
-pfl->wcycle = 0;
-pfl->cmd = 0xff;
+pflash_mode_read_array(pfl);
 pfl->status = 0;
 /* Hardcoded CFI table */
 /* Standard "QRY" string */
diff --git a/hw/block/trace-events b/hw/block/trace-events
index 13d1b21dd4..91a8a106c0 100644
--- a/hw/block/trace-events

[Qemu-block] [PATCH-for-4.1 v4 4/5] hw/block/pflash_cfi01: Start state machine as READY to accept commands

2019-07-11 Thread Philippe Mathieu-Daudé
When the state machine is ready to accept command, the bit 7 of
the status register (SR) is set to 1.
The guest polls the status register and check this bit before
writting command to the internal 'Write State Machine' (WSM).

Set SR.7 bit to 1 when the device is created.

There is no migration impact by this change.

Reference: Read Array Flowchart
  "Common Flash Interface (CFI) and Command Sets"
   (Intel Application Note 646)
   Appendix B "Basic Command Set"

Reviewed-by: John Snow 
Reviewed-by: Alistair Francis 
Regression-tested-by: Laszlo Ersek 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3: Added migration comment.
---
 hw/block/pflash_cfi01.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 6c3fefcd2d..aa2126f6dc 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -796,7 +796,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 }
 
 pflash_mode_read_array(pfl);
-pfl->status = 0;
+pfl->status = 0x80; /* WSM ready */
 /* Hardcoded CFI table */
 /* Standard "QRY" string */
 pfl->cfi_table[0x10] = 'Q';
-- 
2.20.1




[Qemu-block] [PATCH-for-4.1 v4 2/5] hw/block/pflash_cfi01: Use the correct READ_ARRAY value

2019-07-11 Thread Philippe Mathieu-Daudé
In the "Read Array Flowchart" the command has a value of 0xFF.

In the document [*] the "Read Array Flowchart", the READ_ARRAY
command has a value of 0xff.

Use the correct value in the pflash model.

There is no change of behavior in the guest, because:
- when the guest were sending 0xFF, the reset_flash label
  was setting the command value as 0x00
- 0x00 was used internally for READ_ARRAY

To keep migration with older versions behaving correctly, we
decide to always migrate the READ_ARRAY as 0x00.

[*] "Common Flash Interface (CFI) and Command Sets"
(Intel Application Note 646)
Appendix B "Basic Command Set"

Reviewed-by: John Snow 
Reviewed-by: Alistair Francis 
Regression-tested-by: Laszlo Ersek 
Signed-off-by: Philippe Mathieu-Daudé 
---
v3: Handle migrating the 'cmd' field.
v4: Handle migrating to older QEMU (Dave)

Since Laszlo stated he did not test migration [*], I'm keeping his
test tag, because the change with v2 has no impact in the tests
he ran.

Likewise I'm keeping John and Alistair tags, but I'd like an extra
review for the migration change, thanks!

[*] https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg00679.html
---
 hw/block/pflash_cfi01.c | 57 ++---
 1 file changed, 48 insertions(+), 9 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 9e34fd4e82..85bb2132c0 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -96,6 +96,37 @@ struct PFlashCFI01 {
 bool old_multiple_chip_handling;
 };
 
+static int pflash_pre_save(void *opaque)
+{
+PFlashCFI01 *s = opaque;
+
+/*
+ * Previous to QEMU v4.1 an incorrect value of 0x00 was used for the
+ * READ_ARRAY command. To preserve migrating to these older version,
+ * always migrate the READ_ARRAY command as 0x00.
+ */
+if (s->cmd == 0xff) {
+s->cmd = 0x00;
+}
+
+return 0;
+}
+
+static int pflash_post_save(void *opaque)
+{
+PFlashCFI01 *s = opaque;
+
+/*
+ * If migration failed, the guest will continue to run.
+ * Restore the correct READ_ARRAY value.
+ */
+if (s->cmd == 0x00) {
+s->cmd = 0xff;
+}
+
+return 0;
+}
+
 static int pflash_post_load(void *opaque, int version_id);
 
 static const VMStateDescription vmstate_pflash = {
@@ -103,6 +134,8 @@ static const VMStateDescription vmstate_pflash = {
 .version_id = 1,
 .minimum_version_id = 1,
 .post_load = pflash_post_load,
+.pre_save = pflash_pre_save,
+.post_save = pflash_post_save,
 .fields = (VMStateField[]) {
 VMSTATE_UINT8(wcycle, PFlashCFI01),
 VMSTATE_UINT8(cmd, PFlashCFI01),
@@ -277,10 +310,9 @@ static uint32_t pflash_read(PFlashCFI01 *pfl, hwaddr 
offset,
 /* This should never happen : reset state & treat it as a read */
 DPRINTF("%s: unknown command state: %x\n", __func__, pfl->cmd);
 pfl->wcycle = 0;
-pfl->cmd = 0;
+pfl->cmd = 0xff;
 /* fall through to read code */
-case 0x00:
-/* Flash area read */
+case 0xff: /* Read Array */
 ret = pflash_data_read(pfl, offset, width, be);
 break;
 case 0x10: /* Single byte program */
@@ -448,8 +480,6 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 case 0:
 /* read mode */
 switch (cmd) {
-case 0x00: /* ??? */
-goto reset_flash;
 case 0x10: /* Single Byte Program */
 case 0x40: /* Single Byte Program */
 DPRINTF("%s: Single Byte Program\n", __func__);
@@ -526,7 +556,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 if (cmd == 0xd0) { /* confirm */
 pfl->wcycle = 0;
 pfl->status |= 0x80;
-} else if (cmd == 0xff) { /* read array mode */
+} else if (cmd == 0xff) { /* Read Array */
 goto reset_flash;
 } else
 goto error_flash;
@@ -553,7 +583,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 } else if (cmd == 0x01) {
 pfl->wcycle = 0;
 pfl->status |= 0x80;
-} else if (cmd == 0xff) {
+} else if (cmd == 0xff) { /* read array mode */
 goto reset_flash;
 } else {
 DPRINTF("%s: Unknown (un)locking command\n", __func__);
@@ -645,7 +675,7 @@ static void pflash_write(PFlashCFI01 *pfl, hwaddr offset,
 trace_pflash_reset();
 memory_region_rom_device_set_romd(>mem, true);
 pfl->wcycle = 0;
-pfl->cmd = 0;
+pfl->cmd = 0xff;
 }
 
 
@@ -761,7 +791,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 }
 
 pfl->wcycle = 0;
-pfl->cmd = 0;
+pfl->cmd = 0xff;
 pfl->status = 0;
 /* Hardcoded CFI table */
 /* Standard "QRY" string */
@@ -1001,5 +1031,14 @@ static int pflash_post_load(void *opaque, int version_id)
 pfl->vmstate = 

[Qemu-block] [PATCH-for-4.1 v4 0/5] hw/block/pflash_cfi01: Add DeviceReset() handler

2019-07-11 Thread Philippe Mathieu-Daudé
The pflash device lacks a reset() function.
When a machine is resetted, the flash might be in an
inconsistent state, leading to unexpected behavior:
https://bugzilla.redhat.com/show_bug.cgi?id=1678713
Resolve this issue by adding a DeviceReset() handler.

Fix also two minor issues, and clean a bit the codebase.

Since v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg00962.html
- addressed Laszlo review comments

Since v2:
- consider migration (Laszlo, Peter)

Since v3:
- more reliable migration (Dave)
- dropped patches 6-9 not required for next release

$ git backport-diff -u v3
Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/5:[] [--] 'hw/block/pflash_cfi01: Removed an unused timer'
002/5:[0048] [FC] 'hw/block/pflash_cfi01: Use the correct READ_ARRAY value'
003/5:[] [--] 'hw/block/pflash_cfi01: Extract pflash_mode_read_array()'
004/5:[] [--] 'hw/block/pflash_cfi01: Start state machine as READY to 
accept commands'
005/5:[] [--] 'hw/block/pflash_cfi01: Add the DeviceReset() handler'

Regards,

Phil.

Philippe Mathieu-Daudé (5):
  hw/block/pflash_cfi01: Removed an unused timer
  hw/block/pflash_cfi01: Use the correct READ_ARRAY value
  hw/block/pflash_cfi01: Extract pflash_mode_read_array()
  hw/block/pflash_cfi01: Start state machine as READY to accept commands
  hw/block/pflash_cfi01: Add the DeviceReset() handler

 hw/block/pflash_cfi01.c | 109 +++-
 hw/block/trace-events   |   1 +
 2 files changed, 75 insertions(+), 35 deletions(-)

-- 
2.20.1




[Qemu-block] [PATCH-for-4.1 v4 1/5] hw/block/pflash_cfi01: Removed an unused timer

2019-07-11 Thread Philippe Mathieu-Daudé
The 'CFI02' NOR flash was introduced in commit 29133e9a0fff, with
timing modelled. One year later, the CFI01 model was introduced
(commit 05ee37ebf630) based on the CFI02 model. As noted in the
header, "It does not support timings". 12 years later, we never
had to model the device timings. Time to remove the unused timer,
we can still add it back if required.

Suggested-by: Laszlo Ersek 
Reviewed-by: Wei Yang 
Reviewed-by: Laszlo Ersek 
Reviewed-by: Alistair Francis 
Regression-tested-by: Laszlo Ersek 
Signed-off-by: Philippe Mathieu-Daudé 
---
v2: Fixed commit description (Laszlo)
---
 hw/block/pflash_cfi01.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index db4a246b22..9e34fd4e82 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -42,7 +42,6 @@
 #include "hw/block/flash.h"
 #include "sysemu/block-backend.h"
 #include "qapi/error.h"
-#include "qemu/timer.h"
 #include "qemu/bitops.h"
 #include "qemu/error-report.h"
 #include "qemu/host-utils.h"
@@ -90,7 +89,6 @@ struct PFlashCFI01 {
 uint8_t cfi_table[0x52];
 uint64_t counter;
 unsigned int writeblock_size;
-QEMUTimer *timer;
 MemoryRegion mem;
 char *name;
 void *storage;
@@ -114,18 +112,6 @@ static const VMStateDescription vmstate_pflash = {
 }
 };
 
-static void pflash_timer (void *opaque)
-{
-PFlashCFI01 *pfl = opaque;
-
-trace_pflash_timer_expired(pfl->cmd);
-/* Reset flash */
-pfl->status ^= 0x80;
-memory_region_rom_device_set_romd(>mem, true);
-pfl->wcycle = 0;
-pfl->cmd = 0;
-}
-
 /* Perform a CFI query based on the bank width of the flash.
  * If this code is called we know we have a device_width set for
  * this flash.
@@ -774,7 +760,6 @@ static void pflash_cfi01_realize(DeviceState *dev, Error 
**errp)
 pfl->max_device_width = pfl->device_width;
 }
 
-pfl->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, pflash_timer, pfl);
 pfl->wcycle = 0;
 pfl->cmd = 0;
 pfl->status = 0;
-- 
2.20.1




[Qemu-block] [PATCH v3] LUKS: support preallocation

2019-07-11 Thread Maxim Levitsky
preallocation=off and preallocation=metadata
both allocate luks header only, and preallocation=falloc/full
is passed to underlying file.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951

Signed-off-by: Maxim Levitsky 

---

Note that QMP support was only compile tested, since I am still learning
on how to use it.

If there is some library/script/etc which makes it more high level,
I would more that glad to hear about it. So far I used the qmp-shell

Also can I use qmp's blockdev-create outside a vm running?

 block/crypto.c   | 29 ++---
 qapi/block-core.json |  5 -
 2 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 8237424ae6..034a645652 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -74,6 +74,7 @@ static ssize_t block_crypto_read_func(QCryptoBlock *block,
 struct BlockCryptoCreateData {
 BlockBackend *blk;
 uint64_t size;
+PreallocMode prealloc;
 };
 
 
@@ -112,7 +113,7 @@ static ssize_t block_crypto_init_func(QCryptoBlock *block,
  * available to the guest, so we must take account of that
  * which will be used by the crypto header
  */
-return blk_truncate(data->blk, data->size + headerlen, PREALLOC_MODE_OFF,
+return blk_truncate(data->blk, data->size + headerlen, data->prealloc,
 errp);
 }
 
@@ -251,6 +252,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
 static int block_crypto_co_create_generic(BlockDriverState *bs,
   int64_t size,
   QCryptoBlockCreateOptions *opts,
+  PreallocMode prealloc,
   Error **errp)
 {
 int ret;
@@ -266,9 +268,14 @@ static int block_crypto_co_create_generic(BlockDriverState 
*bs,
 goto cleanup;
 }
 
+if (prealloc == PREALLOC_MODE_METADATA) {
+prealloc = PREALLOC_MODE_OFF;
+}
+
 data = (struct BlockCryptoCreateData) {
 .blk = blk,
 .size = size,
+.prealloc = prealloc,
 };
 
 crypto = qcrypto_block_create(opts, NULL,
@@ -500,6 +507,7 @@ block_crypto_co_create_luks(BlockdevCreateOptions 
*create_options, Error **errp)
 BlockdevCreateOptionsLUKS *luks_opts;
 BlockDriverState *bs = NULL;
 QCryptoBlockCreateOptions create_opts;
+PreallocMode preallocation = PREALLOC_MODE_OFF;
 int ret;
 
 assert(create_options->driver == BLOCKDEV_DRIVER_LUKS);
@@ -515,8 +523,11 @@ block_crypto_co_create_luks(BlockdevCreateOptions 
*create_options, Error **errp)
 .u.luks = *qapi_BlockdevCreateOptionsLUKS_base(luks_opts),
 };
 
+if (luks_opts->has_preallocation)
+preallocation = luks_opts->preallocation;
+
 ret = block_crypto_co_create_generic(bs, luks_opts->size, _opts,
- errp);
+ preallocation, errp);
 if (ret < 0) {
 goto fail;
 }
@@ -534,12 +545,24 @@ static int coroutine_fn 
block_crypto_co_create_opts_luks(const char *filename,
 QCryptoBlockCreateOptions *create_opts = NULL;
 BlockDriverState *bs = NULL;
 QDict *cryptoopts;
+PreallocMode prealloc;
+char *buf = NULL;
 int64_t size;
 int ret;
+Error *local_err = NULL;
 
 /* Parse options */
 size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
 
+buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+prealloc = qapi_enum_parse(_lookup, buf,
+   PREALLOC_MODE_OFF, _err);
+g_free(buf);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
 cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
  _crypto_create_opts_luks,
  true);
@@ -565,7 +588,7 @@ static int coroutine_fn 
block_crypto_co_create_opts_luks(const char *filename,
 }
 
 /* Create format layer */
-ret = block_crypto_co_create_generic(bs, size, create_opts, errp);
+ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, 
errp);
 if (ret < 0) {
 goto fail;
 }
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0d43d4f37c..ebcfc9f903 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4205,13 +4205,16 @@
 #
 # @file Node to create the image format on
 # @size Size of the virtual disk in bytes
+# @preallocationPreallocation mode for the new image (default: off;
+#   allowed values: off/falloc/full
 #
 # Since: 2.12
 ##
 { 'struct': 'BlockdevCreateOptionsLUKS',
   'base': 'QCryptoBlockCreateOptionsLUKS',
   'data': { 'file': 'BlockdevRef',
-'size': 'size' } }
+'size': 'size',
+'*preallocation':   'PreallocMode' } }
 
 ##
 # @BlockdevCreateOptionsNfs:
-- 

Re: [Qemu-block] [RFC] Re-evaluating subcluster allocation for qcow2 images

2019-07-11 Thread Alberto Garcia
On Thu 11 Jul 2019 04:32:34 PM CEST, Kevin Wolf wrote:

>> - It is possible to configure very easily the number of subclusters per
>>   cluster. It is now hardcoded to 32 in qcow2_do_open() but any power of
>>   2 would work (just change the number there if you want to test
>>   it). Would an option for this be worth adding?
>
> I think for testing we can just change the constant. Once th feature
> is merged and used in production, I don't think there is any reason to
> leave bits unused.

Me neither unless we want to allow the 64 subclusters scenario that I
mentioned.

>> - We would now have "all zeroes" bits at the cluster and subcluster
>> levels, so there's an ambiguity here that we need to solve. In
>> particular, what happens if we have a QCOW2_CLUSTER_ZERO_ALLOC
>> cluster but some bits from the bitmap are set? Do we ignore them
>> completely?
>
> The (super)cluster zero bit should probably always be clear if
> subclusters are used. If it's set, we have a corrupted image.

I mentioned in an earlier e-mail that one possibility is to leave that
bit as it is now and use the bitmap only for the allocation status (so
we'd have 64 subclusters). If QCOW_OFLAG_ZERO is set and the subcluster
is not allocated then it's all zeroes.

With this we'd double the amount of subclusters but we'd lose the
possibility to have zero and unallocated subclusters at the same time.

>> I also ran some I/O tests using a similar scenario like last time
>> (SSD drive, 40GB backing image). Here are the results, you can see
>> the difference between the previous prototype (8 subclusters per
>> cluster) and the new one (32):
>
> Is the 8 subclusters test run with the old version (64 bit L2 entries)
> or the new version (128 bit L2 entries) with bits left unused?

It's the old version of the code (I copied & pasted the numbers from the
previous table).

>> |--++---+-|
>> | Cluster size | 32 subclusters | 8 subclusters | subclusters=off |
>> |--++---+-|
>> | 4 KB |80 IOPS |  101 IOPS | 92 IOPS |
>> | 8 KB |   108 IOPS |  299 IOPS |417 IOPS |
>> |16 KB |  3440 IOPS | 7555 IOPS |   3347 IOPS |
>> |32 KB | 10718 IOPS |13038 IOPS |   2435 IOPS |
>> |64 KB | 12569 IOPS |10613 IOPS |   1622 IOPS |
>> |   128 KB | 11444 IOPS | 4907 IOPS |866 IOPS |
>> |   256 KB |  9335 IOPS | 2618 IOPS |561 IOPS |
>> |   512 KB |   185 IOPS | 1678 IOPS |353 IOPS |
>> |  1024 KB |  2477 IOPS |  863 IOPS |212 IOPS |
>> |  2048 KB |  1536 IOPS |  571 IOPS |123 IOPS |
>> |--++---+-|
>> 
>> I'm surprised about the 256 KB cluster / 32 subclusters case (I would
>> expect ~3300 IOPS), but I ran it a few times and the results are always
>> the same. I still haven't investigated why that happens. The rest of the
>> results seem more or less normal.
>
> Shouldn't 256k/8k perform similarly to 64k/8k, or maybe a bit better?
> Why did you expect ~3300 IOPS?

Sorry I meant the 512k/16k case, which is obviously the outlier there.
 
> I found other results more surprising. In particular:
>
> * Why does 64k/2k perform better than 128k/4k when the block size for
>   your requests is 4k?

They should perform similar because the only difference in practice is
that in the former case you set two bits on the bitmap and in the latter
only one. The difference is not too big, I could run the tests again and
if the results are consistent I can investigate what's going on.

But yes, I would expect 128k/4k to be the fastest of them all.

> * Why is the maximum for 8 subclusters higher than for 32 subclusters?
>   I guess this does make some sense if the 8 subclusters case actually
>   used 64 bit L2 entries. If you did use 128 bit entries for both 32 and
>   8 subclusters, I don't see why 8 subclusters should perform better in
>   any case.

I used 64-bit entries for the 8 subcluster case. I can try with the new
code and see what happens.

> * What causes the minimum at 512k with 32 subclusters?

That's the case that I meant earlier, and I still don't have a good
hypothesis of why that happens. I'll need to debug it.

Berto



Re: [Qemu-block] [RFC] Re-evaluating subcluster allocation for qcow2 images

2019-07-11 Thread Kevin Wolf
Am 11.07.2019 um 16:08 hat Alberto Garcia geschrieben:
> Some questions that are still open:
> 
> - It is possible to configure very easily the number of subclusters per
>   cluster. It is now hardcoded to 32 in qcow2_do_open() but any power of
>   2 would work (just change the number there if you want to test
>   it). Would an option for this be worth adding?

I think for testing we can just change the constant. Once th feature is
merged and used in production, I don't think there is any reason to
leave bits unused.

> - We could also allow the user to choose 64 subclusters per cluster and
>   disable the "all zeroes" bits in that case. It is quite simple in
>   terms of lines of code but it would make the qcow2 spec a bit more
>   complicated.
> 
> - We would now have "all zeroes" bits at the cluster and subcluster
>   levels, so there's an ambiguity here that we need to solve. In
>   particular, what happens if we have a QCOW2_CLUSTER_ZERO_ALLOC cluster
>   but some bits from the bitmap are set? Do we ignore them completely?

The (super)cluster zero bit should probably always be clear if
subclusters are used. If it's set, we have a corrupted image.

> I also ran some I/O tests using a similar scenario like last time (SSD
> drive, 40GB backing image). Here are the results, you can see the
> difference between the previous prototype (8 subclusters per cluster)
> and the new one (32):

Is the 8 subclusters test run with the old version (64 bit L2 entries)
or the new version (128 bit L2 entries) with bits left unused?

> |--++---+-|
> | Cluster size | 32 subclusters | 8 subclusters | subclusters=off |
> |--++---+-|
> | 4 KB |80 IOPS |  101 IOPS | 92 IOPS |
> | 8 KB |   108 IOPS |  299 IOPS |417 IOPS |
> |16 KB |  3440 IOPS | 7555 IOPS |   3347 IOPS |
> |32 KB | 10718 IOPS |13038 IOPS |   2435 IOPS |
> |64 KB | 12569 IOPS |10613 IOPS |   1622 IOPS |
> |   128 KB | 11444 IOPS | 4907 IOPS |866 IOPS |
> |   256 KB |  9335 IOPS | 2618 IOPS |561 IOPS |
> |   512 KB |   185 IOPS | 1678 IOPS |353 IOPS |
> |  1024 KB |  2477 IOPS |  863 IOPS |212 IOPS |
> |  2048 KB |  1536 IOPS |  571 IOPS |123 IOPS |
> |--++---+-|
> 
> I'm surprised about the 256 KB cluster / 32 subclusters case (I would
> expect ~3300 IOPS), but I ran it a few times and the results are always
> the same. I still haven't investigated why that happens. The rest of the
> results seem more or less normal.

Shouldn't 256k/8k perform similarly to 64k/8k, or maybe a bit better?
Why did you expect ~3300 IOPS?

I found other results more surprising. In particular:

* Why does 64k/2k perform better than 128k/4k when the block size for
  your requests is 4k?

* Why is the maximum for 8 subclusters higher than for 32 subclusters?
  I guess this does make some sense if the 8 subclusters case actually
  used 64 bit L2 entries. If you did use 128 bit entries for both 32 and
  8 subclusters, I don't see why 8 subclusters should perform better in
  any case.

* What causes the minimum at 512k with 32 subclusters? The other two
  setups have a maximum and performance decreases monotonically to both
  sides. This one has a minimum at 512k and larger cluster sizes improve
  performance again.

  In fact, 512k performs really bad compared even to subclusters=off.

Kevin



Re: [Qemu-block] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Kevin Wolf
Am 11.07.2019 um 15:50 hat Eric Blake geschrieben:
> On 7/11/19 7:24 AM, Max Reitz wrote:
> 
> >>> So it isn’t just me who expects these to pre-initialize the image to 0.
> >>>  Hm, although...  I suppose @falloc technically does not specify whether
> >>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> >>
> >> I personally don't really think that zeros are important, but rather the 
> >> level of allocation.
> >> posix_fallocate probably won't write the data blocks but rather only the 
> >> inode metadata / used block bitmap/etc.
> >>
> >> On the other hand writing zeros (or anything else) will force the block 
> >> layer to actually write to the underlying
> >> storage which could trigger lower layer allocation if the underlying 
> >> storage is thin-provisioned.
> >>
> >> In fact IMHO, instead of writing zeros, it would be better to write random 
> >> garbage instead (or have that as an even 'fuller'
> >> preallocation mode), since underlying storage might 'compress' the zeros. 
> > 
> > Which is actually an argument why you should just write zeroes on the
> > LUKS layer, because this will then turn into quasi-random data on the
> > protocol layer.
> 
> We want preallocation to be fast (insofar as possible). Writing zeroes
> in LUKS is not fast, because it forces random data on the protocol
> layer; while writing zeroes on the protocol layer can be fast, even if
> it reads back as random on the LUKS layer. If you WANT to guarantee
> reading zeroes, that's image scrubbing, not preallocation.  I think this
> patch is taking the right approach, of letting the underlying layer
> allocate data efficiently (but the burden is then on the underlying
> layer to actually allocate data, and not optimize by compressing zeroes
> into non-allocated storage).

Isn't letting the host efficiently preallocate things what we have
preallocation=falloc for? We implement preallocation=full as explicit
writes to make sure that no shortcuts are taken and things are _really_
preallocated throughout all layers. Not being efficient, but thorough is
almost like the whole point of the option.

So I'm inclined to think that writing zeros on the LUKS layer would be
right for full preallocation.

Kevin


signature.asc
Description: PGP signature


Re: [Qemu-block] [RFC] Re-evaluating subcluster allocation for qcow2 images

2019-07-11 Thread Alberto Garcia
On Thu 27 Jun 2019 07:08:29 PM CEST, Denis Lunev wrote:

> But can we get a link to the repo with actual version of patches.

Hi,

I updated my code to increase the L2 entry size from 64 bits to 128 bits
and thanks to this we now have 32 subclusters per cluster (32 bits for
"subcluster allocated" and 32 for "subcluster is all zeroes").

I also fixed a few bugs on the way and started to clean the code a bit
so it is more readable. You can get it here:

   
https://github.com/bertogg/qemu/releases/tag/subcluster-allocation-prototype-20190711

The idea is that you can test it, evaluate the performance and see
whether the general approach makes sense, but this is obviously not
release-quality code so don't focus too much on the coding style,
variable names, hacks, etc. Many things need to change, other things
still need to be implemented, and I'm already on the process of doing
it.

Some questions that are still open:

- It is possible to configure very easily the number of subclusters per
  cluster. It is now hardcoded to 32 in qcow2_do_open() but any power of
  2 would work (just change the number there if you want to test
  it). Would an option for this be worth adding?

- We could also allow the user to choose 64 subclusters per cluster and
  disable the "all zeroes" bits in that case. It is quite simple in
  terms of lines of code but it would make the qcow2 spec a bit more
  complicated.

- We would now have "all zeroes" bits at the cluster and subcluster
  levels, so there's an ambiguity here that we need to solve. In
  particular, what happens if we have a QCOW2_CLUSTER_ZERO_ALLOC cluster
  but some bits from the bitmap are set? Do we ignore them completely?

I also ran some I/O tests using a similar scenario like last time (SSD
drive, 40GB backing image). Here are the results, you can see the
difference between the previous prototype (8 subclusters per cluster)
and the new one (32):

|--++---+-|
| Cluster size | 32 subclusters | 8 subclusters | subclusters=off |
|--++---+-|
| 4 KB |80 IOPS |  101 IOPS | 92 IOPS |
| 8 KB |   108 IOPS |  299 IOPS |417 IOPS |
|16 KB |  3440 IOPS | 7555 IOPS |   3347 IOPS |
|32 KB | 10718 IOPS |13038 IOPS |   2435 IOPS |
|64 KB | 12569 IOPS |10613 IOPS |   1622 IOPS |
|   128 KB | 11444 IOPS | 4907 IOPS |866 IOPS |
|   256 KB |  9335 IOPS | 2618 IOPS |561 IOPS |
|   512 KB |   185 IOPS | 1678 IOPS |353 IOPS |
|  1024 KB |  2477 IOPS |  863 IOPS |212 IOPS |
|  2048 KB |  1536 IOPS |  571 IOPS |123 IOPS |
|--++---+-|

I'm surprised about the 256 KB cluster / 32 subclusters case (I would
expect ~3300 IOPS), but I ran it a few times and the results are always
the same. I still haven't investigated why that happens. The rest of the
results seem more or less normal.

I will now continue working towards having something a complete
solution, but any feedback or comments will be very welcome.

Regards,

Berto



Re: [Qemu-block] [PATCH v2] LUKS: support preallocation in qemu-img

2019-07-11 Thread Maxim Levitsky
On Thu, 2019-07-11 at 15:43 +0200, Max Reitz wrote:
> On 11.07.19 11:11, Maxim Levitsky wrote:
> > preallocation=off and preallocation=metadata
> > both allocate luks header only, and preallocation=falloc/full
> > is passed to underlying file.
> > 
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > 
> > Signed-off-by: Maxim Levitsky 
> > ---
> >  block/crypto.c | 25 ++---
> >  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> FWIW, do you see the implementation of block_crypto_co_truncate()?
> Like, how it just passes preallocation requests through to the
> underlying layer?  How I said it shouldn’t be done?
> 
> Yes, that was me, in commit 7ea37c30660.
> 
> So, er, yeah.
> 
> > diff --git a/block/crypto.c b/block/crypto.c
> > index 8237424ae6..cbc291301e 100644
> > --- a/block/crypto.c
> > +++ b/block/crypto.c
> 
> [...]
> 
> > @@ -534,12 +537,28 @@ static int coroutine_fn 
> > block_crypto_co_create_opts_luks(const char *filename,
> >  QCryptoBlockCreateOptions *create_opts = NULL;
> >  BlockDriverState *bs = NULL;
> >  QDict *cryptoopts;
> > +PreallocMode prealloc;
> > +char *buf = NULL;
> >  int64_t size;
> >  int ret;
> > +Error *local_err = NULL;
> >  
> >  /* Parse options */
> >  size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
> >  
> > +buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
> > +prealloc = qapi_enum_parse(_lookup, buf,
> > +   PREALLOC_MODE_OFF, _err);
> 
> Please align such lines to the opening parenthesis.
True - I really need to invest some time to update the checkpatch.pl
in qemu source tree to be up to date, or find a way to use the kernel one,
it is so useful to let it catch these things instead of wasting your time.

> 
> > +g_free(buf);
> > +if (local_err) {
> > +error_propagate(errp, local_err);
> > +return -EINVAL;
> > +}
> > +
> > +if (prealloc == PREALLOC_MODE_METADATA) {
> > +prealloc  = PREALLOC_MODE_OFF;
> 
> There is one space too many here.
Oops, same thing as above.

> 
> > +}
> > +
> 
> I think you also need to add a @preallocation parameter to
> BlockdevCreateOptionsLUKS and handle it in block_crypto_co_create_luks().

I was under impression that with new qmp based blockdev-create api, the user
should pretty much do the preallocation itself on the underlying block file,
and then create the luks on it.

However I do see that qcow2 has a preallocation mode there, so I guess I was 
wrong.
Will do it now.

Best regards,
Maxim Levitsky







Re: [Qemu-block] [PATCH] doc: Preallocation does not require writing zeroes

2019-07-11 Thread Maxim Levitsky
On Thu, 2019-07-11 at 15:29 +0200, Max Reitz wrote:
> When preallocating an encrypted qcow2 image, it just lets the protocol
> driver write data and then does not mark the clusters as zero.
> Therefore, reading this image will yield effectively random data.
> 
> As such, we have not fulfilled the promise of always writing zeroes when
> preallocating an image in a while.  It seems that nobody has really
> cared, so change the documentation to conform to qemu's actual behavior.
> 
> Signed-off-by: Max Reitz 

I did a little grep on qemu source tree with mentions of preallocation, 
and it looks like these are the only mentions that need to be fixed.
So, thank you very much, and

Reviewed-by: Maxim Levitsky 

Best regards,
Maxim Levitsky




Re: [Qemu-block] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Daniel P . Berrangé
On Thu, Jul 11, 2019 at 08:50:56AM -0500, Eric Blake wrote:
> On 7/11/19 7:24 AM, Max Reitz wrote:
> 
> >>> So it isn’t just me who expects these to pre-initialize the image to 0.
> >>>  Hm, although...  I suppose @falloc technically does not specify whether
> >>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> >>
> >> I personally don't really think that zeros are important, but rather the 
> >> level of allocation.
> >> posix_fallocate probably won't write the data blocks but rather only the 
> >> inode metadata / used block bitmap/etc.
> >>
> >> On the other hand writing zeros (or anything else) will force the block 
> >> layer to actually write to the underlying
> >> storage which could trigger lower layer allocation if the underlying 
> >> storage is thin-provisioned.
> >>
> >> In fact IMHO, instead of writing zeros, it would be better to write random 
> >> garbage instead (or have that as an even 'fuller'
> >> preallocation mode), since underlying storage might 'compress' the zeros. 
> > 
> > Which is actually an argument why you should just write zeroes on the
> > LUKS layer, because this will then turn into quasi-random data on the
> > protocol layer.
> 
> We want preallocation to be fast (insofar as possible). Writing zeroes
> in LUKS is not fast, because it forces random data on the protocol
> layer; while writing zeroes on the protocol layer can be fast, even if
> it reads back as random on the LUKS layer. If you WANT to guarantee
> reading zeroes, that's image scrubbing, not preallocation.  I think this
> patch is taking the right approach, of letting the underlying layer
> allocate data efficiently (but the burden is then on the underlying
> layer to actually allocate data, and not optimize by compressing zeroes
> into non-allocated storage).

On the topic of scrubbing, it would actually be nice to have a
"secure delete" for QEMU block driver formats that can do some
level of scrubbing in software and/or calling out to hardware support.

Similarly to prealloc a choice of 'metadata' or 'full'. Wwith LUKS
you can do well by just scrubbing the image header (which kills the
master decryption key rendering payload useless).

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Eric Blake
On 7/11/19 7:24 AM, Max Reitz wrote:

>>> So it isn’t just me who expects these to pre-initialize the image to 0.
>>>  Hm, although...  I suppose @falloc technically does not specify whether
>>> the data reads as zeroes.  I kind of find it to be implied, but, well...
>>
>> I personally don't really think that zeros are important, but rather the 
>> level of allocation.
>> posix_fallocate probably won't write the data blocks but rather only the 
>> inode metadata / used block bitmap/etc.
>>
>> On the other hand writing zeros (or anything else) will force the block 
>> layer to actually write to the underlying
>> storage which could trigger lower layer allocation if the underlying storage 
>> is thin-provisioned.
>>
>> In fact IMHO, instead of writing zeros, it would be better to write random 
>> garbage instead (or have that as an even 'fuller'
>> preallocation mode), since underlying storage might 'compress' the zeros. 
> 
> Which is actually an argument why you should just write zeroes on the
> LUKS layer, because this will then turn into quasi-random data on the
> protocol layer.

We want preallocation to be fast (insofar as possible). Writing zeroes
in LUKS is not fast, because it forces random data on the protocol
layer; while writing zeroes on the protocol layer can be fast, even if
it reads back as random on the LUKS layer. If you WANT to guarantee
reading zeroes, that's image scrubbing, not preallocation.  I think this
patch is taking the right approach, of letting the underlying layer
allocate data efficiently (but the burden is then on the underlying
layer to actually allocate data, and not optimize by compressing zeroes
into non-allocated storage).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v2] LUKS: support preallocation in qemu-img

2019-07-11 Thread Max Reitz
On 11.07.19 11:11, Maxim Levitsky wrote:
> preallocation=off and preallocation=metadata
> both allocate luks header only, and preallocation=falloc/full
> is passed to underlying file.
> 
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/crypto.c | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)

FWIW, do you see the implementation of block_crypto_co_truncate()?
Like, how it just passes preallocation requests through to the
underlying layer?  How I said it shouldn’t be done?

Yes, that was me, in commit 7ea37c30660.

So, er, yeah.

> diff --git a/block/crypto.c b/block/crypto.c
> index 8237424ae6..cbc291301e 100644
> --- a/block/crypto.c
> +++ b/block/crypto.c
[...]

> @@ -534,12 +537,28 @@ static int coroutine_fn 
> block_crypto_co_create_opts_luks(const char *filename,
>  QCryptoBlockCreateOptions *create_opts = NULL;
>  BlockDriverState *bs = NULL;
>  QDict *cryptoopts;
> +PreallocMode prealloc;
> +char *buf = NULL;
>  int64_t size;
>  int ret;
> +Error *local_err = NULL;
>  
>  /* Parse options */
>  size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
>  
> +buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
> +prealloc = qapi_enum_parse(_lookup, buf,
> +   PREALLOC_MODE_OFF, _err);

Please align such lines to the opening parenthesis.

> +g_free(buf);
> +if (local_err) {
> +error_propagate(errp, local_err);
> +return -EINVAL;
> +}
> +
> +if (prealloc == PREALLOC_MODE_METADATA) {
> +prealloc  = PREALLOC_MODE_OFF;

There is one space too many here.

> +}
> +

I think you also need to add a @preallocation parameter to
BlockdevCreateOptionsLUKS and handle it in block_crypto_co_create_luks().

Max

>  cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
>   _crypto_create_opts_luks,
>   true);



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] doc: Preallocation does not require writing zeroes

2019-07-11 Thread Daniel P . Berrangé
On Thu, Jul 11, 2019 at 03:29:35PM +0200, Max Reitz wrote:
> When preallocating an encrypted qcow2 image, it just lets the protocol
> driver write data and then does not mark the clusters as zero.
> Therefore, reading this image will yield effectively random data.
> 
> As such, we have not fulfilled the promise of always writing zeroes when
> preallocating an image in a while.  It seems that nobody has really
> cared, so change the documentation to conform to qemu's actual behavior.
> 
> Signed-off-by: Max Reitz 
> ---
>  qapi/block-core.json | 9 +
>  docs/qemu-block-drivers.texi | 4 ++--
>  qemu-img.texi| 4 ++--
>  3 files changed, 9 insertions(+), 8 deletions(-)


Reviewed-by: Daniel P. Berrangé 


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [PATCH] doc: Preallocation does not require writing zeroes

2019-07-11 Thread Eric Blake
On 7/11/19 8:29 AM, Max Reitz wrote:
> When preallocating an encrypted qcow2 image, it just lets the protocol
> driver write data and then does not mark the clusters as zero.
> Therefore, reading this image will yield effectively random data.
> 
> As such, we have not fulfilled the promise of always writing zeroes when
> preallocating an image in a while.  It seems that nobody has really
> cared, so change the documentation to conform to qemu's actual behavior.
> 
> Signed-off-by: Max Reitz 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH] doc: Preallocation does not require writing zeroes

2019-07-11 Thread Max Reitz
When preallocating an encrypted qcow2 image, it just lets the protocol
driver write data and then does not mark the clusters as zero.
Therefore, reading this image will yield effectively random data.

As such, we have not fulfilled the promise of always writing zeroes when
preallocating an image in a while.  It seems that nobody has really
cared, so change the documentation to conform to qemu's actual behavior.

Signed-off-by: Max Reitz 
---
 qapi/block-core.json | 9 +
 docs/qemu-block-drivers.texi | 4 ++--
 qemu-img.texi| 4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0d43d4f37c..a4363b84d2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -5167,10 +5167,11 @@
 # @off: no preallocation
 # @metadata: preallocate only for metadata
 # @falloc: like @full preallocation but allocate disk space by
-#  posix_fallocate() rather than writing zeros.
-# @full: preallocate all data by writing zeros to device to ensure disk
-#space is really available. @full preallocation also sets up
-#metadata correctly.
+#  posix_fallocate() rather than writing data.
+# @full: preallocate all data by writing it to the device to ensure
+#disk space is really available. This data may or may not be
+#zero, depending on the image format and storage.
+#@full preallocation also sets up metadata correctly.
 #
 # Since: 2.2
 ##
diff --git a/docs/qemu-block-drivers.texi b/docs/qemu-block-drivers.texi
index 91ab0eceae..c02547e28c 100644
--- a/docs/qemu-block-drivers.texi
+++ b/docs/qemu-block-drivers.texi
@@ -31,8 +31,8 @@ Supported options:
 @item preallocation
 Preallocation mode (allowed values: @code{off}, @code{falloc}, @code{full}).
 @code{falloc} mode preallocates space for image by calling posix_fallocate().
-@code{full} mode preallocates space for image by writing zeros to underlying
-storage.
+@code{full} mode preallocates space for image by writing data to underlying
+storage.  This data may or may not be zero, depending on the storage location.
 @end table
 
 @item qcow2
diff --git a/qemu-img.texi b/qemu-img.texi
index c8e9bba515..b5156d6316 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -666,8 +666,8 @@ Supported options:
 @item preallocation
 Preallocation mode (allowed values: @code{off}, @code{falloc}, @code{full}).
 @code{falloc} mode preallocates space for image by calling posix_fallocate().
-@code{full} mode preallocates space for image by writing zeros to underlying
-storage.
+@code{full} mode preallocates space for image by writing data to underlying
+storage.  This data may or may not be zero, depending on the storage location.
 @end table
 
 @item qcow2
-- 
2.21.0




Re: [Qemu-block] [PATCH-for-4.1] hw/block/pflash_cfi02: Explicit switch fallthrough for ERASE commands

2019-07-11 Thread Peter Maydell
On Thu, 11 Jul 2019 at 14:08, Philippe Mathieu-Daudé  wrote:
>
> Previous to commit ddb6f2254, the DQ2 bit was incorrectly set
> during PROGRAM command (0xA0). The commit reordered the switch
> cases to only set the DQ2 bit for the ERASE commands using a
> fallthrough, but did not explicit the fallthrough is intentional.
>
> Mark the switch fallthrough with a comment interpretable by C
> preprocessors and static analysis tools.
>
> Reported-by: Coverity (CID 1403012)
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/block/pflash_cfi02.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
> index 83084b9d72..f68837a449 100644
> --- a/hw/block/pflash_cfi02.c
> +++ b/hw/block/pflash_cfi02.c
> @@ -367,6 +367,7 @@ static uint64_t pflash_read(void *opaque, hwaddr offset, 
> unsigned int width)
>  case 0x30: /* Sector Erase */
>  /* Toggle bit 2 during erase, but not program. */
>  toggle_dq2(pfl);
> +/* fall through */
>  case 0xA0: /* Program */
>  /* Toggle bit 6 */
>  toggle_dq6(pfl);
> --

Reviewed-by: Peter Maydell 

thanks
-- PMM



[Qemu-block] [PATCH-for-4.1] hw/block/pflash_cfi02: Explicit switch fallthrough for ERASE commands

2019-07-11 Thread Philippe Mathieu-Daudé
Previous to commit ddb6f2254, the DQ2 bit was incorrectly set
during PROGRAM command (0xA0). The commit reordered the switch
cases to only set the DQ2 bit for the ERASE commands using a
fallthrough, but did not explicit the fallthrough is intentional.

Mark the switch fallthrough with a comment interpretable by C
preprocessors and static analysis tools.

Reported-by: Coverity (CID 1403012)
Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/pflash_cfi02.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 83084b9d72..f68837a449 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -367,6 +367,7 @@ static uint64_t pflash_read(void *opaque, hwaddr offset, 
unsigned int width)
 case 0x30: /* Sector Erase */
 /* Toggle bit 2 during erase, but not program. */
 toggle_dq2(pfl);
+/* fall through */
 case 0xA0: /* Program */
 /* Toggle bit 6 */
 toggle_dq6(pfl);
-- 
2.20.1




Re: [Qemu-block] [Qemu-devel] [PATCH v6] qemu-io: add pattern file for write command

2019-07-11 Thread Max Reitz
On 27.06.19 17:26, Denis Plotnikov wrote:
> 
> 
> On 19.06.2019 13:09, Vladimir Sementsov-Ogievskiy wrote:
>> 10.06.2019 16:21, Denis Plotnikov wrote:
>>> The patch allows to provide a pattern file for write
>>> command. There was no similar ability before.
>>>
>>> Signed-off-by: Denis Plotnikov 
>>> ---
>>> v6:
>>> * the pattern file is read once to reduce io
>>>
>>> v5:
>>> * file name initiated with null to make compilers happy
>>>
>>> v4:
>>> * missing signed-off clause added
>>>
>>> v3:
>>> * missing file closing added
>>> * exclusive flags processing changed
>>> * buffer void* converted to char* to fix pointer arithmetics
>>> * file reading error processing added
>>> ---
>>>qemu-io-cmds.c | 88 ++
>>>1 file changed, 82 insertions(+), 6 deletions(-)

[...]

>>> +if (l < len) {
>>> +char *file_buf = g_malloc(sizeof(char) * l);

Note that sizeof(char) is guaranteed to be 1.

(C1x standard, 6.5.3.4, paragraph 4.)

>>> +memcpy(file_buf, buf, l);
>>
>> I see no reason to copy it, you can just use buf_origin pointer
>> instead.
> I use buf_origin to save the beginning pointer to return it from the 
> function avoiding later calculation of the beginning address since 
> pointer of the buf is changed in the loop.

The point is that you don’t need file_buf.  You can drop it, replace all
occurrences of file_buf by buf_origin, and probably use memmove instead
of memcpy.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Daniel P . Berrangé
On Thu, Jul 11, 2019 at 02:23:55PM +0200, Max Reitz wrote:
> On 11.07.19 11:20, Daniel P. Berrangé wrote:
> > On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
> >> On 10.07.19 19:03, Maxim Levitsky wrote:
> >>> preallocation=off and preallocation=metadata
> >>> both allocate luks header only, and preallocation=falloc/full
> >>> is passed to underlying file, with the given image size.
> >>>
> >>> Note that the actual preallocated size is a bit smaller due
> >>> to luks header.
> >>
> >> Couldn’t you just preallocate it after creating the crypto header so
> >> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> >> file size?
> > 
> > Yeah that would be preferrable. If that's really not possible, we
> > could likely provide some API to query the expected hreader size for
> > a given set of creation options. 
> > 
> >>
> >>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> >>>
> >>> Signed-off-by: Maxim Levitsky 
> >>> ---
> >>>  block/crypto.c | 28 ++--
> >>>  1 file changed, 26 insertions(+), 2 deletions(-)
> >>
> >> Hm.  I would expect a preallocated image to read 0.  But if you just
> >> pass this through to the protocol layer, it won’t read 0.
> > 
> > Yes, it will be zeros at the physical layer, but unintelligble
> > garbage from POV of the virtual disk.
> > 
> > I don't think this is really a problem though - this is what you
> > get already if you create a LUKS volume on top of a block device
> > today.
> 
> Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
> driver does not implement, hence it being treated as false.
> 
> But if you are preallocating, you have a choice of what you write, and
> why not make that zeroes?
> 
> > AFAIK, we've not documented that preallocation guarantees future
> > reads will return zeros. Preallocation simply ensures that all
> > required space is allocated upfront. We do mention that it might
> > be achieved by writing zeros to the underlying storage but never
> > said you'll get zeros back.
> 
> But we have, as I wrote in my second reply.  PreallocMode's
> documentation says that at least “full” is writing zeroes, and to say
> those zeroes can be anywhere in the stack is cheating, from my POV.

I guess it depends on your interpretation of the docs. In qemu-img
man page it says

  "falloc" mode preallocates space for image by calling posix_fallocate().
  "full" mode preallocates space for image by writing zeros to underlying
  storage.

To me both those sentances are talking about the lowest level in the
stack, closest to the physical storage medium, though I can understand
if people have other interpretations.

> > IOW I think its at most a docs problem to more clearly explain
> > that preallocation != guaranteed zeros for reads.
> > 
> >> (In fact, I don’t even quite see the point of having LUKS as an own
> >> format still.  It was useful when qcow2 didn’t have LUKS support, but
> >> now it does, so...  I suppose everyone using the LUKS format should
> >> actually be using qcow2 with LUKS?)
> > 
> > Certainly not. LUKS on raw is going to be very common, not least because
> > that's directly compatible with what Linux kernel supports. If you don't
> > want the features of qcow2 like snapshots, it just adds overhead and mgmt
> > complexity for no gain, especially if dealing with block device backed
> > storage (iSCSI, RBD).
> > 
> > OpenStack will use cryptsetup when initializing its block storage with
> > LUKS, then tell QEMU to run with the raw + LUKS driver.
> 
> I see the compatibility with the Linux kernel, yes (as I wrote in my
> second reply), but I’m not sure whether “overhead” really is that big of
> a point when using encryption.

Overhead is not purely about CPU burn. There's non-negligible memory
overhead for qcow2s data tables that doesn't exist at all with raw.
The mgmt complexity & interoperability is the real killer feature
benefit of raw + LUKS vs qcow + LUKS though.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-block] [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Max Reitz
On 11.07.19 14:23, Max Reitz wrote:
> On 11.07.19 11:20, Daniel P. Berrangé wrote:
>> On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:

[...]

>>> Hm.  I would expect a preallocated image to read 0.  But if you just
>>> pass this through to the protocol layer, it won’t read 0.
>>
>> Yes, it will be zeros at the physical layer, but unintelligble
>> garbage from POV of the virtual disk.
>>
>> I don't think this is really a problem though - this is what you
>> get already if you create a LUKS volume on top of a block device
>> today.
> 
> Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
> driver does not implement, hence it being treated as false.
> 
> But if you are preallocating, you have a choice of what you write, and
> why not make that zeroes?
> 
>> AFAIK, we've not documented that preallocation guarantees future
>> reads will return zeros. Preallocation simply ensures that all
>> required space is allocated upfront. We do mention that it might
>> be achieved by writing zeros to the underlying storage but never
>> said you'll get zeros back.
> 
> But we have, as I wrote in my second reply.  PreallocMode's
> documentation says that at least “full” is writing zeroes, and to say
> those zeroes can be anywhere in the stack is cheating, from my POV.

I should add that I don’t mind changing the current documentation too much:

>> IOW I think its at most a docs problem to more clearly explain
>> that preallocation != guaranteed zeros for reads.

If there is a good reason to do that, sure.  But it needs to be done
explicitly, with an accompanying justification.  I don’t like just
ignoring the documentation we have.

(And yes, if something says “this writes zeroes”, I personally will
always interpret that as “it will read as zeroes”.)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PULL 24/27] hw/block/pflash_cfi02: Implement erase suspend/resume

2019-07-11 Thread Philippe Mathieu-Daudé
On 7/11/19 2:35 PM, Peter Maydell wrote:
> On Tue, 2 Jul 2019 at 04:29, Philippe Mathieu-Daudé  wrote:
>>
>> From: Stephen Checkoway 
>>
>> During a sector erase (but not a chip erase), the embeded erase program
>> can be suspended. Once suspended, the sectors not selected for erasure
>> may be read and programmed. Autoselect mode is allowed during erase
>> suspend mode. Presumably, CFI queries are similarly allowed so this
>> commit allows them as well.
>>
>> Since guest firmware can use status bits DQ7, DQ6, DQ3, and DQ2 to
>> determine the current state of sector erasure, these bits are properly
>> implemented.
>>
>> @@ -305,13 +364,16 @@ static uint64_t pflash_read(void *opaque, hwaddr 
>> offset, unsigned int width)
>>  }
>>  DPRINTF("%s: ID " TARGET_FMT_plx " %" PRIx64 "\n", __func__, boff, 
>> ret);
>>  break;
>> -case 0xA0:
>>  case 0x10:
>>  case 0x30:
>> +/* Toggle bit 2 during erase, but not program. */
>> +toggle_dq2(pfl);
>> +case 0xA0:
>> +/* Toggle bit 6 */
>> +toggle_dq6(pfl);
>>  /* Status register read */
>>  ret = pfl->status;
>>  DPRINTF("%s: status %" PRIx64 "\n", __func__, ret);
>> -toggle_dq6(pfl);
>>  break;
>>  case 0x98:
>>  /* CFI query mode */
> 
> Hi; Coverity (CID 1403012) flags up the case 0x30 as an implicit
> fallthrough. Should it have an explicit "break" or a "/* fall through */"
> comment?

Yes, it is an implicit fallthrough. I'll send a patch.

Thanks,

Phil.



Re: [Qemu-block] [PATCH 8/8] iotests/257: test traditional sync modes

2019-07-11 Thread Max Reitz
On 11.07.19 05:21, John Snow wrote:
> 
> On 7/10/19 4:46 PM, Max Reitz wrote:
>> On 10.07.19 21:00, John Snow wrote:
>>> On 7/10/19 1:14 PM, Max Reitz wrote:
 On 10.07.19 03:05, John Snow wrote:

 Hm.  How useful is bitmap support for 'top' then, anyway?  That means
 that if you want to resume a top backup, you always have to resume it
 like it was a full backup.  Which sounds kind of useless.

 Max

>>>
>>> Good point!
>>>
>>> I think this can be fixed by doing an initialization pass of the
>>> copy_bitmap when sync=top to set only the allocated regions in the bitmap.
>>>
>>> This means that the write notifier won't copy out regions that are
>>> written to that weren't already in the top layer. I believe this is
>>> actually a bugfix; the data we'd copy out in such cases is actually in
>>> the backing layer and shouldn't be copied with sync=top.
>>
>> Now that you mention it...  I didn’t realize that.  Yes, you’re right.
>>
>>> So this would have two effects:
>>> (1) sync=top gets a little more judicious about what it copies out on
>>> sync=top, and
>>> (2) the bitmap return value is more meaningful again.
>>>
> 
> This might be harder than I first thought.
> 
> initializing the copy_bitmap generally happens before we install the
> write notifier, which means that it occurs before the first yield.
> 
> However, checking the allocation status can potentially be very slow,
> can't it? I can't just hog the thread while I check.

I was thinking about that myself.  It isn’t that bad, because you aren’t
doing the full block_status dance but just checking allocation status,
which is reasonably quick (it just needs to look at the image format
metadata, it doesn’t go down to the protocol layer).

But it’s probably not so good to halt the monitor for this, yes.

> There are ways to cooperatively process write notifier interrupts and
> continue to check allocated status once we enter the main loop, but the
> problem there becomes: if we fail early, how can we tell if the backup
> is worth resuming?
> 
> We might not have reached a convergence point for the copy_bitmap before
> we failed, and still have a lot of extra bits set.

Is that so bad?

> I suppose at least in the case where we aren't trying to save the
> copy_bitmap and need it to mean something specific, this is a reasonable
> approach to fixing sync=TOP.
> 
> As far as resume is concerned, I don't think I have good ideas. I could
> emit an event or something if you're using sync=top with a bitmap for
> output, but that feels *so* specialized for a niche(?) command that I
> don't know if it's worth pursuing.
> 
> (Plus, even then, what do you do if it fails before you see that event?
> You just have to give up on what we copied out? That seems like a waste
> and not the point of this exercise.)

Before that event, the bitmap can still be usable, as long as all
“unknown” areas are set to dirty.  Sure, your resumed backup will then
copy too much data.  But who cares.

So I don’t think you even need an event.

> The only way I can think of at all to get a retry on sync=top is to take
> an always policy, and to allow a special invocation with something like
> mode=bitmap+top:

Yes, that was my first idea, too.  But I didn’t even write about it,
because of...

> "Assume we need to copy anything set in the bitmap, unless it's not in
> the top layer, and then skip it."
> 
> Which seems awful, because it would be a specialty mode for the
> exclusive purpose of re-trying sync=top backups.

...exactly this.

> Meh.

I don’t think it’s all that bad.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PULL 24/27] hw/block/pflash_cfi02: Implement erase suspend/resume

2019-07-11 Thread Peter Maydell
On Tue, 2 Jul 2019 at 04:29, Philippe Mathieu-Daudé  wrote:
>
> From: Stephen Checkoway 
>
> During a sector erase (but not a chip erase), the embeded erase program
> can be suspended. Once suspended, the sectors not selected for erasure
> may be read and programmed. Autoselect mode is allowed during erase
> suspend mode. Presumably, CFI queries are similarly allowed so this
> commit allows them as well.
>
> Since guest firmware can use status bits DQ7, DQ6, DQ3, and DQ2 to
> determine the current state of sector erasure, these bits are properly
> implemented.
>
> @@ -305,13 +364,16 @@ static uint64_t pflash_read(void *opaque, hwaddr 
> offset, unsigned int width)
>  }
>  DPRINTF("%s: ID " TARGET_FMT_plx " %" PRIx64 "\n", __func__, boff, 
> ret);
>  break;
> -case 0xA0:
>  case 0x10:
>  case 0x30:
> +/* Toggle bit 2 during erase, but not program. */
> +toggle_dq2(pfl);
> +case 0xA0:
> +/* Toggle bit 6 */
> +toggle_dq6(pfl);
>  /* Status register read */
>  ret = pfl->status;
>  DPRINTF("%s: status %" PRIx64 "\n", __func__, ret);
> -toggle_dq6(pfl);
>  break;
>  case 0x98:
>  /* CFI query mode */

Hi; Coverity (CID 1403012) flags up the case 0x30 as an implicit
fallthrough. Should it have an explicit "break" or a "/* fall through */"
comment?

thanks
-- PMM



Re: [Qemu-block] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Max Reitz
On 11.07.19 10:39, Maxim Levitsky wrote:
> On Wed, 2019-07-10 at 23:52 +0200, Max Reitz wrote:
>> On 10.07.19 23:24, Max Reitz wrote:
>>> On 10.07.19 19:03, Maxim Levitsky wrote:
 preallocation=off and preallocation=metadata
 both allocate luks header only, and preallocation=falloc/full
 is passed to underlying file, with the given image size.

 Note that the actual preallocated size is a bit smaller due
 to luks header.
>>>
>>> Couldn’t you just preallocate it after creating the crypto header so
>>> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
>>> file size?
> 
> I kind of thought of the same thing after I send the patch. I'll see now it I 
> can make it work.
> 
> 
>>>
 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951

 Signed-off-by: Maxim Levitsky 
 ---
  block/crypto.c | 28 ++--
  1 file changed, 26 insertions(+), 2 deletions(-)
>>>
>>> Hm.  I would expect a preallocated image to read 0.  But if you just
>>> pass this through to the protocol layer, it won’t read 0.
>>>
>>> (In fact, I don’t even quite see the point of having LUKS as an own
>>> format still.  It was useful when qcow2 didn’t have LUKS support, but
>>> now it does, so...  I suppose everyone using the LUKS format should
>>> actually be using qcow2 with LUKS?)
>>
>> Kevin just pointed out to me that our LUKS format is compatible to the
>> actual layout cryptsetup uses.  OK, that is an important use case.
>>
>> Hm.  Unfortunately, that doesn’t really necessitate preallocation.
>>
>> Well, whatever.  If it’s simple enough, that shouldn’t stop us from
>> implementing preallocation anyway.
> Exactly. Since I already know the area of qemu-img relatively well, and
> this bug is on my backlog, I thought why not to do it.
> 
> 
>>
>>
>> Now I found that qapi/block-core.json defines PreallocMode’s falloc and
>> full values as follows:
>>
>>> # @falloc: like @full preallocation but allocate disk space by
>>> #  posix_fallocate() rather than writing zeros.
>>> # @full: preallocate all data by writing zeros to device to ensure disk
>>> #space is really available. @full preallocation also sets up
>>> #metadata correctly.
>>
>> So it isn’t just me who expects these to pre-initialize the image to 0.
>>  Hm, although...  I suppose @falloc technically does not specify whether
>> the data reads as zeroes.  I kind of find it to be implied, but, well...
> 
> I personally don't really think that zeros are important, but rather the 
> level of allocation.
> posix_fallocate probably won't write the data blocks but rather only the 
> inode metadata / used block bitmap/etc.
> 
> On the other hand writing zeros (or anything else) will force the block layer 
> to actually write to the underlying
> storage which could trigger lower layer allocation if the underlying storage 
> is thin-provisioned.
> 
> In fact IMHO, instead of writing zeros, it would be better to write random 
> garbage instead (or have that as an even 'fuller'
> preallocation mode), since underlying storage might 'compress' the zeros. 

Which is actually an argument why you should just write zeroes on the
LUKS layer, because this will then turn into quasi-random data on the
protocol layer.

Max

> In this version I do have a bug that I mentioned, about not preallocation 
> some data at the end of the image, and I will
> fix it, so that all image is zeros as expected
> 
> Best regards,
>   Maxim Levitsky
> 
> 
>>
>> Max
>>
> 
> 




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Max Reitz
On 11.07.19 11:20, Daniel P. Berrangé wrote:
> On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
>> On 10.07.19 19:03, Maxim Levitsky wrote:
>>> preallocation=off and preallocation=metadata
>>> both allocate luks header only, and preallocation=falloc/full
>>> is passed to underlying file, with the given image size.
>>>
>>> Note that the actual preallocated size is a bit smaller due
>>> to luks header.
>>
>> Couldn’t you just preallocate it after creating the crypto header so
>> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
>> file size?
> 
> Yeah that would be preferrable. If that's really not possible, we
> could likely provide some API to query the expected hreader size for
> a given set of creation options. 
> 
>>
>>> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
>>>
>>> Signed-off-by: Maxim Levitsky 
>>> ---
>>>  block/crypto.c | 28 ++--
>>>  1 file changed, 26 insertions(+), 2 deletions(-)
>>
>> Hm.  I would expect a preallocated image to read 0.  But if you just
>> pass this through to the protocol layer, it won’t read 0.
> 
> Yes, it will be zeros at the physical layer, but unintelligble
> garbage from POV of the virtual disk.
> 
> I don't think this is really a problem though - this is what you
> get already if you create a LUKS volume on top of a block device
> today.

Which is why we have BlockDriver.bdrv_has_zero_init(), which the LUKS
driver does not implement, hence it being treated as false.

But if you are preallocating, you have a choice of what you write, and
why not make that zeroes?

> AFAIK, we've not documented that preallocation guarantees future
> reads will return zeros. Preallocation simply ensures that all
> required space is allocated upfront. We do mention that it might
> be achieved by writing zeros to the underlying storage but never
> said you'll get zeros back.

But we have, as I wrote in my second reply.  PreallocMode's
documentation says that at least “full” is writing zeroes, and to say
those zeroes can be anywhere in the stack is cheating, from my POV.

> IOW I think its at most a docs problem to more clearly explain
> that preallocation != guaranteed zeros for reads.
> 
>> (In fact, I don’t even quite see the point of having LUKS as an own
>> format still.  It was useful when qcow2 didn’t have LUKS support, but
>> now it does, so...  I suppose everyone using the LUKS format should
>> actually be using qcow2 with LUKS?)
> 
> Certainly not. LUKS on raw is going to be very common, not least because
> that's directly compatible with what Linux kernel supports. If you don't
> want the features of qcow2 like snapshots, it just adds overhead and mgmt
> complexity for no gain, especially if dealing with block device backed
> storage (iSCSI, RBD).
> 
> OpenStack will use cryptsetup when initializing its block storage with
> LUKS, then tell QEMU to run with the raw + LUKS driver.

I see the compatibility with the Linux kernel, yes (as I wrote in my
second reply), but I’m not sure whether “overhead” really is that big of
a point when using encryption.

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PULL 0/3] Bitmaps patches

2019-07-11 Thread Peter Maydell
On Wed, 10 Jul 2019 at 20:23, John Snow  wrote:
>
> The following changes since commit 6df2cdf44a82426f7a59dcb03f0dd2181ed7fdfa:
>
>   Update version for v4.1.0-rc0 release (2019-07-09 17:21:53 +0100)
>
> are available in the Git repository at:
>
>   https://github.com/jnsnow/qemu.git tags/bitmaps-pull-request
>
> for you to fetch changes up to a7786bfb0effe0b4b0fc61d8a8cd307c0b739ed7:
>
>   docs/bitmaps: use QMP lexer instead of json (2019-07-10 15:08:07 -0400)
>
> 
> Pull request:
>   This is a build fix.
>
> 
>
> John Snow (3):
>   docs/interop/bitmaps.rst: Fix typos
>   sphinx: add qmp_lexer
>   docs/bitmaps: use QMP lexer instead of json


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.1
for any user-visible changes.

-- PMM



Re: [Qemu-block] [Qemu-devel] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Daniel P . Berrangé
On Wed, Jul 10, 2019 at 11:24:46PM +0200, Max Reitz wrote:
> On 10.07.19 19:03, Maxim Levitsky wrote:
> > preallocation=off and preallocation=metadata
> > both allocate luks header only, and preallocation=falloc/full
> > is passed to underlying file, with the given image size.
> > 
> > Note that the actual preallocated size is a bit smaller due
> > to luks header.
> 
> Couldn’t you just preallocate it after creating the crypto header so
> qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> file size?

Yeah that would be preferrable. If that's really not possible, we
could likely provide some API to query the expected hreader size for
a given set of creation options. 

> 
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > 
> > Signed-off-by: Maxim Levitsky 
> > ---
> >  block/crypto.c | 28 ++--
> >  1 file changed, 26 insertions(+), 2 deletions(-)
> 
> Hm.  I would expect a preallocated image to read 0.  But if you just
> pass this through to the protocol layer, it won’t read 0.

Yes, it will be zeros at the physical layer, but unintelligble
garbage from POV of the virtual disk.

I don't think this is really a problem though - this is what you
get already if you create a LUKS volume on top of a block device
today.

AFAIK, we've not documented that preallocation guarantees future
reads will return zeros. Preallocation simply ensures that all
required space is allocated upfront. We do mention that it might
be achieved by writing zeros to the underlying storage but never
said you'll get zeros back.

IOW I think its at most a docs problem to more clearly explain
that preallocation != guaranteed zeros for reads.

> (In fact, I don’t even quite see the point of having LUKS as an own
> format still.  It was useful when qcow2 didn’t have LUKS support, but
> now it does, so...  I suppose everyone using the LUKS format should
> actually be using qcow2 with LUKS?)

Certainly not. LUKS on raw is going to be very common, not least because
that's directly compatible with what Linux kernel supports. If you don't
want the features of qcow2 like snapshots, it just adds overhead and mgmt
complexity for no gain, especially if dealing with block device backed
storage (iSCSI, RBD).

OpenStack will use cryptsetup when initializing its block storage with
LUKS, then tell QEMU to run with the raw + LUKS driver.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



[Qemu-block] [PATCH v2] LUKS: support preallocation in qemu-img

2019-07-11 Thread Maxim Levitsky
preallocation=off and preallocation=metadata
both allocate luks header only, and preallocation=falloc/full
is passed to underlying file.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951

Signed-off-by: Maxim Levitsky 
---
 block/crypto.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/block/crypto.c b/block/crypto.c
index 8237424ae6..cbc291301e 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -74,6 +74,7 @@ static ssize_t block_crypto_read_func(QCryptoBlock *block,
 struct BlockCryptoCreateData {
 BlockBackend *blk;
 uint64_t size;
+PreallocMode prealloc;
 };
 
 
@@ -112,7 +113,7 @@ static ssize_t block_crypto_init_func(QCryptoBlock *block,
  * available to the guest, so we must take account of that
  * which will be used by the crypto header
  */
-return blk_truncate(data->blk, data->size + headerlen, PREALLOC_MODE_OFF,
+return blk_truncate(data->blk, data->size + headerlen, data->prealloc,
 errp);
 }
 
@@ -251,6 +252,7 @@ static int block_crypto_open_generic(QCryptoBlockFormat 
format,
 static int block_crypto_co_create_generic(BlockDriverState *bs,
   int64_t size,
   QCryptoBlockCreateOptions *opts,
+  PreallocMode prealloc,
   Error **errp)
 {
 int ret;
@@ -269,6 +271,7 @@ static int block_crypto_co_create_generic(BlockDriverState 
*bs,
 data = (struct BlockCryptoCreateData) {
 .blk = blk,
 .size = size,
+.prealloc = prealloc,
 };
 
 crypto = qcrypto_block_create(opts, NULL,
@@ -516,7 +519,7 @@ block_crypto_co_create_luks(BlockdevCreateOptions 
*create_options, Error **errp)
 };
 
 ret = block_crypto_co_create_generic(bs, luks_opts->size, _opts,
- errp);
+ PREALLOC_MODE_OFF, errp);
 if (ret < 0) {
 goto fail;
 }
@@ -534,12 +537,28 @@ static int coroutine_fn 
block_crypto_co_create_opts_luks(const char *filename,
 QCryptoBlockCreateOptions *create_opts = NULL;
 BlockDriverState *bs = NULL;
 QDict *cryptoopts;
+PreallocMode prealloc;
+char *buf = NULL;
 int64_t size;
 int ret;
+Error *local_err = NULL;
 
 /* Parse options */
 size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
 
+buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+prealloc = qapi_enum_parse(_lookup, buf,
+   PREALLOC_MODE_OFF, _err);
+g_free(buf);
+if (local_err) {
+error_propagate(errp, local_err);
+return -EINVAL;
+}
+
+if (prealloc == PREALLOC_MODE_METADATA) {
+prealloc  = PREALLOC_MODE_OFF;
+}
+
 cryptoopts = qemu_opts_to_qdict_filtered(opts, NULL,
  _crypto_create_opts_luks,
  true);
@@ -565,7 +584,7 @@ static int coroutine_fn 
block_crypto_co_create_opts_luks(const char *filename,
 }
 
 /* Create format layer */
-ret = block_crypto_co_create_generic(bs, size, create_opts, errp);
+ret = block_crypto_co_create_generic(bs, size, create_opts, prealloc, 
errp);
 if (ret < 0) {
 goto fail;
 }
-- 
2.17.2




Re: [Qemu-block] [PATCH] LUKS: support preallocation in qemu-img

2019-07-11 Thread Maxim Levitsky
On Wed, 2019-07-10 at 23:52 +0200, Max Reitz wrote:
> On 10.07.19 23:24, Max Reitz wrote:
> > On 10.07.19 19:03, Maxim Levitsky wrote:
> > > preallocation=off and preallocation=metadata
> > > both allocate luks header only, and preallocation=falloc/full
> > > is passed to underlying file, with the given image size.
> > > 
> > > Note that the actual preallocated size is a bit smaller due
> > > to luks header.
> > 
> > Couldn’t you just preallocate it after creating the crypto header so
> > qcrypto_block_get_payload_offset(crypto->block) + size is the actual
> > file size?

I kind of thought of the same thing after I send the patch. I'll see now it I 
can make it work.


> > 
> > > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1534951
> > > 
> > > Signed-off-by: Maxim Levitsky 
> > > ---
> > >  block/crypto.c | 28 ++--
> > >  1 file changed, 26 insertions(+), 2 deletions(-)
> > 
> > Hm.  I would expect a preallocated image to read 0.  But if you just
> > pass this through to the protocol layer, it won’t read 0.
> > 
> > (In fact, I don’t even quite see the point of having LUKS as an own
> > format still.  It was useful when qcow2 didn’t have LUKS support, but
> > now it does, so...  I suppose everyone using the LUKS format should
> > actually be using qcow2 with LUKS?)
> 
> Kevin just pointed out to me that our LUKS format is compatible to the
> actual layout cryptsetup uses.  OK, that is an important use case.
> 
> Hm.  Unfortunately, that doesn’t really necessitate preallocation.
> 
> Well, whatever.  If it’s simple enough, that shouldn’t stop us from
> implementing preallocation anyway.
Exactly. Since I already know the area of qemu-img relatively well, and
this bug is on my backlog, I thought why not to do it.


> 
> 
> Now I found that qapi/block-core.json defines PreallocMode’s falloc and
> full values as follows:
> 
> > # @falloc: like @full preallocation but allocate disk space by
> > #  posix_fallocate() rather than writing zeros.
> > # @full: preallocate all data by writing zeros to device to ensure disk
> > #space is really available. @full preallocation also sets up
> > #metadata correctly.
> 
> So it isn’t just me who expects these to pre-initialize the image to 0.
>  Hm, although...  I suppose @falloc technically does not specify whether
> the data reads as zeroes.  I kind of find it to be implied, but, well...

I personally don't really think that zeros are important, but rather the level 
of allocation.
posix_fallocate probably won't write the data blocks but rather only the inode 
metadata / used block bitmap/etc.

On the other hand writing zeros (or anything else) will force the block layer 
to actually write to the underlying
storage which could trigger lower layer allocation if the underlying storage is 
thin-provisioned.

In fact IMHO, instead of writing zeros, it would be better to write random 
garbage instead (or have that as an even 'fuller'
preallocation mode), since underlying storage might 'compress' the zeros. 

In this version I do have a bug that I mentioned, about not preallocation some 
data at the end of the image, and I will
fix it, so that all image is zeros as expected

Best regards,
Maxim Levitsky


> 
> Max
> 





Re: [Qemu-block] [PATCH 0/4] virtio: handle zoned backing devices

2019-07-11 Thread Paolo Bonzini
On 11/07/19 02:52, Dmitry Fomichev wrote:
> Paolo,
> WRT to Host Aware drives, these MAY work, but we don't have any of these
> available for testing and are not able to verify which drivers do work
> with them and which do not. This is the reason for not letting them pass
> thru. If you prefer, I can enable passing HA drives in V2.

In theory host aware should work the same as drive managed if the host
is oblivious of zones, so I prefer enabling them indeed.

Thanks,

Paolo