Re: [PATCH v4 2/2] vhost-user-blk: delay vhost_user_blk_disconnect

2020-06-01 Thread Li Feng
Hi Raphael,
I'm sorry. I just ended my journey today.

Yes, please add my sign-off here.
This patch is nearly the same as my previous patch.

Thanks,
Feng Li

On Sun, May 31, 2020 at 8:55 AM, Raphael Norwitz wrote:
>
> On Thu, May 28, 2020 at 5:13 AM Dima Stepanov  wrote:
> >
> > A socket write during vhost-user communication may trigger a disconnect
> > event, calling vhost_user_blk_disconnect() and clearing all the
> > vhost_dev structures holding data that vhost-user functions expect to
> > remain valid to roll back initialization correctly. Delay the cleanup to
> > keep the vhost_dev structure valid.
> > There are two possible states to handle:
> > 1. RUN_STATE_PRELAUNCH: skip the bh oneshot call and perform the
> > disconnect in the caller routine.
> > 2. RUN_STATE_RUNNING: delay the disconnect by using a bh.
> >
> > BH changes are based on the similar changes for the vhost-user-net
> > device:
> >   commit e7c83a885f865128ae3cf1946f8cb538b63cbfba
> >   "vhost-user: delay vhost_user_stop"
> >
> > Signed-off-by: Dima Stepanov 
>
> Reviewed-by: Raphael Norwitz 
>
> Li Feng - would you also like to sign off here?
>
> > ---
> >  hw/block/vhost-user-blk.c | 38 +-
> >  1 file changed, 37 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index 9d8c0b3..76838e7 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -349,6 +349,19 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> >  vhost_dev_cleanup(&s->dev);
> >  }
> >
> > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> > +
> > +static void vhost_user_blk_chr_closed_bh(void *opaque)
> > +{
> > +DeviceState *dev = opaque;
> > +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > +VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > +
> > +vhost_user_blk_disconnect(dev);
> > +qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> > +NULL, opaque, NULL, true);
> > +}
> > +
> >  static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >  {
> >  DeviceState *dev = opaque;
> > @@ -363,7 +376,30 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >  }
> >  break;
> >  case CHR_EVENT_CLOSED:
> > -vhost_user_blk_disconnect(dev);
> > +/*
> > + * A close event may happen during a read/write, but vhost
> > + * code assumes the vhost_dev remains setup, so delay the
> > + * stop & clear. There are two possible paths to hit this
> > + * disconnect event:
> > + * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
> > + * vhost_user_blk_device_realize() is a caller.
> > + * 2. In the main loop phase after VM start.
> > + *
> > + * For p2 the disconnect event will be delayed. We can't
> > + * do the same for p1, because we are not running the loop
> > + * at this moment. So just skip this step and perform
> > + * disconnect in the caller function.
> > + *
> > + * TODO: maybe it is a good idea to make the same fix
> > + * for other vhost-user devices.
> > + */
> > +if (runstate_is_running()) {
> > +AioContext *ctx = qemu_get_current_aio_context();
> > +
> > +qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
> > +NULL, NULL, false);
> > +aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
> > +}
> >  break;
> >  case CHR_EVENT_BREAK:
> >  case CHR_EVENT_MUX_IN:
> > --
> > 2.7.4
> >
> >



Re: [PATCH v3 0/6] iotests: Dump QCOW2 dirty bitmaps metadata

2020-06-01 Thread Eric Blake

On 6/1/20 8:48 AM, Andrey Shinkevich wrote:

Add dirty bitmap information to QCOW2 metadata dump in qcow2.py script.

v3:
   01: JSON format output possibility added.


Also, you split it into a series.  Thanks; this makes it easier to review each step :)




v2:
   01: Refactoring of the Python code in the script qcow2.py.
   New methods were added. The bitmap dictionary was instantiated.
   All of the bitmap information is read completely before
   printing the dictionary.
   02: The outputs of the tests 031, 036 and 061 were modified.

Andrey Shinkevich (6):
   iotests: Add extension names to qcow2.py dump
   iotests: move check for printable data to QcowHeaderExtension class
   iotests: dump bitmap extension data with qcow2.py
   iotests: Dump bitmap directory info with qcow2.py
   iotests: Dump bitmap table entries serialized in QCOW2 image
   iotests: Dump QCOW2 image metadata in JSON format with qcow2.py

  tests/qemu-iotests/031.out  |  22 +--
  tests/qemu-iotests/036.out  |   4 +-
  tests/qemu-iotests/061.out  |  18 +--
  tests/qemu-iotests/qcow2.py | 338 ++--
  4 files changed, 346 insertions(+), 36 deletions(-)
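
As a quick orientation for review, here is a minimal, hypothetical sketch
(not the series' actual code) of the kind of parsing and JSON dumping the
last patch adds: it decodes the body of the qcow2 'bitmaps' header
extension per docs/interop/qcow2.txt and prints it with json.dumps. The
function and variable names are invented for illustration.

import json
import struct

QCOW2_EXT_MAGIC_BITMAPS = 0x23852875

def parse_bitmaps_extension(data):
    # Extension body layout per the qcow2 spec, all big-endian:
    # nb_bitmaps (u32), reserved (u32),
    # bitmap_directory_size (u64), bitmap_directory_offset (u64)
    nb_bitmaps, _reserved, dir_size, dir_offset = \
        struct.unpack('>IIQQ', data[:24])
    return {
        'nb_bitmaps': nb_bitmaps,
        'bitmap_directory_size': dir_size,
        'bitmap_directory_offset': dir_offset,
    }

# Invented example body: one bitmap, 64-byte directory at offset 0x70000
ext_body = struct.pack('>IIQQ', 1, 0, 64, 0x70000)
print(json.dumps(parse_bitmaps_extension(ext_body), indent=4))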



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v2 00/20] backup performance: block_status + async

2020-06-01 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200601181118.579-1-vsement...@virtuozzo.com/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
==8167==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 fdc-test /x86_64/fdc/media_change
PASS 5 fdc-test /x86_64/fdc/sense_interrupt
PASS 6 fdc-test /x86_64/fdc/relative_seek
---
PASS 32 test-opts-visitor /visitor/opts/range/beyond
PASS 33 test-opts-visitor /visitor/opts/dict/unvisited
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-coroutine -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-coroutine" 
==8242==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-coroutine /basic/no-dangling-access
PASS 2 test-coroutine /basic/lifecycle
PASS 3 test-coroutine /basic/yield
==8242==WARNING: ASan is ignoring requested __asan_handle_no_return: stack top: 
0x7ffc47dee000; bottom 0x7f11b97d2000; size: 0x00ea8e61c000 (100748080)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
PASS 4 test-coroutine /basic/nesting
---
PASS 12 test-aio /aio/event/flush
PASS 13 test-aio /aio/event/wait/no-flush-cb
PASS 14 test-aio /aio/timer/schedule
==8257==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 15 test-aio /aio/coroutine/queue-chaining
PASS 16 test-aio /aio-gsource/flush
PASS 17 test-aio /aio-gsource/bh/schedule
---
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img 
tests/qtest/ide-test -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="ide-test" 
PASS 28 test-aio /aio-gsource/timer/schedule
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-aio-multithread -m=quick -k --tap < /dev/null | 
./scripts/tap-driver.pl --test-name="test-aio-multithread" 
==8268==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-aio-multithread /aio/multi/lifecycle
==8265==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 ide-test /x86_64/ide/identify
PASS 2 test-aio-multithread /aio/multi/schedule
==8285==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 2 ide-test /x86_64/ide/flush
PASS 3 test-aio-multithread /aio/multi/mutex/contended
==8296==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 3 ide-test /x86_64/ide/bmdma/simple_rw
==8307==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 ide-test /x86_64/ide/bmdma/trim
==8313==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 4 test-aio-multithread /aio/multi/mutex/handoff
PASS 5 test-aio-multithread /aio/multi/mutex/mcs
PASS 6 test-aio-multithread /aio/multi/mutex/pthread
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-throttle -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-throttle" 
==8330==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 1 test-throttle /throttle/leak_bucket
PASS 2 test-throttle /throttle/compute_wait
PASS 3 test-throttle /throttle/init
---
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-thread-pool -m=quick -k --tap < /dev/null | ./scripts/tap-driver.pl 
--test-name="test-thread-pool" 
PASS 1 test-thread-pool /thread-pool/submit
PASS 2 test-thread-pool /thread-pool/submit-aio
==8334==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 3 test-thread-pool /thread-pool/submit-co
PASS 4 test-thread-pool /thread-pool/submit-many
PASS 5 test-thread-pool /thread-pool/cancel
==8401==WARNING: ASan doesn't fully support makecontext/swapcontext functions 
and may produce false positives in some cases!
PASS 6 test-thread-pool /thread-pool/cancel-async
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}  
tests/test-hbitmap -m=quick -k --tap < /dev/null | 

Re: [PATCH v2 00/20] backup performance: block_status + async

2020-06-01 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200601181118.579-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the
testing commands and their output below. If you have Docker installed,
you can probably reproduce it locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    check-unit: tests/test-logging
  TEST    check-unit: tests/test-replication
**
ERROR:/tmp/qemu-test/src/tests/test-replication.c:428:test_secondary_start: 
assertion failed: (!local_err)
ERROR - Bail out! 
ERROR:/tmp/qemu-test/src/tests/test-replication.c:428:test_secondary_start: 
assertion failed: (!local_err)
make: *** [check-unit] Error 1
make: *** Waiting for unfinished jobs
  TEST    check-qtest-x86_64: tests/qtest/boot-order-test
  TEST    check-qtest-x86_64: tests/qtest/bios-tables-test
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=73ef29198bda41a1bce9aa3697266e4c', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-j0ydkht8/src/docker-src.2020-06-01-14.44.54.8388:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=73ef29198bda41a1bce9aa3697266e4c
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-j0ydkht8/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    14m11.116s
user    0m9.306s


The full log is available at
http://patchew.org/logs/20200601181118.579-1-vsement...@virtuozzo.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v2 00/20] backup performance: block_status + async

2020-06-01 Thread Vladimir Sementsov-Ogievskiy

01.06.2020 21:10, Vladimir Sementsov-Ogievskiy wrote:

Hi all!

This is the last part of the original
"[RFC 00/24] backup performance: block_status + async"; the preparations
are already merged.

The series turns backup into a series of block_copy_async calls covering
the whole disk, so we get block-status-based parallel async requests out
of the box, which gives a performance gain:

-----------------  ----------------  -------------  --------------------------  --------------------------  ----------------  -------------------------------
                   mirror(upstream)  backup(new)    backup(new, no-copy-range)  backup(new, copy-range-1w)  backup(upstream)  backup(upstream, no-copy-range)
hdd-ext4:hdd-ext4  18.86 +- 0.11     45.50 +- 2.35  19.22 +- 0.09               19.51 +- 0.09               22.85 +- 5.98     19.72 +- 0.35
hdd-ext4:ssd-ext4  8.99 +- 0.02      9.30 +- 0.01   8.97 +- 0.02                9.02 +- 0.02                9.68 +- 0.26      9.84 +- 0.12
ssd-ext4:hdd-ext4  9.09 +- 0.11      9.34 +- 0.10   9.34 +- 0.10                8.99 +- 0.01                11.37 +- 0.37     11.47 +- 0.30
ssd-ext4:ssd-ext4  4.07 +- 0.02      5.41 +- 0.05   4.05 +- 0.01                8.35 +- 0.58                9.83 +- 0.64      8.62 +- 0.35
hdd-xfs:hdd-xfs    18.90 +- 0.19     43.26 +- 2.47  19.62 +- 0.14               19.38 +- 0.16               19.55 +- 0.26     19.62 +- 0.12
hdd-xfs:ssd-xfs    8.93 +- 0.12      9.35 +- 0.03   8.93 +- 0.08                8.93 +- 0.05                9.79 +- 0.30      9.55 +- 0.15
ssd-xfs:hdd-xfs    9.15 +- 0.07      9.74 +- 0.28   9.29 +- 0.03                9.08 +- 0.05                10.85 +- 0.31     10.91 +- 0.30
ssd-xfs:ssd-xfs    4.06 +- 0.01      4.93 +- 0.02   4.04 +- 0.01                8.17 +- 0.42                9.52 +- 0.49      8.85 +- 0.46
ssd-ext4:nbd       9.96 +- 0.11      11.45 +- 0.15  11.45 +- 0.02               17.22 +- 0.06               34.45 +- 1.35     35.16 +- 0.37
nbd:ssd-ext4       9.84 +- 0.02      9.84 +- 0.04   9.80 +- 0.06                18.96 +- 0.06               30.89 +- 0.73     31.46 +- 0.21
-----------------  ----------------  -------------  --------------------------  --------------------------  ----------------  -------------------------------


I should add that the nbd results may be skewed by the fact that the node
with the nbd server is my desktop, which was used for other tasks in
parallel. Still, I don't think it really hurts.




The table shows that copy_range interacts badly with parallel async
requests. copy_range brings a real performance gain only on filesystems
that support it, like btrfs. But even on such filesystems, I'm not sure
it is a good default behavior: if the copy is offloaded, so that no real
copy happens and the backup just links the same blocks as in the
original, then further writes from the guest will lead to fragmentation
of the guest disk, while the aim of backup is to operate transparently
for the guest.

So, in addition to these series I also suggest to disable copy_range by
default.

===

How to test:

prepare images:
In the directories where you want to place the source and target images,
prepare them by:

for img in test-source test-target; do
  ./qemu-img create -f raw $img 1000M;
  ./qemu-img bench -c 1000 -d 1 -f raw -s 1M -w --pattern=0xff $img
done

prepare similar image for nbd server, and start it somewhere by

  qemu-nbd --persistent --nocache -f raw IMAGE

Then, run benchmark, like this:
./bench-backup.py --qemu new:../../x86_64-softmmu/qemu-system-x86_64 
upstream:/work/src/qemu/up-backup-block-copy-master/x86_64-softmmu/qemu-system-x86_64
 --dir hdd-ext4:/test-a hdd-xfs:/test-b ssd-ext4:/ssd ssd-xfs:/ssd-b --test 
$(for fs in ext4 xfs; do echo hdd-$fs:hdd-$fs hdd-$fs:ssd-$fs ssd-$fs:hdd-$fs 
ssd-$fs:ssd-$fs; done) --nbd 192.168.100.2 --test ssd-ext4:nbd nbd:ssd-ext4

(you may simply reduce the number of directories/test cases; use --help
 for help)

===

Note that I included here
"[PATCH] block/block-copy: block_copy_dirty_clusters: fix failure check",
which was previously sent separately but is still untouched on the
mailing list. It may still be applied separately.

Vladimir Sementsov-Ogievskiy (20):
   block/block-copy: block_copy_dirty_clusters: fix failure check
   iotests: 129 don't check backup "busy"
   qapi: backup: add x-use-copy-range parameter
   block/block-copy: More explicit call_state
   block/block-copy: implement block_copy_async
   block/block-copy: add max_chunk and max_workers parameters
   block/block-copy: add ratelimit to block-copy
   block/block-copy: add block_copy_cancel
   blockjob: add set_speed to BlockJobDriver
   job: call job_enter from job_user_pause
   qapi: backup: add x-max-chunk and x-max-workers parameters
   iotests: 56: prepare for backup over block-copy
   iotests: 129: prepare for backup over block-copy
   iotests: 185: prepare for backup over block-copy
   iotests: 219: prepare for backup over block-copy
   iotests: 257: prepare for backup over block-copy
   backup: move to block-copy

[PATCH v2 19/20] simplebench: bench_block_job: add cmd_options argument

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Add an argument to allow passing additional block-job options.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/bench-example.py   |  2 +-
 scripts/simplebench/bench_block_job.py | 13 -
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/scripts/simplebench/bench-example.py 
b/scripts/simplebench/bench-example.py
index c642a5b891..c3d17213e3 100644
--- a/scripts/simplebench/bench-example.py
+++ b/scripts/simplebench/bench-example.py
@@ -24,7 +24,7 @@ from bench_block_job import bench_block_copy, drv_file, 
drv_nbd
 
 def bench_func(env, case):
 """ Handle one "cell" of benchmarking table. """
-return bench_block_copy(env['qemu_binary'], env['cmd'],
+return bench_block_copy(env['qemu_binary'], env['cmd'], {},
 case['source'], case['target'])
 
 
diff --git a/scripts/simplebench/bench_block_job.py 
b/scripts/simplebench/bench_block_job.py
index 9808d696cf..7332845c1c 100755
--- a/scripts/simplebench/bench_block_job.py
+++ b/scripts/simplebench/bench_block_job.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 #
 # Benchmark block jobs
 #
@@ -78,16 +78,19 @@ def bench_block_job(cmd, cmd_args, qemu_args):
 
 
 # Bench backup or mirror
-def bench_block_copy(qemu_binary, cmd, source, target):
+def bench_block_copy(qemu_binary, cmd, cmd_options, source, target):
 """Helper to run bench_block_job() for mirror or backup"""
 assert cmd in ('blockdev-backup', 'blockdev-mirror')
 
 source['node-name'] = 'source'
 target['node-name'] = 'target'
 
-return bench_block_job(cmd,
-   {'job-id': 'job0', 'device': 'source',
-'target': 'target', 'sync': 'full'},
+cmd_options['job-id'] = 'job0'
+cmd_options['device'] = 'source'
+cmd_options['target'] = 'target'
+cmd_options['sync'] = 'full'
+
+return bench_block_job(cmd, cmd_options,
[qemu_binary,
 '-blockdev', json.dumps(source),
 '-blockdev', json.dumps(target)])
-- 
2.21.0
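
To illustrate the new signature: a hypothetical call (paths invented)
that benchmarks a backup with copy offloading disabled, the same way
bench-backup.py in the next patch drives it:

from bench_block_job import bench_block_copy, drv_file

result = bench_block_copy('/path/to/qemu-system-x86_64', 'blockdev-backup',
                          {'x-use-copy-range': False},
                          drv_file('/images/test-source'),
                          drv_file('/images/test-target'))
print(result)  # a dict: timing on success, or an 'error' key on failure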




[PATCH v2 20/20] simplebench: add bench-backup.py

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Add a script to benchmark the new backup architecture.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 scripts/simplebench/bench-backup.py | 132 
 1 file changed, 132 insertions(+)
 create mode 100755 scripts/simplebench/bench-backup.py

diff --git a/scripts/simplebench/bench-backup.py 
b/scripts/simplebench/bench-backup.py
new file mode 100755
index 00..8930d23887
--- /dev/null
+++ b/scripts/simplebench/bench-backup.py
@@ -0,0 +1,132 @@
+#!/usr/bin/env python3
+#
+# Bench backup block-job
+#
+# Copyright (c) 2020 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import argparse
+import simplebench
+from bench_block_job import bench_block_copy, drv_file, drv_nbd
+
+
+def bench_func(env, case):
+""" Handle one "cell" of benchmarking table. """
+cmd_options = env['cmd-options'] if 'cmd-options' in env else {}
+return bench_block_copy(env['qemu-binary'], env['cmd'],
+cmd_options,
+case['source'], case['target'])
+
+
+def bench(args):
+test_cases = []
+
+sources = {}
+targets = {}
+for d in args.dir:
+label, path = d.split(':')
+sources[label] = drv_file(path + '/test-source')
+targets[label] = drv_file(path + '/test-target')
+
+if args.nbd:
+nbd = args.nbd.split(':')
+host = nbd[0]
+port = '10809' if len(nbd) == 1 else nbd[1]
+drv = drv_nbd(host, port)
+sources['nbd'] = drv
+targets['nbd'] = drv
+
+for t in args.test:
+src, dst = t.split(':')
+
+test_cases.append({
+'id': t,
+'source': sources[src],
+'target': targets[dst]
+})
+
+binaries = []
+upstream = None
+for i, q in enumerate(args.qemu):
+name_path = q.split(':')
+if len(name_path) == 1:
+binaries.append((f'q{i}', name_path[0]))
+else:
+binaries.append((name_path[0], name_path[1]))
+if name_path[0] == 'upstream' or name_path[0] == 'master':
+upstream = binaries[-1]
+
+test_envs = []
+if upstream:
+label, path = upstream
+test_envs.append({
+'id': f'mirror({label})',
+'cmd': 'blockdev-mirror',
+'qemu-binary': path
+})
+
+for label, path in binaries:
+test_envs.append({
+'id': f'backup({label})',
+'cmd': 'blockdev-backup',
+'qemu-binary': path
+})
+test_envs.append({
+'id': f'backup({label}, no-copy-range)',
+'cmd': 'blockdev-backup',
+'cmd-options': {'x-use-copy-range': False},
+'qemu-binary': path
+})
+if label == 'new':
+test_envs.append({
+'id': f'backup({label}, copy-range-1w)',
+'cmd': 'blockdev-backup',
+'cmd-options': {'x-use-copy-range': True,
+'x-max-workers': 1},
+'qemu-binary': path
+})
+
+result = simplebench.bench(bench_func, test_envs, test_cases, count=3)
+print(simplebench.ascii(result))
+
+
+class ExtendAction(argparse.Action):
+def __call__(self, parser, namespace, values, option_string=None):
+items = getattr(namespace, self.dest) or []
+items.extend(values)
+setattr(namespace, self.dest, items)
+
+
+if __name__ == '__main__':
+p = argparse.ArgumentParser('Backup benchmark')
+p.add_argument('--qemu', nargs='+', help='Qemu binaries to compare, just '
+   'file path with label, like label:/path/to/qemu. Qemu '
+   'labeled "new" should support x-max-workers argument for '
+   'backup job, labeled "upstream" will be used also to run '
+   'mirror benchmark for comparison.',
+   action=ExtendAction)
+p.add_argument('--dir', nargs='+', help='Directories, each containing '
+   '"test-source" and/or "test-target" files, raw images to '
+   'used in benchmarking. File path with label, like '
+   'label:/path/to/directory', action=ExtendAction)
+p.add_argument('--nbd', help='host:port for remote NBD '
+   'image, (or just host, for default port 10809). Use it in '

[PATCH v2 16/20] iotests: 257: prepare for backup over block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Iotest 257 dumps a lot of in-progress information of the backup job, such
as offset and bitmap dirtiness. A further commit will move backup to be
one block-copy call, which will introduce async parallel requests
instead of plain cluster-by-cluster copying. To keep things
deterministic, allow only one worker (only one copy request at a time)
for this test.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/257 |   1 +
 tests/qemu-iotests/257.out | 306 ++---
 2 files changed, 154 insertions(+), 153 deletions(-)

diff --git a/tests/qemu-iotests/257 b/tests/qemu-iotests/257
index 004a433b8b..732b2f0071 100755
--- a/tests/qemu-iotests/257
+++ b/tests/qemu-iotests/257
@@ -191,6 +191,7 @@ def blockdev_backup(vm, device, target, sync, **kwargs):
 target=target,
 sync=sync,
 filter_node_name='backup-top',
+x_max_workers=1,
 **kwargs)
 return result
 
diff --git a/tests/qemu-iotests/257.out b/tests/qemu-iotests/257.out
index 64dd460055..6997b56567 100644
--- a/tests/qemu-iotests/257.out
+++ b/tests/qemu-iotests/257.out
@@ -30,7 +30,7 @@ write -P0x76 0x3ff 0x1
 {"execute": "job-dismiss", "arguments": {"id": "bdc-fmt-job"}}
 {"return": {}}
 {}
-{"execute": "blockdev-backup", "arguments": {"device": "drive0", 
"filter-node-name": "backup-top", "job-id": "ref_backup_0", "sync": "full", 
"target": "ref_target_0"}}
+{"execute": "blockdev-backup", "arguments": {"device": "drive0", 
"filter-node-name": "backup-top", "job-id": "ref_backup_0", "sync": "full", 
"target": "ref_target_0", "x-max-workers": 1}}
 {"return": {}}
 {"data": {"device": "ref_backup_0", "len": 67108864, "offset": 67108864, 
"speed": 0, "type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 
@@ -78,7 +78,7 @@ expecting 6 dirty sectors; have 6. OK!
 {"execute": "job-dismiss", "arguments": {"id": "bdc-fmt-job"}}
 {"return": {}}
 {}
-{"execute": "blockdev-backup", "arguments": {"device": "drive0", 
"filter-node-name": "backup-top", "job-id": "ref_backup_1", "sync": "full", 
"target": "ref_target_1"}}
+{"execute": "blockdev-backup", "arguments": {"device": "drive0", 
"filter-node-name": "backup-top", "job-id": "ref_backup_1", "sync": "full", 
"target": "ref_target_1", "x-max-workers": 1}}
 {"return": {}}
 {"data": {"device": "ref_backup_1", "len": 67108864, "offset": 67108864, 
"speed": 0, "type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 
@@ -92,7 +92,7 @@ expecting 6 dirty sectors; have 6. OK!
 {"execute": "job-dismiss", "arguments": {"id": "bdc-fmt-job"}}
 {"return": {}}
 {}
-{"execute": "blockdev-backup", "arguments": {"auto-finalize": false, "bitmap": 
"bitmap0", "bitmap-mode": "never", "device": "drive0", "filter-node-name": 
"backup-top", "job-id": "backup_1", "sync": "bitmap", "target": 
"backup_target_1"}}
+{"execute": "blockdev-backup", "arguments": {"auto-finalize": false, "bitmap": 
"bitmap0", "bitmap-mode": "never", "device": "drive0", "filter-node-name": 
"backup-top", "job-id": "backup_1", "sync": "bitmap", "target": 
"backup_target_1", "x-max-workers": 1}}
 {"return": {}}
 
 --- Write #2 ---
@@ -205,7 +205,7 @@ expecting 15 dirty sectors; have 15. OK!
 {"execute": "job-dismiss", "arguments": {"id": "bdc-fmt-job"}}
 {"return": {}}
 {}
-{"execute": "blockdev-backup", "arguments": {"device": "drive0", 
"filter-node-name": "backup-top", "job-id": "ref_backup_2", "sync": "full", 
"target": "ref_target_2"}}
+{"execute": "blockdev-backup", "arguments": {"device": "drive0", 
"filter-node-name": "backup-top", "job-id": "ref_backup_2", "sync": "full", 
"target": "ref_target_2", "x-max-workers": 1}}
 {"return": {}}
 {"data": {"device": "ref_backup_2", "len": 67108864, "offset": 67108864, 
"speed": 0, "type": "backup"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 
@@ -219,7 +219,7 @@ expecting 15 dirty sectors; have 15. OK!
 {"execute": "job-dismiss", "arguments": {"id": "bdc-fmt-job"}}
 {"return": {}}
 {}
-{"execute": "blockdev-backup", "arguments": {"auto-finalize": false, "bitmap": 
"bitmap0", "bitmap-mode": "never", "device": "drive0", "filter-node-name": 
"backup-top", "job-id": "backup_2", "sync": "bitmap", "target": 
"backup_target_2"}}
+{"execute": "blockdev-backup", "arguments": {"auto-finalize": false, "bitmap": 
"bitmap0", "bitmap-mode": "never", "device": "drive0", "filter-node-name": 
"backup-top", "job-id": "backup_2", "sync": "bitmap", "target": 
"backup_target_2", "x-max-workers": 1}}
 {"return": {}}
 {"execute": "job-finalize", "arguments": {"id": "backup_2"}}
 {"return": {}}
@@ -290,7 +290,7 @@ write -P0x76 0x3ff 0x1
 {"execute": "job-dismiss", "arguments": {"id": "bdc-fmt-job"}}
 {"return": {}}
 {}
-{"execute": "blockdev-backup", "arguments": {"device": "drive0", 
"filter-node-name": 

[PATCH v2 17/20] backup: move to block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
This brings async request handling and block-status driven chunk sizes
to backup out of the box, which improves backup performance.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |   9 +--
 block/backup.c | 145 +++--
 block/block-copy.c |  21 ++
 3 files changed, 83 insertions(+), 92 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 370a194d3c..a3e11d3fa2 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -18,7 +18,6 @@
 #include "block/block.h"
 #include "qemu/co-shared-resource.h"
 
-typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void *opaque);
 typedef void (*BlockCopyAsyncCallbackFunc)(int ret, bool error_is_read,
void *opaque);
 typedef struct BlockCopyState BlockCopyState;
@@ -29,11 +28,6 @@ BlockCopyState *block_copy_state_new(BdrvChild *source, 
BdrvChild *target,
  BdrvRequestFlags write_flags,
  Error **errp);
 
-void block_copy_set_progress_callback(
-BlockCopyState *s,
-ProgressBytesCallbackFunc progress_bytes_callback,
-void *progress_opaque);
-
 void block_copy_set_progress_meter(BlockCopyState *s, ProgressMeter *pm);
 
 void block_copy_state_free(BlockCopyState *s);
@@ -57,7 +51,8 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
  int64_t offset, int64_t bytes,
  bool ratelimit, int max_workers,
  int64_t max_chunk,
- BlockCopyAsyncCallbackFunc cb);
+ BlockCopyAsyncCallbackFunc cb,
+ void *cb_opaque);
 
 /*
  * Set speed limit for block-copy instance. All block-copy operations related 
to
diff --git a/block/backup.c b/block/backup.c
index ec2676abc2..59c00f5293 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -44,42 +44,19 @@ typedef struct BackupBlockJob {
 BlockdevOnError on_source_error;
 BlockdevOnError on_target_error;
 uint64_t len;
-uint64_t bytes_read;
 int64_t cluster_size;
 int max_workers;
 int64_t max_chunk;
 
 BlockCopyState *bcs;
+
+BlockCopyCallState *bcs_call;
+int ret;
+bool error_is_read;
 } BackupBlockJob;
 
 static const BlockJobDriver backup_job_driver;
 
-static void backup_progress_bytes_callback(int64_t bytes, void *opaque)
-{
-BackupBlockJob *s = opaque;
-
-s->bytes_read += bytes;
-}
-
-static int coroutine_fn backup_do_cow(BackupBlockJob *job,
-  int64_t offset, uint64_t bytes,
-  bool *error_is_read)
-{
-int ret = 0;
-int64_t start, end; /* bytes */
-
-start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
-end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);
-
-trace_backup_do_cow_enter(job, start, offset, bytes);
-
-ret = block_copy(job->bcs, start, end - start, error_is_read);
-
-trace_backup_do_cow_return(job, offset, bytes, ret);
-
-return ret;
-}
-
 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
 {
 BdrvDirtyBitmap *bm;
@@ -159,54 +136,58 @@ static BlockErrorAction backup_error_action(BackupBlockJob *job,
 }
 }
 
-static bool coroutine_fn yield_and_check(BackupBlockJob *job)
+static void coroutine_fn backup_block_copy_callback(int ret, bool error_is_read,
+                                                    void *opaque)
 {
-uint64_t delay_ns;
-
-if (job_is_cancelled(&job->common.job)) {
-return true;
-}
-
-/*
- * We need to yield even for delay_ns = 0 so that bdrv_drain_all() can
- * return. Without a yield, the VM would not reboot.
- */
-delay_ns = block_job_ratelimit_get_delay(&job->common, job->bytes_read);
-job->bytes_read = 0;
-job_sleep_ns(&job->common.job, delay_ns);
-
-if (job_is_cancelled(&job->common.job)) {
-return true;
-}
+BackupBlockJob *s = opaque;
 
-return false;
+s->bcs_call = NULL;
+s->ret = ret;
+s->error_is_read = error_is_read;
+job_enter(&s->common.job);
 }
 
 static int coroutine_fn backup_loop(BackupBlockJob *job)
 {
-bool error_is_read;
-int64_t offset;
-BdrvDirtyBitmapIter *bdbi;
-int ret = 0;
+while (true) { /* retry loop */
+assert(!job->bcs_call);
+job->bcs_call = block_copy_async(job->bcs, 0,
+ QEMU_ALIGN_UP(job->len,
+   job->cluster_size),
+ true, job->max_workers, job->max_chunk,
+ backup_block_copy_callback, job);
 
-bdbi = bdrv_dirty_iter_new(block_copy_dirty_bitmap(job->bcs));
-while ((offset = bdrv_dirty_iter_next(bdbi)) != -1) {
-

[PATCH v2 18/20] block/block-copy: drop unused argument of block_copy()

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |  3 +--
 block/backup-top.c |  2 +-
 block/block-copy.c | 11 ++-
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index a3e11d3fa2..760dfc9eae 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -35,8 +35,7 @@ void block_copy_state_free(BlockCopyState *s);
 int64_t block_copy_reset_unallocated(BlockCopyState *s,
  int64_t offset, int64_t *count);
 
-int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
-bool *error_is_read);
+int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes);
 
 /*
  * Run block-copy in a coroutine, return state pointer. If finished early
diff --git a/block/backup-top.c b/block/backup-top.c
index 0a09544c76..f4230aded4 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -61,7 +61,7 @@ static coroutine_fn int backup_top_cbw(BlockDriverState *bs, 
uint64_t offset,
 off = QEMU_ALIGN_DOWN(offset, s->cluster_size);
 end = QEMU_ALIGN_UP(offset + bytes, s->cluster_size);
 
-return block_copy(s->bcs, off, end - off, NULL);
+return block_copy(s->bcs, off, end - off);
 }
 
 static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
diff --git a/block/block-copy.c b/block/block-copy.c
index 6a9d891b63..6cb721cc26 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -713,8 +713,7 @@ static int coroutine_fn 
block_copy_common(BlockCopyCallState *call_state)
 return ret;
 }
 
-int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
-bool *error_is_read)
+int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes)
 {
 BlockCopyCallState call_state = {
 .s = s,
@@ -723,13 +722,7 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t 
start, int64_t bytes,
 .max_workers = BLOCK_COPY_MAX_WORKERS,
 };
 
-int ret = block_copy_common(&call_state);
-
-if (error_is_read && ret < 0) {
-*error_is_read = call_state.error_is_read;
-}
-
-return ret;
+return block_copy_common(&call_state);
 }
 
 static void coroutine_fn block_copy_async_co_entry(void *opaque)
-- 
2.21.0




[PATCH v2 13/20] iotests: 129: prepare for backup over block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
After introducing parallel async copy requests instead of the plain
cluster-by-cluster copying loop, the backup job may finish earlier than
the final assertion in do_test_stop. Let's require a slow backup
explicitly by specifying the speed parameter.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/129 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/129 b/tests/qemu-iotests/129
index 4db5eca441..bca56b589d 100755
--- a/tests/qemu-iotests/129
+++ b/tests/qemu-iotests/129
@@ -76,7 +76,7 @@ class TestStopWithBlockJob(iotests.QMPTestCase):
 def test_drive_backup(self):
 self.do_test_stop("drive-backup", device="drive0",
   target=self.target_img,
-  sync="full")
+  sync="full", speed=1024)
 
 def test_block_commit(self):
 self.do_test_stop("block-commit", device="drive0")
-- 
2.21.0




[PATCH v2 06/20] block/block-copy: add max_chunk and max_workers parameters

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
They will be used for backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |  5 +
 block/block-copy.c | 10 --
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index ada0d99566..600984c733 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -47,6 +47,11 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
 /*
  * Run block-copy in a coroutine, return state pointer. If finished early
  * returns NULL (@cb is called anyway).
+ *
+ * @max_workers means maximum of parallel coroutines to execute sub-requests,
+ * must be > 0.
+ *
+ * @max_chunk means maximum length for one IO operation. Zero means unlimited.
  */
 BlockCopyCallState *block_copy_async(BlockCopyState *s,
  int64_t offset, int64_t bytes,
diff --git a/block/block-copy.c b/block/block-copy.c
index a0477d90f3..4114d1fd25 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -34,6 +34,8 @@ typedef struct BlockCopyCallState {
 BlockCopyState *s;
 int64_t offset;
 int64_t bytes;
+int max_workers;
+int64_t max_chunk;
 BlockCopyAsyncCallbackFunc cb;
 
 /* State */
@@ -144,10 +146,11 @@ static BlockCopyTask *block_copy_task_create(BlockCopyState *s,
  int64_t offset, int64_t bytes)
 {
 BlockCopyTask *task;
+int64_t max_chunk = MIN_NON_ZERO(s->copy_size, call_state->max_chunk);
 
 if (!bdrv_dirty_bitmap_next_dirty_area(s->copy_bitmap,
offset, offset + bytes,
-   s->copy_size, , ))
+   max_chunk, , ))
 {
 return NULL;
 }
@@ -616,7 +619,7 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
 bytes = end - offset;
 
 if (!aio && bytes) {
-aio = aio_task_pool_new(BLOCK_COPY_MAX_WORKERS);
+aio = aio_task_pool_new(call_state->max_workers);
 }
 
 ret = block_copy_task_run(aio, task);
@@ -695,6 +698,7 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
 .s = s,
 .offset = start,
 .bytes = bytes,
+.max_workers = BLOCK_COPY_MAX_WORKERS,
 };
 
 int ret = block_copy_common(&call_state);
@@ -726,6 +730,8 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
 .offset = offset,
 .bytes = bytes,
 .cb = cb,
+.max_workers = max_workers ?: BLOCK_COPY_MAX_WORKERS,
+.max_chunk = max_chunk,
 };
 
 qemu_coroutine_enter(co);
-- 
2.21.0




[PATCH v2 15/20] iotests: 219: prepare for backup over block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
The further change of moving backup to be one block-copy call will
make the copy chunk size and the cluster size separate things. So, even
with a 64k-cluster qcow2 image, the default chunk would be 1M.
Test 219 depends on the specified chunk size. Update it to set an
explicit chunk size for backup, as is done for mirror.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/219 | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tests/qemu-iotests/219 b/tests/qemu-iotests/219
index db272c5249..2bbed28f39 100755
--- a/tests/qemu-iotests/219
+++ b/tests/qemu-iotests/219
@@ -203,13 +203,13 @@ with iotests.FilePath('disk.img') as disk_path, \
 # but related to this also automatic state transitions like job
 # completion), but still get pause points often enough to avoid making this
 # test very slow, it's important to have the right ratio between speed and
-# buf_size.
+# copy-chunk-size.
 #
-# For backup, buf_size is hard-coded to the source image cluster size (64k),
-# so we'll pick the same for mirror. The slice time, i.e. the granularity
-# of the rate limiting is 100ms. With a speed of 256k per second, we can
-# get four pause points per second. This gives us 250ms per iteration,
-# which should be enough to stay deterministic.
+# Choose 64k copy-chunk-size both for mirror (by buf_size) and backup (by
+# x-max-chunk). The slice time, i.e. the granularity of the rate limiting
+# is 100ms. With a speed of 256k per second, we can get four pause points
+# per second. This gives us 250ms per iteration, which should be enough to
+# stay deterministic.
 
 test_job_lifecycle(vm, 'drive-mirror', has_ready=True, job_args={
 'device': 'drive0-node',
@@ -226,6 +226,7 @@ with iotests.FilePath('disk.img') as disk_path, \
 'target': copy_path,
 'sync': 'full',
 'speed': 262144,
+'x-max-chunk': 65536,
 'auto-finalize': auto_finalize,
 'auto-dismiss': auto_dismiss,
 })
-- 
2.21.0




[PATCH v2 12/20] iotests: 56: prepare for backup over block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
After introducing parallel async copy requests instead of the plain
cluster-by-cluster copying loop, we'll have to wait for the paused
status, as we need to wait for several parallel requests. So, let's
politely wait instead of just asserting that the job is already paused.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/056 | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/056 b/tests/qemu-iotests/056
index f73fc74457..2ced356a43 100755
--- a/tests/qemu-iotests/056
+++ b/tests/qemu-iotests/056
@@ -306,8 +306,12 @@ class BackupTest(iotests.QMPTestCase):
 event = self.vm.event_wait(name="BLOCK_JOB_ERROR",
match={'data': {'device': 'drive0'}})
 self.assertNotEqual(event, None)
-# OK, job should be wedged
-res = self.vm.qmp('query-block-jobs')
+# OK, job should pause, but it can't do it immediately, as it can't
+# cancel other parallel requests (which didn't fail)
+while True:
+res = self.vm.qmp('query-block-jobs')
+if res['return'][0]['status'] == 'paused':
+break
 self.assert_qmp(res, 'return[0]/status', 'paused')
 res = self.vm.qmp('block-job-dismiss', id='drive0')
 self.assert_qmp(res, 'error/desc',
-- 
2.21.0




[PATCH v2 14/20] iotests: 185: prepare for backup over block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
The further change of moving backup to be one block-copy call will
make the copy chunk size and the cluster size separate things. So, even
with a 64k-cluster qcow2 image, the default chunk would be 1M.
Test 185, however, assumes that with the speed limited to 64K, one
iteration results in offset=64K. This will change, as the first
iteration would result in offset=1M independently of the speed.

So, let's explicitly specify what the test wants: set max-chunk to 64K,
so that one iteration is 64K. Note that we don't need to limit
max-workers, as the block-copy rate limiter will handle the situation
and won't start new workers when the speed limit is obviously reached.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/185 | 3 ++-
 tests/qemu-iotests/185.out | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/185 b/tests/qemu-iotests/185
index fd5e6ebe11..6afb3fc82f 100755
--- a/tests/qemu-iotests/185
+++ b/tests/qemu-iotests/185
@@ -182,7 +182,8 @@ _send_qemu_cmd $h \
   'target': '$TEST_IMG.copy',
   'format': '$IMGFMT',
   'sync': 'full',
-  'speed': 65536 } }" \
+  'speed': 65536,
+  'x-max-chunk': 65536 } }" \
 "return"
 
 # If we don't sleep here 'quit' command races with disk I/O
diff --git a/tests/qemu-iotests/185.out b/tests/qemu-iotests/185.out
index ac5ab16bc8..5232647972 100644
--- a/tests/qemu-iotests/185.out
+++ b/tests/qemu-iotests/185.out
@@ -61,7 +61,7 @@ Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 
cluster_size=65536 l
 
 { 'execute': 'qmp_capabilities' }
 {"return": {}}
-{ 'execute': 'drive-backup', 'arguments': { 'device': 'disk', 'target': 
'TEST_DIR/t.IMGFMT.copy', 'format': 'IMGFMT', 'sync': 'full', 'speed': 65536 } }
+{ 'execute': 'drive-backup', 'arguments': { 'device': 'disk', 'target': 
'TEST_DIR/t.IMGFMT.copy', 'format': 'IMGFMT', 'sync': 'full', 'speed': 65536, 
'x-max-chunk': 65536 } }
 Formatting 'TEST_DIR/t.qcow2.copy', fmt=qcow2 size=67108864 cluster_size=65536 
lazy_refcounts=off refcount_bits=16 compression_type=zlib
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "created", "id": "disk"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"JOB_STATUS_CHANGE", "data": {"status": "running", "id": "disk"}}
-- 
2.21.0




[PATCH v2 10/20] job: call job_enter from job_user_pause

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
If the main job coroutine called job_yield() (while some background
process is in progress), we should give it a chance to call
job_pause_point(). This will be used in backup, when it is moved onto
async block-copy.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 job.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/job.c b/job.c
index 53be57a3a0..0a9510ece1 100644
--- a/job.c
+++ b/job.c
@@ -578,6 +578,7 @@ void job_user_pause(Job *job, Error **errp)
 }
 job->user_paused = true;
 job_pause(job);
+job_enter(job);
 }
 
 bool job_user_paused(Job *job)
-- 
2.21.0




[PATCH v2 07/20] block/block-copy: add ratelimit to block-copy

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
We are going to directly use one async block-copy operation for the
backup job, so we need a rate limiter.

We want to maintain the current backup behavior: only background copying
is limited, and copy-before-write operations only participate in the
limit calculation. Therefore we need one rate limiter for the block-copy
state and a boolean flag in the block-copy call state for the actual
limitation.

Note that we can't just account each chunk in the limiter after
successful copying: that would not save us from starting a lot of async
sub-requests which exceed the limit too much. Instead, let's use the
following scheme on sub-request creation:
1. If at the moment the limit is not exceeded, create the request and
account it immediately.
2. If at the moment the limit is already exceeded, don't create the
sub-request and handle the limit instead (by sleeping).
With this approach we'll never exceed the limit by more than one
sub-request (which pretty much matches current backup behavior).
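
The same scheme as a rough, self-contained Python sketch (a toy model
for illustration only; RateLimit here is not QEMU's rate limiter, and
start_async_copy stands in for spawning a copy coroutine):

import time

class RateLimit:
    def __init__(self, speed_bps):
        self.speed = speed_bps      # allowed bytes per second
        self.accounted = 0          # bytes accounted so far
        self.start = time.monotonic()

    def account(self, extra_bytes):
        # Account bytes and return how many seconds we are ahead of
        # the allowed rate (0 means the limit is not exceeded).
        self.accounted += extra_bytes
        ahead = self.accounted / self.speed - (time.monotonic() - self.start)
        return max(ahead, 0.0)

def try_start_subrequest(limit, task_bytes, start_async_copy):
    delay = limit.account(0)        # peek: are we already over the limit?
    if delay > 0:
        # Case 2: limit already exceeded -- drop the sub-request and
        # sleep instead; the main loop retries this chunk later.
        time.sleep(delay)
        return None
    # Case 1: within the limit -- create the request and account its
    # bytes immediately, before the copy actually completes.
    limit.account(task_bytes)
    return start_async_copy(task_bytes)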

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |  8 +++
 block/block-copy.c | 44 ++
 2 files changed, 52 insertions(+)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 600984c733..d40e691123 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -59,6 +59,14 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
  int64_t max_chunk,
  BlockCopyAsyncCallbackFunc cb);
 
+/*
+ * Set speed limit for block-copy instance. All block-copy operations related 
to
+ * this BlockCopyState will participate in speed calculation, but only
+ * block_copy_async calls with @ratelimit=true will be actually limited.
+ */
+void block_copy_set_speed(BlockCopyState *s, BlockCopyCallState *call_state,
+  uint64_t speed);
+
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
 void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index 4114d1fd25..851d9c8aaf 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -26,6 +26,7 @@
 #define BLOCK_COPY_MAX_BUFFER (1 * MiB)
 #define BLOCK_COPY_MAX_MEM (128 * MiB)
 #define BLOCK_COPY_MAX_WORKERS 64
+#define BLOCK_COPY_SLICE_TIME 1ULL /* ns */
 
 static coroutine_fn int block_copy_task_entry(AioTask *task);
 
@@ -36,11 +37,13 @@ typedef struct BlockCopyCallState {
 int64_t bytes;
 int max_workers;
 int64_t max_chunk;
+bool ratelimit;
 BlockCopyAsyncCallbackFunc cb;
 
 /* State */
 bool failed;
 bool finished;
+QemuCoSleepState *sleep_state;
 
 /* OUT parameters */
 bool error_is_read;
@@ -103,6 +106,9 @@ typedef struct BlockCopyState {
 void *progress_opaque;
 
 SharedResource *mem;
+
+uint64_t speed;
+RateLimit rate_limit;
 } BlockCopyState;
 
 static BlockCopyTask *find_conflicting_task(BlockCopyState *s,
@@ -611,6 +617,21 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
 }
 task->zeroes = ret & BDRV_BLOCK_ZERO;
 
+if (s->speed) {
+if (call_state->ratelimit) {
+uint64_t ns = ratelimit_calculate_delay(&s->rate_limit, 0);
+if (ns > 0) {
+block_copy_task_end(task, -EAGAIN);
+g_free(task);
+qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, ns,
+  &call_state->sleep_state);
+continue;
+}
+}
+
+ratelimit_calculate_delay(&s->rate_limit, task->bytes);
+}
+
 trace_block_copy_process(s, task->offset);
 
 co_get_from_shres(s->mem, task->bytes);
@@ -649,6 +670,13 @@ out:
 return ret < 0 ? ret : found_dirty;
 }
 
+static void block_copy_kick(BlockCopyCallState *call_state)
+{
+if (call_state->sleep_state) {
+qemu_co_sleep_wake(call_state->sleep_state);
+}
+}
+
 /*
  * block_copy_common
  *
@@ -729,6 +757,7 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
 .s = s,
 .offset = offset,
 .bytes = bytes,
+.ratelimit = ratelimit,
 .cb = cb,
 .max_workers = max_workers ?: BLOCK_COPY_MAX_WORKERS,
 .max_chunk = max_chunk,
@@ -752,3 +781,18 @@ void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip)
 {
 s->skip_unallocated = skip;
 }
+
+void block_copy_set_speed(BlockCopyState *s, BlockCopyCallState *call_state,
+  uint64_t speed)
+{
+uint64_t old_speed = s->speed;
+
+s->speed = speed;
+if (speed > 0) {
+ratelimit_set_speed(&s->rate_limit, speed, BLOCK_COPY_SLICE_TIME);
+}
+
+if (call_state && old_speed && (speed > old_speed || speed == 0)) {
+block_copy_kick(call_state);
+}
+}
-- 
2.21.0




[PATCH v2 11/20] qapi: backup: add x-max-chunk and x-max-workers parameters

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Add new parameters to configure future backup features. The patch
introduces neither aio backup requests (so we actually still have only
one worker) nor requests larger than one cluster. Still, formally we
satisfy these maximums anyway, so add the parameters now to facilitate
the further patch which will really change the backup job behavior.

The options are added with x- prefixes, as their only users for now are
some very conservative iotests, which will be updated soon.
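
For illustration, an iotest-style invocation using the new options could
look like this (values invented; inside an iotest, vm.qmp converts the
underscored keyword arguments to the x-max-workers/x-max-chunk QMP names,
as the updated tests 185, 219 and 257 later in this series show):

result = vm.qmp('blockdev-backup', job_id='backup0', device='drive0',
                target='target0', sync='full',
                x_max_workers=1, x_max_chunk=65536)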

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json  |  9 -
 include/block/block_int.h |  7 +++
 block/backup.c| 21 +
 block/replication.c   |  2 +-
 blockdev.c|  5 +
 5 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0c7600e4ec..f4abcde34e 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1407,6 +1407,12 @@
 #
 # @x-use-copy-range: use copy offloading if possible. Default true. (Since 5.1)
 #
+# @x-max-workers: maximum of parallel requests for static data backup. This
+# doesn't influence copy-before-write operations. (Since: 5.1)
+#
+# @x-max-chunk: maximum chunk length for static data backup. This doesn't
+#   influence copy-before-write operations. (Since: 5.1)
+#
 # Note: @on-source-error and @on-target-error only affect background
 #   I/O.  If an error occurs during a guest write request, the device's
 #   rerror/werror actions will be used.
@@ -1421,7 +1427,8 @@
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError',
 '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
-'*filter-node-name': 'str', '*x-use-copy-range': 'bool'  } }
+'*filter-node-name': 'str', '*x-use-copy-range': 'bool',
+'*x-max-workers': 'int', '*x-max-chunk': 'int64' } }
 
 ##
 # @DriveBackup:
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 93b9b3bdc0..d93a170d37 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1207,6 +1207,11 @@ void mirror_start(const char *job_id, BlockDriverState 
*bs,
  * @sync_mode: What parts of the disk image should be copied to the 
destination.
  * @sync_bitmap: The dirty bitmap if sync_mode is 'bitmap' or 'incremental'
  * @bitmap_mode: The bitmap synchronization policy to use.
+ * @max_workers: The limit for parallel requests for main backup loop.
+ *   Must be >= 1.
+ * @max_chunk: The limit for one IO operation length in main backup loop.
+ * Must be not less than job cluster size or zero. Zero means no
+ * specific limit.
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
  * @creation_flags: Flags that control the behavior of the Job lifetime.
@@ -1226,6 +1231,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 bool compress,
 const char *filter_node_name,
 bool use_copy_range,
+int max_workers,
+int64_t max_chunk,
 BlockdevOnError on_source_error,
 BlockdevOnError on_target_error,
 int creation_flags,
diff --git a/block/backup.c b/block/backup.c
index 76847b4daf..ec2676abc2 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -46,6 +46,8 @@ typedef struct BackupBlockJob {
 uint64_t len;
 uint64_t bytes_read;
 int64_t cluster_size;
+int max_workers;
+int64_t max_chunk;
 
 BlockCopyState *bcs;
 } BackupBlockJob;
@@ -335,6 +337,8 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
   bool compress,
   const char *filter_node_name,
   bool use_copy_range,
+  int max_workers,
+  int64_t max_chunk,
   BlockdevOnError on_source_error,
   BlockdevOnError on_target_error,
   int creation_flags,
@@ -355,6 +359,16 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
 assert(sync_bitmap || sync_mode != MIRROR_SYNC_MODE_BITMAP);
 
+if (max_workers < 1) {
+error_setg(errp, "At least one worker needed");
+return NULL;
+}
+
+if (max_chunk < 0) {
+error_setg(errp, "max-chunk is negative");
+return NULL;
+}
+
 if (bs == target) {
 error_setg(errp, "Source and target cannot be the same");
 return NULL;
@@ -422,6 +436,11 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 if (cluster_size < 0) {
 goto error;
 }
+if (max_chunk && max_chunk < cluster_size) {
+error_setg(errp, 

[PATCH v2 08/20] block/block-copy: add block_copy_cancel

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Add a function to cancel a running async block-copy call. It will be
used in backup.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h |  7 +++
 block/block-copy.c | 22 +++---
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index d40e691123..370a194d3c 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -67,6 +67,13 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
 void block_copy_set_speed(BlockCopyState *s, BlockCopyCallState *call_state,
   uint64_t speed);
 
+/*
+ * Cancel running block-copy call.
+ * Cancel leaves block-copy state valid: dirty bits are correct and you may use
+ * cancel + <run block_copy with same parameters> to emulate pause/resume.
+ */
+void block_copy_cancel(BlockCopyCallState *call_state);
+
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
 void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index 851d9c8aaf..b551feb6c2 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -44,6 +44,8 @@ typedef struct BlockCopyCallState {
 bool failed;
 bool finished;
 QemuCoSleepState *sleep_state;
+bool cancelled;
+Coroutine *canceller;
 
 /* OUT parameters */
 bool error_is_read;
@@ -582,7 +584,7 @@ block_copy_dirty_clusters(BlockCopyCallState *call_state)
 assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
 assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
 
-while (bytes && aio_task_pool_status(aio) == 0) {
+while (bytes && aio_task_pool_status(aio) == 0 && !call_state->cancelled) {
 BlockCopyTask *task;
 int64_t status_bytes;
 
@@ -693,7 +695,7 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
 do {
 ret = block_copy_dirty_clusters(call_state);
 
-if (ret == 0) {
+if (ret == 0 && !call_state->cancelled) {
 ret = block_copy_wait_one(call_state->s, call_state->offset,
   call_state->bytes);
 }
@@ -707,13 +709,18 @@ static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
  * 2. We have waited for some intersecting block-copy request
  *It may have failed and produced new dirty bits.
  */
-} while (ret > 0);
+} while (ret > 0 && !call_state->cancelled);
 
 if (call_state->cb) {
 call_state->cb(ret, call_state->error_is_read,
call_state->s->progress_opaque);
 }
 
+if (call_state->canceller) {
+aio_co_wake(call_state->canceller);
+call_state->canceller = NULL;
+}
+
 call_state->finished = true;
 
 return ret;
@@ -772,6 +779,15 @@ BlockCopyCallState *block_copy_async(BlockCopyState *s,
 
 return call_state;
 }
+
+void block_copy_cancel(BlockCopyCallState *call_state)
+{
+call_state->cancelled = true;
+call_state->canceller = qemu_coroutine_self();
+block_copy_kick(call_state);
+qemu_coroutine_yield();
+}
+
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s)
 {
 return s->copy_bitmap;
-- 
2.21.0




[PATCH v2 04/20] block/block-copy: More explicit call_state

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Refactor the common path to use a BlockCopyCallState pointer as
parameter, to prepare it for use in asynchronous block-copy (at least,
we'll need to run block-copy in a coroutine, passing all the parameters
as one pointer).

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/block-copy.c | 51 ++
 1 file changed, 38 insertions(+), 13 deletions(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index 43a018d190..75882a094c 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -30,7 +30,15 @@
 static coroutine_fn int block_copy_task_entry(AioTask *task);
 
 typedef struct BlockCopyCallState {
+/* IN parameters */
+BlockCopyState *s;
+int64_t offset;
+int64_t bytes;
+
+/* State */
 bool failed;
+
+/* OUT parameters */
 bool error_is_read;
 } BlockCopyCallState;
 
@@ -541,15 +549,17 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
  * Returns 1 if dirty clusters found and successfully copied, 0 if no dirty
  * clusters found and -errno on failure.
  */
-static int coroutine_fn block_copy_dirty_clusters(BlockCopyState *s,
-  int64_t offset, int64_t bytes,
-  bool *error_is_read)
+static int coroutine_fn
+block_copy_dirty_clusters(BlockCopyCallState *call_state)
 {
+BlockCopyState *s = call_state->s;
+int64_t offset = call_state->offset;
+int64_t bytes = call_state->bytes;
+
 int ret = 0;
 bool found_dirty = false;
 int64_t end = offset + bytes;
 AioTaskPool *aio = NULL;
-BlockCopyCallState call_state = {false, false};
 
 /*
  * block_copy() user is responsible for keeping source and target in same
@@ -565,7 +575,7 @@ static int coroutine_fn 
block_copy_dirty_clusters(BlockCopyState *s,
 BlockCopyTask *task;
 int64_t status_bytes;
 
-task = block_copy_task_create(s, _state, offset, bytes);
+task = block_copy_task_create(s, call_state, offset, bytes);
 if (!task) {
 /* No more dirty bits in the bitmap */
 trace_block_copy_skip_range(s, offset, bytes);
@@ -630,15 +640,12 @@ out:
 
 aio_task_pool_free(aio);
 }
-if (error_is_read && ret < 0) {
-*error_is_read = call_state.error_is_read;
-}
 
 return ret < 0 ? ret : found_dirty;
 }
 
 /*
- * block_copy
+ * block_copy_common
  *
  * Copy requested region, accordingly to dirty bitmap.
  * Collaborate with parallel block_copy requests: if they succeed it will help
@@ -646,16 +653,16 @@ out:
  * it means that some I/O operation failed in context of _this_ block_copy 
call,
  * not some parallel operation.
  */
-int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
-bool *error_is_read)
+static int coroutine_fn block_copy_common(BlockCopyCallState *call_state)
 {
 int ret;
 
 do {
-ret = block_copy_dirty_clusters(s, offset, bytes, error_is_read);
+ret = block_copy_dirty_clusters(call_state);
 
 if (ret == 0) {
-ret = block_copy_wait_one(s, offset, bytes);
+ret = block_copy_wait_one(call_state->s, call_state->offset,
+  call_state->bytes);
 }
 
 /*
@@ -672,6 +679,24 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t 
offset, int64_t bytes,
 return ret;
 }
 
+int coroutine_fn block_copy(BlockCopyState *s, int64_t start, int64_t bytes,
+bool *error_is_read)
+{
+BlockCopyCallState call_state = {
+.s = s,
+.offset = start,
+.bytes = bytes,
+};
+
+int ret = block_copy_common(_state);
+
+if (error_is_read && ret < 0) {
+*error_is_read = call_state.error_is_read;
+}
+
+return ret;
+}
+
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s)
 {
 return s->copy_bitmap;
-- 
2.21.0
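
The shape of this refactoring carries over to other languages: bundle
the in/out parameters into one object so that a single-argument entry
point can receive them all. A rough Python analogue (field names mirror
the patch; this is an illustration, not QEMU API):

    from dataclasses import dataclass

    @dataclass
    class BlockCopyCallState:
        # IN parameters
        offset: int
        nbytes: int
        # OUT parameters
        error_is_read: bool = False

    def block_copy_common(call_state: BlockCopyCallState) -> int:
        # Everything the call needs travels in the one call_state object,
        # which is what later allows handing it to a coroutine entry point.
        print(f"copying {call_state.nbytes} bytes at offset {call_state.offset}")
        return 0

    ret = block_copy_common(BlockCopyCallState(offset=0, nbytes=1 << 20))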




[PATCH v2 03/20] qapi: backup: add x-use-copy-range parameter

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Add a parameter to enable/disable copy_range. Keep the current default
(enabled) for now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 qapi/block-core.json   | 4 +++-
 block/backup-top.h | 1 +
 include/block/block-copy.h | 2 +-
 include/block/block_int.h  | 1 +
 block/backup-top.c | 4 +++-
 block/backup.c | 4 +++-
 block/block-copy.c | 4 ++--
 block/replication.c| 1 +
 blockdev.c | 5 +
 9 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6fbacddab2..0c7600e4ec 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1405,6 +1405,8 @@
#above node specified by @drive. If this option is not given,
 #a node name is autogenerated. (Since: 4.2)
 #
+# @x-use-copy-range: use copy offloading if possible. Default true. (Since 5.1)
+#
 # Note: @on-source-error and @on-target-error only affect background
 #   I/O.  If an error occurs during a guest write request, the device's
 #   rerror/werror actions will be used.
@@ -1419,7 +1421,7 @@
 '*on-source-error': 'BlockdevOnError',
 '*on-target-error': 'BlockdevOnError',
 '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
-'*filter-node-name': 'str' } }
+'*filter-node-name': 'str', '*x-use-copy-range': 'bool'  } }
 
 ##
 # @DriveBackup:
diff --git a/block/backup-top.h b/block/backup-top.h
index e5cabfa197..2d74a976d8 100644
--- a/block/backup-top.h
+++ b/block/backup-top.h
@@ -33,6 +33,7 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState 
*source,
  BlockDriverState *target,
  const char *filter_node_name,
  uint64_t cluster_size,
+ bool use_copy_range,
  BdrvRequestFlags write_flags,
  BlockCopyState **bcs,
  Error **errp);
diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index aac85e1488..6397505f30 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -22,7 +22,7 @@ typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void 
*opaque);
 typedef struct BlockCopyState BlockCopyState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
- int64_t cluster_size,
+ int64_t cluster_size, bool use_copy_range,
  BdrvRequestFlags write_flags,
  Error **errp);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 791de6a59c..93b9b3bdc0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1225,6 +1225,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 BitmapSyncMode bitmap_mode,
 bool compress,
 const char *filter_node_name,
+bool use_copy_range,
 BlockdevOnError on_source_error,
 BlockdevOnError on_target_error,
 int creation_flags,
diff --git a/block/backup-top.c b/block/backup-top.c
index af2f20f346..0a09544c76 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -188,6 +188,7 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState 
*source,
  BlockDriverState *target,
  const char *filter_node_name,
  uint64_t cluster_size,
+ bool use_copy_range,
  BdrvRequestFlags write_flags,
  BlockCopyState **bcs,
  Error **errp)
@@ -246,7 +247,8 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState 
*source,
 
 state->cluster_size = cluster_size;
 state->bcs = block_copy_state_new(top->backing, state->target,
-  cluster_size, write_flags, _err);
+  cluster_size, use_copy_range,
+  write_flags, _err);
 if (local_err) {
 error_prepend(_err, "Cannot create block-copy-state: ");
 goto fail;
diff --git a/block/backup.c b/block/backup.c
index 4f13bb20a5..76847b4daf 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -334,6 +334,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
   BitmapSyncMode bitmap_mode,
   bool compress,
   const char *filter_node_name,
+  bool use_copy_range,
   

[PATCH v2 05/20] block/block-copy: implement block_copy_async

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
We'll need an async block-copy invocation to use in backup directly.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block-copy.h | 13 +
 block/block-copy.c | 40 ++
 2 files changed, 53 insertions(+)

diff --git a/include/block/block-copy.h b/include/block/block-copy.h
index 6397505f30..ada0d99566 100644
--- a/include/block/block-copy.h
+++ b/include/block/block-copy.h
@@ -19,7 +19,10 @@
 #include "qemu/co-shared-resource.h"
 
 typedef void (*ProgressBytesCallbackFunc)(int64_t bytes, void *opaque);
+typedef void (*BlockCopyAsyncCallbackFunc)(int ret, bool error_is_read,
+   void *opaque);
 typedef struct BlockCopyState BlockCopyState;
+typedef struct BlockCopyCallState BlockCopyCallState;
 
 BlockCopyState *block_copy_state_new(BdrvChild *source, BdrvChild *target,
  int64_t cluster_size, bool use_copy_range,
@@ -41,6 +44,16 @@ int64_t block_copy_reset_unallocated(BlockCopyState *s,
 int coroutine_fn block_copy(BlockCopyState *s, int64_t offset, int64_t bytes,
 bool *error_is_read);
 
+/*
+ * Run block-copy in a coroutine and return a state pointer. If the copy
+ * finished early, return NULL (@cb is called anyway).
+ */
+BlockCopyCallState *block_copy_async(BlockCopyState *s,
+ int64_t offset, int64_t bytes,
+ bool ratelimit, int max_workers,
+ int64_t max_chunk,
+ BlockCopyAsyncCallbackFunc cb);
+
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s);
 void block_copy_set_skip_unallocated(BlockCopyState *s, bool skip);
 
diff --git a/block/block-copy.c b/block/block-copy.c
index 75882a094c..a0477d90f3 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -34,9 +34,11 @@ typedef struct BlockCopyCallState {
 BlockCopyState *s;
 int64_t offset;
 int64_t bytes;
+BlockCopyAsyncCallbackFunc cb;
 
 /* State */
 bool failed;
+bool finished;
 
 /* OUT parameters */
 bool error_is_read;
@@ -676,6 +678,13 @@ static int coroutine_fn 
block_copy_common(BlockCopyCallState *call_state)
  */
 } while (ret > 0);
 
+if (call_state->cb) {
+call_state->cb(ret, call_state->error_is_read,
+   call_state->s->progress_opaque);
+}
+
+call_state->finished = true;
+
 return ret;
 }
 
@@ -697,6 +706,37 @@ int coroutine_fn block_copy(BlockCopyState *s, int64_t 
start, int64_t bytes,
 return ret;
 }
 
+static void coroutine_fn block_copy_async_co_entry(void *opaque)
+{
+block_copy_common(opaque);
+}
+
+BlockCopyCallState *block_copy_async(BlockCopyState *s,
+ int64_t offset, int64_t bytes,
+ bool ratelimit, int max_workers,
+ int64_t max_chunk,
+ BlockCopyAsyncCallbackFunc cb)
+{
+BlockCopyCallState *call_state = g_new(BlockCopyCallState, 1);
+Coroutine *co = qemu_coroutine_create(block_copy_async_co_entry,
+  call_state);
+
+*call_state = (BlockCopyCallState) {
+.s = s,
+.offset = offset,
+.bytes = bytes,
+.cb = cb,
+};
+
+qemu_coroutine_enter(co);
+
+if (call_state->finished) {
+g_free(call_state);
+return NULL;
+}
+
+return call_state;
+}
 BdrvDirtyBitmap *block_copy_dirty_bitmap(BlockCopyState *s)
 {
 return s->copy_bitmap;
-- 
2.21.0
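
The call contract above (start the copy, get a handle back unless it
already finished, and have the callback fire either way) can be mimicked
with a thread in plain Python. A loose model (the C code instead frees
the state and returns NULL when the coroutine finishes before
qemu_coroutine_enter() returns):

    import threading

    def block_copy_async(work, cb):
        # Run `work` in the background and invoke cb(ret) when it completes.
        # The returned handle plays the role of the BlockCopyCallState pointer.
        def entry():
            cb(work())
        handle = threading.Thread(target=entry)
        handle.start()
        return handle

    done = threading.Event()
    handle = block_copy_async(lambda: 0, lambda ret: done.set())
    done.wait()    # block until the callback has fired
    handle.join()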




[PATCH v2 09/20] blockjob: add set_speed to BlockJobDriver

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
We are going to use the async block-copy call in backup, so we'll need
to pass the backup speed setting through to the block-copy call.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/blockjob_int.h | 2 ++
 blockjob.c   | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index e2824a36a8..6633d83da2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -52,6 +52,8 @@ struct BlockJobDriver {
  * besides job->blk to the new AioContext.
  */
 void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
+
+void (*set_speed)(BlockJob *job, int64_t speed);
 };
 
 /**
diff --git a/blockjob.c b/blockjob.c
index 470facfd47..6a0cd392e2 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -256,6 +256,7 @@ static bool job_timer_pending(Job *job)
 
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
+const BlockJobDriver *drv = block_job_driver(job);
 int64_t old_speed = job->speed;
 
 if (job_apply_verb(>job, JOB_VERB_SET_SPEED, errp)) {
@@ -270,6 +271,11 @@ void block_job_set_speed(BlockJob *job, int64_t speed, 
Error **errp)
 ratelimit_set_speed(>limit, speed, BLOCK_JOB_SLICE_TIME);
 
 job->speed = speed;
+
+if (drv->set_speed) {
+drv->set_speed(job, speed);
+}
+
 if (speed && speed <= old_speed) {
 return;
 }
-- 
2.21.0
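
The pattern here is a classic optional driver hook: the generic layer
calls into the driver only if the driver opted in. A compact Python
rendering (illustrative, not QEMU code):

    class BlockJobDriver:
        set_speed = None          # optional hook; most drivers leave it unset

    class BackupDriver(BlockJobDriver):
        @staticmethod
        def set_speed(job, speed):
            job['block_copy_speed'] = speed   # pass through to block-copy

    def block_job_set_speed(driver, job, speed):
        job['speed'] = speed
        if driver.set_speed:                  # dispatch only when implemented
            driver.set_speed(job, speed)

    job = {}
    block_job_set_speed(BackupDriver, job, 1 << 20)
    assert job['block_copy_speed'] == 1 << 20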




[PATCH v2 01/20] block/block-copy: block_copy_dirty_clusters: fix failure check

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
ret may be > 0 on the success path at this point. Fix the assertion,
which may currently crash.

Fixes: 4ce5dd3e9b5ee0fac18625860eb3727399ee965e
Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/block-copy.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/block-copy.c b/block/block-copy.c
index bb8d0569f2..f7428a7c08 100644
--- a/block/block-copy.c
+++ b/block/block-copy.c
@@ -622,8 +622,10 @@ out:
  * block_copy_task_run. If it fails, it means some task already failed
  * for real reason, let's return first failure.
  * Still, assert that we don't rewrite failure by success.
+ *
+ * Note: ret may be positive here because of block-status result.
  */
-assert(ret == 0 || aio_task_pool_status(aio) < 0);
+assert(ret >= 0 || aio_task_pool_status(aio) < 0);
 ret = aio_task_pool_status(aio);
 
 aio_task_pool_free(aio);
-- 
2.21.0




[PATCH v2 00/20] backup performance: block_status + async

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Hi all!

This is the last part of the original
"[RFC 00/24] backup performance: block_status + async"; the preparations
are already merged.

The series turns backup into a series of block_copy_async calls covering
the whole disk, so we get block-status-based parallel async requests out
of the box, which gives a performance gain:

-------------------------------------------------------------------------------------------------------------------------
                   mirror(upstream)  backup(new)    backup(new,      backup(new,      backup(upstream)  backup(upstream,
                                                    no-copy-range)   copy-range-1w)                     no-copy-range)
-------------------------------------------------------------------------------------------------------------------------
hdd-ext4:hdd-ext4  18.86 +- 0.11     45.50 +- 2.35  19.22 +- 0.09    19.51 +- 0.09    22.85 +- 5.98     19.72 +- 0.35
hdd-ext4:ssd-ext4  8.99 +- 0.02      9.30 +- 0.01   8.97 +- 0.02     9.02 +- 0.02     9.68 +- 0.26      9.84 +- 0.12
ssd-ext4:hdd-ext4  9.09 +- 0.11      9.34 +- 0.10   9.34 +- 0.10     8.99 +- 0.01     11.37 +- 0.37     11.47 +- 0.30
ssd-ext4:ssd-ext4  4.07 +- 0.02      5.41 +- 0.05   4.05 +- 0.01     8.35 +- 0.58     9.83 +- 0.64      8.62 +- 0.35
hdd-xfs:hdd-xfs    18.90 +- 0.19     43.26 +- 2.47  19.62 +- 0.14    19.38 +- 0.16    19.55 +- 0.26     19.62 +- 0.12
hdd-xfs:ssd-xfs    8.93 +- 0.12      9.35 +- 0.03   8.93 +- 0.08     8.93 +- 0.05     9.79 +- 0.30      9.55 +- 0.15
ssd-xfs:hdd-xfs    9.15 +- 0.07      9.74 +- 0.28   9.29 +- 0.03     9.08 +- 0.05     10.85 +- 0.31     10.91 +- 0.30
ssd-xfs:ssd-xfs    4.06 +- 0.01      4.93 +- 0.02   4.04 +- 0.01     8.17 +- 0.42     9.52 +- 0.49      8.85 +- 0.46
ssd-ext4:nbd       9.96 +- 0.11      11.45 +- 0.15  11.45 +- 0.02    17.22 +- 0.06    34.45 +- 1.35     35.16 +- 0.37
nbd:ssd-ext4       9.84 +- 0.02      9.84 +- 0.04   9.80 +- 0.06     18.96 +- 0.06    30.89 +- 0.73     31.46 +- 0.21
-------------------------------------------------------------------------------------------------------------------------


The table shows that copy_range interacts poorly with parallel async
requests. copy_range brings a real performance gain only on filesystems
that support it, like btrfs. But even on such filesystems, I'm not sure
it is a good default behavior: if we offload the copy so that no real
copy is made and the backup simply references the same blocks as the
original, then further guest writes will fragment the guest disk, while
the aim of backup is to operate transparently for the guest.

So, in addition to this series, I also suggest disabling copy_range by
default.
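
For reference, the "mean +- stddev" figures in the table are easy to
reproduce from raw per-run times (a sketch; not necessarily the exact
aggregation simplebench uses):

    import statistics

    def fmt_result(samples):
        # Seconds per run -> "mean +- stddev", as printed in the table above.
        return f"{statistics.mean(samples):.2f} +- {statistics.stdev(samples):.2f}"

    print(fmt_result([18.75, 18.86, 18.97]))   # 18.86 +- 0.11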

===

How to test:

Prepare images:
In the directories where you want to place the source and target images,
prepare them by:

for img in test-source test-target; do
 ./qemu-img create -f raw $img 1000M;
 ./qemu-img bench -c 1000 -d 1 -f raw -s 1M -w --pattern=0xff $img
done

Prepare a similar image for the NBD server, and start it somewhere by

 qemu-nbd --persistent --nocache -f raw IMAGE

Then run the benchmark, like this:
./bench-backup.py --qemu new:../../x86_64-softmmu/qemu-system-x86_64 
upstream:/work/src/qemu/up-backup-block-copy-master/x86_64-softmmu/qemu-system-x86_64
 --dir hdd-ext4:/test-a hdd-xfs:/test-b ssd-ext4:/ssd ssd-xfs:/ssd-b --test 
$(for fs in ext4 xfs; do echo hdd-$fs:hdd-$fs hdd-$fs:ssd-$fs ssd-$fs:hdd-$fs 
ssd-$fs:ssd-$fs; done) --nbd 192.168.100.2 --test ssd-ext4:nbd nbd:ssd-ext4

(you may simply reduce the number of directories/test-cases; use --help for
 help)

===

Note that I included here
"[PATCH] block/block-copy: block_copy_dirty_clusters: fix failure check",
which was previously sent separately but is still untouched on the mailing
list. It may still be applied separately.

Vladimir Sementsov-Ogievskiy (20):
  block/block-copy: block_copy_dirty_clusters: fix failure check
  iotests: 129 don't check backup "busy"
  qapi: backup: add x-use-copy-range parameter
  block/block-copy: More explicit call_state
  block/block-copy: implement block_copy_async
  block/block-copy: add max_chunk and max_workers parameters
  block/block-copy: add ratelimit to block-copy
  block/block-copy: add block_copy_cancel
  blockjob: add set_speed to BlockJobDriver
  job: call job_enter from job_user_pause
  qapi: backup: add x-max-chunk and x-max-workers parameters
  iotests: 56: prepare for backup over block-copy
  iotests: 129: prepare for backup over block-copy
  iotests: 185: prepare for backup over block-copy
  iotests: 219: prepare for backup over block-copy
  iotests: 257: prepare for backup over block-copy
  backup: move to block-copy
  block/block-copy: drop unused argument of block_copy()
  simplebench: bench_block_job: add cmd_options argument
  simplebench: add bench-backup.py

 qapi/block-core.json   |  11 +-
 block/backup-top.h |   1 +
 

[PATCH v2 02/20] iotests: 129 don't check backup "busy"

2020-06-01 Thread Vladimir Sementsov-Ogievskiy
Busy is racy; the job has its "pause-points" when it's not busy. Drop
this check.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/129 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/qemu-iotests/129 b/tests/qemu-iotests/129
index b0da4a5541..4db5eca441 100755
--- a/tests/qemu-iotests/129
+++ b/tests/qemu-iotests/129
@@ -66,7 +66,6 @@ class TestStopWithBlockJob(iotests.QMPTestCase):
 result = self.vm.qmp("stop")
 self.assert_qmp(result, 'return', {})
 result = self.vm.qmp("query-block-jobs")
-self.assert_qmp(result, 'return[0]/busy', True)
 self.assert_qmp(result, 'return[0]/ready', False)
 
 def test_drive_mirror(self):
-- 
2.21.0




Re: Discard commands not working with virtio-blk despite using recent QEMU version

2020-06-01 Thread David Scherfgen
Hi John,

thank you. It seems like you're correct, my guest OS is running a kernel
that's too old. The VirtIO driver has only "recently" been extended to
support this:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d548e65904ae43b0637d200a2441fc94e0589c30
I guess I'll stick with SCSI then.
The guest's filesystem is ext4, which supports trimming.

Best regards
David Scherfgen

John Snow  wrote on Mon., June 1, 2020, 19:51:

>
>
> On 6/1/20 12:00 PM, David Scherfgen wrote:
> > Dear mailing list,
> >
> > I'm using QEMU 4.2.0 (coming with Ubuntu 20.04 LTS) to run a virtual
> > webserver. It uses a ZFS volume as its disk over the VirtIO bus
> > (virtio-blk).
> >
> > After equipping my server with SSDs and moving my virtual
> > webserver's disk to them (actually a ZFS volume in a mirror
> > configuration), I was surprised to find out that discard commands didn't
> > work. When running fstrim on the guest, I got:
> >
> > fstrim: /: FITRIM ioctl failed: Operation not supported
> >
> > This surprised me because I read about virtio-blk being adapted so that
> > it can handle discard commands. I checked the source code for QEMU 4.2.0
> > and indeed the changes to virtio-blk to support discard were in there.
> >
> > Switching to the SCSI bus solved the problem. But nevertheless,
> > shouldn't it also be working with virtio-blk now?
> >
> > Further below, I'm posting the QEMU command lines.
> >
> > Thanks in advance for any ideas on this.
> >
> > Best regards
> > David Scherfgen
> >
> > --
> >
> > QEMU command lines:
> >
> > With the VirtIO bus (discard not working):
> >
> > qemu-system-x86_64 -enable-kvm -name guest=webserver,debug-threads=on -S
> > -object
> >
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-30-webserver/master-key.aes
> > -machine pc-i440fx-4.2,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64
> > -m 49152 -overcommit mem-lock=off -smp 8,sockets=1,cores=4,threads=2
> > -uuid 7d619885-9c53-2fa3-ca69-032e6fba6a90 -no-user-config -nodefaults
> > -chardev socket,id=charmonitor,fd=32,server,nowait -mon
> > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
> > -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -blockdev
> >
> {"driver":"host_device","filename":"/dev/zvol/tank-ssd/webserver","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
> > -blockdev
> >
> {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
> > -device
> >
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
> > -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device
> >
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:00:f2:8e,bus=pci.0,addr=0x3
> > -vnc 0.0.0.0:1 ,password -device
> > cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -cpu host -k de
> > -sandbox
> > on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
> > -msg timestamp=on
> >
> > With the SCSI (discard working):
> >
> > qemu-system-x86_64 -enable-kvm -name guest=webserver,debug-threads=on -S
> > -object
> >
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-31-webserver/master-key.aes
> > -machine pc-i440fx-4.2,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64
> > -m 49152 -overcommit mem-lock=off -smp 8,sockets=1,cores=4,threads=2
> > -uuid 7d619885-9c53-2fa3-ca69-032e6fba6a90 -no-user-config -nodefaults
> > -chardev socket,id=charmonitor,fd=32,server,nowait -mon
> > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
> > -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -blockdev
> >
> {"driver":"host_device","filename":"/dev/zvol/tank-ssd/webserver","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
> > -blockdev
> >
> {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
> > -device
> >
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1,write-cache=on
> > -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device
> >
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:00:f2:8e,bus=pci.0,addr=0x3
> > -vnc 0.0.0.0:1 ,password -device
> > cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -cpu host -k de
> > -sandbox
> > on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
> > -msg timestamp=on
>
> Hi, I didn't follow personally the development of adding 

Re: Discard commands not working with virtio-blk despite using recent QEMU version

2020-06-01 Thread John Snow



On 6/1/20 12:00 PM, David Scherfgen wrote:
> Dear mailing list,
> 
> I'm using QEMU 4.2.0 (coming with Ubuntu 20.04 LTS) to run a virtual
> webserver. It uses a ZFS volume as its disk over the VirtIO bus
> (virtio-blk).
> 
> After equipping my server with SSDs and moving my virtual
> webserver's disk to them (actually a ZFS volume in a mirror
> configuration), I was surprised to find out that discard commands didn't
> work. When running fstrim on the guest, I got:
> 
> fstrim: /: FITRIM ioctl failed: Operation not supported
> 
> This surprised me because I read about virtio-blk being adapted so that
> it can handle discard commands. I checked the source code for QEMU 4.2.0
> and indeed the changes to virtio-blk to support discard were in there.
> 
> Switching to the SCSI bus solved the problem. But nevertheless,
> shouldn't it also be working with virtio-blk now?
> 
> Further below, I'm posting the QEMU command lines.
> 
> Thanks in advance for any ideas on this.
> 
> Best regards
> David Scherfgen
> 
> --
> 
> QEMU command lines:
> 
> With the VirtIO bus (discard not working):
> 
> qemu-system-x86_64 -enable-kvm -name guest=webserver,debug-threads=on -S
> -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-30-webserver/master-key.aes
> -machine pc-i440fx-4.2,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64
> -m 49152 -overcommit mem-lock=off -smp 8,sockets=1,cores=4,threads=2
> -uuid 7d619885-9c53-2fa3-ca69-032e6fba6a90 -no-user-config -nodefaults
> -chardev socket,id=charmonitor,fd=32,server,nowait -mon
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
> -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -blockdev
> {"driver":"host_device","filename":"/dev/zvol/tank-ssd/webserver","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
> -blockdev
> {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
> -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:00:f2:8e,bus=pci.0,addr=0x3
> -vnc 0.0.0.0:1 ,password -device
> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -cpu host -k de
> -sandbox
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
> -msg timestamp=on
> 
> With the SCSI (discard working):
> 
> qemu-system-x86_64 -enable-kvm -name guest=webserver,debug-threads=on -S
> -object
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-31-webserver/master-key.aes
> -machine pc-i440fx-4.2,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64
> -m 49152 -overcommit mem-lock=off -smp 8,sockets=1,cores=4,threads=2
> -uuid 7d619885-9c53-2fa3-ca69-032e6fba6a90 -no-user-config -nodefaults
> -chardev socket,id=charmonitor,fd=32,server,nowait -mon
> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
> -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -blockdev
> {"driver":"host_device","filename":"/dev/zvol/tank-ssd/webserver","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
> -blockdev
> {"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
> -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1,write-cache=on
> -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:00:f2:8e,bus=pci.0,addr=0x3
> -vnc 0.0.0.0:1 ,password -device
> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -cpu host -k de
> -sandbox
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
> -msg timestamp=on

Hi, I didn't personally follow the development of adding TRIM to
virtio-blk, so below are some naive questions:

- What kernel does the guest run? I think the kernel needs to support it
for that block driver too.

- The filesystem driver in the guest needs to support it as well. (What
FS are you using in the guest? Is that ZFS? Not all implementations
support TRIM as far as I know, depends on version and so on.)

- does discard need to be enabled for the device model as well as the
block backend? (-device virtio-blk-pci,discard=on) ... The patch makes
it look like it should be enabled by default, but it's something to try.
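
For the first two questions, one quick check from inside the guest is
whether the block driver advertises discard in sysfs (a standard Linux
interface; the device name below is illustrative):

    from pathlib import Path

    def guest_supports_discard(dev='vda'):
        # A non-zero discard_max_bytes means the guest block driver
        # negotiated discard; 0 means requests like FITRIM will fail.
        path = Path(f'/sys/block/{dev}/queue/discard_max_bytes')
        return path.exists() and int(path.read_text()) > 0

    print(guest_supports_discard('vda'))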




Discard commands not working with virtio-blk despite using recent QEMU version

2020-06-01 Thread David Scherfgen
Dear mailing list,

I'm using QEMU 4.2.0 (coming with Ubuntu 20.04 LTS) to run a virtual
webserver. It uses a ZFS volume as its disk over the VirtIO bus
(virtio-blk).

After equipping my server with SSDs and moving my virtual webserver's disk
to them (actually a ZFS volume in a mirror configuration), I was surprised
to find out that discard commands didn't work. When running fstrim on the
guest, I got:

fstrim: /: FITRIM ioctl failed: Operation not supported

This surprised me because I read about virtio-blk being adapted so that it
can handle discard commands. I checked the source code for QEMU 4.2.0 and
indeed the changes to virtio-blk to support discard were in there.

Switching to the SCSI bus solved the problem. But nevertheless, shouldn't
it also be working with virtio-blk now?

Further below, I'm posting the QEMU command lines.

Thanks in advance for any ideas on this.

Best regards
David Scherfgen

--

QEMU command lines:

With the VirtIO bus (discard not working):

qemu-system-x86_64 -enable-kvm -name guest=webserver,debug-threads=on -S
-object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-30-webserver/master-key.aes
-machine pc-i440fx-4.2,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64 -m
49152 -overcommit mem-lock=off -smp 8,sockets=1,cores=4,threads=2 -uuid
7d619885-9c53-2fa3-ca69-032e6fba6a90 -no-user-config -nodefaults -chardev
socket,id=charmonitor,fd=32,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -blockdev
{"driver":"host_device","filename":"/dev/zvol/tank-ssd/webserver","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev
{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on
-netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:00:f2:8e,bus=pci.0,addr=0x3
-vnc 0.0.0.0:1,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -cpu host -k de
-sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on

With the SCSI (discard working):

qemu-system-x86_64 -enable-kvm -name guest=webserver,debug-threads=on -S
-object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-31-webserver/master-key.aes
-machine pc-i440fx-4.2,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64 -m
49152 -overcommit mem-lock=off -smp 8,sockets=1,cores=4,threads=2 -uuid
7d619885-9c53-2fa3-ca69-032e6fba6a90 -no-user-config -nodefaults -chardev
socket,id=charmonitor,fd=32,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -blockdev
{"driver":"host_device","filename":"/dev/zvol/tank-ssd/webserver","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}
-blockdev
{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}
-device
scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1,write-cache=on
-netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=00:50:56:00:f2:8e,bus=pci.0,addr=0x3
-vnc 0.0.0.0:1,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -cpu host -k de
-sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on


Re: [PATCH v5 00/15] Apply COR-filter to the block-stream permanently

2020-06-01 Thread Andrey Shinkevich
PINGING...

Please review.


From: Andrey Shinkevich 
Sent: Wednesday, May 13, 2020 12:50 PM
To: qemu-block@nongnu.org 
Cc: qemu-de...@nongnu.org ; kw...@redhat.com 
; mre...@redhat.com ; arm...@redhat.com 
; f...@euphon.net ; js...@redhat.com 
; stefa...@redhat.com ; 
ebl...@redhat.com ; Denis Lunev ; 
Vladimir Sementsov-Ogievskiy ; Andrey Shinkevich 

Subject: [PATCH v5 00/15] Apply COR-filter to the block-stream permanently

With this series, all the block-stream COR operations pass through
the COR-filter. Patches 01-08/15 are taken from the series
"block: Deal with filters" by Max Reitz; the full version of that
can be found in the branches:

  https://git.xanclic.moe/XanClic/qemu child-access-functions-v6
  https://github.com/XanClic/qemu child-access-functions-v6

  When running iotests, apply "char-socket: Fix race condition"
  to avoid sporadic segmentation faults.

v5:
  01: Included forgotten file block/copy-on-read.h

v4:
  01: Initialization of the COR-filter BDRVStateCOR member.

v3:
  01: The COR filter insert/remove functions moved to block/copy-on-read.c
  to be a part of API.
  02: block/stream.c code refactoring.
  03: The separate call to block_job_add_bdrv() is used to block operations
  on the active node after the filter inserted and the job created.
  04: The iotests case 030::test_overlapping_4 was modified to unbind
  the block-stream job from the base node.
  05: The COR driver functions preadv/pwritev replaced with their analogous
  preadv/pwritev_part.

v2:
  01: No more skipping filters while checking for operation blockers.
  However, we exclude filters between the bottom node and base
  because we do not set the operation blockers for filters anymore.
  02: As stated above, we do not set the operation blockers for filters
  anymore. So, skip filters when we block operations for the target
  node.
  03: The comment added for the patch 4/7.
  04: The QAPI target version changed to 5.1.
  05: The 'filter-node-name' now excluded from using in the test #030.
  If we need it no more in a final version of the series, the patch
  5/7 may be removed.
  06: The COR-filter included into the frozen chain of a block-stream job.
  The 'above_base' node pointer is left because it is essential for
  finding the base node in case of filters above.


Andrey Shinkevich (7):
  block: prepare block-stream for using COR-filter
  copy-on-read: Support change filename functions
  copy-on-read: Support preadv/pwritev_part functions
  copy-on-read: add filter append/drop functions
  qapi: add filter-node-name to block-stream
  iotests: prepare 245 for using filter in block-stream
  block: apply COR-filter to block-stream jobs

Max Reitz (8):
  block: Mark commit and mirror as filter drivers
  copy-on-read: Support compressed writes
  block: Add child access functions
  block: Add chain helper functions
  block: Include filters when freezing backing chain
  block: Use CAFs in block status functions
  commit: Deal with filters when blocking intermediate nodes
  block: Use CAFs when working with backing chains

 block.c| 275 ++---
 block/commit.c |  85 ++---
 block/copy-on-read.c   | 159 ++--
 block/copy-on-read.h   |  36 ++
 block/io.c |  19 +--
 block/mirror.c |   6 +-
 block/monitor/block-hmp-cmds.c |   4 +-
 block/stream.c |  89 +
 blockdev.c |  19 ++-
 include/block/block.h  |   6 +-
 include/block/block_int.h  |  67 +-
 qapi/block-core.json   |   6 +
 tests/qemu-iotests/030 |  17 +--
 tests/qemu-iotests/141.out |   2 +-
 tests/qemu-iotests/245 |  10 +-
 15 files changed, 661 insertions(+), 139 deletions(-)
 create mode 100644 block/copy-on-read.h

--
1.8.3.1



[PATCH v3 6/6] iotests: Dump QCOW2 image metadata in JSON format with qcow2.py

2020-06-01 Thread Andrey Shinkevich
Represent the QCOW2 metadata dump from the qcow2.py script in JSON format:

{
"QCOW2_header_extensions": [
{
"Header_extension": "Feature table",
"magic": "0x6803f857",
"length": 192,
"data_str": ""
},
{
"Header_extension": "Bitmaps",
"magic": "0x23852875",
"length": 24,
"data": {
"nb_bitmaps": 2,
"reserved32": 0,
"bitmap_directory_size": 64,
"bitmap_directory_offset": 1048576,
"entries": [
{
"name": "bitmap-1",
"flags": [],
"flag_bits": 0,
"bitmap_table_offset": 589824,
"bitmap_table_size": 8,
"type": 1,
"granularity": 16,
"name_size": 8,
"extra_data_size": 0,
"bitmap_table": {
"table_entries": [
{
"type": "serialized",
"offset": 655360,
"size": 65536
},

Suggested-by: Vladimir Sementsov-Ogievskiy 
Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/qcow2.py | 108 ++--
 1 file changed, 105 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index 76e0c69..fd1ef4f 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -3,11 +3,21 @@
 import sys
 import struct
 import string
+import json
 
 
+dump_json = False
 cluster_size = 0
 
 
+class ComplexEncoder(json.JSONEncoder):
+def default(self, obj):
+if hasattr(obj, 'get_info_dict'):
+return obj.get_info_dict()
+else:
+return json.JSONEncoder.default(self, obj)
+
+
 class Qcow2BitmapTableEntry:
 
 BME_TABLE_ENTRY_OFFSET_MASK = 0x00fffffffffffe00
@@ -23,6 +33,9 @@ class Qcow2BitmapTableEntry:
 index = entry & self.BME_TABLE_ENTRY_FLAG_ALL_ONES
 self.type = self.bmte_type[index]
 
+def get_info_dict(self):
+return dict(type=self.type, offset=self.offset, size=self.cluster_size)
+
 
 class Qcow2BitmapTable:
 
@@ -39,6 +52,9 @@ class Qcow2BitmapTable:
  entry.cluster_size))
 print("")
 
+def get_info_dict(self):
+return dict(table_entries=self.entries)
+
 
 class Qcow2BitmapDirEntry:
 
@@ -102,6 +118,18 @@ class Qcow2BitmapDirEntry:
 
 self.bitmap_table.print_bitmap_table()
 
+def get_info_dict(self):
+return dict(name=self.name,
+flags=self.bitmap_flags,
+flag_bits=self.flag_bits,
+bitmap_table_offset=self.bitmap_table_offset,
+bitmap_table_size=self.bitmap_table_size,
+type=self.type,
+granularity=self.granularity_bits,
+name_size=self.name_size,
+extra_data_size=self.extra_data_size,
+bitmap_table=self.bitmap_table)
+
 
 class Qcow2BitmapDirectory:
 
@@ -177,6 +205,31 @@ class Qcow2BitmapExt:
 self.dump_bitmap_ext()
 self.dump_bitmap_directory()
 
+def get_info_dict(self):
+return dict(nb_bitmaps=self.nb_bitmaps,
+reserved32=self.reserved32,
+bitmap_directory_size=self.bitmap_directory_size,
+bitmap_directory_offset=self.bitmap_directory_offset,
+entries=self.bitmaps)
+
+
+class Qcow2HeaderDoc:
+
+def __init__(self, h):
+self.header = h
+
+def get_info_dict(self):
+return dict(QCOW2_header=self.header)
+
+
+class Qcow2HeaderExtensionsDoc:
+
+def __init__(self, extensions):
+self.extensions = extensions
+
+def get_info_dict(self):
+return dict(QCOW2_header_extensions=self.extensions)
+
 
 class QcowHeaderExtension:
 
@@ -224,6 +277,17 @@ class QcowHeaderExtension:
 if self.obj is not None:
 self.obj.load(fd)
 
+def get_info_dict(self):
+he_dict = dict(Header_extension=self.name,
+   magic=hex(self.magic),
+   length=self.length)
+if self.obj is not None:
+he_dict.update(data=self.obj)
+else:
+he_dict.update(data_str=self.data_str)
+
+return he_dict
+
 
 class QcowHeader:
 
@@ -353,9 +417,34 @@ class QcowHeader:
 print("%-25s" % f[2], value_str)
 print("")
 
+def get_info_dict(self):
+return dict(magic=hex(self.magic),
+version=self.version,
+backing_file_offset=hex(self.backing_file_offset),
+

[PATCH v3 0/6] iotests: Dump QCOW2 dirty bitmaps metadata

2020-06-01 Thread Andrey Shinkevich
Add dirty bitmap information to the QCOW2 metadata dump in the qcow2.py script.

v3:
  01: JSON format output possibility added.

v2:
  01: Refactoring of the Python code in the script qcow2.py.
  New methods were added. The bitmap dictionary was instantiated.
  The all of bitmaps information is read completely before
  printing the dictionary.
  02: The outputs of the tests 031, 036 and 061 were modified.

Andrey Shinkevich (6):
  iotests: Add extension names to qcow2.py dump
  iotests: move check for printable data to QcowHeaderExtension class
  iotests: dump bitmap extension data with qcow2.py
  iotests: Dump bitmap directory info with qcow2.py
  iotests: Dump bitmap table entries serialized in QCOW2 image
  iotests: Dump QCOW2 image metadata in JSON format with qcow2.py

 tests/qemu-iotests/031.out  |  22 +--
 tests/qemu-iotests/036.out  |   4 +-
 tests/qemu-iotests/061.out  |  18 +--
 tests/qemu-iotests/qcow2.py | 338 ++--
 4 files changed, 346 insertions(+), 36 deletions(-)

-- 
1.8.3.1




[PATCH v3 5/6] iotests: Dump bitmap table entries serialized in QCOW2 image

2020-06-01 Thread Andrey Shinkevich
Add bitmap table info to the QCOW2 metadata dump with qcow2.py.

Bitmap name   bitmap-1
...
Bitmap table    type        offset      size
    0           serialized  0xa0000     65536
    1           all-zeroes  0x0         65536
    2           all-zeroes  0x0         65536
    3           all-zeroes  0x0         65536
    4           all-zeroes  0x0         65536
    5           all-zeroes  0x0         65536
    6           all-zeroes  0x0         65536
    7           all-zeroes  0x0         65536

Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/qcow2.py | 48 +
 1 file changed, 48 insertions(+)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index e4453f6..76e0c69 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -5,6 +5,41 @@ import struct
 import string
 
 
+cluster_size = 0
+
+
+class Qcow2BitmapTableEntry:
+
+BME_TABLE_ENTRY_OFFSET_MASK = 0x00fffffffffffe00
+BME_TABLE_ENTRY_FLAG_ALL_ONES = 1
+bmte_type = ['all-zeroes', 'all-ones', 'serialized']
+
+def __init__(self, entry):
+self.cluster_size = cluster_size
+self.offset = entry & self.BME_TABLE_ENTRY_OFFSET_MASK
+if self.offset != 0:
+index = 2
+else:
+index = entry & self.BME_TABLE_ENTRY_FLAG_ALL_ONES
+self.type = self.bmte_type[index]
+
+
+class Qcow2BitmapTable:
+
+def __init__(self, raw_table):
+self.entries = []
+for entry in raw_table:
+self.entries.append(Qcow2BitmapTableEntry(entry))
+
+def print_bitmap_table(self):
+bitmap_table = enumerate(self.entries)
+print("Bitmap table\ttype\t\toffset\t\tsize")
+for i, entry in bitmap_table:
+print("\t%-4d\t%s\t%#x\t\t%d" % (i, entry.type, entry.offset,
+ entry.cluster_size))
+print("")
+
+
 class Qcow2BitmapDirEntry:
 
 name = ''
@@ -48,6 +83,12 @@ class Qcow2BitmapDirEntry:
 return struct.calcsize(self.fmt) + self.name_size + \
 self.extra_data_size
 
+def read_bitmap_table(self, fd):
+fd.seek(self.bitmap_table_offset)
+table_size = self.bitmap_table_size * struct.calcsize(self.uint64_t)
+table = [e[0] for e in struct.iter_unpack('>Q', fd.read(table_size))]
+self.bitmap_table = Qcow2BitmapTable(table)
+
 def dump_bitmap_dir_entry(self):
 print("%-25s" % 'Bitmap name', self.name)
 
@@ -59,6 +100,8 @@ class Qcow2BitmapDirEntry:
 value_str = f[1] % value
 print("%-25s" % f[2], value_str)
 
+self.bitmap_table.print_bitmap_table()
+
 
 class Qcow2BitmapDirectory:
 
@@ -83,6 +126,9 @@ class Qcow2BitmapDirectory:
 shift = ((entry_raw_size + 7) & ~7) - entry_raw_size
 fd.seek(shift, 1)
 
+for bm in self.bitmaps:
+bm.read_bitmap_table(fd)
+
 def get_bitmaps(self):
 return self.bitmaps
 
@@ -223,6 +269,8 @@ class QcowHeader:
 
 self.set_defaults()
 self.cluster_size = 1 << self.cluster_bits
+global cluster_size
+cluster_size = self.cluster_size
 
 fd.seek(self.header_length)
 self.load_extensions(fd)
-- 
1.8.3.1
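
The entry decoding above works as a standalone sketch (the offset mask
covers bits 9-55 of the 64-bit entry; bit 0 distinguishes all-ones from
all-zeroes when the offset is zero):

    BME_TABLE_ENTRY_OFFSET_MASK = 0x00fffffffffffe00
    BME_TABLE_ENTRY_FLAG_ALL_ONES = 1

    def decode_entry(entry):
        # Return (type, host offset) for one raw bitmap table entry.
        offset = entry & BME_TABLE_ENTRY_OFFSET_MASK
        if offset != 0:
            return 'serialized', offset
        if entry & BME_TABLE_ENTRY_FLAG_ALL_ONES:
            return 'all-ones', 0
        return 'all-zeroes', 0

    print(decode_entry(0xa0000))   # ('serialized', 655360)
    print(decode_entry(0x0))       # ('all-zeroes', 0)
    print(decode_entry(0x1))       # ('all-ones', 0)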




[PATCH v3 2/6] iotests: move check for printable data to QcowHeaderExtension class

2020-06-01 Thread Andrey Shinkevich
Distinguish the binary data type from the string one for the extension
data variable and keep the string as a QcowHeaderExtension class member
in the qcow2.py script.

Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/qcow2.py | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index e824b09..18e4923 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -13,6 +13,12 @@ class QcowHeaderExtension:
 QCOW2_EXT_MAGIC_DATA_FILE = 0x44415441
 
 def __init__(self, magic, length, data):
+data_str = data[:length]
+if all(c in string.printable.encode('ascii') for c in data_str):
+data_str = "'%s'" % data_str.decode('ascii')
+else:
+data_str = ""
+
 if length % 8 != 0:
 padding = 8 - (length % 8)
 data += b"\0" * padding
@@ -21,6 +27,7 @@ class QcowHeaderExtension:
 self.length = length
 self.data = data
 self.name = self.extension_name(magic)
+self.data_str = data_str
 
 @classmethod
 def create(cls, magic, data):
@@ -162,16 +169,10 @@ class QcowHeader:
 def dump_extensions(self):
 for ex in self.extensions:
 
-data = ex.data[:ex.length]
-if all(c in string.printable.encode('ascii') for c in data):
-data = "'%s'" % data.decode('ascii')
-else:
-data = ""
-
 print("%-25s %s" % ("Header extension:", ex.name))
 print("%-25s %#x" % ("magic", ex.magic))
 print("%-25s %d" % ("length", ex.length))
-print("%-25s %s" % ("data", data))
+print("%-25s %s" % ("data", ex.data_str))
 print("")
 
 
-- 
1.8.3.1
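
The check this patch moves into the constructor is easy to exercise on
its own (the hex fallback below is an illustration; the script itself
prints an empty placeholder for binary data):

    import string

    def data_repr(data):
        # Quote fully printable ASCII payloads; flag anything else as binary.
        if all(c in string.printable.encode('ascii') for c in data):
            return "'%s'" % data.decode('ascii')
        return '<binary %s>' % data.hex()

    print(data_repr(b'host_device'))    # 'host_device'
    print(data_repr(b'\x00\x01\x02'))   # <binary 000102>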




[PATCH v3 1/6] iotests: Add extension names to qcow2.py dump

2020-06-01 Thread Andrey Shinkevich
Header extension: Feature table
magic 0x6803f857
length192
data  

The change modifies the output of the 031, 036 and 061 test cases.

Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/031.out  | 22 +++---
 tests/qemu-iotests/036.out  |  4 ++--
 tests/qemu-iotests/061.out  | 18 +-
 tests/qemu-iotests/qcow2.py | 23 ---
 4 files changed, 42 insertions(+), 25 deletions(-)

diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 5a4beda..966c8d9 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -24,7 +24,7 @@ autoclear_features[]
 refcount_order4
 header_length 72
 
-Header extension:
+Header extension: Unknown
 magic 0x12345678
 length31
 data  'This is a test header extension'
@@ -52,7 +52,7 @@ autoclear_features[]
 refcount_order4
 header_length 72
 
-Header extension:
+Header extension: Unknown
 magic 0x12345678
 length31
 data  'This is a test header extension'
@@ -80,12 +80,12 @@ autoclear_features[]
 refcount_order4
 header_length 72
 
-Header extension:
+Header extension: Backing format
 magic 0xe2792aca
 length11
 data  'host_device'
 
-Header extension:
+Header extension: Unknown
 magic 0x12345678
 length31
 data  'This is a test header extension'
@@ -115,12 +115,12 @@ autoclear_features[]
 refcount_order4
 header_length 112
 
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data  
 
-Header extension:
+Header extension: Unknown
 magic 0x12345678
 length31
 data  'This is a test header extension'
@@ -148,12 +148,12 @@ autoclear_features[]
 refcount_order4
 header_length 112
 
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data  
 
-Header extension:
+Header extension: Unknown
 magic 0x12345678
 length31
 data  'This is a test header extension'
@@ -181,17 +181,17 @@ autoclear_features[]
 refcount_order4
 header_length 112
 
-Header extension:
+Header extension: Backing format
 magic 0xe2792aca
 length11
 data  'host_device'
 
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data  
 
-Header extension:
+Header extension: Unknown
 magic 0x12345678
 length31
 data  'This is a test header extension'
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index e409acf..81a7366 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -24,7 +24,7 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 incompatible_features []
 compatible_features   []
 autoclear_features[63]
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data  
@@ -36,7 +36,7 @@ No errors were found on the image.
 incompatible_features []
 compatible_features   []
 autoclear_features[]
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data  
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index a51ad1b..7821b7f 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -24,7 +24,7 @@ autoclear_features[]
 refcount_order4
 header_length 112
 
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data  
@@ -82,7 +82,7 @@ autoclear_features[]
 refcount_order4
 header_length 112
 
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data  
@@ -138,7 +138,7 @@ autoclear_features[]
 refcount_order4
 header_length 112
 
-Header extension:
+Header extension: Feature table
 magic 0x6803f857
 length336
 data

[PATCH v3 4/6] iotests: Dump bitmap directory info with qcow2.py

2020-06-01 Thread Andrey Shinkevich
Read and dump entries from the bitmap directory of a QCOW2 image with
the qcow2.py script.

Header extension: Bitmaps
...
Bitmap name   bitmap-1
flag  auto
bitmap_table_offset   0xf0000
bitmap_table_size 8
flag_bits 2
type  1
granularity_bits  16
name_size 8
extra_data_size   0

Suggested-by: Kevin Wolf 
Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/qcow2.py | 104 +++-
 1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index 8286115..e4453f6 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -5,6 +5,88 @@ import struct
 import string
 
 
+class Qcow2BitmapDirEntry:
+
+name = ''
+
+uint8_t = 'B'
+uint16_t = 'H'
+uint32_t = 'I'
+uint64_t = 'Q'
+
+fields = [
+[uint64_t, '%#x', 'bitmap_table_offset'],
+[uint32_t, '%d',  'bitmap_table_size'],
+[uint32_t, '%d',  'flag_bits'],
+[uint8_t,  '%d',  'type'],
+[uint8_t,  '%d',  'granularity_bits'],
+[uint16_t, '%d',  'name_size'],
+[uint32_t, '%d',  'extra_data_size']
+]
+
+fmt = '>' + ''.join(field[0] for field in fields)
+
+def __init__(self, data):
+
+entry = struct.unpack(Qcow2BitmapDirEntry.fmt, data)
+self.__dict__ = dict((field[2], entry[i])
+ for i, field in enumerate(
+ Qcow2BitmapDirEntry.fields))
+
+self.bitmap_table_size = self.bitmap_table_size \
+* struct.calcsize(self.uint64_t)
+
+self.bitmap_flags = []
+BME_FLAG_IN_USE = 1
+BME_FLAG_AUTO = 1 << 1
+if (self.flag_bits & BME_FLAG_IN_USE) != 0:
+self.bitmap_flags.append("in-use")
+if (self.flag_bits & BME_FLAG_AUTO) != 0:
+self.bitmap_flags.append("auto")
+
+def bitmap_dir_entry_raw_size(self):
+return struct.calcsize(self.fmt) + self.name_size + \
+self.extra_data_size
+
+def dump_bitmap_dir_entry(self):
+print("%-25s" % 'Bitmap name', self.name)
+
+for fl in self.bitmap_flags:
+print("%-25s" % 'flag', fl)
+
+for f in Qcow2BitmapDirEntry.fields:
+value = self.__dict__[f[2]]
+value_str = f[1] % value
+print("%-25s" % f[2], value_str)
+
+
+class Qcow2BitmapDirectory:
+
+def __init__(self, bm_header_ext):
+self.nb_bitmaps = bm_header_ext.nb_bitmaps
+self.bitmap_directory_offset = bm_header_ext.bitmap_directory_offset
+self.bitmap_directory_size = bm_header_ext.bitmap_directory_size
+
+def read_bitmap_directory(self, fd):
+self.bitmaps = []
+fd.seek(self.bitmap_directory_offset)
+buf_size = struct.calcsize(Qcow2BitmapDirEntry.fmt)
+
+for n in range(self.nb_bitmaps):
+buf = fd.read(buf_size)
+dir_entry = Qcow2BitmapDirEntry(buf)
+fd.seek(dir_entry.extra_data_size, 1)
+bitmap_name = fd.read(dir_entry.name_size)
+dir_entry.name = bitmap_name.decode('ascii')
+self.bitmaps.append(dir_entry)
+entry_raw_size = dir_entry.bitmap_dir_entry_raw_size()
+shift = ((entry_raw_size + 7) & ~7) - entry_raw_size
+fd.seek(shift, 1)
+
+def get_bitmaps(self):
+return self.bitmaps
+
+
 class Qcow2BitmapExt:
 
 uint32_t = 'I'
@@ -33,8 +115,21 @@ class Qcow2BitmapExt:
 print("%-25s" % f[2], value_str)
 print("")
 
+def read_bitmap_directory(self, fd):
+bm_directory = Qcow2BitmapDirectory(self)
+bm_directory.read_bitmap_directory(fd)
+self.bitmaps = bm_directory.get_bitmaps()
+
+def load(self, fd):
+self.read_bitmap_directory(fd)
+
+def dump_bitmap_directory(self):
+for bm in self.bitmaps:
+bm.dump_bitmap_dir_entry()
+
 def dump_ext(self):
 self.dump_bitmap_ext()
+self.dump_bitmap_directory()
 
 
 class QcowHeaderExtension:
@@ -79,6 +174,10 @@ class QcowHeaderExtension:
 self.QCOW2_EXT_MAGIC_DATA_FILE: 'Data file',
 }.get(magic, 'Unknown')
 
+def load(self, fd):
+if self.obj is not None:
+self.obj.load(fd)
+
 
 class QcowHeader:
 
@@ -157,7 +256,10 @@ class QcowHeader:
 else:
 padded = (length + 7) & ~7
 data = fd.read(padded)
-self.extensions.append(QcowHeaderExtension(magic, length, data))
+self.extensions.append(QcowHeaderExtension(magic, length,
+   data))
+for ex in self.extensions:
+ex.load(fd)
 
 def update_extensions(self, fd):
 
-- 
1.8.3.1
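
The fields-table idiom used throughout these patches can be tried on
its own: each row carries (struct code, print format, attribute name),
and the '>' prefix makes the unpack big-endian, matching the on-disk
QCOW2 layout (sample values below are made up):

    import struct

    fields = [
        ('Q', '%#x', 'bitmap_table_offset'),
        ('I', '%d',  'bitmap_table_size'),
        ('I', '%d',  'flag_bits'),
        ('B', '%d',  'type'),
        ('B', '%d',  'granularity_bits'),
        ('H', '%d',  'name_size'),
        ('I', '%d',  'extra_data_size'),
    ]
    fmt = '>' + ''.join(f[0] for f in fields)

    raw = struct.pack(fmt, 0xf0000, 8, 2, 1, 16, 8, 0)   # a fake entry
    entry = dict(zip((f[2] for f in fields), struct.unpack(fmt, raw)))
    for code, pfmt, name in fields:
        print('%-25s' % name, pfmt % entry[name])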




[PATCH v3 3/6] iotests: dump bitmap extension data with qcow2.py

2020-06-01 Thread Andrey Shinkevich
Add bitmap header extension data, if any, to the dump in qcow2.py.

Header extension: Bitmaps
magic 0x23852875
length24
nb_bitmaps2
reserved320
bitmap_directory_size 0x40
bitmap_directory_offset   0x100000

Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/qcow2.py | 42 +-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/qcow2.py b/tests/qemu-iotests/qcow2.py
index 18e4923..8286115 100755
--- a/tests/qemu-iotests/qcow2.py
+++ b/tests/qemu-iotests/qcow2.py
@@ -4,6 +4,39 @@ import sys
 import struct
 import string
 
+
+class Qcow2BitmapExt:
+
+uint32_t = 'I'
+uint64_t = 'Q'
+
+fields = [
+[uint32_t, '%d',  'nb_bitmaps'],
+[uint32_t, '%d',  'reserved32'],
+[uint64_t, '%#x', 'bitmap_directory_size'],
+[uint64_t, '%#x', 'bitmap_directory_offset']
+]
+
+fmt = '>' + ''.join(field[0] for field in fields)
+
+def __init__(self, data):
+
+extension = struct.unpack(Qcow2BitmapExt.fmt, data)
+self.__dict__ = dict((field[2], extension[i])
+ for i, field in enumerate(Qcow2BitmapExt.fields))
+
+def dump_bitmap_ext(self):
+for f in Qcow2BitmapExt.fields:
+value = self.__dict__[f[2]]
+value_str = f[1] % value
+
+print("%-25s" % f[2], value_str)
+print("")
+
+def dump_ext(self):
+self.dump_bitmap_ext()
+
+
 class QcowHeaderExtension:
 
 QCOW2_EXT_MAGIC_BACKING_FORMAT = 0xE2792ACA
@@ -13,12 +46,16 @@ class QcowHeaderExtension:
 QCOW2_EXT_MAGIC_DATA_FILE = 0x44415441
 
 def __init__(self, magic, length, data):
+self.obj = None
 data_str = data[:length]
 if all(c in string.printable.encode('ascii') for c in data_str):
 data_str = "'%s'" % data_str.decode('ascii')
 else:
 data_str = ""
 
+if magic == self.QCOW2_EXT_MAGIC_BITMAPS:
+self.obj = Qcow2BitmapExt(data)
+
 if length % 8 != 0:
 padding = 8 - (length % 8)
 data += b"\0" * padding
@@ -172,7 +209,10 @@ class QcowHeader:
 print("%-25s %s" % ("Header extension:", ex.name))
 print("%-25s %#x" % ("magic", ex.magic))
 print("%-25s %d" % ("length", ex.length))
-print("%-25s %s" % ("data", ex.data_str))
+if ex.obj is not None:
+ex.obj.dump_ext()
+else:
+print("%-25s %s" % ("data", ex.data_str))
 print("")
 
 
-- 
1.8.3.1




Re: [PATCH 2/5] vhost: involve device backends in feature negotiation

2020-06-01 Thread Jason Wang



On 2020/5/29 9:56 PM, Stefan Hajnoczi wrote:

On Fri, May 29, 2020 at 03:04:54PM +0800, Jason Wang wrote:

On 2020/5/23 1:17 AM, Stefan Hajnoczi wrote:

Many vhost devices in QEMU currently do not involve the device backend
in feature negotiation. This seems fine at first glance for device types
without their own feature bits (virtio-net has many but other device
types have none).

This overlooks the fact that QEMU's virtqueue implementation and the
device backend's implementation may support different features.  QEMU
must not report features to the guest that the device backend
doesn't support.

For example, QEMU supports VIRTIO 1.1 packed virtqueues while many
existing vhost device backends do not. When the user sets packed=on the
device backend breaks. This should have been handled gracefully by
feature negotiation instead.

Introduce vhost_get_default_features() and update all vhost devices in
QEMU to involve the device backend in feature negotiation.

This patch fixes the following error:

$ x86_64-softmmu/qemu-system-x86_64 \
-drive if=virtio,file=test.img,format=raw \
-chardev socket,path=/tmp/vhost-user-blk.sock,id=char0 \
-device vhost-user-blk-pci,chardev=char0,packed=on \
-object memory-backend-memfd,size=1G,share=on,id=ram0 \
-M accel=kvm,memory-backend=ram0
qemu-system-x86_64: Failed to set msg fds.
qemu-system-x86_64: vhost VQ 0 ring restore failed: -1: Success (0)


It looks to me this could be fixed simply by adding VIRTIO_F_RING_PACKED
into user_feature_bits in vhost-user-blk.c.

And for the rest, we need to require them to call vhost_get_features() with
the proper feature bits that need to be acked by the backend.

There is a backwards-compatibility problem: we cannot call
vhost_get_features() with the full set of feature bits that each device
type supports because existing vhost-user backends don't advertise
features properly. QEMU disables features not advertised by the
vhost-user backend.

For example, if we add VIRTIO_RING_F_EVENT_IDX then it will always be
disabled for existing libvhost-user backends because they don't
advertise this bit :(.



Agree.




The reason I introduced vhost_get_default_features() is to at least
salvage VIRTIO_F_IOMMU_PLATFORM and VIRTIO_F_RING_PACKED. These bits can
be safely negotiated for all devices.

Any new transport/vring VIRTIO feature bits can also be added to
vhost_get_default_features() safely.

If a vhost-user device wants to use device type-specific feature bits
then it really needs to call vhost_get_features() directly as you
suggest. But it's important to check whether existing vhost-user
backends actually advertise them - because if they don't advertise them
but rely on them then we'll break existing backends.

Would you prefer a different approach?



I get you now so I think not.

Maybe we need to document the expected behavior of VHOST_USER_GET_FEATURES,
e.g. which set of features must be advertised explicitly.


And for the set you mention here, we probably need to add:

VIRTIO_F_ORDER_PLATFORM,
VIRTIO_F_ANY_LAYOUT (not sure if it was too late).

And

VIRTIO_F_IN_ORDER





diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index aff98a0ede..f8a144dcd0 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -48,6 +48,23 @@ static unsigned int used_memslots;
   static QLIST_HEAD(, vhost_dev) vhost_devices =
   QLIST_HEAD_INITIALIZER(vhost_devices);
+/*
+ * Feature bits that device backends must explicitly report. Feature bits not
+ * listed here may be set by QEMU without checking with the device backend.
+ * Ideally all feature bits would be listed here but existing vhost device
+ * implementations do not explicitly report bits like VIRTIO_F_VERSION_1, so we
+ * can only assume they are supported.


For most backends, we care about the features for the datapath. So this is
not true at least for the networking device, since VIRTIO_F_VERSION_1 has an
impact on the vnet header length.

The net device is in good shape and doesn't use vhost_default_feature_bits[].

vhost_default_feature_bits[] is for devices that haven't been
negotiating feature bits properly. The goal is start involving them in
feature negotiation without breaking existing backends.

Would you like me to rephrase this comment in some way?



That will be better.





+ *
+ * New feature bits added to the VIRTIO spec should usually be included here
+ * so that existing vhost device backends that do not support them yet continue
+ * to work.


It actually depends on the type of backend.

Kernel vhost-net does not validate GSO features since qemu can talk directly
to TAP and vhost doesn't report those features. But for vhost-user GSO
features must be validated by qemu since qemu can't see what is behind
vhost-user.

Maybe the comment should say "New transport/vring feature bits". I'm
thinking about things like VIRTIO_F_RING_PACKED that are not
device-specific but require backend support.

The GSO features you 
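
A toy model of the negotiation rule discussed in this thread: bits on
the must-be-acked list are offered to the guest only if the backend
advertised them, while all other bits are assumed supported for
compatibility with existing backends (the bit numbers are the VIRTIO
spec values; the policy itself is a sketch):

    VIRTIO_F_VERSION_1      = 32   # VIRTIO transport feature bit numbers
    VIRTIO_F_IOMMU_PLATFORM = 33
    VIRTIO_F_RING_PACKED    = 34

    def bit(n):
        return 1 << n

    # Bits a backend must explicitly advertise before QEMU may offer them.
    MUST_BE_ACKED = bit(VIRTIO_F_IOMMU_PLATFORM) | bit(VIRTIO_F_RING_PACKED)

    def negotiate(qemu_features, backend_features):
        # Drop any must-be-acked bit the backend did not report; keep the rest.
        return qemu_features & ~(MUST_BE_ACKED & ~backend_features)

    qemu = bit(VIRTIO_F_VERSION_1) | bit(VIRTIO_F_RING_PACKED)
    old_backend = bit(VIRTIO_F_VERSION_1)      # predates packed virtqueues
    assert negotiate(qemu, old_backend) == bit(VIRTIO_F_VERSION_1)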

Re: [PULL 00/25] python-next patches for 2020-05-31

2020-06-01 Thread Peter Maydell
On Sun, 31 May 2020 at 17:40, Philippe Mathieu-Daudé  wrote:
>
> The following changes since commit c86274bc2e34295764fb44c2aef3cf29623f9b4b:
>
>   Merge remote-tracking branch 'remotes/stsquad/tags/pull-testing-tcg-plugins-270520-1' into staging (2020-05-29 17:41:45 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/philmd/qemu.git tags/python-next-20200531
>
> for you to fetch changes up to 1c80c87c8c2489e4318c93c844aa29bc1d014146:
>
>   tests/acceptance: refactor boot_linux to allow code reuse (2020-05-31 18:25:31 +0200)
>
> 
> Python queue:
>
> * migration acceptance test fix
> * introduce pylintrc & flake8 config
> * various cleanups (Python3, style)
> * vm-test can set QEMU_LOCAL=1 to use locally built binaries
> * refactored BootLinuxBase & LinuxKernelTest acceptance classes
>
> https://gitlab.com/philmd/qemu/pipelines/151323210
> https://travis-ci.org/github/philmd/qemu/builds/693157969
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.1
for any user-visible changes.

-- PMM