Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3
On 2019/8/9 16:21, Stefan Hajnoczi wrote:
> On Thu, Aug 08, 2019 at 10:53:16AM +0100, Dr. David Alan Gilbert wrote:
>> * Stefan Hajnoczi (stefa...@redhat.com) wrote:
>>> On Wed, Aug 07, 2019 at 04:57:15PM -0400, Vivek Goyal wrote:
>>> 2. Can MAP/UNMAP be performed directly in QEMU via a separate virtqueue?
>>
>> I think there's two things to solve here that I don't currently know the
>> answer to:
>> 2a) We'd need to get the fd to qemu for the thing to mmap;
>>     we might be able to cache the fd on the qemu side for existing
>>     mappings, so when asking for a new mapping for an existing file then
>>     it would already have the fd.
>>
>> 2b) Running a device with a mix of queues inside QEMU and on
>>     vhost-user; I don't think we have anything with that mix
>
> vhost-user-net works in the same way. The ctrl queue is handled by QEMU
> and the rx/tx queues by the vhost device. This is in fact how vhost was
> initially designed: the vhost device is not a full virtio device, only
> the dataplane.

Agreed.

>>> 3. Can READ/WRITE be performed directly in QEMU via a separate virtqueue
>>>    to eliminate the bad address problem?
>>
>> Are you thinking of doing all read/writes that way, or just the corner
>> cases? It doesn't seem worth it for the corner cases unless you're
>> finding them cropping up in real workloads.
>
> Send all READ/WRITE requests to QEMU instead of virtiofsd.
>
> Only handle metadata requests in virtiofsd (OPEN, RELEASE, READDIR,
> MKDIR, etc).

Sorry, I may be missing your point, but I would prefer the opposite split:
let virtiofsd handle the READ/WRITE requests and QEMU handle the metadata
requests, since virtiofsd is well suited to dataplane processing thanks to
its thread pool and (perhaps in the future) CPU affinity. As you said,
virtiofsd acts as a vhost-user device, which should care less about control
requests.

If our concern is improving mmap/read/write performance, why not add a
delayed worker for unmap, which could reduce the number of unmap operations?
Maybe virtiofsd could still handle both data and metadata requests that way.

Jun
Re: [Qemu-devel] [RFC PATCH v2 00/39] rewrite MMX/SSE instruction translation
On 8/9/19 9:12 PM, Jan Bobek wrote:
> This is a v2 of the patch series posted in [1]. Patches 1-9 are just
> cleanups; patches 10-39 are something actually interesting. Compared
> to v1, I started using preprocessor more extensively to generate
> repetitive boilerplate code; opinions/alternatives are welcome and
> appreciated.

This is tricky. I'm not keen on code entirely expanded via macros because it
becomes extremely difficult to debug. All statements get recorded at the same
line as the location of the expansion, which makes the gdb "step" command
finish the entire function because there is no next line.

Once upon a time I wrote some code that's extremely macro crazy:

https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=soft-fp/op-common.h;hb=HEAD

It has been extremely difficult to maintain over the years.

We have just recently gotten rid of some of the macros in the softmmu code

https://patchwork.ozlabs.org/project/qemu-devel/list/?series=105441

replacing most of them with inline functions.

A lot of what you have needs very little adjustment to address the debugging
problem. E.g.
> +#define INSNOP_INIT(opT, init_stmt)                                 \
> +static int insnop_init(opT)(CPUX86State *env, DisasContext *s,      \
> +                            int modrm, insnop_t(opT) *op)           \
> +{                                                                   \
> +    init_stmt;                                                      \
> +}

> +INSNOP(
> +    M, TCGv,
> +    do {
> +        if (decode_modrm_mod(env, s, modrm) == 3) {
> +            INSNOP_INIT_FAIL;
> +        } else {
> +            INSNOP_INIT_OK(s->A0);
> +        }
> +    } while (0),
> +    do {
> +        assert(*op == s->A0);
> +        gen_lea_modrm(env, s, modrm);
> +    } while (0),
> +    INSNOP_FINALIZE_NOOP)

Rearrange this as

#define INSNOP_INIT(OPT) \
    static bool insnop_##OPT##_init(CPUX86State *env, DisasContext *s, \
                                    int modrm, insnop_##OPT##_t *op)

#define INSNOP_PREPARE(OPT) \
    static void insnop_##OPT##_prepare(CPUX86State *env, DisasContext *s, \
                                       int modrm, insnop_##OPT##_t *op)

INSNOP_INIT(M)
{
    if (decode_modrm_mod(env, s, modrm) == 3) {
        INSNOP_INIT_FAIL;
    } else {
        INSNOP_INIT_OK(s->A0);
    }
}

INSNOP_PREPARE(M)
{
    assert(*op == s->A0);
    gen_lea_modrm(env, s, modrm);
}

etc, and suddenly the entire expansion does not occur on a single line.

Further specific commentary to follow.

r~
Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3
On Thu, Aug 08, 2019 at 08:53:20AM -0400, Vivek Goyal wrote:
> On Thu, Aug 08, 2019 at 10:53:16AM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > On Wed, Aug 07, 2019 at 04:57:15PM -0400, Vivek Goyal wrote:
> > > > Kernel also serializes MAP/UNMAP on one inode. So you will need to run
> > > > multiple jobs operating on different inodes to see parallel MAP/UNMAP
> > > > (at least from kernel's point of view).
> > >
> > > Okay, there is still room to experiment with how MAP and UNMAP are
> > > handled by virtiofsd and QEMU even if the host kernel ultimately becomes
> > > the bottleneck.
> > >
> > > One possible optimization is to eliminate REMOVEMAPPING requests when
> > > the guest driver knows a SETUPMAPPING will follow immediately. I see
> > > the following request pattern in a fio randread iodepth=64 job:
> > >
> > >   unique: 995348, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, pid: 1351
> > >   lo_setupmapping(ino=135, fi=0x(nil), foffset=3860856832, len=2097152, moffset=859832320, flags=0)
> > >   unique: 995348, success, outsize: 16
> > >   unique: 995350, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, pid: 12
> > >   unique: 995350, success, outsize: 16
> > >   unique: 995352, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, pid: 1351
> > >   lo_setupmapping(ino=135, fi=0x(nil), foffset=16777216, len=2097152, moffset=861929472, flags=0)
> > >   unique: 995352, success, outsize: 16
> > >   unique: 995354, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, pid: 12
> > >   unique: 995354, success, outsize: 16
> > >   virtio_send_msg: elem 9: with 1 in desc of length 16
> > >   unique: 995356, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, pid: 1351
> > >   lo_setupmapping(ino=135, fi=0x(nil), foffset=383778816, len=2097152, moffset=864026624, flags=0)
> > >   unique: 995356, success, outsize: 16
> > >   unique: 995358, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, pid: 12
> > >
> > > The REMOVEMAPPING requests are unnecessary since we can map over the top
> > > of the old mapping instead of taking the extra step of removing it
> > > first.
> >
> > Yep, those should go - I think Vivek likes to keep them for testing
> > since they make things fail more completely if there's a screwup.
>
> I like to keep them because otherwise they keep the resources busy
> on host. If DAX range is being used immediately, then this optimization
> makes more sense. I will keep this in mind.

Other than the resource not being released, do you think there'll be any
stale data problem if we don't do removemapping at all, neither background
reclaim nor inline reclaim? (truncate/punch_hole/evict_inode still needs to
remove the mapping, though)

thanks,
-liubo

> > > Some more questions to consider for DAX performance optimization:
> > >
> > > 1. Is FUSE_READ/FUSE_WRITE more efficient than DAX for some I/O patterns?
> >
> > Probably for cases where the data is only accessed once, and you can't
> > preemptively map.
> > Another variant on (1) is whether we could do read/writes while the mmap
> > is happening to absorb the latency.
>
> For small random I/O, dax might not be very effective. Overhead of
> setting up mapping and tearing it down is significant.
>
> Vivek

_______________________________________________
Virtio-fs mailing list
virtio...@redhat.com
https://www.redhat.com/mailman/listinfo/virtio-fs
Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3
On Fri, Aug 09, 2019 at 09:21:02AM +0100, Stefan Hajnoczi wrote:
> On Thu, Aug 08, 2019 at 10:53:16AM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > On Wed, Aug 07, 2019 at 04:57:15PM -0400, Vivek Goyal wrote:
> > > 2. Can MAP/UNMAP be performed directly in QEMU via a separate virtqueue?
> >
> > I think there's two things to solve here that I don't currently know the
> > answer to:
> > 2a) We'd need to get the fd to qemu for the thing to mmap;
> >     we might be able to cache the fd on the qemu side for existing
> >     mappings, so when asking for a new mapping for an existing file then
> >     it would already have the fd.
> >
> > 2b) Running a device with a mix of queues inside QEMU and on
> >     vhost-user; I don't think we have anything with that mix
>
> vhost-user-net works in the same way. The ctrl queue is handled by QEMU
> and the rx/tx queues by the vhost device. This is in fact how vhost was
> initially designed: the vhost device is not a full virtio device, only
> the dataplane.
>
> > > 3. Can READ/WRITE be performed directly in QEMU via a separate virtqueue
> > >    to eliminate the bad address problem?
> >
> > Are you thinking of doing all read/writes that way, or just the corner
> > cases? It doesn't seem worth it for the corner cases unless you're
> > finding them cropping up in real workloads.
>
> Send all READ/WRITE requests to QEMU instead of virtiofsd.
>
> Only handle metadata requests in virtiofsd (OPEN, RELEASE, READDIR,
> MKDIR, etc).

For now qemu is not aware of virtio-fs's fd info, but I think it's doable.
I like the idea.

thanks,
-liubo

> > > I'm not going to tackle DAX optimization myself right now but wanted to
> > > share these ideas.
> >
> > One I was thinking about that feels easier than (2) was to change the
> > vhost slave protocol to be split transaction; it wouldn't do anything
> > for the latency but it would be able to do some in parallel if we can
> > get the kernel to feed it.
>
> There are two cases:
> 1. mmapping multiple inodes. This should benefit from parallelism,
>    although mmap is still expensive because it involves TLB shootdown
>    for all other threads running this process.
> 2. mmapping the same inode. Here the host kernel is likely to serialize
>    mmaps even more, making it hard to gain performance.
>
> It's probably worth writing a tiny benchmark first to evaluate the
> potential gains.
>
> Stefan
Re: [Qemu-devel] [PATCH v3 20/29] Include qemu/main-loop.h less
Alex Bennée writes:

> Markus Armbruster writes:
>
>> Philippe Mathieu-Daudé writes:
>>
>>> On 8/9/19 8:46 AM, Markus Armbruster wrote:
>>>> In my "build everything" tree, changing qemu/main-loop.h triggers a
>>>> recompile of some 5600 out of 6600 objects (not counting tests and
>>>> objects that don't depend on qemu/osdep.h). It includes block/aio.h,
>>>> which in turn includes qemu/event_notifier.h, qemu/notify.h,
>>>> qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
>>>> qemu/thread.h, qemu/timer.h, and a few more.
>>>>
>>>> Include qemu/main-loop.h only where it's needed. Touching it now
>>>> recompiles only some 1700 objects. For block/aio.h and
>>>> qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the
>>>> others, they shrink only slightly.
>>>>
>>>> Signed-off-by: Markus Armbruster
>>>> ---
>>> [...]
>>>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>>>> index 77f5df59b0..ac18a1184a 100644
>>>> --- a/include/sysemu/sysemu.h
>>>> +++ b/include/sysemu/sysemu.h
>>>> @@ -5,7 +5,6 @@
>>>>  #include "qapi/qapi-types-run-state.h"
>>>>  #include "qemu/timer.h"
>>>>  #include "qemu/notify.h"
>>>> -#include "qemu/main-loop.h"
>>>>  #include "qemu/bitmap.h"
>>>>  #include "qemu/uuid.h"
>>>>  #include "qom/object.h"
>>>
>>> netmap failing again :S
>>>
>>> $ make docker-image-debian-amd64 V=1 DEBUG=1
>>> [...]
>>>   CC      net/netmap.o
>>> net/netmap.c: In function 'netmap_update_fd_handler':
>>> net/netmap.c:109:5: error: implicit declaration of function
>>> 'qemu_set_fd_handler' [-Werror=implicit-function-declaration]
>>>      qemu_set_fd_handler(s->nmd->fd,
>>>      ^~~
>>> net/netmap.c:109:5: error: nested extern declaration of
>>> 'qemu_set_fd_handler' [-Werror=nested-externs]
>>
>> I managed to lose the fix somehow.
>>
>> I admit I ran "make docker-test-build", realized docker needs root, and
>> went "sod it, cross fingers & send out the patches".
>
> I've sent some patches to make docker-test-build more closely resemble
> what shippable exercises.
>
> As for root you can setup a docker group and do it that way (see the
> docs in docs/devel/testing.rst). It's not recommended for production
> machines as it makes escalation fairly trivial (the daemon itself still
> runs as root).

As Dan Walsh explained in a blog post[*], access to the docker socket is
equivalent to root. Might be okay on a throwaway or special-purpose box, but
definitely not on my desktop.

The solution the blog post recommends for now is sudo with password, which I
consider only marginally better: instead of leaving the safe door open, we
install a security camera to log access to the safe, *then* leave the safe
door open. Just in case whoever helps himself to the contents of the safe is
too lazy to help himself to the logs, too.

In the great tradition of throwing security under the bus to get work done,
I set up sudo. Avoiding NOPASSWD: turns out to be impractical.

Running "make docker-test-build" fails for me on master (v4.1.0-rc4), details
appended.

> Hopefully Marc's podman support:
>
>   Subject: [PATCH v2 0/5] tests/docker: add podman support
>   Date: Tue, 9 Jul 2019 23:43:25 +0400
>   Message-Id: <20190709194330.837-1-marcandre.lur...@redhat.com>
>
> will make these requirements a little less onerous.

Sounds like a much needed upgrade to me.

[...]

[*] https://www.projectatomic.io/blog/2015/08/why-we-dont-let-non-root-users-run-docker-in-centos-fedora-or-rhel/

My failure:

$ make -C bld docker-test-build
make: Entering directory '/work/armbru/qemu/bld'
  BUILD   centos7
make[1]: Entering directory '/work/armbru/qemu/bld'
  GEN     /work/armbru/qemu/bld/docker-src.2019-08-10-07.29.32.8915/qemu.tar
  COPY    RUNNER
  RUN     test-build in qemu:centos7
[...]
make[1]: Leaving directory '/work/armbru/qemu/bld'
  BUILD   debian9
  BUILD   debian-amd64
make[1]: Entering directory '/work/armbru/qemu/bld'
  GEN     /work/armbru/qemu/bld/docker-src.2019-08-10-07.30.18.17180/qemu.tar
  COPY    RUNNER
  RUN     test-build in qemu:debian-amd64
[...]
install -c -m 0644 /tmp/qemu-test/build/trace-events-all "/tmp/qemu-test/build/=destdir/tmp/qemu-test/install/share/qemu/trace-events-all"
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/work/armbru/qemu/tests/docker/docker.py", line 234, in _kill_instances
    return self._do_kill_instances(True)
  File "/work/armbru/qemu/tests/docker/docker.py", line 213, in _do_kill_instances
    for i in self._output(cmd).split():
  File "/work/armbru/qemu/tests/docker/docker.py", line 239, in _output
    **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['sudo', 'docker', 'ps', '-q']' returned non-zero exit status 1
Error in sys.exitfunc:
Traceback (most recent call last):
[Qemu-devel] [PATCH v3 4/7] block/backup: drop handling of max_transfer for copy_range
Since the previous commit, copy_range supports max_transfer, so we don't
need to handle it by hand.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Max Reitz
---
 block/backup.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index c6a3b2b7bb..228ba9423c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -54,7 +54,6 @@ typedef struct BackupBlockJob {
     QLIST_HEAD(, CowRequest) inflight_reqs;
 
     bool use_copy_range;
-    int64_t copy_range_size;
 
     BdrvRequestFlags write_flags;
     bool initializing_bitmap;
@@ -156,12 +155,11 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     int ret;
     int nr_clusters;
     BlockBackend *blk = job->common.blk;
-    int nbytes;
+    int nbytes = end - start;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
-    assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
+    assert(end - start < INT_MAX);
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    nbytes = MIN(job->copy_range_size, end - start);
     nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
     bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
                             job->cluster_size * nr_clusters);
@@ -756,11 +754,6 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->copy_bitmap = copy_bitmap;
     copy_bitmap = NULL;
     job->use_copy_range = !compress; /* compression isn't supported for it */
-    job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
-                                        blk_get_max_transfer(job->target));
-    job->copy_range_size = MAX(job->cluster_size,
-                               QEMU_ALIGN_UP(job->copy_range_size,
-                                             job->cluster_size));
 
     /* Required permissions are already taken with target's blk_new() */
     block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
-- 
2.18.0
[Qemu-devel] [PATCH v3 7/7] block/backup: merge duplicated logic into backup_do_cow
backup_cow_with_offload and backup_cow_with_bounce_buffer contain a lot
of duplicated logic. Move it into backup_do_cow.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Max Reitz
---
 block/backup.c | 97 --
 1 file changed, 38 insertions(+), 59 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 65f7212c85..0ac31c2760 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -104,87 +104,61 @@ static void cow_request_end(CowRequest *req)
  * error occurred, return a negative error number
  *
  * @bounce_buffer is assumed to enough to store
- * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
+ * MIN(BACKUP_MAX_BOUNCE_BUFFER, @bytes) bytes
  */
-static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
-                                                      int64_t start,
-                                                      int64_t end,
-                                                      bool is_write_notifier,
-                                                      bool *error_is_read,
-                                                      void *bounce_buffer)
+static int coroutine_fn backup_cow_with_bounce_buffer(
+    BackupBlockJob *job, int64_t offset, int64_t bytes,
+    BdrvRequestFlags read_flags, bool *error_is_read, void *bounce_buffer)
 {
-    int ret;
     BlockBackend *blk = job->common.blk;
-    int nbytes, remaining_bytes;
-    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-
-    assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
-    nbytes = MIN(end - start, job->len - start);
-
-    remaining_bytes = nbytes;
-    while (remaining_bytes) {
-        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
+    while (bytes) {
+        int ret;
+        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, bytes);
 
-        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
+        ret = blk_co_pread(blk, offset, chunk, bounce_buffer, read_flags);
         if (ret < 0) {
-            trace_backup_do_cow_read_fail(job, start, ret);
+            trace_backup_do_cow_read_fail(job, offset, ret);
             if (error_is_read) {
                 *error_is_read = true;
             }
-            goto fail;
+            return ret;
         }
 
-        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
+        ret = blk_co_pwrite(job->target, offset, chunk, bounce_buffer,
                             job->write_flags);
         if (ret < 0) {
-            trace_backup_do_cow_write_fail(job, start, ret);
+            trace_backup_do_cow_write_fail(job, offset, ret);
             if (error_is_read) {
                 *error_is_read = false;
             }
-            goto fail;
+            return ret;
         }
 
-        start += chunk;
-        remaining_bytes -= chunk;
+        offset += chunk;
+        bytes -= chunk;
     }
 
-    return nbytes;
-fail:
-    bdrv_set_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
-    return ret;
-
+    return 0;
 }
 
 /* Copy range to target and return the bytes copied. If error occurred, return a
  * negative error number. */
 static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
-                                                int64_t start,
-                                                int64_t end,
-                                                bool is_write_notifier)
+                                                int64_t offset,
+                                                int64_t bytes,
+                                                BdrvRequestFlags read_flags)
 {
     int ret;
-    int nr_clusters;
     BlockBackend *blk = job->common.blk;
-    int nbytes = MIN(end - start, job->len - start);
-    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-
-    assert(end - start < INT_MAX);
-    assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
-    bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
-                            job->cluster_size * nr_clusters);
-    ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
+
+    ret = blk_co_copy_range(blk, offset, job->target, offset, bytes,
                             read_flags, job->write_flags);
     if (ret < 0) {
-        trace_backup_do_cow_copy_range_fail(job, start, ret);
-        bdrv_set_dirty_bitmap(job->copy_bitmap, start,
-                              job->cluster_size * nr_clusters);
-        return ret;
+        trace_backup_do_cow_copy_range_fail(job, offset, ret);
     }
 
-    return nbytes;
+    return ret;
 }
 
 /*
@@ -268,6 +242,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
     int64_t start, end; /* bytes */
     void *bounce_buffer = NULL;
     int64_t skip_bytes;
+    BdrvRequestFlags read_flags =
+        is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
[Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
backup_cow_with_offload can transfer more than one cluster. Let
backup_cow_with_bounce_buffer behave similarly. It reduces the number
of IO requests, since there is no need to copy cluster by cluster.

Logic around bounce_buffer allocation changed: we can't just allocate a
one-cluster-sized buffer to share for all iterations. Nor can we allocate
a buffer of full-request length, as it may be too large, so
BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, the allocation logic
is to allocate a buffer sufficient to handle all remaining iterations at
the point where we need the buffer for the first time.

Bonus: get rid of pointer-to-pointer.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/backup.c | 65 +++---
 1 file changed, 41 insertions(+), 24 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d482d93458..65f7212c85 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -27,6 +27,7 @@
 #include "qemu/error-report.h"
 
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
+#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
 
 typedef struct CowRequest {
     int64_t start_byte;
@@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
     qemu_co_queue_restart_all(&req->wait_queue);
 }
 
-/* Copy range to target with a bounce buffer and return the bytes copied. If
- * error occurred, return a negative error number */
+/*
+ * Copy range to target with a bounce buffer and return the bytes copied. If
+ * error occurred, return a negative error number
+ *
+ * @bounce_buffer is assumed to enough to store
+ * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
+ */
 static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
                                                       int64_t start,
                                                       int64_t end,
                                                       bool is_write_notifier,
                                                       bool *error_is_read,
-                                                      void **bounce_buffer)
+                                                      void *bounce_buffer)
 {
     int ret;
     BlockBackend *blk = job->common.blk;
-    int nbytes;
+    int nbytes, remaining_bytes;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
-    nbytes = MIN(job->cluster_size, job->len - start);
-    if (!*bounce_buffer) {
-        *bounce_buffer = blk_blockalign(blk, job->cluster_size);
-    }
+    bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
+    nbytes = MIN(end - start, job->len - start);
 
-    ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
-    if (ret < 0) {
-        trace_backup_do_cow_read_fail(job, start, ret);
-        if (error_is_read) {
-            *error_is_read = true;
+    remaining_bytes = nbytes;
+    while (remaining_bytes) {
+        int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
+
+        ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
+        if (ret < 0) {
+            trace_backup_do_cow_read_fail(job, start, ret);
+            if (error_is_read) {
+                *error_is_read = true;
+            }
+            goto fail;
         }
-        goto fail;
-    }
 
-    ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
-                        job->write_flags);
-    if (ret < 0) {
-        trace_backup_do_cow_write_fail(job, start, ret);
-        if (error_is_read) {
-            *error_is_read = false;
+        ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
+                            job->write_flags);
+        if (ret < 0) {
+            trace_backup_do_cow_write_fail(job, start, ret);
+            if (error_is_read) {
+                *error_is_read = false;
+            }
+            goto fail;
         }
-        goto fail;
+
+        start += chunk;
+        remaining_bytes -= chunk;
     }
 
     return nbytes;
@@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
             }
         }
         if (!job->use_copy_range) {
+            if (!bounce_buffer) {
+                size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
+                                 MAX(dirty_end - start, end - dirty_end));
+                bounce_buffer = blk_try_blockalign(job->common.blk, len);
+            }
             ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
                                                 is_write_notifier,
-                                                error_is_read, &bounce_buffer);
+                                                error_is_read, bounce_buffer);
         }
         if (ret < 0) {
             break;
-- 
2.18.0
[Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range
copy_range ignores these limitations, let's improve it.

Signed-off-by: Vladimir Sementsov-Ogievskiy
---
 block/io.c | 44 
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 06305c6ea6..45b1e1f76e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3005,11 +3005,13 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 {
     BdrvTrackedRequest req;
     int ret;
+    uint32_t align, max_transfer;
 
     /* TODO We can support BDRV_REQ_NO_FALLBACK here */
     assert(!(read_flags & BDRV_REQ_NO_FALLBACK));
     assert(!(write_flags & BDRV_REQ_NO_FALLBACK));
 
     if (!dst || !dst->bs) {
         return -ENOMEDIUM;
     }
@@ -3029,9 +3031,19 @@ static int coroutine_fn bdrv_co_copy_range_internal(
         return ret;
     }
 
+    align = MAX(src->bs->bl.request_alignment, dst->bs->bl.request_alignment);
+    max_transfer =
+        QEMU_ALIGN_DOWN(MIN_NON_ZERO(MIN_NON_ZERO(src->bs->bl.max_transfer,
+                                                  dst->bs->bl.max_transfer),
+                                     INT_MAX), align);
+
     if (!src->bs->drv->bdrv_co_copy_range_from
         || !dst->bs->drv->bdrv_co_copy_range_to
-        || src->bs->encrypted || dst->bs->encrypted) {
+        || src->bs->encrypted || dst->bs->encrypted ||
+        (max_transfer == 0 && bytes > 0) ||
+        !QEMU_IS_ALIGNED(src_offset, src->bs->bl.request_alignment) ||
+        !QEMU_IS_ALIGNED(dst_offset, dst->bs->bl.request_alignment) ||
+        !QEMU_IS_ALIGNED(bytes, align)) {
         return -ENOTSUP;
     }
 
@@ -3046,11 +3058,22 @@ static int coroutine_fn bdrv_co_copy_range_internal(
             wait_serialising_requests(&req);
         }
 
-        ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
-                                                    src, src_offset,
-                                                    dst, dst_offset,
-                                                    bytes,
-                                                    read_flags, write_flags);
+        while (bytes) {
+            int num = MIN(bytes, max_transfer);
+
+            ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
+                                                        src, src_offset,
+                                                        dst, dst_offset,
+                                                        num,
+                                                        read_flags,
+                                                        write_flags);
+            if (ret < 0) {
+                break;
+            }
+            bytes -= num;
+            src_offset += num;
+            dst_offset += num;
+        }
 
         tracked_request_end(&req);
         bdrv_dec_in_flight(src->bs);
@@ -3060,12 +3083,17 @@ static int coroutine_fn bdrv_co_copy_range_internal(
                               BDRV_TRACKED_WRITE);
         ret = bdrv_co_write_req_prepare(dst, dst_offset, bytes, &req,
                                         write_flags);
-        if (!ret) {
+        while (!ret && bytes) {
+            int num = MIN(bytes, max_transfer);
+
             ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
                                                       src, src_offset,
                                                       dst, dst_offset,
-                                                      bytes,
+                                                      num,
                                                       read_flags, write_flags);
+            bytes -= num;
+            src_offset += num;
+            dst_offset += num;
         }
         bdrv_co_write_req_finish(dst, dst_offset, bytes, &req, ret);
         tracked_request_end(&req);
-- 
2.18.0
[Qemu-devel] [PATCH v3 5/7] block/backup: fix backup_cow_with_offload for last cluster
We shouldn't try to copy bytes beyond EOF. Fix it.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: Max Reitz
---
 block/backup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/backup.c b/block/backup.c
index 228ba9423c..d482d93458 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -155,7 +155,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     int ret;
     int nr_clusters;
     BlockBackend *blk = job->common.blk;
-    int nbytes = end - start;
+    int nbytes = MIN(end - start, job->len - start);
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
     assert(end - start < INT_MAX);
-- 
2.18.0
[Qemu-devel] [PATCH v3 2/7] block/backup: refactor write_flags
write flags are constant, let's store them in BackupBlockJob instead of
recalculating. It also makes two boolean fields unused, so drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy
Reviewed-by: John Snow
Reviewed-by: Max Reitz
---
 block/backup.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d815436455..c6a3b2b7bb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -50,14 +50,13 @@ typedef struct BackupBlockJob {
     uint64_t len;
     uint64_t bytes_read;
     int64_t cluster_size;
-    bool compress;
     NotifierWithReturn before_write;
     QLIST_HEAD(, CowRequest) inflight_reqs;
 
     bool use_copy_range;
     int64_t copy_range_size;
 
-    bool serialize_target_writes;
+    BdrvRequestFlags write_flags;
     bool initializing_bitmap;
 } BackupBlockJob;
 
@@ -113,10 +112,6 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     BlockBackend *blk = job->common.blk;
     int nbytes;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-    int write_flags =
-        (job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) |
-        (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
-
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
     bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
@@ -135,7 +130,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
     }
 
     ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
-                        write_flags);
+                        job->write_flags);
     if (ret < 0) {
         trace_backup_do_cow_write_fail(job, start, ret);
         if (error_is_read) {
@@ -163,7 +158,6 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     BlockBackend *blk = job->common.blk;
     int nbytes;
     int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
     assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
     assert(QEMU_IS_ALIGNED(start, job->cluster_size));
@@ -172,7 +166,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
     bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
                             job->cluster_size * nr_clusters);
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-                            read_flags, write_flags);
+                            read_flags, job->write_flags);
     if (ret < 0) {
         trace_backup_do_cow_copy_range_fail(job, start, ret);
         bdrv_set_dirty_bitmap(job->copy_bitmap, start,
@@ -748,10 +742,16 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
     job->sync_mode = sync_mode;
     job->sync_bitmap = sync_bitmap;
     job->bitmap_mode = bitmap_mode;
-    job->compress = compress;
 
-    /* Detect image-fleecing (and similar) schemes */
-    job->serialize_target_writes = bdrv_chain_contains(target, bs);
+    /*
+     * Set write flags:
+     * 1. Detect image-fleecing (and similar) schemes
+     * 2. Handle compression
+     */
+    job->write_flags =
+        (bdrv_chain_contains(target, bs) ? BDRV_REQ_SERIALISING : 0) |
+        (compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
     job->cluster_size = cluster_size;
     job->copy_bitmap = copy_bitmap;
     copy_bitmap = NULL;
-- 
2.18.0
[Qemu-devel] [PATCH v3 0/7] backup improvements
Hi all!

There are some fixes and refactorings I need on my way to resend my
backup-top series. It's obvious now that I need to share copying code
between backup and backup-top, as backup copying code becomes smarter and
more complicated. So the goal of the series is to make copying code more
share-able.

Based-on: https://github.com/jnsnow/qemu bitmaps

v3:
  03: Ha, fix real bug: we shouldn't touch src before handling write-zero,
      as src may be NULL. So, replace align and max_transfer calculation
      together with max_transfer == 0 check. Drop comment, as there is no
      special place for it now.
  04: add Max's r-b
  06-07: rewrite to keep bounce_buffer sharing between iterations and to
      limit allocation [Eric]

v2 (by Max's comments):
  02: Add Max's r-b
  03: - split out backup changes to 03
      - handle common max_transfer = 0 case
  04: split out from 02
  06: fix allocation size
  07: - rebase on 06 changes
      - add Max's r-b
  two patches are dropped or postponed for the next step

Vladimir Sementsov-Ogievskiy (7):
  block/backup: deal with zero detection
  block/backup: refactor write_flags
  block/io: handle alignment and max_transfer for copy_range
  block/backup: drop handling of max_transfer for copy_range
  block/backup: fix backup_cow_with_offload for last cluster
  block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  block/backup: merge duplicated logic into backup_do_cow

 block/backup.c | 154 ++---
 block/io.c     |  44 +++---
 blockdev.c     |   8 +--
 3 files changed, 110 insertions(+), 96 deletions(-)

-- 
2.18.0
[Qemu-devel] [PATCH v3 1/7] block/backup: deal with zero detection
We have detect_zeroes option, so at least for blockdev-backup user should define it if zero-detection is needed. For drive-backup leave detection enabled by default but do it through existing option instead of open-coding. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Max Reitz --- block/backup.c | 15 ++- blockdev.c | 8 2 files changed, 10 insertions(+), 13 deletions(-) diff --git a/block/backup.c b/block/backup.c index adc4d44244..d815436455 100644 --- a/block/backup.c +++ b/block/backup.c @@ -113,7 +113,10 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, BlockBackend *blk = job->common.blk; int nbytes; int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0; -int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0; +int write_flags = +(job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) | +(job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0); + assert(QEMU_IS_ALIGNED(start, job->cluster_size)); bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size); @@ -131,14 +134,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, goto fail; } -if (buffer_is_zero(*bounce_buffer, nbytes)) { -ret = blk_co_pwrite_zeroes(job->target, start, - nbytes, write_flags | BDRV_REQ_MAY_UNMAP); -} else { -ret = blk_co_pwrite(job->target, start, -nbytes, *bounce_buffer, write_flags | -(job->compress ? 
BDRV_REQ_WRITE_COMPRESSED : 0)); -} +ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer, +write_flags); if (ret < 0) { trace_backup_do_cow_write_fail(job, start, ret); if (error_is_read) { diff --git a/blockdev.c b/blockdev.c index 29c6c6044a..2d7e7be538 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3613,7 +3613,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn, BlockDriverState *source = NULL; BlockJob *job = NULL; AioContext *aio_context; -QDict *options = NULL; +QDict *options; Error *local_err = NULL; int flags; int64_t size; @@ -3686,10 +3686,10 @@ static BlockJob *do_drive_backup(DriveBackup *backup, JobTxn *txn, goto out; } +options = qdict_new(); +qdict_put_str(options, "discard", "unmap"); +qdict_put_str(options, "detect-zeroes", "unmap"); if (backup->format) { -if (!options) { -options = qdict_new(); -} qdict_put_str(options, "driver", backup->format); } -- 2.18.0
Re: [Qemu-devel] [PATCH v3 00/29] Tame a few "touch this, recompile the world" headers
Philippe Mathieu-Daudé writes: > Hi Markus, > > On 8/9/19 8:46 AM, Markus Armbruster wrote: >> We have quite a few "touch this, recompile the world" headers. My >> "build everything" tree has some 6600 objects (not counting tests and >> objects that don't depend on qemu/osdep.h). Touching any of 54 >> headers triggers a recompile of more than half of them. >> >> This series reduces them to 46. [...] >> Observed patterns of #include misuse: >> >> * Copy pasta >> >> I found and deleted quite a few #include that were almost certainly >> never needed. The most likely explanation is lazy copying from a >> "similar" file. My deletions produced only minor improvements, >> though. >> >> * "Convenience" headers >> >> We sometimes have a header include a bunch of other headers the >> header itself doesn't need, so the header's users don't have to. An >> extreme case is hw/hw.h: it pulls in more than 40 other headers, >> then declares just hw_error(). Most of its users need only a >> fraction of it. PATCH 08-09,12-18 fix that, trading the very >> occasional convenience of not having to type a few #include >> directives for build speed. >> >> * "Fat" headers >> >> Some headers provide many things to many customers. Bad when the >> customers generally need only parts. Worse when such a "fat" header >> pulls in loads more. This series grapples with three instances: >> qapi/qapi-types-common.h (PATCH 03), hw/boards.h, which pulls in >> almost 70 headers (PATCH 19-23), and sysemu/sysemu.h, which pulls in >> more than 20 (PATCH 23-28). >> >> * Design erosion >> >> Off-the-cuff additions to headers can erode design. For instance, >> the generated trace.h were carefully designed for minimal >> dependencies. We let them balloon when we added the per-vCPU >> tracing feature a few years later. PATCH 07 grapples with that. > > What can prevent us from these misuse patterns? Excellent question. > Will you redo this analysis/series after each releases? 
Perhaps I should first explain my analysis, i.e. where my numbers come from, then my cleanups. Counting #include directives is not useful by itself. For instance, #include "qemu/typedefs.h" occurs just once, yet almost all .o depend on it. Better: count the actual make dependencies. Compiling FOO.c generates FOO.d in addition to FOO.o. These .d list the .h the .o depend on. I wrote a stupid Python program to count how many .o depend on each .h. This identifies widely included headers. Also useful are the .d for the .c generated by "make check-headers": they tell us how many headers each header pulls in. Widely included headers that pull in lots more are particularly promising candidates for #include cleanup. The actual cleanup is manual work. Getting rid of copy pasta and "convenience" headers is pretty straightforward. "Fat" headers and design erosion can require more thought. Another difficult part is identifying how to order the cleanups for reviewability. Testing is straightforward, but slow; much world-recompiling for many, many worlds. I started looking into this back in 2016[1], and updated the analysis recently[2]. I contributed a few cleanups myself, and at least Paolo also chipped in. I can update the analysis more frequently if that helps. However, ... > How to automate misuse checks? > > Can we report some metrics and warn if there is a considerable discrepancy? ... detecting we've made things worse weeks or months after we did is clearly suboptimal. The actual numbers depend on build configuration. For reproducible numbers, we need to hold the configuration constant for all developers. I figure our containerized builds could do that for us. To detect metrics going south, we need a baseline, and a definition of "south". If we store the baseline in git and fail the test when the actual numbers are too "south" of the baseline, patches making things a lot worse will have to update the baseline. This should ensure proper scrutiny.
However, a slow slide south over multiple independent patches will arbitrarily blame the last one, while the others get off scot-free. I'm not sure how best to handle new headers. Having to update the baseline whenever you add a new header will likely add too much friction. Needs thought. Perhaps we can start with fully automating the measurement part, then passively gather data for a while to learn how the numbers change during development. [1] https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03271.html [2] https://lists.nongnu.org/archive/html/qemu-devel/2019-05/msg06291.html
Re: [Qemu-devel] [PATCH v6 24/42] block: Use child access functions for QAPI queries
09.08.2019 19:13, Max Reitz wrote: > query-block, query-named-block-nodes, and query-blockstats now return > any filtered child under "backing", not just bs->backing or COW > children. This is so that filters do not interrupt the reported backing > chain. This changes the output for iotest 184, as the throttled node > now appears as a backing child. > > Signed-off-by: Max Reitz > --- > block/qapi.c | 39 +++--- > tests/qemu-iotests/184.out | 7 ++- > 2 files changed, 30 insertions(+), 16 deletions(-) > > diff --git a/block/qapi.c b/block/qapi.c > index 9a185cba48..4f59ac1c0f 100644 > --- a/block/qapi.c > +++ b/block/qapi.c [..] > @@ -354,9 +357,9 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo > **p_info, > BlockDriverState *bs = blk_bs(blk); > char *qdev; > > -/* Skip automatically inserted nodes that the user isn't aware of */ > -while (bs && bs->drv && bs->implicit) { > -bs = backing_bs(bs); > +if (bs) { > +/* Skip automatically inserted nodes that the user isn't aware of */ > +bs = bdrv_skip_implicit_filters(bs); > } bdrv_skip_implicit_filters supports NULL, so it may be written without "if" Anyway: Reviewed-by: Vladimir Sementsov-Ogievskiy [..] -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback
09.08.2019 19:13, Max Reitz wrote: > If the driver does not implement bdrv_get_allocated_file_size(), we > should fall back to cumulating the allocated size of all non-COW > children instead of just bs->file. > > Suggested-by: Vladimir Sementsov-Ogievskiy > Signed-off-by: Max Reitz > --- > block.c | 22 -- > 1 file changed, 20 insertions(+), 2 deletions(-) > > diff --git a/block.c b/block.c > index 1070aa1ba9..6e1ddab056 100644 > --- a/block.c > +++ b/block.c > @@ -4650,9 +4650,27 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState > *bs) > if (drv->bdrv_get_allocated_file_size) { > return drv->bdrv_get_allocated_file_size(bs); > } > -if (bs->file) { > -return bdrv_get_allocated_file_size(bs->file->bs); > + > +if (!QLIST_EMPTY(>children)) { > +BdrvChild *child; > +int64_t child_size, total_size = 0; > + > +QLIST_FOREACH(child, >children, next) { > +if (child == bdrv_filtered_cow_child(bs)) { > +/* Ignore COW backing files */ > +continue; > +} > + > +child_size = bdrv_get_allocated_file_size(child->bs); > +if (child_size < 0) { > +return child_size; > +} > +total_size += child_size; > +} > + > +return total_size; > } > + > return -ENOTSUP; > } > > Hmm.. 1. No children -> -ENOTSUP 2. Only cow child -> 0 3. Some non-cow children -> SUM It's all arguable (the strictest way is -ENOTSUP in either case), but if we want to fallback to SUM of non-cow children, 1. and 2. should return the same. -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v6 20/42] block/snapshot: Fix fallback
09.08.2019 19:13, Max Reitz wrote: > If the top node's driver does not provide snapshot functionality and we > want to fall back to a node down the chain, we need to snapshot all > non-COW children. For simplicity's sake, just do not fall back if there > is more than one such child. > > bdrv_snapshot_goto() becomes a bit weird because we may have to redirect > the actual child pointer, so it only works if the fallback child is > bs->file or bs->backing (and then we have to find out which it is). > > Suggested-by: Vladimir Sementsov-Ogievskiy > Signed-off-by: Max Reitz > --- > block/snapshot.c | 100 +-- > 1 file changed, 79 insertions(+), 21 deletions(-) > > diff --git a/block/snapshot.c b/block/snapshot.c > index f2f48f926a..35403c167f 100644 > --- a/block/snapshot.c > +++ b/block/snapshot.c > @@ -146,6 +146,32 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState > *bs, > return ret; > } > > +/** > + * Return the child BDS to which we can fall back if the given BDS > + * does not support snapshots. > + * Return NULL if there is no BDS to (safely) fall back to. 
> + */ > +static BlockDriverState *bdrv_snapshot_fallback(BlockDriverState *bs) > +{ > +BlockDriverState *child_bs = NULL; > +BdrvChild *child; > + > +QLIST_FOREACH(child, >children, next) { > +if (child == bdrv_filtered_cow_child(bs)) { > +/* Ignore: COW children need not be included in snapshots */ > +continue; > +} > + > +if (child_bs) { > +/* Cannot fall back to a single child if there are multiple */ > +return NULL; > +} > +child_bs = child->bs; > +} > + > +return child_bs; > +} > + > int bdrv_can_snapshot(BlockDriverState *bs) > { > BlockDriver *drv = bs->drv; > @@ -154,8 +180,9 @@ int bdrv_can_snapshot(BlockDriverState *bs) > } > > if (!drv->bdrv_snapshot_create) { > -if (bs->file != NULL) { > -return bdrv_can_snapshot(bs->file->bs); > +BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs); > +if (fallback_bs) { > +return bdrv_can_snapshot(fallback_bs); > } > return 0; > } > @@ -167,14 +194,15 @@ int bdrv_snapshot_create(BlockDriverState *bs, >QEMUSnapshotInfo *sn_info) > { > BlockDriver *drv = bs->drv; > +BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs); > if (!drv) { > return -ENOMEDIUM; > } > if (drv->bdrv_snapshot_create) { > return drv->bdrv_snapshot_create(bs, sn_info); > } > -if (bs->file) { > -return bdrv_snapshot_create(bs->file->bs, sn_info); > +if (fallback_bs) { > +return bdrv_snapshot_create(fallback_bs, sn_info); > } > return -ENOTSUP; > } > @@ -184,6 +212,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs, > Error **errp) > { > BlockDriver *drv = bs->drv; > +BlockDriverState *fallback_bs; > int ret, open_ret; > > if (!drv) { > @@ -204,39 +233,66 @@ int bdrv_snapshot_goto(BlockDriverState *bs, > return ret; > } > > -if (bs->file) { > -BlockDriverState *file; > -QDict *options = qdict_clone_shallow(bs->options); > +fallback_bs = bdrv_snapshot_fallback(bs); > +if (fallback_bs) { > +QDict *options; > QDict *file_options; > Error *local_err = NULL; > +bool is_backing_child; > +BdrvChild **child_pointer; > + > +/* > + * We need a 
pointer to the fallback child pointer, so let us > + * see whether the child is referenced by a field in the BDS > + * object. > + */ > +if (fallback_bs == bs->file->bs) { > +is_backing_child = false; > +child_pointer = >file; > +} else if (fallback_bs == bs->backing->bs) { > +is_backing_child = true; > +child_pointer = >backing; > +} else { > +/* > + * The fallback child is not referenced by a field in the > + * BDS object. We cannot go on then. > + */ > +error_setg(errp, "Block driver does not support snapshots"); > +return -ENOTSUP; > +} > + Hmm.. Should not this check be included into bdrv_snapshot_fallback(), to work only with file and backing? And could we allow fallback only for filters? Is there real usecase except filters? Or may be, drop fallback at all? > > -file = bs->file->bs; > /* Prevent it from getting deleted when detached from bs */ > -bdrv_ref(file); > +bdrv_ref(fallback_bs); > > -qdict_extract_subqdict(options, _options, "file."); > +qdict_extract_subqdict(options, _options, > + is_backing_child ? "backing." : "file."); > qobject_unref(file_options); > -
Re: [Qemu-devel] [PATCH v6 18/42] block: Use CAFs in bdrv_refresh_filename()
09.08.2019 19:13, Max Reitz wrote: > bdrv_refresh_filename() and the kind of related bdrv_dirname() should > look to the primary child when they wish to copy the underlying file's > filename. > > Signed-off-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v6 15/42] block: Re-evaluate backing file handling in reopen
09.08.2019 19:13, Max Reitz wrote: > Reopening a node's backing child needs a bit of special handling because > the "backing" child has different defaults than all other children > (among other things). Adding filter support here is a bit more > difficult than just using the child access functions. In fact, we often > have to directly use bs->backing because these functions are about the > "backing" child (which may or may not be the COW backing file). > > Signed-off-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v6 16/42] block: Flush all children in generic code
09.08.2019 19:13, Max Reitz wrote: > If the driver does not support .bdrv_co_flush() so bdrv_co_flush() > itself has to flush the children of the given node, it should not flush > just bs->file->bs, but in fact all children. > > In any case, the BLKDBG_EVENT() should be emitted on the primary child, > because that is where a blkdebug node would be if there is any. > > Suggested-by: Vladimir Sementsov-Ogievskiy > Signed-off-by: Max Reitz > --- > block/io.c | 23 +-- > 1 file changed, 17 insertions(+), 6 deletions(-) > > diff --git a/block/io.c b/block/io.c > index c5a8e3e6a3..bcc770d336 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -2572,6 +2572,8 @@ static void coroutine_fn bdrv_flush_co_entry(void > *opaque) > > int coroutine_fn bdrv_co_flush(BlockDriverState *bs) > { > +BdrvChild *primary_child = bdrv_primary_child(bs); > +BdrvChild *child; > int current_gen; > int ret = 0; > > @@ -2601,7 +2603,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) > } > > /* Write back cached data to the OS even with cache=unsafe */ > -BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS); > +BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_OS); > if (bs->drv->bdrv_co_flush_to_os) { > ret = bs->drv->bdrv_co_flush_to_os(bs); > if (ret < 0) { > @@ -2611,15 +2613,15 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) > > /* But don't actually force it to the disk with cache=unsafe */ > if (bs->open_flags & BDRV_O_NO_FLUSH) { > -goto flush_parent; > +goto flush_children; > } > > /* Check if we really need to flush anything */ > if (bs->flushed_gen == current_gen) { > -goto flush_parent; > +goto flush_children; > } > > -BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK); > +BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_DISK); > if (!bs->drv) { > /* bs->drv->bdrv_co_flush() might have ejected the BDS >* (even in case of apparent success) */ > @@ -2663,8 +2665,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs) > /* Now flush the underlying protocol. 
It will also have BDRV_O_NO_FLUSH >* in the case of cache=unsafe, so there are no useless flushes. >*/ > -flush_parent: > -ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0; > +flush_children: > +ret = 0; > +QLIST_FOREACH(child, >children, next) { > +int this_child_ret; > + > +this_child_ret = bdrv_co_flush(child->bs); > +if (!ret) { > +ret = this_child_ret; > +} > +} Hmm, you said that we want to flush only children with write-access from parent.. Shouldn't we check it? Or we assume that it's always safe to call bdrv_co_flush on a node? > + > out: > /* Notify any pending flushes that we have completed */ > if (ret == 0) { > -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v6 14/42] block: Use CAFs when working with backing chains
09.08.2019 19:13, Max Reitz wrote: > Use child access functions when iterating through backing chains so > filters do not break the chain. > > Signed-off-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] RISC-V: Vector && DSP Extension
On 8/9/19 6:54 PM, Alistair Francis wrote: On Thu, Aug 8, 2019 at 2:52 AM liuzhiwei wrote: Hi all, My workmate and I have been working on the Vector & DSP extensions, and I'd like to share our development status with folks. Cool! The spec references are riscv-v-spec-0.7.1 for the Vector extension and riscv-p-spec-0.5 for the DSP extension. The code for the vector extension is ready and under testing; the first patch will be sent about two weeks later. After that we will move on to the DSP extension, and send the first patch in mid October. What code are you talking about? Is this QEMU code? Hi Alistair, It's the QEMU code I have been working on these days, which implements the Vector extension. It is under testing, and will be sent later. Could the maintainers tell me whether the specs referenced are appropriate? Is anyone working on these extensions? I'd like to get your status, and maybe discuss questions and work together. Just use the latest (master) from the ISA spec git repo. I will follow your advice. Thanks for your attention to this matter. Best Regards, Zhiwei I don't know anyone doing vector work for QEMU. It would be very useful, but everyone is busy with something at the moment unfortunately. Alistair Best Regards LIU Zhiwei
Re: [Qemu-devel] [PATCH v6 10/42] block: Drop bdrv_is_encrypted()
09.08.2019 19:13, Max Reitz wrote: > The original purpose of bdrv_is_encrypted() was to inquire whether a BDS > can be used without the user entering a password or not. It has not > been used for that purpose for quite some time. > > Actually, it is not even fit for that purpose, because to answer that > question, it would have recursively query all of the given node's > children. > > So now we have to decide in which direction we want to fix > bdrv_is_encrypted(): Recursively query all children, or drop it and just > use bs->encrypted to get the current node's status? > > Nowadays, its only purpose is to report through bdrv_query_image_info() > whether the given image is encrypted or not. For this purpose, it is > probably more interesting to see whether a given node itself is > encrypted or not (otherwise, a management application cannot discern for > certain which nodes are really encrypted and which just have encrypted > children). > > Suggested-by: Vladimir Sementsov-Ogievskiy > Signed-off-by: Max Reitz Reviewed-by: Vladimir Sementsov-Ogievskiy -- Best regards, Vladimir
Re: [Qemu-devel] RISC-V: Vector && DSP Extension
On 8/8/19 6:48 AM, Chih-Min Chao wrote: On Thu, Aug 8, 2019 at 7:29 PM Aleksandar Markovic <aleksandar.m.m...@gmail.com> wrote: On Thu, Aug 8, 2019 at 11:52 AM liuzhiwei <zhiwei_...@c-sky.com> wrote: > Hi all, > > My workmate and I have been working on Vector & Dsp extension, and > I'd like to share develop status with folks. > > The spec references for Vector extension is riscv-v-spec-0.7.1, and > riscv-p-spec-0.5 for DSP extension. Hello, Liu. I will not answer your questions directly; however, I want to bring to you and others another perspective on this situation. First, please provide the link to the specifications. Via Google, I found that "riscv-v-spec-0.7.1" is titled "Working draft of the proposed RISC-V V vector extension". I could not find "riscv-p-spec-0.5". I am not sure what the QEMU policy towards the "working draft proposal" type of specification is. Peter, can you perhaps clarify that or any other related issue? Hi Aleksandar, As for riscv-v-spec 0.7.1, it is the first stable spec for target software development, though the name says working draft. The architecture skeleton is fixed and most of the work is focused on issues related to micro-architecture implementation complexity. SiFive has released an open source implementation on the Spike simulator and Imperas also provides another implementation with its binary simulator. I think it is worth including the extension in QEMU at this moment. As for riscv-p-spec-0.5, I think Andes has fully supported this extension and should release a more detailed spec in the near future (described in the RISC-V Technical Update 2019/06). They have implemented lots of DSP kernels based on this extension and also provided impressive performance results. It is also worth reviewing (at least as [RFC]) once the detailed spec is public. ref: 1. https://content.riscv.org/wp-content/uploads/2019/06/17.40-Vector_RISCV-20190611-Vectors.pdf 2. https://content.riscv.org/wp-content/uploads/2019/06/17.20-P-ext-RVW-Zurich-20190611.pdf 3.
https://content.riscv.org/wp-content/uploads/2019/06/10.05-TechCommitteeUpdate-June-2019-Copy.pdf chihmin Hi chihmin, Thank you for the detailed and informative response. You have a very good understanding of the specifications. I will modify the code according to the latest spec (currently riscv-v-spec 0.7.2) from the ISA spec git repo as Alistair advised. Yours, Zhiwei I would advise some caution in these cases. The major issue is backward compatibility, but there are other issues too. Let's say, fairness. If we let emulation of a component based on a "working draft proposal" be integrated into QEMU, this will set a precedent, and many other developers would rightfully ask for their contributions based on drafts to be integrated into QEMU. Our policy should be as equal as possible to all contributions, large or small, riscv or alpha, cpu or device, tcg or kvm - in my honest opinion. QEMU upstream should not be a collecting place for all imaginable experimentation; certain criteria on what is appropriate for upstreaming exist and must continue to exist. Yours, Aleksandar > The code of vector extension is > ready and under testing, the first patch will be sent about two weeks > later. After that we will forward working on DSP extension, and send the > first patch in middle October. > > Could the maintainers tell me whether the specs referenced are > appropriate? Is anyone working on these extensions? I'd like to get > your status, and maybe discuss questions and work togather. > > Best Regards > > LIU Zhiwei
Re: [Qemu-devel] [PATCH v6 09/42] block: Include filters when freezing backing chain
09.08.2019 19:13, Max Reitz wrote: > In order to make filters work in backing chains, the associated > functions must be able to deal with them and freeze all filter links, be > they COW or R/W filter links. > > In the process, rename these functions to reflect that they now act on > generalized chains of filter nodes instead of backing chains alone. > > While at it, add some comments that note which functions require their > caller to ensure that a given child link is not frozen, and how the > callers do so. > > Signed-off-by: Max Reitz > --- > include/block/block.h | 10 +++--- > block.c | 81 +-- > block/commit.c| 8 ++--- > block/mirror.c| 4 +-- > block/stream.c| 8 ++--- > 5 files changed, 62 insertions(+), 49 deletions(-) > > diff --git a/include/block/block.h b/include/block/block.h > index 50a07c1c33..f6f09b95cd 100644 > --- a/include/block/block.h > +++ b/include/block/block.h > @@ -364,11 +364,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, > BlockDriverState *base, > BlockDriverState *bdrv_find_overlay(BlockDriverState *active, > BlockDriverState *bs); > BlockDriverState *bdrv_find_base(BlockDriverState *bs); > -bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState > *base, > - Error **errp); > -int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base, > - Error **errp); > -void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState > *base); > +bool bdrv_is_chain_frozen(BlockDriverState *bs, BlockDriverState *base, > + Error **errp); > +int bdrv_freeze_chain(BlockDriverState *bs, BlockDriverState *base, > + Error **errp); > +void bdrv_unfreeze_chain(BlockDriverState *bs, BlockDriverState *base); > > > typedef struct BdrvCheckResult { > diff --git a/block.c b/block.c > index adf82efb0e..650c00d182 100644 > --- a/block.c > +++ b/block.c > @@ -2303,12 +2303,15 @@ static void bdrv_replace_child_noperm(BdrvChild > *child, >* If @new_bs is not NULL, bdrv_check_perm() must be called beforehand, as > this >* 
function uses bdrv_set_perm() to update the permissions according to the > new >* reference that @new_bs gets. > + * > + * Callers must ensure that child->frozen is false. >*/ > static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs) > { > BlockDriverState *old_bs = child->bs; > uint64_t perm, shared_perm; > > +/* Asserts that child->frozen == false */ > bdrv_replace_child_noperm(child, new_bs); > > /* > @@ -2468,6 +2471,7 @@ static void bdrv_detach_child(BdrvChild *child) > g_free(child); > } > > +/* Callers must ensure that child->frozen is false. */ > void bdrv_root_unref_child(BdrvChild *child) > { > BlockDriverState *child_bs; > @@ -2477,10 +2481,6 @@ void bdrv_root_unref_child(BdrvChild *child) > bdrv_unref(child_bs); > } > > -/** > - * Clear all inherits_from pointers from children and grandchildren of > - * @root that point to @root, where necessary. > - */ Hmm, unrelated chunk? Without it: Reviewed-by: Vladimir Sementsov-Ogievskiy > static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild > *child) > { > BdrvChild *c; > @@ -2505,6 +2505,7 @@ static void bdrv_unset_inherits_from(BlockDriverState > *root, BdrvChild *child) [..] -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v2 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
10.08.2019 15:47, Eric Blake wrote: > On 8/10/19 7:12 AM, Vladimir Sementsov-Ogievskiy wrote: >> 09.08.2019 18:59, Eric Blake wrote: >>> On 8/9/19 10:32 AM, Vladimir Sementsov-Ogievskiy wrote: backup_cow_with_offload can transfer more than on cluster. Let >>> >>> s/on/one/ >>> backup_cow_with_bounce_buffer behave similarly. It reduces number of IO and there are no needs to copy cluster by cluster. >>> >>> It reduces the number of IO requests, since there is no need to copy >>> cluster by cluster. > - bool *error_is_read, - void **bounce_buffer) + bool *error_is_read) >>> >>> Why is this signature changing? >>> > >> >> 2. Actually it is a question about memory allocator: is it fast and optimized >> enough to not care, or is it bad, and we should work-around it like in >> mirror? And in my opinion (proved by a bit of benchmarking) memalign >> is fast enough to don't care. I was answering similar question in more >> details >> and with some numbers here: >> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00087.html >> >> So, I'd prefere to not care and keep code simpler. But if you don't agree, >> I can leave shared buffer here, at least until introduction of parallel >> requests. >> Then, it will be something like qemu_try_blockalign(MIN(bytes, 64M)).. > > It may still be worth capping at 64M. I'm not opposed to a change to > per-request allocation rather than trying to reuse a buffer, if it is > going to make parallelization efforts easier; but I am worried about the > maximum memory usage. I'm more worried that you are trying to cram two > distinct changes into one patch, and didn't even mention the change > about a change from buffer reuse to per-request allocations, in the > commit message. If you make that sort of change, it should be called > out in the commit message as intentional, or maybe even split to a > separate patch. > OK, I failed to hide it :) Will split out and add 64M limit. -- Best regards, Vladimir
Re: [Qemu-devel] [PATCH v2 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
On 8/10/19 7:12 AM, Vladimir Sementsov-Ogievskiy wrote: > 09.08.2019 18:59, Eric Blake wrote: >> On 8/9/19 10:32 AM, Vladimir Sementsov-Ogievskiy wrote: >>> backup_cow_with_offload can transfer more than on cluster. Let >> >> s/on/one/ >> >>> backup_cow_with_bounce_buffer behave similarly. It reduces number >>> of IO and there are no needs to copy cluster by cluster. >> >> It reduces the number of IO requests, since there is no need to copy >> cluster by cluster. >>> - bool *error_is_read, >>> - void **bounce_buffer) >>> + bool *error_is_read) >> >> Why is this signature changing? >> > > 2. Actually it is a question about memory allocator: is it fast and optimized > enough to not care, or is it bad, and we should work-around it like in > mirror? And in my opinion (proved by a bit of benchmarking) memalign > is fast enough to don't care. I was answering similar question in more details > and with some numbers here: > https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00087.html > > So, I'd prefere to not care and keep code simpler. But if you don't agree, > I can leave shared buffer here, at least until introduction of parallel > requests. > Then, it will be something like qemu_try_blockalign(MIN(bytes, 64M)).. It may still be worth capping at 64M. I'm not opposed to a change to per-request allocation rather than trying to reuse a buffer, if it is going to make parallelization efforts easier; but I am worried about the maximum memory usage. I'm more worried that you are trying to cram two distinct changes into one patch, and didn't even mention the change about a change from buffer reuse to per-request allocations, in the commit message. If you make that sort of change, it should be called out in the commit message as intentional, or maybe even split to a separate patch. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH v2 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once
09.08.2019 18:59, Eric Blake wrote: > On 8/9/19 10:32 AM, Vladimir Sementsov-Ogievskiy wrote: >> backup_cow_with_offload can transfer more than on cluster. Let > > s/on/one/ > >> backup_cow_with_bounce_buffer behave similarly. It reduces number >> of IO and there are no needs to copy cluster by cluster. > > It reduces the number of IO requests, since there is no need to copy > cluster by cluster. > >> >> Signed-off-by: Vladimir Sementsov-Ogievskiy >> --- >> block/backup.c | 29 +++-- >> 1 file changed, 15 insertions(+), 14 deletions(-) >> >> diff --git a/block/backup.c b/block/backup.c >> index d482d93458..155e21d0a3 100644 >> --- a/block/backup.c >> +++ b/block/backup.c >> @@ -104,22 +104,25 @@ static int coroutine_fn >> backup_cow_with_bounce_buffer(BackupBlockJob *job, >> int64_t start, >> int64_t end, >> bool >> is_write_notifier, >> - bool *error_is_read, >> - void **bounce_buffer) >> + bool *error_is_read) > > Why is this signature changing? > >> { >> int ret; >> BlockBackend *blk = job->common.blk; >> int nbytes; >> int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0; >> +void *bounce_buffer; >> >> assert(QEMU_IS_ALIGNED(start, job->cluster_size)); >> -bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size); >> -nbytes = MIN(job->cluster_size, job->len - start); >> -if (!*bounce_buffer) { >> -*bounce_buffer = blk_blockalign(blk, job->cluster_size); > > Pre-patch, you allocate the bounce_buffer at most once (but limited to a > cluster size), post-patch, you are now allocating and freeing a bounce > buffer every iteration through. There may be fewer calls because you > are allocating something bigger, but still, isn't it a good goal to try > and allocate the bounce buffer as few times as possible and reuse it, > rather than getting a fresh one each iteration? Yes, it's a "degradation" of this patch, I was afraid of this question. However, I doubt that it should be optimized: 1. 
I'm going to run several copy requests in coroutines to parallelize the copying loop, to improve performance (series for qcow2 are here https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06654.html), so we'll need several buffers for parallel copy requests, and it's much easier to allocate a buffer when it's needed and free it afterwards than to do some allocated-memory sharing like in mirror. 2. Actually it is a question about the memory allocator: is it fast and optimized enough not to care, or is it bad, so that we should work around it like in mirror? And in my opinion (supported by a bit of benchmarking) memalign is fast enough not to care. I answered a similar question in more detail and with some numbers here: https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00087.html So, I'd prefer not to care and keep the code simpler. But if you don't agree, I can keep the shared buffer here, at least until the introduction of parallel requests. Then, it will be something like qemu_try_blockalign(MIN(bytes, 64M)).. > >> + >> +nbytes = MIN(end - start, job->len - start); > > What is the largest this will be? Will it exhaust memory on a 32-bit or > otherwise memory-constrained system, where you should have some maximum > cap for how large the bounce buffer will be while still getting better > performance than one cluster at a time and not slowing things down by > over-sizing malloc? 64M, maybe? > Hmm, good point. I thought that it was safe to allocate a buffer for a full request, since if such a buffer is already allocated for the guest request itself, it should not be bad to allocate another one of the same size. But I forgot about write-zeroes and discard operations, which may be large without any allocation, and here we need to allocate anyway. Hmm2, but we have parallel guest writes (discards) and therefore parallel copy-before-write operations. So the total allocation is not limited anyway, and that should be fixed somehow too.. -- Best regards, Vladimir
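The capping idea Eric raises could look something like the following minimal sketch. The 64 MiB limit and the helper name are assumptions taken from the discussion above, not code from the patch itself:

```c
#include <stdint.h>

/* Hypothetical cap from the thread: never allocate more than 64 MiB for
 * a bounce buffer, even when the request itself is larger; the copy loop
 * would then iterate, transferring at most this much per allocation. */
#define BOUNCE_BUFFER_CAP ((int64_t)64 * 1024 * 1024)

static int64_t bounce_buffer_size(int64_t start, int64_t end, int64_t job_len)
{
    int64_t nbytes = end - start;

    if (job_len - start < nbytes) {
        nbytes = job_len - start;      /* clamp to the end of the disk */
    }
    if (nbytes > BOUNCE_BUFFER_CAP) {
        nbytes = BOUNCE_BUFFER_CAP;    /* avoid over-sizing the allocation */
    }
    return nbytes;
}
```

This keeps the per-iteration allocate/free simplicity Vladimir prefers while bounding memory use on constrained hosts; it is only an illustration of the trade-off, not a proposed patch.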
Re: [Qemu-devel] [PATCH] iotests: Fix 141 when run with qed
09.08.2019 21:52, Max Reitz wrote: > 69f47505ee has changed qcow2 in such a way that the commit job run in > test 141 (and 144[1]) returns before it emits the READY event. However, > 141 also runs with qed, where the order is still the other way around. > Just filter out the {"return": {}} so the test passes for qed again. > > [1] 144 only runs with qcow2, so it is fine as it is. > > Suggested-by: Vladimir Sementsov-Ogievskiy > Fixes: 69f47505ee66afaa513305de0c1895a224e52c45 > Signed-off-by: Max Reitz Hm, I don't exactly remember, but these three lines look like you are doing my work; I'm sorry for this :( Reviewed-by: Vladimir Sementsov-Ogievskiy > --- > tests/qemu-iotests/141 | 9 +++-- > tests/qemu-iotests/141.out | 5 - > tests/qemu-iotests/common.filter | 5 + > 3 files changed, 12 insertions(+), 7 deletions(-) > > diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141 > index 2197a82d45..8c2ae79f2b 100755 > --- a/tests/qemu-iotests/141 > +++ b/tests/qemu-iotests/141 > @@ -58,16 +58,21 @@ test_blockjob() > }}}" \ > 'return' > > +# If "$2" is an event, we may or may not see it before the > +# {"return": {}}. Therefore, filter the {"return": {}} out both > +# here and in the next command. (Naturally, if we do not see it Here "it" is not the event but "{"return": {}}", which is a bit confusing, since previously we were speaking about seeing the event.. > +# here, we will see it before the next command can be executed, > +# so it will appear in the next _send_qemu_cmd's output.)
> _send_qemu_cmd $QEMU_HANDLE \ > "$1" \ > "$2" \ > -| _filter_img_create > +| _filter_img_create | _filter_qmp_empty_return > > # We want this to return an error because the block job is still running > _send_qemu_cmd $QEMU_HANDLE \ > "{'execute': 'blockdev-del', > 'arguments': {'node-name': 'drv0'}}" \ > -'error' | _filter_generated_node_ids > +'error' | _filter_generated_node_ids | _filter_qmp_empty_return > > _send_qemu_cmd $QEMU_HANDLE \ > "{'execute': 'block-job-cancel', > diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out > index 4d71d9dcae..dbd3bdef6c 100644 > --- a/tests/qemu-iotests/141.out > +++ b/tests/qemu-iotests/141.out > @@ -10,7 +10,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 > backing_file=TEST_DIR/m. > Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 > backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}} > -{"return": {}} > {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}} > {"return": {}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}} > @@ -27,7 +26,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 > backing_file=TEST_DIR/t. 
> {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": > 0, "type": "mirror"}} > -{"return": {}} > {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block > device is in use by block job: mirror"}} > {"return": {}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}} > @@ -42,7 +40,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 > backing_file=TEST_DIR/t. > {"return": {}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}} > -{"return": {}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}} > {"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": > "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": > 0, "type": "commit"}} > {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block > device is in use by block job: commit"}} > @@ -61,7 +58,6 @@ wrote 1048576/1048576 bytes at offset 0 > {"return": {}} >
Re: [Qemu-devel] backup bug or question
09.08.2019 23:13, John Snow wrote: > > > On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote: >> Hi! >> >> Hmm, hacking around backup I have a question: >> >> What prevents guest write request after job_start but before setting >> write notifier? >> >> code path: >> >> qmp_drive_backup or transaction with backup >> >> job_start >> aio_co_enter(job_co_entry) /* may only schedule execution, isn't it >> ? */ >> >> >> >> job_co_entry >> job_pause_point() /* it definitely yields, isn't it bad? */ >> job->driver->run() /* backup_run */ >> >> >> >> backup_run() >> bdrv_add_before_write_notifier() >> >> ... >> > > I think you're right... :( > > > We create jobs like this: > > job->paused= true; > job->pause_count = 1; > > > And then job_start does this: > > job->co = qemu_coroutine_create(job_co_entry, job); > job->pause_count--; > job->busy = true; > job->paused = false; > > > Which means that job_co_entry is being called before we lift the pause: > > assert(job && job->driver && job->driver->run); > job_pause_point(job); > job->ret = job->driver->run(job, >err); > > ...Which means that we are definitely yielding in job_pause_point. > > Yeah, that's a race condition waiting to happen. > >> And what guarantees we give to the user? Is it guaranteed that write >> notifier is >> set when qmp command returns? >> >> And I guess, if we start several backups in a transaction it should be >> guaranteed >> that the set of backups is consistent and correspond to one point in time... >> > > I would have hoped that maybe the drain_all coupled with the individual > jobs taking drain_start and drain_end would save us, but I guess we > simply don't have a guarantee that all backup jobs WILL have installed > their handler by the time the transaction ends. > > Or, if there is that guarantee, I don't know what provides it, so I > think we shouldn't count on it accidentally working anymore. > > > > I think we should do two things: > > 1. Move the handler installation to creation time. > 2. 
Modify backup_before_write_notify to return without invoking > backup_do_cow if the job isn't started yet. > Hmm, I don't see how it helps.. A no-op write notifier will not save us from guest writes, will it? -- Best regards, Vladimir
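The window being discussed can be modelled in a few lines. This is only an illustration of the ordering problem, not QEMU code; the struct and function names are made up:

```c
#include <stdbool.h>

/* Toy model of the race: the copy-before-write notifier is only
 * installed inside the job coroutine (backup_run), which may yield in
 * job_pause_point() first, so a guest write issued in between is never
 * copied out and silently corrupts the point-in-time backup. */
typedef struct {
    bool notifier_installed;
    int unprotected_writes;      /* writes not covered by copy-before-write */
} BackupModel;

static void guest_write(BackupModel *job)
{
    if (!job->notifier_installed) {
        job->unprotected_writes++;   /* old data is lost for the backup */
    }
}

static void backup_run(BackupModel *job)
{
    /* Installing the notifier here is too late for any write that
     * already landed while the coroutine was waiting to run. */
    job->notifier_installed = true;
}
```

The model also shows why the proposed no-op early return does not help: whether the notifier is absent or installed-but-inert, the write in the window is still not copied before being overwritten.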
Re: [Qemu-devel] [PATCH v1 1/7] fpu: move LIT64 helper to softfloat-types
Peter Maydell writes: > On Thu, 8 Aug 2019 at 17:41, Alex Bennée wrote: >> >> This simple pasting helper can be used by those who don't need the >> entire softfloat api. Move it to the smaller types header. >> >> Signed-off-by: Alex Bennée >> --- >> include/fpu/softfloat-types.h | 2 ++ >> include/fpu/softfloat.h | 2 -- >> 2 files changed, 2 insertions(+), 2 deletions(-) > > I think we should be trying to get rid of uses of this > macro, not making it easier to use in more places... True - that is a more invasive (but mechanical) patch. -- Alex Bennée
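For context, LIT64 is just a token-pasting convenience for 64-bit literals, along these lines (a sketch; the exact suffix in the real header may differ, e.g. LL vs ULL):

```c
#include <stdint.h>

/* Sketch of the "simple pasting helper": appends a 64-bit suffix to an
 * integer literal so it is well-formed on hosts where plain integer
 * literals are only 32 bits wide. */
#define LIT64(a) a##ULL

/* Example use: the sign bit of a float64. */
static const uint64_t float64_sign_mask = LIT64(0x8000000000000000);
```

Since C99 guarantees `UINT64_C()` in <stdint.h> for exactly this purpose, removing the macro as Peter suggests is mostly a mechanical substitution.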