Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3

2019-08-10 Thread piaojun



On 2019/8/9 16:21, Stefan Hajnoczi wrote:
> On Thu, Aug 08, 2019 at 10:53:16AM +0100, Dr. David Alan Gilbert wrote:
>> * Stefan Hajnoczi (stefa...@redhat.com) wrote:
>>> On Wed, Aug 07, 2019 at 04:57:15PM -0400, Vivek Goyal wrote:
>>> 2. Can MAP/UNMAP be performed directly in QEMU via a separate virtqueue?
>>
>> I think there's two things to solve here that I don't currently know the
>> answer to:
>>   2a) We'd need to get the fd to qemu for the thing to mmap;
>>   we might be able to cache the fd on the qemu side for existing
>>   mappings, so when asking for a new mapping for an existing file then
>>   it would already have the fd.
>>
>>   2b) Running a device with a mix of queues inside QEMU and on
>>   vhost-user; I don't think we have anything with that mix
> 
> vhost-user-net works in the same way.  The ctrl queue is handled by QEMU
> and the rx/tx queues by the vhost device.  This is in fact how vhost was
> initially designed: the vhost device is not a full virtio device, only
> the dataplane.

Agreed.

> 
>>> 3. Can READ/WRITE be performed directly in QEMU via a separate virtqueue
>>>to eliminate the bad address problem?
>>
>> Are you thinking of doing all read/writes that way, or just the corner
>> cases? It doesn't seem worth it for the corner cases unless you're
>> finding them cropping up in real work loads.
> 
> Send all READ/WRITE requests to QEMU instead of virtiofsd.
> 
> Only handle metadata requests in virtiofsd (OPEN, RELEASE, READDIR,
> MKDIR, etc).
> 

Sorry for not catching your point. I would prefer virtiofsd to handle the
READ/WRITE requests and QEMU to handle the metadata requests, as virtiofsd
is good at processing dataplane work thanks to its thread pool and (maybe
in the future) CPU affinity. As you said, virtiofsd is acting as a
vhost-user device, which should care less about ctrl requests.

If our concern is improving mmap/read/write performance, why not add a
delayed worker for unmap, which could reduce the number of unmap
operations? Maybe virtiofsd could still handle both data and metadata
requests that way.
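
As a purely illustrative sketch of that idea (not virtiofsd code; the table
size, timing policy and function names below are all made up): a delayed-unmap
worker could park unmap requests in a small table, let a later map of the same
range cancel the pending unmap, and only issue the real unmap once an entry
grows old or the table overflows.

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define PENDING_MAX 16                            /* made-up capacity */
#define PENDING_MAX_AGE_NS (100 * 1000 * 1000ull) /* flush after ~100ms */

typedef struct {
    uint64_t moffset;   /* offset into the DAX window */
    uint64_t len;
    uint64_t queued_ns;
    bool used;
} PendingUnmap;

static PendingUnmap pending[PENDING_MAX];

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

/* Stand-in for the real unmap (REMOVEMAPPING) work. */
static void do_unmap(uint64_t moffset, uint64_t len)
{
    printf("unmap moffset=%" PRIu64 " len=%" PRIu64 "\n", moffset, len);
}

/* Instead of unmapping right away, park the request in the table. */
static void queue_unmap(uint64_t moffset, uint64_t len)
{
    for (int i = 0; i < PENDING_MAX; i++) {
        if (!pending[i].used) {
            pending[i] = (PendingUnmap){ moffset, len, now_ns(), true };
            return;
        }
    }
    do_unmap(moffset, len);    /* table full: unmap immediately */
}

/* Mapping the same range again cancels the pending unmap. */
static void map_range(uint64_t moffset, uint64_t len)
{
    for (int i = 0; i < PENDING_MAX; i++) {
        if (pending[i].used && pending[i].moffset == moffset &&
            pending[i].len == len) {
            pending[i].used = false;    /* reuse the range, skip the unmap */
            break;
        }
    }
    printf("map   moffset=%" PRIu64 " len=%" PRIu64 "\n", moffset, len);
}

/* Periodic worker: only unmap ranges that stayed unused for a while. */
static void flush_expired(void)
{
    uint64_t now = now_ns();

    for (int i = 0; i < PENDING_MAX; i++) {
        if (pending[i].used && now - pending[i].queued_ns > PENDING_MAX_AGE_NS) {
            do_unmap(pending[i].moffset, pending[i].len);
            pending[i].used = false;
        }
    }
}

int main(void)
{
    map_range(0, 2 << 20);
    queue_unmap(0, 2 << 20);    /* parked, not executed yet */
    map_range(0, 2 << 20);      /* cancels the pending unmap above */
    queue_unmap(0, 2 << 20);
    flush_expired();            /* too young, still parked */
    return 0;
}

The interesting knob is the age threshold: too small and nothing gets
coalesced, too large and the host keeps DAX ranges pinned longer than
necessary.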

Jun



Re: [Qemu-devel] [RFC PATCH v2 00/39] rewrite MMX/SSE instruction translation

2019-08-10 Thread Richard Henderson
On 8/9/19 9:12 PM, Jan Bobek wrote:
> This is a v2 of the patch series posted in [1]. Patches 1-9 are just
> cleanups; patches 10-39 are something actually interesting. Compared
> to v1, I started using preprocessor more extensively to generate
> repetitive boilerplate code; opinions/alternatives are welcome and
> appreciated.

This is tricky.  I'm not keen on code entirely expanded via macros because it
becomes extremely difficult to debug.  All statements get attributed to the
single line where the macro is expanded, which makes the gdb "step" command
run through the entire function because there is no next line.
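
(As a toy illustration of that effect, unrelated to the patch itself: compile
both variants below with -g and compare what "step" does in gdb. The macro
expansion carries a single line of debug info for all three statements, while
the inline function gets one line per statement.)

#include <stdio.h>

#define BODY_AS_MACRO(x) do { (x)++; (x) *= 2; printf("%d\n", (x)); } while (0)

static inline void body_as_function(int x)
{
    x++;                   /* gdb can stop here...            */
    x *= 2;                /* ...and here...                  */
    printf("%d\n", x);     /* ...and here, one line at a time */
}

int main(void)
{
    int v = 1;

    BODY_AS_MACRO(v);      /* three statements, one line of debug info */
    body_as_function(1);
    return 0;
}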

Once upon a time I wrote some code that's extremely macro crazy:

https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=soft-fp/op-common.h;hb=HEAD

It has been extremely difficult to maintain over the years.

We have just recently gotten rid of some of the macros in the softmmu code

https://patchwork.ozlabs.org/project/qemu-devel/list/?series=105441

replacing most of them with inline functions.

A lot of what you have needs very little adjustment to address the debugging
problem.  E.g.

> +#define INSNOP_INIT(opT, init_stmt)\
> +static int insnop_init(opT)(CPUX86State *env, DisasContext *s, \
> +int modrm, insnop_t(opT) *op)  \
> +{  \
> +init_stmt; \
> +}

> +INSNOP(
> +M, TCGv,
> +do {
> +if (decode_modrm_mod(env, s, modrm) == 3) {
> +INSNOP_INIT_FAIL;
> +} else {
> +INSNOP_INIT_OK(s->A0);
> +}
> +} while (0),
> +do {
> +assert(*op == s->A0);
> +gen_lea_modrm(env, s, modrm);
> +} while (0),
> +INSNOP_FINALIZE_NOOP)

Rearrange this as

#define INSNOP_INIT(OPT) \
static bool insnop_##OPT##_init(CPUX86State *env, DisasContext *s, \
int modrm, insnop_##OPT##_t *op)

#define INSNOP_PREPARE(OPT) \
static void insnop_##OPT##_prepare(CPUX86State *env, DisasContext *s, \
   int modrm, insnop_##OPT##_t *op)

INSNOP_INIT(M)
{
if (decode_modrm_mod(env, s, modrm) == 3) {
INSNOP_INIT_FAIL;
} else {
INSNOP_INIT_OK(s->A0);
}
}

INSNOP_PREPARE(M)
{
assert(*op == s->A0);
gen_lea_modrm(env, s, modrm);
}

etc and suddenly the entire expansion does not occur on a single line.

Further specific commentary to follow.


r~



Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3

2019-08-10 Thread Liu Bo
On Thu, Aug 08, 2019 at 08:53:20AM -0400, Vivek Goyal wrote:
> On Thu, Aug 08, 2019 at 10:53:16AM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > On Wed, Aug 07, 2019 at 04:57:15PM -0400, Vivek Goyal wrote:
> > > > Kernel also serializes MAP/UNMAP on one inode. So you will need to run
> > > > multiple jobs operating on different inodes to see parallel MAP/UNMAP
> > > > (atleast from kernel's point of view).
> > > 
> > > Okay, there is still room to experiment with how MAP and UNMAP are
> > > handled by virtiofsd and QEMU even if the host kernel ultimately becomes
> > > the bottleneck.
> > > 
> > > One possible optimization is to eliminate REMOVEMAPPING requests when
> > > the guest driver knows a SETUPMAPPING will follow immediately.  I see
> > > the following request pattern in a fio randread iodepth=64 job:
> > > 
> > >   unique: 995348, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, 
> > > pid: 1351
> > >   lo_setupmapping(ino=135, fi=0x(nil), foffset=3860856832, len=2097152, 
> > > moffset=859832320, flags=0)
> > >  unique: 995348, success, outsize: 16
> > >   unique: 995350, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, 
> > > pid: 12
> > >  unique: 995350, success, outsize: 16
> > >   unique: 995352, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, 
> > > pid: 1351
> > >   lo_setupmapping(ino=135, fi=0x(nil), foffset=16777216, len=2097152, 
> > > moffset=861929472, flags=0)
> > >  unique: 995352, success, outsize: 16
> > >   unique: 995354, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, 
> > > pid: 12
> > >  unique: 995354, success, outsize: 16
> > >   virtio_send_msg: elem 9: with 1 in desc of length 16
> > >   unique: 995356, opcode: SETUPMAPPING (48), nodeid: 135, insize: 80, 
> > > pid: 1351
> > >   lo_setupmapping(ino=135, fi=0x(nil), foffset=383778816, len=2097152, 
> > > moffset=864026624, flags=0)
> > >  unique: 995356, success, outsize: 16
> > >   unique: 995358, opcode: REMOVEMAPPING (49), nodeid: 135, insize: 60, 
> > > pid: 12
> > > 
> > > The REMOVEMAPPING requests are unnecessary since we can map over the top
> > > of the old mapping instead of taking the extra step of removing it
> > > first.
> > 
> > Yep, those should go - I think Vivek likes to keep them for testing
> > since they make things fail more completely if there's a screwup.
> 
> I like to keep them because otherwise they keep the resources busy
> on host. If DAX range is being used immediately, then this optimization
> makes more sense. I will keep this in mind.
>

Other than the resources not being released, do you think there would be
any stale-data problem if we don't do REMOVEMAPPING at all, with neither
background reclaim nor inline reclaim?
(truncate/punch_hole/evict_inode still needs to remove mappings, though)

thanks,
-liubo

> > 
> > > Some more questions to consider for DAX performance optimization:
> > > 
> > > 1. Is FUSE_READ/FUSE_WRITE more efficient than DAX for some I/O patterns?
> > 
> > Probably for cases where the data is only accessed once, and you can't
> > preemptively map.
> > Another variant on (1) is whether we could do read/writes while the mmap
> > is happening to absorb the latency.
> 
> For small random I/O, dax might not be very effective. Overhead of
> setting up mapping and tearing it down is significant.
> 
> Vivek
> 
> ___
> Virtio-fs mailing list
> virtio...@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs



Re: [Qemu-devel] [Virtio-fs] [PATCH 0/4] virtiofsd: multithreading preparation part 3

2019-08-10 Thread Liu Bo
On Fri, Aug 09, 2019 at 09:21:02AM +0100, Stefan Hajnoczi wrote:
> On Thu, Aug 08, 2019 at 10:53:16AM +0100, Dr. David Alan Gilbert wrote:
> > * Stefan Hajnoczi (stefa...@redhat.com) wrote:
> > > On Wed, Aug 07, 2019 at 04:57:15PM -0400, Vivek Goyal wrote:
> > > 2. Can MAP/UNMAP be performed directly in QEMU via a separate virtqueue?
> > 
> > I think there's two things to solve here that I don't currently know the
> > answer to:
> >   2a) We'd need to get the fd to qemu for the thing to mmap;
> >   we might be able to cache the fd on the qemu side for existing
> >   mappings, so when asking for a new mapping for an existing file then
> >   it would already have the fd.
> > 
> >   2b) Running a device with a mix of queues inside QEMU and on
> >   vhost-user; I don't think we have anything with that mix
> 
> vhost-user-net works in the same way.  The ctrl queue is handled by QEMU
> and the rx/tx queues by the vhost device.  This is in fact how vhost was
> initially designed: the vhost device is not a full virtio device, only
> the dataplane.

> > > 3. Can READ/WRITE be performed directly in QEMU via a separate virtqueue
> > >to eliminate the bad address problem?
> > 
> > Are you thinking of doing all read/writes that way, or just the corner
> > cases? It doesn't seem worth it for the corner cases unless you're
> > finding them cropping up in real work loads.
> 
> Send all READ/WRITE requests to QEMU instead of virtiofsd.
> 
> Only handle metadata requests in virtiofsd (OPEN, RELEASE, READDIR,
> MKDIR, etc).

For now QEMU is not aware of virtio-fs's fd info, but I think it's doable.
I like the idea.

thanks,
-liubo
> 
> > > I'm not going to tackle DAX optimization myself right now but wanted to
> > > share these ideas.
> > 
> > One I was thinking about that feels easier than (2) was to change the
> > vhost slave protocol to be split transaction; it wouldn't do anything
> > for the latency but it would be able to do some in parallel if we can
> > get the kernel to feed it.
> 
> There are two cases:
> 1. mmapping multiple inodes.  This should benefit from parallelism,
>although mmap is still expensive because it involves TLB shootdown
>for all other threads running this process.
> 2. mmapping the same inode.  Here the host kernel is likely to serialize
>mmaps even more, making it hard to gain performance.
> 
> It's probably worth writing a tiny benchmark first to evaluate the
> potential gains.
> 
> Stefan
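
A tiny benchmark along those lines might look like the sketch below (not part
of any patch; thread count, iteration count and the 2 MiB window are arbitrary
placeholders, compile with "cc -O2 -pthread"). Run it against a single file to
approximate case 2 and against one file per thread to approximate case 1, then
compare the map+unmap rates:

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define THREADS 4                    /* arbitrary */
#define ITERS   10000                /* arbitrary */
#define LEN     (2 * 1024 * 1024)    /* 2 MiB, like the DAX chunk size */

static const char *paths[THREADS];

static void *worker(void *arg)
{
    int fd = open(paths[(long)arg], O_RDONLY);

    if (fd < 0) {
        perror("open");
        return NULL;
    }
    for (int i = 0; i < ITERS; i++) {
        void *p = mmap(NULL, LEN, PROT_READ, MAP_SHARED, fd, 0);

        if (p == MAP_FAILED) {
            perror("mmap");
            break;
        }
        munmap(p, LEN);
    }
    close(fd);
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t tid[THREADS];
    struct timespec t0, t1;
    double secs;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file [file...]\n", argv[0]);
        return 1;
    }
    for (int i = 0; i < THREADS; i++) {
        /* one file: same inode for everyone; several: one inode per thread */
        paths[i] = argv[1 + i % (argc - 1)];
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < THREADS; i++) {
        pthread_create(&tid[i], NULL, worker, (void *)i);
    }
    for (int i = 0; i < THREADS; i++) {
        pthread_join(tid[i], NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d map+unmap pairs in %.3f s (%.0f per second)\n",
           THREADS * ITERS, secs, THREADS * ITERS / secs);
    return 0;
}

It never touches the mapped pages, so it only exercises the mmap/munmap path
itself; faulting the pages in before unmapping would add page-fault and
TLB-shootdown costs on top.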



> ___
> Virtio-fs mailing list
> virtio...@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs



Re: [Qemu-devel] [PATCH v3 20/29] Include qemu/main-loop.h less

2019-08-10 Thread Markus Armbruster
Alex Bennée  writes:

> Markus Armbruster  writes:
>
>> Philippe Mathieu-Daudé  writes:
>>
>>> On 8/9/19 8:46 AM, Markus Armbruster wrote:
 In my "build everything" tree, changing qemu/main-loop.h triggers a
 recompile of some 5600 out of 6600 objects (not counting tests and
 objects that don't depend on qemu/osdep.h).  It includes block/aio.h,
 which in turn includes qemu/event_notifier.h, qemu/notify.h,
 qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h,
 qemu/thread.h, qemu/timer.h, and a few more.

 Include qemu/main-loop.h only where it's needed.  Touching it now
 recompiles only some 1700 objects.  For block/aio.h and
 qemu/event_notifier.h, these numbers drop from 5600 to 2800.  For the
 others, they shrink only slightly.

 Signed-off-by: Markus Armbruster 
 ---
>>> [...]
 diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
 index 77f5df59b0..ac18a1184a 100644
 --- a/include/sysemu/sysemu.h
 +++ b/include/sysemu/sysemu.h
 @@ -5,7 +5,6 @@
  #include "qapi/qapi-types-run-state.h"
  #include "qemu/timer.h"
  #include "qemu/notify.h"
 -#include "qemu/main-loop.h"
  #include "qemu/bitmap.h"
  #include "qemu/uuid.h"
  #include "qom/object.h"
>>>
>>> netmap failing again :S
>>>
>>> $ make docker-image-debian-amd64 V=1 DEBUG=1
>>> [...]
>>>   CC  net/netmap.o
>>> net/netmap.c: In function 'netmap_update_fd_handler':
>>> net/netmap.c:109:5: error: implicit declaration of function
>>> 'qemu_set_fd_handler' [-Werror=implicit-function-declaration]
>>>  qemu_set_fd_handler(s->nmd->fd,
>>>  ^~~
>>> net/netmap.c:109:5: error: nested extern declaration of
>>> 'qemu_set_fd_handler' [-Werror=nested-externs]
>>
>> I managed to lose the fix somehow.
>>
>> I admit I ran "make docker-test-build", realized docker needs root, and
>> went "sod it, cross fingers & send out the patches".
>
> I've sent some patches to make docker-test-build more closely resemble
> what shippable exercises.
>
> As for root you can setup a docker group and do it that way (see the
> docs in docs/devel/testing.rst). It's not recommended for production
> machines as it makes escalation fairly trivial (the daemon itself still
> runs as root).

As Dan Walsh explained in a blog post[*], access to the docker socket is
equivalent to root.  Might be okay on a throwaway or special-purpose
box, but definitely not on my desktop.

The solution the blog post recommends for now is sudo with password,
which I consider only marginally better: instead of leaving the safe
door open, we install a security camera to log access to the safe,
*then* leave the safe door open.  Just in case whoever helps himself to
the contents of the safe is too lazy to help himself to the logs, too.

In the great tradition of throwing security under the bus to get work
done, I set up sudo.  Avoiding NOPASSWD: turns out to be impractical.

Running "make docker-test-build" fails for me on master (v4.1.0-rc4),
details appended.

>Hopefully Marc's podman support:
>
>   Subject: [PATCH v2 0/5] tests/docker: add podman support
>   Date: Tue,  9 Jul 2019 23:43:25 +0400
>   Message-Id: <20190709194330.837-1-marcandre.lur...@redhat.com>
>
> will make these requirements a little less onerous.

Sounds like a much needed upgrade to me.

[...]

[*] 
https://www.projectatomic.io/blog/2015/08/why-we-dont-let-non-root-users-run-docker-in-centos-fedora-or-rhel/


My failure:

$ make -C bld docker-test-build
make: Entering directory '/work/armbru/qemu/bld'
  BUILD   centos7
make[1]: Entering directory '/work/armbru/qemu/bld'
  GEN /work/armbru/qemu/bld/docker-src.2019-08-10-07.29.32.8915/qemu.tar
  COPYRUNNER
RUN test-build in qemu:centos7
[...]
make[1]: Leaving directory '/work/armbru/qemu/bld'
  BUILD   debian9
  BUILD   debian-amd64
make[1]: Entering directory '/work/armbru/qemu/bld'
  GEN /work/armbru/qemu/bld/docker-src.2019-08-10-07.30.18.17180/qemu.tar
  COPYRUNNER
RUN test-build in qemu:debian-amd64
[...]
install -c -m 0644 /tmp/qemu-test/build/trace-events-all 
"/tmp/qemu-test/build/=destdir/tmp/qemu-test/install/share/qemu/trace-events-all"
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
  File "/work/armbru/qemu/tests/docker/docker.py", line 234, in _kill_instances
return self._do_kill_instances(True)
  File "/work/armbru/qemu/tests/docker/docker.py", line 213, in 
_do_kill_instances
for i in self._output(cmd).split():
  File "/work/armbru/qemu/tests/docker/docker.py", line 239, in _output
**kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 223, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['sudo', 'docker', 'ps', '-q']' returned non-zero 
exit status 1
Error in sys.exitfunc:
Traceback (most recent call last):
 

[Qemu-devel] [PATCH v3 4/7] block/backup: drop handling of max_transfer for copy_range

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
Since previous commit, copy_range supports max_transfer, so we don't
need to handle it by hand.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/backup.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index c6a3b2b7bb..228ba9423c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -54,7 +54,6 @@ typedef struct BackupBlockJob {
 QLIST_HEAD(, CowRequest) inflight_reqs;
 
 bool use_copy_range;
-int64_t copy_range_size;
 
 BdrvRequestFlags write_flags;
 bool initializing_bitmap;
@@ -156,12 +155,11 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 int ret;
 int nr_clusters;
 BlockBackend *blk = job->common.blk;
-int nbytes;
+int nbytes = end - start;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
-assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
+assert(end - start < INT_MAX);
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-nbytes = MIN(job->copy_range_size, end - start);
 nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
 bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
 job->cluster_size * nr_clusters);
@@ -756,11 +754,6 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->copy_bitmap = copy_bitmap;
 copy_bitmap = NULL;
 job->use_copy_range = !compress; /* compression isn't supported for it */
-job->copy_range_size = MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
-blk_get_max_transfer(job->target));
-job->copy_range_size = MAX(job->cluster_size,
-   QEMU_ALIGN_UP(job->copy_range_size,
- job->cluster_size));
 
 /* Required permissions are already taken with target's blk_new() */
block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
-- 
2.18.0




[Qemu-devel] [PATCH v3 7/7] block/backup: merge duplicated logic into backup_do_cow

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
backup_cow_with_offload and backup_cow_with_bounce_buffer contains a
lot of duplicated logic. Move it into backup_do_cow.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/backup.c | 97 --
 1 file changed, 38 insertions(+), 59 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 65f7212c85..0ac31c2760 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -104,87 +104,61 @@ static void cow_request_end(CowRequest *req)
  * error occurred, return a negative error number
  *
  * @bounce_buffer is assumed to enough to store
- * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
+ * MIN(BACKUP_MAX_BOUNCE_BUFFER, @bytes) bytes
  */
-static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
-  int64_t start,
-  int64_t end,
-  bool is_write_notifier,
-  bool *error_is_read,
-  void *bounce_buffer)
+static int coroutine_fn backup_cow_with_bounce_buffer(
+BackupBlockJob *job, int64_t offset, int64_t bytes,
+BdrvRequestFlags read_flags, bool *error_is_read, void *bounce_buffer)
 {
-int ret;
 BlockBackend *blk = job->common.blk;
-int nbytes, remaining_bytes;
-int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-
-assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
-nbytes = MIN(end - start, job->len - start);
-
 
-remaining_bytes = nbytes;
-while (remaining_bytes) {
-int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
+while (bytes) {
+int ret;
+int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, bytes);
 
-ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
+ret = blk_co_pread(blk, offset, chunk, bounce_buffer, read_flags);
 if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start, ret);
+trace_backup_do_cow_read_fail(job, offset, ret);
 if (error_is_read) {
 *error_is_read = true;
 }
-goto fail;
+return ret;
 }
 
-ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
+ret = blk_co_pwrite(job->target, offset, chunk, bounce_buffer,
 job->write_flags);
 if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start, ret);
+trace_backup_do_cow_write_fail(job, offset, ret);
 if (error_is_read) {
 *error_is_read = false;
 }
-goto fail;
+return ret;
 }
 
-start += chunk;
-remaining_bytes -= chunk;
+offset += chunk;
+bytes -= chunk;
 }
 
-return nbytes;
-fail:
-bdrv_set_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
-return ret;
-
+return 0;
 }
 
 /* Copy range to target and return the bytes copied. If error occurred, return 
a
  * negative error number. */
 static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
-int64_t start,
-int64_t end,
-bool is_write_notifier)
+int64_t offset,
+int64_t bytes,
+BdrvRequestFlags read_flags)
 {
 int ret;
-int nr_clusters;
 BlockBackend *blk = job->common.blk;
-int nbytes = MIN(end - start, job->len - start);
-int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-
-assert(end - start < INT_MAX);
-assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
-bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
-job->cluster_size * nr_clusters);
-ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
+
+ret = blk_co_copy_range(blk, offset, job->target, offset, bytes,
 read_flags, job->write_flags);
 if (ret < 0) {
-trace_backup_do_cow_copy_range_fail(job, start, ret);
-bdrv_set_dirty_bitmap(job->copy_bitmap, start,
-  job->cluster_size * nr_clusters);
-return ret;
+trace_backup_do_cow_copy_range_fail(job, offset, ret);
 }
 
-return nbytes;
+return ret;
 }
 
 /*
@@ -268,6 +242,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 int64_t start, end; /* bytes */
 void *bounce_buffer = NULL;
 int64_t skip_bytes;
+BdrvRequestFlags read_flags =
+is_write_notifier ? 

[Qemu-devel] [PATCH v3 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
backup_cow_with_offload can transfer more than one cluster. Let
backup_cow_with_bounce_buffer behave similarly. It reduces the number
of IO requests, since there is no need to copy cluster by cluster.

Logic around bounce_buffer allocation changed: we can't just allocate
one-cluster-sized buffer to share for all iterations. We can't also
allocate buffer of full-request length it may be too large, so
BACKUP_MAX_BOUNCE_BUFFER is introduced. And finally, allocation logic
is to allocate a buffer sufficient to handle all remaining iterations
at the point where we need the buffer for the first time.

Bonus: get rid of pointer-to-pointer.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/backup.c | 65 +++---
 1 file changed, 41 insertions(+), 24 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d482d93458..65f7212c85 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -27,6 +27,7 @@
 #include "qemu/error-report.h"
 
 #define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
+#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)
 
 typedef struct CowRequest {
 int64_t start_byte;
@@ -98,44 +99,55 @@ static void cow_request_end(CowRequest *req)
qemu_co_queue_restart_all(&req->wait_queue);
 }
 
-/* Copy range to target with a bounce buffer and return the bytes copied. If
- * error occurred, return a negative error number */
+/*
+ * Copy range to target with a bounce buffer and return the bytes copied. If
+ * error occurred, return a negative error number
+ *
+ * @bounce_buffer is assumed to enough to store
+ * MIN(BACKUP_MAX_BOUNCE_BUFFER, @end - @start) bytes
+ */
 static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
   int64_t start,
   int64_t end,
   bool is_write_notifier,
   bool *error_is_read,
-  void **bounce_buffer)
+  void *bounce_buffer)
 {
 int ret;
 BlockBackend *blk = job->common.blk;
-int nbytes;
+int nbytes, remaining_bytes;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
-bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
-nbytes = MIN(job->cluster_size, job->len - start);
-if (!*bounce_buffer) {
-*bounce_buffer = blk_blockalign(blk, job->cluster_size);
-}
+bdrv_reset_dirty_bitmap(job->copy_bitmap, start, end - start);
+nbytes = MIN(end - start, job->len - start);
 
-ret = blk_co_pread(blk, start, nbytes, *bounce_buffer, read_flags);
-if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start, ret);
-if (error_is_read) {
-*error_is_read = true;
+
+remaining_bytes = nbytes;
+while (remaining_bytes) {
+int chunk = MIN(BACKUP_MAX_BOUNCE_BUFFER, remaining_bytes);
+
+ret = blk_co_pread(blk, start, chunk, bounce_buffer, read_flags);
+if (ret < 0) {
+trace_backup_do_cow_read_fail(job, start, ret);
+if (error_is_read) {
+*error_is_read = true;
+}
+goto fail;
 }
-goto fail;
-}
 
-ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
-job->write_flags);
-if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start, ret);
-if (error_is_read) {
-*error_is_read = false;
+ret = blk_co_pwrite(job->target, start, chunk, bounce_buffer,
+job->write_flags);
+if (ret < 0) {
+trace_backup_do_cow_write_fail(job, start, ret);
+if (error_is_read) {
+*error_is_read = false;
+}
+goto fail;
 }
-goto fail;
+
+start += chunk;
+remaining_bytes -= chunk;
 }
 
 return nbytes;
@@ -301,9 +313,14 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 }
 }
 if (!job->use_copy_range) {
+if (!bounce_buffer) {
+size_t len = MIN(BACKUP_MAX_BOUNCE_BUFFER,
+ MAX(dirty_end - start, end - dirty_end));
+bounce_buffer = blk_try_blockalign(job->common.blk, len);
+}
 ret = backup_cow_with_bounce_buffer(job, start, dirty_end,
 is_write_notifier,
-error_is_read, &bounce_buffer);
+error_is_read, bounce_buffer);
 }
 if (ret < 0) {
 break;
-- 
2.18.0




[Qemu-devel] [PATCH v3 3/7] block/io: handle alignment and max_transfer for copy_range

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
copy_range ignores these limitations, let's improve it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 44 
 1 file changed, 36 insertions(+), 8 deletions(-)

diff --git a/block/io.c b/block/io.c
index 06305c6ea6..45b1e1f76e 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3005,11 +3005,13 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 {
 BdrvTrackedRequest req;
 int ret;
+uint32_t align, max_transfer;
 
 /* TODO We can support BDRV_REQ_NO_FALLBACK here */
 assert(!(read_flags & BDRV_REQ_NO_FALLBACK));
 assert(!(write_flags & BDRV_REQ_NO_FALLBACK));
 
+
 if (!dst || !dst->bs) {
 return -ENOMEDIUM;
 }
@@ -3029,9 +3031,19 @@ static int coroutine_fn bdrv_co_copy_range_internal(
 return ret;
 }
 
+align = MAX(src->bs->bl.request_alignment, dst->bs->bl.request_alignment);
+max_transfer =
+QEMU_ALIGN_DOWN(MIN_NON_ZERO(MIN_NON_ZERO(src->bs->bl.max_transfer,
+  dst->bs->bl.max_transfer),
+ INT_MAX), align);
+
 if (!src->bs->drv->bdrv_co_copy_range_from
 || !dst->bs->drv->bdrv_co_copy_range_to
-|| src->bs->encrypted || dst->bs->encrypted) {
+|| src->bs->encrypted || dst->bs->encrypted ||
+(max_transfer == 0 && bytes > 0) ||
+!QEMU_IS_ALIGNED(src_offset, src->bs->bl.request_alignment) ||
+!QEMU_IS_ALIGNED(dst_offset, dst->bs->bl.request_alignment) ||
+!QEMU_IS_ALIGNED(bytes, align)) {
 return -ENOTSUP;
 }
 
@@ -3046,11 +3058,22 @@ static int coroutine_fn bdrv_co_copy_range_internal(
wait_serialising_requests(&req);
 }
 
-ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
-src, src_offset,
-dst, dst_offset,
-bytes,
-read_flags, write_flags);
+while (bytes) {
+int num = MIN(bytes, max_transfer);
+
+ret = src->bs->drv->bdrv_co_copy_range_from(src->bs,
+src, src_offset,
+dst, dst_offset,
+num,
+read_flags,
+write_flags);
+if (ret < 0) {
+break;
+}
+bytes -= num;
+src_offset += num;
+dst_offset += num;
+}
 
tracked_request_end(&req);
 bdrv_dec_in_flight(src->bs);
@@ -3060,12 +3083,17 @@ static int coroutine_fn bdrv_co_copy_range_internal(
   BDRV_TRACKED_WRITE);
ret = bdrv_co_write_req_prepare(dst, dst_offset, bytes, &req,
 write_flags);
-if (!ret) {
+while (!ret && bytes) {
+int num = MIN(bytes, max_transfer);
+
 ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs,
   src, src_offset,
   dst, dst_offset,
-  bytes,
+  num,
   read_flags, write_flags);
+bytes -= num;
+src_offset += num;
+dst_offset += num;
 }
bdrv_co_write_req_finish(dst, dst_offset, bytes, &req, ret);
 tracked_request_end();
-- 
2.18.0




[Qemu-devel] [PATCH v3 5/7] block/backup: fix backup_cow_with_offload for last cluster

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
We shouldn't try to copy bytes beyond EOF. Fix it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/backup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/backup.c b/block/backup.c
index 228ba9423c..d482d93458 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -155,7 +155,7 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 int ret;
 int nr_clusters;
 BlockBackend *blk = job->common.blk;
-int nbytes = end - start;
+int nbytes = MIN(end - start, job->len - start);
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
 
 assert(end - start < INT_MAX);
-- 
2.18.0




[Qemu-devel] [PATCH v3 2/7] block/backup: refactor write_flags

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
write flags are constant, let's store it in BackupBlockJob instead of
recalculating. It also makes two boolean fields to be unused, so,
drop them.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: John Snow 
Reviewed-by: Max Reitz 
---
 block/backup.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index d815436455..c6a3b2b7bb 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -50,14 +50,13 @@ typedef struct BackupBlockJob {
 uint64_t len;
 uint64_t bytes_read;
 int64_t cluster_size;
-bool compress;
 NotifierWithReturn before_write;
 QLIST_HEAD(, CowRequest) inflight_reqs;
 
 bool use_copy_range;
 int64_t copy_range_size;
 
-bool serialize_target_writes;
+BdrvRequestFlags write_flags;
 bool initializing_bitmap;
 } BackupBlockJob;
 
@@ -113,10 +112,6 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 BlockBackend *blk = job->common.blk;
 int nbytes;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-int write_flags =
-(job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) |
-(job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
-
 
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
 bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
@@ -135,7 +130,7 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 }
 
 ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
-write_flags);
+job->write_flags);
 if (ret < 0) {
 trace_backup_do_cow_write_fail(job, start, ret);
 if (error_is_read) {
@@ -163,7 +158,6 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 BlockBackend *blk = job->common.blk;
 int nbytes;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
 
 assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
@@ -172,7 +166,7 @@ static int coroutine_fn 
backup_cow_with_offload(BackupBlockJob *job,
 bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
 job->cluster_size * nr_clusters);
 ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
-read_flags, write_flags);
+read_flags, job->write_flags);
 if (ret < 0) {
 trace_backup_do_cow_copy_range_fail(job, start, ret);
 bdrv_set_dirty_bitmap(job->copy_bitmap, start,
@@ -748,10 +742,16 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 job->sync_mode = sync_mode;
 job->sync_bitmap = sync_bitmap;
 job->bitmap_mode = bitmap_mode;
-job->compress = compress;
 
-/* Detect image-fleecing (and similar) schemes */
-job->serialize_target_writes = bdrv_chain_contains(target, bs);
+/*
+ * Set write flags:
+ *  1. Detect image-fleecing (and similar) schemes
+ *  2. Handle compression
+ */
+job->write_flags =
+(bdrv_chain_contains(target, bs) ? BDRV_REQ_SERIALISING : 0) |
+(compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
 job->cluster_size = cluster_size;
 job->copy_bitmap = copy_bitmap;
 copy_bitmap = NULL;
-- 
2.18.0




[Qemu-devel] [PATCH v3 0/7] backup improvements

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
Hi all!

There are some fixes and refactorings I need on my way to resend
my backup-top series. It's obvious now that I need to share copying
code between backup and backup-top, as backup copying code becomes
smarter and more complicated. So the goal of the series is to make copying
code more share-able.

Based-on: https://github.com/jnsnow/qemu bitmaps 

v3:
03: Ha, fix real bug, we shouldn't touch src before handling write-zero,
as src may be NULL. So, replace align and max_transfer calculation
together with max_transfer == 0 check. Drop comment, as there is no
special place for it now..
04: add Max's r-b:
06-07: rewrite to keep bounce_buffer sharing between iterations and to
   limit allocation [Eric]

v2 (by Max's comments):

02: Add Max's r-b
03: - split out backup changes to 03
   - handle common max_transfer = 0 case
04: splat out from 02
06: fix allocation size
07: - rebase on 06 changes
   - add Max's r-b

two patches are dropped or postponed for the next step

Vladimir Sementsov-Ogievskiy (7):
  block/backup: deal with zero detection
  block/backup: refactor write_flags
  block/io: handle alignment and max_transfer for copy_range
  block/backup: drop handling of max_transfer for copy_range
  block/backup: fix backup_cow_with_offload for last cluster
  block/backup: teach backup_cow_with_bounce_buffer to copy more at once
  block/backup: merge duplicated logic into backup_do_cow

 block/backup.c | 154 ++---
 block/io.c |  44 +++---
 blockdev.c |   8 +--
 3 files changed, 110 insertions(+), 96 deletions(-)

-- 
2.18.0




[Qemu-devel] [PATCH v3 1/7] block/backup: deal with zero detection

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
We have detect_zeroes option, so at least for blockdev-backup user
should define it if zero-detection is needed. For drive-backup leave
detection enabled by default but do it through existing option instead
of open-coding.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Max Reitz 
---
 block/backup.c | 15 ++-
 blockdev.c |  8 
 2 files changed, 10 insertions(+), 13 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index adc4d44244..d815436455 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -113,7 +113,10 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 BlockBackend *blk = job->common.blk;
 int nbytes;
 int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
-int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
+int write_flags =
+(job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0) |
+(job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
+
 
 assert(QEMU_IS_ALIGNED(start, job->cluster_size));
 bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
@@ -131,14 +134,8 @@ static int coroutine_fn 
backup_cow_with_bounce_buffer(BackupBlockJob *job,
 goto fail;
 }
 
-if (buffer_is_zero(*bounce_buffer, nbytes)) {
-ret = blk_co_pwrite_zeroes(job->target, start,
-   nbytes, write_flags | BDRV_REQ_MAY_UNMAP);
-} else {
-ret = blk_co_pwrite(job->target, start,
-nbytes, *bounce_buffer, write_flags |
-(job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
-}
+ret = blk_co_pwrite(job->target, start, nbytes, *bounce_buffer,
+write_flags);
 if (ret < 0) {
 trace_backup_do_cow_write_fail(job, start, ret);
 if (error_is_read) {
diff --git a/blockdev.c b/blockdev.c
index 29c6c6044a..2d7e7be538 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3613,7 +3613,7 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
JobTxn *txn,
 BlockDriverState *source = NULL;
 BlockJob *job = NULL;
 AioContext *aio_context;
-QDict *options = NULL;
+QDict *options;
 Error *local_err = NULL;
 int flags;
 int64_t size;
@@ -3686,10 +3686,10 @@ static BlockJob *do_drive_backup(DriveBackup *backup, 
JobTxn *txn,
 goto out;
 }
 
+options = qdict_new();
+qdict_put_str(options, "discard", "unmap");
+qdict_put_str(options, "detect-zeroes", "unmap");
 if (backup->format) {
-if (!options) {
-options = qdict_new();
-}
 qdict_put_str(options, "driver", backup->format);
 }
 
-- 
2.18.0




Re: [Qemu-devel] [PATCH v3 00/29] Tame a few "touch this, recompile the world" headers

2019-08-10 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> Hi Markus,
>
> On 8/9/19 8:46 AM, Markus Armbruster wrote:
>> We have quite a few "touch this, recompile the world" headers.  My
>> "build everything" tree has some 6600 objects (not counting tests and
>> objects that don't depend on qemu/osdep.h).  Touching any of 54
>> headers triggers a recompile of more than half of them.
>> 
>> This series reduces them to 46.
[...]
>> Observed patterns of #include misuse:
>> 
>> * Copy pasta
>> 
>>   I found and deleted quite a few #include that were almost certainly
>>   never needed.  The most likely explanation is lazy copying from a
>>   "similar" file.  My deletions produced only minor improvements,
>>   though.
>> 
>> * "Convenience" headers
>> 
>>   We sometimes have a header include a bunch of other headers the
>>   header itself doesn't need, so the header's users don't have to.  An
>>   extreme case is hw/hw.h: it pulls in more than 40 other headers,
>>   then declares just hw_error().  Most of its users need only a
>>   fraction of it.  PATCH 08-09,12-18 fix that, trading the very
>>   occasional convenience of not having to type a few #include
>>   directives for build speed.
>> 
>> * "Fat" headers
>> 
>>   Some headers provide many things to many customers.  Bad when the
>>   customers generally need only parts.  Worse when such a "fat" header
>>   pulls in loads more.  This series grapples with three instances:
>>   qapi/qapi-types-common.h (PATCH 03), hw/boards.h, which pulls in
>>   almost 70 headers (PATCH 19-23), and sysemu/sysemu.h, which pulls in
>>   more than 20 (PATCH 23-28).
>> 
>> * Design erosion
>> 
>>   Off-the-cuff additions to headers can erode design.  For instance,
>>   the generated trace.h were carefully designed for minimal
>>   dependencies.  We let them balloon when we added the per-vCPU
>>   tracing feature a few years later.  PATCH 07 grapples with that.
>
> What can prevent us from these misuse patterns?

Excellent question.

> Will you redo this analysis/series after each releases?

Perhaps I should first explain my analysis, i.e. where my numbers come
from, then my cleanups.

Counting #include directives is not useful by itself.  For instance,
#include "qemu/typedefs.h" occurs just once, yet almost all .o depend on
it.

Better: count the actual make dependencies.  Compiling FOO.c generates
FOO.d in addition to FOO.o.  These .d list the .h the .o depend on.  I
wrote a stupid Python program to count how many .o depend on each .h.
This identifies widely included headers.
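
For illustration, a rough stand-in for such a counter (not Markus's actual
script; written in C here, and deliberately naive) reads the concatenated .d
files on stdin and prints the most frequently listed headers. Since make lists
each prerequisite roughly once per object, the count approximates "number of
.o depending on this header":

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *name;
    int count;
} Header;

static Header *headers;
static size_t n_headers, cap_headers;

static void count_header(const char *name)
{
    /* Naive linear scan; fine for a one-off analysis run. */
    for (size_t i = 0; i < n_headers; i++) {
        if (strcmp(headers[i].name, name) == 0) {
            headers[i].count++;
            return;
        }
    }
    if (n_headers == cap_headers) {
        cap_headers = cap_headers ? cap_headers * 2 : 1024;
        headers = realloc(headers, cap_headers * sizeof(*headers));
    }
    headers[n_headers].name = strdup(name);
    headers[n_headers].count = 1;
    n_headers++;
}

static int by_count(const void *a, const void *b)
{
    return ((const Header *)b)->count - ((const Header *)a)->count;
}

int main(void)
{
    char tok[4096];

    /* Tokens are whitespace-separated; make's "\" continuations and the
     * "foo.o:" targets simply fail the ".h" suffix test below. */
    while (scanf("%4095s", tok) == 1) {
        size_t len = strlen(tok);

        if (len > 2 && strcmp(tok + len - 2, ".h") == 0) {
            count_header(tok);
        }
    }

    qsort(headers, n_headers, sizeof(*headers), by_count);
    for (size_t i = 0; i < n_headers && i < 20; i++) {
        printf("%6d %s\n", headers[i].count, headers[i].name);
    }
    return 0;
}

Usage would be something like:

    find bld -name '*.d' -exec cat {} + | ./count-headers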

Also useful are the .d for the .c generated by "make check-headers":
they tell us how many headers each header pulls in.  Widely included
headers that pull in lots more are particularly promising candidates for
#include cleanup.

The actual cleanup is manual work.  Getting rid of copy pasta and
"convenience" headers is pretty straightforward.  "Fat" headers and
design erosion can require more thought.  Another difficult part is
identifying how to order the cleanups for reviewability.  Testing is
straightforward, but slow; much world-recompiling for many, many
worlds.

I started looking into this back in 2016[1], and updated the analysis
recently[2].  I contributed a few cleanups myself, and at least Paolo
also chipped in.

I can update the analysis more frequently if that helps.  However, ...

> How to automate misuse checks?
>
> Can we report some metrics and warn if there a considerable discrepancy?

... detecting we've made things worse weeks or months after we did is
clearly suboptimal.

The actual numbers depend on build configuration.  For reproducible
numbers, we need to hold configuration constant for all developers.  I
figure our containerized builds could do that for us.

To detect metrics going south, we need a baseline, and a definition of
"south".

If we store the baseline in git and fail the test when the actual
numbers are too "south" of the baseline, patches making things a lot
worse will have to update the baseline.  This should ensure proper
scrutiny.

However, a slow slide south over multiple independent patches will
arbitrarily blame the last one, while the others get off scot-free.

I'm not sure how to best handle new headers.  Having to update the
baseline whenever you add a new header will likely add too much
friction.

Needs thought.

Perhaps we can start with fully automating the measurement part, then
passively gather data for a while to learn how the numbers change during
development.



[1] https://lists.nongnu.org/archive/html/qemu-devel/2016-03/msg03271.html
[2] https://lists.nongnu.org/archive/html/qemu-devel/2019-05/msg06291.html



Re: [Qemu-devel] [PATCH v6 24/42] block: Use child access functions for QAPI queries

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> query-block, query-named-block-nodes, and query-blockstats now return
> any filtered child under "backing", not just bs->backing or COW
> children.  This is so that filters do not interrupt the reported backing
> chain.  This changes the output for iotest 184, as the throttled node
> now appears as a backing child.
> 
> Signed-off-by: Max Reitz 
> ---
>   block/qapi.c   | 39 +++---
>   tests/qemu-iotests/184.out |  7 ++-
>   2 files changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/block/qapi.c b/block/qapi.c
> index 9a185cba48..4f59ac1c0f 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c

[..]

> @@ -354,9 +357,9 @@ static void bdrv_query_info(BlockBackend *blk, BlockInfo 
> **p_info,
>   BlockDriverState *bs = blk_bs(blk);
>   char *qdev;
>   
> -/* Skip automatically inserted nodes that the user isn't aware of */
> -while (bs && bs->drv && bs->implicit) {
> -bs = backing_bs(bs);
> +if (bs) {
> +/* Skip automatically inserted nodes that the user isn't aware of */
> +bs = bdrv_skip_implicit_filters(bs);
>   }

bdrv_skip_implicit_filters() supports NULL, so this could be written without the "if".

Anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

[..]

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v6 22/42] block: Fix bdrv_get_allocated_file_size's fallback

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> If the driver does not implement bdrv_get_allocated_file_size(), we
> should fall back to cumulating the allocated size of all non-COW
> children instead of just bs->file.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 
> ---
>   block.c | 22 --
>   1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 1070aa1ba9..6e1ddab056 100644
> --- a/block.c
> +++ b/block.c
> @@ -4650,9 +4650,27 @@ int64_t bdrv_get_allocated_file_size(BlockDriverState 
> *bs)
>   if (drv->bdrv_get_allocated_file_size) {
>   return drv->bdrv_get_allocated_file_size(bs);
>   }
> -if (bs->file) {
> -return bdrv_get_allocated_file_size(bs->file->bs);
> +
> +if (!QLIST_EMPTY(&bs->children)) {
> +BdrvChild *child;
> +int64_t child_size, total_size = 0;
> +
> +QLIST_FOREACH(child, &bs->children, next) {
> +if (child == bdrv_filtered_cow_child(bs)) {
> +/* Ignore COW backing files */
> +continue;
> +}
> +
> +child_size = bdrv_get_allocated_file_size(child->bs);
> +if (child_size < 0) {
> +return child_size;
> +}
> +total_size += child_size;
> +}
> +
> +return total_size;
>   }
> +
>   return -ENOTSUP;
>   }
>   
> 

Hmm..

1. No children -> -ENOTSUP
2. Only cow child -> 0
3. Some non-cow children -> SUM

It's all arguable (the strictest way would be -ENOTSUP in either case),
but if we want to fall back to the SUM of the non-COW children, cases 1
and 2 should return the same.

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v6 20/42] block/snapshot: Fix fallback

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> If the top node's driver does not provide snapshot functionality and we
> want to fall back to a node down the chain, we need to snapshot all
> non-COW children.  For simplicity's sake, just do not fall back if there
> is more than one such child.
> 
> bdrv_snapshot_goto() becomes a bit weird because we may have to redirect
> the actual child pointer, so it only works if the fallback child is
> bs->file or bs->backing (and then we have to find out which it is).
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 
> ---
>   block/snapshot.c | 100 +--
>   1 file changed, 79 insertions(+), 21 deletions(-)
> 
> diff --git a/block/snapshot.c b/block/snapshot.c
> index f2f48f926a..35403c167f 100644
> --- a/block/snapshot.c
> +++ b/block/snapshot.c
> @@ -146,6 +146,32 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState 
> *bs,
>   return ret;
>   }
>   
> +/**
> + * Return the child BDS to which we can fall back if the given BDS
> + * does not support snapshots.
> + * Return NULL if there is no BDS to (safely) fall back to.
> + */
> +static BlockDriverState *bdrv_snapshot_fallback(BlockDriverState *bs)
> +{
> +BlockDriverState *child_bs = NULL;
> +BdrvChild *child;
> +
> +QLIST_FOREACH(child, &bs->children, next) {
> +if (child == bdrv_filtered_cow_child(bs)) {
> +/* Ignore: COW children need not be included in snapshots */
> +continue;
> +}
> +
> +if (child_bs) {
> +/* Cannot fall back to a single child if there are multiple */
> +return NULL;
> +}
> +child_bs = child->bs;
> +}
> +
> +return child_bs;
> +}
> +
>   int bdrv_can_snapshot(BlockDriverState *bs)
>   {
>   BlockDriver *drv = bs->drv;
> @@ -154,8 +180,9 @@ int bdrv_can_snapshot(BlockDriverState *bs)
>   }
>   
>   if (!drv->bdrv_snapshot_create) {
> -if (bs->file != NULL) {
> -return bdrv_can_snapshot(bs->file->bs);
> +BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
> +if (fallback_bs) {
> +return bdrv_can_snapshot(fallback_bs);
>   }
>   return 0;
>   }
> @@ -167,14 +194,15 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>QEMUSnapshotInfo *sn_info)
>   {
>   BlockDriver *drv = bs->drv;
> +BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
>   if (!drv) {
>   return -ENOMEDIUM;
>   }
>   if (drv->bdrv_snapshot_create) {
>   return drv->bdrv_snapshot_create(bs, sn_info);
>   }
> -if (bs->file) {
> -return bdrv_snapshot_create(bs->file->bs, sn_info);
> +if (fallback_bs) {
> +return bdrv_snapshot_create(fallback_bs, sn_info);
>   }
>   return -ENOTSUP;
>   }
> @@ -184,6 +212,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>  Error **errp)
>   {
>   BlockDriver *drv = bs->drv;
> +BlockDriverState *fallback_bs;
>   int ret, open_ret;
>   
>   if (!drv) {
> @@ -204,39 +233,66 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
>   return ret;
>   }
>   
> -if (bs->file) {
> -BlockDriverState *file;
> -QDict *options = qdict_clone_shallow(bs->options);
> +fallback_bs = bdrv_snapshot_fallback(bs);
> +if (fallback_bs) {
> +QDict *options;
>   QDict *file_options;
>   Error *local_err = NULL;
> +bool is_backing_child;
> +BdrvChild **child_pointer;
> +
> +/*
> + * We need a pointer to the fallback child pointer, so let us
> + * see whether the child is referenced by a field in the BDS
> + * object.
> + */
> +if (fallback_bs == bs->file->bs) {
> +is_backing_child = false;
> +child_pointer = &bs->file;
> +} else if (fallback_bs == bs->backing->bs) {
> +is_backing_child = true;
> +child_pointer = &bs->backing;
> +} else {
> +/*
> + * The fallback child is not referenced by a field in the
> + * BDS object.  We cannot go on then.
> + */
> +error_setg(errp, "Block driver does not support snapshots");
> +return -ENOTSUP;
> +}
> +

Hmm, shouldn't this check be included in bdrv_snapshot_fallback(), so that
it only considers file and backing?

And could we allow the fallback only for filters? Is there a real use case
other than filters? Or maybe drop the fallback altogether?


>   
> -file = bs->file->bs;
>   /* Prevent it from getting deleted when detached from bs */
> -bdrv_ref(file);
> +bdrv_ref(fallback_bs);
>   
> -qdict_extract_subqdict(options, &file_options, "file.");
> +qdict_extract_subqdict(options, &file_options,
> +   is_backing_child ? "backing." : "file.");
>   qobject_unref(file_options);
> - 

Re: [Qemu-devel] [PATCH v6 18/42] block: Use CAFs in bdrv_refresh_filename()

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> bdrv_refresh_filename() and the kind of related bdrv_dirname() should
> look to the primary child when they wish to copy the underlying file's
> filename.
> 
> Signed-off-by: Max Reitz 


Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v6 15/42] block: Re-evaluate backing file handling in reopen

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> Reopening a node's backing child needs a bit of special handling because
> the "backing" child has different defaults than all other children
> (among other things).  Adding filter support here is a bit more
> difficult than just using the child access functions.  In fact, we often
> have to directly use bs->backing because these functions are about the
> "backing" child (which may or may not be the COW backing file).
> 
> Signed-off-by: Max Reitz

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v6 16/42] block: Flush all children in generic code

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> If the driver does not support .bdrv_co_flush() so bdrv_co_flush()
> itself has to flush the children of the given node, it should not flush
> just bs->file->bs, but in fact all children.
> 
> In any case, the BLKDBG_EVENT() should be emitted on the primary child,
> because that is where a blkdebug node would be if there is any.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 
> ---
>   block/io.c | 23 +--
>   1 file changed, 17 insertions(+), 6 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index c5a8e3e6a3..bcc770d336 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2572,6 +2572,8 @@ static void coroutine_fn bdrv_flush_co_entry(void 
> *opaque)
>   
>   int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>   {
> +BdrvChild *primary_child = bdrv_primary_child(bs);
> +BdrvChild *child;
>   int current_gen;
>   int ret = 0;
>   
> @@ -2601,7 +2603,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>   }
>   
>   /* Write back cached data to the OS even with cache=unsafe */
> -BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_OS);
> +BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_OS);
>   if (bs->drv->bdrv_co_flush_to_os) {
>   ret = bs->drv->bdrv_co_flush_to_os(bs);
>   if (ret < 0) {
> @@ -2611,15 +2613,15 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>   
>   /* But don't actually force it to the disk with cache=unsafe */
>   if (bs->open_flags & BDRV_O_NO_FLUSH) {
> -goto flush_parent;
> +goto flush_children;
>   }
>   
>   /* Check if we really need to flush anything */
>   if (bs->flushed_gen == current_gen) {
> -goto flush_parent;
> +goto flush_children;
>   }
>   
> -BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
> +BLKDBG_EVENT(primary_child, BLKDBG_FLUSH_TO_DISK);
>   if (!bs->drv) {
>   /* bs->drv->bdrv_co_flush() might have ejected the BDS
>* (even in case of apparent success) */
> @@ -2663,8 +2665,17 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>   /* Now flush the underlying protocol.  It will also have BDRV_O_NO_FLUSH
>* in the case of cache=unsafe, so there are no useless flushes.
>*/
> -flush_parent:
> -ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> +flush_children:
> +ret = 0;
> +QLIST_FOREACH(child, &bs->children, next) {
> +int this_child_ret;
> +
> +this_child_ret = bdrv_co_flush(child->bs);
> +if (!ret) {
> +ret = this_child_ret;
> +}
> +}

Hmm, you said that we want to flush only the children to which the parent
has write access. Shouldn't we check that? Or do we assume that it's always
safe to call bdrv_co_flush() on any node?

> +
>   out:
>   /* Notify any pending flushes that we have completed */
>   if (ret == 0) {
> 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v6 14/42] block: Use CAFs when working with backing chains

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> Use child access functions when iterating through backing chains so
> filters do not break the chain.
> 
> Signed-off-by: Max Reitz 

Reviewed-by: Vladimir Sementsov-Ogievskiy 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] RISC-V: Vector && DSP Extension

2019-08-10 Thread LIU ZhiWei



On 8/9/19 6:54 PM, Alistair Francis wrote:

On Thu, Aug 8, 2019 at 2:52 AM liuzhiwei  wrote:

Hi all,

 My workmate  and I have been working on Vector & Dsp extension, and
I'd like to share develop status  with folks.

Cool!


 The spec references for  Vector extension is riscv-v-spec-0.7.1, and
riscv-p-spec-0.5 for DSP extension. The code of vector extension is
ready and under testing,  the first patch will be sent about two weeks
later. After that we will forward working on DSP extension, and send the
first patch in middle  October.

What code are you talking about? Is this QEMU code?


Hi Alistair,

It's the QEMU code I have been working on these days, which implements the
Vector extension. It is under testing and will be sent later.


  Could the maintainers  tell me whether the specs referenced are
appropriate? Is anyone working on these extensions?  I'd like to get
your status, and maybe discuss questions and work together.

Just use the latest (master) from the ISA spec git repo.


I will follow your advice. Thanks for your attention to this matter.

Best Regards,

Zhiwei



I don't know anyone doing vector work for QEMU. It would be very
useful, but everyone is busy with something at the moment
unfortunately.

Alistair


Best Regards

LIU Zhiwei





Re: [Qemu-devel] [PATCH v6 10/42] block: Drop bdrv_is_encrypted()

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> The original purpose of bdrv_is_encrypted() was to inquire whether a BDS
> can be used without the user entering a password or not.  It has not
> been used for that purpose for quite some time.
> 
> Actually, it is not even fit for that purpose, because to answer that
> question, it would have recursively query all of the given node's
> children.
> 
> So now we have to decide in which direction we want to fix
> bdrv_is_encrypted(): Recursively query all children, or drop it and just
> use bs->encrypted to get the current node's status?
> 
> Nowadays, its only purpose is to report through bdrv_query_image_info()
> whether the given image is encrypted or not.  For this purpose, it is
> probably more interesting to see whether a given node itself is
> encrypted or not (otherwise, a management application cannot discern for
> certain which nodes are really encrypted and which just have encrypted
> children).
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Max Reitz 

Reviewed-by: Vladimir Sementsov-Ogievskiy 


-- 
Best regards,
Vladimir


Re: [Qemu-devel] RISC-V: Vector && DSP Extension

2019-08-10 Thread LIU ZhiWei



On 8/8/19 6:48 AM, Chih-Min Chao wrote:



On Thu, Aug 8, 2019 at 7:29 PM Aleksandar Markovic  wrote:


On Thu, Aug 8, 2019 at 11:52 AM liuzhiwei  wrote:

> Hi all,
>
>     My workmate  and I have been working on Vector & Dsp
extension, and
> I'd like to share develop status  with folks.
>
>     The spec references for  Vector extension is
riscv-v-spec-0.7.1, and
> riscv-p-spec-0.5 for DSP extension.


Hello, Liu.

I will not answer your questions directly, however I want to bring
to you
and others another perspective on this situation.

First, please provide the link to the specifications. Via Google,
I found
that "riscv-v-spec-0.7.1" is titled "Working draft of the proposed
RISC-V V
vector extension". I could not find "riscv-p-spec-0.5".

I am not sure what the QEMU policy towards "working draft
proposal" type of
specification is. Peter, can you perhaps clarify that or any other
related
issue?


Hi Aleksandar,

As for riscv-v-spec 0.7.1, it is the first stable spec for target software
development, even though it is named a working draft.  The architecture
skeleton is fixed, and most of the work is focused on issues related to
micro-architecture implementation complexity.  SiFive has released an open
source implementation on the Spike simulator, and Imperas also provides an
implementation in its binary simulator.  I think it is worth including the
extension in QEMU at this point.

As for riscv-p-spec-0.5, I think Andes has fully supported this extension
and should release a more detailed spec in the near future (described in
the RISC-V Technical Update 2019/06).  They have implemented lots of DSP
kernels based on this extension and have also reported impressive
performance results.  It is also worth reviewing (at least as [RFC]) once
the detailed spec is public.



ref:
     1. https://content.riscv.org/wp-content/uploads/2019/06/17.40-Vector_RISCV-20190611-Vectors.pdf
     2. https://content.riscv.org/wp-content/uploads/2019/06/17.20-P-ext-RVW-Zurich-20190611.pdf
     3. https://content.riscv.org/wp-content/uploads/2019/06/10.05-TechCommitteeUpdate-June-2019-Copy.pdf



chihmin


Hi chihmin,

Thank you for the detailed and informative response. You have a very 
good understanding of the specifications.


I will modify the code according to the latest spec (currently
riscv-v-spec 0.7.2) from the ISA spec git repo, as Alistair advised.


Yours,

Zhiwei



I would advise some caution in these cases. The major issue is backward
compatibility, but there are other issues too. Let's say, fairness. If
we let the emulation of a component based on a "working draft proposal"
be integrated into QEMU, this will set a precedent, and many other
developers would rightfully ask for their contributions based on drafts
to be integrated into QEMU. Our policy should be as equal as possible
for all contributions, large or small, riscv or alpha, cpu or device,
tcg or kvm - in my honest opinion. QEMU upstream should not become a
collecting place for all imaginable experimentation; certain criteria
on what is appropriate for upstreaming exist and must continue to
exist.

Yours,
Aleksandar




> The code of the vector extension is ready and under testing; the
> first patch will be sent in about two weeks. After that we will move
> on to the DSP extension and send the first patch in mid-October.
>
>      Could the maintainers tell me whether the specs referenced are
> appropriate? Is anyone working on these extensions? I'd like to get
> your status, and maybe discuss questions and work together.
>
> Best Regards
>
> LIU Zhiwei
>
>
>
>



Re: [Qemu-devel] [PATCH v6 09/42] block: Include filters when freezing backing chain

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 19:13, Max Reitz wrote:
> In order to make filters work in backing chains, the associated
> functions must be able to deal with them and freeze all filter links, be
> they COW or R/W filter links.
> 
> In the process, rename these functions to reflect that they now act on
> generalized chains of filter nodes instead of backing chains alone.
> 
> While at it, add some comments that note which functions require their
> caller to ensure that a given child link is not frozen, and how the
> callers do so.
> 
> Signed-off-by: Max Reitz 
> ---
>   include/block/block.h | 10 +++---
>   block.c   | 81 +--
>   block/commit.c|  8 ++---
>   block/mirror.c|  4 +--
>   block/stream.c|  8 ++---
>   5 files changed, 62 insertions(+), 49 deletions(-)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index 50a07c1c33..f6f09b95cd 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -364,11 +364,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
> BlockDriverState *base,
>   BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>   BlockDriverState *bs);
>   BlockDriverState *bdrv_find_base(BlockDriverState *bs);
> -bool bdrv_is_backing_chain_frozen(BlockDriverState *bs, BlockDriverState 
> *base,
> -  Error **errp);
> -int bdrv_freeze_backing_chain(BlockDriverState *bs, BlockDriverState *base,
> -  Error **errp);
> -void bdrv_unfreeze_backing_chain(BlockDriverState *bs, BlockDriverState 
> *base);
> +bool bdrv_is_chain_frozen(BlockDriverState *bs, BlockDriverState *base,
> +  Error **errp);
> +int bdrv_freeze_chain(BlockDriverState *bs, BlockDriverState *base,
> +  Error **errp);
> +void bdrv_unfreeze_chain(BlockDriverState *bs, BlockDriverState *base);
>   
>   
>   typedef struct BdrvCheckResult {
> diff --git a/block.c b/block.c
> index adf82efb0e..650c00d182 100644
> --- a/block.c
> +++ b/block.c
> @@ -2303,12 +2303,15 @@ static void bdrv_replace_child_noperm(BdrvChild 
> *child,
>* If @new_bs is not NULL, bdrv_check_perm() must be called beforehand, as 
> this
>* function uses bdrv_set_perm() to update the permissions according to the 
> new
>* reference that @new_bs gets.
> + *
> + * Callers must ensure that child->frozen is false.
>*/
>   static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
>   {
>   BlockDriverState *old_bs = child->bs;
>   uint64_t perm, shared_perm;
>   
> +/* Asserts that child->frozen == false */
>   bdrv_replace_child_noperm(child, new_bs);
>   
>   /*
> @@ -2468,6 +2471,7 @@ static void bdrv_detach_child(BdrvChild *child)
>   g_free(child);
>   }
>   
> +/* Callers must ensure that child->frozen is false. */
>   void bdrv_root_unref_child(BdrvChild *child)
>   {
>   BlockDriverState *child_bs;
> @@ -2477,10 +2481,6 @@ void bdrv_root_unref_child(BdrvChild *child)
>   bdrv_unref(child_bs);
>   }
>   
> -/**
> - * Clear all inherits_from pointers from children and grandchildren of
> - * @root that point to @root, where necessary.
> - */

Hmm, unrelated chunk? Without it:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

>   static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild 
> *child)
>   {
>   BdrvChild *c;
> @@ -2505,6 +2505,7 @@ static void bdrv_unset_inherits_from(BlockDriverState 
> *root, BdrvChild *child)

[..]
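
As a usage illustration of the renamed helpers (a sketch only; the wrapper
function and the -EPERM choice are made up for this example):

/* Sketch: freeze every link between 'top' and 'base' -- COW or R/W filter --
 * so the chain cannot be reconfigured while a job runs over it. */
static int start_job_over_chain(BlockDriverState *top, BlockDriverState *base,
                                Error **errp)
{
    if (bdrv_freeze_chain(top, base, errp) < 0) {
        return -EPERM;  /* some link is already frozen or cannot be frozen */
    }

    /* ... create and start the block job here ... */

    /* Once the job completes (or fails to start), release the links again. */
    bdrv_unfreeze_chain(top, base);
    return 0;
}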

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v2 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
10.08.2019 15:47, Eric Blake wrote:
> On 8/10/19 7:12 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 18:59, Eric Blake wrote:
>>> On 8/9/19 10:32 AM, Vladimir Sementsov-Ogievskiy wrote:
 backup_cow_with_offload can transfer more than on cluster. Let
>>>
>>> s/on/one/
>>>
 backup_cow_with_bounce_buffer behave similarly. It reduces number
 of IO and there are no needs to copy cluster by cluster.
>>>
>>> It reduces the number of IO requests, since there is no need to copy
>>> cluster by cluster.
> 
 -  bool *error_is_read,
 -  void 
 **bounce_buffer)
 +  bool *error_is_read)
>>>
>>> Why is this signature changing?
>>>
> 
>>
>> 2. Actually it is a question about the memory allocator: is it fast and
>> optimized enough that we don't need to care, or is it bad enough that we
>> should work around it like mirror does? In my opinion (backed by a bit of
>> benchmarking) memalign is fast enough not to care. I answered a similar
>> question in more detail and with some numbers here:
>> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00087.html
>>
>> So I'd prefer not to care and keep the code simpler. But if you don't
>> agree, I can keep the shared buffer here, at least until parallel
>> requests are introduced. Then it would be something like
>> qemu_try_blockalign(MIN(bytes, 64M))..
> 
> It may still be worth capping at 64M.  I'm not opposed to a change to
> per-request allocation rather than trying to reuse a buffer, if it is
> going to make parallelization efforts easier; but I am worried about the
> maximum memory usage.  I'm more worried that you are trying to cram two
> distinct changes into one patch, and didn't even mention the change
> about a change from buffer reuse to per-request allocations, in the
> commit message.  If you make that sort of change, it should be called
> out in the commit message as intentional, or maybe even split to a
> separate patch.
> 

OK, I failed to hide it :) Will split out and add 64M limit.

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v2 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-10 Thread Eric Blake
On 8/10/19 7:12 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 18:59, Eric Blake wrote:
>> On 8/9/19 10:32 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> backup_cow_with_offload can transfer more than on cluster. Let
>>
>> s/on/one/
>>
>>> backup_cow_with_bounce_buffer behave similarly. It reduces number
>>> of IO and there are no needs to copy cluster by cluster.
>>
>> It reduces the number of IO requests, since there is no need to copy
>> cluster by cluster.

>>> -  bool *error_is_read,
>>> -  void **bounce_buffer)
>>> +  bool *error_is_read)
>>
>> Why is this signature changing?
>>

> 
> 2. Actually it is a question about the memory allocator: is it fast and
> optimized enough that we don't need to care, or is it bad enough that we
> should work around it like mirror does? In my opinion (backed by a bit of
> benchmarking) memalign is fast enough not to care. I answered a similar
> question in more detail and with some numbers here:
> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00087.html
> 
> So I'd prefer not to care and keep the code simpler. But if you don't
> agree, I can keep the shared buffer here, at least until parallel
> requests are introduced. Then it would be something like
> qemu_try_blockalign(MIN(bytes, 64M))..

It may still be worth capping at 64M.  I'm not opposed to a change to
per-request allocation rather than trying to reuse a buffer, if it is
going to make parallelization efforts easier; but I am worried about the
maximum memory usage.  I'm more worried that you are trying to cram two
distinct changes into one patch, and didn't even mention the change
about a change from buffer reuse to per-request allocations, in the
commit message.  If you make that sort of change, it should be called
out in the commit message as intentional, or maybe even split to a
separate patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 6/7] block/backup: teach backup_cow_with_bounce_buffer to copy more at once

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 18:59, Eric Blake wrote:
> On 8/9/19 10:32 AM, Vladimir Sementsov-Ogievskiy wrote:
>> backup_cow_with_offload can transfer more than on cluster. Let
> 
> s/on/one/
> 
>> backup_cow_with_bounce_buffer behave similarly. It reduces number
>> of IO and there are no needs to copy cluster by cluster.
> 
> It reduces the number of IO requests, since there is no need to copy
> cluster by cluster.
> 
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/backup.c | 29 +++--
>>   1 file changed, 15 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index d482d93458..155e21d0a3 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -104,22 +104,25 @@ static int coroutine_fn 
>> backup_cow_with_bounce_buffer(BackupBlockJob *job,
>> int64_t start,
>> int64_t end,
>> bool 
>> is_write_notifier,
>> -  bool *error_is_read,
>> -  void **bounce_buffer)
>> +  bool *error_is_read)
> 
> Why is this signature changing?
> 
>>   {
>>   int ret;
>>   BlockBackend *blk = job->common.blk;
>>   int nbytes;
>>   int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
>> +void *bounce_buffer;
>>   
>>   assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>> -bdrv_reset_dirty_bitmap(job->copy_bitmap, start, job->cluster_size);
>> -nbytes = MIN(job->cluster_size, job->len - start);
>> -if (!*bounce_buffer) {
>> -*bounce_buffer = blk_blockalign(blk, job->cluster_size);
> 
> Pre-patch, you allocate the bounce_buffer at most once (but limited to a
> cluster size), post-patch, you are now allocating and freeing a bounce
> buffer every iteration through.  There may be fewer calls because you
> are allocating something bigger, but still, isn't it a good goal to try
> and allocate the bounce buffer as few times as possible and reuse it,
> rather than getting a fresh one each iteration?

Yes, it's a "degradation" of this patch, I was afraid of this question.
However, I doubt that it should be optimized:

1. I'm going to run several copy requests in coroutines to parallelize copying 
loop,
to improve performance (series for qcow2 are here
https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg06654.html), so we'll 
need
several buffers for parallel copying requests and it's extremely easier to 
allocate
buffer when it's needed and free after it, than do some allocated memory 
sharing like
in mirror.

2. Actually it is a question about memory allocator: is it fast and optimized
enough to not care, or is it bad, and we should work-around it like in
mirror? And in my opinion (proved by a bit of benchmarking) memalign
is fast enough to don't care. I was answering similar question in more details
and with some numbers here:
https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg00087.html

So, I'd prefere to not care and keep code simpler. But if you don't agree,
I can leave shared buffer here, at least until introduction of parallel 
requests.
Then, it will be something like qemu_try_blockalign(MIN(bytes, 64M))..

> 
>> +
>> +nbytes = MIN(end - start, job->len - start);
> 
> What is the largest this will be? Will it exhaust memory on a 32-bit or
> otherwise memory-constrained system, where you should have some maximum
> cap for how large the bounce buffer will be while still getting better
> performance than one cluster at a time and not slowing things down by
> over-sizing malloc?  64M, maybe?
> 

Hmm, good point. I thought it was safe to allocate a buffer for the full
request: if such a buffer is already allocated for the guest request itself,
it should not hurt to allocate another one of the same size. But I forgot
about write-zeroes and discard operations, which may be large without any
allocation, and here we need to allocate anyway.

Hmm2, but we have parallel guest writes (discards) and therefore parallel
copy-before-write operations, so the total allocation is not limited anyway
and should be fixed somehow too..
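
For concreteness, a capped per-request allocation could look roughly like the
sketch below (the 64M constant, the use of blk_try_blockalign(), and the
elided I/O are assumptions for illustration, not the final patch):

/* Sketch only: allocate the bounce buffer per request, but never larger
 * than a fixed cap. */
#define BACKUP_MAX_BOUNCE_BUFFER (64 * 1024 * 1024)

static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
                                                      int64_t start,
                                                      int64_t end,
                                                      bool is_write_notifier,
                                                      bool *error_is_read)
{
    BlockBackend *blk = job->common.blk;
    int64_t nbytes = MIN(MIN(end - start, job->len - start),
                         BACKUP_MAX_BOUNCE_BUFFER);
    void *bounce_buffer = blk_try_blockalign(blk, nbytes);
    int ret = 0;

    if (!bounce_buffer) {
        return -ENOMEM;
    }
    /* ... read nbytes from the source into bounce_buffer and write them to
     * the target, setting *error_is_read on a read failure ... */
    qemu_vfree(bounce_buffer);
    return ret;
}

With such a cap, N parallel copy-before-write operations are bounded by
N * 64M rather than by the size of the triggering guest request.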

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH] iotests: Fix 141 when run with qed

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 21:52, Max Reitz wrote:
> 69f47505ee has changed qcow2 in such a way that the commit job run in
> test 141 (and 144[1]) returns before it emits the READY event.  However,
> 141 also runs with qed, where the order is still the other way around.
> Just filter out the {"return": {}} so the test passes for qed again.
> 
> [1] 144 only runs with qcow2, so it is fine as it is.
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Fixes: 69f47505ee66afaa513305de0c1895a224e52c45
> Signed-off-by: Max Reitz 

Hm, I don't remember exactly, but these three lines look like you are doing
my work; I'm sorry for that :(

Reviewed-by: Vladimir Sementsov-Ogievskiy 

> ---
>   tests/qemu-iotests/141   | 9 +++--
>   tests/qemu-iotests/141.out   | 5 -
>   tests/qemu-iotests/common.filter | 5 +
>   3 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/tests/qemu-iotests/141 b/tests/qemu-iotests/141
> index 2197a82d45..8c2ae79f2b 100755
> --- a/tests/qemu-iotests/141
> +++ b/tests/qemu-iotests/141
> @@ -58,16 +58,21 @@ test_blockjob()
> }}}" \
>   'return'
>   
> +# If "$2" is an event, we may or may not see it before the
> +# {"return": {}}.  Therefore, filter the {"return": {}} out both
> +# here and in the next command.  (Naturally, if we do not see it

Here "it" is not event but "{"return": {}}", which is a bit confusing,
as previously we speak about seeing event..

> +# here, we will see it before the next command can be executed,
> +# so it will appear in the next _send_qemu_cmd's output.)
>   _send_qemu_cmd $QEMU_HANDLE \
>   "$1" \
>   "$2" \
> -| _filter_img_create
> +| _filter_img_create | _filter_qmp_empty_return
>   
>   # We want this to return an error because the block job is still running
>   _send_qemu_cmd $QEMU_HANDLE \
>   "{'execute': 'blockdev-del',
> 'arguments': {'node-name': 'drv0'}}" \
> -'error' | _filter_generated_node_ids
> +'error' | _filter_generated_node_ids | _filter_qmp_empty_return
>   
>   _send_qemu_cmd $QEMU_HANDLE \
>   "{'execute': 'block-job-cancel',
> diff --git a/tests/qemu-iotests/141.out b/tests/qemu-iotests/141.out
> index 4d71d9dcae..dbd3bdef6c 100644
> --- a/tests/qemu-iotests/141.out
> +++ b/tests/qemu-iotests/141.out
> @@ -10,7 +10,6 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/m.
>   Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/t.IMGFMT backing_fmt=IMGFMT
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node drv0 is in use"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "aborting", "id": "job0"}}
> @@ -27,7 +26,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/t.
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 
> 0, "type": "mirror"}}
> -{"return": {}}
>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block 
> device is in use by block job: mirror"}}
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job0"}}
> @@ -42,7 +40,6 @@ Formatting 'TEST_DIR/o.IMGFMT', fmt=IMGFMT size=1048576 
> backing_file=TEST_DIR/t.
>   {"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job0"}}
> -{"return": {}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job0"}}
>   {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
> "BLOCK_JOB_READY", "data": {"device": "job0", "len": 0, "offset": 0, "speed": 
> 0, "type": "commit"}}
>   {"error": {"class": "GenericError", "desc": "Node 'drv0' is busy: block 
> device is in use by block job: commit"}}
> @@ -61,7 +58,6 @@ wrote 1048576/1048576 bytes at offset 0
>   {"return": {}}
> 

Re: [Qemu-devel] backup bug or question

2019-08-10 Thread Vladimir Sementsov-Ogievskiy
09.08.2019 23:13, John Snow wrote:
> 
> 
> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi!
>>
>> Hmm, hacking around backup I have a question:
>>
>> What prevents guest write request after job_start but before setting
>> write notifier?
>>
>> code path:
>>
>> qmp_drive_backup or transaction with backup
>>
>>  job_start
>> aio_co_enter(job_co_entry) /* may only schedule execution, isn't it 
>> ? */
>>
>> 
>>
>> job_co_entry
>>  job_pause_point() /* it definitely yields, isn't it bad? */
>>  job->driver->run() /* backup_run */
>>
>> 
>>
>> backup_run()
>>  bdrv_add_before_write_notifier()
>>
>> ...
>>
> 
> I think you're right... :(
> 
> 
> We create jobs like this:
> 
> job->paused= true;
> job->pause_count   = 1;
> 
> 
> And then job_start does this:
> 
> job->co = qemu_coroutine_create(job_co_entry, job);
> job->pause_count--;
> job->busy = true;
> job->paused = false;
> 
> 
> Which means that job_co_entry is being called before we lift the pause:
> 
> assert(job && job->driver && job->driver->run);
> job_pause_point(job);
> job->ret = job->driver->run(job, >err);
> 
> ...Which means that we are definitely yielding in job_pause_point.
> 
> Yeah, that's a race condition waiting to happen.
> 
>> And what guarantees we give to the user? Is it guaranteed that write 
>> notifier is
>> set when qmp command returns?
>>
>> And I guess, if we start several backups in a transaction it should be 
>> guaranteed
>> that the set of backups is consistent and correspond to one point in time...
>>
> 
> I would have hoped that maybe the drain_all coupled with the individual
> jobs taking drain_start and drain_end would save us, but I guess we
> simply don't have a guarantee that all backup jobs WILL have installed
> their handler by the time the transaction ends.
> 
> Or, if there is that guarantee, I don't know what provides it, so I
> think we shouldn't count on it accidentally working anymore.
> 
> 
> 
> I think we should do two things:
> 
> 1. Move the handler installation to creation time.
> 2. Modify backup_before_write_notify to return without invoking
> backup_do_cow if the job isn't started yet.
> 

Hmm, I don't see how it helps. A no-op write notifier will not save us from
a guest write, will it?
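
To make this concrete, here is a sketch of the two-step suggestion quoted
above (the 'started' flag is hypothetical and only for illustration); the
comment marks the window in question:

/* Sketch: the notifier is installed at job creation, but does nothing until
 * the job has actually started. */
static int coroutine_fn backup_before_write_notify(NotifierWithReturn *notifier,
                                                   void *opaque)
{
    BackupBlockJob *job = container_of(notifier, BackupBlockJob, before_write);
    BdrvTrackedRequest *req = opaque;

    if (!job->started) {
        /* The guest write still proceeds underneath us, so the old data is
         * never copied to the target -- this is the hole described above. */
        return 0;
    }

    return backup_do_cow(job, req->offset, req->bytes, NULL, true);
}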


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v1 1/7] fpu: move LIT64 helper to softfloat-types

2019-08-10 Thread Alex Bennée


Peter Maydell  writes:

> On Thu, 8 Aug 2019 at 17:41, Alex Bennée  wrote:
>>
>> This simple pasting helper can be used by those who don't need the
>> entire softfloat api. Move it to the smaller types header.
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  include/fpu/softfloat-types.h | 2 ++
>>  include/fpu/softfloat.h   | 2 --
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> I think we should be trying to get rid of uses of this
> macro, not making it easier to use in more places...

True - that is a more invasive (but mechanical) patch.
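
For context, LIT64() is just a 64-bit literal-suffix pasting macro, so
removing its uses is a mechanical substitution. A rough illustration (the
definition is paraphrased; see the softfloat headers for the exact spelling):

#include <stdint.h>

/* Paraphrased for illustration; the real definition lives in the softfloat
 * headers and may spell the suffix differently. */
#define LIT64(a) a##LL

int main(void)
{
    uint64_t a = LIT64(0x8000000000000000);      /* macro-based spelling */
    uint64_t b = UINT64_C(0x8000000000000000);   /* direct replacement   */
    return a == b ? 0 : 1;
}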

--
Alex Bennée