Am 08.11.2018 um 11:02 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 07.11.2018 21:16, Kevin Wolf wrote:
> > (Broken quoting in text/plain again)
> >
> > Am 01.11.2018 um 13:17 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> 27.09.2018 20:35, Max Reitz wrote:
> >>
> >>      On 07.08.18 19:43, Vladimir Sementsov-Ogievskiy wrote:
> >>
> >>          Memory allocation may become less efficient for encrypted case. 
> >> It's a
> >>          payment for further asynchronous scheme.
> >>
> >>          Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >> <vsement...@virtuozzo.com>
> >>          ---
> >>           block/qcow2.c | 114 
> >> ++++++++++++++++++++++++++++++++--------------------------
> >>           1 file changed, 64 insertions(+), 50 deletions(-)
> >>
> >>          diff --git a/block/qcow2.c b/block/qcow2.c
> >>          index 65e3ca00e2..5e7f2ee318 100644
> >>          --- a/block/qcow2.c
> >>          +++ b/block/qcow2.c
> >>          @@ -1808,6 +1808,67 @@ out:
> >>               return ret;
> >>           }
> >>
> >>          +static coroutine_fn int qcow2_co_preadv_normal(BlockDriverState 
> >> *bs,
> >>          +                                               uint64_t 
> >> file_cluster_offset,
> >>          +                                               uint64_t offset,
> >>          +                                               uint64_t bytes,
> >>          +                                               QEMUIOVector 
> >> *qiov,
> >>          +                                               uint64_t 
> >> qiov_offset)
> >>          +{
> >>          +    int ret;
> >>          +    BDRVQcow2State *s = bs->opaque;
> >>          +    void *crypt_buf = NULL;
> >>          +    QEMUIOVector hd_qiov;
> >>          +    int offset_in_cluster = offset_into_cluster(s, offset);
> >>          +
> >>          +    if ((file_cluster_offset & 511) != 0) {
> >>          +        return -EIO;
> >>          +    }
> >>          +
> >>          +    qemu_iovec_init(&hd_qiov, qiov->niov);
> >>
> >>      So you're not just re-allocating the bounce buffer for every single
> >>      call, but also this I/O vector.  Hm.
> > And this one is actually at least a little more serious, I think.
> >
> > Decryption and decompression (including copying between the original
> > qiov and the bounce buffer) are already very slow, so allocating a
> > bounce buffer or not shouldn't make any noticable difference.
> >
> > But I'd like to keep allocations out of the fast path as much as we can.
> >
> >>          +    if (bs->encrypted) {
> >>          +        assert(s->crypto);
> >>          +        assert(bytes <= QCOW_MAX_CRYPT_CLUSTERS * 
> >> s->cluster_size);
> >>          +
> >>          +        crypt_buf = qemu_try_blockalign(bs->file->bs, bytes);
> >>
> >>      1. Why did you remove the comment?
> >>
> >>      2. The check for whether crypt_buf was successfully allocated is 
> >> missing.
> >>
> >>      3. Do you have any benchmarks for encrypted images?  Having to 
> >> allocate
> >>      the bounce buffer for potentially every single cluster sounds rather 
> >> bad
> >>      to me.
> > Actually, benchmarks for normal, fully allocated images would be a
> > little more interesting. Though I'm not sure if qcow2 actually performs
> > well enough that we'll see any difference even there (when we measured
> > memory allocation overhead etc. for raw images in the context of
> > comparing coroutines to AIO callback styes, the difference was already
> > hard to see).
> 
> Ok, I'll benchmark it and/or improve fast path (you mean code path, 
> where we skip coroutines creation, to handle the whole request at once, 
> yes?).

I had less the specific code path in mind, but the case of reading
unallocated clusters or reading/writing an already allocated area of
normal or zero clusters (no compression, no encrpytion, no other fancy
stuff). These are the cases that qcow2 images will hit the most in
practice.

> Hmm, all this staff with hd_qiov's in many places is due to we don't
> have qiov_offset parameter in .bdrv_co_preadv.. Is it a bad idea to
> add it?

I actually thought the same yesterday, though I'm not completely sure
yet about it. It makes the interfaces a bit uglier, but it could indeed
save us some work in the I/O path, so unless we find bigger reasons
against it, maybe we should do that.

> Anyway, I now think, it's better to start with decompress-threads 
> series, as compress-threads are already merged, then bring encryption to 
> threads too, and then return to these series. This way will remove some 
> questions about allocation, and may be it would be simpler and more 
> consistent to bring more things to coroutines, not only normal clusters.

Okay, we can do that.

Kevin

Reply via email to