Re: [Qemu-devel] [PATCH 4/9] mirror: efficiently zero out target

Eric Blake Wed, 15 Jun 2016 05:36:40 -0700

On 06/15/2016 02:46 AM, Denis V. Lunev wrote:
> On 06/15/2016 06:00 AM, Eric Blake wrote:
>> On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
>>> With a bdrv_co_write_zeroes method on a target BDS zeroes will not be
>>> placed
>>> into the wire. Thus the target could be very efficiently zeroed out.
>>> This
>>> is should be done with the largest chunk possible.
>>>


>> Probably nicer to track this in bytes.  And do you really want a
>> hard-coded arbitrary limit, or is it better to live with
>> MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)?
> unfortunately we should. INT_MAX is not aligned as required.
> May be we should align INT_MAX properly to fullfill
> write_zeroes alignment.
> 
> Hmm, may be we can align INT_MAX properly down. OK,
> I'll try to do that gracefully.

It's fairly easy to round a max_transfer or max_pwrite_zeroes down to an
aligned value; we already have code in io.c that does that in
bdrv_co_do_pwrite_zeroes().

> 
>>> @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>>>         end = s->bdev_length / BDRV_SECTOR_SIZE;
>>>   -    if (base == NULL && !bdrv_has_zero_init(target_bs)) {
>>> +    if (base == NULL && !bdrv_has_zero_init(target_bs) &&
>>> +            target_bs->drv->bdrv_co_write_zeroes == NULL) {
>> Indentation is off, although if checkpatch.pl doesn't complain I guess
>> it doesn't matter that much.
>>
>> Why should you care whether the target_bs->drv implements a callback?
>> Can't you just rely on the normal bdrv_*() functions to do the dirty
>> work of picking the most efficient implementation without you having to
>> bypass the block layer?  In fact, isn't that the whole goal of
>> bdrv_make_zero() - why not call that instead of reimplementing it?
> this is the idea of the patch actually. If the callback is not
> implemented, we
> will have zeroes actually written or send to the wire. In this case
> there is
> not much sense to do that, the amount of data actually written will be
> significantly increased (some areas will be written twice - with zeroes and
> with the actual data).
>

But that's where bdrv_can_write_zeroes_with_unmap() comes in handy - you
can use the public interface to learn whether bdrv_make_zero() will be
efficient or not, without having to probe what the backend supports.

> On the other hand, if callback is implemented, we will have very small
> amount
> of data in the wire and written actually and thus will have a benefit. I am
> trying to avoid very small chunks of data. Here (during the migration
> process)
> the data is sent with 10 Mb chunks and with takes a LOT of time with NBD.
> We can send chunks 1.5 Gb (currently). They occupies the same 26 bytes
> of data
> on the transport layer.

I agree that we don't want to pre-initialize the device to zero unless
write zeroes is an efficient operation, but I don't think that the
existence of bs->drv->bdrv_co_[p]write_zeroes is the right way to find
that out.

I also think that we need to push harder on the NBD list that under the
new block limits proposal, we WANT to be able to advertise when the new
NBD_CMD_WRITE_ZEROES command will accept a larger size than
NBD_CMD_WRITE (as currently written, the BLOCK_INFO extension proposal
states that if a server advertises a max transaction size to the client,
then the client must honor that size for all commands including
NBD_CMD_WRITE_ZEROES, which would mean your 1.5G request [or my proposed
2G - 4k request] is invalid and would have to be a bunch of 32M requests).
https://sourceforge.net/p/nbd/mailman/message/35081223/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 4/9] mirror: efficiently zero out target

Reply via email to