On 06/15/2016 02:46 AM, Denis V. Lunev wrote: > On 06/15/2016 06:00 AM, Eric Blake wrote: >> On 06/14/2016 09:25 AM, Denis V. Lunev wrote: >>> With a bdrv_co_write_zeroes method on a target BDS zeroes will not be >>> placed >>> into the wire. Thus the target could be very efficiently zeroed out. >>> This >>> is should be done with the largest chunk possible. >>>
>> Probably nicer to track this in bytes. And do you really want a >> hard-coded arbitrary limit, or is it better to live with >> MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)? > unfortunately we should. INT_MAX is not aligned as required. > May be we should align INT_MAX properly to fullfill > write_zeroes alignment. > > Hmm, may be we can align INT_MAX properly down. OK, > I'll try to do that gracefully. It's fairly easy to round a max_transfer or max_pwrite_zeroes down to an aligned value; we already have code in io.c that does that in bdrv_co_do_pwrite_zeroes(). > >>> @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s) >>> end = s->bdev_length / BDRV_SECTOR_SIZE; >>> - if (base == NULL && !bdrv_has_zero_init(target_bs)) { >>> + if (base == NULL && !bdrv_has_zero_init(target_bs) && >>> + target_bs->drv->bdrv_co_write_zeroes == NULL) { >> Indentation is off, although if checkpatch.pl doesn't complain I guess >> it doesn't matter that much. >> >> Why should you care whether the target_bs->drv implements a callback? >> Can't you just rely on the normal bdrv_*() functions to do the dirty >> work of picking the most efficient implementation without you having to >> bypass the block layer? In fact, isn't that the whole goal of >> bdrv_make_zero() - why not call that instead of reimplementing it? > this is the idea of the patch actually. If the callback is not > implemented, we > will have zeroes actually written or send to the wire. In this case > there is > not much sense to do that, the amount of data actually written will be > significantly increased (some areas will be written twice - with zeroes and > with the actual data). > But that's where bdrv_can_write_zeroes_with_unmap() comes in handy - you can use the public interface to learn whether bdrv_make_zero() will be efficient or not, without having to probe what the backend supports. > On the other hand, if callback is implemented, we will have very small > amount > of data in the wire and written actually and thus will have a benefit. I am > trying to avoid very small chunks of data. Here (during the migration > process) > the data is sent with 10 Mb chunks and with takes a LOT of time with NBD. > We can send chunks 1.5 Gb (currently). They occupies the same 26 bytes > of data > on the transport layer. I agree that we don't want to pre-initialize the device to zero unless write zeroes is an efficient operation, but I don't think that the existence of bs->drv->bdrv_co_[p]write_zeroes is the right way to find that out. I also think that we need to push harder on the NBD list that under the new block limits proposal, we WANT to be able to advertise when the new NBD_CMD_WRITE_ZEROES command will accept a larger size than NBD_CMD_WRITE (as currently written, the BLOCK_INFO extension proposal states that if a server advertises a max transaction size to the client, then the client must honor that size for all commands including NBD_CMD_WRITE_ZEROES, which would mean your 1.5G request [or my proposed 2G - 4k request] is invalid and would have to be a bunch of 32M requests). https://sourceforge.net/p/nbd/mailman/message/35081223/ -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
signature.asc
Description: OpenPGP digital signature