On 02/22/2017 11:26 AM, Paolo Bonzini wrote:
> 
> 
> On 22/02/2017 18:11, Eric Blake wrote:
>>>> +    /* preferred - At least 4096, but larger as appropriate. */
>>>> +    sizes[1] = MAX(blk_get_opt_transfer(exp->blk), 4096);
>>
>> The NBD specification requires a non-zero power-of-2 number if the
>> server transmits the block size at all; 1 is the ideal number, followed
>> by whatever actual size we learn from the request_align of the device.

Oh shoot - now I notice I misread your complaint - I thought you were
complaining about sizes[0] (min_size) having a TODO comment, but you
were talking about sizes[1] (preferred_size).

The NBD spec wording can be changed if needed (after all, it is still
experimental), but it currently says:

"If block size constraints have not been advertised or agreed on
externally, then a client SHOULD assume a default minimum block size of
1, a preferred block size of 2^12 (4,096), and a maximum block size of
the smaller of the export size or 0xffffffff (effectively unlimited).

...

"The preferred block size represents the minimum size at which aligned
requests will have efficient I/O, avoiding behaviour such as
read-modify-write. If advertised, this MUST be a power of 2 at least as
large as the smaller of the minimum block size and 2^12 (4,096),
although larger values (such as the minimum granularity of a hole) are
also appropriate. The preferred block size MAY be larger than the export
size, in which case the client is unable to utilize the preferred block
size for that export. The server MAY advertise an export size that is
not an integer multiple of the preferred block size."
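Read literally, those rules can be sketched in C. The names below (NbdBlockSizes, nbd_default_sizes, nbd_preferred_valid, nbd_preferred_from_opt) are hypothetical helpers, not QEMU or NBD API. One thing the sketch makes visible: since the BlockLimits documentation says a power of 2 is "best but not mandatory" for opt_transfer, MAX(opt_transfer, 4096) as in the quoted patch could yield something like 12288, which the spec would not accept. Rounding up to the next power of 2 is one possible fix, assumed here rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three NBD block size constraints. */
typedef struct {
    uint32_t min_block;
    uint32_t preferred_block;
    uint32_t max_block;
} NbdBlockSizes;

/* Defaults a client SHOULD assume when nothing was advertised:
 * minimum 1, preferred 4096, maximum min(export size, 0xffffffff). */
static NbdBlockSizes nbd_default_sizes(uint64_t export_size)
{
    NbdBlockSizes s;
    s.min_block = 1;
    s.preferred_block = 4096;
    s.max_block = export_size < 0xffffffffULL ? (uint32_t)export_size
                                              : 0xffffffffU;
    return s;
}

static bool is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Round up to the next power of 2; returns 0 if that would exceed 2^31. */
static uint32_t round_up_pow2(uint32_t x)
{
    uint32_t p = 1;
    if (x > (1U << 31)) {
        return 0;
    }
    while (p < x) {
        p <<= 1;
    }
    return p;
}

/* Spec rule for an advertised preferred size: a power of 2 at least
 * as large as the smaller of the minimum block size and 4096. */
static bool nbd_preferred_valid(uint32_t preferred, uint32_t min_block)
{
    uint32_t floor = min_block < 4096 ? min_block : 4096;
    return is_pow2(preferred) && preferred >= floor;
}

/* Assumed derivation of a preferred size from a device's optimal
 * transfer length (which need not be a power of 2): take at least
 * 4096, then round up to a power of 2. */
static uint32_t nbd_preferred_from_opt(uint32_t opt_transfer)
{
    uint32_t v = opt_transfer > 4096 ? opt_transfer : 4096;
    uint32_t p = round_up_pow2(v);
    if (p == 0) {
        p = 1U << 31; /* largest power of 2 that fits in 32 bits */
    }
    assert(nbd_preferred_valid(p, 1));
    return p;
}
```

With this, an opt_transfer of 0 (no preferred size) still advertises 4096, and 12288 becomes 16384.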

> 
> Oh, so it's the smallest "good" transfer size, or the preferred
> alignment.  That's not the same as the SCSI definition, which is:
> 
>    If a device server receives one of these commands with a transfer
>    size greater than this value, then the device server may incur
>    delays in processing the command. An OPTIMAL TRANSFER LENGTH field
>    set to 0000_0000h indicates that the device server does not report
>    an optimal transfer size.
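For contrast, the SCSI rule quoted above is an upper bound ("longer than this may be slow") rather than an alignment hint. A minimal sketch, assuming both lengths are expressed in the same unit (the OPTIMAL TRANSFER LENGTH field in the Block Limits VPD page counts logical blocks); the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a command of transfer_len may incur delays according
 * to the device's OPTIMAL TRANSFER LENGTH. A value of 0 means the device
 * does not report an optimal size, so no delay is predicted. */
static bool scsi_transfer_may_be_delayed(uint32_t transfer_len,
                                         uint32_t optimal_transfer_len)
{
    if (optimal_transfer_len == 0) {
        return false; /* not reported: no hint either way */
    }
    return transfer_len > optimal_transfer_len;
}
```

Note that a transfer exactly equal to the optimal length is fine; only larger ones are flagged.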

Hmm - that's yet another limit. I don't know if our block layer exposes
it, or if it should expose it.

> 
> It's more similar to the physical block size:

Indeed; at least that was my intent (picking a size that will avoid
read-modify-write pessimations, as well as reflecting granularity of
trim/zero operations).

> 
>    When using logical block access commands (see 4.2.2), application
>    clients should:
>    a) specify an LBA that is aligned to a physical block boundary; and
>    b) access an integral number of physical blocks, provided that the
>    access does not go beyond the last LBA on the medium.
> 
> So I'd rather ignore it in the client, and send 4096 in the server.
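The two physical-block rules can be expressed as a simple check; with the server advertising 4096 as the preferred size, a client could apply it with phys_block = 4096. This is a hedged sketch: it works in byte offsets instead of LBAs, the function name is hypothetical, and the treatment of a partial final block is one reading of "provided that the access does not go beyond the last LBA on the medium".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if [offset, offset + len) starts on a physical block
 * boundary and covers a whole number of physical blocks. A partial
 * final block is tolerated only when the request ends exactly at the
 * end of the medium (assumed interpretation). */
static bool req_follows_phys_blocks(uint64_t offset, uint64_t len,
                                    uint32_t phys_block,
                                    uint64_t medium_size)
{
    if (phys_block == 0 || offset % phys_block != 0) {
        return false; /* rule a): start must be block-aligned */
    }
    if (len % phys_block == 0) {
        return true;  /* rule b): integral number of blocks */
    }
    /* Rounding len up would go past the last LBA on the medium. */
    return offset + len == medium_size;
}
```

This mirrors the NBD wording that an export size need not be a multiple of the preferred block size: only the tail of the export can be a partial block.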

Does that mean our BlockLimits structure documentation needs a tweak,
too?  It currently reads:

    /* Optimal transfer length in bytes.  A power of 2 is best but not
     * mandatory.  Must be a multiple of bl.request_alignment, or 0 if
     * no preferred size */
    uint32_t opt_transfer;

Are we trying to track both optimum size in the SCSI sense _and_ block
size in the O_DIRECT sense?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
