On 02/22/2017 11:26 AM, Paolo Bonzini wrote:
>
>
> On 22/02/2017 18:11, Eric Blake wrote:
>>>> +    /* preferred - At least 4096, but larger as appropriate. */
>>>> +    sizes[1] = MAX(blk_get_opt_transfer(exp->blk), 4096);
>>
>> The NBD specification requires a non-zero power-of-2 number if the
>> server transmits the block size at all; 1 is the ideal number, followed
>> by whatever actual size we learn from the request_align of the device.
Oh shoot - now I notice I misread your complaint - I thought you were
complaining about sizes[0] (min_size) having a TODO comment; but you
were talking about sizes[1] (preferred_size).  The NBD spec wording can
be changed if needed (after all, it is still experimental), but it
currently says:

"If block size constraints have not been advertised or agreed on
externally, then a client SHOULD assume a default minimum block size of
1, a preferred block size of 2^12 (4,096), and a maximum block size of
the smaller of the export size or 0xffffffff (effectively unlimited).
...

"The preferred block size represents the minimum size at which aligned
requests will have efficient I/O, avoiding behaviour such as
read-modify-write. If advertised, this MUST be a power of 2 at least as
large as the smaller of the minimum block size and 2^12 (4,096),
although larger values (such as the minimum granularity of a hole) are
also appropriate. The preferred block size MAY be larger than the export
size, in which case the client is unable to utilize the preferred block
size for that export. The server MAY advertise an export size that is
not an integer multiple of the preferred block size."

>
> Oh, so it's the smallest "good" transfer size, or the preferred
> alignment.  That's not the same as the SCSI definition, which is:
>
>    If a device server receives one of these commands with a transfer
>    size greater than this value, then the device server may incur
>    delays in processing the command. An OPTIMAL TRANSFER LENGTH field
>    set to 0000_0000h indicates that the device server does not report
>    an optimal transfer size.

Hmm - that's yet another limit.  I don't know if our block layer exposes
it, or if it should expose it.

>
> It's more similar to the physical block size:

Indeed; at least that was my intent (picking a size that will avoid
read-modify-write pessimations, as well as reflecting granularity of
trim/zero operations).
>
> When using logical block access commands (see 4.2.2), application
> clients should:
>    a) specify an LBA that is aligned to a physical block boundary; and
>    b) access an integral number of physical blocks, provided that the
>       access does not go beyond the last LBA on the medium.
>
> So I'd rather ignore it in the client, and send 4096 in the server.

Does that mean our BlockLimits structure documentation needs a tweak,
too?  It currently reads:

    /* Optimal transfer length in bytes.  A power of 2 is best but not
     * mandatory.  Must be a multiple of bl.request_alignment, or 0 if
     * no preferred size */
    uint32_t opt_transfer;

Are we trying to track both optimum size in the SCSI sense _and_ block
size in the O_DIRECT sense?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org