On Thu, Jul 23, 2020 at 10:54:31AM -0500, Eric Blake wrote: > I'm thinking of adding one or more callbacks to nbdkit to let > plugins/filters enforce various block size alignments (for example, > the swab filter requires 2/4/8 alignment, or VDDK requires 512 > alignment, etc). The NBD protocol currently has NBD_INFO_BLOCK_SIZE > which can be sent in reply to NBD_OPT_GO to tell the client about > sizing constraints; qemu already implements it as both client and > server, so we have some reasonable testing setups (although libnbd > will also need some additions to make it easier to expose > constraints to the user, and/or add new convenience APIs to do > blocksize-style read-modify-write at the libnbd client side rather > than needing the blocksize filter in the NBD server side). > > But NBD_INFO_BLOCK_SIZE is not the full picture: it only covers > minimum block size, preferred block size, and maximum data size; > there has been discussion on the NBD list about also advertising > maximum action size (discussion has mentioned trim and/or zero, but > even cache could benefit from larger buffer size than pread), which > means we should be thinking about supporting future protocol > extensions in whatever we expose to plugins. > > So, I'm thinking something like the following: > > New enum: > NBDKIT_BLOCK_SIZE_GET_MINIMUM > NBDKIT_BLOCK_SIZE_GET_PREFERRED > NBDKIT_BLOCK_SIZE_GET_MAX_DATA > NBDKIT_BLOCK_SIZE_GET_MAX_TRIM > NBDKIT_BLOCK_SIZE_GET_MAX_ZERO > NBDKIT_BLOCK_SIZE_GET_MAX_CACHE
enum or int? I think there are ABI problems with enums, although probably not unless we have more than 256 cases? > along with a new callback for plugins: > > int64_t block_size (void *handle, int which); > > where 'which' is one of the enum values. A future nbdkit might > request an enum value not recognized at the time the plugin was > compiled, so the recommended behavior is that a plugin returns -1 on > error, 0 to let nbdkit pick a sane default (including when 'which' > was unknown), or a positive value for an actual limit. For now, > nbdkit would advertise MIN, PREFERRED, and MAX_DATA limits (to > clients that use NBD_OPT_GO), while the others are enforced only > internally. The idea is that if the plugin installs a limit, a > client request that violates that limit will fail with EINVAL for > being unaligned, without even asking the plugin to try the response. Isn't the plan that the server would try to resolve the problem - eg. by making a RMW request or splitting a request? This would be especially the case for internal requests, but I could also see it having value for clients which either ignore the block size information, or don't implement it correctly/completely. > nbdkit calls the plugin callback once per connection per reasonable > 'which' (caching the results like it does for .get_size, and > skipping limits where .can_FOO fails). Returning int64_t allows us > to reuse this callback without change when we add v3 callbacks, > although values larger than 4G make no difference at present. I > thought the use of an enum was nicer than filling in a struct whose > size might change, or adding one callback per limit. Yes the enum/int seems like a better idea than dealing with structs, and is also easier to map into other programming languages. The overhead of a few extra indirect function calls is negligible because it's only once per connection. The only other alternative I can see would be to put these into struct nbdkit_plugin (as int64_t fields), but we have always regretted using plain fields instead of functions in this struct, eg. thread_model, config_help, etc. > Filters can relax limits (such as blocksize turning a plugin MIN 512 > into an advertised MIN 1, by doing read-modify-write as needed) or > tighten limits (the file plugin has MIN 1, but the swab filter > imposes a tighter MIN 2). Constraints between limits are as > follows: Can the server do this transparently between layers? Might be a lot easier to implement it once. > MIN: must be power of 2, between 1 and 64k; .get_size and .extents > must return results aligned to MIN (as any unaligned tail or extent > transition is inaccessible using only aligned operations). Defaults > to 1. > > PREFERRED: must be power of 2, between max(512, MIN) and 2M (this > upper limit is not specified by NBD spec, but matches qemu's > implementation of what it uses as the max qcow2 cluster size). > Defaults to max(4k, MIN). > > MAX_DATA: must be multiple of MIN and at least as large as > PREFERRED; values larger than 64M are okay but clamped by nbdkit's > own internal limits. Defaults to 64M. > > MAX_TRIM, MAX_ZERO, MAX_CACHE: must be multiple of MIN, should be at > least as big as MAX_DATA. Values 4G and larger are clamped by > nbdkit's own internal limits. Defaults to 4G-MIN for TRIM/ZERO, and > MAX_DATA for CACHE. Sounds reasonable otherwise. Rich. -- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones Read my programming and virtualization blog: http://rwmj.wordpress.com virt-df lists disk usage of guests without needing to install any software inside the virtual machine. Supports Linux and Windows. http://people.redhat.com/~rjones/virt-df/ _______________________________________________ Libguestfs mailing list [email protected] https://www.redhat.com/mailman/listinfo/libguestfs
