On 31 Mar 2016, at 14:02, Denis V. Lunev <d...@openvz.org> wrote:

> From: Pavel Borzenkov <pborzen...@virtuozzo.com>
>
> There exist some cases when a client knows that the data it is going to
> write is all zeroes. Such cases include mirroring or backing up a device
> implemented by a sparse file.

Useful.

> -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE`. SHOULD be
> -  set to 1 if the client requires "Force Unit Access" mode of
> -  operation. MUST NOT be set unless transmission flags included
> -  `NBD_FLAG_SEND_FUA`.
> +- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
> +  `NBD_CMD_WRITE_ZEROES` commands. SHOULD be set to 1 if the client requires
> +  "Force Unit Access" mode of operation. MUST NOT be set unless transmission
> +  flags included `NBD_FLAG_SEND_FUA`.

Not your fault, but this should actually say "unless export flags
included". Transmission flags would be the flags with the command.

> +- bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES`
> +  extension; see below.

For consistency, probably useful to say here: MUST NOT be set unless the
export flags include NBD_FLAG_SEND_WRITE_ZEROES.

>
>  #### Request types
>
> @@ -523,6 +528,10 @@ The following request types exist:
>      A client MUST NOT send a trim request unless `NBD_FLAG_SEND_TRIM`
>      was set in the transmission flags field.
>
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +    Defined by the experimental `WRITE_ZEROES` extension; see below.
> +
>  * Other requests
>
>      Some third-party implementations may require additional protocol
> @@ -654,6 +663,53 @@ option reply type.
>      message if they do not also send it as a reply to the
>      `NBD_OPT_SELECT` message.
>
> +### `WRITE_ZEROES` extension
> +
> +There exist some cases when a client knows that the data it is going to write
> +is all zeroes. Such cases include mirroring or backing up a device implemented
> +by a sparse file. With current NBD command set, the client has to issue
> +`NBD_CMD_WRITE` command with zeroed payload and transfer these zero bytes
> +through the wire. The server has to write the data onto disk, effectively
> +losing the sparseness.
> +
> +To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
> +one new command and one new command flag.
> +
> +* `NBD_CMD_WRITE_ZEROES` (6)
> +
> +    A write request with no payload. Length and offset define the location
> +    and amount of data to be zeroed.
> +
> +    The server MUST zero out the data on disk, and then send the reply
> +    message. The server MAY send the reply message before the data has
> +    reached permanent storage.
> +
> +    A client MUST NOT send a write zeroes request unless
> +    `NBD_FLAG_SEND_WRITE_ZEROES` was set in the transmission flags field.
> +
> +    If the `NBD_FLAG_SEND_FUA` flag was set in the transmission flags field,
> +    the client MAY set the flag `NBD_CMD_FLAG_FUA` in the command flags field.
> +    If this flag was set, the server MUST NOT send the reply until it has
> +    ensured that the newly-zeroed data has reached permanent storage.
> +
> +    If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command
> +    flags field, the server MAY use trimming to zero out the area, but it
> +    MUST ensure that the data reads back as zero.
> +

Can you give an example of a situation where the client would not set
this and it would be undesirable for the server to create a 'hole' using
'trim' type technology, even when the client doesn't specify it? I suspect
there are already some backends (e.g. ceph on qemu-nbd) which will
effectively do a 'trim' if you write 4k of zeroes even under current
circumstances.

IE why not always permit trimming PROVIDED the data always reads back as
zero? This would be far simpler.

--
Alex Bligh
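
For reference, a rough sketch of how a client might build the request the
quoted patch describes. This is only an illustration under stated
assumptions, not code from any NBD implementation: the function name and its
keyword arguments are hypothetical, the 28-byte header layout (magic, command
flags, type, handle, offset, length) is assumed to be the existing NBD request
header, and the two booleans stand in for the server-advertised
`NBD_FLAG_SEND_WRITE_ZEROES` and `NBD_FLAG_SEND_FUA` flags, whose bit
positions are not shown in the quoted hunks.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513     # assumed: magic of the existing request header
NBD_CMD_WRITE_ZEROES = 6           # command number proposed in the quoted patch
NBD_CMD_FLAG_FUA = 1 << 0          # command flag bit 0
NBD_CMD_FLAG_MAY_TRIM = 1 << 1     # command flag bit 1 (proposed)


def build_write_zeroes_request(handle, offset, length, *,
                               server_write_zeroes, server_fua,
                               fua=False, may_trim=False):
    """Hypothetical helper: pack a WRITE_ZEROES request header.

    server_write_zeroes / server_fua say whether the server advertised
    NBD_FLAG_SEND_WRITE_ZEROES / NBD_FLAG_SEND_FUA for this export.
    """
    if not server_write_zeroes:
        # The client MUST NOT send the command unless the server offered it.
        raise ValueError("server did not advertise NBD_FLAG_SEND_WRITE_ZEROES")

    flags = 0
    if fua:
        if not server_fua:
            raise ValueError("server did not advertise NBD_FLAG_SEND_FUA")
        flags |= NBD_CMD_FLAG_FUA
    if may_trim:
        flags |= NBD_CMD_FLAG_MAY_TRIM

    # Big-endian: u32 magic, u16 flags, u16 type, u64 handle, u64 offset,
    # u32 length. No payload follows, which is the point of the command.
    return struct.pack(">IHHQQI", NBD_REQUEST_MAGIC, flags,
                       NBD_CMD_WRITE_ZEROES, handle, offset, length)
```

Zeroing 8 KiB at offset 4096 with trimming allowed would then be
`build_write_zeroes_request(1, 4096, 8192, server_write_zeroes=True,
server_fua=False, may_trim=True)`. If trimming were always permitted, as
suggested above, the `may_trim` argument and bit 1 would simply go away.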