Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
Ping. Should I write this up as a proper proposal? On Thu, Nov 16, 2017 at 05:20:29PM +0100, Wouter Verhelst wrote: > On Thu, Nov 16, 2017 at 09:30:41AM -0600, Eric Blake wrote: > > On 11/16/2017 03:51 AM, Wouter Verhelst wrote: > > > > >> I also remember from talking with Vladimir during KVM Forum last month > > >> that one of the shortfalls of the NBD protocol is that you can only ever > > >> send a length of up to 32 bits on the command side (unless we introduce > > >> structured commands in addition to our current work to add structured > > >> replies); > > > > > > Yes, and I'm thinking we should do so. This will obviously require more > > > negotiation. > > > > > > Can be done fairly easily though: > > > - Client negotiates structured replies (don't think it makes sense to do > > > structured requests without structured replies) > > > - Server sets an extra transmission flag to say "I am capable of > > > receiving extended requests" > > > - Extended requests have a different magic number, and should have a > > > "request length" field as well. I'm thinking we make it: > > > > > > magic (32b) > > > request length (16b) > > > type (16b) > > > flags (64b) > > > handle (64b) > > > from (64b) > > > data length(64b) > > > (extra data) > > > > > > Request length in this proposal should always be at least 320. > > > > 320 bits, but we pass length in bytes, so 40. > > Umm. Yes. I didn't sleep very well last night, sorry. > > > > I made flags 64 bits rather than 16 as per the current format, because > > > that way everything is aligned on a 4-byte boundary, which makes things > > > a bit easier on some architectures (e.g., I know that sparc doesn't like > > > unaligned 64-bit access). 64 bits for flags looks like a bit of a waste, > > > but then if we're going to waste some bits somewhere, I guess it's best > > > to assign them to flags. > > > > > > The idea is that "request length" is the length of the data that the > > > client is sending, and "data length" is the length of the range that > > > we're trying to deal with. > > > > Yep, sounds reasonable. > > > > > A write request would thus have to have request length be (data length + > > > 320); a read request would have request length be 320, and expect data > > > to be returned of data length bytes. > > > > Umm, if data length is 64 bits, but request length is 16 bits, we can't > > properly support write requests with that much payload. > > mumble mumble. Yes, of course. See above ;-) > > > I think it goes > > back to the idea of block sizes: the maximum request that a server is > > willing to accept (advertised via INFO during NBD_OPT_GO) still applies, > > so you can't NBD_CMD_WRITE with more than that maximum size (and > > therefore your maximum request size can't be more than that size plus > > the overhead of the 40 byte header), but OTHER commands that CAN operate > > on larger chunks of the export (WRITE_ZEROES, BLOCK_STATUS, maybe a new > > extension LOCK for allowing fcntl()-like locking semantics) can now be > > accommodated, because they don't have to send large amounts of payload > > along with the request. > > > > Keeping request length at 16 bits may therefore be too small (if we want > > to allow 32M write payloads), and may require us laying out the > > structured write header slightly differently (maybe fewer flag bits > > after all). > > Right, so let's make it: > > magic (32b) > flags (16b) > type (16b) > handle (64b) > from (64b) > data length(64b) > request length (64b) > (extra data) > > That actually copies the same structure as the current request header, but: > - data length is extended from 32 bits to 64 bits, so that long requests can > be > made where they make sense. We should make it clear though that block > size limitations still apply, and that a request to read 2^63 bytes of > data is likely to be refused by a server. > - request lenght is added, also as a 64-bit value. I could be persuaded > to make that a 32-bit value rather than a 64-bit one, though. > > This way, common code can be used to handle all data up to and including > the "from" field; what's beyond that would need to be decided based upon > the value of the magic field (32-byte data length in case of old > requests, 2 64-byte length values in the other case). > > Maybe, by the time we get here, it makes sense to also have "request > length" be 0 for a command that doesn't have any extra data beyond that > header. That would also make one particular case easier to read: > > if(req.type == NBD_CMD_WRITE && req.data_length != req.request_length) > return EINVAL; > > However, the downside of that is that we include header length in all > other cases (negotiation, structured replies), and this one would then > be the odd one out; I could imagine that might cause too much confusion. > Not sure whether the minor advantage a
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On Thu, Nov 16, 2017 at 09:30:41AM -0600, Eric Blake wrote: > On 11/16/2017 03:51 AM, Wouter Verhelst wrote: > > >> I also remember from talking with Vladimir during KVM Forum last month > >> that one of the shortfalls of the NBD protocol is that you can only ever > >> send a length of up to 32 bits on the command side (unless we introduce > >> structured commands in addition to our current work to add structured > >> replies); > > > > Yes, and I'm thinking we should do so. This will obviously require more > > negotiation. > > > > Can be done fairly easily though: > > - Client negotiates structured replies (don't think it makes sense to do > > structured requests without structured replies) > > - Server sets an extra transmission flag to say "I am capable of > > receiving extended requests" > > - Extended requests have a different magic number, and should have a > > "request length" field as well. I'm thinking we make it: > > > > magic (32b) > > request length (16b) > > type (16b) > > flags (64b) > > handle (64b) > > from (64b) > > data length(64b) > > (extra data) > > > > Request length in this proposal should always be at least 320. > > 320 bits, but we pass length in bytes, so 40. Umm. Yes. I didn't sleep very well last night, sorry. > > I made flags 64 bits rather than 16 as per the current format, because > > that way everything is aligned on a 4-byte boundary, which makes things > > a bit easier on some architectures (e.g., I know that sparc doesn't like > > unaligned 64-bit access). 64 bits for flags looks like a bit of a waste, > > but then if we're going to waste some bits somewhere, I guess it's best > > to assign them to flags. > > > > The idea is that "request length" is the length of the data that the > > client is sending, and "data length" is the length of the range that > > we're trying to deal with. > > Yep, sounds reasonable. > > > A write request would thus have to have request length be (data length + > > 320); a read request would have request length be 320, and expect data > > to be returned of data length bytes. > > Umm, if data length is 64 bits, but request length is 16 bits, we can't > properly support write requests with that much payload. mumble mumble. Yes, of course. See above ;-) > I think it goes > back to the idea of block sizes: the maximum request that a server is > willing to accept (advertised via INFO during NBD_OPT_GO) still applies, > so you can't NBD_CMD_WRITE with more than that maximum size (and > therefore your maximum request size can't be more than that size plus > the overhead of the 40 byte header), but OTHER commands that CAN operate > on larger chunks of the export (WRITE_ZEROES, BLOCK_STATUS, maybe a new > extension LOCK for allowing fcntl()-like locking semantics) can now be > accommodated, because they don't have to send large amounts of payload > along with the request. > > Keeping request length at 16 bits may therefore be too small (if we want > to allow 32M write payloads), and may require us laying out the > structured write header slightly differently (maybe fewer flag bits > after all). Right, so let's make it: magic (32b) flags (16b) type (16b) handle (64b) from (64b) data length(64b) request length (64b) (extra data) That actually copies the same structure as the current request header, but: - data length is extended from 32 bits to 64 bits, so that long requests can be made where they make sense. We should make it clear though that block size limitations still apply, and that a request to read 2^63 bytes of data is likely to be refused by a server. - request lenght is added, also as a 64-bit value. I could be persuaded to make that a 32-bit value rather than a 64-bit one, though. This way, common code can be used to handle all data up to and including the "from" field; what's beyond that would need to be decided based upon the value of the magic field (32-byte data length in case of old requests, 2 64-byte length values in the other case). Maybe, by the time we get here, it makes sense to also have "request length" be 0 for a command that doesn't have any extra data beyond that header. That would also make one particular case easier to read: if(req.type == NBD_CMD_WRITE && req.data_length != req.request_length) return EINVAL; However, the downside of that is that we include header length in all other cases (negotiation, structured replies), and this one would then be the odd one out; I could imagine that might cause too much confusion. Not sure whether the minor advantage above is worth that. > Or we can say that NBD_CMD_WRITE still has to use old-style requests. Let's not :-) > > A metadata request could then tack on extra data, where request length > > of 320 implies "all negotiated metadata contexts", but anything more > > than that would imply there are some metadata IDs passed along. > > Again,
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On 11/16/2017 03:51 AM, Wouter Verhelst wrote: >> I also remember from talking with Vladimir during KVM Forum last month >> that one of the shortfalls of the NBD protocol is that you can only ever >> send a length of up to 32 bits on the command side (unless we introduce >> structured commands in addition to our current work to add structured >> replies); > > Yes, and I'm thinking we should do so. This will obviously require more > negotiation. > > Can be done fairly easily though: > - Client negotiates structured replies (don't think it makes sense to do > structured requests without structured replies) > - Server sets an extra transmission flag to say "I am capable of > receiving extended requests" > - Extended requests have a different magic number, and should have a > "request length" field as well. I'm thinking we make it: > > magic (32b) > request length (16b) > type (16b) > flags (64b) > handle (64b) > from (64b) > data length(64b) > (extra data) > > Request length in this proposal should always be at least 320. 320 bits, but we pass length in bytes, so 40. > > I made flags 64 bits rather than 16 as per the current format, because > that way everything is aligned on a 4-byte boundary, which makes things > a bit easier on some architectures (e.g., I know that sparc doesn't like > unaligned 64-bit access). 64 bits for flags looks like a bit of a waste, > but then if we're going to waste some bits somewhere, I guess it's best > to assign them to flags. > > The idea is that "request length" is the length of the data that the > client is sending, and "data length" is the length of the range that > we're trying to deal with. Yep, sounds reasonable. > > A write request would thus have to have request length be (data length + > 320); a read request would have request length be 320, and expect data > to be returned of data length bytes. Umm, if data length is 64 bits, but request length is 16 bits, we can't properly support write requests with that much payload. I think it goes back to the idea of block sizes: the maximum request that a server is willing to accept (advertised via INFO during NBD_OPT_GO) still applies, so you can't NBD_CMD_WRITE with more than that maximum size (and therefore your maximum request size can't be more than that size plus the overhead of the 40 byte header), but OTHER commands that CAN operate on larger chunks of the export (WRITE_ZEROES, BLOCK_STATUS, maybe a new extension LOCK for allowing fcntl()-like locking semantics) can now be accommodated, because they don't have to send large amounts of payload along with the request. Keeping request length at 16 bits may therefore be too small (if we want to allow 32M write payloads), and may require us laying out the structured write header slightly differently (maybe fewer flag bits after all). Or we can say that NBD_CMD_WRITE still has to use old-style requests. > > A metadata request could then tack on extra data, where request length > of 320 implies "all negotiated metadata contexts", but anything more > than that would imply there are some metadata IDs passed along. Again, 40 bytes (not 320 bits), but yeah, that would be the idea. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On Tue, Nov 14, 2017 at 01:06:17PM -0600, Eric Blake wrote: > On 11/14/2017 11:37 AM, Wouter Verhelst wrote: > > On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote: > >> Another thought - with structured replies, we finally have a way to let > >> the client ask for the server to send resize information whenever the > >> server wants, rather than having to be polled by a new client request > >> all the time. This is possible by having the server reply with a chunk > >> without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants, > >> (that is, the server never officially ends the response to the single > >> client request for on-going status, until the client sends an > >> NBD_CMD_DISC). > > > > Hrm, yeah, that could work. > > > > Minor downside of this would be that a client would now be expected to > > continue listening "forever" (probably needs to do a blocking read() or > > a select() on the socket), whereas with the current situation a client > > could get away with only reading for as long as it expects data. > > > > I don't think that should be a blocker, but it might be something we > > might want to document. > > > >> I don't think the server should go into this mode without a flag bit > >> from the client requesting it (as it potentially ties up a thread that > >> could otherwise be used for parallel processing of other requests), > > > > Yeah. I think we should probably initiate this with a BLOCK_STATUS > > message that has a flag with which we mean "don't stop sending data on > > the given region for contexts that support it". > > Now we're mixing NBD_CMD_BLOCK_STATUS and NBD_CMD_RESIZE; Eh, right -- I had forgotten about RESIZE, actually :-) > I was thinking of the open-ended command for being informed of all > server-side-initiated size changes in response to RESIZE; but your mention of > an open-ended BLOCK_STATUS has an interesting connotation of being able to > get live updates as sections of a file are dirtied. For instance, or whatever other metadata we end up sending through BLOCK_STATUS. > I also remember from talking with Vladimir during KVM Forum last month > that one of the shortfalls of the NBD protocol is that you can only ever > send a length of up to 32 bits on the command side (unless we introduce > structured commands in addition to our current work to add structured > replies); Yes, and I'm thinking we should do so. This will obviously require more negotiation. Can be done fairly easily though: - Client negotiates structured replies (don't think it makes sense to do structured requests without structured replies) - Server sets an extra transmission flag to say "I am capable of receiving extended requests" - Extended requests have a different magic number, and should have a "request length" field as well. I'm thinking we make it: magic (32b) request length (16b) type (16b) flags (64b) handle (64b) from (64b) data length(64b) (extra data) Request length in this proposal should always be at least 320. I made flags 64 bits rather than 16 as per the current format, because that way everything is aligned on a 4-byte boundary, which makes things a bit easier on some architectures (e.g., I know that sparc doesn't like unaligned 64-bit access). 64 bits for flags looks like a bit of a waste, but then if we're going to waste some bits somewhere, I guess it's best to assign them to flags. The idea is that "request length" is the length of the data that the client is sending, and "data length" is the length of the range that we're trying to deal with. A write request would thus have to have request length be (data length + 320); a read request would have request length be 320, and expect data to be returned of data length bytes. A metadata request could then tack on extra data, where request length of 320 implies "all negotiated metadata contexts", but anything more than that would imply there are some metadata IDs passed along. etc. Thoughts? [...] > > However, I could imagine that there might be some cases wherein a server > > might be able to go into such a mode for two or more metadata contexts, > > and where a client might want to go into that mode for one of them but > > not all of them, while still wanting some information from them. > > > > This could be covered with metadata context syntax, but it's annoying > > and shouldn't be necessary. > > > > I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS > > can't take a metadata context ID. Okay, there's no space for it, but > > that shouldn't have been a blocker. > > > > Thoughts? > > Nothing says the server has to reply the same length of information when > replying for multiple selected metadata contexts; but if we allow > different reply sizes all in one query, we may also need some way to > easily tell that the server has stopped sending metadata for one context > even though it is still providing additional replie
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On 11/14/2017 11:37 AM, Wouter Verhelst wrote: > On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote: >> Another thought - with structured replies, we finally have a way to let >> the client ask for the server to send resize information whenever the >> server wants, rather than having to be polled by a new client request >> all the time. This is possible by having the server reply with a chunk >> without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants, >> (that is, the server never officially ends the response to the single >> client request for on-going status, until the client sends an >> NBD_CMD_DISC). > > Hrm, yeah, that could work. > > Minor downside of this would be that a client would now be expected to > continue listening "forever" (probably needs to do a blocking read() or > a select() on the socket), whereas with the current situation a client > could get away with only reading for as long as it expects data. > > I don't think that should be a blocker, but it might be something we > might want to document. > >> I don't think the server should go into this mode without a flag bit >> from the client requesting it (as it potentially ties up a thread that >> could otherwise be used for parallel processing of other requests), > > Yeah. I think we should probably initiate this with a BLOCK_STATUS > message that has a flag with which we mean "don't stop sending data on > the given region for contexts that support it". Now we're mixing NBD_CMD_BLOCK_STATUS and NBD_CMD_RESIZE; I was thinking of the open-ended command for being informed of all server-side-initiated size changes in response to RESIZE; but your mention of an open-ended BLOCK_STATUS has an interesting connotation of being able to get live updates as sections of a file are dirtied. I also remember from talking with Vladimir during KVM Forum last month that one of the shortfalls of the NBD protocol is that you can only ever send a length of up to 32 bits on the command side (unless we introduce structured commands in addition to our current work to add structured replies); even querying the full status of a 1TB volume would require at least 256 NBD_CMD_BLOCK_STATUS calls. But having a special case of a length of 0 meaning to report status as long as the server is interested could reduce the number of round trips by letting the server report more than 4G of status in response to one client query. There was also a thread a while ago about the possibility of a per-export flag denoting a volume is known to contain all-zeroes on startup, which can be used as a handy shortcut to having to query block status over each slice of the file; we weren't sure about burning a per-export flag on that information, but having it be a special mode of NBD_CMD_BLOCK_STATUS may be easy enough to codify. > > However, I could imagine that there might be some cases wherein a server > might be able to go into such a mode for two or more metadata contexts, > and where a client might want to go into that mode for one of them but > not all of them, while still wanting some information from them. > > This could be covered with metadata context syntax, but it's annoying > and shouldn't be necessary. > > I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS > can't take a metadata context ID. Okay, there's no space for it, but > that shouldn't have been a blocker. > > Thoughts? Nothing says the server has to reply the same length of information when replying for multiple selected metadata contexts; but if we allow different reply sizes all in one query, we may also need some way to easily tell that the server has stopped sending metadata for one context even though it is still providing additional replies for another context. And maybe we do want to someday start thinking about structured requests; where being able to do per-command selection of metadata contexts (instead of per-export selection) may indeed be the first use case. > >> and that the server could reject a repeat command with the flag if it >> is already serving a previous open-ended request. > > Right. > > On the other hand, I can imagine that a client might also want to tell > the server that it is no longer interested in an outstanding request. In > such a case, it should be able to cancel it. Good point - if we allow the client to request an open-ended reply, it's also nice to let the client decide how long that open-endedness should last. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote: > Another thought - with structured replies, we finally have a way to let > the client ask for the server to send resize information whenever the > server wants, rather than having to be polled by a new client request > all the time. This is possible by having the server reply with a chunk > without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants, > (that is, the server never officially ends the response to the single > client request for on-going status, until the client sends an > NBD_CMD_DISC). Hrm, yeah, that could work. Minor downside of this would be that a client would now be expected to continue listening "forever" (probably needs to do a blocking read() or a select() on the socket), whereas with the current situation a client could get away with only reading for as long as it expects data. I don't think that should be a blocker, but it might be something we might want to document. > I don't think the server should go into this mode without a flag bit > from the client requesting it (as it potentially ties up a thread that > could otherwise be used for parallel processing of other requests), Yeah. I think we should probably initiate this with a BLOCK_STATUS message that has a flag with which we mean "don't stop sending data on the given region for contexts that support it". However, I could imagine that there might be some cases wherein a server might be able to go into such a mode for two or more metadata contexts, and where a client might want to go into that mode for one of them but not all of them, while still wanting some information from them. This could be covered with metadata context syntax, but it's annoying and shouldn't be necessary. I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS can't take a metadata context ID. Okay, there's no space for it, but that shouldn't have been a blocker. Thoughts? > and that the server could reject a repeat command with the flag if it > is already serving a previous open-ended request. Right. On the other hand, I can imagine that a client might also want to tell the server that it is no longer interested in an outstanding request. In such a case, it should be able to cancel it. -- Could you people please use IRC like normal people?!? -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
[reviving an old thread] On 01/23/2017 08:54 AM, Eric Blake wrote: > I'm still thinking that allowing the client to query the current size is > useful. Over the weekend, I was thinking of SEEK_SET/SEEK_END semantics > (SEEK_CUR doesn't really make sense, since we don't maintain a current > offset), and wondering if we could improve the command as follows: > > If invoked without flags, it operates with SEEK_SET semantics (the > offset is the new requested size); if invoked with > NBD_CMD_FLAG_RELATIVE, it operates with SEEK_END semantics (the offset > is added to the existing size, and can be treated as a signed 64-bit > negative number to shrink). Either way, on success, the command replies > with a structured reply containing the new 64-bit total size of the > disk. This would require the reply to be a structured reply, and the > semantics of NBD_CMD_FLAG_RELATIVE with an offset of 0 becomes a way to > probe the existing disk size (thus enabling the server to resize without > client request, and perhaps used with some out-of-band means for the > client to know that it should query the server for an updated size). > Also note that the server's reply of the current size may be slightly > different than what was requested by the client (that is, we should > allow the server to round the client's request up to an appropriate > granularity) - we should probably require that the server can only round > up (not down). > > Thoughts? Another thought - with structured replies, we finally have a way to let the client ask for the server to send resize information whenever the server wants, rather than having to be polled by a new client request all the time. This is possible by having the server reply with a chunk without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants, (that is, the server never officially ends the response to the single client request for on-going status, until the client sends an NBD_CMD_DISC). I don't think the server should go into this mode without a flag bit from the client requesting it (as it potentially ties up a thread that could otherwise be used for parallel processing of other requests), and that the server could reject a repeat command with the flag if it is already serving a previous open-ended request. -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
Hi Eric, On Mon, Jan 23, 2017 at 08:54:48AM -0600, Eric Blake wrote: > On 01/22/2017 05:43 AM, Wouter Verhelst wrote: > > > > Having given this some more thought, I'm not entirely sure anymore that > > an active resize from the NBD layer is necessarily a layering violation. > > There might be other cases where this is useful, and e.g., the Linux > > kernel also supports active resizes of block devices in some cases > > (e.g., LVM). We'll also need to define what happens in case the resize > > operation isn't possible for some reason (e.g., the size increase asks > > for more space than the server's storage has available). > > > > I suggest the following semantics: > > > > - Add a transmission flag to indicate resize is possible for > > transmission phase (NBD_FLAG_SEND_RESIZE, flag 10). I don't see any > > need for an NBD_OPT_RESIZE or some such; just the flag should suffice. > > Except that in previous threads, you've argued that burning per-export > flags is rather expensive if it is instead soemthing that an NBD_OPT > exchange could enable. True, but I also said that flags make it easy to teach the kernel about something, without requiring extra API calls. Since this is just a thing of "it can be sent, doesn't need more metadata than that", a flag seems appropriate. > On the other hand, I could definitely see it being a per-export > setting (not all exports have the ability to be resized), so it > doesn't hurt my feelings to burn a per-export flag on this bit of > information. There's that too, yes. > > - NBD_CMD_RESIZE (command 7): resize the export. The "offset" field > > contains the new size, "length" is reserved (why not the other way > > around? offset is 64 bits, length is 32) > > - resize command can fail with: > > EPERM: server configuration does not allow this resize, or > > (optionally) other clients are using the same export and the request > > was for a smaller size. > > EIO: I/O error while trying to resize the device > > ENOSPC: new export size requires more space than is available > > ESHUTDOWN: server is shutting down > > > > I've added an "extension-resize" branch with the above. If that works > > for you, then run with it. If not, just implement what you want and send > > updates as you go along. > > You have at least one problem in there: > > +If the resize was successful, clients MAY now issue other requests > +up to the new size as requested in the request header. If the new > +size is larger than it was before the request, requests beyond the > +new size but not beyond the old one MUST fail with ENOSPC. > > You probably want "if the new size is _smaller_ than it was before", as > it is not possible to have requests beyond the new size but not beyond > the old if the new size is larger than the old. Ah, yes, thanks. I notice that that paragraph is also redundant with the next one. I'll fix that up tonight. > I'm still thinking that allowing the client to query the current size is > useful. Over the weekend, I was thinking of SEEK_SET/SEEK_END semantics > (SEEK_CUR doesn't really make sense, since we don't maintain a current > offset), and wondering if we could improve the command as follows: > > If invoked without flags, it operates with SEEK_SET semantics (the > offset is the new requested size); if invoked with > NBD_CMD_FLAG_RELATIVE, it operates with SEEK_END semantics (the offset > is added to the existing size, and can be treated as a signed 64-bit > negative number to shrink). Either way, on success, the command replies > with a structured reply containing the new 64-bit total size of the > disk. This would require the reply to be a structured reply, and the > semantics of NBD_CMD_FLAG_RELATIVE with an offset of 0 becomes a way to > probe the existing disk size (thus enabling the server to resize without > client request, and perhaps used with some out-of-band means for the > client to know that it should query the server for an updated size). That seems elegant, yes. > Also note that the server's reply of the current size may be slightly > different than what was requested by the client (that is, we should > allow the server to round the client's request up to an appropriate > granularity) - we should probably require that the server can only round > up (not down). Probably works too, indeed. > Thoughts? I think there's merit in what you're suggesting. If you want to suggest wording too, I'm all ears :-) -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
> On 23 Jan 2017, at 14:54, Eric Blake wrote: > > Thoughts? My main thought is that the purpose of the extension branches is to discuss and come to a consensus over experimental extension protocols. Whilst I think creating a branch should be a lightweight affair (fine), we explicitly say people should know the implications of using extensions in shipping code - specifically that the specification may change! -- Alex Bligh signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On 01/22/2017 05:43 AM, Wouter Verhelst wrote: > > Having given this some more thought, I'm not entirely sure anymore that > an active resize from the NBD layer is necessarily a layering violation. > There might be other cases where this is useful, and e.g., the Linux > kernel also supports active resizes of block devices in some cases > (e.g., LVM). We'll also need to define what happens in case the resize > operation isn't possible for some reason (e.g., the size increase asks > for more space than the server's storage has available). > > I suggest the following semantics: > > - Add a transmission flag to indicate resize is possible for > transmission phase (NBD_FLAG_SEND_RESIZE, flag 10). I don't see any > need for an NBD_OPT_RESIZE or some such; just the flag should suffice. Except that in previous threads, you've argued that burning per-export flags is rather expensive if it is instead soemthing that an NBD_OPT exchange could enable. On the other hand, I could definitely see it being a per-export setting (not all exports have the ability to be resized), so it doesn't hurt my feelings to burn a per-export flag on this bit of information. > - NBD_CMD_RESIZE (command 7): resize the export. The "offset" field > contains the new size, "length" is reserved (why not the other way > around? offset is 64 bits, length is 32) > - resize command can fail with: > EPERM: server configuration does not allow this resize, or > (optionally) other clients are using the same export and the request > was for a smaller size. > EIO: I/O error while trying to resize the device > ENOSPC: new export size requires more space than is available > ESHUTDOWN: server is shutting down > > I've added an "extension-resize" branch with the above. If that works > for you, then run with it. If not, just implement what you want and send > updates as you go along. You have at least one problem in there: +If the resize was successful, clients MAY now issue other requests +up to the new size as requested in the request header. If the new +size is larger than it was before the request, requests beyond the +new size but not beyond the old one MUST fail with ENOSPC. You probably want "if the new size is _smaller_ than it was before", as it is not possible to have requests beyond the new size but not beyond the old if the new size is larger than the old. I'm still thinking that allowing the client to query the current size is useful. Over the weekend, I was thinking of SEEK_SET/SEEK_END semantics (SEEK_CUR doesn't really make sense, since we don't maintain a current offset), and wondering if we could improve the command as follows: If invoked without flags, it operates with SEEK_SET semantics (the offset is the new requested size); if invoked with NBD_CMD_FLAG_RELATIVE, it operates with SEEK_END semantics (the offset is added to the existing size, and can be treated as a signed 64-bit negative number to shrink). Either way, on success, the command replies with a structured reply containing the new 64-bit total size of the disk. This would require the reply to be a structured reply, and the semantics of NBD_CMD_FLAG_RELATIVE with an offset of 0 becomes a way to probe the existing disk size (thus enabling the server to resize without client request, and perhaps used with some out-of-band means for the client to know that it should query the server for an updated size). Also note that the server's reply of the current size may be slightly different than what was requested by the client (that is, we should allow the server to round the client's request up to an appropriate granularity) - we should probably require that the server can only round up (not down). Thoughts? -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On Sun, Jan 22, 2017 at 06:25:09PM +0800, Bob Chen wrote: > Hi folks, > > My time schedule doesn't allow me to wait for the community's solution, so I > started to work on quick fix, which is to add a 'bdrv_truncate' function to > the > current NBD's BlockDriver. Basically it's an 'active resize' implementation. Fair enough. If you're going to do that though, please add some negotiation before trying to use the option, too. That way, other implementations can add support for that option too, if they want Having given this some more thought, I'm not entirely sure anymore that an active resize from the NBD layer is necessarily a layering violation. There might be other cases where this is useful, and e.g., the Linux kernel also supports active resizes of block devices in some cases (e.g., LVM). We'll also need to define what happens in case the resize operation isn't possible for some reason (e.g., the size increase asks for more space than the server's storage has available). I suggest the following semantics: - Add a transmission flag to indicate resize is possible for transmission phase (NBD_FLAG_SEND_RESIZE, flag 10). I don't see any need for an NBD_OPT_RESIZE or some such; just the flag should suffice. - NBD_CMD_RESIZE (command 7): resize the export. The "offset" field contains the new size, "length" is reserved (why not the other way around? offset is 64 bits, length is 32) - resize command can fail with: EPERM: server configuration does not allow this resize, or (optionally) other clients are using the same export and the request was for a smaller size. EIO: I/O error while trying to resize the device ENOSPC: new export size requires more space than is available ESHUTDOWN: server is shutting down I've added an "extension-resize" branch with the above. If that works for you, then run with it. If not, just implement what you want and send updates as you go along. > I also realized that the 'bdrv_truncate' caller stack is not in a coroutine, > seemed to be the main thread? Then I tried some synchronous code as below: > > int nbd_truncate(BlockDriverState *bs, int64_t offset) > { > //... > nbd_client_detach_aio_context(bs); > qio_channel_set_blocking(client->ioc, true, NULL); > > ret = nbd_send_request(client->ioc, &request); // step 1, send > custom NBD_CMD_RESIZE request > ret = nbd_receive_reply(client->ioc, &reply); > > read_sync(client->ioc, &new_size, sizeof(new_size)); // step 2, > expected to receive the confirmed new_size as data > new_size = be64_to_cpu(new_size); > > qio_channel_set_blocking(client->ioc, false, NULL); > nbd_client_attach_aio_context(bs, aio_context); > //... > } > > However at step 2, the 'new_size' I read is not always correct. Sometimes the > bytes are repeating, for instance 1073741824 (1GB) became 1073741824073741824 > ... > > Could you help me figure out what went wrong? > > > Regards, > Bob > > 2017-01-18 16:01 GMT+08:00 Wouter Verhelst : > > On Mon, Jan 16, 2017 at 01:36:21PM -0600, Eric Blake wrote: > > Maybe the structured reply proposal can be extended into this (reserve a > > "reply" header that can be issued as many times as desired by the server > > without the client ever having issued the request first, and where the > > reply never uses the end-of-reply marker), but I'm not sure I want to go > > that direction just yet. > > It's not necessarily a bad idea, which could also be used for: > - keepalive probes from server to client > - unsolicited ESHUTDOWN messages > > both of which are currently not possible and might be useful for the > protocol to have. > > -- > < ron> I mean, the main *practical* problem with C++, is there's like a > dozen > people in the world who think they really understand all of its > rules, > and pretty much all of them are just lying to themselves too. > -- #debian-devel, OFTC, 2016-02-12 > > -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
Hi folks, My time schedule doesn't allow me to wait for the community's solution, so I started to work on quick fix, which is to add a 'bdrv_truncate' function to the current NBD's BlockDriver. Basically it's an 'active resize' implementation. I also realized that the 'bdrv_truncate' caller stack is not in a coroutine, seemed to be the main thread? Then I tried some synchronous code as below: int nbd_truncate(BlockDriverState *bs, int64_t offset) { //... nbd_client_detach_aio_context(bs); qio_channel_set_blocking(client->ioc, true, NULL); ret = nbd_send_request(client->ioc, &request);// step 1, send custom NBD_CMD_RESIZE request ret = nbd_receive_reply(client->ioc, &reply); read_sync(client->ioc, &new_size, sizeof(new_size)); // step 2, expected to receive the confirmed new_size as data new_size = be64_to_cpu(new_size); qio_channel_set_blocking(client->ioc, false, NULL); nbd_client_attach_aio_context(bs, aio_context); //... } However at step 2, the 'new_size' I read is not always correct. Sometimes the bytes are repeating, for instance 1073741824 (1GB) became 1073741824073741824 ... Could you help me figure out what went wrong? Regards, Bob 2017-01-18 16:01 GMT+08:00 Wouter Verhelst : > On Mon, Jan 16, 2017 at 01:36:21PM -0600, Eric Blake wrote: > > Maybe the structured reply proposal can be extended into this (reserve a > > "reply" header that can be issued as many times as desired by the server > > without the client ever having issued the request first, and where the > > reply never uses the end-of-reply marker), but I'm not sure I want to go > > that direction just yet. > > It's not necessarily a bad idea, which could also be used for: > - keepalive probes from server to client > - unsolicited ESHUTDOWN messages > > both of which are currently not possible and might be useful for the > protocol to have. > > -- > < ron> I mean, the main *practical* problem with C++, is there's like a > dozen >people in the world who think they really understand all of its > rules, >and pretty much all of them are just lying to themselves too. > -- #debian-devel, OFTC, 2016-02-12 >
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On Mon, Jan 16, 2017 at 01:36:21PM -0600, Eric Blake wrote: > Maybe the structured reply proposal can be extended into this (reserve a > "reply" header that can be issued as many times as desired by the server > without the client ever having issued the request first, and where the > reply never uses the end-of-reply marker), but I'm not sure I want to go > that direction just yet. It's not necessarily a bad idea, which could also be used for: - keepalive probes from server to client - unsolicited ESHUTDOWN messages both of which are currently not possible and might be useful for the protocol to have. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On 01/14/2017 08:45 AM, Wouter Verhelst wrote: > Hi Eric, > > (side note: my mutt tells me that the signature on your message does not > validate. Not sure what's going on, but something you might want to look > into...) Not my fault, but a well-known issue with mailman. It rewrites portions of message bodies that look like headers, yet GPG signatures intentionally include a portion of the body that includes a repeat of the original headers in order to authenticate the original sender. Thus, the rewritten message no longer has a valid signature :( In the past I've compared a broken and pristine copy of my messages, and it's fairly annoying that the difference is sometimes as subtle as a space vs. a tab, due to the way the header line was rewritten. > > On Thu, Jan 12, 2017 at 12:45:56PM -0600, Eric Blake wrote: >> For resize smaller, things are a lot trickier - how do you block access >> to a portion of a file that the client still thinks exist, but which the >> server has truncated away? Maybe the easy way out is to state that the >> server MUST NOT resize smaller. > > I would prefer something along the lines of "the server MUST NOT > activate the 'resize smaller' command (which it received out of band > from whereever) until the client asked for and was told the new size of > the device." > > That is (with A & X being offsets, and B & Y being sizes, A+B < X): > > clientserver server mgmt > READ A,B no error > resize to X > READ X,Y no error > reread sizes > READ X,Y ENOSPC > READ A,B no error > > In that scenario, it should be the responsibility of the server to > ensure there are no more readers; servers could implement that (if they > don't want to do the required bookkeeping and keep such requests as > pending) by simply sending ESHUTDOWN or some such. In other words, a resize-smaller request (outside of the NBD protocol) can either force the server to tell the clients to disconnect, or else is merely a hint that is at the mercy of the client actually being smart enough to request the information. I don't know how we'd specify that, but the idea seems reasonable enough. The sad part is that the NBD protocol (currently) lacks any way for the server to send an unsolicited hint to the client. I've seen other protocols (such as libvirt) that include a hint mechanism; intentionally designed where a client that ignores the hints is no worse for the wear, but where the hints can allow the server to pass information to the client faster than what is possible if the client never asks the right question. Maybe the structured reply proposal can be extended into this (reserve a "reply" header that can be issued as many times as desired by the server without the client ever having issued the request first, and where the reply never uses the end-of-reply marker), but I'm not sure I want to go that direction just yet. > > I agree with Alex that adding an active resize command to the NBD > protocol feels like a layering violation. We should probably not do > that. Fair enough; for now it will just be a passive re-query command. > >> Should I spend time turning this idea into a more formal spec, along the >> lines of other NBD extension proposals? > > Feel free. I'll see what I can come up with in the next day or so, at least into an RFC-quality spec change. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
> On 14 Jan 2017, at 14:48, Wouter Verhelst wrote: > > On Thu, Jan 12, 2017 at 06:56:42PM +, Alex Bligh wrote: >> My preferred way to do this would be essentially to allow NBD_OPT_INFO >> to be sent (wrapped appropriately) during the main transmission phase. >> That would allow *any* INFO stuff to be reread. > > Can you go into a bit more detail how you'd see that? It feels a bit > awkward to me, at best; but I could be missing something. Well, the idea would be something like: NBD_CMD_INFO, use the offset specified as the the information type, and put the INFO reply within the reply block. I guess the client doesn't know the length of the reply so we'd have to use structured replies or similar. Although looking through the current info types, I can't see why today you'd want to reread anything other than the length, so maybe this is useless futureproofing. -- Alex Bligh
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
On Thu, Jan 12, 2017 at 06:56:42PM +, Alex Bligh wrote: > My preferred way to do this would be essentially to allow NBD_OPT_INFO > to be sent (wrapped appropriately) during the main transmission phase. > That would allow *any* INFO stuff to be reread. Can you go into a bit more detail how you'd see that? It feels a bit awkward to me, at best; but I could be missing something. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12
Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?
Hi Eric, (side note: my mutt tells me that the signature on your message does not validate. Not sure what's going on, but something you might want to look into...) On Thu, Jan 12, 2017 at 12:45:56PM -0600, Eric Blake wrote: > For resize smaller, things are a lot trickier - how do you block access > to a portion of a file that the client still thinks exist, but which the > server has truncated away? Maybe the easy way out is to state that the > server MUST NOT resize smaller. I would prefer something along the lines of "the server MUST NOT activate the 'resize smaller' command (which it received out of band from whereever) until the client asked for and was told the new size of the device." That is (with A & X being offsets, and B & Y being sizes, A+B < X): client server server mgmt READ A,Bno error resize to X READ X,Yno error reread sizes READ X,YENOSPC READ A,Bno error In that scenario, it should be the responsibility of the server to ensure there are no more readers; servers could implement that (if they don't want to do the required bookkeeping and keep such requests as pending) by simply sending ESHUTDOWN or some such. > > What about an active `resize` request from the client? Considering some NBD > > servers might have the capability to do instant resizing, not applying to > > LVM or host block device, of course... > > And perhaps the 're-read size' command can serve a dual purpose. Since > all NBD commands already include space for size and offset parameters, > we could define that if the client passes 0/0 for size and offset, it > wants to read the server's current size; if it passes 0/non-zero, it is > asking the server to resize to the new non-zero size (and the server's > success or error response tells whether the resize happened). I agree with Alex that adding an active resize command to the NBD protocol feels like a layering violation. We should probably not do that. > Should I spend time turning this idea into a more formal spec, along the > lines of other NBD extension proposals? Feel free. -- < ron> I mean, the main *practical* problem with C++, is there's like a dozen people in the world who think they really understand all of its rules, and pretty much all of them are just lying to themselves too. -- #debian-devel, OFTC, 2016-02-12