subject:"\[Qemu\-devel\] \[Nbd\] \[Qemu\-block\] How to online resize qemu disk with nbd protocol\?"

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-23 Thread Wouter Verhelst

Ping. Should I write this up as a proper proposal?

On Thu, Nov 16, 2017 at 05:20:29PM +0100, Wouter Verhelst wrote:
> On Thu, Nov 16, 2017 at 09:30:41AM -0600, Eric Blake wrote:
> > On 11/16/2017 03:51 AM, Wouter Verhelst wrote:
> > 
> > >> I also remember from talking with Vladimir during KVM Forum last month
> > >> that one of the shortfalls of the NBD protocol is that you can only ever
> > >> send a length of up to 32 bits on the command side (unless we introduce
> > >> structured commands in addition to our current work to add structured
> > >> replies);
> > > 
> > > Yes, and I'm thinking we should do so. This will obviously require more
> > > negotiation.
> > > 
> > > Can be done fairly easily though:
> > > - Client negotiates structured replies (don't think it makes sense to do
> > >   structured requests without structured replies)
> > > - Server sets an extra transmission flag to say "I am capable of
> > >   receiving extended requests"
> > > - Extended requests have a different magic number, and should have a
> > >   "request length" field as well. I'm thinking we make it:
> > > 
> > > magic  (32b)
> > > request length (16b)
> > > type   (16b)
> > > flags  (64b)
> > > handle (64b)
> > > from   (64b)
> > > data length(64b)
> > > (extra data)
> > > 
> > > Request length in this proposal should always be at least 320.
> > 
> > 320 bits, but we pass length in bytes, so 40.
> 
> Umm. Yes. I didn't sleep very well last night, sorry.
> 
> > > I made flags 64 bits rather than 16 as per the current format, because
> > > that way everything is aligned on a 4-byte boundary, which makes things
> > > a bit easier on some architectures (e.g., I know that sparc doesn't like
> > > unaligned 64-bit access). 64 bits for flags looks like a bit of a waste,
> > > but then if we're going to waste some bits somewhere, I guess it's best
> > > to assign them to flags.
> > > 
> > > The idea is that "request length" is the length of the data that the
> > > client is sending, and "data length" is the length of the range that
> > > we're trying to deal with.
> > 
> > Yep, sounds reasonable.
> > 
> > > A write request would thus have to have request length be (data length +
> > > 320); a read request would have request length be 320, and expect data
> > > to be returned of data length bytes.
> > 
> > Umm, if data length is 64 bits, but request length is 16 bits, we can't
> > properly support write requests with that much payload.
> 
> mumble mumble. Yes, of course. See above ;-)
> 
> > I think it goes
> > back to the idea of block sizes: the maximum request that a server is
> > willing to accept (advertised via INFO during NBD_OPT_GO) still applies,
> > so you can't NBD_CMD_WRITE with more than that maximum size (and
> > therefore your maximum request size can't be more than that size plus
> > the overhead of the 40 byte header), but OTHER commands that CAN operate
> > on larger chunks of the export (WRITE_ZEROES, BLOCK_STATUS, maybe a new
> > extension LOCK for allowing fcntl()-like locking semantics) can now be
> > accommodated, because they don't have to send large amounts of payload
> > along with the request.
> > 
> > Keeping request length at 16 bits may therefore be too small (if we want
> > to allow 32M write payloads), and may require us laying out the
> > structured write header slightly differently (maybe fewer flag bits
> > after all).
> 
> Right, so let's make it:
> 
> magic  (32b)
> flags  (16b)
> type   (16b)
> handle (64b)
> from   (64b)
> data length(64b)
> request length (64b)
> (extra data)
> 
> That actually copies the same structure as the current request header, but:
> - data length is extended from 32 bits to 64 bits, so that long requests can 
> be
>   made where they make sense. We should make it clear though that block
>   size limitations still apply, and that a request to read 2^63 bytes of
>   data is likely to be refused by a server.
> - request lenght is added, also as a 64-bit value. I could be persuaded
>   to make that a 32-bit value rather than a 64-bit one, though.
> 
> This way, common code can be used to handle all data up to and including
> the "from" field; what's beyond that would need to be decided based upon
> the value of the magic field (32-byte data length in case of old
> requests, 2 64-byte length values in the other case).
> 
> Maybe, by the time we get here, it makes sense to also have "request
> length" be 0 for a command that doesn't have any extra data beyond that
> header. That would also make one particular case easier to read:
> 
> if(req.type == NBD_CMD_WRITE && req.data_length != req.request_length)
> return EINVAL;
> 
> However, the downside of that is that we include header length in all
> other cases (negotiation, structured replies), and this one would then
> be the odd one out; I could imagine that might cause too much confusion.
> Not sure whether the minor advantage a

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-16 Thread Wouter Verhelst

On Thu, Nov 16, 2017 at 09:30:41AM -0600, Eric Blake wrote:
> On 11/16/2017 03:51 AM, Wouter Verhelst wrote:
> 
> >> I also remember from talking with Vladimir during KVM Forum last month
> >> that one of the shortfalls of the NBD protocol is that you can only ever
> >> send a length of up to 32 bits on the command side (unless we introduce
> >> structured commands in addition to our current work to add structured
> >> replies);
> > 
> > Yes, and I'm thinking we should do so. This will obviously require more
> > negotiation.
> > 
> > Can be done fairly easily though:
> > - Client negotiates structured replies (don't think it makes sense to do
> >   structured requests without structured replies)
> > - Server sets an extra transmission flag to say "I am capable of
> >   receiving extended requests"
> > - Extended requests have a different magic number, and should have a
> >   "request length" field as well. I'm thinking we make it:
> > 
> > magic  (32b)
> > request length (16b)
> > type   (16b)
> > flags  (64b)
> > handle (64b)
> > from   (64b)
> > data length(64b)
> > (extra data)
> > 
> > Request length in this proposal should always be at least 320.
> 
> 320 bits, but we pass length in bytes, so 40.

Umm. Yes. I didn't sleep very well last night, sorry.

> > I made flags 64 bits rather than 16 as per the current format, because
> > that way everything is aligned on a 4-byte boundary, which makes things
> > a bit easier on some architectures (e.g., I know that sparc doesn't like
> > unaligned 64-bit access). 64 bits for flags looks like a bit of a waste,
> > but then if we're going to waste some bits somewhere, I guess it's best
> > to assign them to flags.
> > 
> > The idea is that "request length" is the length of the data that the
> > client is sending, and "data length" is the length of the range that
> > we're trying to deal with.
> 
> Yep, sounds reasonable.
> 
> > A write request would thus have to have request length be (data length +
> > 320); a read request would have request length be 320, and expect data
> > to be returned of data length bytes.
> 
> Umm, if data length is 64 bits, but request length is 16 bits, we can't
> properly support write requests with that much payload.

mumble mumble. Yes, of course. See above ;-)

> I think it goes
> back to the idea of block sizes: the maximum request that a server is
> willing to accept (advertised via INFO during NBD_OPT_GO) still applies,
> so you can't NBD_CMD_WRITE with more than that maximum size (and
> therefore your maximum request size can't be more than that size plus
> the overhead of the 40 byte header), but OTHER commands that CAN operate
> on larger chunks of the export (WRITE_ZEROES, BLOCK_STATUS, maybe a new
> extension LOCK for allowing fcntl()-like locking semantics) can now be
> accommodated, because they don't have to send large amounts of payload
> along with the request.
> 
> Keeping request length at 16 bits may therefore be too small (if we want
> to allow 32M write payloads), and may require us laying out the
> structured write header slightly differently (maybe fewer flag bits
> after all).

Right, so let's make it:

magic  (32b)
flags  (16b)
type   (16b)
handle (64b)
from   (64b)
data length(64b)
request length (64b)
(extra data)

That actually copies the same structure as the current request header, but:
- data length is extended from 32 bits to 64 bits, so that long requests can be
  made where they make sense. We should make it clear though that block
  size limitations still apply, and that a request to read 2^63 bytes of
  data is likely to be refused by a server.
- request lenght is added, also as a 64-bit value. I could be persuaded
  to make that a 32-bit value rather than a 64-bit one, though.

This way, common code can be used to handle all data up to and including
the "from" field; what's beyond that would need to be decided based upon
the value of the magic field (32-byte data length in case of old
requests, 2 64-byte length values in the other case).

Maybe, by the time we get here, it makes sense to also have "request
length" be 0 for a command that doesn't have any extra data beyond that
header. That would also make one particular case easier to read:

if(req.type == NBD_CMD_WRITE && req.data_length != req.request_length)
return EINVAL;

However, the downside of that is that we include header length in all
other cases (negotiation, structured replies), and this one would then
be the odd one out; I could imagine that might cause too much confusion.
Not sure whether the minor advantage above is worth that.

> Or we can say that NBD_CMD_WRITE still has to use old-style requests.

Let's not :-)

> > A metadata request could then tack on extra data, where request length
> > of 320 implies "all negotiated metadata contexts", but anything more
> > than that would imply there are some metadata IDs passed along.
> 
> Again,

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-16 Thread Eric Blake

On 11/16/2017 03:51 AM, Wouter Verhelst wrote:

>> I also remember from talking with Vladimir during KVM Forum last month
>> that one of the shortfalls of the NBD protocol is that you can only ever
>> send a length of up to 32 bits on the command side (unless we introduce
>> structured commands in addition to our current work to add structured
>> replies);
> 
> Yes, and I'm thinking we should do so. This will obviously require more
> negotiation.
> 
> Can be done fairly easily though:
> - Client negotiates structured replies (don't think it makes sense to do
>   structured requests without structured replies)
> - Server sets an extra transmission flag to say "I am capable of
>   receiving extended requests"
> - Extended requests have a different magic number, and should have a
>   "request length" field as well. I'm thinking we make it:
> 
> magic  (32b)
> request length (16b)
> type   (16b)
> flags  (64b)
> handle (64b)
> from   (64b)
> data length(64b)
> (extra data)
> 
> Request length in this proposal should always be at least 320.

320 bits, but we pass length in bytes, so 40.

> 
> I made flags 64 bits rather than 16 as per the current format, because
> that way everything is aligned on a 4-byte boundary, which makes things
> a bit easier on some architectures (e.g., I know that sparc doesn't like
> unaligned 64-bit access). 64 bits for flags looks like a bit of a waste,
> but then if we're going to waste some bits somewhere, I guess it's best
> to assign them to flags.
> 
> The idea is that "request length" is the length of the data that the
> client is sending, and "data length" is the length of the range that
> we're trying to deal with.

Yep, sounds reasonable.

> 
> A write request would thus have to have request length be (data length +
> 320); a read request would have request length be 320, and expect data
> to be returned of data length bytes.

Umm, if data length is 64 bits, but request length is 16 bits, we can't
properly support write requests with that much payload.  I think it goes
back to the idea of block sizes: the maximum request that a server is
willing to accept (advertised via INFO during NBD_OPT_GO) still applies,
so you can't NBD_CMD_WRITE with more than that maximum size (and
therefore your maximum request size can't be more than that size plus
the overhead of the 40 byte header), but OTHER commands that CAN operate
on larger chunks of the export (WRITE_ZEROES, BLOCK_STATUS, maybe a new
extension LOCK for allowing fcntl()-like locking semantics) can now be
accommodated, because they don't have to send large amounts of payload
along with the request.

Keeping request length at 16 bits may therefore be too small (if we want
to allow 32M write payloads), and may require us laying out the
structured write header slightly differently (maybe fewer flag bits
after all).  Or we can say that NBD_CMD_WRITE still has to use old-style
requests.

> 
> A metadata request could then tack on extra data, where request length
> of 320 implies "all negotiated metadata contexts", but anything more
> than that would imply there are some metadata IDs passed along.

Again, 40 bytes (not 320 bits), but yeah, that would be the idea.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-16 Thread Wouter Verhelst

On Tue, Nov 14, 2017 at 01:06:17PM -0600, Eric Blake wrote:
> On 11/14/2017 11:37 AM, Wouter Verhelst wrote:
> > On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote:
> >> Another thought - with structured replies, we finally have a way to let
> >> the client ask for the server to send resize information whenever the
> >> server wants, rather than having to be polled by a new client request
> >> all the time.  This is possible by having the server reply with a chunk
> >> without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants,
> >> (that is, the server never officially ends the response to the single
> >> client request for on-going status, until the client sends an
> >> NBD_CMD_DISC).
> > 
> > Hrm, yeah, that could work.
> > 
> > Minor downside of this would be that a client would now be expected to
> > continue listening "forever" (probably needs to do a blocking read() or
> > a select() on the socket), whereas with the current situation a client
> > could get away with only reading for as long as it expects data.
> > 
> > I don't think that should be a blocker, but it might be something we
> > might want to document.
> > 
> >> I don't think the server should go into this mode without a flag bit
> >> from the client requesting it (as it potentially ties up a thread that
> >> could otherwise be used for parallel processing of other requests),
> > 
> > Yeah. I think we should probably initiate this with a BLOCK_STATUS
> > message that has a flag with which we mean "don't stop sending data on
> > the given region for contexts that support it".
> 
> Now we're mixing NBD_CMD_BLOCK_STATUS and NBD_CMD_RESIZE;

Eh, right -- I had forgotten about RESIZE, actually :-)

> I was thinking of the open-ended command for being informed of all
> server-side-initiated size changes in response to RESIZE; but your mention of
> an open-ended BLOCK_STATUS has an interesting connotation of being able to
> get live updates as sections of a file are dirtied.

For instance, or whatever other metadata we end up sending through
BLOCK_STATUS.

> I also remember from talking with Vladimir during KVM Forum last month
> that one of the shortfalls of the NBD protocol is that you can only ever
> send a length of up to 32 bits on the command side (unless we introduce
> structured commands in addition to our current work to add structured
> replies);

Yes, and I'm thinking we should do so. This will obviously require more
negotiation.

Can be done fairly easily though:
- Client negotiates structured replies (don't think it makes sense to do
  structured requests without structured replies)
- Server sets an extra transmission flag to say "I am capable of
  receiving extended requests"
- Extended requests have a different magic number, and should have a
  "request length" field as well. I'm thinking we make it:

magic  (32b)
request length (16b)
type   (16b)
flags  (64b)
handle (64b)
from   (64b)
data length(64b)
(extra data)

Request length in this proposal should always be at least 320.

I made flags 64 bits rather than 16 as per the current format, because
that way everything is aligned on a 4-byte boundary, which makes things
a bit easier on some architectures (e.g., I know that sparc doesn't like
unaligned 64-bit access). 64 bits for flags looks like a bit of a waste,
but then if we're going to waste some bits somewhere, I guess it's best
to assign them to flags.

The idea is that "request length" is the length of the data that the
client is sending, and "data length" is the length of the range that
we're trying to deal with.

A write request would thus have to have request length be (data length +
320); a read request would have request length be 320, and expect data
to be returned of data length bytes.

A metadata request could then tack on extra data, where request length
of 320 implies "all negotiated metadata contexts", but anything more
than that would imply there are some metadata IDs passed along.

etc.

Thoughts?

[...]
> > However, I could imagine that there might be some cases wherein a server
> > might be able to go into such a mode for two or more metadata contexts,
> > and where a client might want to go into that mode for one of them but
> > not all of them, while still wanting some information from them.
> > 
> > This could be covered with metadata context syntax, but it's annoying
> > and shouldn't be necessary.
> > 
> > I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS
> > can't take a metadata context ID. Okay, there's no space for it, but
> > that shouldn't have been a blocker.
> > 
> > Thoughts?
> 
> Nothing says the server has to reply the same length of information when
> replying for multiple selected metadata contexts; but if we allow
> different reply sizes all in one query, we may also need some way to
> easily tell that the server has stopped sending metadata for one context
> even though it is still providing additional replie

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-14 Thread Eric Blake

On 11/14/2017 11:37 AM, Wouter Verhelst wrote:
> On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote:
>> Another thought - with structured replies, we finally have a way to let
>> the client ask for the server to send resize information whenever the
>> server wants, rather than having to be polled by a new client request
>> all the time.  This is possible by having the server reply with a chunk
>> without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants,
>> (that is, the server never officially ends the response to the single
>> client request for on-going status, until the client sends an
>> NBD_CMD_DISC).
> 
> Hrm, yeah, that could work.
> 
> Minor downside of this would be that a client would now be expected to
> continue listening "forever" (probably needs to do a blocking read() or
> a select() on the socket), whereas with the current situation a client
> could get away with only reading for as long as it expects data.
> 
> I don't think that should be a blocker, but it might be something we
> might want to document.
> 
>> I don't think the server should go into this mode without a flag bit
>> from the client requesting it (as it potentially ties up a thread that
>> could otherwise be used for parallel processing of other requests),
> 
> Yeah. I think we should probably initiate this with a BLOCK_STATUS
> message that has a flag with which we mean "don't stop sending data on
> the given region for contexts that support it".

Now we're mixing NBD_CMD_BLOCK_STATUS and NBD_CMD_RESIZE; I was thinking
of the open-ended command for being informed of all
server-side-initiated size changes in response to RESIZE; but your
mention of an open-ended BLOCK_STATUS has an interesting connotation of
being able to get live updates as sections of a file are dirtied.

I also remember from talking with Vladimir during KVM Forum last month
that one of the shortfalls of the NBD protocol is that you can only ever
send a length of up to 32 bits on the command side (unless we introduce
structured commands in addition to our current work to add structured
replies); even querying the full status of a 1TB volume would require at
least 256 NBD_CMD_BLOCK_STATUS calls.  But having a special case of a
length of 0 meaning to report status as long as the server is interested
could reduce the number of round trips by letting the server report more
than 4G of status in response to one client query.  There was also a
thread a while ago about the possibility of a per-export flag denoting a
volume is known to contain all-zeroes on startup, which can be used as a
handy shortcut to having to query block status over each slice of the
file; we weren't sure about burning a per-export flag on that
information, but having it be a special mode of NBD_CMD_BLOCK_STATUS may
be easy enough to codify.

> 
> However, I could imagine that there might be some cases wherein a server
> might be able to go into such a mode for two or more metadata contexts,
> and where a client might want to go into that mode for one of them but
> not all of them, while still wanting some information from them.
> 
> This could be covered with metadata context syntax, but it's annoying
> and shouldn't be necessary.
> 
> I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS
> can't take a metadata context ID. Okay, there's no space for it, but
> that shouldn't have been a blocker.
> 
> Thoughts?

Nothing says the server has to reply the same length of information when
replying for multiple selected metadata contexts; but if we allow
different reply sizes all in one query, we may also need some way to
easily tell that the server has stopped sending metadata for one context
even though it is still providing additional replies for another context.

And maybe we do want to someday start thinking about structured
requests; where being able to do per-command selection of metadata
contexts (instead of per-export selection) may indeed be the first use case.

> 
>> and that the server could reject a repeat command with the flag if it
>> is already serving a previous open-ended request.
> 
> Right.
> 
> On the other hand, I can imagine that a client might also want to tell
> the server that it is no longer interested in an outstanding request. In
> such a case, it should be able to cancel it.

Good point - if we allow the client to request an open-ended reply, it's
also nice to let the client decide how long that open-endedness should last.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-14 Thread Wouter Verhelst

On Tue, Nov 14, 2017 at 10:41:39AM -0600, Eric Blake wrote:
> Another thought - with structured replies, we finally have a way to let
> the client ask for the server to send resize information whenever the
> server wants, rather than having to be polled by a new client request
> all the time.  This is possible by having the server reply with a chunk
> without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants,
> (that is, the server never officially ends the response to the single
> client request for on-going status, until the client sends an
> NBD_CMD_DISC).

Hrm, yeah, that could work.

Minor downside of this would be that a client would now be expected to
continue listening "forever" (probably needs to do a blocking read() or
a select() on the socket), whereas with the current situation a client
could get away with only reading for as long as it expects data.

I don't think that should be a blocker, but it might be something we
might want to document.

> I don't think the server should go into this mode without a flag bit
> from the client requesting it (as it potentially ties up a thread that
> could otherwise be used for parallel processing of other requests),

Yeah. I think we should probably initiate this with a BLOCK_STATUS
message that has a flag with which we mean "don't stop sending data on
the given region for contexts that support it".

However, I could imagine that there might be some cases wherein a server
might be able to go into such a mode for two or more metadata contexts,
and where a client might want to go into that mode for one of them but
not all of them, while still wanting some information from them.

This could be covered with metadata context syntax, but it's annoying
and shouldn't be necessary.

I'm starting to think I made a mistake when I said NBD_CMD_BLOCK_STATUS
can't take a metadata context ID. Okay, there's no space for it, but
that shouldn't have been a blocker.

Thoughts?

> and that the server could reject a repeat command with the flag if it
> is already serving a previous open-ended request.

Right.

On the other hand, I can imagine that a client might also want to tell
the server that it is no longer interested in an outstanding request. In
such a case, it should be able to cancel it.

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
 Hacklab

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-11-14 Thread Eric Blake

[reviving an old thread]

On 01/23/2017 08:54 AM, Eric Blake wrote:

> I'm still thinking that allowing the client to query the current size is
> useful.  Over the weekend, I was thinking of SEEK_SET/SEEK_END semantics
> (SEEK_CUR doesn't really make sense, since we don't maintain a current
> offset), and wondering if we could improve the command as follows:
> 
> If invoked without flags, it operates with SEEK_SET semantics (the
> offset is the new requested size); if invoked with
> NBD_CMD_FLAG_RELATIVE, it operates with SEEK_END semantics (the offset
> is added to the existing size, and can be treated as a signed 64-bit
> negative number to shrink).  Either way, on success, the command replies
> with a structured reply containing the new 64-bit total size of the
> disk.  This would require the reply to be a structured reply, and the
> semantics of NBD_CMD_FLAG_RELATIVE with an offset of 0 becomes a way to
> probe the existing disk size (thus enabling the server to resize without
> client request, and perhaps used with some out-of-band means for the
> client to know that it should query the server for an updated size).
> Also note that the server's reply of the current size may be slightly
> different than what was requested by the client (that is, we should
> allow the server to round the client's request up to an appropriate
> granularity) - we should probably require that the server can only round
> up (not down).
> 
> Thoughts?

Another thought - with structured replies, we finally have a way to let
the client ask for the server to send resize information whenever the
server wants, rather than having to be polled by a new client request
all the time.  This is possible by having the server reply with a chunk
without the NBD_REPLY_FLAG_DONE bit, for as many times as it wants,
(that is, the server never officially ends the response to the single
client request for on-going status, until the client sends an
NBD_CMD_DISC).  I don't think the server should go into this mode
without a flag bit from the client requesting it (as it potentially ties
up a thread that could otherwise be used for parallel processing of
other requests), and that the server could reject a repeat command with
the flag if it is already serving a previous open-ended request.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-25 Thread Wouter Verhelst

Hi Eric,

On Mon, Jan 23, 2017 at 08:54:48AM -0600, Eric Blake wrote:
> On 01/22/2017 05:43 AM, Wouter Verhelst wrote:
> > 
> > Having given this some more thought, I'm not entirely sure anymore that
> > an active resize from the NBD layer is necessarily a layering violation.
> > There might be other cases where this is useful, and e.g., the Linux
> > kernel also supports active resizes of block devices in some cases
> > (e.g., LVM). We'll also need to define what happens in case the resize
> > operation isn't possible for some reason (e.g., the size increase asks
> > for more space than the server's storage has available).
> > 
> > I suggest the following semantics:
> > 
> > - Add a transmission flag to indicate resize is possible for
> >   transmission phase (NBD_FLAG_SEND_RESIZE, flag 10). I don't see any
> >   need for an NBD_OPT_RESIZE or some such; just the flag should suffice.
> 
> Except that in previous threads, you've argued that burning per-export
> flags is rather expensive if it is instead soemthing that an NBD_OPT
> exchange could enable.

True, but I also said that flags make it easy to teach the kernel about
something, without requiring extra API calls.

Since this is just a thing of "it can be sent, doesn't need more
metadata than that", a flag seems appropriate.

> On the other hand, I could definitely see it being a per-export
> setting (not all exports have the ability to be resized), so it
> doesn't hurt my feelings to burn a per-export flag on this bit of
> information.

There's that too, yes.

> > - NBD_CMD_RESIZE (command 7): resize the export. The "offset" field
> >   contains the new size, "length" is reserved (why not the other way
> >   around? offset is 64 bits, length is 32)
> > - resize command can fail with:
> >   EPERM: server configuration does not allow this resize, or
> >   (optionally) other clients are using the same export and the request
> >   was for a smaller size.
> >   EIO: I/O error while trying to resize the device
> >   ENOSPC: new export size requires more space than is available
> >   ESHUTDOWN: server is shutting down
> > 
> > I've added an "extension-resize" branch with the above. If that works
> > for you, then run with it. If not, just implement what you want and send
> > updates as you go along.
> 
> You have at least one problem in there:
> 
> +If the resize was successful, clients MAY now issue other requests
> +up to the new size as requested in the request header. If the new
> +size is larger than it was before the request, requests beyond the
> +new size but not beyond the old one MUST fail with ENOSPC.
> 
> You probably want "if the new size is _smaller_ than it was before", as
> it is not possible to have requests beyond the new size but not beyond
> the old if the new size is larger than the old.

Ah, yes, thanks.

I notice that that paragraph is also redundant with the next one. I'll
fix that up tonight.

> I'm still thinking that allowing the client to query the current size is
> useful.  Over the weekend, I was thinking of SEEK_SET/SEEK_END semantics
> (SEEK_CUR doesn't really make sense, since we don't maintain a current
> offset), and wondering if we could improve the command as follows:
> 
> If invoked without flags, it operates with SEEK_SET semantics (the
> offset is the new requested size); if invoked with
> NBD_CMD_FLAG_RELATIVE, it operates with SEEK_END semantics (the offset
> is added to the existing size, and can be treated as a signed 64-bit
> negative number to shrink).  Either way, on success, the command replies
> with a structured reply containing the new 64-bit total size of the
> disk.  This would require the reply to be a structured reply, and the
> semantics of NBD_CMD_FLAG_RELATIVE with an offset of 0 becomes a way to
> probe the existing disk size (thus enabling the server to resize without
> client request, and perhaps used with some out-of-band means for the
> client to know that it should query the server for an updated size).

That seems elegant, yes.

> Also note that the server's reply of the current size may be slightly
> different than what was requested by the client (that is, we should
> allow the server to round the client's request up to an appropriate
> granularity) - we should probably require that the server can only round
> up (not down).

Probably works too, indeed.

> Thoughts?

I think there's merit in what you're suggesting. If you want to suggest
wording too, I'm all ears :-)

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-23 Thread Alex Bligh

> On 23 Jan 2017, at 14:54, Eric Blake  wrote:
> 
> Thoughts?

My main thought is that the purpose of the extension branches is
to discuss and come to a consensus over experimental extension protocols.
Whilst I think creating a branch should be a lightweight affair
(fine), we explicitly say people should know the implications
of using extensions in shipping code - specifically that the
specification may change!

--
Alex Bligh

signature.asc
Description: Message signed with OpenPGP using GPGMail

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-23 Thread Eric Blake

On 01/22/2017 05:43 AM, Wouter Verhelst wrote:
> 
> Having given this some more thought, I'm not entirely sure anymore that
> an active resize from the NBD layer is necessarily a layering violation.
> There might be other cases where this is useful, and e.g., the Linux
> kernel also supports active resizes of block devices in some cases
> (e.g., LVM). We'll also need to define what happens in case the resize
> operation isn't possible for some reason (e.g., the size increase asks
> for more space than the server's storage has available).
> 
> I suggest the following semantics:
> 
> - Add a transmission flag to indicate resize is possible for
>   transmission phase (NBD_FLAG_SEND_RESIZE, flag 10). I don't see any
>   need for an NBD_OPT_RESIZE or some such; just the flag should suffice.

Except that in previous threads, you've argued that burning per-export
flags is rather expensive if it is instead soemthing that an NBD_OPT
exchange could enable.  On the other hand, I could definitely see it
being a per-export setting (not all exports have the ability to be
resized), so it doesn't hurt my feelings to burn a per-export flag on
this bit of information.

> - NBD_CMD_RESIZE (command 7): resize the export. The "offset" field
>   contains the new size, "length" is reserved (why not the other way
>   around? offset is 64 bits, length is 32)
> - resize command can fail with:
>   EPERM: server configuration does not allow this resize, or
>   (optionally) other clients are using the same export and the request
>   was for a smaller size.
>   EIO: I/O error while trying to resize the device
>   ENOSPC: new export size requires more space than is available
>   ESHUTDOWN: server is shutting down
> 
> I've added an "extension-resize" branch with the above. If that works
> for you, then run with it. If not, just implement what you want and send
> updates as you go along.

You have at least one problem in there:

+If the resize was successful, clients MAY now issue other requests
+up to the new size as requested in the request header. If the new
+size is larger than it was before the request, requests beyond the
+new size but not beyond the old one MUST fail with ENOSPC.

You probably want "if the new size is _smaller_ than it was before", as
it is not possible to have requests beyond the new size but not beyond
the old if the new size is larger than the old.

I'm still thinking that allowing the client to query the current size is
useful.  Over the weekend, I was thinking of SEEK_SET/SEEK_END semantics
(SEEK_CUR doesn't really make sense, since we don't maintain a current
offset), and wondering if we could improve the command as follows:

If invoked without flags, it operates with SEEK_SET semantics (the
offset is the new requested size); if invoked with
NBD_CMD_FLAG_RELATIVE, it operates with SEEK_END semantics (the offset
is added to the existing size, and can be treated as a signed 64-bit
negative number to shrink).  Either way, on success, the command replies
with a structured reply containing the new 64-bit total size of the
disk.  This would require the reply to be a structured reply, and the
semantics of NBD_CMD_FLAG_RELATIVE with an offset of 0 becomes a way to
probe the existing disk size (thus enabling the server to resize without
client request, and perhaps used with some out-of-band means for the
client to know that it should query the server for an updated size).
Also note that the server's reply of the current size may be slightly
different than what was requested by the client (that is, we should
allow the server to round the client's request up to an appropriate
granularity) - we should probably require that the server can only round
up (not down).

Thoughts?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-22 Thread Wouter Verhelst

On Sun, Jan 22, 2017 at 06:25:09PM +0800, Bob Chen wrote:
> Hi folks,
> 
> My time schedule doesn't allow me to wait for the community's solution, so I
> started to work on quick fix, which is to add a 'bdrv_truncate' function to 
> the
> current NBD's BlockDriver. Basically it's an 'active resize' implementation.

Fair enough.

If you're going to do that though, please add some negotiation before
trying to use the option, too. That way, other implementations can add
support for that option too, if they want

Having given this some more thought, I'm not entirely sure anymore that
an active resize from the NBD layer is necessarily a layering violation.
There might be other cases where this is useful, and e.g., the Linux
kernel also supports active resizes of block devices in some cases
(e.g., LVM). We'll also need to define what happens in case the resize
operation isn't possible for some reason (e.g., the size increase asks
for more space than the server's storage has available).

I suggest the following semantics:

- Add a transmission flag to indicate resize is possible for
  transmission phase (NBD_FLAG_SEND_RESIZE, flag 10). I don't see any
  need for an NBD_OPT_RESIZE or some such; just the flag should suffice.
- NBD_CMD_RESIZE (command 7): resize the export. The "offset" field
  contains the new size, "length" is reserved (why not the other way
  around? offset is 64 bits, length is 32)
- resize command can fail with:
  EPERM: server configuration does not allow this resize, or
  (optionally) other clients are using the same export and the request
  was for a smaller size.
  EIO: I/O error while trying to resize the device
  ENOSPC: new export size requires more space than is available
  ESHUTDOWN: server is shutting down

I've added an "extension-resize" branch with the above. If that works
for you, then run with it. If not, just implement what you want and send
updates as you go along.

> I also realized that the 'bdrv_truncate' caller stack is not in a coroutine,
> seemed to be the main thread? Then I tried some synchronous code as below:
> 
> int nbd_truncate(BlockDriverState *bs, int64_t offset)
> {
>     //...
>     nbd_client_detach_aio_context(bs);
>     qio_channel_set_blocking(client->ioc, true, NULL);
> 
>     ret = nbd_send_request(client->ioc, &request);            // step 1, send
> custom NBD_CMD_RESIZE request
>     ret = nbd_receive_reply(client->ioc, &reply);
> 
>     read_sync(client->ioc, &new_size, sizeof(new_size));     // step 2,
> expected to receive the confirmed new_size as data
>     new_size = be64_to_cpu(new_size);
> 
>     qio_channel_set_blocking(client->ioc, false, NULL);
>     nbd_client_attach_aio_context(bs, aio_context);
>     //...
> }
> 
> However at step 2, the 'new_size' I read is not always correct. Sometimes the
> bytes are repeating, for instance 1073741824 (1GB) became 1073741824073741824
> ...
> 
> Could you help me figure out what went wrong?
> 
> 
> Regards, 
> Bob
> 
> 2017-01-18 16:01 GMT+08:00 Wouter Verhelst :
> 
> On Mon, Jan 16, 2017 at 01:36:21PM -0600, Eric Blake wrote:
> > Maybe the structured reply proposal can be extended into this (reserve a
> > "reply" header that can be issued as many times as desired by the server
> > without the client ever having issued the request first, and where the
> > reply never uses the end-of-reply marker), but I'm not sure I want to go
> > that direction just yet.
> 
> It's not necessarily a bad idea, which could also be used for:
> - keepalive probes from server to client
> - unsolicited ESHUTDOWN messages
> 
> both of which are currently not possible and might be useful for the
> protocol to have.
> 
> --
> < ron> I mean, the main *practical* problem with C++, is there's like a
> dozen
>        people in the world who think they really understand all of its
> rules,
>        and pretty much all of them are just lying to themselves too.
>  -- #debian-devel, OFTC, 2016-02-12
> 
> 

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-22 Thread Bob Chen

Hi folks,

My time schedule doesn't allow me to wait for the community's solution, so
I started to work on quick fix, which is to add a 'bdrv_truncate' function
to the current NBD's BlockDriver. Basically it's an 'active resize'
implementation.

I also realized that the 'bdrv_truncate' caller stack is not in a
coroutine, seemed to be the main thread? Then I tried some synchronous code
as below:

int nbd_truncate(BlockDriverState *bs, int64_t offset)
{
//...
nbd_client_detach_aio_context(bs);
qio_channel_set_blocking(client->ioc, true, NULL);

ret = nbd_send_request(client->ioc, &request);// step 1,
send custom NBD_CMD_RESIZE request
ret = nbd_receive_reply(client->ioc, &reply);

read_sync(client->ioc, &new_size, sizeof(new_size)); // step 2,
expected to receive the confirmed new_size as data
new_size = be64_to_cpu(new_size);

qio_channel_set_blocking(client->ioc, false, NULL);
nbd_client_attach_aio_context(bs, aio_context);
//...
}

However at step 2, the 'new_size' I read is not always correct. Sometimes
the bytes are repeating, for instance 1073741824 (1GB)
became 1073741824073741824 ...

Could you help me figure out what went wrong?

Regards,
Bob

2017-01-18 16:01 GMT+08:00 Wouter Verhelst :

> On Mon, Jan 16, 2017 at 01:36:21PM -0600, Eric Blake wrote:
> > Maybe the structured reply proposal can be extended into this (reserve a
> > "reply" header that can be issued as many times as desired by the server
> > without the client ever having issued the request first, and where the
> > reply never uses the end-of-reply marker), but I'm not sure I want to go
> > that direction just yet.
>
> It's not necessarily a bad idea, which could also be used for:
> - keepalive probes from server to client
> - unsolicited ESHUTDOWN messages
>
> both of which are currently not possible and might be useful for the
> protocol to have.
>
> --
> < ron> I mean, the main *practical* problem with C++, is there's like a
> dozen
>people in the world who think they really understand all of its
> rules,
>and pretty much all of them are just lying to themselves too.
>  -- #debian-devel, OFTC, 2016-02-12
>

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-18 Thread Wouter Verhelst

On Mon, Jan 16, 2017 at 01:36:21PM -0600, Eric Blake wrote:
> Maybe the structured reply proposal can be extended into this (reserve a
> "reply" header that can be issued as many times as desired by the server
> without the client ever having issued the request first, and where the
> reply never uses the end-of-reply marker), but I'm not sure I want to go
> that direction just yet.

It's not necessarily a bad idea, which could also be used for:
- keepalive probes from server to client
- unsolicited ESHUTDOWN messages

both of which are currently not possible and might be useful for the
protocol to have.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-16 Thread Eric Blake

On 01/14/2017 08:45 AM, Wouter Verhelst wrote:
> Hi Eric,
> 
> (side note: my mutt tells me that the signature on your message does not
> validate. Not sure what's going on, but something you might want to look
> into...)

Not my fault, but a well-known issue with mailman. It rewrites portions
of message bodies that look like headers, yet GPG signatures
intentionally include a portion of the body that includes a repeat of
the original headers in order to authenticate the original sender. Thus,
the rewritten message no longer has a valid signature :(

In the past I've compared a broken and pristine copy of my messages, and
it's fairly annoying that the difference is sometimes as subtle as a
space vs. a tab, due to the way the header line was rewritten.

> 
> On Thu, Jan 12, 2017 at 12:45:56PM -0600, Eric Blake wrote:
>> For resize smaller, things are a lot trickier - how do you block access
>> to a portion of a file that the client still thinks exist, but which the
>> server has truncated away?  Maybe the easy way out is to state that the
>> server MUST NOT resize smaller.
> 
> I would prefer something along the lines of "the server MUST NOT
> activate the 'resize smaller' command (which it received out of band
> from whereever) until the client asked for and was told the new size of
> the device."
> 
> That is (with A & X being offsets, and B & Y being sizes, A+B < X):
> 
> clientserver  server mgmt
> READ A,B  no error
>   resize to X
> READ X,Y  no error
> reread sizes
> READ X,Y  ENOSPC
> READ A,B  no error
> 
> In that scenario, it should be the responsibility of the server to
> ensure there are no more readers; servers could implement that (if they
> don't want to do the required bookkeeping and keep such requests as
> pending) by simply sending ESHUTDOWN or some such.

In other words, a resize-smaller request (outside of the NBD protocol)
can either force the server to tell the clients to disconnect, or else
is merely a hint that is at the mercy of the client actually being smart
enough to request the information.  I don't know how we'd specify that,
but the idea seems reasonable enough.

The sad part is that the NBD protocol (currently) lacks any way for the
server to send an unsolicited hint to the client.  I've seen other
protocols (such as libvirt) that include a hint mechanism; intentionally
designed where a client that ignores the hints is no worse for the wear,
but where the hints can allow the server to pass information to the
client faster than what is possible if the client never asks the right
question.

Maybe the structured reply proposal can be extended into this (reserve a
"reply" header that can be issued as many times as desired by the server
without the client ever having issued the request first, and where the
reply never uses the end-of-reply marker), but I'm not sure I want to go
that direction just yet.

> 
> I agree with Alex that adding an active resize command to the NBD
> protocol feels like a layering violation. We should probably not do
> that.

Fair enough; for now it will just be a passive re-query command.

> 
>> Should I spend time turning this idea into a more formal spec, along the
>> lines of other NBD extension proposals?
> 
> Feel free.

I'll see what I can come up with in the next day or so, at least into an
RFC-quality spec change.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org

signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-14 Thread Alex Bligh

> On 14 Jan 2017, at 14:48, Wouter Verhelst  wrote:
> 
> On Thu, Jan 12, 2017 at 06:56:42PM +, Alex Bligh wrote:
>> My preferred way to do this would be essentially to allow NBD_OPT_INFO
>> to be sent (wrapped appropriately) during the main transmission phase.
>> That would allow *any* INFO stuff to be reread.
> 
> Can you go into a bit more detail how you'd see that? It feels a bit
> awkward to me, at best; but I could be missing something.

Well, the idea would be something like: NBD_CMD_INFO, use the offset
specified as the the information type, and put the INFO reply
within the reply block. I guess the client doesn't know the
length of the reply so we'd have to use structured replies
or similar. Although looking through the current info types,
I can't see why today you'd want to reread anything other than
the length, so maybe this is useless futureproofing.

-- 
Alex Bligh

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-14 Thread Wouter Verhelst

On Thu, Jan 12, 2017 at 06:56:42PM +, Alex Bligh wrote:
> My preferred way to do this would be essentially to allow NBD_OPT_INFO
> to be sent (wrapped appropriately) during the main transmission phase.
> That would allow *any* INFO stuff to be reread.

Can you go into a bit more detail how you'd see that? It feels a bit
awkward to me, at best; but I could be missing something.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-14 Thread Wouter Verhelst

Hi Eric,

(side note: my mutt tells me that the signature on your message does not
validate. Not sure what's going on, but something you might want to look
into...)

On Thu, Jan 12, 2017 at 12:45:56PM -0600, Eric Blake wrote:
> For resize smaller, things are a lot trickier - how do you block access
> to a portion of a file that the client still thinks exist, but which the
> server has truncated away?  Maybe the easy way out is to state that the
> server MUST NOT resize smaller.

I would prefer something along the lines of "the server MUST NOT
activate the 'resize smaller' command (which it received out of band
from whereever) until the client asked for and was told the new size of
the device."

That is (with A & X being offsets, and B & Y being sizes, A+B < X):

client  server  server mgmt
READ A,Bno error
resize to X
READ X,Yno error
reread sizes
READ X,YENOSPC
READ A,Bno error

In that scenario, it should be the responsibility of the server to
ensure there are no more readers; servers could implement that (if they
don't want to do the required bookkeeping and keep such requests as
pending) by simply sending ESHUTDOWN or some such.

> > What about an active `resize` request from the client? Considering some NBD
> > servers might have the capability to do instant resizing, not applying to
> > LVM or host block device, of course...
> 
> And perhaps the 're-read size' command can serve a dual purpose.  Since
> all NBD commands already include space for size and offset parameters,
> we could define that if the client passes 0/0 for size and offset, it
> wants to read the server's current size; if it passes 0/non-zero, it is
> asking the server to resize to the new non-zero size (and the server's
> success or error response tells whether the resize happened).

I agree with Alex that adding an active resize command to the NBD
protocol feels like a layering violation. We should probably not do
that.

> Should I spend time turning this idea into a more formal spec, along the
> lines of other NBD extension proposals?

Feel free.

-- 
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
   people in the world who think they really understand all of its rules,
   and pretty much all of them are just lying to themselves too.
 -- #debian-devel, OFTC, 2016-02-12

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

Re: [Qemu-devel] [Nbd] [Qemu-block] How to online resize qemu disk with nbd protocol?

17 matches

Site Navigation

Mail list logo

Footer information