[Qemu-block] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Bob Chen
Hi,


My qemu runs on a 3rd party distributed block storage, and the disk backend
protocol is nbd.

I notices that there are differences between default qcow2 local disk and
my nbd disk, in terms of resizing the disk on the fly.

Local qcow2 disk could work no matter using qemu-img resize or qemu monitor
'block_resize', but the nbd disk seemed to fail to detect the backend size
change(had resized the disk on EBS at first). It said "this feature or
command is not currently supported".

Is that possible to hack qemu nbd code, making it the same way as resizing
local qcow2 disk? I have the interface to resize EBS disk at backend.


Regards,
Bob


Re: [Qemu-block] [PATCH] pflash_cfi01: fix per device sector length in CFI table

2017-01-12 Thread Peter Maydell
On 12 January 2017 at 10:35, David Engraf  wrote:
> The CFI entry for sector length must be set to sector length per device.
> This is important for boards using multiple devices like the ARM Vexpress
> board (width = 4, device-width = 2).
>
> Linux and u-boots calculate the size ratio by dividing both values:
>
> size_ratio = info->portwidth / info->chipwidth;
>
> After that the sector length will be multiplied by the size_ratio, thus the
> CFI entry for sector length is doubled. When Linux or u-boot send a sector
> erase, they expect to erase the doubled sector length, but QEMU only erases
> the board specified sector length.
>
> This patch fixes the sector length in the CFI table to match the length per
> device, equal to blocks_per_device.

Thanks for the patch. I haven't checked against the pflash spec yet,
but this looks like it's probably the right thing.

The only two machines which use a setup with multiple devices (ie
which specify device_width to the pflash_cfi01) are vexpress and virt.
For all other machines this patch leaves the behaviour unchanged.

Q: do we need to have some kind of nasty hack so that pre-2.9 virt
still gets the old broken values in the CFI table, for version and
migration compatibility? Ccing Drew for an opinion...

thanks
-- PMM



Re: [Qemu-block] [Qemu-devel] [PATCH] pflash_cfi01: fix per device sector length in CFI table

2017-01-12 Thread David Engraf

Am 12.01.2017 um 12:36 schrieb Andrew Jones:

On Thu, Jan 12, 2017 at 10:42:41AM +, Peter Maydell wrote:

On 12 January 2017 at 10:35, David Engraf  wrote:

The CFI entry for sector length must be set to sector length per device.
This is important for boards using multiple devices like the ARM Vexpress
board (width = 4, device-width = 2).

Linux and u-boots calculate the size ratio by dividing both values:

size_ratio = info->portwidth / info->chipwidth;

After that the sector length will be multiplied by the size_ratio, thus the
CFI entry for sector length is doubled. When Linux or u-boot send a sector
erase, they expect to erase the doubled sector length, but QEMU only erases
the board specified sector length.

This patch fixes the sector length in the CFI table to match the length per
device, equal to blocks_per_device.


Thanks for the patch. I haven't checked against the pflash spec yet,
but this looks like it's probably the right thing.

The only two machines which use a setup with multiple devices (ie
which specify device_width to the pflash_cfi01) are vexpress and virt.
For all other machines this patch leaves the behaviour unchanged.

Q: do we need to have some kind of nasty hack so that pre-2.9 virt
still gets the old broken values in the CFI table, for version and
migration compatibility? Ccing Drew for an opinion...



I'm pretty sure we need the nasty hack, but I'm also Ccing David for
his opinion.


I can update the patch if you give me some more information about the 
hack. Some ifdef for older versions?


Best regards
- David




[Qemu-block] [PATCH] pflash_cfi01: fix per device sector length in CFI table

2017-01-12 Thread David Engraf
The CFI entry for sector length must be set to sector length per device. 
This is important for boards using multiple devices like the ARM 
Vexpress board (width = 4, device-width = 2).


Linux and u-boots calculate the size ratio by dividing both values:

size_ratio = info->portwidth / info->chipwidth;

After that the sector length will be multiplied by the size_ratio, thus 
the CFI entry for sector length is doubled. When Linux or u-boot send a 
sector erase, they expect to erase the doubled sector length, but QEMU 
only erases the board specified sector length.


This patch fixes the sector length in the CFI table to match the length 
per device, equal to blocks_per_device.


Signed-off-by: David Engraf 

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 5f0ee9d..8bb61e4 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -703,7 +703,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
 pflash_t *pfl = CFI_PFLASH01(dev);
 uint64_t total_len;
 int ret;
-uint64_t blocks_per_device, device_len;
+uint64_t blocks_per_device, sector_len_per_device, device_len;
 int num_devices;
 Error *local_err = NULL;
 
@@ -727,7 +727,8 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
  */
 num_devices = pfl->device_width ? (pfl->bank_width / pfl->device_width) : 1;
 blocks_per_device = pfl->nb_blocs / num_devices;
-device_len = pfl->sector_len * blocks_per_device;
+sector_len_per_device = pfl->sector_len / num_devices;
+device_len = sector_len_per_device * blocks_per_device;
 
 /* XXX: to be fixed */
 #if 0
@@ -839,8 +840,8 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
 /* Erase block region 1 */
 pfl->cfi_table[0x2D] = blocks_per_device - 1;
 pfl->cfi_table[0x2E] = (blocks_per_device - 1) >> 8;
-pfl->cfi_table[0x2F] = pfl->sector_len >> 8;
-pfl->cfi_table[0x30] = pfl->sector_len >> 16;
+pfl->cfi_table[0x2F] = sector_len_per_device >> 8;
+pfl->cfi_table[0x30] = sector_len_per_device >> 16;
 
 /* Extended */
 pfl->cfi_table[0x31] = 'P';


Re: [Qemu-block] [Qemu-devel] [PATCH] pflash_cfi01: fix per device sector length in CFI table

2017-01-12 Thread Andrew Jones
On Thu, Jan 12, 2017 at 10:42:41AM +, Peter Maydell wrote:
> On 12 January 2017 at 10:35, David Engraf  wrote:
> > The CFI entry for sector length must be set to sector length per device.
> > This is important for boards using multiple devices like the ARM Vexpress
> > board (width = 4, device-width = 2).
> >
> > Linux and u-boots calculate the size ratio by dividing both values:
> >
> > size_ratio = info->portwidth / info->chipwidth;
> >
> > After that the sector length will be multiplied by the size_ratio, thus the
> > CFI entry for sector length is doubled. When Linux or u-boot send a sector
> > erase, they expect to erase the doubled sector length, but QEMU only erases
> > the board specified sector length.
> >
> > This patch fixes the sector length in the CFI table to match the length per
> > device, equal to blocks_per_device.
> 
> Thanks for the patch. I haven't checked against the pflash spec yet,
> but this looks like it's probably the right thing.
> 
> The only two machines which use a setup with multiple devices (ie
> which specify device_width to the pflash_cfi01) are vexpress and virt.
> For all other machines this patch leaves the behaviour unchanged.
> 
> Q: do we need to have some kind of nasty hack so that pre-2.9 virt
> still gets the old broken values in the CFI table, for version and
> migration compatibility? Ccing Drew for an opinion...
>

I'm pretty sure we need the nasty hack, but I'm also Ccing David for
his opinion.

Thanks,
drew



Re: [Qemu-block] NBD handshake may block qemu main thread when socket delays or has packet loss

2017-01-12 Thread Stefan Hajnoczi
On Wed, Jan 04, 2017 at 02:44:53PM -0600, Eric Blake wrote:
> On 01/04/2017 02:45 AM, Fangyi (C) wrote:
> > As we all know, socket is in blocking mode when nbd is negotiating
> > with the other end. If the network is poor because of delay or packet
> > loss, socket read or write will return very slowly. The mainloop events
> > won't be handled in time util nbd handshake ends.
> 
> I wonder if Paolo's work to improve NBD coroutine usage after handshakes
> can be leveraged here?
> 
> https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg03224.html
> 
> > 
> > Any advices to solve the problem?
> 
> At any rate, it sounds like someone will have to patch NBD code to use
> coroutines instead of blocking for the handshake portion (and that's
> true in general - ANY operation that can block should probably be
> refactored into aio or coroutines so that the main loop can remain
> responsive).

This is a general issue with network block drivers.  They tend to do
blocking DNS resolution, blocking connection, and blocking protocol
handshake/negotiation in .bdrv_open().

We cannot expose a block device to the guest before it has been opened
because the disk's capacity is unknown plus the guest would experience
I/O timeouts or errors.

I think we need to agree on how to handle this for all different types
of network protocols, not just nbd, before code can be written.

One starting point is:

Should we make .drv_open() a coroutine and introduce a async concept to
blockdev_add, reopen, etc?

The BlockDriverState would be in a special OPENING or OFFLINE state
where its name is reserved but it cannot be used for I/O or emulated
devices yet.

QMP clients would have watch out for an event that tells them that it's
now okay to device-add the emulated storage device using the drive.

Any ideas for a nicer solution?

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-block] [Qemu-devel] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Eric Blake
On 01/12/2017 04:22 AM, Bob Chen wrote:
> Hi,
> 
> 
> My qemu runs on a 3rd party distributed block storage, and the disk backend
> protocol is nbd.
> 
> I notices that there are differences between default qcow2 local disk and
> my nbd disk, in terms of resizing the disk on the fly.
> 
> Local qcow2 disk could work no matter using qemu-img resize 

No. Don't EVER use 'qemu-img resize' while a qemu process is actively
serving a VM guest.  Two concurrent modifiers to a qcow2 file is
unsupported, and is highly likely to corrupt your guest's view of the
disk, if not the entire qcow2 file.

> or qemu monitor
> 'block_resize',

Yes, that is the only supported way to do an online resize.

> but the nbd disk seemed to fail to detect the backend size
> change(had resized the disk on EBS at first). It said "this feature or
> command is not currently supported".

That's because the NBD protocol lacks a resize command.  You'd have to
first get that proposed as an NBD extension before qemu could support it.

> 
> Is that possible to hack qemu nbd code, making it the same way as resizing
> local qcow2 disk? I have the interface to resize EBS disk at backend.

Anything is possible in open source with enough time and patches, but
the place to tackle this is to first propose an extension to the NBD
protocol (I've added the NBD list in cc).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Nbd] [Qemu-devel] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Alex Bligh

> On 12 Jan 2017, at 14:43, Eric Blake  wrote:
> 
> That's because the NBD protocol lacks a resize command.  You'd have to
> first get that proposed as an NBD extension before qemu could support it.

Actually the NBD protocol lacks a 'make a disk with size X' command,
let alone a resize command. The size of an NBD disk is (currently)
entirely in the hands of the server. What I think we'd really need
would be a 'reread size' command, and have the server capable of
supporting resizing. That would then work for readonly images too.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [Qemu-block] [Nbd] [Qemu-devel] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Stefan Hajnoczi
On Thu, Jan 12, 2017 at 3:44 PM, Alex Bligh  wrote:
>> On 12 Jan 2017, at 14:43, Eric Blake  wrote:
>> That's because the NBD protocol lacks a resize command.  You'd have to
>> first get that proposed as an NBD extension before qemu could support it.
>
> Actually the NBD protocol lacks a 'make a disk with size X' command,
> let alone a resize command. The size of an NBD disk is (currently)
> entirely in the hands of the server. What I think we'd really need
> would be a 'reread size' command, and have the server capable of
> supporting resizing. That would then work for readonly images too.

That would be fine for QEMU.  Resizing LVM volumes or host block
devices works exactly like this.

Stefan



Re: [Qemu-block] [Nbd] [Qemu-devel] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Bob Chen
There might be a time window between the NBD server's resize and the
client's `re-read size` request. Is it safe?

What about an active `resize` request from the client? Considering some NBD
servers might have the capability to do instant resizing, not applying to
LVM or host block device, of course...


Regards,
Bob

2017-01-13 0:54 GMT+08:00 Stefan Hajnoczi :

> On Thu, Jan 12, 2017 at 3:44 PM, Alex Bligh  wrote:
> >> On 12 Jan 2017, at 14:43, Eric Blake  wrote:
> >> That's because the NBD protocol lacks a resize command.  You'd have to
> >> first get that proposed as an NBD extension before qemu could support
> it.
> >
> > Actually the NBD protocol lacks a 'make a disk with size X' command,
> > let alone a resize command. The size of an NBD disk is (currently)
> > entirely in the hands of the server. What I think we'd really need
> > would be a 'reread size' command, and have the server capable of
> > supporting resizing. That would then work for readonly images too.
>
> That would be fine for QEMU.  Resizing LVM volumes or host block
> devices works exactly like this.
>
> Stefan
>
>


Re: [Qemu-block] [Qemu-devel] [Nbd] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Eric Blake
On 01/12/2017 11:57 AM, Bob Chen wrote:
> There might be a time window between the NBD server's resize and the
> client's `re-read size` request. Is it safe?

For resize larger, it seems that it would be safe for the server to just
remember the last size it has advertised to the client.  As long as the
client doesn't request read/write to any offset beyond the
last-advertised size (either the size the server gave at handshake, or
the size that the server reported when the client last used the new
're-read size' command), there's no difference in behavior (and
well-behaved clients fall in this group); if the client DOES try to
access out-of-bounds with respect to the last known size, the server
SHOULD reject the access, but MAY serve it instead.  A client that is
unaware of the 're-read size' command will never learn that the server
is now offering a larger image.

For resize smaller, things are a lot trickier - how do you block access
to a portion of a file that the client still thinks exist, but which the
server has truncated away?  Maybe the easy way out is to state that the
server MUST NOT resize smaller.

> 
> What about an active `resize` request from the client? Considering some NBD
> servers might have the capability to do instant resizing, not applying to
> LVM or host block device, of course...

And perhaps the 're-read size' command can serve a dual purpose.  Since
all NBD commands already include space for size and offset parameters,
we could define that if the client passes 0/0 for size and offset, it
wants to read the server's current size; if it passes 0/non-zero, it is
asking the server to resize to the new non-zero size (and the server's
success or error response tells whether the resize happened).

Should I spend time turning this idea into a more formal spec, along the
lines of other NBD extension proposals?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [Nbd] How to online resize qemu disk with nbd protocol?

2017-01-12 Thread Alex Bligh

> On 12 Jan 2017, at 18:45, Eric Blake  wrote:
> 
> On 01/12/2017 11:57 AM, Bob Chen wrote:
>> There might be a time window between the NBD server's resize and the
>> client's `re-read size` request. Is it safe?
> 
> For resize larger, it seems that it would be safe for the server to just
> remember the last size it has advertised to the client.  As long as the
> client doesn't request read/write to any offset beyond the
> last-advertised size (either the size the server gave at handshake, or
> the size that the server reported when the client last used the new
> 're-read size' command), there's no difference in behavior (and
> well-behaved clients fall in this group); if the client DOES try to
> access out-of-bounds with respect to the last known size, the server
> SHOULD reject the access, but MAY serve it instead.  A client that is
> unaware of the 're-read size' command will never learn that the server
> is now offering a larger image.
> 
> For resize smaller, things are a lot trickier - how do you block access
> to a portion of a file that the client still thinks exist, but which the
> server has truncated away?  Maybe the easy way out is to state that the
> server MUST NOT resize smaller.

I'm not sure why this is any harder than Qemu writing to a physical partition
that changes size. You need to change the size of things in order. To
resize smaller, you tell Qemu it's smaller, then resize the partition. To
resize larger, you resize the partition then tell Qemu it's larger.

This is incidentally exactly the same thing you do if you have a filing
system on a partition (or a filing system on a real nbd device - we could
support the same reread of geometry in kernel).

>> 
>> What about an active `resize` request from the client? Considering some NBD
>> servers might have the capability to do instant resizing, not applying to
>> LVM or host block device, of course...
> 
> And perhaps the 're-read size' command can serve a dual purpose.  Since
> all NBD commands already include space for size and offset parameters,
> we could define that if the client passes 0/0 for size and offset, it
> wants to read the server's current size; if it passes 0/non-zero, it is
> asking the server to resize to the new non-zero size (and the server's
> success or error response tells whether the resize happened).
> 
> Should I spend time turning this idea into a more formal spec, along the
> lines of other NBD extension proposals?

I'd support the reread bit. I'm less sure we ever want to push a new
size from the client. It seems a bit like a layering violation. Perhaps
I can be convinced.

My preferred way to do this would be essentially to allow NBD_OPT_INFO
to be sent (wrapped appropriately) during the main transmission phase.
That would allow *any* INFO stuff to be reread.

If it's just this (rather than a resize command) I guess it could
go into the NBD_OPT_INFO extension branch. If it's going to do
more than _INFO_, then I guess it needs its own extension.

--
Alex Bligh






signature.asc
Description: Message signed with OpenPGP using GPGMail