Re: block snapshot issue with RBD

2024-05-29 Thread Kevin Wolf
On 29.05.2024 at 12:14, Fiona Ebner wrote:
> I bisected this issue to d3007d348a ("block: Fix crash when loading
> snapshot on inactive node").
> 
> > diff --git a/block/snapshot.c b/block/snapshot.c
> > index ec8cf4810b..c4d40e80dd 100644
> > --- a/block/snapshot.c
> > +++ b/block/snapshot.c
> > @@ -196,8 +196,10 @@ bdrv_snapshot_fallback(BlockDriverState *bs)
> >  int bdrv_can_snapshot(BlockDriverState *bs)
> >  {
> >      BlockDriver *drv = bs->drv;
> > +
> >      GLOBAL_STATE_CODE();
> > -    if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
> > +
> > +    if (!drv || !bdrv_is_inserted(bs) || !bdrv_is_writable(bs)) {
> >          return 0;
> >      }
> >  
> 
> So I guess the issue is that the blockdev is not writable when in the
> "postmigrate" state?

That makes sense. The error message really isn't great, but after
migration, the image is assumed to be owned by the destination, so we
can't use it any more. 'cont' basically asserts that the migration
failed and we can get ownership back. I don't think we can do without a
manual command reactivating the image on the source, but we could have
one that does this without resuming the VM.
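
To make the effect of the changed check concrete, here is a minimal toy
model, not QEMU code. The toy_node struct and all toy_* functions are
hypothetical names, and treating "writable" as "neither read-only nor
inactive" is an assumption based on the commit subject, not something
quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the few bits of node state that matter here.
 * Illustrative only; these are not QEMU's real structures. */
struct toy_node {
    bool has_driver;
    bool inserted;
    bool read_only;  /* image opened read-only */
    bool inactive;   /* ownership handed to the migration destination */
};

/* Shape of the check before the bisected commit: only read-only matters. */
bool toy_can_snapshot_old(const struct toy_node *n)
{
    return n->has_driver && n->inserted && !n->read_only;
}

/* Assumed meaning of "writable": not read-only and not inactive. */
bool toy_is_writable(const struct toy_node *n)
{
    return !n->read_only && !n->inactive;
}

/* Shape of the check after the bisected commit. */
bool toy_can_snapshot_new(const struct toy_node *n)
{
    return n->has_driver && n->inserted && toy_is_writable(n);
}

/* Walk through the reported scenario; returns 0 if every step matched. */
int toy_postmigrate_demo(void)
{
    struct toy_node disk = { .has_driver = true, .inserted = true,
                             .read_only = false, .inactive = false };

    /* Before migration both checks allow a snapshot. */
    assert(toy_can_snapshot_old(&disk) && toy_can_snapshot_new(&disk));

    /* Migration completes: the source node is inactivated. */
    disk.inactive = true;
    assert(toy_can_snapshot_old(&disk));   /* old check still says yes */
    assert(!toy_can_snapshot_new(&disk));  /* new check refuses */

    /* "cont" takes ownership back, modelled here as reactivation. */
    disk.inactive = false;
    assert(toy_can_snapshot_new(&disk));
    return 0;
}
```

In this model the snapshot is refused only while the node is inactive,
which would match the report that it works again after 'cont'.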

Kevin




Re: block snapshot issue with RBD

2024-05-29 Thread Fiona Ebner
Hi,

On 28.05.24 at 20:19, Jin Cao wrote:
> Hi Ilya
> 
> On 5/28/24 11:13 AM, Ilya Dryomov wrote:
>> On Mon, May 27, 2024 at 9:06 PM Jin Cao  wrote:
>>>
>>> Supplementary info: VM is paused after "migrate" command. After being
>>> resumed with "cont", snapshot_delete_blkdev_internal works again, which
>>> is confusing, as disk snapshot generally recommend I/O is paused, and a
>>> frozen VM satisfy this requirement.
>>
>> Hi Jin,
>>
>> This doesn't seem to be related to RBD.  Given that the same error is
>> observed when using the RBD driver with the raw format, I would dig in
>> the direction of migration somehow "installing" the raw format (which
>> is on-disk compatible with the rbd format).
>>
> 
> Thanks for the hint.
> 
>> Also, did you mean to say "snapshot_blkdev_internal" instead of
>> "snapshot_delete_blkdev_internal" in both instances?
> 
> Sorry for my copy-and-paste mistake. Yes, it's snapshot_blkdev_internal.
> 
> -- 
> Sincerely,
> Jin Cao
> 
>>
>> Thanks,
>>
>>          Ilya
>>
>>>
>>> -- 
>>> Sincerely
>>> Jin Cao
>>>
>>> On 5/27/24 10:56 AM, Jin Cao wrote:
>>>> CC block and migration related address.
>>>>
>>>> On 5/27/24 12:03 AM, Jin Cao wrote:
>>>>> Hi,
>>>>>
>>>>> I encountered RBD block snapshot issue after doing migration.
>>>>>
>>>>> Steps
>>>>> -
>>>>>
>>>>> 1. Start QEMU with:
>>>>> ./qemu-system-x86_64 -name VM -machine q35 -accel kvm -cpu
>>>>> host,migratable=on -m 2G -boot menu=on,strict=on
>>>>> rbd:image/ubuntu-22.04-server-cloudimg-amd64.raw -net nic -net user
>>>>> -cdrom /home/my/path/of/cloud-init.iso -monitor stdio
>>>>>
>>>>> 2. Do block snapshot in monitor cmd: snapshot_delete_blkdev_internal.
>>>>> It works as expected: the snapshot is visable with command`rbd snap ls
>>>>> pool_name/image_name`.
>>>>>
>>>>> 3. Do pseudo migration with monitor cmd: migrate -d
>>>>> exec:cat>/tmp/vm.out
>>>>>
>>>>> 4. Do block snapshot again with snapshot_delete_blkdev_internal, then
>>>>> I get:
>>>>>  Error: Block format 'raw' used by device 'ide0-hd0' does not
>>>>> support internal snapshots
>>>>>
>>>>> I was hoping to do the second block snapshot successfully, and it
>>>>> feels abnormal the RBD block snapshot function is disrupted after
>>>>> migration.
>>>>>
>>>>> BTW, I get the same block snapshot error when I start QEMU with:
>>>>>   "-drive format=raw,file=rbd:pool_name/image_name"
>>>>>
>>>>> My questions is: how could I proceed with RBD block snapshot after the
>>>>> pseudo migration?
> 
> 

I bisected this issue to d3007d348a ("block: Fix crash when loading
snapshot on inactive node").

> diff --git a/block/snapshot.c b/block/snapshot.c
> index ec8cf4810b..c4d40e80dd 100644
> --- a/block/snapshot.c
> +++ b/block/snapshot.c
> @@ -196,8 +196,10 @@ bdrv_snapshot_fallback(BlockDriverState *bs)
>  int bdrv_can_snapshot(BlockDriverState *bs)
>  {
>      BlockDriver *drv = bs->drv;
> +
>      GLOBAL_STATE_CODE();
> -    if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
> +
> +    if (!drv || !bdrv_is_inserted(bs) || !bdrv_is_writable(bs)) {
>          return 0;
>      }
>  

So I guess the issue is that the blockdev is not writable when in the
"postmigrate" state?

Best Regards,
Fiona




Re: block snapshot issue with RBD

2024-05-28 Thread Jin Cao

Hi Ilya

On 5/28/24 11:13 AM, Ilya Dryomov wrote:
> On Mon, May 27, 2024 at 9:06 PM Jin Cao  wrote:
>>
>> Supplementary info: VM is paused after "migrate" command. After being
>> resumed with "cont", snapshot_delete_blkdev_internal works again, which
>> is confusing, as disk snapshot generally recommend I/O is paused, and a
>> frozen VM satisfy this requirement.
>
> Hi Jin,
>
> This doesn't seem to be related to RBD.  Given that the same error is
> observed when using the RBD driver with the raw format, I would dig in
> the direction of migration somehow "installing" the raw format (which
> is on-disk compatible with the rbd format).
>

Thanks for the hint.

> Also, did you mean to say "snapshot_blkdev_internal" instead of
> "snapshot_delete_blkdev_internal" in both instances?

Sorry for my copy-and-paste mistake. Yes, it's snapshot_blkdev_internal.

--
Sincerely,
Jin Cao

>
> Thanks,
>
>          Ilya
>
>>
>> --
>> Sincerely
>> Jin Cao
>>
>> On 5/27/24 10:56 AM, Jin Cao wrote:
>>> CC block and migration related address.
>>>
>>> On 5/27/24 12:03 AM, Jin Cao wrote:
>>>> Hi,
>>>>
>>>> I encountered RBD block snapshot issue after doing migration.
>>>>
>>>> Steps
>>>> -
>>>>
>>>> 1. Start QEMU with:
>>>> ./qemu-system-x86_64 -name VM -machine q35 -accel kvm -cpu
>>>> host,migratable=on -m 2G -boot menu=on,strict=on
>>>> rbd:image/ubuntu-22.04-server-cloudimg-amd64.raw -net nic -net user
>>>> -cdrom /home/my/path/of/cloud-init.iso -monitor stdio
>>>>
>>>> 2. Do block snapshot in monitor cmd: snapshot_delete_blkdev_internal.
>>>> It works as expected: the snapshot is visable with command`rbd snap ls
>>>> pool_name/image_name`.
>>>>
>>>> 3. Do pseudo migration with monitor cmd: migrate -d exec:cat>/tmp/vm.out
>>>>
>>>> 4. Do block snapshot again with snapshot_delete_blkdev_internal, then
>>>> I get:
>>>>  Error: Block format 'raw' used by device 'ide0-hd0' does not
>>>> support internal snapshots
>>>>
>>>> I was hoping to do the second block snapshot successfully, and it
>>>> feels abnormal the RBD block snapshot function is disrupted after
>>>> migration.
>>>>
>>>> BTW, I get the same block snapshot error when I start QEMU with:
>>>>   "-drive format=raw,file=rbd:pool_name/image_name"
>>>>
>>>> My questions is: how could I proceed with RBD block snapshot after the
>>>> pseudo migration?



Re: block snapshot issue with RBD

2024-05-28 Thread Ilya Dryomov
On Mon, May 27, 2024 at 9:06 PM Jin Cao  wrote:
>
> Supplementary info: VM is paused after "migrate" command. After being
> resumed with "cont", snapshot_delete_blkdev_internal works again, which
> is confusing, as disk snapshot generally recommend I/O is paused, and a
> frozen VM satisfy this requirement.

Hi Jin,

This doesn't seem to be related to RBD.  Given that the same error is
observed when using the RBD driver with the raw format, I would dig in
the direction of migration somehow "installing" the raw format (which
is on-disk compatible with the rbd format).

Also, did you mean to say "snapshot_blkdev_internal" instead of
"snapshot_delete_blkdev_internal" in both instances?

Thanks,

Ilya

>
> --
> Sincerely
> Jin Cao
>
> On 5/27/24 10:56 AM, Jin Cao wrote:
> > CC block and migration related address.
> >
> > On 5/27/24 12:03 AM, Jin Cao wrote:
> >> Hi,
> >>
> >> I encountered RBD block snapshot issue after doing migration.
> >>
> >> Steps
> >> -
> >>
> >> 1. Start QEMU with:
> >> ./qemu-system-x86_64 -name VM -machine q35 -accel kvm -cpu
> >> host,migratable=on -m 2G -boot menu=on,strict=on
> >> rbd:image/ubuntu-22.04-server-cloudimg-amd64.raw -net nic -net user
> >> -cdrom /home/my/path/of/cloud-init.iso -monitor stdio
> >>
> >> 2. Do block snapshot in monitor cmd: snapshot_delete_blkdev_internal.
> >> It works as expected: the snapshot is visable with command`rbd snap ls
> >> pool_name/image_name`.
> >>
> >> 3. Do pseudo migration with monitor cmd: migrate -d exec:cat>/tmp/vm.out
> >>
> >> 4. Do block snapshot again with snapshot_delete_blkdev_internal, then
> >> I get:
> >> Error: Block format 'raw' used by device 'ide0-hd0' does not
> >> support internal snapshots
> >>
> >> I was hoping to do the second block snapshot successfully, and it
> >> feels abnormal the RBD block snapshot function is disrupted after
> >> migration.
> >>
> >> BTW, I get the same block snapshot error when I start QEMU with:
> >>  "-drive format=raw,file=rbd:pool_name/image_name"
> >>
> >> My questions is: how could I proceed with RBD block snapshot after the
> >> pseudo migration?



Re: block snapshot issue with RBD

2024-05-27 Thread Jin Cao
Supplementary info: the VM is paused after the "migrate" command. After
it is resumed with "cont", snapshot_delete_blkdev_internal works again,
which is confusing, since disk snapshots generally call for I/O to be
paused, and a frozen VM satisfies this requirement.


--
Sincerely
Jin Cao

On 5/27/24 10:56 AM, Jin Cao wrote:
> CC block and migration related address.
>
> On 5/27/24 12:03 AM, Jin Cao wrote:
>> Hi,
>>
>> I encountered RBD block snapshot issue after doing migration.
>>
>> Steps
>> -
>>
>> 1. Start QEMU with:
>> ./qemu-system-x86_64 -name VM -machine q35 -accel kvm -cpu
>> host,migratable=on -m 2G -boot menu=on,strict=on
>> rbd:image/ubuntu-22.04-server-cloudimg-amd64.raw -net nic -net user
>> -cdrom /home/my/path/of/cloud-init.iso -monitor stdio
>>
>> 2. Do block snapshot in monitor cmd: snapshot_delete_blkdev_internal.
>> It works as expected: the snapshot is visable with command`rbd snap ls
>> pool_name/image_name`.
>>
>> 3. Do pseudo migration with monitor cmd: migrate -d exec:cat>/tmp/vm.out
>>
>> 4. Do block snapshot again with snapshot_delete_blkdev_internal, then
>> I get:
>>     Error: Block format 'raw' used by device 'ide0-hd0' does not
>> support internal snapshots
>>
>> I was hoping to do the second block snapshot successfully, and it
>> feels abnormal the RBD block snapshot function is disrupted after
>> migration.
>>
>> BTW, I get the same block snapshot error when I start QEMU with:
>>  "-drive format=raw,file=rbd:pool_name/image_name"
>>
>> My questions is: how could I proceed with RBD block snapshot after the
>> pseudo migration?




Re: block snapshot issue with RBD

2024-05-27 Thread Jin Cao

CC'ing the block- and migration-related addresses.

On 5/27/24 12:03 AM, Jin Cao wrote:
> Hi,
>
> I encountered RBD block snapshot issue after doing migration.
>
> Steps
> -
>
> 1. Start QEMU with:
> ./qemu-system-x86_64 -name VM -machine q35 -accel kvm -cpu
> host,migratable=on -m 2G -boot menu=on,strict=on
> rbd:image/ubuntu-22.04-server-cloudimg-amd64.raw -net nic -net user
> -cdrom /home/my/path/of/cloud-init.iso -monitor stdio
>
> 2. Do block snapshot in monitor cmd: snapshot_delete_blkdev_internal. It
> works as expected: the snapshot is visable with command`rbd snap ls
> pool_name/image_name`.
>
> 3. Do pseudo migration with monitor cmd: migrate -d exec:cat>/tmp/vm.out
>
> 4. Do block snapshot again with snapshot_delete_blkdev_internal, then I
> get:
>     Error: Block format 'raw' used by device 'ide0-hd0' does not support
> internal snapshots
>
> I was hoping to do the second block snapshot successfully, and it feels
> abnormal the RBD block snapshot function is disrupted after migration.
>
> BTW, I get the same block snapshot error when I start QEMU with:
>      "-drive format=raw,file=rbd:pool_name/image_name"
>
> My questions is: how could I proceed with RBD block snapshot after the
> pseudo migration?




block snapshot issue with RBD

2024-05-27 Thread Jin Cao

Hi,

I encountered an RBD block snapshot issue after doing a migration.

Steps
-----

1. Start QEMU with:
./qemu-system-x86_64 -name VM -machine q35 -accel kvm \
    -cpu host,migratable=on -m 2G -boot menu=on,strict=on \
    rbd:image/ubuntu-22.04-server-cloudimg-amd64.raw -net nic -net user \
    -cdrom /home/my/path/of/cloud-init.iso -monitor stdio


2. Take a block snapshot with the monitor command
snapshot_delete_blkdev_internal. It works as expected: the snapshot is
visible with the command `rbd snap ls pool_name/image_name`.


3. Do a pseudo-migration with the monitor command:
migrate -d exec:cat>/tmp/vm.out

4. Take a block snapshot again with snapshot_delete_blkdev_internal, then I get:
   Error: Block format 'raw' used by device 'ide0-hd0' does not support internal snapshots


I was hoping the second block snapshot would succeed, and it seems
abnormal that the RBD block snapshot function is disrupted after migration.


BTW, I get the same block snapshot error when I start QEMU with:
"-drive format=raw,file=rbd:pool_name/image_name"

My question is: how can I proceed with an RBD block snapshot after the
pseudo-migration?