Re: Regression save/restore of vsock: (was [PULL 07/55] hw/virtio: move vm_running check to virtio_device_started)

2022-10-14 Thread Christian Borntraeger




Am 14.10.22 um 13:07 schrieb Alex Bennée:


Christian Borntraeger  writes:


Am 14.10.22 um 09:30 schrieb Christian Borntraeger:

Am 10.10.22 um 19:29 schrieb Michael S. Tsirkin:

From: Alex Bennée 

All the boilerplate virtio code does the same thing (or should at
least) of checking to see if the VM is running before attempting to
start VirtIO. Push the logic up to the common function to avoid
getting a copy and paste wrong.

Signed-off-by: Alex Bennée 
Message-Id: <20220802095010.3330793-11-alex.ben...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 

This results in a regression for our s390x CI when doing
save/restore of guests with vsock:
      #1  0x03ff9a248580 raise (libc.so.6 + 0x48580)
      #2  0x03ff9a22b5c0 abort (libc.so.6 + 0x2b5c0)
      #3  0x03ff9a2409da __assert_fail_base (libc.so.6 + 
0x409da)
      #4  0x03ff9a240a4e __assert_fail (libc.so.6 + 0x40a4e)
      #5  0x02aa2d69a066 vhost_vsock_common_pre_save 
(qemu-system-s390x + 0x39a066)
      #6  0x02aa2d55570e vmstate_save_state_v 
(qemu-system-s390x + 0x25570e)
      #7  0x02aa2d556218 vmstate_save_state (qemu-system-s390x 
+ 0x256218)
      #8  0x02aa2d570ba4
qemu_savevm_state_complete_precopy_non_iterable (qemu-system-s390x +
0x270ba4)
      #9  0x02aa2d5710b6 qemu_savevm_state_complete_precopy 
(qemu-system-s390x + 0x2710b6)
      #10 0x02aa2d564d0e migration_completion 
(qemu-system-s390x + 0x264d0e)
      #11 0x02aa2d8db25c qemu_thread_start (qemu-system-s390x + 
0x5db25c)
      #12 0x03ff9a296248 start_thread (libc.so.6 + 0x96248)
      #13 0x03ff9a31183e thread_start (libc.so.6 + 0x11183e)




Something like
diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index 7dc3c7393122..b4d056ae6f01 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -73,6 +73,10 @@ static void vhost_vsock_set_status(VirtIODevice *vdev, 
uint8_t status)
  bool should_start = virtio_device_started(vdev, status);
  int ret;
  +if (!vdev->vm_running) {
+should_start = false;
+}
+
  if (vhost_dev_is_started(>vhost_dev) == should_start) {
  return;
  }

helps.

The problem seems to be that virtio_device_started does ignore
vm_running when use_start is set.


Wouldn't it make more sense to re-order the check there, something like:

   static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
   {
   if (!vdev->vm_running) {
   return false;
   }

   if (vdev->use_started) {
   return vdev->started;
   }

   return status & VIRTIO_CONFIG_S_DRIVER_OK;
   }


That does work as well. (and it restores the original ordering so that makes 
sense).



Is the problem that vdev->started gets filled during the migration but
because the VM isn't running yet we can never actually run?


I dont know.



Re: Regression save/restore of vsock: (was [PULL 07/55] hw/virtio: move vm_running check to virtio_device_started)

2022-10-14 Thread Alex Bennée

Christian Borntraeger  writes:

> Am 14.10.22 um 09:30 schrieb Christian Borntraeger:
>> Am 10.10.22 um 19:29 schrieb Michael S. Tsirkin:
>>> From: Alex Bennée 
>>>
>>> All the boilerplate virtio code does the same thing (or should at
>>> least) of checking to see if the VM is running before attempting to
>>> start VirtIO. Push the logic up to the common function to avoid
>>> getting a copy and paste wrong.
>>>
>>> Signed-off-by: Alex Bennée 
>>> Message-Id: <20220802095010.3330793-11-alex.ben...@linaro.org>
>>> Reviewed-by: Michael S. Tsirkin 
>>> Signed-off-by: Michael S. Tsirkin 
>> This results in a regression for our s390x CI when doing
>> save/restore of guests with vsock:
>>      #1  0x03ff9a248580 raise (libc.so.6 + 0x48580)
>>      #2  0x03ff9a22b5c0 abort (libc.so.6 + 0x2b5c0)
>>      #3  0x03ff9a2409da __assert_fail_base (libc.so.6 + 
>> 0x409da)
>>      #4  0x03ff9a240a4e __assert_fail (libc.so.6 + 0x40a4e)
>>      #5  0x02aa2d69a066 vhost_vsock_common_pre_save 
>> (qemu-system-s390x + 0x39a066)
>>      #6  0x02aa2d55570e vmstate_save_state_v 
>> (qemu-system-s390x + 0x25570e)
>>      #7  0x02aa2d556218 vmstate_save_state 
>> (qemu-system-s390x + 0x256218)
>>      #8  0x02aa2d570ba4
>> qemu_savevm_state_complete_precopy_non_iterable (qemu-system-s390x +
>> 0x270ba4)
>>      #9  0x02aa2d5710b6 qemu_savevm_state_complete_precopy 
>> (qemu-system-s390x + 0x2710b6)
>>      #10 0x02aa2d564d0e migration_completion 
>> (qemu-system-s390x + 0x264d0e)
>>      #11 0x02aa2d8db25c qemu_thread_start (qemu-system-s390x 
>> + 0x5db25c)
>>      #12 0x03ff9a296248 start_thread (libc.so.6 + 0x96248)
>>      #13 0x03ff9a31183e thread_start (libc.so.6 + 0x11183e)
>> 
>
>
> Something like
> diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
> index 7dc3c7393122..b4d056ae6f01 100644
> --- a/hw/virtio/vhost-vsock.c
> +++ b/hw/virtio/vhost-vsock.c
> @@ -73,6 +73,10 @@ static void vhost_vsock_set_status(VirtIODevice *vdev, 
> uint8_t status)
>  bool should_start = virtio_device_started(vdev, status);
>  int ret;
>  +if (!vdev->vm_running) {
> +should_start = false;
> +}
> +
>  if (vhost_dev_is_started(>vhost_dev) == should_start) {
>  return;
>  }
>
> helps.
>
> The problem seems to be that virtio_device_started does ignore
> vm_running when use_start is set.

Wouldn't it make more sense to re-order the check there, something like:

  static inline bool virtio_device_started(VirtIODevice *vdev, uint8_t status)
  {
  if (!vdev->vm_running) {
  return false;
  }

  if (vdev->use_started) {
  return vdev->started;
  }

  return status & VIRTIO_CONFIG_S_DRIVER_OK;
  }

Is the problem that vdev->started gets filled during the migration but
because the VM isn't running yet we can never actually run?

-- 
Alex Bennée


Re: Regression save/restore of vsock: (was [PULL 07/55] hw/virtio: move vm_running check to virtio_device_started)

2022-10-14 Thread Christian Borntraeger



Am 14.10.22 um 10:37 schrieb Alex Bennée:


Christian Borntraeger  writes:


Am 10.10.22 um 19:29 schrieb Michael S. Tsirkin:

From: Alex Bennée 
All the boilerplate virtio code does the same thing (or should at
least) of checking to see if the VM is running before attempting to
start VirtIO. Push the logic up to the common function to avoid
getting a copy and paste wrong.
Signed-off-by: Alex Bennée 
Message-Id: <20220802095010.3330793-11-alex.ben...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 


This results in a regression for our s390x CI when doing save/restore of guests 
with vsock:


 #1  0x03ff9a248580 raise (libc.so.6 + 0x48580)
 #2  0x03ff9a22b5c0 abort (libc.so.6 + 0x2b5c0)
 #3  0x03ff9a2409da __assert_fail_base (libc.so.6 + 0x409da)
 #4  0x03ff9a240a4e __assert_fail (libc.so.6 + 0x40a4e)
 #5  0x02aa2d69a066 vhost_vsock_common_pre_save 
(qemu-system-s390x + 0x39a066)
 #6  0x02aa2d55570e vmstate_save_state_v (qemu-system-s390x 
+ 0x25570e)
 #7  0x02aa2d556218 vmstate_save_state (qemu-system-s390x + 
0x256218)
 #8 0x02aa2d570ba4
qemu_savevm_state_complete_precopy_non_iterable (qemu-system-s390x +
0x270ba4)
 #9  0x02aa2d5710b6 qemu_savevm_state_complete_precopy 
(qemu-system-s390x + 0x2710b6)
 #10 0x02aa2d564d0e migration_completion (qemu-system-s390x 
+ 0x264d0e)
 #11 0x02aa2d8db25c qemu_thread_start (qemu-system-s390x + 
0x5db25c)
 #12 0x03ff9a296248 start_thread (libc.so.6 + 0x96248)
 #13 0x03ff9a31183e thread_start (libc.so.6 + 0x11183e)


Which test does this break?


migrate to file and restore.



Looking at the change the only thing I can think of is there is a subtle
change in the order of checks because if the device is set as
use_started we return the result regardless of vm or config state:

 if (vdev->use_started) {
 return vdev->started;
 }

Could some printfs confirm that?


Right. The problem is we now ignore the vm state and thus run into the 
assertion in vhost_vsock_common_pre_save.
Removing the asserting then results in virtio errors, which really indicates 
that the device must not be started.



Re: Regression save/restore of vsock: (was [PULL 07/55] hw/virtio: move vm_running check to virtio_device_started)

2022-10-14 Thread Alex Bennée


Christian Borntraeger  writes:

> Am 10.10.22 um 19:29 schrieb Michael S. Tsirkin:
>> From: Alex Bennée 
>> All the boilerplate virtio code does the same thing (or should at
>> least) of checking to see if the VM is running before attempting to
>> start VirtIO. Push the logic up to the common function to avoid
>> getting a copy and paste wrong.
>> Signed-off-by: Alex Bennée 
>> Message-Id: <20220802095010.3330793-11-alex.ben...@linaro.org>
>> Reviewed-by: Michael S. Tsirkin 
>> Signed-off-by: Michael S. Tsirkin 
>
> This results in a regression for our s390x CI when doing save/restore of 
> guests with vsock:
>
>
> #1  0x03ff9a248580 raise (libc.so.6 + 0x48580)
> #2  0x03ff9a22b5c0 abort (libc.so.6 + 0x2b5c0)
> #3  0x03ff9a2409da __assert_fail_base (libc.so.6 + 
> 0x409da)
> #4  0x03ff9a240a4e __assert_fail (libc.so.6 + 0x40a4e)
> #5  0x02aa2d69a066 vhost_vsock_common_pre_save 
> (qemu-system-s390x + 0x39a066)
> #6  0x02aa2d55570e vmstate_save_state_v 
> (qemu-system-s390x + 0x25570e)
> #7  0x02aa2d556218 vmstate_save_state (qemu-system-s390x 
> + 0x256218)
> #8 0x02aa2d570ba4
> qemu_savevm_state_complete_precopy_non_iterable (qemu-system-s390x +
> 0x270ba4)
> #9  0x02aa2d5710b6 qemu_savevm_state_complete_precopy 
> (qemu-system-s390x + 0x2710b6)
> #10 0x02aa2d564d0e migration_completion 
> (qemu-system-s390x + 0x264d0e)
> #11 0x02aa2d8db25c qemu_thread_start (qemu-system-s390x + 
> 0x5db25c)
> #12 0x03ff9a296248 start_thread (libc.so.6 + 0x96248)
> #13 0x03ff9a31183e thread_start (libc.so.6 + 0x11183e)

Which test does this break?

Looking at the change the only thing I can think of is there is a subtle
change in the order of checks because if the device is set as
use_started we return the result regardless of vm or config state:

if (vdev->use_started) {
return vdev->started;
}

Could some printfs confirm that?

-- 
Alex Bennée



Re: Regression save/restore of vsock: (was [PULL 07/55] hw/virtio: move vm_running check to virtio_device_started)

2022-10-14 Thread Christian Borntraeger

Am 14.10.22 um 09:30 schrieb Christian Borntraeger:

Am 10.10.22 um 19:29 schrieb Michael S. Tsirkin:

From: Alex Bennée 

All the boilerplate virtio code does the same thing (or should at
least) of checking to see if the VM is running before attempting to
start VirtIO. Push the logic up to the common function to avoid
getting a copy and paste wrong.

Signed-off-by: Alex Bennée 
Message-Id: <20220802095010.3330793-11-alex.ben...@linaro.org>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 


This results in a regression for our s390x CI when doing save/restore of guests 
with vsock:


     #1  0x03ff9a248580 raise (libc.so.6 + 0x48580)
     #2  0x03ff9a22b5c0 abort (libc.so.6 + 0x2b5c0)
     #3  0x03ff9a2409da __assert_fail_base (libc.so.6 + 0x409da)
     #4  0x03ff9a240a4e __assert_fail (libc.so.6 + 0x40a4e)
     #5  0x02aa2d69a066 vhost_vsock_common_pre_save 
(qemu-system-s390x + 0x39a066)
     #6  0x02aa2d55570e vmstate_save_state_v (qemu-system-s390x 
+ 0x25570e)
     #7  0x02aa2d556218 vmstate_save_state (qemu-system-s390x + 
0x256218)
     #8  0x02aa2d570ba4 
qemu_savevm_state_complete_precopy_non_iterable (qemu-system-s390x + 0x270ba4)
     #9  0x02aa2d5710b6 qemu_savevm_state_complete_precopy 
(qemu-system-s390x + 0x2710b6)
     #10 0x02aa2d564d0e migration_completion (qemu-system-s390x 
+ 0x264d0e)
     #11 0x02aa2d8db25c qemu_thread_start (qemu-system-s390x + 
0x5db25c)
     #12 0x03ff9a296248 start_thread (libc.so.6 + 0x96248)
     #13 0x03ff9a31183e thread_start (libc.so.6 + 0x11183e)




Something like
diff --git a/hw/virtio/vhost-vsock.c b/hw/virtio/vhost-vsock.c
index 7dc3c7393122..b4d056ae6f01 100644
--- a/hw/virtio/vhost-vsock.c
+++ b/hw/virtio/vhost-vsock.c
@@ -73,6 +73,10 @@ static void vhost_vsock_set_status(VirtIODevice *vdev, 
uint8_t status)
 bool should_start = virtio_device_started(vdev, status);
 int ret;
 
+if (!vdev->vm_running) {

+should_start = false;
+}
+
 if (vhost_dev_is_started(>vhost_dev) == should_start) {
 return;
 }

helps.

The problem seems to be that virtio_device_started does ignore vm_running when 
use_start is set.