Re: [PATCH QEMU v23 08/18] vfio: Register SaveVMHandlers for VFIO device

2020-05-21 Thread Dr. David Alan Gilbert
* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> 
> 
> On 5/21/2020 7:48 PM, Dr. David Alan Gilbert wrote:
> > * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> > > Define flags to be used as delimiters in the migration file stream.
> > > Added .save_setup and .save_cleanup functions. The migration region is
> > > mapped and unmapped from these functions at the source during the saving
> > > or pre-copy phase.
> > > The VFIO device state is set depending on the VM's state: during live
> > > migration the VM is running when .save_setup is called, so the
> > > _SAVING | _RUNNING state is set for the VFIO device; during save-restore
> > > the VM is paused, so only the _SAVING state is set for the VFIO device.
> > > 
> > > Signed-off-by: Kirti Wankhede 
> > > Reviewed-by: Neo Jia 



> > > +    register_savevm_live("vfio", VMSTATE_INSTANCE_ID_ANY, 1,
> > > +                         &savevm_vfio_handlers, vbasedev);
> > 
> > Hi,
> >This is still the only bit which worries me, and I saw your note
> > saying you'd tested it; to calm my nerves, can you run with the
> > 'qemu_loadvm_state_section_startfull' trace enabled with 2 devices
> > and show me the output and qemu command line?
> > I'm trying to figure out how they end up represented in the stream.
> > 
> 
> Created mtty devices for source VM:
> echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1233" >
> /sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create
> echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1234" >
> /sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create
> 
> for destination VM:
> echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1235" >
> /sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create
> echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1236" >
> /sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create
> 
> Source qemu-cmdline:
> /usr/libexec/qemu-kvm \
>  -name guest=rhel75-mig,debug-threads=on \
>  -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off \
>  -cpu SandyBridge,vme=on,hypervisor=on,arat=on,xsaveopt=on \
>  -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 \
>  -uuid eefb718c-137c-d416-e573-dd74ecd3490d \
>  -drive file=/home/vm/rhel-75.qcow2,format=qcow2,if=none,id=drive-ide0-0-0,cache=none \
>  -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1,write-cache=on \
>  -vnc 127.0.0.1:0 \
>  -device rtl8139,netdev=net0,mac=52:54:b2:88:86:2a,bus=pci.0,addr=0x3 \
>  -netdev tap,id=net0,script=/root/qemu-ifup,downscript=no \
>  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1233 \
>  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1234 \
>  --trace events=/root/vfio_events \
>  -monitor unix:/tmp/qmp_socket1,server,nowait \
>  -serial stdio \
>  -msg timestamp=on
> 
> Destination qemu-cmdline:
> /usr/libexec/qemu-kvm \
>  -name guest=rhel75-mig,debug-threads=on \
>  -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off \
>  -cpu SandyBridge,vme=on,hypervisor=on,arat=on,xsaveopt=on \
>  -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 \
>  -uuid eefb718c-137c-d416-e573-dd74ecd3490d \
>  -drive file=/home/vm/rhel-75.qcow2,format=qcow2,if=none,id=drive-ide0-0-0,cache=none \
>  -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1,write-cache=on \
>  -vnc 127.0.0.1:1 \
>  -device rtl8139,netdev=net0,mac=52:54:b2:88:86:2a,bus=pci.0,addr=0x3 \
>  -netdev tap,id=net0,script=/root/qemu-ifup,downscript=no \
>  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1235 \
>  -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1236 \
>  -incoming unix:/tmp/mig_socket \
>  --trace events=/root/vfio_events \
>  -monitor unix:/tmp/qmp_socket2,server,nowait \
>  -serial stdio \
>  -msg timestamp=on
> 
> Migrate:
> echo "migrate_set_speed 0" | sudo nc -U /tmp/qmp_socket1
> echo "migrate -d unix:/tmp/mig_socket" | sudo nc -U $/tmp/qmp_socket1
> 
> After migration, 'qemu_loadvm_state_section_startfull' traces:
> 
> qemu_loadvm_state_section_startfull 0.000 pid=1457 section_id=0x2 idstr=b'ram' instance_id=0x0 version_id=0x4
> qemu_loadvm_state_section_startfull 515.606 pid=1457 section_id=0x2e idstr=b'vfio' instance_id=0x0 version_id=0x1
> qemu_loadvm_state_section_startfull 10.661 pid=1457 section_id=0x2f idstr=b'vfio' instance_id=0x1 version_id=0x1

Right, so this is my worry - we have two devices in the stream, both called
'vfio', with what I think are sequential instance IDs - what makes each of
your source vfio devices go to the correct destination vfio device?  If the
two devices were different vfio devices, how would you ensure that they
ended up in the right place?  There's no requirement for the order of the
qemu command line on the source and the destination to be the same, or for
qemu to maintain semantics based on that order - but I bet that's the
ordering we're getting here.

> idstr=b'0000:00:03.0/rtl8139' instance_id=0x0 version_id=0x5

Now you see that the PCI NIC has a nice PCI address as its name in the stream, unlike the two 'vfio' sections above.
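
For what it's worth, a rough sketch of one way the section name could be made unique per device - this is illustrative only, not what the posted patch does, and it assumes the caller can pass in the device's DeviceState and that the helper sits in hw/vfio/migration.c next to the savevm_vfio_handlers table added by this patch:

#include "qemu/osdep.h"
#include "hw/qdev-core.h"          /* DeviceState */
#include "hw/vmstate-if.h"         /* VMSTATE_IF(), vmstate_if_get_id() */
#include "hw/vfio/vfio-common.h"   /* VFIODevice */
#include "migration/vmstate.h"     /* VMSTATE_INSTANCE_ID_ANY */
#include "migration/register.h"    /* register_savevm_live() */

static void vfio_register_savevm_unique(VFIODevice *vbasedev, DeviceState *dev)
{
    /* e.g. "0000:00:03.0" for a PCI device; may be NULL on other buses */
    g_autofree char *oid = vmstate_if_get_id(VMSTATE_IF(dev));
    g_autofree char *idstr = oid ? g_strdup_printf("%s/vfio", oid)
                                 : g_strdup("vfio");

    /* register_savevm_live() copies idstr into its own SaveStateEntry */
    register_savevm_live(idstr, VMSTATE_INSTANCE_ID_ANY, 1,
                         &savevm_vfio_handlers, vbasedev);
}

With section names like "0000:00:03.0/vfio", matching a source device to the corresponding destination device would no longer depend on the order in which the devices happen to register.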

Re: [PATCH QEMU v23 08/18] vfio: Register SaveVMHandlers for VFIO device

2020-05-21 Thread Kirti Wankhede




On 5/21/2020 7:48 PM, Dr. David Alan Gilbert wrote:

* Kirti Wankhede (kwankh...@nvidia.com) wrote:

Define flags to be used as delimiters in the migration file stream.
Added .save_setup and .save_cleanup functions. The migration region is mapped
and unmapped from these functions at the source during the saving or pre-copy
phase.
The VFIO device state is set depending on the VM's state: during live migration
the VM is running when .save_setup is called, so the _SAVING | _RUNNING state is
set for the VFIO device; during save-restore the VM is paused, so only the
_SAVING state is set for the VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  hw/vfio/migration.c  | 73 
  hw/vfio/trace-events |  2 ++
  2 files changed, 75 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c2f5564b51c3..773c8d16b1c1 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,12 +8,14 @@
   */
  
  #include "qemu/osdep.h"

+#include "qemu/main-loop.h"
  #include <linux/vfio.h>
  
  #include "sysemu/runstate.h"

  #include "hw/vfio/vfio-common.h"
  #include "cpu.h"
  #include "migration/migration.h"
+#include "migration/vmstate.h"
  #include "migration/qemu-file.h"
  #include "migration/register.h"
  #include "migration/blocker.h"
@@ -24,6 +26,17 @@
  #include "pci.h"
  #include "trace.h"
  
+/*
+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10     => emulated (virtual) function IO
+ * 0x0000     => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+
  static void vfio_migration_region_exit(VFIODevice *vbasedev)
  {
  VFIOMigration *migration = vbasedev->migration;
@@ -126,6 +139,64 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
  return 0;
  }
  
+/* ---------------------------------------------------------------------- */
+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+    int ret;
+
+    trace_vfio_save_setup(vbasedev->name);
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+    if (migration->region.mmaps) {
+        qemu_mutex_lock_iothread();
+        ret = vfio_region_mmap(&migration->region);
+        qemu_mutex_unlock_iothread();
+        if (ret) {
+            error_report("%s: Failed to mmap VFIO migration region %d: %s",
+                         vbasedev->name, migration->region.index,
+                         strerror(-ret));
+            return ret;
+        }
+    }
+
+    ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_SAVING);
+    if (ret) {
+        error_report("%s: Failed to set state SAVING", vbasedev->name);
+        return ret;
+    }
+
+    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+    ret = qemu_file_get_error(f);
+    if (ret) {
+        return ret;
+    }
+
+    return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+    VFIODevice *vbasedev = opaque;
+    VFIOMigration *migration = vbasedev->migration;
+
+    if (migration->region.mmaps) {
+        vfio_region_unmap(&migration->region);
+    }
+    trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+    .save_setup = vfio_save_setup,
+    .save_cleanup = vfio_save_cleanup,
+};
+
+/* ---------------------------------------------------------------------- */
+
  static void vfio_vmstate_change(void *opaque, int running, RunState state)
  {
  VFIODevice *vbasedev = opaque;
@@ -192,6 +263,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
  return ret;
  }
  
+    register_savevm_live("vfio", VMSTATE_INSTANCE_ID_ANY, 1,
+                         &savevm_vfio_handlers, vbasedev);


Hi,
   This is still the only bit which worries me, and I saw your note
saying you'd tested it; to calm my nerves, can you run with the
'qemu_loadvm_state_section_startfull' trace enabled with 2 devices
and show me the output and qemu command line?
I'm trying to figure out how they end up represented in the stream.



Created mtty devices for source VM:
echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1233" > 
/sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create
echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1234" > 
/sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create


for destination VM:
echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1235" > 
/sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create
echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1236" > 
/sys/class/mdev_bus/mtty/mdev_supported_types/mtty-2/create


Source qemu-cmdline:
/usr/libexec/qemu-kvm \
 -name guest=rhel75-mig,debug-threads=on \
 -machine pc-i440fx-3.1,accel=kvm,usb=off,dump-guest-core=off \
 -cpu SandyBridge,vme=on,hypervisor=on,arat=on,xsaveopt=on \
 -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 \
 

Re: [PATCH QEMU v23 08/18] vfio: Register SaveVMHandlers for VFIO device

2020-05-21 Thread Dr. David Alan Gilbert
* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> Define flags to be used as delimiters in the migration file stream.
> Added .save_setup and .save_cleanup functions. The migration region is
> mapped and unmapped from these functions at the source during the saving
> or pre-copy phase.
> The VFIO device state is set depending on the VM's state: during live
> migration the VM is running when .save_setup is called, so the
> _SAVING | _RUNNING state is set for the VFIO device; during save-restore
> the VM is paused, so only the _SAVING state is set for the VFIO device.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/migration.c  | 73 
> 
>  hw/vfio/trace-events |  2 ++
>  2 files changed, 75 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index c2f5564b51c3..773c8d16b1c1 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -8,12 +8,14 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/main-loop.h"
>  #include <linux/vfio.h>
>  
>  #include "sysemu/runstate.h"
>  #include "hw/vfio/vfio-common.h"
>  #include "cpu.h"
>  #include "migration/migration.h"
> +#include "migration/vmstate.h"
>  #include "migration/qemu-file.h"
>  #include "migration/register.h"
>  #include "migration/blocker.h"
> @@ -24,6 +26,17 @@
>  #include "pci.h"
>  #include "trace.h"
>  
> +/*
> + * Flags used as delimiter:
> + * 0xffffffff => MSB 32-bit all 1s
> + * 0xef10     => emulated (virtual) function IO
> + * 0x0000     => 16-bits reserved for flags
> + */
> +#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
> +#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
> +
>  static void vfio_migration_region_exit(VFIODevice *vbasedev)
>  {
>  VFIOMigration *migration = vbasedev->migration;
> @@ -126,6 +139,64 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
>  return 0;
>  }
>  
> +/* ---------------------------------------------------------------------- */
> +
> +static int vfio_save_setup(QEMUFile *f, void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    VFIOMigration *migration = vbasedev->migration;
> +    int ret;
> +
> +    trace_vfio_save_setup(vbasedev->name);
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> +
> +    if (migration->region.mmaps) {
> +        qemu_mutex_lock_iothread();
> +        ret = vfio_region_mmap(&migration->region);
> +        qemu_mutex_unlock_iothread();
> +        if (ret) {
> +            error_report("%s: Failed to mmap VFIO migration region %d: %s",
> +                         vbasedev->name, migration->region.index,
> +                         strerror(-ret));
> +            return ret;
> +        }
> +    }
> +
> +    ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_SAVING);
> +    if (ret) {
> +        error_report("%s: Failed to set state SAVING", vbasedev->name);
> +        return ret;
> +    }
> +
> +    qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
> +
> +    ret = qemu_file_get_error(f);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    return 0;
> +}
> +
> +static void vfio_save_cleanup(void *opaque)
> +{
> +    VFIODevice *vbasedev = opaque;
> +    VFIOMigration *migration = vbasedev->migration;
> +
> +    if (migration->region.mmaps) {
> +        vfio_region_unmap(&migration->region);
> +    }
> +    trace_vfio_save_cleanup(vbasedev->name);
> +}
> +
> +static SaveVMHandlers savevm_vfio_handlers = {
> +    .save_setup = vfio_save_setup,
> +    .save_cleanup = vfio_save_cleanup,
> +};
> +
> +/* ---------------------------------------------------------------------- */
> +
>  static void vfio_vmstate_change(void *opaque, int running, RunState state)
>  {
>  VFIODevice *vbasedev = opaque;
> @@ -192,6 +263,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>  return ret;
>  }
>  
> +    register_savevm_live("vfio", VMSTATE_INSTANCE_ID_ANY, 1,
> +                         &savevm_vfio_handlers, vbasedev);

Hi,
  This is still the only bit which worries me, and I saw your note
saying you'd tested it; to calm my nerves, can you run with the
'qemu_loadvm_state_section_startfull' trace enabled with 2 devices
and show me the output and qemu command line?
I'm trying to figure out how they end up represented in the stream.

Dave


>      vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
>                                                             vbasedev);
>  
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index bd3d47b005cb..86c18def016e 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -149,3 +149,5 @@ vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
>  vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
>  vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"