Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
On 6/25/19 12:11 PM, Dr. David Alan Gilbert wrote:
> * Marcel Apfelbaum (marcel.apfelb...@gmail.com) wrote:
> > Hi Dmitry,
> >
> > On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
> > > > On 25 Jun 2019, at 11:14, Marcel Apfelbaum wrote:
> > > >
> > > > Hi Sukrit
> > > >
> > > > On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
> > > > > Hi,
> > > > [...]
> > > > > This RFC is meant to request suggestions on the things which are
> > > > > working and for help on the things which are not.
> > > > [...]
> > > > > What is not working:
> > > > [...]
> > > > > * It seems that vmxnet3 migration itself is not working properly,
> > > > >   at least for me. [...]
> > > > >   Please note that e1000 live migration is working fine in the
> > > > >   same setup.
> > > >
> > > > I tried to git bisect, but I couldn't find a working version of
> > > > vmxnet3 supporting live migration.
> > > > I tried even a commit from December 2014 and it didn't work.
> > > >
> > > > What is strange (to me) is that the networking packets can't be
> > > > sent from the guest (after migration) even after rebooting the
> > > > guest.
> > > This makes me think that some network offload configuration wasn't
> > > properly migrated or applied.
> > > What network backend are you using?
> >
> > Sukrit tried with tap device, I tried with slirp.
> > If you can point me to the property that disables all the offloads it
> > will really help.
> >
> > > Do you see any outgoing packets in the sniffer?
> >
> > I didn't use the sniffer, I checked dmesg in guest, there was a line
> > complaining that it can't send packets.
>
> What exactly was the error?

I'll try to reproduce the error.

Thanks,
Marcel

> I don't know much about vmxnet3; but if the guest driver is seeing the
> problem then I guess that's the best pointer we have.
>
> Dave
>
> > Thanks,
> > Marcel
> >
> > > > Any help or pointer would be greatly appreciated.
> > > > Thanks,
> > > > Marcel
> > > > [...]
>
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
> On 25 Jun 2019, at 11:49, Marcel Apfelbaum wrote:
>
> Hi Dmitry,
>
> On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
>>
>>> On 25 Jun 2019, at 11:14, Marcel Apfelbaum wrote:
>>>
>>> Hi Sukrit
>>>
>>> On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
>>>> Hi,
>>> [...]
>>>> This RFC is meant to request suggestions on the things which are
>>>> working and for help on the things which are not.
>>> [...]
>>>> What is not working:
>>> [...]
>>>> * It seems that vmxnet3 migration itself is not working properly,
>>>>   at least for me. [...]
>>>>   Please note that e1000 live migration is working fine in the same
>>>>   setup.
>>>
>>> I tried to git bisect, but I couldn't find a working version of vmxnet3
>>> supporting live migration.
>>> I tried even a commit from December 2014 and it didn't work.
>>>
>>> What is strange (to me) is that the networking packets can't be sent
>>> from the guest (after migration) even after rebooting the guest.
>> This makes me think that some network offload configuration wasn't
>> properly migrated or applied.
>> What network backend are you using?
>
> Sukrit tried with tap device, I tried with slirp.
> If you can point me to the property that disables all the offloads it
> will really help.
>
>> Do you see any outgoing packets in the sniffer?
>
> I didn't use the sniffer, I checked dmesg in guest, there was a line
> complaining that it can't send packets.

I see. If it cannot send packets on the guest side, then it's not an
offload issue. A snippet from dmesg will be helpful indeed.

> Thanks,
> Marcel
>
>>> Any help or pointer would be greatly appreciated.
>>> Thanks,
>>> Marcel
>>>
>>> [...]
Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
* Marcel Apfelbaum (marcel.apfelb...@gmail.com) wrote:
> Hi Dmitry,
>
> On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
> >
> > > On 25 Jun 2019, at 11:14, Marcel Apfelbaum wrote:
> > >
> > > Hi Sukrit
> > >
> > > On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
> > > > Hi,
> > > [...]
> > > > This RFC is meant to request suggestions on the things which are
> > > > working and for help on the things which are not.
> > > [...]
> > > > What is not working:
> > > [...]
> > > > * It seems that vmxnet3 migration itself is not working properly,
> > > >   at least for me. [...]
> > > >   Please note that e1000 live migration is working fine in the
> > > >   same setup.
> > >
> > > I tried to git bisect, but I couldn't find a working version of
> > > vmxnet3 supporting live migration.
> > > I tried even a commit from December 2014 and it didn't work.
> > >
> > > What is strange (to me) is that the networking packets can't be sent
> > > from the guest (after migration) even after rebooting the guest.
> > This makes me think that some network offload configuration wasn't
> > properly migrated or applied.
> > What network backend are you using?
>
> Sukrit tried with tap device, I tried with slirp.
> If you can point me to the property that disables all the offloads it
> will really help.
>
> > Do you see any outgoing packets in the sniffer?
>
> I didn't use the sniffer, I checked dmesg in guest, there was a line
> complaining that it can't send packets.

What exactly was the error?

I don't know much about vmxnet3; but if the guest driver is seeing the
problem then I guess that's the best pointer we have.

Dave

> Thanks,
> Marcel
>
> > > Any help or pointer would be greatly appreciated.
> > > Thanks,
> > > Marcel
> > >
> > > [...]
>
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Hi Dmitry,

On 6/25/19 11:39 AM, Dmitry Fleytman wrote:
>> On 25 Jun 2019, at 11:14, Marcel Apfelbaum wrote:
>>
>> Hi Sukrit
>>
>> On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
>>> Hi,
>> [...]
>>> This RFC is meant to request suggestions on the things which are
>>> working and for help on the things which are not.
>> [...]
>>> What is not working:
>> [...]
>>> * It seems that vmxnet3 migration itself is not working properly,
>>>   at least for me. [...]
>>>   Please note that e1000 live migration is working fine in the same
>>>   setup.
>>
>> I tried to git bisect, but I couldn't find a working version of vmxnet3
>> supporting live migration.
>> I tried even a commit from December 2014 and it didn't work.
>>
>> What is strange (to me) is that the networking packets can't be sent
>> from the guest (after migration) even after rebooting the guest.
> This makes me think that some network offload configuration wasn't
> properly migrated or applied.
> What network backend are you using?

Sukrit tried with tap device, I tried with slirp.
If you can point me to the property that disables all the offloads it
will really help.

> Do you see any outgoing packets in the sniffer?

I didn't use the sniffer, I checked dmesg in guest, there was a line
complaining that it can't send packets.

Thanks,
Marcel

>> Any help or pointer would be greatly appreciated.
>> Thanks,
>> Marcel
>>
>> [...]
Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
> On 25 Jun 2019, at 11:14, Marcel Apfelbaum wrote:
>
> Hi Sukrit
>
> On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
>> Hi,
> [...]
>> This RFC is meant to request suggestions on the things which are
>> working and for help on the things which are not.
> [...]
>> What is not working:
> [...]
>> * It seems that vmxnet3 migration itself is not working properly,
>>   at least for me. [...]
>>   Please note that e1000 live migration is working fine in the same
>>   setup.
>
> I tried to git bisect, but I couldn't find a working version of vmxnet3
> supporting live migration.
> I tried even a commit from December 2014 and it didn't work.
>
> What is strange (to me) is that the networking packets can't be sent
> from the guest (after migration) even after rebooting the guest.

This makes me think that some network offload configuration wasn't
properly migrated or applied.
What network backend are you using?
Do you see any outgoing packets in the sniffer?

> Any help or pointer would be greatly appreciated.
> Thanks,
> Marcel
>
> [...]
Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Hi Sukrit,

On 6/21/19 5:45 PM, Sukrit Bhatnagar wrote:
> Hi,
[...]
> This RFC is meant to request suggestions on the things which are
> working and for help on the things which are not.
[...]
> What is not working:
[...]
> * It seems that vmxnet3 migration itself is not working properly,
>   at least for me. [...]
>   Please note that e1000 live migration is working fine in the same
>   setup.

I tried to git bisect, but I couldn't find a working version of vmxnet3
supporting live migration.
I tried even a commit from December 2014 and it didn't work.

What is strange (to me) is that the networking packets can't be sent from
the guest (after migration) even after rebooting the guest.

Any help or pointer would be greatly appreciated.

Thanks,
Marcel

[...]
Re: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Patchew URL: https://patchew.org/QEMU/20190621144541.13770-1-skrtbht...@gmail.com/

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Subject: [Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Message-id: 20190621144541.13770-1-skrtbht...@gmail.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
21c2b30 hw/pvrdma: Add live migration support

=== OUTPUT BEGIN ===
ERROR: do not use C99 // comments
#50: FILE: hw/rdma/vmw/pvrdma_main.c:610:
+// Remap DSR

ERROR: do not use C99 // comments
#60: FILE: hw/rdma/vmw/pvrdma_main.c:620:
+// Remap cmd slot

WARNING: line over 80 characters
#62: FILE: hw/rdma/vmw/pvrdma_main.c:622:
+dev->dsr_info.req = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->cmd_slot_dma,

ERROR: do not use C99 // comments
#70: FILE: hw/rdma/vmw/pvrdma_main.c:630:
+// Remap rsp slot

WARNING: line over 80 characters
#72: FILE: hw/rdma/vmw/pvrdma_main.c:632:
+dev->dsr_info.rsp = rdma_pci_dma_map(pci_dev, dev->dsr_info.dsr->resp_slot_dma,

total: 3 errors, 2 warnings, 77 lines checked

Commit 21c2b3077a6c (hw/pvrdma: Add live migration support) has style
problems, please review. If any of these errors are false positives report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1

The full log is available at
http://patchew.org/logs/20190621144541.13770-1-skrtbht...@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
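For reference, the flagged hunks can be brought in line with QEMU's style by
switching the C99 `//` comments to block comments and wrapping the long
calls at the argument list. A sketch based only on the fragments quoted in
the report above (the third argument to rdma_pci_dma_map() is truncated in
the checkpatch output, so the `sizeof` shown here is an assumption):

```
/* Remap cmd slot */
dev->dsr_info.req = rdma_pci_dma_map(pci_dev,
                                     dev->dsr_info.dsr->cmd_slot_dma,
                                     sizeof(union pvrdma_cmd_req));
```

The same pattern applies to the "Remap DSR" and "Remap rsp slot" hunks.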
[Qemu-devel] [RFC 0/1] Add live migration support to the PVRDMA device
Hi,

I am a GSoC participant, trying to implement live migration for the pvrdma
device with help from my mentors Marcel and Yuval.

My current task is to save and load the various addresses that the device
uses for DMA mapping. We will be adding the device state into live
migration incrementally.

As the first step in the implementation, we are performing migration to the
same host. This saves us from many complexities, such as GID change, at
this stage; we will address migration across hosts at a later point, once
same-host migration works.

Currently, the save and load logic uses SaveVMHandlers, which is the legacy
way, and will be ported to VMStateDescription once the existing issues are
solved.

This RFC is meant to request suggestions on the things which are working
and for help on the things which are not.

What is working:

* pvrdma device is getting initialized in a VM, its GID entry is getting
  added to the host, and rc_pingpong is successful between two such VMs.
  This is when libvirt is used to launch the VMs.

* The dma, cmd_slot_dma and resp_slot_dma addresses are saved at the source
  and loaded properly in the destination upon migration. That is, the
  values loaded at the dest during migration are the same as the ones
  saved.
  `dma` is provided by the guest device when it writes to BAR1, stored in
  dev->dsr_info.dma. A DSR is created on mapping to this address.
  `cmd_slot_dma` and `resp_slot_dma` are the dma addresses of the command
  and response buffers, respectively, which are provided by the guest
  through the DSR.

* The DSR successfully (re)maps to the dma address loaded from migration
  at the dest.

What is not working:

* In the pvrdma_load() logic, the mapping to DSR is successful at dest,
  but the mapping for cmd and resp slots fails. rdma_pci_dma_map()
  eventually calls address_space_map(). Inside the latter, a global
  BounceBuffer bounce is checked to see if it is in use (the atomic_xchg()
  primitive). At the dest, it is in use, so the dma remapping fails there,
  which fails the whole migration process.
  Essentially, I am looking for a way to remap a guest physical address
  after a live migration (to the same host). Any tips on avoiding the
  BounceBuffer will also be great.
  I have also tried unmapping the cmd and resp slots at the source before
  saving the dma addresses in pvrdma_save(), but the mapping fails anyway.

* It seems that vmxnet3 migration itself is not working properly, at least
  for me. The pvrdma device depends on it: vmxnet3 is function 0 and pvrdma
  is function 1. This is happening even for a build of unmodified code from
  the master branch.
  After migration, the network connectivity is lost at the destination.
  Things are fine at the source before migration.
  This is the command I am using at src:

  sudo /home/skrtbhtngr/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
    -enable-kvm \
    -m 2G -smp cpus=2 \
    -hda /home/skrtbhtngr/fedora.img \
    -netdev tap,id=hostnet0 \
    -device vmxnet3,netdev=hostnet0,id=net0,mac=52:54:00:99:ff:bc \
    -monitor telnet:127.0.0.1:,server,nowait \
    -trace events=/home/skrtbhtngr/trace-events \
    -vnc 0.0.0.0:0

  A similar command is used for the dest. Currently, I am trying same-host
  migration for testing purposes, without the pvrdma device.
  Two tap interfaces, for src and dest, were created successfully at the
  host. Kernel logs:
  ...
  br0: port 2(tap0) entered forwarding state
  ...
  br0: port 3(tap1) entered forwarding state

  tcpdump at the dest reports only outgoing ARP packets, which ask for the
  gateway: "ARP, Request who-has _gateway tell guest1".

  Tried using user (slirp) as the network backend, but no luck.
  Also tried git bisect to find the issue using a working commit (given by
  Marcel), but it turns out that it is very old and I faced build errors
  one after another.

  Please note that e1000 live migration is working fine in the same setup.

* Since we are aiming at same-host migration first, I cannot use libvirt,
  as it does not allow this. Currently, I am running the VMs using
  qemu-system commands. But libvirt is needed to add the GID entry of the
  guest device in the host. I am looking for a workaround, if that is
  possible at all. I started a thread a few days ago for the same on
  libvirt-users:
  https://www.redhat.com/archives/libvirt-users/2019-June/msg00011.html

Sukrit Bhatnagar (1):
  hw/pvrdma: Add live migration support

 hw/rdma/vmw/pvrdma_main.c | 56 +++
 1 file changed, 56 insertions(+)

--
2.21.0
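When the SaveVMHandlers logic is eventually ported to VMStateDescription as
planned, the guest-provided DSR address could be described declaratively
along these lines (a sketch only: the field and struct names follow the
discussion above, and the pvrdma_post_load() hook is hypothetical):

```
/* Sketch: migrate the BAR1-provided DSR address declaratively; the
 * remapping of the DSR and of the cmd/resp slots (whose dma addresses
 * can be re-read from the remapped DSR) would happen in post_load. */
static const VMStateDescription vmstate_pvrdma = {
    .name = "pvrdma",
    .version_id = 1,
    .minimum_version_id = 1,
    .post_load = pvrdma_post_load,   /* hypothetical remap hook */
    .fields = (VMStateField[]) {
        VMSTATE_UINT64(dsr_info.dma, PVRDMADev),
        VMSTATE_END_OF_LIST()
    }
};
```

One advantage of this shape over SaveVMHandlers is that versioning and
field ordering are handled by the VMState machinery rather than by hand.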
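The BounceBuffer failure described under "What is not working" can be
illustrated with a toy model (illustrative names only, not QEMU's actual
code): address_space_map() falls back to a single global bounce buffer when
guest memory cannot be mapped directly, and the buffer is claimed with an
atomic exchange, so any second mapping attempted while it is still claimed
returns NULL:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Single global bounce buffer, claimed via atomic exchange -- a toy
 * stand-in for the `BounceBuffer bounce` checked in address_space_map(). */
static atomic_bool bounce_in_use;
static char bounce_buffer[4096];

void *toy_map(void)
{
    /* atomic_exchange returns the previous value: true means the buffer
     * was already claimed, so this mapping must fail. */
    if (atomic_exchange(&bounce_in_use, true)) {
        return NULL;
    }
    return bounce_buffer;
}

void toy_unmap(void)
{
    atomic_store(&bounce_in_use, false);
}
```

This is why mapping the DSR can succeed while the subsequent cmd/resp slot
mappings fail: if the one bounce buffer is still held from an earlier map,
every further fallback mapping returns NULL until the holder unmaps.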