Re: [PATCH v2 1/2] migration/rdma: Fix out of order wrid

2021-06-28 Thread lizhij...@fujitsu.com


On 25/06/2021 00:42, Dr. David Alan Gilbert wrote:
> * Li Zhijian (lizhij...@cn.fujitsu.com) wrote:
>> destination:
>> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
>> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
>> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
>> if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 
>> -device 
>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
>> -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga 
>> qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming 
>> rdma:192.168.22.23:
>> qemu-system-x86_64: -spice 
>> streaming-video=filter,port=5902,disable-ticketing: warning: short-form 
>> boolean option 'disable-ticketing' deprecated
>> Please use disable-ticketing=on instead
>> QEMU 6.0.50 monitor - type 'help' for more information
>> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
>> (qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name 
>> uverbs2, infiniband_verbs class device path 
>> /sys/class/infiniband_verbs/uverbs2, infiniband class device path 
>> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
>> qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got 
>> CONTROL RECV (4000)
>>
>> source:
>> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
>> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
>> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
>> if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device 
>> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 
>> -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga 
>> qxl -spice streaming-video=filter,port=5901,disable-ticketing -S
>> qemu-system-x86_64: -spice 
>> streaming-video=filter,port=5901,disable-ticketing: warning: short-form 
>> boolean option 'disable-ticketing' deprecated
>> Please use disable-ticketing=on instead
>> QEMU 6.0.50 monitor - type 'help' for more information
>> (qemu)
>> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
>> (qemu) migrate -d rdma:192.168.22.23:
>> source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device 
>> name uverbs2, infiniband_verbs class device path 
>> /sys/class/infiniband_verbs/uverbs2, infiniband class device path 
>> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
>> (qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got 
>> CONTROL RECV (4000)
>>
>> NOTE: soft RoCE as the rdma device.
>> [root@iaas-rpma images]# rdma link show rxe_eth0/1
>> link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0
>>
>> This migration cannot complete because out-of-order (OOO) CQ events occur.
>> OOO cases occur on both the source side and the destination side. The hang
>> happens only when SEND and RECV completions are out of order; OOO between
>> 'WRITE RDMA' and 'RECV' doesn't matter.
>>
>> Below is the OOO sequence:
>>        source                      destination
>>   qemu_rdma_write_one()            qemu_rdma_registration_handle()
>> 1.  post_recv X                    post_recv Y
>> 2.                                 post_send X
>> 3.                                 wait X CQ event
>> 4.  X CQ event
>> 5.  post_send Y
>> 6.  wait Y CQ event
>> 7.                                 Y CQ event (dropped)
>> 8.  Y CQ event (send Y done)
>> 9.                                 X CQ event (send X done)
>> 10.                                wait Y CQ event (dropped at (7),
>>                                    blocks forever)
>>
>> In about a hundred runs, this happened only on the soft RoCE RDMA device;
>> a hardware IB device works fine.
>>
>> Here we introduce an independent send completion queue to separate
>> ibv_post_send completions from the original mixed completion queue.
>> It helps us poll the specific CQE we are really interested in.
> Hi Li,
>   OK, it's a while since I've thought this much about completion, but I
> think that's OK, however, what stops the other messages, RDMA_WRITE and
> SEND_CONTROL being out of order?

Once either the source or the destination hits an out-of-order wrid like the
one below, both sides wait for their FDs to become readable, so the migration
has no chance of completing.
qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got CONTROL 
RECV (4000)



>
>   Could this be fixed another way; make block_for_wrid record a flag for
> WRID's it's received, and then check (and clear) that flag right at the
> start?

I intended to do it that way, as in [1], but I think it is too tricky and hard
to understand.

I also have some concerns:
- Should we record an OOO between 'WRITE RDMA' and CONTROL RECV even if it
  doesn't matter in practice?
- How many ooo_wrid entries should we record? I have observed the CQEs of two
  later WRs arriving before the wanted one.
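
For comparison, the flag idea quoted above could look roughly like the sketch below. It is a standalone simulation, not QEMU code: `wrid_tracker`, `sim_cq` and this `block_for_wrid` are hypothetical names, the completion queue is just an array, and a per-kind counter is assumed instead of a single flag so that more than one early completion can be remembered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrid kinds, loosely following the trace names in this thread. */
enum wrid_kind {
    WRID_RDMA_WRITE,
    WRID_SEND_CONTROL,
    WRID_RECV_CONTROL,
    WRID_KIND_MAX
};

/* Early completions that arrived while we were waiting for something else. */
struct wrid_tracker {
    unsigned pending[WRID_KIND_MAX];
};

/* Simulated completion queue: a fixed array drained front to back. */
struct sim_cq {
    const enum wrid_kind *events;
    size_t len;
    size_t next;
};

/*
 * Returns true once the wanted completion has been seen.
 * Returns false when the simulated queue is drained, which stands in
 * for "would block waiting on the completion channel".
 */
static bool block_for_wrid(struct wrid_tracker *t, struct sim_cq *cq,
                           enum wrid_kind wanted)
{
    assert(wanted < WRID_KIND_MAX);

    /* Check (and clear) a remembered early completion right at the start. */
    if (t->pending[wanted] > 0) {
        t->pending[wanted]--;
        return true;
    }

    while (cq->next < cq->len) {
        enum wrid_kind got = cq->events[cq->next++];
        if (got == wanted) {
            return true;
        }
        /* Remember the out-of-order completion instead of dropping it. */
        t->pending[got]++;
    }
    return false;
}
```

With the arrival order CONTROL RECV then CONTROL SEND, waiting for CONTROL SEND first records the early RECV, and the later wait for CONTROL RECV returns immediately from the counter. Whether to count 'WRITE RDMA' completions the same way is exactly the open question in the first bullet.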



[1]: 
https://lore.kernel.org/qemu-devel/162371118578.2358.12447251487494492434@7c66fb7bc3ab/T/#t

Thanks
Li

>
> Dave
>
>> Signed-off-by: Li

Re: [PATCH v2 1/2] migration/rdma: Fix out of order wrid

2021-06-24 Thread Dr. David Alan Gilbert
* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:
> destination:
> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
> if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 
> -device 
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 
> 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl 
> -spice streaming-video=filter,port=5902,disable-ticketing -incoming 
> rdma:192.168.22.23:
> qemu-system-x86_64: -spice 
> streaming-video=filter,port=5902,disable-ticketing: warning: short-form 
> boolean option 'disable-ticketing' deprecated
> Please use disable-ticketing=on instead
> QEMU 6.0.50 monitor - type 'help' for more information
> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
> (qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name 
> uverbs2, infiniband_verbs class device path 
> /sys/class/infiniband_verbs/uverbs2, infiniband class device path 
> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
> qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got 
> CONTROL RECV (4000)
> 
> source:
> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
> if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device 
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 
> 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl 
> -spice streaming-video=filter,port=5901,disable-ticketing -S
> qemu-system-x86_64: -spice 
> streaming-video=filter,port=5901,disable-ticketing: warning: short-form 
> boolean option 'disable-ticketing' deprecated
> Please use disable-ticketing=on instead
> QEMU 6.0.50 monitor - type 'help' for more information
> (qemu)
> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
> (qemu) migrate -d rdma:192.168.22.23:
> source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device 
> name uverbs2, infiniband_verbs class device path 
> /sys/class/infiniband_verbs/uverbs2, infiniband class device path 
> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
> (qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got 
> CONTROL RECV (4000)
> 
> NOTE: soft RoCE as the rdma device.
> [root@iaas-rpma images]# rdma link show rxe_eth0/1
> link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0
> 
> This migration cannot complete because out-of-order (OOO) CQ events occur.
> OOO cases occur on both the source side and the destination side. The hang
> happens only when SEND and RECV completions are out of order; OOO between
> 'WRITE RDMA' and 'RECV' doesn't matter.
> 
> Below is the OOO sequence:
>        source                      destination
>   qemu_rdma_write_one()            qemu_rdma_registration_handle()
> 1.  post_recv X                    post_recv Y
> 2.                                 post_send X
> 3.                                 wait X CQ event
> 4.  X CQ event
> 5.  post_send Y
> 6.  wait Y CQ event
> 7.                                 Y CQ event (dropped)
> 8.  Y CQ event (send Y done)
> 9.                                 X CQ event (send X done)
> 10.                                wait Y CQ event (dropped at (7),
>                                    blocks forever)
> 
> In about a hundred runs, this happened only on the soft RoCE RDMA device;
> a hardware IB device works fine.
> 
> Here we introduce an independent send completion queue to separate
> ibv_post_send completions from the original mixed completion queue.
> It helps us poll the specific CQE we are really interested in.

Hi Li,
  OK, it's a while since I've thought this much about completion, but I
think that's OK, however, what stops the other messages, RDMA_WRITE and
SEND_CONTROL being out of order?

  Could this be fixed another way; make block_for_wrid record a flag for
WRID's it's received, and then check (and clear) that flag right at the
start?

Dave

> Signed-off-by: Li Zhijian 
> ---
> V2 Introduce send completion queue
> ---
>  migration/rdma.c | 94 
>  1 file changed, 79 insertions(+), 15 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index d90b29a4b51..16fe0688858 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -359,8 +359,10 @@ typedef struct RDMAContext {
>  struct rdma_event_channel   *channel;
>  struct ibv_qp *qp;  /* queue pair */
>  struct ibv_comp_channel *comp_channel;  /* completion channel */
> +struct ibv_comp_channel *send_comp_channel;  /* send completion channel 
> */
>  struct ibv_pd *pd;  /* protection domain */
>  struct ibv_cq *cq;  /* completion queue */
> +struct ibv_cq *send_cq;   

[PATCH v2 1/2] migration/rdma: Fix out of order wrid

2021-06-18 Thread Li Zhijian
destination:
../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 
2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl 
-spice streaming-video=filter,port=5902,disable-ticketing -incoming 
rdma:192.168.22.23:
qemu-system-x86_64: -spice streaming-video=filter,port=5902,disable-ticketing: 
warning: short-form boolean option 'disable-ticketing' deprecated
Please use disable-ticketing=on instead
QEMU 6.0.50 monitor - type 'help' for more information
(qemu) trace-event qemu_rdma_block_for_wrid_miss on
(qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name 
uverbs2, infiniband_verbs class device path 
/sys/class/infiniband_verbs/uverbs2, infiniband class device path 
/sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got CONTROL 
RECV (4000)

source:
../qemu/build/qemu-system-x86_64 -enable-kvm -netdev 
tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device 
e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive 
if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device 
virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 
2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl 
-spice streaming-video=filter,port=5901,disable-ticketing -S
qemu-system-x86_64: -spice streaming-video=filter,port=5901,disable-ticketing: 
warning: short-form boolean option 'disable-ticketing' deprecated
Please use disable-ticketing=on instead
QEMU 6.0.50 monitor - type 'help' for more information
(qemu)
(qemu) trace-event qemu_rdma_block_for_wrid_miss on
(qemu) migrate -d rdma:192.168.22.23:
source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name 
uverbs2, infiniband_verbs class device path 
/sys/class/infiniband_verbs/uverbs2, infiniband class device path 
/sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
(qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got 
CONTROL RECV (4000)

NOTE: soft RoCE as the rdma device.
[root@iaas-rpma images]# rdma link show rxe_eth0/1
link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0

This migration cannot complete because out-of-order (OOO) CQ events occur.
OOO cases occur on both the source side and the destination side. The hang
happens only when SEND and RECV completions are out of order; OOO between
'WRITE RDMA' and 'RECV' doesn't matter.

Below is the OOO sequence:
       source                      destination
  qemu_rdma_write_one()            qemu_rdma_registration_handle()
1.  post_recv X                    post_recv Y
2.                                 post_send X
3.                                 wait X CQ event
4.  X CQ event
5.  post_send Y
6.  wait Y CQ event
7.                                 Y CQ event (dropped)
8.  Y CQ event (send Y done)
9.                                 X CQ event (send X done)
10.                                wait Y CQ event (dropped at (7),
                                   blocks forever)
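
The destination's half of this sequence can be reproduced with a small standalone simulation. This is only a sketch under assumptions: `mixed_cq`, `wait_dropping_others` and `destination_completes` are hypothetical names, the queue is a plain array, and "blocks forever" is modelled as running out of events. It is not the code in migration/rdma.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two completions the destination cares about in the sequence above. */
enum cqe { CQE_SEND_X, CQE_RECV_Y };

/* One mixed completion queue carrying both send and recv completions. */
struct mixed_cq {
    const enum cqe *events;
    size_t len;
    size_t next;
};

/*
 * Wait for one specific completion and discard anything else seen on
 * the way. Returns false when no events are left, which stands in for
 * waiting on the completion channel forever.
 */
static bool wait_dropping_others(struct mixed_cq *cq, enum cqe wanted)
{
    assert(cq != NULL);
    while (cq->next < cq->len) {
        if (cq->events[cq->next++] == wanted) {
            return true;
        }
        /* A non-matching completion is dropped here (step 7). */
    }
    return false;
}

/* Destination: wait for "send X done", then for "recv Y". */
static bool destination_completes(const enum cqe *arrival, size_t n)
{
    struct mixed_cq cq = { arrival, n, 0 };
    return wait_dropping_others(&cq, CQE_SEND_X) &&
           wait_dropping_others(&cq, CQE_RECV_Y);
}
```

Calling `destination_completes` with the arrival order {CQE_SEND_X, CQE_RECV_Y} succeeds, while {CQE_RECV_Y, CQE_SEND_X} fails: the recv completion is consumed during the first wait and is no longer there for the second.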

In about a hundred runs, this happened only on the soft RoCE RDMA device;
a hardware IB device works fine.

Here we introduce an independent send completion queue to separate
ibv_post_send completions from the original mixed completion queue.
It helps us poll the specific CQE we are really interested in.
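
To illustrate why a separate send completion queue removes the ordering dependency, here is a minimal standalone sketch. It assumes simulated queues that only count ready completions; `sim_ctx`, `deliver`, `poll_one` and `destination_completes_split` are hypothetical names and do not appear in the patch. With real verbs, the equivalent wiring is to pass different CQs as `send_cq` and `recv_cq` in `struct ibv_qp_init_attr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cqe_kind { CQE_SEND, CQE_RECV };

/* Simulated completion queue: only counts completions that are ready. */
struct sim_cq {
    unsigned ready;
};

/* Two independent queues instead of one mixed queue. */
struct sim_ctx {
    struct sim_cq send_cq;
    struct sim_cq recv_cq;
};

/* Route a completion to the queue that matches its kind. */
static void deliver(struct sim_ctx *ctx, enum cqe_kind kind)
{
    assert(ctx != NULL);
    if (kind == CQE_SEND) {
        ctx->send_cq.ready++;
    } else {
        ctx->recv_cq.ready++;
    }
}

/* Take one completion from a queue; false means "would block". */
static bool poll_one(struct sim_cq *cq)
{
    if (cq->ready == 0) {
        return false;
    }
    cq->ready--;
    return true;
}

/* Destination: wait for a send completion, then for a recv completion. */
static bool destination_completes_split(const enum cqe_kind *arrival, size_t n)
{
    struct sim_ctx ctx = { { 0 }, { 0 } };
    for (size_t i = 0; i < n; i++) {
        deliver(&ctx, arrival[i]);
    }
    return poll_one(&ctx.send_cq) && poll_one(&ctx.recv_cq);
}
```

Because each wait polls only its own queue, both {CQE_SEND, CQE_RECV} and {CQE_RECV, CQE_SEND} complete; nothing has to be dropped or remembered while waiting for the other kind.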

Signed-off-by: Li Zhijian 
---
V2 Introduce send completion queue
---
 migration/rdma.c | 94 
 1 file changed, 79 insertions(+), 15 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index d90b29a4b51..16fe0688858 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -359,8 +359,10 @@ typedef struct RDMAContext {
 struct rdma_event_channel   *channel;
 struct ibv_qp *qp;  /* queue pair */
 struct ibv_comp_channel *comp_channel;  /* completion channel */
+struct ibv_comp_channel *send_comp_channel;  /* send completion channel */
 struct ibv_pd *pd;  /* protection domain */
 struct ibv_cq *cq;  /* completion queue */
+struct ibv_cq *send_cq; /* send completion queue */
 
 /*
  * If a previous write failed (perhaps because of a failed
@@ -1067,8 +1069,7 @@ static int qemu_rdma_alloc_pd_cq(RDMAContext *rdma)
 }
 
 /*
- * Completion queue can be filled by both read and write work requests,
- * so must reflect the sum of both possible queue sizes.
+ * Completion queue can be filled by read work requests.
  */
 rdma->cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
 NULL, rdma->comp_channel, 0);
@@ -1077,6 +1078,20 @@ static int qemu_rdma_al