Re: [PATCH v3] migration/rdma: Fix out of order wrid
On 28/10/2021 23:17, Dr. David Alan Gilbert wrote:
> * Li Zhijian (lizhij...@cn.fujitsu.com) wrote:
>
> Apologies for taking so long.

It's okay :), thanks for your review.

>>      /*
>> -     * Completion queue can be filled by both read and write work requests,
>> -     * so must reflect the sum of both possible queue sizes.
>> +     * Completion queue can be filled by read work requests.
>>       */
>> -    rdma->cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
>> -            NULL, rdma->comp_channel, 0);
>> -    if (!rdma->cq) {
>> +    rdma->recv_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
>> +                                  NULL, rdma->recv_comp_channel, 0);
>> +    if (!rdma->recv_cq) {
>> +        error_report("failed to allocate completion queue");
> Minor: It would be good to make this different from the error below;
> e.g. 'failed to allocate receive completion queue'

Good catch, I will amend them soon.

>> +        goto err_alloc_pd_cq;
>> +    }
>> +
>> +    /* create send completion channel */
>> +    rdma->send_comp_channel = ibv_create_comp_channel(rdma->verbs);
>> +    if (!rdma->send_comp_channel) {
>> +        error_report("failed to allocate completion channel");
>> +        goto err_alloc_pd_cq;
>> +    }
>> +
>> +    rdma->send_cq = ibv_create_cq(rdma->verbs, (RDMA_SIGNALED_SEND_MAX * 3),
>> +                                  NULL, rdma->send_comp_channel, 0);
>> +    if (!rdma->send_cq) {
>>          error_report("failed to allocate completion queue");
>>          goto err_alloc_pd_cq;
>>      }
>> @@ -1083,11 +1098,19 @@ err_alloc_pd_cq:
>>      if (rdma->pd) {
>>          ibv_dealloc_pd(rdma->pd);
>>      }
>> -    if (rdma->comp_channel) {
>> -        ibv_destroy_comp_channel(rdma->comp_channel);
>> +    if (rdma->recv_comp_channel) {
>> +        ibv_destroy_comp_channel(rdma->recv_comp_channel);
>> +    }
>> +    if (rdma->send_comp_channel) {
>> +        ibv_destroy_comp_channel(rdma->send_comp_channel);
>> +    }
>> +    if (rdma->recv_cq) {
>> +        ibv_destroy_cq(rdma->recv_cq);
>> +        rdma->recv_cq = NULL;
>>      }
> Don't you need to destroy the send_cq as well?

We don't need to do that: send_cq is the last element we allocate, so it
will always be NULL whenever the code reaches this point.

Thanks
Zhijian

>
> (Other than that I think it's fine)
>
> Dave
>
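The argument about send_cq can be illustrated with a small stand-alone sketch. It does not use libibverbs: DemoCtx, demo_alloc(), demo_alloc_all() and demo_free_all() are hypothetical stand-ins invented for this illustration, and plain malloc()/free() replace the ibv_create_*/ibv_destroy_* calls. The only thing it tries to show is that when every failure jumps to one shared error label, the resource allocated last is necessarily NULL on that path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for RDMAContext; the real struct holds ibv_* handles. */
typedef struct {
    void *recv_comp_channel;
    void *recv_cq;
    void *send_comp_channel;
    void *send_cq;
} DemoCtx;

/* Mock allocator: returns NULL for the step selected by fail_at (1-based). */
static void *demo_alloc(int step, int fail_at)
{
    return step == fail_at ? NULL : malloc(1);
}

/* Release everything that is set; free(NULL) is a no-op. */
static void demo_free_all(DemoCtx *c)
{
    free(c->send_cq);
    free(c->send_comp_channel);
    free(c->recv_cq);
    free(c->recv_comp_channel);
    *c = (DemoCtx){0};
}

/*
 * Allocate in the same order as the quoted hunk. Returns 0 on success,
 * or -1 after running the shared error path.
 */
static int demo_alloc_all(DemoCtx *c, int fail_at)
{
    *c = (DemoCtx){0};

    c->recv_comp_channel = demo_alloc(1, fail_at);
    if (!c->recv_comp_channel) {
        goto err;
    }
    c->recv_cq = demo_alloc(2, fail_at);
    if (!c->recv_cq) {
        goto err;
    }
    c->send_comp_channel = demo_alloc(3, fail_at);
    if (!c->send_comp_channel) {
        goto err;
    }
    c->send_cq = demo_alloc(4, fail_at);
    if (!c->send_cq) {
        goto err;
    }
    return 0;

err:
    /* send_cq is allocated last, so every jump to err leaves it NULL. */
    assert(c->send_cq == NULL);
    demo_free_all(c);
    return -1;
}
```

Calling demo_alloc_all(&ctx, n) with n from 1 to 4 exercises each failure point, and the assert in the error path holds in all of them. The property only lasts as long as send_cq stays the final allocation before the label; an allocation added after it would need its own cleanup.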
Re: [PATCH v3] migration/rdma: Fix out of order wrid
* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:

Apologies for taking so long.

> [...]
Re: [PATCH v3] migration/rdma: Fix out of order wrid
ping again

On 18/10/2021 18:18, Li, Zhijian wrote:
> ping
>
> On 27/09/2021 15:07, Li Zhijian wrote:
>> [...]
Re: [PATCH v3] migration/rdma: Fix out of order wrid
ping

On 27/09/2021 15:07, Li Zhijian wrote:
> [...]
[PATCH v3] migration/rdma: Fix out of order wrid
destination:
../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.22.23:
qemu-system-x86_64: -spice streaming-video=filter,port=5902,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated
Please use disable-ticketing=on instead
QEMU 6.0.50 monitor - type 'help' for more information
(qemu) trace-event qemu_rdma_block_for_wrid_miss on
(qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got CONTROL RECV (4000)

source:
../qemu/build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5901,disable-ticketing -S
qemu-system-x86_64: -spice streaming-video=filter,port=5901,disable-ticketing: warning: short-form boolean option 'disable-ticketing' deprecated
Please use disable-ticketing=on instead
QEMU 6.0.50 monitor - type 'help' for more information
(qemu)
(qemu) trace-event qemu_rdma_block_for_wrid_miss on
(qemu) migrate -d rdma:192.168.22.23:
source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
(qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got CONTROL RECV (4000)

NOTE: we use soft RoCE as the rdma device.
[root@iaas-rpma images]# rdma link show rxe_eth0/1
link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0

This migration cannot complete when an out-of-order (OOO) CQ event occurs.
The send queue and receive queue share the same completion queue, and
qemu_rdma_block_for_wrid() drops the CQ events it is not interested in. But
a CQ event dropped by qemu_rdma_block_for_wrid() may be one it wants later.
In that case, qemu_rdma_block_for_wrid() blocks forever.

OOO cases can occur on both the source side and the destination side.
Blocking forever happens only when SEND and RECV are out of order. OOO between
'WRITE RDMA' and 'RECV' doesn't matter.

Below is the OOO sequence:
       source                             destination
      rdma_write_one()                   qemu_rdma_registration_handle()
1.    S1: post_recv X                    D1: post_recv Y
2.    wait for recv CQ event X
3.                                       D2: post_send X     ---------------+
4.                                       wait for send CQ send event X (D2) |
5.    recv CQ event X reaches (D2)                                          |
6.  +-S2: post_send Y                                                       |
7.  | wait for send CQ event Y                                              |
8.  |                                    recv CQ event Y (S2) (drop it)     |
9.  +-send CQ event Y reaches (S2)                                          |
10.                                      send CQ event X reaches (D2)  -----+
11.                                      wait recv CQ event Y (dropped by (8))

Although hardware IB worked fine in about a hundred of my runs, the IB
specification doesn't guarantee the CQ order in such a case.

Here we introduce an independent send completion queue to separate
ibv_post_send completions from the original mixed completion queue.
It helps us poll the specific CQE we are really interested in.

Signed-off-by: Li Zhijian
---
V3: rebase code, and combine 2/2 to 1/2
V2: Introduce send completion queue
---
 migration/rdma.c | 132 +++
 1 file changed, 98 insertions(+), 34 deletions(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index 5c2d113aa94..bb19a5afe73 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -358,9 +358,11 @@ typedef struct RDMAContext {
     struct ibv_context *verbs;
     struct rdma_event_channel *channel;
     struct ibv_qp *qp;                      /* queue pair */
-    struct ibv_comp_channel *comp_channel;  /* completion channel */
+    struct ibv_comp_channel *recv_comp_channel; /* recv com
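
The hang described in the patch message can be reproduced in miniature without any RDMA hardware. The following is only a toy model under stated assumptions: DemoCQ, demo_cq_push(), demo_block_for_wrid() and the two scenario functions are hypothetical names made up for this sketch, the queue is a plain array rather than an ibv_cq, and "blocking forever" is modelled as returning false once the queue is drained. The wrid numbers are the ones printed in the trace output (1, 2000, 4000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* wrid values as printed by the qemu_rdma_block_for_wrid_miss trace. */
enum {
    WRID_RDMA_WRITE   = 1,
    WRID_SEND_CONTROL = 2000,
    WRID_RECV_CONTROL = 4000,
};

#define DEMO_CQ_DEPTH 8

/* Toy FIFO standing in for a completion queue (hypothetical, not ibv_cq). */
typedef struct {
    int wrid[DEMO_CQ_DEPTH];
    size_t head;
    size_t tail;
} DemoCQ;

static void demo_cq_push(DemoCQ *cq, int wrid)
{
    assert(cq->tail < DEMO_CQ_DEPTH);
    cq->wrid[cq->tail++] = wrid;
}

/*
 * Models the behaviour the patch message describes: consume completions
 * until the wanted wrid appears, discarding everything else. Returns false
 * where the real code would keep waiting (nothing left in the queue).
 */
static bool demo_block_for_wrid(DemoCQ *cq, int wanted)
{
    while (cq->head < cq->tail) {
        if (cq->wrid[cq->head++] == wanted) {
            return true;
        }
    }
    return false;
}

/* One shared CQ: the RECV completion arrives before the SEND completion. */
static bool demo_shared_cq_completes(void)
{
    DemoCQ cq = {0};

    demo_cq_push(&cq, WRID_RECV_CONTROL);   /* arrives early (step 8) */
    demo_cq_push(&cq, WRID_SEND_CONTROL);   /* arrives late (step 10) */

    /* Waiting for SEND discards the RECV completion on the way. */
    bool got_send = demo_block_for_wrid(&cq, WRID_SEND_CONTROL);
    /* The later wait for RECV finds nothing (step 11). */
    bool got_recv = demo_block_for_wrid(&cq, WRID_RECV_CONTROL);

    return got_send && got_recv;
}

/* Separate send and recv CQs: arrival order between them is irrelevant. */
static bool demo_split_cq_completes(void)
{
    DemoCQ send_cq = {0};
    DemoCQ recv_cq = {0};

    demo_cq_push(&recv_cq, WRID_RECV_CONTROL);
    demo_cq_push(&send_cq, WRID_SEND_CONTROL);

    bool got_send = demo_block_for_wrid(&send_cq, WRID_SEND_CONTROL);
    bool got_recv = demo_block_for_wrid(&recv_cq, WRID_RECV_CONTROL);

    return got_send && got_recv;
}
```

In this model demo_shared_cq_completes() returns false, because the RECV completion is thrown away while waiting for SEND, and demo_split_cq_completes() returns true. The sketch says nothing about how the real patch chooses which CQ to poll for a given wrid; it only shows why separating the queues removes the ordering dependency.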