Li Zhijian <lizhij...@cn.fujitsu.com> wrote:
> destination:
> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev
> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device
> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive
> if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0
> -device
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m
> 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl
> -spice streaming-video=filter,port=5902,disable-ticketing -incoming
> rdma:192.168.22.23:8888
> qemu-system-x86_64: -spice
> streaming-video=filter,port=5902,disable-ticketing: warning: short-form
> boolean option 'disable-ticketing' deprecated
> Please use disable-ticketing=on instead
> QEMU 6.0.50 monitor - type 'help' for more information
> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
> (qemu) dest_init RDMA Device opened: kernel name rxe_eth0 uverbs device name
> uverbs2, infiniband_verbs class device path
> /sys/class/infiniband_verbs/uverbs2, infiniband class device path
> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
> qemu_rdma_block_for_wrid_miss A Wanted wrid CONTROL SEND (2000) but got
> CONTROL RECV (4000)
>
> source:
> ../qemu/build/qemu-system-x86_64 -enable-kvm -netdev
> tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device
> e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive
> if=none,file=./Fedora-rdma-server.qcow2,id=drive-virtio-disk0 -device
> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m
> 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl
> -spice streaming-video=filter,port=5901,disable-ticketing -S
> qemu-system-x86_64: -spice
> streaming-video=filter,port=5901,disable-ticketing: warning: short-form
> boolean option 'disable-ticketing' deprecated
> Please use disable-ticketing=on instead
> QEMU 6.0.50 monitor - type 'help' for more information
> (qemu)
> (qemu) trace-event qemu_rdma_block_for_wrid_miss on
> (qemu) migrate -d rdma:192.168.22.23:8888
> source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device
> name uverbs2, infiniband_verbs class device path
> /sys/class/infiniband_verbs/uverbs2, infiniband class device path
> /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet
> (qemu) qemu_rdma_block_for_wrid_miss A Wanted wrid WRITE RDMA (1) but got
> CONTROL RECV (4000)
>
> NOTE: we use soft RoCE as the RDMA device.
> [root@iaas-rpma images]# rdma link show rxe_eth0/1
> link rxe_eth0/1 state ACTIVE physical_state LINK_UP netdev eth0
>
> This migration cannot complete when an out-of-order (OOO) CQ event occurs.
> The send queue and the receive queue share the same completion queue, and
> qemu_rdma_block_for_wrid() drops the CQEs it is not interested in. But a
> CQE dropped by qemu_rdma_block_for_wrid() may be one it wants later, in
> which case qemu_rdma_block_for_wrid() blocks forever.
>
> OOO cases can occur on both the source side and the destination side.
> Blocking forever happens only when SEND and RECV are out of order; OOO
> between 'WRITE RDMA' and 'RECV' doesn't matter.
>
> Below is the OOO sequence:
>     source                          destination
>     rdma_write_one()                qemu_rdma_registration_handle()
> 1.  S1: post_recv X                 D1: post_recv Y
> 2.  wait for recv CQ event X
> 3.                                  D2: post_send X --------------------+
> 4.                                  wait for send CQ send event X (D2)  |
> 5.  recv CQ event X reaches (D2)                                        |
> 6.  +-S2: post_send Y                                                   |
> 7.  | wait for send CQ event Y                                          |
> 8.  |                               recv CQ event Y (S2) (drop it)      |
> 9.  +-send CQ event Y reaches (S2)                                      |
> 10.                                 send CQ event X reaches (D2) -------+
> 11.                                 wait recv CQ event Y (dropped by (8))
>
> Although a hardware IB device worked fine in a hundred of my runs, the IB
> specification doesn't guarantee the CQ order in such a case.
>
> Here we introduce an independent send completion queue to separate
> ibv_post_send() completions from the original mixed completion queue.
> This lets us poll for the specific CQE we are really interested in.
>
> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quint...@redhat.com>

The change is reasonable from the migration point of view; my RDMA
knowledge is not good enough to judge beyond that.

> @@ -3115,10 +3160,14 @@ static void
> qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc,
>  {
>      QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc);
>      if (io_read) {
> -        aio_set_fd_handler(ctx, rioc->rdmain->comp_channel->fd,
> +        aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd,
> +                           false, io_read, io_write, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd,
>                             false, io_read, io_write, NULL, opaque);
>      } else {
> -        aio_set_fd_handler(ctx, rioc->rdmaout->comp_channel->fd,
> +        aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd,
> +                           false, io_read, io_write, NULL, opaque);
> +        aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd,
>                             false, io_read, io_write, NULL, opaque);
>      }
>  }

Not related to this patch, but this function asks to be split in two: it
is a single if depending on one of the parameters.

> @@ -3332,7 +3381,22 @@ static size_t qemu_rdma_save_page(QEMUFile *f, void
> *opaque,
>       */
>      while (1) {
>          uint64_t wr_id, wr_id_in;
> -        int ret = qemu_rdma_poll(rdma, &wr_id_in, NULL);
> +        int ret = qemu_rdma_poll(rdma, rdma->recv_cq, &wr_id_in, NULL);
> +        if (ret < 0) {
> +            error_report("rdma migration: polling error! %d", ret);

Following up on what Dave said in the previous review: if you touch this
part again, could you also differentiate the recv/send channel here?

> +            goto err;
> +        }
> +
> +        wr_id = wr_id_in & RDMA_WRID_TYPE_MASK;
> +
> +        if (wr_id == RDMA_WRID_NONE) {
> +            break;
> +        }

The code was already that way, but why create a variable instead of just
writing:

    if ((wr_id_in & RDMA_WRID_TYPE_MASK) == RDMA_WRID_NONE) {
        break;
    }

I was just searching to see whether wr_id was used anywhere else.

Later, Juan.