> > retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
> >                                   ^^^^^^^^^^^^^^
> This looks like part of the problem. There should be 4 control messages
> available by default. The receiver has sent a control message to the sender,
> but the control message never completed at the receiver. My guess is that the
> sender never received it either. It may be the missing buffer update, which
> would be an RDMA write into the sender's 'target_sgl' array.
>
> Have you ever been able to reproduce this problem without using fork()
> support?

A simple check to add is whether the rs_post_write() call ever fails, in
rs_send_credits() in particular. Checking for an error from
ibv_get_cq_event() in rs_get_cq_event() may also be useful. I don't expect
an error in either case, but we could confirm it.

- Sean
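
The checks suggested above could be added with something like the following untested sketch. The helper rs_dbg_check(), the macro RS_DBG_CHECK() and the stub fake_post() are hypothetical names made up for illustration; none of them exist in rsocket.c. The stub only stands in for the real call so the fragment compiles on its own.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical debug helper, not part of librdmacm: logs a nonzero
 * return value together with the call site and errno, then passes the
 * return value through unchanged so the caller's logic is unaffected.
 */
static int rs_dbg_check(int ret, const char *what, const char *func, int line)
{
	if (ret)
		fprintf(stderr, "%s:%d: %s failed: ret=%d errno=%d (%s)\n",
			func, line, what, ret, errno, strerror(errno));
	return ret;
}

/* Wrap a call expression; the expression is evaluated exactly once. */
#define RS_DBG_CHECK(call) rs_dbg_check((call), #call, __func__, __LINE__)

/*
 * Stand-in for the real call (assumption: the wrapped call returns 0 on
 * success and nonzero with errno set on failure).
 */
static int fake_post(int fail)
{
	if (fail) {
		errno = ENOMEM;
		return -1;
	}
	return 0;
}
```

In rs_send_credits() the macro would wrap the existing rs_post_write() call, and in rs_get_cq_event() the existing ibv_get_cq_event() call, leaving the arguments as they are. With the stub, RS_DBG_CHECK(fake_post(1)) prints one line to stderr and still returns the stub's result to the caller.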