> >   retries = 0, err = 0, index = 16, ctrl_avail = 3, sqe_avail = 1020,
> 
>                                       ^^^^^^^^^^^^^^
> This looks like part of the problem.  There should be 4 control messages
> available by default.  The receiver has sent a control message to the sender,
> but the control message never completed at the receiver.  My guess is that the
> sender never received it either.  It may be the missing buffer update, which
> would be an RDMA write into the sender's 'target_sgl' array.
> 
> Have you ever been able to reproduce this problem without using fork() 
> support?

A simple check to add is whether the rs_post_write() call ever fails, in 
particular in rs_send_credits().  Checking for an error from ibv_get_cq_event() 
in rs_get_cq_event() may also be useful.  In neither case do I expect an error, 
but we could confirm it.
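One way to add those checks without changing control flow is to wrap each call in a macro that logs any nonzero return and passes the value through. This is only a sketch and is not rsocket code. `CHECK_RET`, `log_if_error`, `call_failures` and `fake_post_write` are hypothetical names. `fake_post_write` stands in for rs_post_write(), whose real arguments are internal to rsocket. The sketch assumes the wrapped call returns nonzero on failure. errno is printed as well, but it only means something if the wrapped call actually sets it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counter: number of wrapped calls that returned nonzero.
 * After a reproduction run, 0 confirms the calls never failed. */
int call_failures = 0;

/* Log a failing call with its source location and return ret unchanged,
 * so wrapping an existing call does not alter the caller's logic. */
int log_if_error(int ret, const char *what, const char *file, int line)
{
    if (ret) {
        int saved_errno = errno;   /* keep errno intact for the caller */
        call_failures++;
        fprintf(stderr, "%s:%d: %s returned %d (errno %d: %s)\n",
                file, line, what, ret, saved_errno, strerror(saved_errno));
        errno = saved_errno;
    }
    return ret;
}

/* Hypothetical wrapper macro: evaluates expr once and logs a failure. */
#define CHECK_RET(expr) log_if_error((expr), #expr, __FILE__, __LINE__)

/* Hypothetical stand-in for rs_post_write(); it fails on request so the
 * logging path can be exercised without RDMA hardware. */
int fake_post_write(int simulate_failure)
{
    if (simulate_failure) {
        errno = ENOMEM;
        return -1;
    }
    return 0;
}
```

In rs_send_credits(), the existing call could be wrapped as `ret = CHECK_RET(rs_post_write(...));` with its arguments left unchanged. In rs_get_cq_event(), the same applies to `CHECK_RET(ibv_get_cq_event(channel, &cq, &context))`, since ibv_get_cq_event() returns 0 on success and -1 on error.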

- Sean
