Hello guys, I'm running some experimental tcp btl which implements rdma GET method and advertises it in its flags of the btl API. The btl`s send() method returns rc=1 to select fast path for PML. (this optimization was added in revision 18551 in v1.3)
It seems that in PML/ob1, mca_pml_ob1_send_request_start_rdma() function does not treat right such combination (btl GET + fastpath rc>0) and going into deadlock, i.e. +++ pml_ob1_sendreq.c +670 At this line, sendreq->req_state is 0 +++ pml_ob1_sendreq.c +800 At this line, if btl has GET method and btl`s send() returned fastpath hint - the call to mca_pml_ob1_rndv_completion_request() will decrement sendreq->req_state by one, leaving it to -1. This value of -1 will keep send_request_pml_complete_check() from completing request on PML level. The PML logic (in mca_pml_ob1_send_request_start_rdma) for PUT operation initializes req_state to "2" in pml_ob1_sendreq.c +791, but leaves req_state to 0 for GET operations. Please suggest. Thanks Mike.