We ran into this problem when testing rping over Intel/QLogic hardware:

[root@rdmaperf3 ~]# rping -s -a 172.31.2.103 -v
wait for CONNECTED state 10
connect error -1
cma event RDMA_CM_EVENT_REJECTED, error 28
[root@rdmaperf3 ~]#

[root@rdmaperf8 ~]# rping -c -a 172.31.2.103 -v -C 5
cma event RDMA_CM_EVENT_CONNECT_ERROR, error -1
wait for CONNECTED state 4
connect error -1
[root@rdmaperf8 ~]#

Turns out this is because of a couple things:

1) rping, on the client side, clears the conn_params for the newly to be attempted connection, then sets:

    conn_param.responder_resources = 1;
    conn_param.initiator_depth = 1;
    conn_param.retry_count = 10;

On the accept side, rping clears the conn_params and then sets just the responder_resources and initiator_depth, without even checking the incoming requested conn_param values from the incoming cm_id. So, OK, you can get away with that since this is a simple test program, but still not "best programming practices". However, the important part here is the retry_count of 10. That won't work on Intel/QLogic hardware.

2) the qib driver enforces a maximum of 7 for retry_count. I don't see anything in the spec that specifies a maximum for this entry, and in particular I know it doesn't call out for 7 to mean infinite retries like it does for rnr_retry_count.

I don't think the spec really cares how we solve this, and I don't think there is a hard limit of 7 for the retry_count like the qib driver enforces. On the other hand, the spec doesn't call out a limit on the retry_count but I would assume each driver has the option to implement their own "reasonable, implementation defined" limit in a case like this.

So, do we make qib more liberal in its acceptance of retry_count or do we fix rping to use a smaller number? Matters not to me...

-- 
Doug Ledford <dledf...@redhat.com>
GPG KeyID: 0E572FDD
http://people.redhat.com/dledford

Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
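
For illustration only, the "fix rping to use a smaller number" option could look roughly like the sketch below. It is not the actual rping code and not a proposed patch: struct conn_param_sketch is a hypothetical stand-in that mirrors only the three conn_param fields quoted above, and SKETCH_MAX_RETRY_COUNT is an assumed cap taken from the limit qib is described as enforcing, not a value from the spec.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three conn_param fields rping sets.
 * This is NOT the real struct rdma_conn_param from librdmacm. */
struct conn_param_sketch {
    uint8_t responder_resources;
    uint8_t initiator_depth;
    uint8_t retry_count;
};

/* Assumption: 7 is the maximum the qib driver is described as accepting. */
#define SKETCH_MAX_RETRY_COUNT 7

/* Return the smaller of two 8-bit values. */
static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Client side: same values rping sets today, except that the requested
 * retry_count is clamped to the assumed driver limit. */
static void fill_client_param(struct conn_param_sketch *p, uint8_t wanted_retry)
{
    p->responder_resources = 1;
    p->initiator_depth = 1;
    p->retry_count = min_u8(wanted_retry, SKETCH_MAX_RETRY_COUNT);
}
```

With this sketch, asking for 10 retries would leave retry_count at 7, and any request of 7 or less would pass through unchanged. The other option, relaxing the check in qib, would not need any change on the rping side.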