David McMillen wrote:


On Thu, Jul 23, 2009 at 8:29 PM, Steve Wise <[email protected]> wrote:

    Can't you just up the value passed into rdma_resolve_addr()?
    Currently this code passes in 2000 (ms).  Did you try changing
    this to, say, 20000?


I didn't try that. Timeouts on rdma_resolve_addr are much rarer than on rdma_resolve_route, so test cases are harder to come by. I did want to offer a solution that seemed to work.

I have not looked at every code path for every possible subsystem that rdma_cm will use. I don't even have a good basis for knowing that any particular timeout value is appropriate. It would be nice if there were some way to get that information for a particular instance of an rdma_cm_id. The same goes for the retry mechanism: is it worthwhile to retry, and how many times is enough? The values in this patch happen to work for the InfiniBand fabrics I use, but my experience is limited.

Are you saying that one rdma_resolve_addr with a 20,000 ms timeout is as good as (or maybe even better than) 10 repeats of failed calls using 2,000 ms timeouts? If that is true, and always will be for any fabric rdma_cm uses, then it seems obvious that we should just change the timeout and not do the retry.

I think so.  But if you test it on your setup, that would be best...

Stevo
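
As a rough way to frame that test, the two strategies discussed above (ten attempts at 2000 ms versus one attempt at 20000 ms) can be modeled without any RDMA hardware. The sketch below is only a toy model, not rdma_cm code: the function names, the NO_REPLY marker, and the idea that each attempt has one fixed reply latency are all assumptions made for illustration. It says nothing about whether the layers underneath rdma_resolve_addr() retransmit on their own.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical marker: the request for this attempt was lost, no reply ever arrives. */
#define NO_REPLY INT_MAX

/* Toy stand-in for one resolve attempt (NOT the rdma_cm API):
 * the attempt succeeds only if its reply arrives within timeout_ms. */
bool attempt_resolves(int reply_ms, int timeout_ms)
{
    return reply_ms <= timeout_ms;
}

/* Strategy A: up to max_attempts attempts, each with a short timeout.
 * reply_ms[i] is the assumed reply latency of attempt i.
 * Returns the total time waited in ms, or -1 if every attempt timed out. */
int retry_short_timeout(const int *reply_ms, int max_attempts, int timeout_ms)
{
    assert(max_attempts > 0);
    int waited = 0;
    for (int i = 0; i < max_attempts; i++) {
        if (attempt_resolves(reply_ms[i], timeout_ms))
            return waited + reply_ms[i];
        waited += timeout_ms;   /* this attempt used its whole timeout */
    }
    return -1;
}

/* Strategy B: a single attempt with one long timeout.
 * Returns the time waited in ms, or -1 on timeout. */
int single_long_timeout(int reply_ms, int timeout_ms)
{
    return attempt_resolves(reply_ms, timeout_ms) ? reply_ms : -1;
}
```

In this model, a reply that consistently takes 5000 ms never lands inside a 2000 ms window, so retrying does not help while the single 20000 ms timeout does. A lost first request is the opposite case, unless the underlying layer retransmits within the long timeout. Which case applies on a given fabric is what a real test would show.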