David McMillen wrote:
On Thu, Jul 23, 2009 at 8:29 PM, Steve Wise <[email protected]> wrote:
Can't you just up the value passed into rdma_resolve_addr()?
Currently this code passes in 2000 (ms). Did you try changing
this to say 20000?
I didn't try that. Timeouts on rdma_resolve_addr are much rarer
than on rdma_resolve_route, so test cases are harder to come by. I
did want to offer a solution that seemed to work.
I have not looked at every code path for every possible subsystem that
rdma_cm will use. I don't even have a good reason to know that any
particular timeout value is appropriate. It would be nice if there
was some way to get that information for a particular instance of an
rdma_cm_id. The same goes for the retry mechanism - is it worthwhile
to retry, and how many times is enough? The values in this patch
happen to work for the InfiniBand fabrics I use, but my experience is
limited.
Are you saying that one rdma_resolve_addr with a 20,000 ms timeout is
as good as (or maybe even better than) 10 repeats of failed calls using
2,000 ms timeouts? If that is true, and always will be for any fabric
rdma_cm uses, then it seems obvious that we should just change the
timeout and not do the retry.
I think so. But if you test it on your setup, that would be best...
Stevo
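
For anyone who wants to reason about the two options before testing on
real hardware, here is a minimal sketch of the comparison discussed
above: one call with a 20,000 ms timeout versus 10 calls with 2,000 ms
timeouts. It does not call rdma_cm at all. The names resolve_fn,
fake_fabric, fake_resolve and resolve_with_retries are hypothetical, and
the model assumes every attempt starts from scratch, which may well not
hold on a real fabric (an earlier attempt may leave state behind that
lets a later one finish quickly).

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a resolver such as rdma_resolve_addr():
 * returns 0 on success or -ETIMEDOUT if no answer arrives in time. */
typedef int (*resolve_fn)(void *ctx, int timeout_ms);

/* Hypothetical fake fabric. ASSUMPTION: an answer always needs
 * latency_ms to arrive and each attempt starts from scratch. */
struct fake_fabric {
    int latency_ms;   /* time the fabric needs to answer */
    int waited_ms;    /* total time spent waiting so far */
    int calls;        /* number of resolve attempts made */
};

static int fake_resolve(void *ctx, int timeout_ms)
{
    struct fake_fabric *f = ctx;

    f->calls++;
    if (f->latency_ms <= timeout_ms) {
        f->waited_ms += f->latency_ms;   /* answered in time */
        return 0;
    }
    f->waited_ms += timeout_ms;          /* gave up after the timeout */
    return -ETIMEDOUT;
}

/* Retry strategy: up to max_tries attempts, each with timeout_ms.
 * Worst-case total wait is max_tries * timeout_ms. */
static int resolve_with_retries(resolve_fn fn, void *ctx,
                                int timeout_ms, int max_tries)
{
    int ret = -ETIMEDOUT;

    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        ret = fn(ctx, timeout_ms);
        if (ret != -ETIMEDOUT)
            break;   /* success or a non-timeout error: stop retrying */
    }
    return ret;
}
```

Both strategies have the same worst-case budget of 20,000 ms. Under the
from-scratch assumption, a fabric that needs 5,000 ms to answer fails
all ten 2,000 ms attempts but succeeds with the single 20,000 ms call,
while a fabric that answers in 500 ms succeeds on the first attempt
either way. Whether real fabrics behave like this model is exactly what
a test on actual hardware would have to show.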