>So, since you flush the cache for your MPI jobs the gain you see is
>basically by re-using the data collected by ipoib?

I need to correct what I said before about our MPI jobs. On our
production clusters, we're using the local SA in OFED 1.2, which updates
automatically on a timer. This patch removes the timer updates and
instead gives control of the update policy to a user space app. The
local SA sits beneath the existing ib_sa interface, and would have the
PR data available when ipoib requests it.

>If this is the case, do you get the same first-order benefit by
>essentially using the ipoib cache for all PR queries?

There are a couple of benefits. The number of PR queries is reduced
from O(n^2) to O(n). The queries can also be done once up front, even
started at different times if needed, rather than all at once at job
startup. The jobs are also able to make progress even if the SA dies or
is unreachable.

>I'm trying to say, I think a simple kernel cache itself is fine, but
>there should be only 1 cache (get rid of ipoib) and it should have a
>really good interface to userspace so that the really hard problems
>can be solved through user space code.

I don't disagree, but (for now anyway) I believe that the natural
interface for communicating with an SA-related agent is a MAD interface
based on the SA management class, for the reasons I mentioned earlier.
But this is really talking about extensions to the local SA patch,
rather than addressing anything fundamentally wrong with the current
patch set.

- Sean
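
To put rough numbers on the O(n^2) versus O(n) point above, here is a small counting sketch. It is only an illustration, not code from the patch: the function names are made up, and it assumes that without a cache every node queries the SA separately for a path record to every other node, while with the local SA each node issues a single bulk query that covers all of its paths.

```python
def queries_without_cache(n: int) -> int:
    """Assumed model: each of n nodes sends the SA one PR query per peer."""
    return n * (n - 1)


def queries_with_local_sa(n: int) -> int:
    """Assumed model: each node's local SA fetches all of its paths in one bulk query."""
    return n


if __name__ == "__main__":
    # Compare the query load seen by the SA for a few cluster sizes.
    for n in (16, 256, 4096):
        print(f"nodes={n:5d}  uncached={queries_without_cache(n):9d}  "
              f"local_sa={queries_with_local_sa(n):5d}")
```

Under those assumptions the load on the SA grows linearly with cluster size rather than quadratically, and since each node's bulk query is independent of job startup, the queries can be spread out in time.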
