> so the usage of mad snooping would be for cache invalidations, I wonder
> if registering on GID/MGID IN/OUT traps be sufficient for the same purpose?

That requires registration with the SA.  The intent is to avoid using a 
centralized service when possible.  Otherwise, we end up with all nodes 
registering for a trap, the SA needing to notify all nodes of in/out of 
service, and all nodes updating their caches at the same time.  The CM timeout 
approach avoids this; the only nodes that need to update their caches are ones 
which are actively trying to connect to a specific node.
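
As a rough illustration of the per-node behavior (a toy model, not code from 
the patch; PathCache, resolve, on_cm_timeout and the path strings are made-up 
names, and the SA query is only simulated with a counter):

```python
class PathCache:
    """Toy per-node path cache keyed by destination node id (hypothetical)."""

    def __init__(self):
        self.paths = {}
        self.sa_queries = 0  # counts simulated SA path queries

    def resolve(self, dest):
        # Use the cached path when present; otherwise query the simulated SA.
        if dest not in self.paths:
            self.sa_queries += 1
            self.paths[dest] = f"path-to-{dest}"
        return self.paths[dest]

    def on_cm_timeout(self, dest):
        # A connection attempt to dest timed out: drop only that entry.
        self.paths.pop(dest, None)


cache = PathCache()
cache.resolve("node7")
cache.resolve("node9")
cache.on_cm_timeout("node7")   # only the node7 entry is invalidated
cache.resolve("node7")         # re-queried on the next connect attempt
cache.resolve("node9")         # still served from the cache
print(cache.sa_queries)        # 3
```

Nothing happens on nodes that never try to reach node7, and no trap 
registration is involved.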

Consider a case where one or more nodes are removed from an MPI run.  (Maybe 
the software on the nodes is being updated.)  Relying on traps would require 
SA communication to and from all nodes, even though the nodes won't be used.  
Additionally, once the nodes go back online, their path information may not 
have changed, so none of the work was even needed.

The focus of this patch is on thousands to tens of thousands of nodes.  At that 
scale, the likelihood of a node going up/down at any given point is high.
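
For a sense of the numbers, here is a back-of-the-envelope count under a 
deliberately simple model (my assumptions, not measurements: every node 
registers for the trap, each bounced node produces one OUT and one IN event, 
and the SA forwards each event to every registered node):

```python
def trap_notifications(total_nodes, bounced_nodes):
    # Each bounced node produces one OUT and one IN event,
    # and the SA forwards each event to every registered node.
    return 2 * bounced_nodes * total_nodes


def timeout_requeries(active_connectors, bounced_nodes):
    # Only nodes whose connection attempts time out refresh their cache,
    # at most one re-query per (connector, bounced node) pair.
    return active_connectors * bounced_nodes


print(trap_notifications(10_000, 4))  # 80000
print(timeout_requeries(0, 4))        # 0
```

With 10,000 nodes and 4 nodes taken out of an MPI run and brought back, the 
trap model sends 80,000 notifications, while the timeout model sends nothing 
if no one tries to connect to those 4 nodes in the meantime.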

- Sean
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
