Sean Hefty wrote:
node 1 <-> switch A <-> switch B <-> switch C <-> SA

The host would only see port up/down events as a result of changes in the
link state of the local port or of the port which is connected to it through
the cable.

So, if you brought the link down/up between switches A & B, node 1
wouldn't receive any events, but it would be removed from the multicast
group?

good catch!

Indeed, when the link between switches A and B goes down, from the SM's point of view the whole sub-fabric on A's side of the link is lost, and hence the node is dropped from all the multicast groups it has joined.

However, from the node's point of view, no port down event occurs.

When the A-B link comes back up, the SM rediscovers all nodes on A's side and probes their ports. Through this process a port active event --might-- be generated by the HCA FW, but I am not sure it's mandatory.

The only trigger for ipoib to rejoin its multicast groups is delivery of an event by the hw driver, namely one of: port down/up, lid change, sm lid change, client re-register. So I think we might have a hole here if none of these events is generated.
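To make the hole concrete, here is a rough sketch in C of the decision described above. The enum and the function name are made up for illustration (they are not the actual driver symbols); the sketch only assumes that a rejoin happens if and only if one of the listed events is delivered.

```c
#include <stdbool.h>

/*
 * Hypothetical event codes, for illustration only. They mirror the
 * events listed above and are not the real driver definitions.
 */
enum port_event {
    EV_PORT_DOWN,
    EV_PORT_UP,
    EV_LID_CHANGE,
    EV_SM_LID_CHANGE,
    EV_CLIENT_REREGISTER,
    EV_NONE  /* remote link flap: nothing is delivered to the host */
};

/* Returns true if the event would make the host rejoin its multicast groups. */
static bool triggers_mcast_rejoin(enum port_event ev)
{
    switch (ev) {
    case EV_PORT_DOWN:
    case EV_PORT_UP:
    case EV_LID_CHANGE:
    case EV_SM_LID_CHANGE:
    case EV_CLIENT_REREGISTER:
        return true;
    default:
        /* No event, no rejoin: this is the suspected hole. */
        return false;
    }
}
```

With the A-B link flapping and no event generated, the host falls into the default case and never rejoins.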

Please note that during this discovery, at least one MAD is sent from the SM to the node. If we require the SM to set the re-register bit --each-- time it discovers a node, then the bug is solved.
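A toy model of the proposed scheme, again in C and with invented names (nothing here is SM or driver code). It assumes that setting the re-register bit results in a client re-register event on the host, and that the host reacts to that event by rejoining.

```c
#include <stdbool.h>

/* Hypothetical model of one node's membership in a multicast group. */
struct node_state {
    bool sm_has_member;      /* SM's view: node is in the group */
    bool host_thinks_member; /* host's view: node believes it is joined */
};

/* The A-B link goes down: the SM drops the node, the host sees nothing. */
static void sm_lose_node(struct node_state *n)
{
    n->sm_has_member = false;
}

/*
 * The A-B link comes back and the SM rediscovers the node.
 * If the SM sets the re-register bit, the host is assumed to get a
 * client re-register event and to rejoin, restoring the SM's record.
 */
static void sm_rediscover_node(struct node_state *n, bool set_reregister)
{
    if (set_reregister) {
        n->sm_has_member = true;
        n->host_thinks_member = true;
    }
}

/* True when the SM and the host agree on the membership. */
static bool membership_consistent(const struct node_state *n)
{
    return n->sm_has_member == n->host_thinks_member;
}
```

Without the bit, the two views stay out of sync after the link flap; with it, they agree again after rediscovery.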

I will test this scheme and let you know what I get (with the Voltaire SM and the mthca driver).

Eitan, Michael - any insight on the matter?

Or.


_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
