On 06:20 Tue 25 Sep, Hal Rosenstock wrote:
> On Tue, 2007-09-25 at 15:00 +0200, Or Gerlitz wrote:
> > Sean Hefty wrote:
> > >>> node 1 <-> switch A <-> switch B <-> switch C <-> SA
> > >
> > >> The host would only see port up/down events as of changes in the link
> > >> state in the local port or in the port which is connected to it through
> > >> the cable.
> > >
> > > So, if you brought the link down/up between switches A & B, node 1
> > > wouldn't receive any events, but it would be removed from the multicast
> > > group?
> >
> > good catch!
> >
> > Indeed, when the link between switches A and B goes down, per the view
> > point of the SM, the whole sub-fabric across A is lost and hence the
> > node is dropped from all the multicast groups it is joined to.
>
> No, it is not (dropped from all multicast groups it is joined to). It
> may be removed from the multicast forwarding tables if there is no route
> available but it is still a member of the group.

I cannot see it. In the normal flow OpenSM gets a trap when the switch
ports disconnect; this triggers a heavy sweep, and the whole sub-fabric
behind A is dropped right after the discovery phase (including its
multicast group memberships - it is in __osm_drop_mgr_remove_port()).

> >
> > However, from the view point of the node, no port down is experienced.
> >
> > When the A-B link goes up, the SM discovers all nodes across A and
> > probes their ports, though this process a port active event --might-- be
> > generated by the HCA FW, but I am not sure its mandatory.
> >
> > Since the only trigger for ipoib to rejoin to multicast groups is
> > delivery of event by the hw driver, namely one of: port down/up, lid
> > change, sm lid change, client re-register. I think we might have a hole
> > here if none of these events is generated.

OpenSM will request client reregistration for all ports in the A
sub-fabric once it is connected back and discovered again.

Sasha

> It doesn't need to rejoin for this case. See above explanation.
>
> -- Hal
>
> > Please note that through this discovery, at least one mad is sent from
> > the SM to the node. If we enforce the SM to set the re-register bit
> > --each-- time it discovers a node, then the bug is solved.
> >
> > I will test this scheme and let you know what I get (with the voltaire
> > SM and mthca driver).
> >
> > Eitan, Michael - any insight on the matter?
> >
> > Or.
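
To make the rejoin condition discussed above concrete, here is a minimal
sketch in C. It only models the logic described in this thread: ipoib
rejoins when one of the listed events is delivered, and a flap of the A-B
link produces no local port event at node 1. The enum and both function
names are hypothetical and exist only for illustration; they are not the
kernel's or OpenSM's actual identifiers.

```c
#include <stdbool.h>

/* Hypothetical event names mirroring the list in the thread
 * (port down/up, lid change, sm lid change, client re-register).
 * These are NOT the real kernel enum values. */
enum port_event {
    EV_NONE,              /* no event delivered to the node */
    EV_PORT_DOWN,
    EV_PORT_UP,
    EV_LID_CHANGE,
    EV_SM_LID_CHANGE,
    EV_CLIENT_REREGISTER
};

/* True if delivery of this event would make the node rejoin its
 * multicast groups, per the list of triggers given in the thread. */
bool event_triggers_rejoin(enum port_event ev)
{
    switch (ev) {
    case EV_PORT_DOWN:
    case EV_PORT_UP:
    case EV_LID_CHANGE:
    case EV_SM_LID_CHANGE:
    case EV_CLIENT_REREGISTER:
        return true;
    default:
        return false;
    }
}

/* Models node 1 after the A-B link goes down and comes back.
 * Assumption: the node's own link never changes state, so the only
 * event it can see is a client re-register, and only if the SM
 * requests one when it rediscovers the sub-fabric. */
bool rejoins_after_remote_flap(bool sm_requests_reregister)
{
    enum port_event seen =
        sm_requests_reregister ? EV_CLIENT_REREGISTER : EV_NONE;
    return event_triggers_rejoin(seen);
}
```

Under these assumptions, rejoins_after_remote_flap(false) is the "hole"
Or describes, and rejoins_after_remote_flap(true) is the case where the
SM requests reregistration on rediscovery.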
_______________________________________________
general mailing list
[email protected]
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
