Or Gerlitz wrote:
Sean Hefty wrote:
node 1 <-> switch A <-> switch B <-> switch C <-> SA
The host would only see port up/down events as a result of changes in the
link state of the local port or of the port connected to it through the
cable.
So, if you brought the link down/up between switches A & B, node 1
wouldn't receive any events, but it would be removed from the multicast
group?
However, from the viewpoint of the node, no port down event is experienced.
I have tested the node <-> switch A <-> switch B scheme (with a Voltaire
SM/SA running here), and the possible problem you pointed out does not
happen:
First, when the A-B link goes down, the node is removed from the
multicast group in the SA database. The node does not experience any
event.
Second, when the link is brought back online, the SM discovers and
configures the port. This causes a bunch (six!) of events to be generated,
IPoIB joins the multicast group (the broadcast group in this case), and we
are done. This join actually goes out to the SA from the multicast core
code, and the node is listed in the SA database for the group.
The node system I was using has: OFED 1.2 / mthca / device 25208 (Arbel
memfull) / firmware 4.8.200
I understand that for this device the events are generated by the
firmware and not by some filtering code that captures MADs passed through
the process_mad() verb (Sean, am I correct?).
I don't know exactly which events were generated by the firmware, since
the IPoIB event handler does not print the event number. However, I am
sure that PORT_ACTIVE and PKEY_CHANGE (for this one there is a different
print, which you can see below) were among them, and there is some chance
that CLIENT_REREGISTER is not one of them.
Note that a port active event by itself is not enough for all this to
work: the multicast code does not flush its entries, so a join that
follows may be answered with cached attributes (as discussed earlier in
this thread) and no query would be sent to the SA.
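To make the concern concrete, here is a minimal toy model of it in Python. It is not the kernel multicast code: the class names, the event strings, and the rule that only a client re-register event flushes the cache are assumptions made for illustration, based on what was said in this thread. The MLID value is the one from the log.

```python
BCAST = "ff12:401b:ffff:0000:0000:0000:ffff:ffff"


class ToySA:
    """Stand-in for the SA's multicast membership database (hypothetical)."""

    def __init__(self):
        self.members = set()
        self.queries = 0

    def join(self, node, mgid):
        # Every call here represents a real query reaching the SA.
        self.queries += 1
        self.members.add((node, mgid))
        return {"mgid": mgid, "mlid": 0xC000}

    def drop(self, node, mgid):
        # Models the SA removing the node when the inter-switch link fails.
        self.members.discard((node, mgid))


class ToyMulticastCache:
    """Hypothetical node-side cache of join replies."""

    def __init__(self, node, sa):
        self.node = node
        self.sa = sa
        self.cache = {}

    def join(self, mgid):
        if mgid in self.cache:
            # Answered from the cache: the SA is never asked.
            return self.cache[mgid]
        attrs = self.sa.join(self.node, mgid)
        self.cache[mgid] = attrs
        return attrs

    def on_event(self, event):
        # Assumption: only a client re-register flushes cached entries.
        if event == "CLIENT_REREGISTER":
            self.cache.clear()


def demo():
    sa = ToySA()
    node = ToyMulticastCache("node1", sa)
    node.join(BCAST)            # initial join reaches the SA
    sa.drop("node1", BCAST)     # A-B link fails, SA forgets the node

    node.on_event("PORT_ACTIVE")
    node.join(BCAST)            # served from the stale cache
    listed_after_port_active = ("node1", BCAST) in sa.members

    node.on_event("CLIENT_REREGISTER")
    node.join(BCAST)            # cache was flushed, so the SA is queried
    listed_after_reregister = ("node1", BCAST) in sa.members
    return listed_after_port_active, listed_after_reregister, sa.queries
```

In this model, demo() reports that the node is still missing from the SA database after the port active event alone, and is listed again only after the re-register event forces a second query.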
The bottom line: with this device/firmware the problem does not happen,
but there is a possible hole here if the IB spec does not require the SM
to set the client re-register bit each time it discovers a node.
Below is the ipoib log from after I reconnected the cable (when I removed
it, no event was generated, and a local read of the port info, e.g.
through ibv_devinfo, reported the port as UP...)
Or.
ib0: Port state change event
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -102)
ib0: Flushing ib0
ib0.f1f1: Not flushing - IPOIB_FLAG_INITIALIZED not set.
ib0: flushing
ib0: downing ib_dev
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: leaving MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: starting multicast thread
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: Port state change event
ib0: Flushing ib0
ib0.f1f1: Not flushing - IPOIB_FLAG_INITIALIZED not set.
ib0: flushing
ib0: downing ib_dev
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: starting multicast thread
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: starting multicast thread
ib0: Port state change event
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: Flushing ib0
ib0.f1f1: Not flushing - IPOIB_FLAG_INITIALIZED not set.
ib0: flushing
ib0: downing ib_dev
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: starting multicast thread
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: Port state change event
ib0: Flushing ib0
ib0.f1f1: Not flushing - IPOIB_FLAG_INITIALIZED not set.
ib0: flushing
ib0: downing ib_dev
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: starting multicast thread
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: starting multicast thread
ib0: Port state change event
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: Flushing ib0
ib0.f1f1: Not flushing - IPOIB_FLAG_INITIALIZED not set.
ib0: flushing
ib0: downing ib_dev
ib0: stopping multicast thread
ib0: flushing multicast list
ib0: deleting multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: starting multicast thread
ib0: restarting multicast task
ib0: stopping multicast thread
ib0: starting multicast thread
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: pkey change event on port:1
ib0: Flushing ib0 and restarting it's QP
ib0.f1f1: Not flushing - IPOIB_FLAG_INITIALIZED not set.
ib0: Not flushing - pkey index not changed.
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -110
ib0: joining MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
ib0: Created ah 000001001f447640
ib0: MGID ff12:401b:ffff:0000:0000:0000:ffff:ffff AV 000001001f447640, LID 0xc000, SL 0
ib0: successfully joined all multicast groups
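
Counting by hand, the log above contains five "Port state change event" lines and one "pkey change event" line, six events in total. A small Python sketch for tallying such a log is given here; it is only a convenience script, the function name and the sample excerpt are illustrative, and the errno names in the table are the Linux values for the two failure codes that appear in the join completions.

```python
import re
from collections import Counter

# Linux errno names for the two failure codes seen in the log.
LINUX_ERRNO = {102: "ENETRESET", 110: "ETIMEDOUT"}

# The two event prints that appear in the ipoib debug output above.
EVENT_PATTERNS = {
    "port_state_change": re.compile(r"^ib\S*: Port state change event$"),
    "pkey_change": re.compile(r"^ib\S*: pkey change event on port:\d+$"),
}
JOIN_DONE = re.compile(r"join completion for (\S+) \(status (-?\d+)\)")


def summarize(log_text):
    """Return (event counts, list of (mgid, status, label)) for a log."""
    events = Counter()
    joins = []
    for line in log_text.splitlines():
        line = line.strip()
        for name, pattern in EVENT_PATTERNS.items():
            if pattern.match(line):
                events[name] += 1
        m = JOIN_DONE.search(line)
        if m:
            status = int(m.group(2))
            label = "ok" if status == 0 else LINUX_ERRNO.get(-status, "unknown")
            joins.append((m.group(1), status, label))
    return events, joins


# Short excerpt of the log above, used only as sample input.
SAMPLE = """\
ib0: Port state change event
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -102)
ib0: pkey change event on port:1
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status -110)
ib0: join completion for ff12:401b:ffff:0000:0000:0000:ffff:ffff (status 0)
"""

if __name__ == "__main__":
    events, joins = summarize(SAMPLE)
    print(dict(events))
    for mgid, status, label in joins:
        print(mgid, status, label)
```

Since the port state print does not include the event number, a tally like this can only confirm how many events arrived, not which ones they were.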