On Wed, Jul 22, 2009 at 5:59 PM, Ira Weiny<[email protected]> wrote:
> Check your multicast group membership and forwarding tables on the switches.
>
> We have had similar issues and have found that some nodes fail to join the
> multicast groups for various reasons.
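
A rough sketch of the membership check suggested above, in Python. It
assumes you have already collected two lists by whatever means you prefer
(for instance from an SA query): the port GIDs you expect on the IPoIB
subnet, and the port GIDs that actually joined the multicast group. The
function name and the sample GIDs are hypothetical and only illustrate the
comparison; this is not a tool from OFED.

```python
def missing_members(expected_ports, joined_ports):
    """Return the expected port GIDs that are absent from the group, sorted."""

    def normalize(gid):
        # Textual GIDs may differ in case or carry stray whitespace.
        return gid.strip().lower()

    joined = {normalize(g) for g in joined_ports}
    expected = {normalize(g) for g in expected_ports}
    return sorted(expected - joined)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real fabric.
    expected = ["fe80::2:c903:0:1a01", "fe80::2:c903:0:1a02"]
    joined = ["FE80::2:C903:0:1A01"]
    for gid in missing_members(expected, joined):
        print("not in multicast group:", gid)
```

Any port this reports is a candidate for the "failed to join" case and is
worth looking at more closely.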
Also, such failures should be in the opensm log and at least give a hint of the
issue (e.g. rate, MTU, etc.).

-- Hal

>
> Ira
>
> On Wed, 22 Jul 2009 15:55:42 -0600
> Todd Bowman <[email protected]> wrote:
>
>> I need a little direction to help solve an IPoIB issue.
>> Software: OFED 1.3 and 1.4 stacks, running OpenSM
>>
>>
>> Problem:
>> IPoIB connections fail, meaning a node cannot ping all or some of the other
>> IPoIB nodes. IB itself is still up, we can run IB tests with success. So
>> far the only resolution is to restart the IB stack. Size of the cluster
>> seems to be irrelevant. It has happened on clusters from around 64 to
>> 1000s.
>>
>>
>> My first instinct is that some information has been lost from SM/SA which is
>> needed to create an IPoIB connection, but I'm not for sure what that
>> information is or how to verify that it is gone.
>>
>> Thanks in advance,
>>
>> Todd
>>
>
>
> --
> Ira Weiny
> Math Programmer/Computer Scientist
> Lawrence Livermore National Lab
> [email protected]
> _______________________________________________
> general mailing list
> [email protected]
> http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general
>
> To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
