On Thursday 20 March 2008 15:29:35 Hal Rosenstock wrote:
> On Thu, 2008-03-20 at 15:27 +0100, Bernd Schubert wrote:
> > On Thursday 20 March 2008 15:12:03 Hal Rosenstock wrote:
> > > On Thu, 2008-03-20 at 13:54 +0100, Bernd Schubert wrote:
> > > > On Thursday 20 March 2008 13:27:36 Hal Rosenstock wrote:
> > > > > On Thu, 2008-03-20 at 12:30 +0100, Bernd Schubert wrote:
> > > > > > Hello,
> > > > > >
> > > > > > on one of our systems we get a rather huge number of
> > > > > > RcvSwRelayErrors. All I find about RcvSwRelayErrors is
> > > > > >
> > > > > > "This counter can increase due to a valid network event"
> > > > > >
> > > > > > But what might cause?
> > > >
> > > > Ooops. This should have been "But what might cause it?"
> > > >
> > > > > Are you running IB multicast (e.g. IPoIB) ? That's the most common
> > > > > cause.
> > > >
> > > > IPoIB is up, but so far only used initially by lustre for initial
> > > > lnet o2ib setup, but then AFAIK not any more. I think some MPI
> > > > stacks/applications also do their initial connection using IPoIB.
> > > >
> > > > But in general, once these connections are established, IPoIB is not
> > > > much used anymore.
> > >
> > > The causes are:
> > > 1. DLID mapping
> > > 2. VL mapping
> > > 3. looping (out port = in port)
> > >
> > > Is your subnet unstable in some way ? Are you using QoS ?
> >
> > We have seen some odd problems with opensm (from ofed-1.2.5) in the past
> > and once only rebooting the switches did help.
>
> You might want to update OpenSM to OFED 1.3 version.

I won't manage to build new Debian packages today, but I will do so over
Easter. I also hope to find the time to clean up the Debian rules a bit, so
that the package can be officially included in Debian. But will a new opensm
help with these errors?

>
> > Yesterday I started monitoring the fabric and even though there's not
> > much traffic, I immediately noticed these errors.
>
> Were the counters cleared before you started looking ?

Yes, sure.

Thanks a lot for your help,
Bernd

-- 
Bernd Schubert
Q-Leap Networks GmbH
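
For reference on the counter question in this thread, here is a minimal sketch of checking whether RcvSwRelayErrors actually grows between two samples, instead of reading a single absolute value. It assumes the counters are available as text in a "Name:....value" layout, which is how perfquery from infiniband-diags typically prints them. The function names parse_counters and counter_delta and the sample values are hypothetical and are not taken from this thread.

```python
import re

# Matches lines such as "RcvSwRelayErrors:................42":
# a counter name, a colon, a run of dots, then a decimal or hex value.
_COUNTER_LINE = re.compile(r"^(\w+):\.*(0x[0-9a-fA-F]+|\d+)\s*$")


def parse_counters(text):
    """Return {counter_name: int} for every 'Name:....value' line in text."""
    counters = {}
    for line in text.splitlines():
        match = _COUNTER_LINE.match(line.strip())
        if not match:
            continue  # skip headers, comments and blank lines
        name, value = match.groups()
        if value.lower().startswith("0x"):
            counters[name] = int(value, 16)
        else:
            counters[name] = int(value)
    return counters


def counter_delta(before, after, name="RcvSwRelayErrors"):
    """Growth of one counter between two text samples (0 if it is absent)."""
    return parse_counters(after).get(name, 0) - parse_counters(before).get(name, 0)


# Illustrative values only, not measurements from the fabric discussed here.
sample_before = "RcvErrors:.......................0\nRcvSwRelayErrors:................10\n"
sample_after = "RcvErrors:.......................0\nRcvSwRelayErrors:................25\n"
print(counter_delta(sample_before, sample_after))  # prints 15
```

Comparing two samples taken a known interval apart gives a rate, which does not depend on whether the counters were cleared beforehand. Note that these counters are narrow and can saturate, so a delta of zero on a counter already at its maximum is not informative.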
