On Thu, 2008-03-20 at 13:54 +0100, Bernd Schubert wrote:
> On Thursday 20 March 2008 13:27:36 Hal Rosenstock wrote:
> > On Thu, 2008-03-20 at 12:30 +0100, Bernd Schubert wrote:
> > > Hello,
> > >
> > > on one of our systems we get a rather huge number of RcvSwRelayErrors.
> > > All I find about RcvSwRelayErrors is
> > >
> > > "This counter can increase due to a valid network event"
> > >
> > > But what might cause?
>
> Ooops. This should have been "But what might cause it?"
>
> >
> > Are you running IB multicast (e.g. IPoIB) ? That's the most common
> > cause.
>
> IPoIB is up, but so far only used initially by lustre for initial lnet o2ib
> setup, but then AFAIK not any more. I think some MPI stacks/applications also
> do their initial connection using IPoIB.
>
> But in general, once these connections are established, IPoIB is not much
> used anymore.

The causes are:
1. DLID mapping
2. VL mapping
3. looping (out port = in port)

Is your subnet unstable in some way ? Are you using QoS ?

-- Hal

> Thanks,
> Bernd
>
> >
> > -- Hal
> >
> > > Thanks in advance for any help,
> > > Bernd
> > >
> > >
> > > [...]
> > > 11: [RcvSwRelayErrors == 189]
> > > 12: [RcvSwRelayErrors == 196]
> > > 16: [RcvSwRelayErrors == 34655]
> > > Errors for 0x000b8cffff002b33 "MT47396 Infiniscale-III Mellanox Technologies ()"
> > > 1: [RcvSwRelayErrors == 190]
> > > 2: [RcvSwRelayErrors == 188]
> > > 3: [RcvSwRelayErrors == 195]
> > > 4: [RcvSwRelayErrors == 207]
> > > 5: [RcvSwRelayErrors == 194]
> > > 6: [RcvSwRelayErrors == 189]
> > > 8: [RcvSwRelayErrors == 198]
> > > 9: [RcvSwRelayErrors == 197]
> > > 10: [RcvSwRelayErrors == 190]
> > > 11: [RcvSwRelayErrors == 198]
> > > 12: [RcvSwRelayErrors == 190]
> > > 16: [RcvSwRelayErrors == 34711]
> > > Errors for 0x000b8cffff002b43 "MT47396 Infiniscale-III Mellanox Technologies ()"
> > > 1: [RcvSwRelayErrors == 196]
> > > 3: [RcvSwRelayErrors == 242]
> > > [...]
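
When narrowing down which of these causes applies, it can help to see which ports stand out from the rest in the counter listing quoted above. The following is only a minimal sketch, assuming the lines keep the `N: [RcvSwRelayErrors == M]` shape shown in that listing. The helper names `parse_counters` and `outliers` and the 10x-median threshold are illustrative assumptions, not part of any IB diagnostic tool.

```python
import re
from statistics import median

# Matches "16: [RcvSwRelayErrors == 34711]", with or without "> " quote prefixes.
PORT_RE = re.compile(r"(\d+):\s*\[RcvSwRelayErrors == (\d+)\]")


def parse_counters(text):
    """Return {port: count} for every RcvSwRelayErrors entry found in text."""
    return {int(port): int(count) for port, count in PORT_RE.findall(text)}


def outliers(counters, factor=10):
    """Return the ports whose count exceeds factor times the median count."""
    if not counters:
        return {}
    mid = median(counters.values())
    return {p: c for p, c in counters.items() if c > factor * mid}


# Counter lines for switch 0x000b8cffff002b33, copied from the listing.
sample = """
1: [RcvSwRelayErrors == 190]
2: [RcvSwRelayErrors == 188]
3: [RcvSwRelayErrors == 195]
4: [RcvSwRelayErrors == 207]
5: [RcvSwRelayErrors == 194]
6: [RcvSwRelayErrors == 189]
8: [RcvSwRelayErrors == 198]
9: [RcvSwRelayErrors == 197]
10: [RcvSwRelayErrors == 190]
11: [RcvSwRelayErrors == 198]
12: [RcvSwRelayErrors == 190]
16: [RcvSwRelayErrors == 34711]
"""

if __name__ == "__main__":
    for port, count in sorted(outliers(parse_counters(sample)).items()):
        print(f"port {port}: {count}")
```

On this sample only port 16 is flagged. The other ports sit close together around 190, so the threshold mainly separates the one port whose count is far higher.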
