Re: [OMPI users] RETRY EXCEEDED ERROR status number 12

2009-08-21 Thread Pavel Shamis (Pasha)
You may try the ibdiagnet tool: http://linux.die.net/man/1/ibdiagnet
The tool is part of OFED (http://www.openfabrics.org/).

Pasha.

Prentice Bisbal wrote:
> Several jobs on my cluster just died with the error below. Are there any IB/Open MPI diagnostics I should use to diagnose, should I

[OMPI users] RETRY EXCEEDED ERROR status number 12

2009-08-21 Thread Prentice Bisbal
Several jobs on my cluster just died with the error below. Are there any IB/Open MPI diagnostics I should use to diagnose this, should I just reboot the nodes, or should I have the user who submitted these jobs just increase the retry count/timeout parameters?
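On the last option: assuming the parameters meant here are the openib BTL's `btl_openib_ib_timeout` and `btl_openib_ib_retry_count`, the Open MPI help text for this error gives the per-attempt ACK timeout as 4.096 microseconds * 2^btl_openib_ib_timeout. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration, and the accepted range of 1 to 31 is an assumption based on the field being 5 bits wide, with 0 treated as a special case and excluded.

```python
def ack_timeout_seconds(ib_timeout: int) -> float:
    """Return the per-attempt ACK timeout in seconds for a given exponent.

    Uses the formula 4.096 microseconds * 2**ib_timeout.
    The 1..31 range check is an assumption (5-bit field, 0 excluded).
    """
    if not 1 <= ib_timeout <= 31:
        raise ValueError("ib_timeout must be between 1 and 31")
    return 4.096e-6 * (2 ** ib_timeout)


if __name__ == "__main__":
    # Example exponents only; these are not recommendations.
    for exponent in (10, 14, 20):
        seconds = ack_timeout_seconds(exponent)
        print(f"btl_openib_ib_timeout={exponent}: {seconds:.6f} s per attempt")
```

Each step of +1 in the exponent doubles the wait, so small changes matter. The value can be passed per job, for example `mpirun --mca btl_openib_ib_timeout <n> ...`, but raising it only hides a flaky link or cable, so it is worth running the fabric diagnostics first.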

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Pavel Shamis (Pasha)
Thanks Pasha! ibdiagnet reports the following:

-I---
-I- IPoIB Subnets Check
-I---
-I- Subnet: IPv4 PKey:0x7fff QKey:0x0b1b MTU:2048Byte rate:10Gbps SL:0x00
-W- Port localhost/P1 lid=0x00e2

Re: [OMPI users] RETRY EXCEEDED ERROR

2009-03-05 Thread Jan Lindheim
On Thu, Mar 05, 2009 at 10:27:27AM +0200, Pavel Shamis (Pasha) wrote:
> > Time to dig up diagnostics tools and look at port statistics.
>
> You may use ibdiagnet tool for the network debug -
> http://linux.die.net/man/1/ibdiagnet
> This tool is part of OFED.
>
> Pasha.