Re: [MTT users] MTT Error on SLES11
Ethan, Jeff,

Sorry for the perl confusion, never mind...

Attached below are the log and ini files, along with how I run MTT.

Thanks,
Rafael

client/mtt -d -v -p -f openmpi.ini 2>&1 |tee test.log

...
*** Run test phase starting
>> Test run [trivial]
Evaluating: trivial
Found a match! trivial [trivial
Evaluating: Simple
>> Running with [openmpi-1.2.8] / [1.2.8] / [openmpi-1.2.8]
Found MPI details: [mpi details: open mpi]
Using [mpi details: open mpi] with [MPI Install: openmpi-1.2.8]
Evaluating: # We can exit if the test passed or was skipped (i.e., there's no need
# to cleanup).
if test "$MTT_TEST_RUN_RESULT" = "passed" -o "$MTT_TEST_RUN_RESULT" = "skipped"; then
    exit 0
fi

if test "$MTT_TEST_HOSTFILE" != ""; then
    args="--hostfile $MTT_TEST_HOSTFILE"
elif test "$MTT_TEST_HOSTLIST" != ""; then
    args="--host $MTT_TEST_HOSTLIST"
fi
orterun $args -np $MTT_TEST_NP --prefix $MTT_TEST_PREFIX mtt_ompi_cleanup.pl
Got final exec: mpirun (_hostfile(), "("--hostfile ", "()")", "(_hostlist(), "("--host ", "()")", "")") -np _np() --mca btl openib,self --debug --prefix _prefix() _executable() _argv()
chdir /tmp/ompi-core-testers/installs/dLS2/tests/trivial/test_get__trivial
Evaluating: require MTT::Test::Specify::Simple
Evaluating: $ret = ::Test::Specify::Simple::Specify(@args)
Evaluating: _executables(".")
Got name: find_executables
Got args: "."
_do: $ret = MTT::Values::Functions::find_executables(".")
_executables got .
_exectuables returning: ./c_ring ./f77_ring ./f90_hello ./cxx_ring ./f77_hello ./cxx_hello ./f90_ring ./c_hello
*** ERROR: Module aborted: MTT::Test::Specify::Simple:Specify: Can't use
string ("8") as an ARRAY ref while "strict refs" in use at
/tmp/ompi-core-testers/lib/MTT/Values.pm line 75.

On Wed, 2009-04-08 at 14:15 -0400, Ethan Mallove wrote:
> On Wed, Apr/08/2009 11:36:05AM, Rafael Folco wrote:
> > Well, I took a look at /tmp/ompi-core-testers/lib/MTT/Values.pm line 75.
> >
> > This piece of code looks wrong to me:
> >
> > if ($#{@$ret} < 0) {
> >
> > $ret references an array
> > @$ret points to the first element of this array
> > $# returns the number of elements
> >
> > So this line is trying to count elements of the first element??! Doesn't
> > make sense. Correct me if I am wrong, what am I missing here?
> >
> > "if ($#{$ret} < 0) {" would be correct, without @.
> >
> > I believe "strict refs" has been forced somewhere on SLES11... I also
> > tried on another distro and it works fine there.
>
> How does perl -V differ between the two distros?
>
> I cannot reproduce the error on a SLES 10 machine.
>
> Could you run MTT with the --debug option and send the output with the
> line 75 perl error? That might help me determine which INI param is
> responsible for the error.
>
> -Ethan
>
> >
> > Thanks,
> >
> > Rafael
> >
> > On Tue, 2009-04-07 at 15:53 -0300, Rafael Folco wrote:
> > > Hi,
> > >
> > > I'm trying to run MTT on SLES11, but I am getting an error message
> > > during the RUN phase and I can't figure out what the problem is.
> > >
> > > *** ERROR: Module aborted: MTT::Test::Specify::Simple:Specify: Can't use
> > > string ("183") as an ARRAY ref while "strict refs" in use at
> > > /tmp/ompi-core-testers/lib/MTT/Values.pm line 75.
> > >
> > > As far as I can see, this error is not specific to any particular
> > > test; it happens at certain points during the RUN phase.
> > > Also, the BUILD phase completed successfully for all tests.
> > >
> > > Has anybody seen this before? Any thoughts?
> > >
> > > Thanks in advance.
> > >
> > > Rafael
> > >
> >
> > --
> > Rafael Folco
> > OpenHPC / Test Lead
> > IBM Linux Technology Center
> > E-Mail: rfo...@linux.vnet.ibm.com
> >
> > ___
> > mtt-users mailing list
> > mtt-us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

--
Rafael Folco
OpenHPC / Test Lead
IBM Linux Technology Center
E-Mail: rfo...@linux.vnet.ibm.com

#==
# Overall configuration
#==

[MTT]
# OMPI Core: if you are not running in a scheduled

Re: [MTT users] MTT Error on SLES11
Well, I took a look at /tmp/ompi-core-testers/lib/MTT/Values.pm line 75.

This piece of code looks wrong to me:

if ($#{@$ret} < 0) {

$ret references an array
@$ret points to the first element of this array
$# returns the number of elements

So this line is trying to count elements of the first element??! Doesn't
make sense. Correct me if I am wrong, what am I missing here?

"if ($#{$ret} < 0) {" would be correct, without @.

I believe "strict refs" has been forced somewhere on SLES11... I also
tried on another distro and it works fine there.

Thanks,

Rafael

On Tue, 2009-04-07 at 15:53 -0300, Rafael Folco wrote:
> Hi,
>
> I'm trying to run MTT on SLES11, but I am getting an error message
> during the RUN phase and I can't figure out what the problem is.
>
> *** ERROR: Module aborted: MTT::Test::Specify::Simple:Specify: Can't use
> string ("183") as an ARRAY ref while "strict refs" in use at
> /tmp/ompi-core-testers/lib/MTT/Values.pm line 75.
>
> As far as I can see, this error is not specific to any particular
> test; it happens at certain points during the RUN phase.
> Also, the BUILD phase completed successfully for all tests.
>
> Has anybody seen this before? Any thoughts?
>
> Thanks in advance.
>
> Rafael
>

--
Rafael Folco
OpenHPC / Test Lead
IBM Linux Technology Center
E-Mail: rfo...@linux.vnet.ibm.com

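A note on the expression discussed in this message, offered as a sketch rather than a confirmed diagnosis of MTT. As far as standard Perl semantics go, @$ret dereferences the whole array (not its first element), and $#{...} yields the index of the last element (not the element count). Inside the braces of $#{@$ret}, @$ret is evaluated in scalar context and so produces the element count. That number is then treated as an array reference, which "strict refs" rejects. This matches the error text: the debug log lists eight executables and the error complains about the string "8". That the construct was tolerated by the older Perl on the other distros is an assumption here; the thread does not confirm it. The helper names below are hypothetical and are not part of MTT.

```perl
use strict;
use warnings;

# The eight executables reported in the debug log of this thread.
my @found = qw(./c_ring ./f77_ring ./f90_hello ./cxx_ring
               ./f77_hello ./cxx_hello ./f90_ring ./c_hello);
my $ret = \@found;    # an array reference, like $ret in the message above

# Hypothetical helpers, named for illustration only.

# Number of elements: an array in scalar context yields its count.
sub element_count {
    my ($aref) = @_;
    return scalar @$aref;
}

# Index of the last element (-1 when the array is empty).
sub last_index {
    my ($aref) = @_;
    return $#{$aref};
}

# Emptiness test that does not rely on $# at all.
sub is_empty {
    my ($aref) = @_;
    return !@$aref;
}

printf "count=%d last_index=%d empty=%s\n",
    element_count($ret), last_index($ret), is_empty($ret) ? "yes" : "no";
printf "count=%d last_index=%d empty=%s\n",
    element_count([]), last_index([]), is_empty([]) ? "yes" : "no";
```

With the list above, the first line reports a count of 8 and a last index of 7; for an empty array the last index is -1, which is the case the "< 0" comparison is meant to catch. Either $#{$ret} < 0 or !@$ret expresses that test without passing a count where a reference is expected.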
[MTT users] MTT Error on SLES11
Hi,

I'm trying to run MTT on SLES11, but I am getting an error message
during the RUN phase and I can't figure out what the problem is.

*** ERROR: Module aborted: MTT::Test::Specify::Simple:Specify: Can't use
string ("183") as an ARRAY ref while "strict refs" in use at
/tmp/ompi-core-testers/lib/MTT/Values.pm line 75.

As far as I can see, this error is not specific to any particular
test; it happens at certain points during the RUN phase.
Also, the BUILD phase completed successfully for all tests.

Has anybody seen this before? Any thoughts?

Thanks in advance.

Rafael

--
Rafael Folco
OpenHPC / Test Lead
IBM Linux Technology Center
E-Mail: rfo...@linux.vnet.ibm.com

Re: [MTT users] RETRY EXCEEDED ERROR
Thanks for the response, Pasha.

Yes, I agree this is some issue with the IB network. I came to the list
looking for previous experience from other users... I wonder why
10.2.1.90 works with all other nodes, 10.2.1.50 works with all other
nodes as well, but they can't work together.

Maybe the OFED lists would be more appropriate for this kind of question.

Regards,
Rafael

On Thu, 2008-07-31 at 18:52 +0300, Pavel Shamis (Pasha) wrote:
> The "RETRY EXCEEDED ERROR" error is related to IB and not MTT.
>
> The error says that IB failed to send an IB packet from
>
> machine 10.2.1.90 to 10.2.1.50
>
> You need to run your IB network monitoring tool and find the issue.
>
> Usually it is some bad cable in the IB fabric that causes such errors.
>
> Regards,
> Pasha
>
>
> Rafael Folco wrote:
> > Hi,
> >
> > I need some help, please.
> >
> > I'm running a set of MTT tests on my cluster and I have issues on a
> > particular node.
> >
> > [0,1,7][btl_openib_component.c:1332:btl_openib_component_progress] from
> > 10.2.1.90 to: 10.2.1.50 error polling HP CQ with status RETRY EXCEEDED
> > ERROR status number 12 for wr_id 268870712 opcode 0
> >
> > I am able to ping from 10.2.1.90 to 10.2.1.50, and they are visible to
> > each other in the network, just like the other nodes. I've already
> > checked the drivers and reinstalled openmpi, but nothing changes...
> >
> > On 10.2.1.90:
> > # ping 10.2.1.50
> > PING 10.2.1.50 (10.2.1.50) 56(84) bytes of data.
> > 64 bytes from 10.2.1.50: icmp_seq=1 ttl=64 time=9.95 ms
> > 64 bytes from 10.2.1.50: icmp_seq=2 ttl=64 time=0.076 ms
> > 64 bytes from 10.2.1.50: icmp_seq=3 ttl=64 time=0.114 ms
> >
> >
> > The cable connections are the same to every node and all tests run fine
> > without 10.2.1.90. On the other hand, when I add 10.2.1.90 to the
> > hostlist, I get many failures.
> >
> > Please, could someone tell me why 10.2.1.90 doesn't like 10.2.1.50? Any
> > clue?
> >
> > I don't see any problems with other combinations of nodes. This is very
> > very weird.
> >
> >
> > MTT Results Summary
> > hostname: p6ihopenhpc1-ib0
> > uname: Linux p6ihopenhpc1-ib0 2.6.16.60-0.21-ppc64 #1 SMP Tue May 6
> > 12:41:02 UTC 2008 ppc64 ppc64 ppc64 GNU/Linux
> > who am i: root pts/3 Jul 31 13:31 (elm3b150:S.0)
> > +-------------+-----------------+------+------+----------+------+
> > | Phase       | Section         | Pass | Fail | Time out | Skip |
> > +-------------+-----------------+------+------+----------+------+
> > | MPI install | openmpi-1.2.5   |    1 |    0 |        0 |    0 |
> > | Test Build  | trivial         |    1 |    0 |        0 |    0 |
> > | Test Build  | ibm             |    1 |    0 |        0 |    0 |
> > | Test Build  | onesided        |    1 |    0 |        0 |    0 |
> > | Test Build  | mpicxx          |    1 |    0 |        0 |    0 |
> > | Test Build  | imb             |    1 |    0 |        0 |    0 |
> > | Test Build  | netpipe         |    1 |    0 |        0 |    0 |
> > | Test Run    | trivial         |    4 |    4 |        0 |    0 |
> > | Test Run    | ibm             |   59 |  120 |        0 |    3 |
> > | Test Run    | onesided        |   95 |   37 |        0 |    0 |
> > | Test Run    | mpicxx          |    0 |    1 |        0 |    0 |
> > | Test Run    | imb correctness |    0 |    1 |        0 |    0 |
> > | Test Run    | imb performance |    0 |   12 |        0 |    0 |
> > | Test Run    | netpipe         |    1 |    0 |        0 |    0 |
> > +-------------+-----------------+------+------+----------+------+
> >
> >
> > I also attached one of the errors here.
> >
> > Thanks in advance,
> >
> > Rafael
> >
> >
> > ___
> > mtt-users mailing list
> > mtt-us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
>
>

--
Rafael Folco
OpenHPC / Brazil Test Lead
IBM Linux Technology Center
E-Mail: rfo...@linux.vnet.ibm.com
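
Since the open question in this thread is which node pairs fail together, one way to get an overview is to tally the source and destination addresses reported in the openib error lines. The sketch below assumes each error appears on a single line in the same format as the one quoted in this thread. The function names are hypothetical, and the sample data is only that one quoted line, not a real log.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one btl_openib error line.
# Returns (source, destination, status text, status number),
# or an empty list if the line does not match the assumed format.
sub parse_openib_error {
    my ($line) = @_;
    if ($line =~ /\bfrom\s+(\S+)\s+to:\s+(\S+)\s+error polling \S+ CQ with status (.+?) status number (\d+)/) {
        return ($1, $2, $3, $4);
    }
    return;
}

# Hypothetical helper: count errors per "source -> destination" pair.
sub tally_pairs {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        my ($src, $dst) = parse_openib_error($line) or next;
        $count{"$src -> $dst"}++;
    }
    return \%count;
}

# Sample input: the single error line quoted in this thread.
my @sample = (
    '[0,1,7][btl_openib_component.c:1332:btl_openib_component_progress] from 10.2.1.90 to: 10.2.1.50 error polling HP CQ with status RETRY EXCEEDED ERROR status number 12 for wr_id 268870712 opcode 0',
);

my $pairs = tally_pairs(@sample);
print "$_: $pairs->{$_}\n" for sort keys %$pairs;
```

To use it on real output, @sample would be replaced with the lines of the captured test output. If only one pair of addresses shows up, that points at the path between those two hosts rather than at either node alone, which would be consistent with Pasha's suggestion to inspect the IB fabric.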