RE: [ewg] FW: complete test summary
Tziporet,

They have tested with same-vendor HCAs and the MPI tests are passing.

Rupert

-----Original Message-----
From: Tziporet Koren [mailto:tzipo...@dev.mellanox.co.il]
Sent: Thursday, December 11, 2008 5:22 AM
To: Rupert Dance
Cc: ewg@lists.openfabrics.org; ofa...@postal.iol.unh.edu
Subject: Re: [ewg] FW: complete test summary

Rupert Dance wrote:
> Hello Tziporet,
>
> Here is the final UNH IOL summary report of testing done on RC6.

Many thanks.

> Regarding the mandatory tests, the Link Init failure is a specific
> vendor issue, and the IPoIB failure has been documented in Bug 1287.
>
> The MPI failures in the Beta tests are being researched now. I have
> asked Jeff and Arlin Davis to look into these failures. UNH has noted
> that the failure only occurs when the cluster includes HCAs from
> multiple vendors and the number of processes exceeds 38. Jeff made the
> following comment: "I'm not entirely surprised that OMPI fails when
> used with multiple vendor HCAs; I don't know if anyone has ever tested
> that before...? I would not make it a requirement for passing that
> OMPI has to work in a single MPI job with multiple vendor HCAs; I
> don't know of many (any?) real-world environments that do this."

Can you test with same-vendor HCAs on all nodes and see if this passes?

Thanks

> Rupert
>
> -----Original Message-----
> From: Nickolas Wood [mailto:n...@iol.unh.edu]
> Sent: Wednesday, December 10, 2008 9:41 AM
> To: Rupert Dance
> Cc: ofa...@postal.iol.unh.edu
> Subject: complete test summary
>
> Hi,
> It was my understanding that incremental status reports were
> acceptable regarding the rc6 testing. I have been told that they were
> not; therefore I have combined the previous emails into one for
> easier use.
>
> All the results below were gathered using the complete, multi-vendor
> cluster with OFED 1.4 rc6 and the topology used during the debug
> event. This results in a 62-process MPI cluster.
>
> Mandatory test results:
>    Link Init:       FAIL - link speed issue
>    Fabric init:     pass
>    IPoIB-Datagram:  FAIL - initial packet loss
>    iSER:            NA - no iSER target to test against
>    SRP:             pass
>    SDP:             pass
>
> BETA tests completed:
>    IPoIB-Connected: pass
>    mvapich1: pingping, pingpong tests - pass
>              all tests - FAIL
>    mvapich2: pingping, pingpong tests - pass
>              all tests - FAIL
>    openmpi:  pingping, pingpong tests - pass
>              all tests - FAIL
>    intelmpi: pingping, pingpong tests - pass
>              all tests - FAIL
>    hpmpi:    all tests - FAIL
>    dapltest: pass
>
> -Nick

_______________________________________________
ewg mailing list
ewg@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/ewg
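[Editor's note: the "pingpong" entries in the results above refer to two-process round-trip latency tests, as found in standard MPI benchmark suites. As a rough sketch of what such a test measures — using Python multiprocessing pipes purely for illustration, not MPI over InfiniBand, and not the actual UNH/OFED test code — the pattern is:]

```python
# Illustration of the "pingpong" benchmark pattern: two processes bounce
# a fixed-size message back and forth and time the round trips.
# NOT the real OFED/UNH test suite -- just a sketch of the pattern.
import time
from multiprocessing import Pipe, Process

def pong(conn, iterations):
    # Echo every received message straight back to the sender.
    for _ in range(iterations):
        conn.send(conn.recv())

def ping(iterations=1000, msg=b"x" * 64):
    parent, child = Pipe()
    p = Process(target=pong, args=(child, iterations))
    p.start()
    start = time.perf_counter()
    for _ in range(iterations):
        parent.send(msg)
        assert parent.recv() == msg  # message must come back intact
    elapsed = time.perf_counter() - start
    p.join()
    # Estimate one-way latency as half the mean round-trip time.
    return elapsed / iterations / 2

if __name__ == "__main__":
    latency = ping()
    print(f"approx one-way latency: {latency * 1e6:.1f} us")
```

[In the real tests, the endpoints are MPI ranks on different nodes and the transport is InfiniBand, which is why a pairwise test can pass while larger all-to-all runs fail on a mixed-vendor, >38-process fabric.]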