Hi Simon,

The t_pflush1 test is the first half of a two-part test that makes sure a
file can be read after a parallel application crashes, as long as the file
was flushed first. The crash is simulated by calling exit(0), which is what
triggers the Open MPI error messages.

So yes, the test passes, but calling exit(0) does not play nicely with
Open MPI. You can safely ignore the error from this test for now.
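
For reference, here is a rough sketch of the pattern the test exercises (this
is not the actual t_pflush1 source, and the file/dataset names are just
illustrative): write a dataset through the MPI-IO driver, flush it, then exit
without closing the file or calling MPI_Finalize.

#include <stdlib.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Create the file collectively through the MPI-IO file driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("flush_test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Write a small dataset. */
    hsize_t dims[1] = {16};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "data", H5T_NATIVE_INT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    int buf[16] = {0};
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    /* Flush so the second part of the test (t_pflush2) can read the file
     * back even though it is never closed. */
    H5Fflush(file, H5F_SCOPE_GLOBAL);

    /* Simulate an application crash: exit without closing anything and
     * without calling MPI_Finalize. Open MPI reports this as abnormal
     * termination, which is the message you see during make check. */
    exit(0);
}

The real test is more involved (it writes several files and uses collective
dataset writes), but the key point is the flush followed by the abrupt exit.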

I am certain this is unrelated to the I/O problems you are seeing in your
application. If you would like to investigate those further, please send a
message here or to the helpdesk with a way to replicate them so we can take
a closer look.

Thanks,
Mohamad


-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Hammond, Simon David (-EXP)
Sent: Monday, January 11, 2016 11:05 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] [EXTERNAL] Re: HDF5 1.8.[14, 15, 16] with OpenMPI 
1.10.1 and Intel 16.1

Hi Matt,

I did go ahead and install the libraries, but I’m getting some I/O problems
with my application, so I’m not sure we can trust the PASSED statement.
This isn’t my area of expertise, so I’m not sure what else I should try, but
it’s good that someone else can replicate the behavior I’m seeing locally.

Thanks for your help,

S.

--
Si Hammond

Scalable Computer Architectures
Sandia National Laboratories, NM, USA





On 1/11/16, 9:47 AM, "Hdf-forum on behalf of Thompson, Matt 
(GSFC-610.1)[SCIENCE SYSTEMS AND APPLICATIONS INC]"
<[email protected] on behalf of [email protected]> 
wrote:

>Simon, et al,
>
>I just built HDF5 1.8.15 using Intel 16.0.0.109 and Open MPI 1.10.0
>(the only combination available to me at present that is 16/1.10) and I can
>see the same behavior. I also tried setting RUNPARALLEL='mpirun -np 6'
>in case it was an mpiexec vs. mpirun issue.
>
>Indeed, if you run the executable yourself
>
>> (1094) $ mpirun -q -np 6 ./t_pflush1
>> Testing H5Fflush (part1)
>>*** Hint ***
>> You can use environment variable HDF5_PARAPREFIX to run parallel test 
>>files in a  different directory or to add file type prefix. E.g.,
>>    HDF5_PARAPREFIX=pfs:/PFS/user/me
>>    export HDF5_PARAPREFIX
>> *** End of Hint ***
>>  PASSED
>> (1095) $ echo $?
>> 1
>> (1096) $
>
>I would not expect a return code of 1 to accompany a PASSED result. If
>one goes past that, everything else seems to "work", but can we trust it?
>
>> (1100) $ mpirun -q -np 6 ./t_pflush1
>> Testing H5Fflush (part1)
>>*** Hint ***
>> You can use environment variable HDF5_PARAPREFIX to run parallel test 
>>files in a  different directory or to add file type prefix. E.g.,
>>    HDF5_PARAPREFIX=pfs:/PFS/user/me
>>    export HDF5_PARAPREFIX
>> *** End of Hint ***
>>  PASSED
>> (1101) $ echo $?
>> 1
>> (1102) $ mpirun -q -np 6 ./t_pflush2
>> Testing H5Fflush (part2 with flush)
>>*** Hint ***
>> You can use environment variable HDF5_PARAPREFIX to run parallel test 
>>files in a  different directory or to add file type prefix. E.g.,
>>    HDF5_PARAPREFIX=pfs:/PFS/user/me
>>    export HDF5_PARAPREFIX
>> *** End of Hint ***
>>  PASSED
>> Testing H5Fflush (part2 without flush) PASSED
>> (1103) $ echo $?
>> 0
>> (1104) $ mpirun -q -np 6 ./t_pshutdown
>> Testing proper shutdown of HDF5 library
>>*** Hint ***
>> You can use environment variable HDF5_PARAPREFIX to run parallel test 
>>files in a  different directory or to add file type prefix. E.g.,
>>    HDF5_PARAPREFIX=pfs:/PFS/user/me
>>    export HDF5_PARAPREFIX
>> *** End of Hint ***
>>  PASSED
>> (1105) $ echo $?
>> 0
>> (1106) $ mpirun -q -np 6 ./t_prestart
>> Testing proper shutdown of HDF5 library
>>*** Hint ***
>> You can use environment variable HDF5_PARAPREFIX to run parallel test 
>>files in a  different directory or to add file type prefix. E.g.,
>>    HDF5_PARAPREFIX=pfs:/PFS/user/me
>>    export HDF5_PARAPREFIX
>> *** End of Hint ***
>>  PASSED
>> (1107) $ echo $?
>> 0
>> (1108) $ mpirun -q -np 6 ./t_shapesame  
>>===================================
>> Shape Same Tests Start
>>      express_test = 1.
>> ===================================
>> MPI-process 3. hostname=borgo011
>>  ...SKIP!...
>>      980 of 1224 subtests skipped to expedite testing.
>>
>>
>>
>>
>> All tests were successful.
>>
>> All tests were successful.
>>
>> ===================================
>> Shape Same tests finished with no errors
>> ===================================
>> (1109) $ echo $?
>> 0
>> (1110) $
>>
>
>
>Matt
>
>On 01/10/2016 09:46 PM, Hammond, Simon David (-EXP) wrote:
>> Hi HDF5 users/developers,
>>
>> I am trying to bring up HDF5 1.8.X (X = 14, 15, 16) on a new test bed
>>system at Sandia using OpenMPI 1.10.1 and Intel 16.1.150 compilers.
>>
>> During the "make test" I get the following output for these libraries.
>>Rather oddly, the tests are reported as a failure but the test being run
>>is reported PASSED (see the output below).
>>
>> If I go ahead and install the libraries, I do receive I/O errors
>>in my application.
>>
>> Has anyone tried this combination of HDF5, MPI and Intel compilers and
>>had success or are there some pointers to what may need to be checked?
>>
>> Thanks for your help,
>>
>> Configure Line:
>>
>> ./configure 
>>--prefix=/home/projects/x86-64/hdf5/1.8.15/openmpi/1.10.1/intel/16.1.150
>>--enable-static --disable-shared --enable-parallel --enable-hl
>>CC=/home/projects/x86-64/openmpi/1.10.1/intel/16.1.150/bin/mpicc
>>CXX=/home/projects/x86-64/openmpi/1.10.1/intel/16.1.150/bin/mpicxx
>>FC=/home/projects/x86-64/openmpi/1.10.1/intel/16.1.150/bin/mpif90
>>CFLAGS="-O3 -g -fPIC" CXXFLAGS="-O3 -g -fPIC" FCFLAGS="-O3 -g -fPIC"
>>--enable-fortran --with-zlib=/home/projects/x86-64/zlib/1.2.8
>>
>> Test Output:
>>
>> <.. Lots of Output ..>
>> make[4]: Entering directory `/home/sdhammo/hdf5/hdf5-1.8.15/testpar'
>> ============================
>> Testing  t_pflush1
>> ============================
>>   t_pflush1  Test Log
>> ============================
>> Testing H5Fflush (part1)
>>*** Hint ***
>> You can use environment variable HDF5_PARAPREFIX to run parallel test
>>files in a
>> different directory or to add file type prefix. E.g.,
>>     HDF5_PARAPREFIX=pfs:/PFS/user/me
>>     export HDF5_PARAPREFIX
>> *** End of Hint ***
>>   PASSED
>> 
>>-------------------------------------------------------------------------
>>-
>> mpiexec has exited due to process rank 0 with PID 91152 on
>> node node17 exiting improperly. There are three reasons this could
>>occur:
>>
>> 1. this process did not call "init" before exiting, but others in
>> the job did. This can cause a job to hang indefinitely while it waits
>> for all processes to call "init". By rule, if one process calls "init",
>> then ALL processes must call "init" prior to termination.
>>
>> 2. this process called "init", but exited without calling "finalize".
>> By rule, all processes that call "init" MUST call "finalize" prior to
>> exiting or it will be considered an "abnormal termination"
>>
>> 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
>> orte_create_session_dirs is set to false. In this case, the run-time
>>cannot
>> detect that the abort call was an abnormal termination. Hence, the only
>> error message you will receive is this one.
>>
>> This may have caused other processes in the application to be
>> terminated by signals sent by mpiexec (as reported here).
>>
>> You can avoid this message by specifying -quiet on the mpiexec command
>>line.
>>
>> 
>>-------------------------------------------------------------------------
>>-
>> 21.72user 2.13system 0:08.38elapsed 284%CPU (0avgtext+0avgdata
>>12960maxresident)k
>> 9568inputs+3896outputs (1major+30976minor)pagefaults 0swaps
>> make[4]: *** [t_pflush1.chkexe_] Error 1
>> make[4]: Leaving directory `/home/sdhammo/hdf5/hdf5-1.8.15/testpar'
>> make[3]: *** [build-check-p] Error 1
>> make[3]: Leaving directory `/home/sdhammo/hdf5/hdf5-1.8.15/testpar'
>> make[2]: *** [test] Error 2
>> make[2]: Leaving directory `/home/sdhammo/hdf5/hdf5-1.8.15/testpar'
>> make[1]: *** [check-am] Error 2
>> make[1]: Leaving directory `/home/sdhammo/hdf5/hdf5-1.8.15/testpar'
>> make: *** [check-recursive] Error 1
>>
>>
>>
>> --
>> Simon Hammond
>> Center for Computing Research (Scalable Computer Architectures)
>> Sandia National Laboratories, NM
>> [Sent from remote connection, please excuse typing errors]
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5
>>
>
>
>-- 
>Matt Thompson, SSAI, Sr Scientific Programmer/Analyst
>NASA GSFC,    Global Modeling and Assimilation Office
>Code 610.1,  8800 Greenbelt Rd,  Greenbelt,  MD 20771
>Phone: 301-614-6712                 Fax: 301-614-6246
>http://science.gsfc.nasa.gov/sed/bio/matthew.thompson
>
>_______________________________________________
>Hdf-forum is for HDF software users discussion.
>[email protected]
>http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>Twitter: https://twitter.com/hdf5

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5