After a lot of testing, my understanding is that:
1. if the data belonging to each MPI process must be interleaved in the
   file, then P-HDF5 (and MPI-IO) can significantly reduce the elapsed
   time spent on I/O;
2. if not (independent data written independently by each MPI process),
   then the P-HDF5, MPI-IO and sequential approaches are equivalent.
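
To make the two cases concrete, here is a minimal Python sketch. It is
not taken from the test code: the function names and the round-robin
block layout are assumptions chosen only for illustration. It lists the
(offset, length) extents each MPI process would have to write in a
shared file. In the contiguous case each process has a single large
extent; in the interleaved case each process has many small extents
scattered through the file, which is presumably where collective I/O
has something to aggregate.

```python
def contiguous_layout(n_ranks, blocks_per_rank, block_bytes):
    """Case 2: each rank owns one contiguous region of the shared file."""
    layout = {}
    for rank in range(n_ranks):
        start = rank * blocks_per_rank * block_bytes
        layout[rank] = [(start, blocks_per_rank * block_bytes)]
    return layout


def interleaved_layout(n_ranks, blocks_per_rank, block_bytes):
    """Case 1: blocks are dealt round-robin (rank 0, rank 1, ..., rank 0, ...)."""
    layout = {}
    for rank in range(n_ranks):
        layout[rank] = [
            ((i * n_ranks + rank) * block_bytes, block_bytes)
            for i in range(blocks_per_rank)
        ]
    return layout


def extents_per_rank(layout):
    """Number of separate (offset, length) extents each rank must write."""
    return {rank: len(extents) for rank, extents in layout.items()}


# Hypothetical sizes: 2 ranks, 3 blocks per rank, 4 bytes per block.
print("contiguous :", contiguous_layout(2, 3, 4))
print("interleaved:", interleaved_layout(2, 3, 4))
```

With these toy sizes, each rank writes one 12-byte extent in the
contiguous layout and three 4-byte extents in the interleaved layout.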

In hindsight, this seems logical to me. Are there other situations where
HDF5 may speed up I/O (reduce the elapsed time)?

Franck

On 2014-08-08 17:26, Rob Latham wrote:
> On 08/08/2014 03:27 AM, houssen wrote:
>> In short: are there things to know / make sure of / be aware of to get
>> good performance with P-HDF5?
> 
> - turn on collective I/O. it's not enabled by default
>
> - HDF5 metadata might be a factor if you have very many small
> datasets, but for most applications it's not important
>
> - consult your MPI library for any file-system specific tuning you
> might be able to do. For example, Intel-MPI needs you to set an
> environment variable before it will use any of the GPFS or Panasas
> optimizations it has written.
>
> - be mindful of type conversions: if your data in memory is a 4-byte
> float, but they are 8-byte doubles on disk, HDF5 will "break
> collective" and do that I/O independently.
> 
> 
>> To test this I wrote an MPI code. ... I expected to get better
>> performance with MPI-IO and P-HDF5 than with the sequential approach.
>> The spirit of this test code is very simple / basic (each MPI process
>> writes its own block of data in the same file, or, in separate files in
>> the sequential approach).
> 
>> Note: in each case (sequential, MPI-IO, P-HDF5), when I say "write data
>> in file", I mean writing big blocks of data at once (I do not write
>> data one by one - I write the biggest block of data, but smaller
>> than 2 GB, that it is possible to write).
>> Note: I tried with N = 1, 2, 4, 8, 16.
> 
> in 2014, 16 is not very parallel. serial I/O has many benefits at
> modest levels of parallelism: caching, mostly.
>
>> Note: I generated files (MPI-IO, P-HDF5) whose size scaled from 1 GB to
>> 16 GB (which looks like a "very big" file to me).
> 
> that's adequate, yes
> 
>> Note: I followed the P-HDF5 documentation (use H5P_FILE_ACCESS and
>> H5P_DATASET_XFER property lists + use hyperslabs "by chunks")
>> Note: the file system is "GPFS" (it has been installed by the cluster
>> vendor: this is supposed to be ready to get performance out of P-HDF5 -
>> I am an "application" guy who tries to use HDF5, I am not a "sysadmin"
>> who would be familiar with the complex details of the file system)
> 
> Now we are getting somewhere.
> 
>> Note: I compiled the HDF5 package like this "./configure
>> --enable-parallel".
>> Note: I use CentOS + GNU compilers (for both HDF5 package and my test
>> code) + hdf5-1.8.13
>> Note: I use mpic++ (not h5pxx compilers - actually I didn't get why
>> HDF5 provides compilers) to compile my test code, is this a problem?
>
> just makes it easier to pick up any libraries needed. I don't use
> the wrappers, either, which means sometimes I need to figure out what
> new library (like -ldl) HDF5 needs.
> 
>> Any relevant clue / information would be appreciated. If what I observe
>> is logical, I would just like to understand why, and how / when it is
>> possible to get performance out of P-HDF5. I would just like to get
>> some logic out of this.
> 
> If you are using GPFS, there is one optimization that goes a long way
> towards improving performance: aligning writes to file system block
> boundaries. See this email from a few weeks ago:
>
> http://mail.lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2014-July/007963.html
>
> ==rob
> 
>>
>> Thanks for help,
>>
>> FH
>>
>> PS: I can give more information and the code, if needed (?)
>>
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> Twitter: https://twitter.com/hdf5
>>

 