On 07/22/2014 01:31 AM, Angel de Vicente wrote:
If we focus on the "very bad" case:
Hardware:
+ Each node has 2x E5-2670 SandyBridge-EP chips, for a total of 16 cores
per node
+ Network is Infiniband
+ Parallel file system: GPFS
Software:
+ Intel compilers, version: 13.0.1 20121010
+ Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130507
+ HDF version: HDF5 1.8.10
Intel's MPI library does not have any explicit optimizations for GPFS,
but the one optimization you need for GPFS is to align writes to the
file system block size.
You can do this with an MPI-IO hint: set "striping_unit" to your GPFS
block size. (You can determine the GPFS block size via 'stat -f'; see
the 'Block size:' field.)
Setting an MPI-IO hint via HDF5 requires setting up your file access
property list appropriately: you will need a non-null INFO parameter to
H5Pset_fapl_mpio
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFaplMpio
in C, it's like this:
MPI_Info info;
MPI_Info_create(&info);
MPI_Info_set(info, "striping_unit", "8388608");
/* or whatever your GPFS block size actually is */
H5Pset_fapl_mpio(fapl, comm, info);
MPI_Info_free(&info); /* safe to free here: H5Pset_fapl_mpio duplicates the info object */
If you're with me so far, I think you'll see much better parallel write
performance once the MPI-IO library is trying harder to align writes.
Are you familiar with the Darshan statistics tool? You can use it to
confirm whether or not you are hitting unaligned writes.
==rob
The "good" case:
Hardware:
+ Each node has 2x E5-2680 SandyBridge chips, for a total of 16 cores
per node
+ Network is Infiniband
+ Parallel file system: Lustre
Software:
+ Intel compilers, version 14.0.3 20140422
+ BullXMPI, which AFAIK is a fork of Open MPI, version 1.2.7.2
+ HDF version: HDF5 1.8.9
These are the timings for your program on GPFS using HDF5 trunk, the xlf
compiler, and MPICH 3.1.1. I don't see a large difference in writing
times between datasets.
These timings look really good, but how did you run the 1024-core one?
I mean, the code in Pastebin assumes that it will be run with 64 cores
(nblocks = 4), so I guess for the 8-core run you set that to (nblocks =
2). And for 1024 cores?
Again, thanks a lot for your help. Any pointer appreciated,
Ángel de Vicente
8 cores:
Timing report:

Timer          Iterations   Mean real time (s)   Mean CPU time (s)   Minimum (s)   Maximum (s)
----------------------------------------------------------------------------------------------
WRITINGPMLX    1            0.2100E+00           0.2000E+00          0.2100E+00    0.2100E+00
WRITINGPMLY    1            0.1600E+00           0.1600E+00          0.1600E+00    0.1600E+00
WRITINGPMLZ    1            0.1600E+00           0.1600E+00          0.1600E+00    0.1600E+00

Timer          Iterations   Mean real time (s)   Mean CPU time (s)   Minimum (s)   Maximum (s)
----------------------------------------------------------------------------------------------
WRITINGPMLX    1            0.4500E+00           0.4500E+00          0.4500E+00    0.4500E+00
WRITINGPMLY    1            0.4000E+00           0.4000E+00          0.4000E+00    0.4000E+00
WRITINGPMLZ    1            0.4400E+00           0.4500E+00          0.4400E+00    0.4400E+00

Timer          Iterations   Mean real time (s)   Mean CPU time (s)   Minimum (s)   Maximum (s)
----------------------------------------------------------------------------------------------
WRITINGPMLX    1            0.1470E+01           0.1460E+01          0.1470E+01    0.1470E+01
WRITINGPMLY    1            0.1580E+01           0.1580E+01          0.1580E+01    0.1580E+01
WRITINGPMLZ    1            0.1730E+01           0.1730E+01          0.1730E+01    0.1730E+01
1024 cores:

Timer          Iterations   Mean real time (s)   Mean CPU time (s)   Minimum (s)   Maximum (s)
----------------------------------------------------------------------------------------------
WRITINGPMLX    1            0.5118E+02           0.5118E+02          0.5118E+02    0.5118E+02
WRITINGPMLY    1            0.5228E+02           0.5228E+02          0.5228E+02    0.5228E+02
WRITINGPMLZ    1            0.5296E+02           0.5296E+02          0.5296E+02    0.5296E+02

Timer          Iterations   Mean real time (s)   Mean CPU time (s)   Minimum (s)   Maximum (s)
----------------------------------------------------------------------------------------------
WRITINGPMLX    1            0.5185E+02           0.5185E+02          0.5185E+02    0.5185E+02
WRITINGPMLY    1            0.5543E+02           0.5543E+02          0.5543E+02    0.5543E+02
WRITINGPMLZ    1            0.5675E+02           0.5675E+02          0.5675E+02    0.5675E+02

Timer          Iterations   Mean real time (s)   Mean CPU time (s)   Minimum (s)   Maximum (s)
----------------------------------------------------------------------------------------------
WRITINGPMLX    1            0.5035E+02           0.5035E+02          0.5035E+02    0.5035E+02
WRITINGPMLY    1            0.5739E+02           0.5739E+02          0.5739E+02    0.5739E+02
WRITINGPMLZ    1            0.5174E+02           0.5175E+02          0.5174E+02    0.5174E+02
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA