On 07/22/2014 01:31 AM, Angel de Vicente wrote:

If we focus on the "very bad" case:

Hardware:
+ Each node has 2x E5-2670 SandyBridge-EP chips, for a total of 16 cores
   per node
+ Network is Infiniband
+ Parallel file system: GPFS

Software versions:
+ Intel compilers, version: 13.0.1 20121010
+ Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130507
+ HDF version: HDF5 1.8.10

Intel's MPI library does not have any explicit optimizations for GPFS, but the one optimization you need for GPFS is to align writes to the file system block size.
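To illustrate why alignment matters, here is a minimal sketch in C. The helper names blocks_touched and is_aligned are hypothetical (they are not part of MPI or HDF5), and the 8 MiB block size used in the usage note is only an example value. It counts how many file system blocks a single write touches:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of file system blocks touched by a write
 * of `len` bytes starting at byte `offset`. */
uint64_t blocks_touched(uint64_t offset, uint64_t len, uint64_t block_size)
{
    assert(block_size > 0);
    if (len == 0)
        return 0;
    uint64_t first = offset / block_size;             /* block holding the first byte */
    uint64_t last  = (offset + len - 1) / block_size; /* block holding the last byte */
    return last - first + 1;
}

/* Hypothetical helper: nonzero when the write starts and ends on a
 * block boundary. */
int is_aligned(uint64_t offset, uint64_t len, uint64_t block_size)
{
    assert(block_size > 0);
    return offset % block_size == 0 && len % block_size == 0;
}
```

With an 8388608-byte block, a write of exactly one block at offset 0 touches one block, while the same write shifted to offset 4096 touches two, so neighbouring processes can end up writing into the same block.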

You can do this with an MPI-IO hint: set "striping_unit" to your GPFS block size. You can determine the GPFS block size with 'stat -f' (see the 'Block size:' field).
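If you would rather query the block size from the program than hard-code it, a rough sketch using POSIX statvfs() follows. The function name block_size_hint is hypothetical, and I am assuming that f_bsize reports the same value as the 'Block size:' field of 'stat -f' on your system; please check that on your GPFS mount before relying on it.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/* Hypothetical helper: write the block size of the file system that
 * contains `path` into `buf` as a decimal string, suitable as the value
 * of an MPI-IO hint. Returns 0 on success, -1 on failure.
 * Assumption: f_bsize matches the 'Block size:' field of 'stat -f'. */
int block_size_hint(const char *path, char *buf, size_t buflen)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;                      /* path missing or not accessible */

    int n = snprintf(buf, buflen, "%lu", (unsigned long)vfs.f_bsize);

    /* snprintf returns the length it wanted to write; reject truncation */
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}
```

The resulting string could then be passed as the value of the "striping_unit" hint instead of a literal.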

To set an MPI-IO hint via HDF5, set up your file access property list appropriately: pass a non-null INFO parameter to H5Pset_fapl_mpio.

 http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFaplMpio

In C, it looks like this (fapl is your file access property list, comm your MPI communicator):

MPI_Info info;
MPI_Info_create(&info);
/* or whatever your GPFS block size actually is */
MPI_Info_set(info, "striping_unit", "8388608");
H5Pset_fapl_mpio(fapl, comm, info);


If you're with me so far, I think you'll see much better parallel write performance once the MPI-IO library is trying harder to align writes.

Are you familiar with the Darshan statistics tool? You can use it to confirm whether or not you are hitting unaligned writes.

==rob





The "good" case:

Hardware:
+ Each node has 2x E5-2680 SandyBridge chips, for a total of 16 cores
   per node
+ Network is Infiniband
+ Parallel file system: Lustre

Software:
+ Intel compilers, version 14.0.3 20140422
+ BullXMPI, which AFAIK is a fork of Open MPI, version 1.2.7.2
+ HDF version: HDF5 1.8.9


These are the timings for your program on GPFS using hdf5 trunk, xlf
compiler, mpich 3.1.1. I don’t see a large difference in writing times
between datasets.

These timings look really good, but how did you run the 1024 cores one?
I mean, the code in Pastebin assumes that it will be run with 64 cores
(nblocks = 4), so I guess for the 8 cores run you set that to (nblocks =
2). And for 1024 cores?
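To make the question concrete: the two pairs above (64 cores with nblocks = 4, 8 cores with nblocks = 2) are consistent with cores = nblocks^3. That relation is an assumption inferred from those two pairs, not something confirmed from the Pastebin code, and the helper name cubic_nblocks is hypothetical. A small C sketch of the check:

```c
#include <assert.h>

/* Hypothetical helper, assuming cores = nblocks^3.
 * Returns n such that n*n*n == cores, or 0 if cores is not a perfect cube. */
int cubic_nblocks(int cores)
{
    assert(cores > 0);
    for (int n = 1; ; n++) {
        long long cube = (long long)n * n * n;  /* avoid int overflow */
        if (cube == cores)
            return n;
        if (cube > cores)
            return 0;
    }
}
```

Under that assumption, cubic_nblocks(64) gives 4 and cubic_nblocks(8) gives 2, but cubic_nblocks(1024) gives 0, since 1024 is not a perfect cube. That is why a single nblocks value does not obviously cover the 1024-core run.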

Again, thanks a lot for your help. Any pointer appreciated,
Ángel de Vicente


8 cores:

  Timing report:
  Timer        Number Iterations  Mean real time (s)  Mean CPU time (s)  Minimum (s)  Maximum (s)
  -----------  -----------------  ------------------  -----------------  -----------  -----------
  WRITINGPMLX                  1          0.2100E+00         0.2000E+00   0.2100E+00   0.2100E+00
  WRITINGPMLY                  1          0.1600E+00         0.1600E+00   0.1600E+00   0.1600E+00
  WRITINGPMLZ                  1          0.1600E+00         0.1600E+00   0.1600E+00   0.1600E+00

  Timer        Number Iterations  Mean real time (s)  Mean CPU time (s)  Minimum (s)  Maximum (s)
  -----------  -----------------  ------------------  -----------------  -----------  -----------
  WRITINGPMLX                  1          0.4500E+00         0.4500E+00   0.4500E+00   0.4500E+00
  WRITINGPMLY                  1          0.4000E+00         0.4000E+00   0.4000E+00   0.4000E+00
  WRITINGPMLZ                  1          0.4400E+00         0.4500E+00   0.4400E+00   0.4400E+00

  Timer        Number Iterations  Mean real time (s)  Mean CPU time (s)  Minimum (s)  Maximum (s)
  -----------  -----------------  ------------------  -----------------  -----------  -----------
  WRITINGPMLX                  1          0.1470E+01         0.1460E+01   0.1470E+01   0.1470E+01
  WRITINGPMLY                  1          0.1580E+01         0.1580E+01   0.1580E+01   0.1580E+01
  WRITINGPMLZ                  1          0.1730E+01         0.1730E+01   0.1730E+01   0.1730E+01


1024 cores:

  Timer        Number Iterations  Mean real time (s)  Mean CPU time (s)  Minimum (s)  Maximum (s)
  -----------  -----------------  ------------------  -----------------  -----------  -----------
  WRITINGPMLX                  1          0.5118E+02         0.5118E+02   0.5118E+02   0.5118E+02
  WRITINGPMLY                  1          0.5228E+02         0.5228E+02   0.5228E+02   0.5228E+02
  WRITINGPMLZ                  1          0.5296E+02         0.5296E+02   0.5296E+02   0.5296E+02

  Timer        Number Iterations  Mean real time (s)  Mean CPU time (s)  Minimum (s)  Maximum (s)
  -----------  -----------------  ------------------  -----------------  -----------  -----------
  WRITINGPMLX                  1          0.5185E+02         0.5185E+02   0.5185E+02   0.5185E+02
  WRITINGPMLY                  1          0.5543E+02         0.5543E+02   0.5543E+02   0.5543E+02
  WRITINGPMLZ                  1          0.5675E+02         0.5675E+02   0.5675E+02   0.5675E+02

  Timer        Number Iterations  Mean real time (s)  Mean CPU time (s)  Minimum (s)  Maximum (s)
  -----------  -----------------  ------------------  -----------------  -----------  -----------
  WRITINGPMLX                  1          0.5035E+02         0.5035E+02   0.5035E+02   0.5035E+02
  WRITINGPMLY                  1          0.5739E+02         0.5739E+02   0.5739E+02   0.5739E+02
  WRITINGPMLZ                  1          0.5174E+02         0.5175E+02   0.5174E+02   0.5174E+02


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5





--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

