On Jul 10, 2014, at 6:08 PM, Angel de Vicente <[email protected]> wrote:

> Hi,
> 
> Angel de Vicente <[email protected]> writes:
>> I'm having a hard time trying to figure out what could be causing the
>> slow I/O behaviour that I see: the same code (in Fortran) run on three
>> different clusters behaves pretty similarly in terms of I/O times, except
>> for one of the variables in the code, where I get writes two orders of
>> magnitude slower on one of the machines (last timing data in the
>> e-mail). So I hope that somebody with more in-depth knowledge of
>> Parallel HDF5 can give me a hand with it.
> 
> Regarding this issue, I extracted only the relevant parts of the code,
> and I can reproduce the behaviour I described in the previous
> e-mail with a very simple program, which you can see at:
> 
> http://pastebin.com/HjnS82Gp
> 
> (To compile it I just run h5pfc -o phdf5write timing.f90 phdf5write.f90,
> where timing.f90 is a timing routine by Arjen Markus, available at
> http://pastebin.com/480RmNET)
> 
> The code generates a 4D array with the relevant sizes and writes the
> relevant part of the data to a file in three modes, PMLX, PMLY and
> PMLZ (these replicate the ranks that I would get for the X, Y and Z
> planes, respectively, when creating a Cartesian topology with the
> relevant MPI routines).
> 
> The clusters where I run this code all have 16 cores per node. When I
> run this test code on only 8 cores (nblocks set to 2), the three
> clusters behave similarly and there is no penalty for writing PMLZ.
> When I run it on 64 nodes, only PMLZ is heavily penalized: very badly
> on one cluster and badly on another. It looks like there is some
> contention issue with the parallel file system when the cores span a
> number of nodes, but I certainly don't understand why it only affects
> the PMLZ variable and not PMLY, and why one of the clusters doesn’t
> seem to be affected.
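
One way to probe the node-spanning hypothesis is to count how many nodes each plane's ranks land on. The Python sketch below is not the Fortran test program. It assumes several things, none confirmed for the actual runs: row-major rank numbering (which is what MPI uses for Cartesian communicators); no rank reordering; a launcher that packs consecutive ranks onto the same node; X, Y and Z mapping to Cartesian dimensions 0, 1 and 2; and hypothetical decompositions of 2 x 2 x 2 (8 ranks) and 16 x 8 x 8 (1024 ranks on 64 nodes). The names `plane_ranks` and `ranks_per_node` are hypothetical helpers. Only the low-side plane (index 0) is shown; the high-side plane is analogous with `index = dims[axis] - 1`.

```python
from collections import Counter
from itertools import product

CORES_PER_NODE = 16  # all three clusters have 16 cores per node


def plane_ranks(dims, axis, index=0):
    """Ranks whose Cartesian coordinate along `axis` equals `index`.

    itertools.product walks the coordinates in row-major order (last
    dimension fastest), the same order MPI uses to number ranks in a
    Cartesian communicator, so enumerate() yields each coordinate's rank.
    """
    grid = product(*(range(d) for d in dims))
    return [rank for rank, coords in enumerate(grid) if coords[axis] == index]


def ranks_per_node(ranks, cores_per_node=CORES_PER_NODE):
    """Count plane ranks per node, assuming consecutive ranks share a node."""
    return Counter(rank // cores_per_node for rank in ranks)


if __name__ == "__main__":
    # Hypothetical decompositions: 2 x 2 x 2 (8 ranks, one node) and
    # 16 x 8 x 8 (1024 ranks, 64 nodes).
    for dims in ((2, 2, 2), (16, 8, 8)):
        print("dims =", dims)
        for name, axis in (("PMLX", 0), ("PMLY", 1), ("PMLZ", 2)):
            per_node = ranks_per_node(plane_ranks(dims, axis))
            print(f"  {name}: {len(per_node)} node(s), "
                  f"at most {max(per_node.values())} plane ranks per node")
```

With 2 x 2 x 2 ranks every plane sits on a single node. With the hypothetical 16 x 8 x 8 layout, the PMLX ranks fit on 4 nodes and the PMLY ranks on 16 nodes. The PMLZ ranks, by contrast, are spread over all 64 nodes with only two ranks per node. That pattern would fit the idea of node-spanning contention for PMLZ, but it is only a hypothesis. The real decomposition, rank placement and MPI-IO settings on each cluster decide what actually happens.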

Can you be more specific about the hardware and software you are using in
each case (especially the “very bad” case)?
What architecture?
What type of parallel file system?
What compiler and MPI implementation, and which versions?
What version of HDF5?

These are the timings for your program on GPFS using the HDF5 trunk, the xlf
compiler and MPICH 3.1.1. I don’t see a large difference in write times between
the datasets.

8 cores:

 Timing report:
 Timer                                   Number Iterations Mean real time  Mean CPU time        Minimum        Maximum
 ----------------------------------------                             (s)            (s)            (s)            (s)
 WRITINGPMLX                                             1     0.2100E+00     0.2000E+00     0.2100E+00     0.2100E+00
 WRITINGPMLY                                             1     0.1600E+00     0.1600E+00     0.1600E+00     0.1600E+00
 WRITINGPMLZ                                             1     0.1600E+00     0.1600E+00     0.1600E+00     0.1600E+00

 Timer                                   Number Iterations Mean real time  Mean CPU time        Minimum        Maximum
 ----------------------------------------                             (s)            (s)            (s)            (s)
 WRITINGPMLX                                             1     0.4500E+00     0.4500E+00     0.4500E+00     0.4500E+00
 WRITINGPMLY                                             1     0.4000E+00     0.4000E+00     0.4000E+00     0.4000E+00
 WRITINGPMLZ                                             1     0.4400E+00     0.4500E+00     0.4400E+00     0.4400E+00

 Timer                                   Number Iterations Mean real time  Mean CPU time        Minimum        Maximum
 ----------------------------------------                             (s)            (s)            (s)            (s)
 WRITINGPMLX                                             1     0.1470E+01     0.1460E+01     0.1470E+01     0.1470E+01
 WRITINGPMLY                                             1     0.1580E+01     0.1580E+01     0.1580E+01     0.1580E+01
 WRITINGPMLZ                                             1     0.1730E+01     0.1730E+01     0.1730E+01     0.1730E+01


1024 cores:

 Timer                                   Number Iterations Mean real time  Mean CPU time        Minimum        Maximum
 ----------------------------------------                             (s)            (s)            (s)            (s)
 WRITINGPMLX                                             1     0.5118E+02     0.5118E+02     0.5118E+02     0.5118E+02
 WRITINGPMLY                                             1     0.5228E+02     0.5228E+02     0.5228E+02     0.5228E+02
 WRITINGPMLZ                                             1     0.5296E+02     0.5296E+02     0.5296E+02     0.5296E+02


 Timer                                   Number Iterations Mean real time  Mean CPU time        Minimum        Maximum
 ----------------------------------------                             (s)            (s)            (s)            (s)
 WRITINGPMLX                                             1     0.5185E+02     0.5185E+02     0.5185E+02     0.5185E+02
 WRITINGPMLY                                             1     0.5543E+02     0.5543E+02     0.5543E+02     0.5543E+02
 WRITINGPMLZ                                             1     0.5675E+02     0.5675E+02     0.5675E+02     0.5675E+02

 Timer                                   Number Iterations Mean real time  Mean CPU time        Minimum        Maximum
 ----------------------------------------                             (s)            (s)            (s)            (s)
 WRITINGPMLX                                             1     0.5035E+02     0.5035E+02     0.5035E+02     0.5035E+02
 WRITINGPMLY                                             1     0.5739E+02     0.5739E+02     0.5739E+02     0.5739E+02
 WRITINGPMLZ                                             1     0.5174E+02     0.5175E+02     0.5174E+02     0.5174E+02

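To put a number on "no large difference", the short Python sketch below copies the mean real times from the six reports above and computes, for each report, the ratio of the slowest dataset write to the fastest. The dictionary labels and the `spread` helper are illustrative; the numbers are the reported values.

```python
# Mean real times (s) copied from the timing reports above, in the order
# (WRITINGPMLX, WRITINGPMLY, WRITINGPMLZ).
RUNS = {
    "8 cores, report 1": (0.21, 0.16, 0.16),
    "8 cores, report 2": (0.45, 0.40, 0.44),
    "8 cores, report 3": (1.47, 1.58, 1.73),
    "1024 cores, report 1": (51.18, 52.28, 52.96),
    "1024 cores, report 2": (51.85, 55.43, 56.75),
    "1024 cores, report 3": (50.35, 57.39, 51.74),
}


def spread(times):
    """Ratio of the slowest to the fastest dataset write in one report."""
    return max(times) / min(times)


if __name__ == "__main__":
    for label, times in RUNS.items():
        print(f"{label}: slowest/fastest = {spread(times):.2f}")
```

The largest ratio is 0.21 / 0.16 ≈ 1.31, in the first 8-core report. The 1024-core reports stay between roughly 1.03 and 1.14. Both are far from the two orders of magnitude seen on the slow cluster.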

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
