My mistake, results for 300:
nblocks = 2, run on 8 cores (1 thread per core), 1 node.
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.8300E+00     0.8300E+00    0.8300E+00   0.8300E+00
WRITINGPMLY   1           0.8100E+00     0.8100E+00    0.8100E+00   0.8100E+00
WRITINGPMLZ   1           0.7500E+00     0.7400E+00    0.7500E+00   0.7500E+00
nblocks = 4, run on 64 cores (1 thread per core), 4 nodes.
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.7800E+00     0.7700E+00    0.7800E+00   0.7800E+00
WRITINGPMLY   1           0.1020E+01     0.1020E+01    0.1020E+01   0.1020E+01
WRITINGPMLZ   1           0.8500E+00     0.8500E+00    0.8500E+00   0.8500E+00
I’m not sure if that is too terrible. I did not use any MPI-IO hints.
An example in Fortran of setting an MPI-IO hint is:

INTEGER :: info, mpierror, hdferr
CALL MPI_Info_create(info, mpierror)
CALL MPI_Info_set(info, "IBM_largeblock_io", "true", mpierror)
CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, info, hdferr)
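For context, a slightly fuller sketch of the same pattern, showing where the property list comes from and where the info object is freed (this assumes MPI and the HDF5 Fortran interface are already initialised; `plist_id`, `file_id` and the file name are illustrative, not from the code above):

INTEGER        :: info, mpierror, hdferr
INTEGER(HID_T) :: plist_id, file_id

! Create an info object and set the GPFS-oriented hint
CALL MPI_Info_create(info, mpierror)
CALL MPI_Info_set(info, "IBM_largeblock_io", "true", mpierror)

! Attach the communicator and hints to a file-access property list
CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, hdferr)
CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, info, hdferr)

! Create the file collectively using that property list
CALL h5fcreate_f("output.h5", H5F_ACC_TRUNC_F, file_id, hdferr, &
                 access_prp = plist_id)

! The info object can be freed once the file is open
CALL MPI_Info_free(info, mpierror)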
You can change the quoted hint string to Rob's suggestions, and as Rob said, running it
with Darshan should help in tuning:
http://www.mcs.anl.gov/research/projects/darshan/
Scot
On Jul 22, 2014, at 1:31 AM, Angel de Vicente <[email protected]> wrote:
Hi Scot,
thanks for trying this out.
Scot Breitenfeld <[email protected]> writes:
Can you be more specific about the hardware and the software you are
using for each case (especially for the “very bad” case)?
What architecture?
Parallel file system type?
What compiler/mpi type and version?
What version of HDF?
If we focus on the "very bad" case:
Hardware:
+ Each node has 2x E5-2670 SandyBridge-EP chips, for a total of 16 cores
per node
+ Network is Infiniband
+ Parallel file system: GPFS
As per the software versions:
+ Intel compilers, version: 13.0.1 20121010
+ Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130507
+ HDF version: HDF5 1.8.10
The "good" case:
Hardware:
+ Each node has 2x E5-2680 SandyBridge chips, for a total of 16 cores
per node
+ Network is Infiniband
+ Parallel file system: Lustre
Software:
+ Intel compilers, version 14.0.3 20140422
+ BullXMPI, which AFAIK is a fork of Open MPI, version 1.2.7.2
+ HDF version: HDF5 1.8.9
These are the timings for your program on GPFS using hdf5 trunk, xlf
compiler, mpich 3.1.1. I don’t see a large difference in writing times
between datasets.
These timings look really good, but how did you run the 1024 cores one?
I mean, the code in Pastebin assumes that it will be run with 64 cores
(nblocks = 4), so I guess for the 8 cores run you set that to (nblocks =
2). And for 1024 cores?
Again, thanks a lot for your help. Any pointer appreciated,
Ángel de Vicente
8 cores:
Timing report:
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.2100E+00     0.2000E+00    0.2100E+00   0.2100E+00
WRITINGPMLY   1           0.1600E+00     0.1600E+00    0.1600E+00   0.1600E+00
WRITINGPMLZ   1           0.1600E+00     0.1600E+00    0.1600E+00   0.1600E+00
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.4500E+00     0.4500E+00    0.4500E+00   0.4500E+00
WRITINGPMLY   1           0.4000E+00     0.4000E+00    0.4000E+00   0.4000E+00
WRITINGPMLZ   1           0.4400E+00     0.4500E+00    0.4400E+00   0.4400E+00
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.1470E+01     0.1460E+01    0.1470E+01   0.1470E+01
WRITINGPMLY   1           0.1580E+01     0.1580E+01    0.1580E+01   0.1580E+01
WRITINGPMLZ   1           0.1730E+01     0.1730E+01    0.1730E+01   0.1730E+01
1024 cores:
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.5118E+02     0.5118E+02    0.5118E+02   0.5118E+02
WRITINGPMLY   1           0.5228E+02     0.5228E+02    0.5228E+02   0.5228E+02
WRITINGPMLZ   1           0.5296E+02     0.5296E+02    0.5296E+02   0.5296E+02
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.5185E+02     0.5185E+02    0.5185E+02   0.5185E+02
WRITINGPMLY   1           0.5543E+02     0.5543E+02    0.5543E+02   0.5543E+02
WRITINGPMLZ   1           0.5675E+02     0.5675E+02    0.5675E+02   0.5675E+02
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.5035E+02     0.5035E+02    0.5035E+02   0.5035E+02
WRITINGPMLY   1           0.5739E+02     0.5739E+02    0.5739E+02   0.5739E+02
WRITINGPMLZ   1           0.5174E+02     0.5175E+02    0.5174E+02   0.5174E+02
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/