My mistake, results for 300:
nblocks = 2, run on 8 cores (1 thread per core), 1 node.
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.8300E+00     0.8300E+00    0.8300E+00   0.8300E+00
WRITINGPMLY   1           0.8100E+00     0.8100E+00    0.8100E+00   0.8100E+00
WRITINGPMLZ   1           0.7500E+00     0.7400E+00    0.7500E+00   0.7500E+00
nblocks = 4, run on 64 cores (1 thread per core), 4 nodes.
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.7800E+00     0.7700E+00    0.7800E+00   0.7800E+00
WRITINGPMLY   1           0.1020E+01     0.1020E+01    0.1020E+01   0.1020E+01
WRITINGPMLZ   1           0.8500E+00     0.8500E+00    0.8500E+00   0.8500E+00
I’m not sure if that is too terrible. I did not use any MPI-IO hints.
An example in Fortran of setting an MPI-IO hint is:

INTEGER :: info, mpierror, hdferr
CALL MPI_Info_create(info, mpierror)
CALL MPI_Info_set(info, "IBM_largeblock_io", "true", mpierror)
CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, info, hdferr)
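For context, a slightly fuller sketch of the same pattern, showing where the property list comes from and where the info object is freed (this assumes MPI and the HDF5 Fortran interface are already initialised; `plist_id`, `file_id` and the file name are illustrative, not from the code above):

INTEGER        :: info, mpierror, hdferr
INTEGER(HID_T) :: plist_id, file_id

! Create an info object and set the GPFS-oriented hint
CALL MPI_Info_create(info, mpierror)
CALL MPI_Info_set(info, "IBM_largeblock_io", "true", mpierror)

! Attach the communicator and hints to a file-access property list
CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, hdferr)
CALL h5pset_fapl_mpio_f(plist_id, MPI_COMM_WORLD, info, hdferr)

! Create the file collectively using that property list
CALL h5fcreate_f("output.h5", H5F_ACC_TRUNC_F, file_id, hdferr, &
                 access_prp = plist_id)

! The info object can be freed once the file is open
CALL MPI_Info_free(info, mpierror)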
You can change the quoted hint string to Rob's suggestions, and as Rob said, running it
with Darshan should help in tuning:
http://www.mcs.anl.gov/research/projects/darshan/
Scot
On Jul 22, 2014, at 1:31 AM, Angel de Vicente <[email protected]> wrote:
Hi Scot,
thanks for trying this out.
Scot Breitenfeld <[email protected]> writes:
Can you be more specific about the hardware and the software you are
using for each case (especially for the “very bad” case)?
What architecture?
Parallel file system type?
What compiler/mpi type and version?
What version of HDF?
If we focus on the "very bad" case:
Hardware:
+ Each node has 2x E5-2670 SandyBridge-EP chips, for a total of 16 cores
per node
+ Network is Infiniband
+ Parallel file system: GPFS
As per the software versions:
+ Intel compilers, version: 13.0.1 20121010
+ Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130507
+ HDF version: HDF5 1.8.10
The "good" case:
Hardware:
+ Each node has 2x E5-2680 SandyBridge chips, for a total of 16 cores
per node
+ Network is Infiniband
+ Parallel file system: Lustre
Software:
+ Intel compilers, version 14.0.3 20140422
+ BullXMPI, which AFAIK is a fork of Open MPI, version 1.2.7.2
+ HDF version: HDF5 1.8.9
These are the timings for your program on GPFS using hdf5 trunk, xlf
compiler, mpich 3.1.1. I don’t see a large difference in writing times
between datasets.
These timings look really good, but how did you run the 1024 cores one?
I mean, the code in Pastebin assumes that it will be run with 64 cores
(nblocks = 4), so I guess for the 8 cores run you set that to (nblocks =
2). And for 1024 cores?
Again, thanks a lot for your help. Any pointer appreciated,
Ángel de Vicente
8 cores:
Timing report:
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.2100E+00     0.2000E+00    0.2100E+00   0.2100E+00
WRITINGPMLY   1           0.1600E+00     0.1600E+00    0.1600E+00   0.1600E+00
WRITINGPMLZ   1           0.1600E+00     0.1600E+00    0.1600E+00   0.1600E+00
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.4500E+00     0.4500E+00    0.4500E+00   0.4500E+00
WRITINGPMLY   1           0.4000E+00     0.4000E+00    0.4000E+00   0.4000E+00
WRITINGPMLZ   1           0.4400E+00     0.4500E+00    0.4400E+00   0.4400E+00
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.1470E+01     0.1460E+01    0.1470E+01   0.1470E+01
WRITINGPMLY   1           0.1580E+01     0.1580E+01    0.1580E+01   0.1580E+01
WRITINGPMLZ   1           0.1730E+01     0.1730E+01    0.1730E+01   0.1730E+01
1024 cores:
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.5118E+02     0.5118E+02    0.5118E+02   0.5118E+02
WRITINGPMLY   1           0.5228E+02     0.5228E+02    0.5228E+02   0.5228E+02
WRITINGPMLZ   1           0.5296E+02     0.5296E+02    0.5296E+02   0.5296E+02
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.5185E+02     0.5185E+02    0.5185E+02   0.5185E+02
WRITINGPMLY   1           0.5543E+02     0.5543E+02    0.5543E+02   0.5543E+02
WRITINGPMLZ   1           0.5675E+02     0.5675E+02    0.5675E+02   0.5675E+02
Timer         Iterations  Mean real (s)  Mean CPU (s)  Minimum (s)  Maximum (s)
-------------------------------------------------------------------------------
WRITINGPMLX   1           0.5035E+02     0.5035E+02    0.5035E+02   0.5035E+02
WRITINGPMLY   1           0.5739E+02     0.5739E+02    0.5739E+02   0.5739E+02
WRITINGPMLZ   1           0.5174E+02     0.5175E+02    0.5174E+02   0.5174E+02
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
--
Ángel de Vicente
http://www.iac.es/galeria/angelv/