On 11/13/2014 06:34 AM, Angel de Vicente wrote:
Thanks. I'm not sure if these could be tuned a bit better, but with the
following hints the problem is all gone in the two problematic clusters.
(For a given file size, one of the writing modes of the program was
taking about 200x more time. With these hints all is back to normal,
and the problematic mode takes just the same time as the other ones.)
You can pass anything you want for the "key": implementations will ignore hints
they do not understand. For the sake of anyone googling in the future, I will explain
what, if anything, the hints you passed in do:
call MPI_Info_create(info, error)
call MPI_Info_set(info,"IBM_largeblock_io","true", error)
This hint is useful for IBM PE platforms and tells GPFS you are about to do
large I/O. Over time, this hint will become less useful: IBM is moving away
from their own MPI-IO implementation and incorporating ROMIO.
call MPI_Info_set(info,"striping_unit","4194304", error)
This one is probably the biggest help. In collective I/O, ROMIO splits up the file into "file
domains" (and assigns those domains to a subset of processors called I/O aggregators). When
the "striping_unit" hint is set, ROMIO will align those file domains to that
striping_unit.
Sometimes, like on Blue Gene, ROMIO will detect the file system block size for
you, and this hint is not needed. No harm in providing it, though.
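
To make the alignment idea concrete, here is a minimal sketch in plain Python (no MPI needed). It is not ROMIO's actual code: the function name `file_domains`, the even split, and the round-to-nearest rule are assumptions chosen only to illustrate how domain boundaries can be moved onto multiples of striping_unit.

```python
def file_domains(start, end, n_aggregators, striping_unit=None):
    """Split bytes [start, end) into one contiguous domain per aggregator.

    Illustrative model only (hypothetical helper, not ROMIO's algorithm).
    If striping_unit is given, interior boundaries are rounded to the
    nearest multiple of striping_unit so that no two aggregators share
    a file system block.
    """
    total = end - start
    size = -(-total // n_aggregators)  # ceiling division
    # Even split first: n_aggregators + 1 boundaries, clamped to the end.
    bounds = [min(start + i * size, end) for i in range(n_aggregators + 1)]
    if striping_unit:
        for i in range(1, n_aggregators):
            aligned = ((bounds[i] + striping_unit // 2) // striping_unit) * striping_unit
            # Keep the boundary inside the accessed range.
            bounds[i] = min(max(aligned, start), end)
    return [(bounds[i], bounds[i + 1]) for i in range(n_aggregators)]


# 10,000,000 bytes shared by 3 aggregators, without and with a 4 MiB unit.
print(file_domains(0, 10_000_000, 3))
print(file_domains(0, 10_000_000, 3, striping_unit=4194304))
```

With the 4194304-byte unit, the two interior boundaries land on 4194304 and 8388608 instead of falling in the middle of a block, which is the effect the hint is after.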
CALL MPI_INFO_SET(info,"H5F_ACS_CORE_WRITE_TRACKING_PAGE_SIZE_DEF","524288",error)
I don't think this hint does anything.
CALL MPI_INFO_SET(info,"ind_rd_buffer_size","41943040", error)
CALL MPI_INFO_SET(info,"ind_wr_buffer_size","5242880", error)
CALL MPI_INFO_SET(info,"romio_ds_read","disable", error)
CALL MPI_INFO_SET(info,"romio_ds_write","disable", error)
No harm here, but if you are going to disable data sieving (romio_ds_read and
romio_ds_write) then there's no reason to tweak the independent read and write
buffer sizes.
CALL MPI_INFO_SET(info,"romio_cb_write","enable", error)
On many platforms (but not Blue Gene), ROMIO will look at the access pattern.
If the pattern is not interleaved, ROMIO will not use collective buffering. At
today's scale, collective buffering is almost always a win, especially on GPFS
when combined with striping_unit.
CALL MPI_INFO_SET(info,"cb_buffer_size","4194304", error)
This buffer size might actually be a bit small, depending on how much data you
are writing/reading. If you have memory to spare, increasing this value is
often a good way to improve performance.
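
One rough way to see why a bigger buffer can help, sketched in plain Python: assume each aggregator works through its file domain in pieces of at most cb_buffer_size bytes, so a larger buffer means fewer exchange-and-write rounds. The helper name `collective_rounds` and the 1 GiB domain size are hypothetical, and this simplified model ignores everything else that affects real performance.

```python
def collective_rounds(domain_bytes, cb_buffer_size):
    """Rounds needed to cover one aggregator's file domain.

    Simplified model (an assumption, not a measurement): each round
    handles at most cb_buffer_size bytes of the domain.
    """
    return -(-domain_bytes // cb_buffer_size)  # ceiling division


one_gib = 1024 ** 3  # hypothetical amount of data per aggregator
for cb in (4194304, 16777216):
    print(cb, collective_rounds(one_gib, cb))
```

Under this model, going from a 4 MiB to a 16 MiB buffer cuts the rounds for a 1 GiB domain from 256 to 64, at the cost of 12 MiB more memory on each aggregator.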
For the moment, problem solved. Thanks a lot,
Tuning these stacks is honestly way harder than it should be. Thanks for your
persistence.
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA