Juha Jäykkä <[email protected]> writes: > For small files, chunking is probably not going to change performance in any > significant manner, so one option could be to simply not chunk small files at > all
This is effectively what is done now, considering that HDF5 needs
chunking to be enabled to use H5S_UNLIMITED.

> and then chunk big files "optimally" – whatever that means. HDFgroup
> seems to think that "the chunk size be approximately equal to the
> average expected size of the data block needed by the application."
> (http://www.hdfgroup.org/training/HDFtraining/UsersGuide/Perform.fm2.html)
> For more chunking stuff:
>
> In the case of PETSc I think that means not the WHOLE application, but
> one MPI rank (or perhaps one SMP host running a mixture of MPI ranks
> and OpenMP threads), which is probably always going to be < 4 GB
> (except perhaps in the mixture case).

Output uses a collective write, so the granularity of the IO node is
probably more relevant for writing (e.g., BG/Q would have one IO node
per 128 compute nodes), but almost any chunk size should perform
similarly. It would make a lot more difference for something like
visualization, where subsets of the data are read, typically with
independent IO.

> turning chunking completely off works too

Are you sure? Did you try writing a second time step? The documentation
says that H5S_UNLIMITED requires chunking.

> See above, but note also that there can at most be 64k chunks in the
> file, so fixing the chunk size to 10 MiB means limiting file size to
> 640 GiB.

Thanks for noticing this limit. It might come from the 64k limit on
attribute sizes.

> My suggestion is to give PETSc a little more logic here, something
> like this:
>
> if sizeof(data) > 4GiB * 64k: no chunking # impossible to chunk!
> elif sizeof(data) < small_file_limit: no chunking # probably best for speed
> elif current rank's data size < 4 GB: chunk using current rank's data size

Chunk size needs to be collective. We could compute an average size
from each subdomain, but can't just use the subdomain size.

> else divide current rank's data size by 2**(number of dimensions)
> until < 4 GB and then use that chunk size.
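For concreteness, the proposed decision logic (using an average rank size so the result is collective-safe, per the point above) might look roughly like this in C. Names such as choose_chunk_bytes and small_file_limit are illustrative, not existing PETSc or HDF5 API:

```c
#include <stdint.h>

#define GiB ((uint64_t)1 << 30)
#define MAX_CHUNKS ((uint64_t)1 << 16)   /* 64k-chunk limit noted above */
#define CHUNK_BYTE_LIMIT (4 * GiB)       /* 4 GiB cap on a single chunk */

/* Sketch of the suggested heuristic. Returns a chunk size in bytes,
 * or 0 meaning "do not chunk". avg_rank_bytes would be an average
 * subdomain size agreed on collectively, not the local size. */
static uint64_t choose_chunk_bytes(uint64_t total_bytes,
                                   uint64_t avg_rank_bytes,
                                   uint64_t small_file_limit,
                                   int ndims)
{
    if (total_bytes > CHUNK_BYTE_LIMIT * MAX_CHUNKS)
        return 0;   /* impossible to chunk within both limits */
    if (total_bytes < small_file_limit)
        return 0;   /* small file: chunking unlikely to help */
    uint64_t chunk = avg_rank_bytes;
    while (chunk >= CHUNK_BYTE_LIMIT)
        chunk >>= ndims;  /* halve each of ndims dimensions */
    return chunk;
}
```

Note this keeps the "no chunking" branches from the quoted proposal even though, as discussed above, disabling chunking conflicts with H5S_UNLIMITED; a real implementation would have to pick a fallback chunk size instead.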
We might want the chunk size to be smaller than 4 GiB anyway to avoid
out-of-memory problems for readers and writers. I think the chunk size
(or maximum chunk size) should be settable by the user.
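A user-settable maximum could be applied on top of whatever chunk shape is computed, e.g. by halving the largest extent until the chunk fits. This is only a sketch; limit_chunk and its parameters are hypothetical:

```c
#include <stdint.h>

/* Shrink per-dimension chunk extents (in elements) by halving the
 * largest extent until the chunk occupies at most max_bytes.
 * elem_size is bytes per element. Illustrative only. */
static void limit_chunk(uint64_t dims[], int ndims,
                        uint64_t elem_size, uint64_t max_bytes)
{
    for (;;) {
        uint64_t bytes = elem_size;
        for (int i = 0; i < ndims; i++)
            bytes *= dims[i];
        if (bytes <= max_bytes)
            return;
        int big = 0;  /* halve the largest extent first */
        for (int i = 1; i < ndims; i++)
            if (dims[i] > dims[big])
                big = i;
        if (dims[big] == 1)
            return;   /* cannot shrink any further */
        dims[big] = (dims[big] + 1) / 2;
    }
}
```

Halving the largest extent first keeps the chunk shape close to the subdomain shape, which matters for read patterns like visualization that touch subsets of the data.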
