Re: [Hdf-forum] Parallel hdf5 and writing region references

Mohamad Chaarawi Mon, 23 Nov 2015 13:16:52 -0800

Hi Pramod,

From: Hdf-forum [mailto:[email protected]] On Behalf Of 
pramod kumbhar
Sent: Monday, November 23, 2015 9:25 AM
To: [email protected]
Subject: [Hdf-forum] Parallel hdf5 and writing region references


Dear All,

Apologies for cross-posting! If anyone has come across the issue below, please 
let me know.

I was trying to write a compound datatype (containing a region reference) in 
parallel using a collective operation. With h5dump I saw that the region 
references were not created correctly. I looked into the "Collective Calling 
Requirements" document and then realised that writing region references is not 
supported in parallel I/O. I tried a simple test example, which gives the error 
"H5Dio.c line 664 in H5D__write(): Parallel IO does not support writing region 
reference datatypes yet".

But note that this error doesn't appear if I am writing a compound datatype 
containing a region reference. Is this allowed/possible? (i.e. first create a 
region reference, store it as a member of a compound datatype, and then write 
the compound datatype in parallel collectively).

[msc] This isn’t possible to do, and the fact that it does not return the same 
failure as writing a region reference directly indicates that this is a bug in 
the library and we should address it. I created a ticket for this 
(HDFFV-9619<https://jira.hdfgroup.org/browse/HDFFV-9619>).

The dataset that I am creating is for neuron cells, which are very diverse in 
terms of size. Each MPI rank is writing variable-length data to the dataset, 
and I thought region references would be helpful, but now I have this issue 
while writing them.

What are possible alternatives? I can think of:
- write a <count,offset> pair to emulate a region reference? (but this loses 
flexibility, as other users are going to use tools/Python libraries to read the 
datasets and would then have to use the count+offset pair)
- another option could be to write the datasets first and then have a single 
rank create and write the region references. (each cell has thousands of 
region references, and for large simulations this may not scale (?))
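
The first option above can be sketched roughly as follows. This is only an 
illustration in plain Python, not HDF5 code: the flat list stands in for a 1-D 
dataset, the names build_index and read_record are hypothetical, and in a real 
parallel run each rank would presumably obtain its starting offset from an 
exclusive prefix sum over the per-rank counts (e.g. MPI_Exscan) rather than 
from a serial loop.

```python
from itertools import accumulate


def build_index(counts):
    """Return (offset, count) pairs for records stored back to back.

    The offset of record i is the sum of the counts of all earlier
    records, i.e. an exclusive prefix sum over the counts.
    """
    offsets = [0] + list(accumulate(counts))[:-1]
    return list(zip(offsets, counts))


def read_record(flat, index, i):
    """Recover record i from the flat array using its (offset, count) pair."""
    offset, count = index[i]
    return flat[offset:offset + count]


# Hypothetical data: three cells with different numbers of values.
cells = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

# The flat list stands in for the dataset that holds all values.
flat = [value for cell in cells for value in cell]

# The index stands in for a second dataset with one (offset, count) row per cell.
index = build_index([len(cell) for cell in cells])
```

A reader then needs only the index row for a cell to slice its values out of 
the flat array, which is the extra step that tools and scripts would have to 
implement in place of dereferencing a region reference.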

[msc] Unfortunately, variable-length data and region references are not 
supported in parallel. There haven’t been enough use cases to push support for 
them. I believe there is a way to add support, but it does require some 
engineering effort to do so.
Another approach you may try is one file per process, but I’m not sure if 
you want to deal with a large number of files post data generation. Or you 
could access the file one process at a time in a round-robin fashion (poor 
man’s parallel I/O), but again this might not be very scalable.

Maybe others who have experience working with region references and VL 
datatypes have other alternatives and can chime in. I haven’t dealt with use 
cases in parallel HDF5 that required VL datatypes.

Thanks,
Mohamad

Any other suggestions?

Thanks,
Pramod
