Hi Maxime,

H5Dwrite is for writing raw data, and unlike HDF5 metadata operations, the
library does not require it to be collective unless you ask for that.
For a list of HDF5 function calls which are required to be collective, look 
here:
http://www.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html

For raw data, we do not detect whether or not you are writing to the same
position in the file, so we just pass the data down to MPI to write
collectively or independently. The idea of collective I/O in MPI is to have
all processes work together to write different portions of the file, not the
same portion. So if you have 2 processes writing values X and Y to the same
offset in the file, both writes will happen and the result is undefined under
MPI semantics. Now if you do collective I/O with 2 processes writing X and Y
to two adjacent positions in the file, MPI-IO would internally have one rank
combine the two writes and execute them itself, rather than issue two smaller
writes from different processes to the parallel file system. This is a very
simple example of what collective I/O is. Of course things get more
complicated with more processes and more data :-)
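To make that concrete, here is a rough sketch (not your code; the file name,
dataset name, and values are made up) of each rank doing one collective
H5Dwrite to its own, adjacent element of a 1-D dataset, so MPI-IO is free to
combine the per-rank pieces into fewer, larger writes:

#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* create the file with the MPI-IO file driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* one element per rank; each rank selects only its own element,
       so the selections are disjoint but adjacent in the file */
    hsize_t dims[1]  = { (hsize_t)nprocs };
    hsize_t start[1] = { (hsize_t)rank };
    hsize_t count[1] = { 1 };

    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t memspace  = H5Screate_simple(1, count, NULL);
    hid_t dset = H5Dcreate2(file, "values", H5T_NATIVE_FLOAT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);

    /* ask for collective raw data I/O on the transfer property list */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    float my_value = (float)rank;   /* X on rank 0, Y on rank 1, ... */
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, dxpl, &my_value);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Pclose(fapl); H5Fclose(file);
    MPI_Finalize();
    return 0;
}

(Built against a parallel HDF5, e.g. with h5pcc, and run under mpiexec; each
rank ends up owning exactly one element of the dataset.)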

Not that it matters here, but note that in your code, you set your dxpl to use 
independent I/O and not collective: 
H5Pset_dxpl_mpio(md_plist_id, H5FD_MPIO_INDEPENDENT);
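If you do at some point want collective raw data I/O, the same transfer
property list is simply set to collective instead:
H5Pset_dxpl_mpio(md_plist_id, H5FD_MPIO_COLLECTIVE);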

As for the attribute, it depends on what you need. I don’t know what sort of
metadata you want to store. If that metadata is related to every large dataset
that you write, then you should create that attribute on every large dataset.
If it is metadata for the entire file, then you can just create it on the root
group "/" (note this is not a dataset but a group object; those are 2
different HDF5 objects. Look into the HDF5 user guide if you need more
information). Note that attribute operations are regarded as HDF5 metadata
operations, unlike H5Dread and H5Dwrite, and are always required to be
collective and called with the same parameters and values from all processes.
HDF5 internally manages the metadata cache operations to the file system in
that case, so you don't end up writing to the file multiple times, as was
happening with your raw data writes via H5Dwrite.
Note also that if you call H5LTset_attribute_string() twice with the same
attribute name, the older one is overwritten. So it really depends on what you
want to store as metadata and how.
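For example, a file-level string attribute on the root group can be set with
the same high-level H5LT call you are already using (the attribute name and
value below are just placeholders), with every rank making the identical call:

/* file_id is the file opened collectively by all ranks; every rank
   passes exactly the same arguments, and HDF5 writes the attribute
   metadata to the file only once */
H5LTset_attribute_string(file_id, "/", "run_description", "some value");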

Thanks,
Mohamad

-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Maxime Boissonneault
Sent: Wednesday, January 28, 2015 9:43 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] File keeps being updated long after the dataset is 
closed

Hi Mohamad,
On 2015-01-28 10:32, Mohamad Chaarawi wrote:
> Ha.. I bet that writeMetaDataDataset  is the culprit here...
> so you are saying that you create a scalar dataset (with 1 element), and then 
> write that same element n (n being the number of processes) the same time 
> from all processes? Why would you need to do such a thing in the first place? 
> If you need to write that element, you should just call writeMetaDataDataset 
> from rank 0. If you don't need that float, then you should just not write it 
> at all.
I was under the impression that HDF5 (or MPI IO) managed under the hood which 
process actually wrote data, and that such a small dataset would end up being 
written only by one rank. I actually thought that H5Dwrite, H5*close *needed* 
to be called by all processes, i.e. that they were collective.

I guess that at least H5Fclose is collective, since all processes need to close 
the file. Are the other ones not collective?
> You called the metadata dataset an empty dataset essentially, so I understand 
> that you don't need it? If that is the case, then why not create the 
> attributes on the root group, or a different sub group for the current run, 
> or even the large dataset?
I did not know that there was a default, root, dataset. So you are saying that 
I can simply call H5LTset_attribute_string(file_id, "root", key, value) without 
creating a dataset first?

I do not attach the metadata to the large dataset, because it is collective 
metadata and there may be more than one large dataset in the same file.
>
> What you are causing is having every process grab a lock on that file system 
> OST block, write that element and then release the lock. This is happening 
> 960 times in your case, which I interpret what is causing this performance 
> degradation..
This makes sense. I will test and make changes accordingly.

Maxime

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
