Hi Wolf,
It is OK to read/write a different amount of elements/data from each
processor. That is not the problem.
The problem is that you cannot have each processor specify a different layout
for the dataset on disk. It is the same kind of problem as having one process
say the layout of the dataset is contiguous while another says it is chunked.
The solution is very simple: just don't adjust the chunk size for the dataset
on the last process.
I modified the replicator that you provided and attached it to demonstrate how
this would work (I didn't do a lot of testing on it, just on my local machine,
but it should work fine).
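The relevant part of the modified writeH5() boils down to keeping the chunk
dimensions identical on every rank and using the per-rank size only for the
memory space and the hyperslab selection:

    /* identical chunk dimensions on every rank: H5Dcreate is collective,
       so all ranks must pass the same creation arguments */
    chunk_dims[0] = nx/iNumOfProc;
    chunk_dims[1] = ny;
    H5Pset_chunk(plist_id, 2, chunk_dims);

    /* the per-rank memory space / hyperslab selection may still differ */
    chunk_dims[0] = nxLocal;   /* smaller on the last rank when nx % iNumOfProc != 0 */
    chunk_dims[1] = ny;
    memspace = H5Screate_simple(2, chunk_dims, NULL);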
Thanks,
Mohamad
-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of Wolf
Dapp
Sent: Wednesday, April 08, 2015 9:53 AM
To: HDF Users Discussion List
Subject: Re: [Hdf-forum] parallel HDF5: H5Fclose hangs when not using a power
of 2 number of processes
Hi Mohamad,
thanks for your reply, and thanks for pointing that out. I realize that each
processor has a different chunk size, but I wasn't aware that this was a problem.
What would you suggest as a workaround or solution? The number of elements per
processor /is/ objectively different, and if I simply give the last process the
/same/ chunk size (without adjusting the file space), the program crashes with:
#001: H5Dio.c line 342 in H5D__pre_write(): file selection+offset not within
extent
major: Dataspace
minor: Out of range
If I set both dimsf[0] and chunk_dims[0] such that the (padded) data fits and
each process writes the same chunk (i.e., if I pad the filespace), then it
works. But then the rest of my workflow breaks down: the file is no longer
32x32 but 33x32, with the last column just zeroes, and in addition the file
would differ depending on how many processes write to it. I suppose I could
somehow resize the filespace to get it back to the proper dimensions? However,
I'd probably have the same problem reading the data back in.
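(If resizing is the right approach, I imagine it would look roughly like this
after the padded collective write, since the dataset is chunked; the true_dims
array below is just an illustrative name, and I haven't tested any of this:)

    hsize_t true_dims[2] = { nx, ny };              /* 32x32 instead of the padded 33x32 */
    status_h5 = H5Dset_extent(dset_id, true_dims);  /* collective call; drops the padded slab */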
Or should I write to the file independently? I suppose I'd pay a hefty
performance price... (in a production run, ~1000 processes write ~20 GB
collectively, repeatedly).
Is there a recommended way to handle this? The only worked example I can find
for collectively writing different numbers of elements writes /nothing at all/
on one of the processes (as opposed to a smaller number of elements than the
others):
http://www.hdfgroup.org/ftp/HDF5/examples/misc-examples/coll_test.c
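(For what it's worth, the pattern there seems to boil down to the empty rank
still joining the collective H5Dwrite, just with nothing selected; in terms of
the names in my program, something like:)

    if (nxLocal == 0) {               /* a rank that owns no data */
        H5Sselect_none(filespace);    /* empty selection in the file space... */
        H5Sselect_none(memspace);     /* ...and in the memory space */
    }
    /* every rank, including the empty one, still makes the collective call */
    H5Dwrite(dset_id, datatype, memspace, filespace, plist_id, data);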
Thanks again for your help!
Wolf
On 04/08/15 16:23, Mohamad Chaarawi wrote:
> Hi Wolf,
>
> I found the problem in your program. Note that the hang vs. the error
> stack (from Tim's email) is just a difference in behavior between MPI
> implementations or versions: one implementation hangs when a call to
> MPI_File_set_size() from inside HDF5 is made with different arguments
> on different processes, and the other actually reports the error.
>
> On to the mistake in your program now: HDF5 requires the call to
> H5Dcreate to be collective. That doesn't just mean that all processes
> have to call it; they all have to call it with the same arguments. You
> are creating a chunked dataset with the same chunk dimensions on every
> process except the last one, where you edit the first dimension
> (nxLocal). This happens here:
>
> if ((nx%iNumOfProc) != 0) {
>     nxLocal += 1;
>     ixStart = myID*nxLocal;
>     if (myID == iNumOfProc-1)
>         nxLocal -= (nxLocal*iNumOfProc-nx); // last proc has less elements
> }
>
> You pass nxLocal to the chunk dimensions here:
> chunk_dims[0] = nxLocal;
>
> As long as 32 % numofprocesses is 0, you don't modify nxLocal on the
> last process, which explains why it works in those situations.
>
> Note that it is OK to read and write to datasets collectively with
> different arguments, but you have to create the dataset with the same
> arguments, including the same chunk dimensions. What you do above causes
> one process to see a dataset with different chunk dimensions in its
> metadata cache. At file close time, when the processes flush their
> metadata caches, that process computes a different file size than the
> other processes, and that is what causes the problem.
>
> Makes sense?
>
> Thanks,
>
> Mohamad
#include <stddef.h> // ptrdiff_t
#include <stdio.h>  // sprintf()
#include <stdlib.h> // calloc(), free()
#include <string>
#include <cassert> // assert()
#include <iostream>
#include <mpi.h>
#include <hdf5.h>
int myID = 0, iNumOfProc = 1, root = 0;
const int nx = 32, ny = nx;
int nxLocal, ixStart;
inline int getLinearIndex(int ix, int iy) { return ix*ny+iy; };
// helper traits to test whether T is double
template<typename T> struct TypeIsDouble { static const bool value = false; };
template<> struct TypeIsDouble< double > { static const bool value = true; };
template<typename T>
void writeH5( const char *name, double * field)
{
// nxLocal == how many cols (of length ny each) does this process own [i.e. the block is nxLocal x ny in size]
// ixStart == (global) index of the beginning of (local) block
// myID == index of local process
// field == local data array, of size nxLocal*ny, contiguous
// ===============================================================
// =================== write output to HDF5 ======================
// ===============================================================
#ifdef VERBOSE
if (myID == root) { std::cout << " # ...writing to file " << name <<
"...\n"; }
#endif // VERBOSE
/* HDF5 APIs definitions */
hid_t file_id, dset_id;/* file and dataset identifiers */
hid_t filespace, memspace;/* file and memory dataspace identifiers */
hsize_t dimsf[2];/* dataset dimensions */
hsize_t chunk_dims[2];/* chunk dimensions */
hsize_t count[2];/* hyperslab selection parameters */
hsize_t stride[2];
hsize_t block[2];
hsize_t offset[2];
hid_t plist_id;/* property list identifier */
herr_t status_h5;
hid_t datatype = (TypeIsDouble< T >::value) ? H5T_NATIVE_DOUBLE :
H5T_NATIVE_FLOAT;
T * data = NULL;
MPI_Info info = MPI_INFO_NULL;
/* Set up file access property list with parallel I/O access */
plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, MPI_COMM_WORLD, info);
/* Create a new file collectively and release property list identifier. */
std::string filename;
filename.append(name);
file_id = H5Fcreate(filename.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, plist_id);
status_h5 = H5Pclose(plist_id);
assert(status_h5 >= 0);
/* Create chunked dataset. */
plist_id = H5Pcreate(H5P_DATASET_CREATE);
status_h5 = H5Pset_layout( plist_id, H5D_CHUNKED ); assert(status_h5 >= 0);
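/* use identical chunk dimensions on every rank: H5Dcreate is collective and all ranks must pass the same creation arguments, so do not adjust this on the last rank */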
chunk_dims[0] = nx/iNumOfProc;
chunk_dims[1] = ny;
status_h5 = H5Pset_chunk(plist_id, 2, chunk_dims); assert(status_h5 >= 0);
/* Create the dataspace for the dataset. */
dimsf[0] = nx;
dimsf[1] = ny;
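/* reuse chunk_dims for the per-rank memory-space dimensions; these may differ across ranks (nxLocal is smaller on the last rank when nx % iNumOfProc != 0) */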
chunk_dims[0] = nxLocal;
chunk_dims[1] = ny;
filespace = H5Screate_simple(2, dimsf, NULL);
memspace = H5Screate_simple(2, chunk_dims, NULL);
std::string fname = name;
std::size_t pos = fname.find("."); // position of the first occurrence of "." in fname
std::string dataset = fname.substr (0,pos);
dset_id = H5Dcreate(file_id, dataset.c_str(), datatype, filespace,
H5P_DEFAULT, plist_id, H5P_DEFAULT); // dataset.c_str() is the data set name
status_h5 = H5Pclose(plist_id); assert(status_h5 >= 0);
status_h5 = H5Sclose(filespace); assert(status_h5 >= 0);
/* Each process defines dataset in memory and writes it to the hyperslab in
the file. */
count[0] = 1; // one block in x-direction
count[1] = 1; // one block in y-direction
stride[0] = 1; // the blocks have no gaps between them in x-direction
stride[1] = 1; // the blocks have no gaps between them in y-direction
block[0] = chunk_dims[0]; // contiguous x-size of block is chunk_dims[0]
block[1] = chunk_dims[1]; // contiguous y-size of block is chunk_dims[1]
offset[0] = ixStart; // x-index [in data file] of (0,0) entry [in memory] of block
offset[1] = 0; // y-index [in data file] of (0,0) entry [in memory] of block
/* Select hyperslab in the file. */
filespace = H5Dget_space(dset_id);
status_h5 = H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, stride,
count, block); assert(status_h5 >= 0);
/* Create property list for collective dataset write. */
plist_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
/* Initialize data buffer; memory layout *may be* different from file
layout, due to padding */
data = (T *) calloc(chunk_dims[0]*chunk_dims[1],sizeof(T));
for(ptrdiff_t ix = 0; ix < nxLocal; ix++)
{
for(ptrdiff_t iy = 0; iy < ny; iy++)
{
ptrdiff_t kmem = getLinearIndex(ix,iy);
ptrdiff_t kfile = ix*ny + iy;
data[kfile] = field[kmem];
}
}
/* Write to file */
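/* collective call: every rank participates, each with its own (possibly different-sized) selection */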
status_h5 = H5Dwrite(dset_id, datatype, memspace, filespace, plist_id,
data); assert(status_h5 >= 0);
free(data);
/* Close/release resources. */
status_h5 = H5Dclose(dset_id); assert(status_h5 >= 0);
status_h5 = H5Sclose(filespace); assert(status_h5 >= 0);
status_h5 = H5Sclose(memspace); assert(status_h5 >= 0);
status_h5 = H5Pclose(plist_id); assert(status_h5 >= 0);
std::cout << "proc"<<myID << " I CAN REACH THIS\n";
status_h5 = H5Fclose(file_id); assert(status_h5 >= 0); // WARNING: number of procs must be a power of 2, else this call deadlocks; no clue why
std::cout << "proc"<<myID << " BUT NOT THIS\n";
}
int main(int argc, char **argv)
{
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &iNumOfProc);
MPI_Comm_rank (MPI_COMM_WORLD, &myID);
if (1.0*nx/iNumOfProc < 4) {
if (myID == root)
std::cerr << " ##### np " << iNumOfProc << ": System too
small.\n\n";
MPI_Finalize();
exit(0);
}
nxLocal = nx/iNumOfProc;
ixStart = myID*nxLocal;
if ((nx%iNumOfProc) != 0) {
nxLocal += 1;
ixStart = myID*nxLocal;
if (myID == iNumOfProc-1)
nxLocal -= (nxLocal*iNumOfProc-nx); // last proc has less elements
}
std::cout << " proc"<<myID << " " << nxLocal << " " << ixStart << " " <<
nxLocal*ny << std::endl;
double * data;
data = (double *) calloc(nxLocal*ny,sizeof(double));
for (int ix = 0; ix < nxLocal; ++ix) {
for (int iy = 0; iy < ny; ++iy) {
data[ix*ny+iy] = double((ix+ixStart)*ny+iy);
}
}
std::string filename = "hangs_test.", fname;
char buffer[50];
sprintf(buffer,"np%02d",(int)iNumOfProc);
filename.append(buffer);
fname = filename; fname.append(".h5");
writeH5<float>(fname.c_str(),data);
free(data);
if (myID == root) std::cout << " ### done.\n\n";
MPI_Finalize();
return(0);
}
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5