Hi HDF users,

I am trying to write a 1D array from several processes in a scattered, unordered way. Each proc has a subset of the array to write, but its elements are non-contiguous and unsorted. Each proc knows the position at which it should write each of its elements.
With some help from the thread http://hdf-forum.184993.n3.nabble.com/HDF5-Parallel-write-selection-using-hyperslabs-slow-write-tp3935966.html I tried to implement it.

First, the master proc (only that one) creates the file:

    // create the file
    hid_t fid = H5Fcreate(name.c_str(), H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    // prepare the file space
    hsize_t dims[2]     = {1, global_nElements};
    hsize_t max_dims[2] = {H5S_UNLIMITED, global_nElements}; // not really needed, but for future use
    hid_t file_space = H5Screate_simple(2, dims, max_dims);

    // prepare the dataset creation property list
    hid_t plist = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_layout(plist, H5D_CHUNKED);
    hsize_t chunk_dims[2] = {1, global_nElements};
    H5Pset_chunk(plist, 2, chunk_dims);

    // create the dataset
    hid_t did = H5Dcreate(fid, "Id", H5T_NATIVE_UINT, file_space, H5P_DEFAULT, plist, H5P_DEFAULT);

    H5Dclose(did);
    H5Pclose(plist);
    H5Sclose(file_space);
    H5Fclose(fid);

Then, all procs open the file and write their subset:

    // define MPI file access
    hid_t file_access = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(file_access, MPI_COMM_WORLD, MPI_INFO_NULL);

    // define the MPI transfer mode
    hid_t transfer = H5Pcreate(H5P_DATASET_XFER);

    // open the file
    hid_t fid = H5Fopen(name.c_str(), H5F_ACC_RDWR, file_access);

    // open the existing dataset
    hid_t did = H5Dopen(fid, dataset.c_str(), H5P_DEFAULT);

    // get the file space
    hid_t file_space = H5Dget_space(did);

    // define the memory space for this proc
    hsize_t count[2] = {1, (hsize_t) local_nElements};
    hid_t mem_space = H5Screate_simple(2, count, NULL);

    // select the elements for this particular proc
    // (the `coords` array was created beforehand; see the sketch below)
    H5Sselect_elements(file_space, H5S_SELECT_SET, local_nElements, coords);

    // write the previously generated `data` array
    H5Dwrite(did, H5T_NATIVE_UINT, mem_space, file_space, transfer, data);

    // close stuff
    H5Sclose(file_space);
    H5Dclose(did);
    H5Fclose(fid);

This version works but is VERY SLOW: more than 10 times slower than writing with 1 proc without H5Sselect_elements. Is this to be expected? Is there a way to make it faster?
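For reference, since the dataspace has rank 2, `coords` holds one (row, column) pair per element, flattened into a single hsize_t array of length 2 * local_nElements. A minimal sketch of how it is built, assuming a hypothetical `positions` array that stores the global index of each local element:

    // Sketch only: build the coordinate list expected by H5Sselect_elements.
    // `positions[i]` (an assumption, not shown above) holds the global index
    // of the i-th local element in the 1D array.
    hsize_t *coords = new hsize_t[2 * local_nElements];
    for (unsigned int i = 0; i < local_nElements; i++) {
        coords[2*i]     = 0;             // row 0: the dataset has a single row
        coords[2*i + 1] = positions[i];  // column: position in the global 1D array
    }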
Using H5Pget_mpio_actual_io_mode, I realized that it was not using collective transfer, so I tried to force it with the following:

    H5Pset_dxpl_mpio(transfer, H5FD_MPIO_COLLECTIVE);

But unfortunately, I get tons of the following error:

HDF5-DIAG: Error detected in HDF5 (1.8.14) MPI-process 0:
  #000: H5Dio.c line 271 in H5Dwrite(): can't prepare for writing data
    major: Dataset
    minor: Write failed
  #001: H5Dio.c line 352 in H5D__pre_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: H5Dio.c line 788 in H5D__write(): can't write data
    major: Dataset
    minor: Write failed
  #003: H5Dmpio.c line 757 in H5D__chunk_collective_write(): write error
    major: Dataspace
    minor: Write failed
  #004: H5Dmpio.c line 685 in H5D__chunk_collective_io(): couldn't finish linked chunk MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #005: H5Dmpio.c line 881 in H5D__link_chunk_collective_io(): couldn't finish shared collective MPI-IO
    major: Data storage
    minor: Can't get value
  #006: H5Dmpio.c line 1401 in H5D__inter_collective_io(): couldn't finish collective MPI-IO
    major: Low-level I/O
    minor: Can't get value
  #007: H5Dmpio.c line 1445 in H5D__final_collective_io(): optimized write failed
    major: Dataset
    minor: Write failed
  #008: H5Dmpio.c line 297 in H5D__mpio_select_write(): can't finish collective parallel write
    major: Low-level I/O
    minor: Write failed
  #009: H5Fio.c line 171 in H5F_block_write(): write through metadata accumulator failed
    major: Low-level I/O
    minor: Write failed
  #010: H5Faccum.c line 825 in H5F__accum_write(): file write failed
    major: Low-level I/O
    minor: Write failed
  #011: H5FDint.c line 246 in H5FD_write(): driver write request failed
    major: Virtual File Layer
    minor: Write failed
  #012: H5FDmpio.c line 1802 in H5FD_mpio_write(): MPI_File_set_view failed
    major: Internal error (too specific to document in detail)
    minor: Some MPI function failed
  #013: H5FDmpio.c line 1802 in H5FD_mpio_write(): MPI_ERR_ARG: invalid argument of some other kind
    major: Internal error (too specific to document in detail)
    minor: MPI Error String

The same happens with both HDF5 1.8.14 and 1.8.15. Any idea how to fix this?

Thank you,
Fred
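P.S. For completeness, here is roughly how the collective transfer is requested and how I check the actual I/O mode afterwards (a sketch; `transfer`, `did`, `mem_space`, `file_space` and `data` are the handles and buffer from the write code above):

    // request collective MPI-IO for the write
    // (this is the call that triggers the errors shown above)
    H5Pset_dxpl_mpio(transfer, H5FD_MPIO_COLLECTIVE);

    H5Dwrite(did, H5T_NATIVE_UINT, mem_space, file_space, transfer, data);

    // query which I/O mode was actually used for the last write on this plist;
    // with the independent version this reports H5D_MPIO_NO_COLLECTIVE,
    // which is how I noticed that no collective transfer was taking place
    H5D_mpio_actual_io_mode_t actual_io_mode;
    H5Pget_mpio_actual_io_mode(transfer, &actual_io_mode);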
