Hello, I am updating a Monte Carlo particle transport code written in Fortran to run on Titan (http://en.wikipedia.org/wiki/Titan_%28supercomputer%29), and I'm having trouble achieving good performance when writing to many small sections of large datasets with independent I/O. Monitoring with du, I see only about 50 MB/s of write throughput while creating a 78 GB file. I was hoping for much better from the Lustre filesystem.
For my current case I'm writing to 3000 1D datasets in the same file, each containing 3,534,960 doubles, where each individual write is always a contiguous 1D chunk of 206 doubles. These writes come from 980 MPI processes, and each chunk of 206 could land in any of the 3000 datasets, anywhere within the large 1D array. I do things in batches: on each MPI process I collect a bunch of these chunks before doing a round of writing.

It's trivial to calculate where in each of the large datasets each chunk should go, so at first I just looped through the chunks and selected the appropriate hyperslab before each small write (a sketch of that first version is included after the example code below). I figured this might be slow because it jumps all around the dataset for each little write, so I switched to a scheme where I first sort the writes into the order they appear in the dataset (also sketched below), then build one large union of irregularly-spaced hyperslabs and do a single big write, similar to the difference discussed here: http://hdf-forum.184993.n3.nabble.com/reading-multiple-irregularly-spaced-hyperslabs-td426929.html . This gave me about a 2-3x speedup from where I was, but I'm still pretty unsatisfied with these write speeds.

I'm still a little new to HDF5, so I'm worried I might be missing something fundamental. I've done a little playing around with setting the chunking on these datasets to multiples of 206 (roughly as in the last sketch below), but I haven't had any success there. Is there a way to accomplish the write pattern I described more efficiently than what I'm doing?

Example code below showing what I'm doing now. This is with HDF5 1.8.11. Thanks in advance for any advice!

Nick

---

  use iso_c_binding, only: c_ptr, c_loc

  integer(HID_T)   :: file_id, group_id, dset, dspace, memspace, plist
  integer(HSIZE_T) :: dims(1), block(1), count(1), start(1)
  type(c_ptr)      :: f_ptr
  integer          :: hdf5_err, i, j, idx

  ! All the datasets are created up front once
  dims(1) = 3534960
  do i = 1, 3000
    call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, hdf5_err)
    call h5gopen_f(file_id, groupname(i), group_id, hdf5_err)
    call h5screate_simple_f(1, dims, dspace, hdf5_err)
    call h5dcreate_f(group_id, 'data', H5T_NATIVE_DOUBLE, dspace, dset, hdf5_err)
    call h5dclose_f(dset, hdf5_err)
    call h5sclose_f(dspace, hdf5_err)
    call h5gclose_f(group_id, hdf5_err)
    call h5fclose_f(file_id, hdf5_err)
  end do

  ...

  ! The rest is run on each MPI process, which has n_chunks chunks of 206 to write.
  ! The (1-based) chunk index within the large dataset for chunk j is stored in
  ! chunk_start(j), so its element offset is (chunk_start(j) - 1) * 206.
  ! The chunks themselves are stored in one large contiguous array of doubles: chunks

  ! Set up the file access property list with parallel I/O access
  call h5pcreate_f(H5P_FILE_ACCESS_F, plist, hdf5_err)
  call h5pset_fapl_mpio_f(plist, MPI_COMM_WORLD, MPI_INFO_NULL, hdf5_err)

  ! Open the file
  call h5fopen_f(filename, H5F_ACC_RDWR_F, file_id, hdf5_err, access_prp = plist)

  ! Close the file access property list
  call h5pclose_f(plist, hdf5_err)

  ! Create the transfer property list describing independent parallel I/O
  call h5pcreate_f(H5P_DATASET_XFER_F, plist, hdf5_err)
  call h5pset_dxpl_mpio_f(plist, H5FD_MPIO_INDEPENDENT_F, hdf5_err)

  do i = 1, 3000
    block = 206
    count = 1

    ! Open the group and the dataset
    call h5gopen_f(file_id, groupname(i), group_id, hdf5_err)
    call h5dopen_f(group_id, 'data', dset, hdf5_err)

    ! Get the file dataspace and create the memory space
    call h5dget_space_f(dset, dspace, hdf5_err)
    call h5screate_simple_f(1, block * n_chunks, memspace, hdf5_err)

    ! Build the union of irregularly-spaced hyperslabs
    call h5sselect_none_f(dspace, hdf5_err)
    do j = 1, n_chunks
      idx = chunk_start(j)
      start = (idx - 1) * block
      ! Add this chunk's hyperslab to the selection
      call h5sselect_hyperslab_f(dspace, H5S_SELECT_OR_F, start, &
           count, hdf5_err, block = block)
    end do

    ! Write the data (chunks holds the sorted 206-double chunks for this dataset)
    chunks = chunks_for_group(i)
    f_ptr = c_loc(chunks(1))
    call h5dwrite_f(dset, H5T_NATIVE_DOUBLE, f_ptr, hdf5_err, &
         file_space_id = dspace, mem_space_id = memspace, &
         xfer_prp = plist)

    ! Close the dataspace, memory space, dataset, and group
    call h5sclose_f(dspace, hdf5_err)
    call h5sclose_f(memspace, hdf5_err)
    call h5dclose_f(dset, hdf5_err)
    call h5gclose_f(group_id, hdf5_err)
  end do

  ! Close the transfer property list and the file
  call h5pclose_f(plist, hdf5_err)
  call h5fclose_f(file_id, hdf5_err)
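For reference, here is roughly what the first (per-chunk) version looked like before I switched to the hyperslab union. This is a simplified sketch rather than the exact code; it reuses the same variables as above plus a small 206-element memory space:

  ! Sketch of the original approach: one small independent write per 206-double
  ! chunk, re-selecting the file hyperslab each time
  call h5screate_simple_f(1, block, memspace, hdf5_err)
  do j = 1, n_chunks
    start = (chunk_start(j) - 1) * block
    call h5sselect_hyperslab_f(dspace, H5S_SELECT_SET_F, start, &
         count, hdf5_err, block = block)
    f_ptr = c_loc(chunks((j - 1) * 206 + 1))
    call h5dwrite_f(dset, H5T_NATIVE_DOUBLE, f_ptr, hdf5_err, &
         file_space_id = dspace, mem_space_id = memspace, &
         xfer_prp = plist)
  end do
  call h5sclose_f(memspace, hdf5_err)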
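This is a minimal sketch of the sort step I mentioned (my real code differs, but the idea is the same): order the chunk indices ascending within each dataset and carry the 206-double payloads along, so that the memory buffer matches the order of the hyperslab union.

  ! Sort chunk_start ascending and permute the 206-double payloads in chunks
  ! to match; a simple selection sort, since n_chunks per batch is small
  ! compared to the cost of the I/O itself
  subroutine sort_chunks(n_chunks, chunk_start, chunks)
    integer, intent(in)    :: n_chunks
    integer, intent(inout) :: chunk_start(n_chunks)
    real(8), intent(inout) :: chunks(206 * n_chunks)

    integer :: j, k, kmin, itmp
    real(8) :: tmp(206)

    do j = 1, n_chunks - 1
      ! Find the smallest remaining chunk index
      kmin = j
      do k = j + 1, n_chunks
        if (chunk_start(k) < chunk_start(kmin)) kmin = k
      end do
      if (kmin /= j) then
        ! Swap the chunk indices
        itmp = chunk_start(j)
        chunk_start(j) = chunk_start(kmin)
        chunk_start(kmin) = itmp
        ! Swap the corresponding 206-double payloads
        tmp = chunks((j - 1) * 206 + 1 : j * 206)
        chunks((j - 1) * 206 + 1 : j * 206) = chunks((kmin - 1) * 206 + 1 : kmin * 206)
        chunks((kmin - 1) * 206 + 1 : kmin * 206) = tmp
      end if
    end do
  end subroutine sort_chunks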
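Finally, this is the kind of thing I tried when experimenting with chunking at dataset creation time. The multiple of 206 shown here (64 chunks per HDF5 chunk) is just one of the values I played with, not necessarily a good one:

  ! Create the dataset with a chunked layout instead of the default contiguous one
  integer(HID_T)   :: dcpl
  integer(HSIZE_T) :: chunk_dims(1)

  call h5pcreate_f(H5P_DATASET_CREATE_F, dcpl, hdf5_err)
  chunk_dims(1) = 206 * 64   ! some multiple of the 206-double write size
  call h5pset_chunk_f(dcpl, 1, chunk_dims, hdf5_err)
  call h5dcreate_f(group_id, 'data', H5T_NATIVE_DOUBLE, dspace, dset, hdf5_err, &
       dcpl_id = dcpl)
  call h5pclose_f(dcpl, hdf5_err)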
