Hi,

I have a serious performance issue using PHDF5 to write a large number of 1D
float arrays on clusters when the number of processors exceeds about 96.

I profiled the code and it shows that most of the MPI time is spent on 
H5Dcreate.

The writes themselves (independent I/O) are pretty quick. Are there any ways
to speed up the collective object definition?

Ideally ones that don't involve tailoring settings to a specific cluster.

Here is the function that is slow to finish (and often hangs, possibly from
running out of memory) on more than ~96 processors:

herr_t ASDF_define_waveforms(hid_t loc_id, int num_waveforms, int nsamples,
                            long long int start_time, double sampling_rate,
                            char *event_name, char **waveform_names,
                            int *data_id) {
  int i;
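  char char_sampling_rate[32];
  char char_start_time[32];

  // Format the start time and sampling rate as decimal strings. The buffers
  // are large enough that a full long long value is not truncated.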
  snprintf(char_start_time, sizeof(char_start_time), "%lld", start_time);
  snprintf(char_sampling_rate,
           sizeof(char_sampling_rate), "%1.7f", sampling_rate);

  for (i = 0; i < num_waveforms; ++i) {
    //CHK_H5(groups[i] = H5Gcreate(loc_id, waveform_names[i],
    //                      H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT));

    hid_t space_id, dcpl;
    hsize_t dims[1] = {nsamples}; // Length of waveform
    hsize_t maxdims[1] = {H5S_UNLIMITED};

    CHK_H5(space_id= H5Screate_simple(1, dims, maxdims));
    CHK_H5(dcpl = H5Pcreate(H5P_DATASET_CREATE));
    CHK_H5(H5Pset_chunk(dcpl, 1, dims));

    CHK_H5(data_id[i] = H5Dcreate(loc_id, waveform_names[i], H5T_IEEE_F32LE,
                                  space_id, H5P_DEFAULT, dcpl, H5P_DEFAULT));

    CHK_H5(ASDF_write_string_attribute(data_id[i], "event_id",
                                       event_name));
    CHK_H5(ASDF_write_double_attribute(data_id[i], "sampling_rate",
                                       sampling_rate));
    CHK_H5(ASDF_write_integer_attribute(data_id[i], "starttime",
                                       start_time));

    CHK_H5(H5Pclose(dcpl));
    CHK_H5(H5Sclose(space_id));
  }
  return 0; // Success
}
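
CHK_H5 is an error-checking macro whose definition is not shown here. As a
minimal sketch of what such a macro might look like, assuming it only needs to
catch the negative return values that HDF5 calls use to signal failure (the
definition and the check_status helper below are hypothetical stand-ins, not
the actual code):

```c
#include <stdio.h>

/* Hypothetical stand-in for CHK_H5 (the real definition is not shown in
 * this post). HDF5 calls return a negative value on failure, so the macro
 * reports the failing expression and returns -1 from the enclosing
 * function. */
#define CHK_H5(expr)                                            \
  do {                                                          \
    if ((expr) < 0) {                                           \
      fprintf(stderr, "%s:%d: HDF5 call failed: %s\n",          \
              __FILE__, __LINE__, #expr);                       \
      return -1;                                                \
    }                                                           \
  } while (0)

/* Hypothetical helper: `status` plays the role of an HDF5 return value.
 * Returns the value unchanged on success and -1 on failure. */
int check_status(int status) {
  int result;
  CHK_H5(result = status); /* same assignment-inside-macro pattern as above */
  return result;
}
```

With this sketch, any failing call inside the loop makes
ASDF_define_waveforms return -1 immediately, without closing dcpl or space_id.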

It is called from Fortran code inside three nested do loops, like this:

do k = 1, mysize
  do j = 1, num_stations_rank(k)
    do i = 1, 3
      call ASDF_define_waveforms(...)
    enddo
  enddo
enddo

So when mysize > 96, this adds up to a large number of calls. Any help is
appreciated.
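
To give a sense of scale, here is a rough sketch of how the number of
collective H5Dcreate calls grows with the loop bounds above. The helper name
count_dataset_creates is hypothetical, and it assumes every
ASDF_define_waveforms call creates num_waveforms datasets:

```c
#include <stddef.h>

/* Hypothetical helper: total number of H5Dcreate calls issued by the
 * triple loop. num_stations_rank[k] is the station count for rank k
 * (0-based here, 1-based in the Fortran loop), each station has 3
 * components, and each ASDF_define_waveforms call creates num_waveforms
 * datasets. */
long long count_dataset_creates(const int *num_stations_rank, size_t mysize,
                                int num_waveforms) {
  long long total = 0;
  for (size_t k = 0; k < mysize; ++k) {
    total += 3LL * num_stations_rank[k] * num_waveforms;
  }
  return total;
}
```

As an illustration with made-up numbers, 96 ranks with 10 stations each and
one waveform per call would give 96 * 10 * 3 = 2880 collective dataset
creations, each followed by three attribute writes.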

Thanks,
James

