Hi,

> Are you experiencing some catastrophic resource
> consumption with 4096?
OK, to try to roughly characterize the performance for large numbers
of datasets, I ran the following stress test in C:

--- hdf5_stress_test.c ---
#include <stdio.h>
#include <stdlib.h>

#include "H5LT.h"

int
main(void)
{
  hid_t file_id;
  hsize_t dims[2];
  int data[256];       /* 16x16 payload; contents don't matter for this test */
  char dset_name[32];  /* e.g. "/dset_0000042" */
  herr_t status;
  int i;
  int total = 1000000;

  dims[0] = 16;
  dims[1] = 16;

  file_id = H5Fcreate("hdf5_stress_test.h5", H5F_ACC_TRUNC,
                      H5P_DEFAULT, H5P_DEFAULT);
  if (file_id < 0) {
    fprintf(stderr, "H5Fcreate failed\n");
    return EXIT_FAILURE;
  }

  /* create one small 16x16 integer dataset per iteration */
  for (i = 0; i < total; ++i) {
    sprintf(dset_name, "/dset_%07d", i);
    status = H5LTmake_dataset(file_id, dset_name, 2, dims,
                              H5T_NATIVE_INT, data);
    if (status < 0) {
      fprintf(stderr, "H5LTmake_dataset failed at dataset %d\n", i);
      return EXIT_FAILURE;
    }
    if (!(i % 1000)) {
      printf("\r[%07d/%07d]", i, total);
      fflush(stdout);  /* only flush when the progress line changes */
    }
  }
  printf("\n");

  status = H5Fclose(file_id);
  if (status < 0)
    fprintf(stderr, "H5Fclose failed\n");

  return 0;
}
--- ---
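(Something like "h5cc hdf5_stress_test.c -o hdf5_stress_test -lhdf5_hl"
should build it; h5cc is the compiler wrapper that ships with HDF5, and
-lhdf5_hl may not be needed if your h5cc already links the high-level
library that provides H5LTmake_dataset.)
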
This seemed to run OK, taking ~4m30s and never using more than 130MB of
RAM. It produced a 1.4GB file, which seems like quite a bit of overhead:
the raw data alone is 256 ints * 4 bytes * 10^6 datasets =
(256*4*10^6)/2^20 = ~976MB, so presumably the rest is metadata that HDF5
stores at a roughly constant cost per dataset.
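(Taking 1.4GB as roughly 1.4*10^9 bytes, the difference works out to
(1.4*10^9 - 1.024*10^9)/10^6 = ~380 bytes of overhead per dataset.)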
However, when I run h5dump on the file it just sits there and produces
no output (I have waited ~10 minutes). All of this suggests that HDF5
might not be the best choice for my needs. Can anyone recommend an
alternative that is known to scale better?

Thanks,
James
