Hi,
> Are you experiencing some catastrophic resource
> consumption with 4096?
OK, to try to roughly characterize the performance for large numbers
of datasets, I ran the following stress test in C:
--- hdf5_stress_test.c ---
#include <stdio.h>

#include "hdf5.h"
#include "H5LT.h"

int
main(void)
{
    hid_t   file_id;
    hsize_t dims[2] = {16, 16};   /* each dataset is a 16x16 block of ints */
    int     data[256] = {0};      /* 16*16 = 256 elements, zero-initialized */
    char    dset_name[32];
    herr_t  status;
    int     i;
    int     total = 1000000;

    file_id = H5Fcreate("hdf5_stress_test.h5", H5F_ACC_TRUNC,
                        H5P_DEFAULT, H5P_DEFAULT);

    /* create one small integer dataset per iteration */
    for (i = 0; i < total; ++i) {
        snprintf(dset_name, sizeof(dset_name), "/dset_%07d", i);
        status = H5LTmake_dataset(file_id, dset_name, 2, dims,
                                  H5T_NATIVE_INT, data);
        if (!(i % 1000)) {        /* progress indicator every 1000 datasets */
            printf("\r[%07d/%07d]", i, total);
            fflush(stdout);
        }
    }

    status = H5Fclose(file_id);
    return 0;
}
--- ---
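(For reference, I'm assuming the standard HDF5 compiler wrapper here:
something like "h5cc hdf5_stress_test.c -o hdf5_stress_test" should
build it, with -lhdf5_hl added if the high-level library isn't linked
in by default.)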
This seemed to run OK, taking ~4m30s and never using more than 130MB of
RAM. It produced a file of 1.4GB, which seems to be quite a bit of
overhead: the raw data alone should be (256*4*10^6)/(2**20) = 976MB, so
the rest is presumably per-dataset metadata stored at a roughly constant
cost (a few hundred bytes per dataset, going by these numbers). However,
when I run h5dump on the file it just sits there and produces no output
(I have waited ~10 minutes). These results suggest that HDF5 might not
be the best choice for my needs. Can anyone recommend an alternative
that is known to scale better?
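As a rough sanity check that the file itself is still readable even
though a full h5dump is impractical, a minimal sketch along the lines
below (the file and "/dset_0000000" names just follow the test above)
should read back a single dataset by name; similarly, running
"h5dump -d /dset_0000000 hdf5_stress_test.h5" should print just that
one dataset rather than walking all of them.
--- read_one_dataset.c (sketch) ---
#include <stdio.h>

#include "hdf5.h"
#include "H5LT.h"

int
main(void)
{
    hid_t  file_id;
    int    data[256];   /* one 16x16 dataset, as written by the stress test */
    herr_t status;

    file_id = H5Fopen("hdf5_stress_test.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    if (file_id < 0) {
        fprintf(stderr, "could not open hdf5_stress_test.h5\n");
        return 1;
    }

    /* read a single named dataset into a flat int buffer */
    status = H5LTread_dataset_int(file_id, "/dset_0000000", data);
    printf("read status: %d, first element: %d\n", (int)status, data[0]);

    status = H5Fclose(file_id);
    return 0;
}
--- ---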
Thanks,
James