Hello,
My program that produces the data is written in C++, and I should be able to
link against the high-level C API library H5TB.
The table has mixed types of ‘fields’ (or columns), e.g. uint64, float, etc.
The table can have up to 80 columns and 10000 rows, but the approach should be
able to scale to larger dimensions. The dimensions of the table are fixed for
each production run of the table.
On the consumer side, I’m considering pandas or PyTables. The program on the
consumer side needs to apply numeric functions along each column, so storing
the columns rather than the rows in contiguous space is much more efficient for
the consumer-side program. Performance is more critical on the consumer side,
as it aggregates output from multiple producer programs.
I’m considering the H5TB high-level API, along with the block-write approach
(i.e. writing a fixed number of rows at a time) proposed by Darryl in this forum:
http://hdf-forum.184993.n3.nabble.com/hdf-forum-Efficient-Way-to-Write-Compound-Data-td193448.html#a193447.
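To make the question concrete, here is a rough sketch of the block-write
pattern I have in mind. The struct, field names, file name, and sizes are just
placeholders (only 3 fields instead of ~80, and the data itself is not filled
in), but it follows my understanding of Darryl's suggestion: create the table
with the first block, then append a fixed number of rows at a time.

#include "hdf5.h"
#include "hdf5_hl.h"
#include <cstdint>
#include <cstddef>
#include <vector>

/* One row (record) of the table; the real table has up to ~80 mixed-type fields. */
struct Record
{
    std::uint64_t id;
    double        value;
    float         weight;
};

int main()
{
    const hsize_t NFIELDS    = 3;
    const hsize_t BLOCK_ROWS = 1000;  /* rows written per block (and per chunk) */
    const hsize_t NBLOCKS    = 10;    /* 10 * 1000 = 10000 rows total */

    const char  *field_names[NFIELDS]  = { "id", "value", "weight" };
    const size_t field_offset[NFIELDS] = { HOFFSET(Record, id),
                                           HOFFSET(Record, value),
                                           HOFFSET(Record, weight) };
    const hid_t  field_types[NFIELDS]  = { H5T_NATIVE_UINT64,
                                           H5T_NATIVE_DOUBLE,
                                           H5T_NATIVE_FLOAT };
    const size_t field_sizes[NFIELDS]  = { sizeof(std::uint64_t),
                                           sizeof(double),
                                           sizeof(float) };

    std::vector<Record> block(BLOCK_ROWS);  /* one block of rows; the producer fills this */

    hid_t file = H5Fcreate("producer.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Create the table and write the first block; chunk_size is given as a row count. */
    H5TBmake_table("Producer output", file, "table",
                   NFIELDS, BLOCK_ROWS, sizeof(Record),
                   field_names, field_offset, field_types,
                   BLOCK_ROWS /* chunk_size */, NULL /* fill */, 0 /* no compression */,
                   block.data());

    /* Append the remaining blocks, a fixed number of rows at a time. */
    for (hsize_t b = 1; b < NBLOCKS; ++b)
        H5TBappend_records(file, "table", BLOCK_ROWS, sizeof(Record),
                           field_offset, field_sizes, block.data());

    H5Fclose(file);
    return 0;
}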
On each write to the file, I would like each column of the block, rather than
each row, to be stored contiguously in the HDF5 file, on the assumption that
this will help performance when PyTables on the consumer side accesses the
table by column (not by row).
The example code on Chunking
(http://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/) shows a chunk_dims array
with 2 elements. For example, if the block has 1000 rows, I would use
chunk_dims[2] = {1000, 1} so that the 1000 rows of each column are stored in a
contiguous piece of memory.
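For reference, my reading of that chunking example is roughly the following (a
plain 2-D dataset of a single element type, not an H5TB compound table; the
file name, dataset name, and sizes are placeholders):

#include "hdf5.h"

int main()
{
    const hsize_t dims[2]       = { 10000, 80 };  /* rows x columns */
    const hsize_t chunk_dims[2] = { 1000, 1 };    /* 1000 rows of one column per chunk */

    hid_t file  = H5Fcreate("chunked.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);

    /* The chunk shape is carried by the dataset creation property list. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk_dims);

    hid_t dset = H5Dcreate2(file, "matrix", H5T_NATIVE_DOUBLE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}

This per-column chunk layout is what I would like to approximate for the
table, so that each chunk contains 1000 values of a single column.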
Does H5TBmake_table() support such a chunking dimension, and if so, what is
the syntax that I would use?
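For reference, this is the H5TBmake_table prototype as I read it from
H5TBpublic.h; chunk_size appears to be a single record count, and I don't see
where a two-dimensional chunk shape like {1000, 1} would be specified:

herr_t H5TBmake_table(const char *table_title, hid_t loc_id, const char *dset_name,
                      hsize_t nfields, hsize_t nrecords, size_t type_size,
                      const char *field_names[], const size_t *field_offset,
                      const hid_t *field_types, hsize_t chunk_size,
                      void *fill_data, int compress, const void *data);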
Thanks!