Hi Efim,
Unfortunately, chunking plus compression doesn’t help much with
variable-length datatypes. A variable-length dataset is stored as an array of
heap pointers, and the data those pointers refer to lives in the heap rather
than in the chunks, so the bulk of the dataset doesn’t participate in any
compression.
On the other hand, your record size is large enough that you could set up
your storage as a collection of scalar datasets. Since there is just one
element per dataset, you can make the datatype whatever the size of the row
is and use a compression filter. So rather than accessing a row via an index
into a dataset, you’d access a dataset via a link name (which could just be a
stringified version of a numeric index).
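
A minimal sketch of this layout in Python with h5py, under a few assumptions
that are not part of the advice above: the helper names (link_name,
write_record, read_record), the "records" group and the gzip filter are
illustrative choices. One deliberate departure: as far as I know, h5py refuses
filter options on truly scalar datasets, so each record is stored here as a
1-D byte array held in a single chunk rather than as one element of a
row-sized datatype. The one-dataset-per-record idea is the same.

```python
import h5py
import numpy as np


def link_name(index):
    """Turn an N-dimensional index such as (3, 7) into the link name "3_7"."""
    return "_".join(str(i) for i in index)


def write_record(group, index, payload):
    """Store one non-empty record (bytes) as its own compressed dataset."""
    data = np.frombuffer(payload, dtype=np.uint8)
    # A single chunk spanning the whole record lets the filter compress it
    # in one piece.
    return group.create_dataset(
        link_name(index), data=data, chunks=data.shape, compression="gzip"
    )


def read_record(group, index):
    """Return the record bytes, or None if nothing was stored at this index."""
    name = link_name(index)
    if name not in group:
        return None  # sparse: a missing link costs no space in the file
    return group[name][()].tobytes()


# In-memory demo (core driver, nothing is written to disk).
with h5py.File("demo.h5", "w", driver="core", backing_store=False) as f:
    records = f.create_group("records")
    write_record(records, (3, 7), b"example payload " * 4096)
    restored = read_record(records, (3, 7))
    missing = read_record(records, (0, 0))
```

Because only the records that exist get a link, sparseness is handled by the
group itself, and a lookup touches just the one dataset being read.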
If you go this route, use the “libver=latest” option when opening the file.
Recent changes in the file format have made accessing objects from a large
group collection much more efficient.
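
For illustration, this is roughly how the option looks from h5py, where it is
the libver keyword of h5py.File. The file name, the "records" group and the
number of links are placeholders, and the core driver is used only so that
the sketch runs without writing to disk.

```python
import h5py

# libver="latest" lets the library use its newest file-format features,
# including the newer group storage.
with h5py.File("demo_store.h5", "w", libver="latest",
               driver="core", backing_store=False) as f:
    records = f.create_group("records")
    # One tiny dataset per link, named by its stringified index.
    for i in range(2000):
        records.create_dataset(str(i), data=i)
    n_links = len(records)
    value = int(records["1234"][()])  # look one object up by link name
```

Note that a file written this way may not be readable by older versions of
the HDF5 library.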
John
From: Hdf-forum <[email protected]> on behalf of Efim
Dyadkin <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Monday, June 5, 2017 at 3:40 PM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] one element per chunk
Hi,
I need to implement storage for data with the following properties:
1) multi-dimensional, unlimited-size dataset of variable-length records
2) may be highly sparse
3) usually randomly accessed, one record at a time
4) each record may vary in size from tens of kilobytes to tens of
megabytes
I am thinking of an unlimited chunked dataspace. However, to make it efficient
in terms of disk space and access time, I need my chunks to be as small as one
element. Could you please save me a performance test and tell me whether such
a configuration is practical with HDF5?
Thanks,
Efim