Hi Efim,

   Unfortunately, chunking plus compression doesn’t help much with 
variable-length datatypes.  A variable-length dataset consists of an array of 
heap pointers, and the filter only sees those pointers, so the bulk of the 
data doesn’t participate in any compression.

    On the other hand, your record size is large enough that you could set up 
your storage as a collection of scalar datasets.  Since there is just one 
element per dataset, you can make the datatype whatever the size of the row 
is and use a compression filter.  So rather than accessing a row via an index 
into a dataset, you’d access a dataset via a link name (which could just be a 
stringified version of a numeric index).
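
A minimal sketch of that layout, assuming Python with h5py and NumPy (the thread does not name a language binding). The helper names `link_name`, `write_record` and `read_record`, the group name `records`, and the `"3_0_17"` naming scheme are all made up for illustration. One deviation from the description above: as far as I know, filters need a chunked layout, which a true rank-0 scalar dataset cannot have, so each record is stored here as a 1-D byte array with a single chunk covering the whole record.

```python
import os
import tempfile

import h5py
import numpy as np


def link_name(index):
    """Turn a multi-dimensional index such as (3, 0, 17) into the link name "3_0_17"."""
    return "_".join(str(i) for i in index)


def write_record(group, index, payload):
    """Store one non-empty record (bytes) as its own gzip-compressed dataset."""
    data = np.frombuffer(payload, dtype=np.uint8)
    # One chunk spans the whole record, so the filter sees the full payload.
    return group.create_dataset(
        link_name(index), data=data, chunks=data.shape, compression="gzip"
    )


def read_record(group, index):
    """Return the record bytes, or None if this position was never written."""
    name = link_name(index)
    if name not in group:
        return None  # sparse: an absent record has no dataset at all
    return group[name][()].tobytes()


# Hypothetical file location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "records.h5")

with h5py.File(path, "w") as f:
    records = f.create_group("records")
    write_record(records, (3, 0, 17), b"abc" * 10000)

with h5py.File(path, "r") as f:
    restored = read_record(f["records"], (3, 0, 17))
    missing = read_record(f["records"], (0, 0, 0))
```

Because every record is a separate object, a random read touches only that one dataset, and positions that were never written take up no space.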

   If you go this route, use the “libver=latest” option when opening the file.  
Recent changes in the file format have made accessing objects from a large 
group collection much more efficient.
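
As a rough illustration, and again assuming h5py, the option is passed as the `libver` keyword of `h5py.File`. The file name, the group name `records`, and the count of 1000 datasets are arbitrary choices for the sketch. One caveat worth checking for your setup: a file written with the newest format features may not be readable by older HDF5 library versions.

```python
import os
import tempfile

import h5py

# Hypothetical file location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "many_links.h5")

# libver="latest" allows the library to use its newest file-format features.
with h5py.File(path, "w", libver="latest") as f:
    group = f.create_group("records")
    for i in range(1000):
        # One small dataset per link, named by its stringified index.
        group.create_dataset(str(i), data=i)

# Look up a single object by link name without touching the others.
with h5py.File(path, "r", libver="latest") as f:
    value = int(f["records/742"][()])
    count = len(f["records"])
```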

John

From: Hdf-forum <[email protected]> on behalf of Efim 
Dyadkin <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Monday, June 5, 2017 at 3:40 PM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] one element per chunk

Hi,

I need to implement a storage for data with the following properties:

1)       multi-dimensional unlimited size data set of variable-length records

2)       may be highly sparse

3)       usually randomly accessed one record at a time

4)       each record may vary in size from tens of kilobytes to tens of 
megabytes

I am thinking of an unlimited chunked dataspace. However, to make it efficient 
in terms of disk space and access time, I need to have my chunks as small as 
one element. Could you please save me a performance test and tell me whether 
such a configuration is practical with HDF5?

Thanks,
Efim