I've had good luck with the packet table API from University of Illinois /
Boeing https://www.hdfgroup.org/HDF5/doc/HL/H5PT_Intro.html - it's built on
the table API.  I use the C++ wrapper for it, and while it's had some rough
edges, it generally works the way you want it to.  With packet tables you
would not use any filters, and only native datatypes that do not need
conversion, but you can always compress the data at a later time.
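To give you a feel for it, the append side ends up looking something like
the sketch below.  I'm using the HL C calls here, which work fine from C++;
the Sample struct and its field names are just placeholders for whatever
your devices actually produce, and the chunk size is only a guess at your
~8 KB bursts:

#include <hdf5.h>
#include <hdf5_hl.h>
#include <cstdint>

/* Example sample record -- field names are made up for illustration. */
struct Sample {
    double   timestamp;
    float    range;
    float    velocity;
    uint16_t quality;
};

int main() {
    hid_t file = H5Fcreate("capture.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Compound type matching the in-memory struct (native types, no conversion). */
    hid_t stype = H5Tcreate(H5T_COMPOUND, sizeof(Sample));
    H5Tinsert(stype, "timestamp", HOFFSET(Sample, timestamp), H5T_NATIVE_DOUBLE);
    H5Tinsert(stype, "range",     HOFFSET(Sample, range),     H5T_NATIVE_FLOAT);
    H5Tinsert(stype, "velocity",  HOFFSET(Sample, velocity),  H5T_NATIVE_FLOAT);
    H5Tinsert(stype, "quality",   HOFFSET(Sample, quality),   H5T_NATIVE_USHORT);

    /* One packet table per device; no compression filter on the write path. */
    hid_t pt = H5PTcreate_fl(file, "/device0", stype, 512 /* chunk, in records */, -1);

    Sample burst[256];              /* filled from the device... */
    H5PTappend(pt, 256, burst);     /* append one sample burst */

    H5PTclose(pt);
    H5Tclose(stype);
    H5Fclose(file);
    return 0;
}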

One thing you might want to keep in mind, though, is that HDF is not
robust to sudden crashes of the system, whether hardware or software.  To
that end, the more important and simple the data is, the more I lean
towards writing an HDF datatype descriptor file and then dumping those
"datasets" to individual flat files that I can later parse using the type
information from the HDF file.  This lets you keep all the guarantees of
the lower-level file handling - if you know what you're doing and use
atomic writes and the like - while keeping the self-describing capability
of HDF.  Afterwards I write some simple Python scripts with h5py/numpy to
pull those datasets into HDF proper... it's a simple method and gives you
the best of both worlds for reliability, compression, and archivability.
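The acquisition side of that approach can be as dumb as it sounds: commit
the compound type into a tiny HDF5 "descriptor" file once, then fwrite raw
structs to a flat, append-only file.  Again, just a sketch, reusing the
same made-up Sample struct and file names from above:

#include <hdf5.h>
#include <cstdio>
#include <cstdint>

struct Sample {            /* same placeholder record as above */
    double   timestamp;
    float    range;
    float    velocity;
    uint16_t quality;
};

int main() {
    /* 1. Descriptor file: commit the compound type so the raw dump stays
          self-describing even though the data itself lives elsewhere. */
    hid_t file  = H5Fcreate("device0_type.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t stype = H5Tcreate(H5T_COMPOUND, sizeof(Sample));
    H5Tinsert(stype, "timestamp", HOFFSET(Sample, timestamp), H5T_NATIVE_DOUBLE);
    H5Tinsert(stype, "range",     HOFFSET(Sample, range),     H5T_NATIVE_FLOAT);
    H5Tinsert(stype, "velocity",  HOFFSET(Sample, velocity),  H5T_NATIVE_FLOAT);
    H5Tinsert(stype, "quality",   HOFFSET(Sample, quality),   H5T_NATIVE_USHORT);
    H5Tcommit2(file, "sample_type", stype, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Tclose(stype);
    H5Fclose(file);

    /* 2. Data file: plain append-only binary.  A crash loses at most the
          last partial record, and everything before it is recoverable. */
    FILE *raw = std::fopen("device0.dat", "ab");
    Sample burst[256];                     /* filled from the device... */
    std::fwrite(burst, sizeof(Sample), 256, raw);
    std::fflush(raw);
    std::fclose(raw);
    return 0;
}

The later conversion step is then a few lines of h5py/numpy: open the
descriptor file, take the committed type's numpy dtype, numpy.fromfile the
flat dump, and write it into a real chunked, compressed dataset.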

-Jason

On Thu, Aug 6, 2015 at 7:46 AM, Petr KLAPKA <[email protected]> wrote:

> Good morning!
>
> My name is Petr Klapka.  My colleagues and I are in the process of
> evaluating HDF5 as a potential file format for a data acquisition tool.
>
> I have been working through the HDF5 tutorials and overcoming the API
> learning curve.  I was hoping you could offer some advice on the
> suitability of HDF5 for our intended purpose and perhaps save me the time
> of misusing the format or API.
>
> The data being acquired are "samples" from four devices.  Every ~50ms a
> device provides a sample.  The sample is an array of structs.  The total
> size of the array varies but will be on average around 8 kilobytes (160 KB
> per second per device).
>
> The data will need to be recorded over a period of about an hour, meaning
> an uncompressed file size of around 2.3 Gigabytes.
>
> I will need to "play back" these samples, as well as jump around in the
> file, seeking on sample metadata and time.
>
> My questions to you are:
>
>    - Is HDF5 intended for data sets of this size and throughput given a
>    high performance Windows workstation?
>    - What is the "correct" usage pattern for this scenario?
>       - Is it to use a "Group" for each device, and create a "Dataset"
>       for each sample?  This would result in thousands of datasets in the file
>       per group, but I fully understand how to navigate this structure.
>       - Or should there only be four "Datasets" that are extensible, and
>       each sensor "sample" be appended into the dataset?  If this is the case,
>       can the dataset itself be searched for specific samples by time and
>       metadata?
>       - Or is this use case appropriate for the Table API?
>
> I will begin with prototyping the first scenario, since it is the most
> straightforward to understand and implement.  Please let me know your
> suggestions.  Many thanks!
>
> Best regards,
>
> Petr Klapka
> System Tools Engineer
> *Valeo* Radar Systems
> 46 River Rd
> Hudson, NH 03051
> Mobile: (603) 921-4440
> Office: (603) 578-8045
> *"Festina lente."*
>
> *This e-mail message is intended only for the use of the intended
> recipient(s). The information contained therein may be confidential or
> privileged, and its disclosure or reproduction is strictly prohibited.
> If you are not the intended recipient, please return it immediately to
> its sender at the above address and destroy it.*
>
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
