Dear Petr Klapka,
On 06.08.2015 at 16:46, Petr KLAPKA
<[email protected]> wrote:
Good morning!
My name is Petr Klapka. My colleagues and I are in the process of evaluating
HDF5 as a potential file format for a data acquisition tool.
We use HDF5 for data acquisition at SINQ, PSI. So do other facilities, such as
synchrotron sources, which generate much more data than we do.
I have been working through the HDF5 tutorials and overcoming the API learning
curve. I was hoping you could offer some advice on the suitability of HDF5 for
our intended purpose and perhaps save me the time of mis-using the format or
API.
The data being acquired are "samples" from four devices. Every ~50 ms a device
provides a sample. The sample is an array of structs. The total size of the
array varies but will be on average around 8 kilobytes (roughly 160 kB per
second per device).
The data will need to be recorded over a period of about an hour, meaning an
uncompressed file size of around 2.3 Gigabytes.
I will need to "play back" these samples, as well as jump around in the file,
seeking by sample metadata and time.
My questions to you are:
* Is HDF5 intended for data sets of this size and throughput given a high
performance Windows workstation?
Sure, HDF5 excels at data of this size; 2.3 GB is modest. But if you use
Windows, you throw away most of the capabilities of your machine.
* What is the "correct" usage pattern for this scenario?
* Is it to use a "Group" for each device, and create a "Dataset" for
each sample? This would result in thousands of datasets in the file per group,
but I fully understand how to navigate this structure.
This will not perform well. HDF5 does not cope well with many small objects;
the per-dataset overhead adds up quickly.
* Or should there only be four "Datasets" that are extensible, and each
sensor "sample" be appended into the dataset? If this is the case, can the
dataset itself be searched for specific samples by time and metadata?
This is much better. Appending to one large array per device is good. You may
have to play with the chunking to get this working well. HDF5 itself does not
support searching by content.
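To illustrate the appending pattern, here is a minimal sketch using the h5py
Python binding. The group/dataset names, the compound dtype, and the chunk
size are assumptions for illustration; the chunk size in particular would need
tuning for your ~8 kB samples.

```python
import numpy as np
import h5py

# Hypothetical per-sample struct; your real fields will differ.
sample_dtype = np.dtype([("adc", "<i4"), ("flags", "<u2")])

with h5py.File("acquisition.h5", "w") as f:
    # One extensible, chunked dataset per device.
    dev = f.create_dataset(
        "device0/samples",
        shape=(0,),
        maxshape=(None,),   # unlimited along the append axis
        dtype=sample_dtype,
        chunks=(4096,),     # tune for your throughput and access pattern
    )
    # Append one "sample" (an array of structs) per acquisition tick.
    new = np.zeros(512, dtype=sample_dtype)
    old = dev.shape[0]
    dev.resize((old + new.shape[0],))
    dev[old:] = new
```

Resizing by one sample every 50 ms is cheap as long as the chunk size is large
enough that a resize does not allocate a new chunk every time.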
But if I understand your use case correctly, it would help to have another
array with timestamps running alongside. A search for a specific time interval
then comes down to finding the index range of the interesting times in the
timestamp array, and retrieving the corresponding slabs of data from the
sample arrays by those indexes.
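Since the timestamps are recorded in acquisition order, they are monotonically
increasing, so the index range can be found by binary search. A small
stdlib-only sketch (the timestamps here are made up for illustration):

```python
import bisect

# Hypothetical timestamps (seconds), one per appended sample,
# assumed monotonically increasing.
timestamps = [0.0, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]

def index_range(ts, t_start, t_end):
    """Return the half-open index range [lo, hi) covering t_start..t_end."""
    lo = bisect.bisect_left(ts, t_start)    # first index with ts[i] >= t_start
    hi = bisect.bisect_right(ts, t_end)     # first index with ts[i] > t_end
    return lo, hi

lo, hi = index_range(timestamps, 0.08, 0.21)
# lo, hi == (2, 5): indices 2..4 cover timestamps 0.10, 0.15, 0.20
```

The resulting `[lo, hi)` range would then be used to read the matching
hyperslab from each device's sample dataset.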
I know too little about your other metadata to suggest anything.
* Or is this use case appropriate for the Table API?
It could be; I do not know that API well enough to say.
Best Regards,
Mark Könnecke
I will begin by prototyping the first scenario, since it is the most
straightforward to understand and implement. Please let me know your
suggestions.
Many thanks!
Best regards,
Petr Klapka
System Tools Engineer
Valeo Radar Systems
46 River Rd
Hudson, NH 03051
Mobile: (603) 921-4440
Office: (603) 578-8045
"Festina lente."
This e-mail message is intended only for the use of the intended recipient(s).
The information contained therein may be confidential or privileged,
and its disclosure or reproduction is strictly prohibited.
If you are not the intended recipient, please return it immediately to its
sender at the above address and destroy it.
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5