From: Hdf-forum <[email protected]> on behalf of Paul Anton 
Letnes <[email protected]>
Reply-To: Paul Anton Letnes <[email protected]>, HDF Users Discussion List 
<[email protected]>
Date: Tuesday, November 8, 2016 at 12:42 PM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Working with split hdf5 files

Hi!

First, thanks for creating hdf5, which is incredibly helpful for so many people!

I'm currently working on hyperspectral images. We've got a camera that writes 
one frame at a time into a rank-3 hdf5 dataset; the slowest varying index of 
the dataset is the frame number. To avoid corrupt files, we currently split 
each recording (think of something like a video recording) into separate hdf5 
files, approx. 1 GB in size (configurable). Working with the split 
files/datasets is doable, but obviously less elegant than putting one big 
dataset into one big file.

- Is there a way to ensure the integrity of partial recordings (against power 
loss, software crashes, you name it) without splitting them into all these 
small files?

Won't judicious use of H5Fflush() do the trick?
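
A minimal sketch of the flush idea, assuming h5py and NumPy are available. 
The function name, the dataset name "frames" and the flush_every parameter are 
made up for illustration; h5py's File.flush() is the call that asks the HDF5 
library to flush its buffers (H5Fflush in the C API). Note that flushing only 
narrows the window in which a crash can leave the file inconsistent; it is not 
a guarantee.

```python
import h5py
import numpy as np


def record_frames(path, frames, frame_shape, dtype="uint16", flush_every=10):
    """Append 2-D frames to a resizable rank-3 dataset, flushing periodically.

    `frames` is any iterable of arrays with shape `frame_shape`.
    Returns the number of frames written.
    """
    rows, cols = frame_shape
    count = 0
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "frames",                      # hypothetical dataset name
            shape=(0, rows, cols),
            maxshape=(None, rows, cols),   # unlimited along the frame axis
            chunks=(1, rows, cols),        # one frame per chunk
            dtype=dtype,
        )
        for frame in frames:
            count += 1
            dset.resize(count, axis=0)     # grow by one frame
            dset[count - 1] = np.asarray(frame)
            if count % flush_every == 0:
                f.flush()                  # push buffered data to the OS
    return count
```

A smaller flush_every shortens the stretch of unflushed frames at the cost of 
more frequent I/O.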


- Is there a way to create a "master file" that uses symbolic/external links to 
"link together" all the datasets (one per file) into something that looks like 
a single dataset to an hdf5 user (h5py, matlab, ...)? I've noticed the 
"drivers" [1] that talk about split files, but I'm uncertain whether each 
sub-file is a valid hdf5 file. H5FD_MULTI superficially looks like what we need.

Based on what you've written, I was going to suggest the 'family' driver, 
https://support.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetFaplFamily. 
It'd probably be best if you could size things such that an integral number of 
frames goes into each member file. But I suspect that, in general, that won't 
be possible.
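
The sizing arithmetic can be sketched as below. The function name and the 
example frame geometry (512 x 640 samples, 2 bytes each) are assumptions for 
illustration only. The result covers raw frame data; the family driver splits 
the file's whole address space, HDF5 metadata included, at the member size, so 
frames will not necessarily line up with member boundaries in practice.

```python
def family_member_size(frame_shape, bytes_per_sample, target_bytes):
    """Return (frames_per_member, member_bytes).

    member_bytes is the largest whole multiple of one frame's raw size that
    does not exceed target_bytes (and at least one frame).
    """
    rows, cols = frame_shape
    frame_bytes = rows * cols * bytes_per_sample
    frames_per_member = max(1, target_bytes // frame_bytes)
    return frames_per_member, frames_per_member * frame_bytes


# Example: assumed 512 x 640 frames of 16-bit samples, aiming at about 1 GiB.
frames_per_member, member_bytes = family_member_size((512, 640), 2, 1024**3)
print(frames_per_member, member_bytes)
```

With h5py, the family driver is selected through driver="family" and 
memb_size=..., with a %d placeholder in the file name.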


- Can virtual datasets [2] be used from older (1.8.x) clients? Will it work for 
this purpose?

Virtual datasets are new in HDF5 1.10. I don't think the 1.8 series supports 
them, or ever will. But they would give you the "single dataset" view of the 
data that you want.
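
A minimal sketch of such a master file, assuming h5py 2.9 or later built 
against HDF5 1.10 or later. The function name and the dataset name "frames" 
are illustrative, and every part file is assumed to hold one rank-3 dataset 
with the same frame geometry and dtype.

```python
import h5py


def build_master(master_path, part_paths, dset_name="frames"):
    """Stack the per-file datasets along axis 0 into one virtual dataset.

    Returns the total number of frames in the virtual dataset.
    """
    # Read each part's shape (and the dtype) up front.
    shapes = []
    for path in part_paths:
        with h5py.File(path, "r") as f:
            shapes.append(f[dset_name].shape)
            dtype = f[dset_name].dtype

    total = sum(shape[0] for shape in shapes)
    rows, cols = shapes[0][1:]
    layout = h5py.VirtualLayout(shape=(total, rows, cols), dtype=dtype)

    # Map each part file onto its own range of frame indices.
    start = 0
    for path, shape in zip(part_paths, shapes):
        layout[start:start + shape[0]] = h5py.VirtualSource(
            path, dset_name, shape=shape
        )
        start += shape[0]

    with h5py.File(master_path, "w") as f:
        f.create_virtual_dataset(dset_name, layout, fillvalue=0)
    return total
```

Readers then open only the master file and index "frames" as if it were one 
dataset; they still need a 1.10-or-later library to do so.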

Given how big these seem to be, do you really expect a sequential (i.e. 
non-parallel) app like matlab to be able to do much with a single dataset 
view of this data?


- Or are we missing some great idea or feature in HDF5?

Depending on what you need to do, mounting one file within another, 
https://support.hdfgroup.org/HDF5/doc/RM/RM_H5F.html#File-Mount, external 
links, 
https://support.hdfgroup.org/HDF5/doc/RM/RM_H5L.html#Link-CreateExternal, or 
object references, 
https://support.hdfgroup.org/HDF5/doc/RM/RM_H5R.html#Reference-Create, might 
offer some of what you need.
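
For the external-link option, a minimal h5py sketch follows. The function 
name, the link names (part_0000, ...) and the dataset name "frames" are 
assumptions for illustration. This works with 1.8 clients, but each part 
stays a separate dataset rather than being merged into one.

```python
import h5py


def link_parts(master_path, part_paths, dset_name="frames"):
    """Create a master file with one external link per part file.

    Returns the list of link names created in the master file.
    """
    names = []
    with h5py.File(master_path, "w") as master:
        for i, path in enumerate(part_paths):
            name = "part_%04d" % i
            # The link stores the target file name and the object path in it.
            master[name] = h5py.ExternalLink(path, "/" + dset_name)
            names.append(name)
    return names
```

Opening the master file and accessing master["part_0000"] then resolves to 
the dataset in the first part file, as long as that file can be found at the 
stored path.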



Cheers,
Paul

[1] https://support.hdfgroup.org/HDF5/Tutor/filedrvr.html#predef
[2] https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesVirtualDatasetDocs.html

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
