Why does it have to be *one* *single* HDF5 file? It might be possible if you had 
sort of a 'master' or 'root' HDF5 file and then a number of other HDF5 files 
that get 'mounted' into the master (much like the unix fs 'mount' command) so that 
the libhdf5 caller thinks it's only ever opening the master file. But that 
master file would still need to know a lot ahead of time. If you don't know 
enough ahead of time about at least *some* of the contents of the resulting 
assembly (like the number of datasets and their names or something), then I think 
it's going to be difficult.

The boot block (the superblock) is just part of the problem. After the boot block is 
successfully read, libhdf5 is going to want to read some metadata about the 
file's contents (group names, dataset names, etc.). That metadata can be 
scattered all over the "file". So, you'd need to construct things such that at 
least the *initial* metadata is in the first chunk of stuff you send.

Here is a simpler problem: can you make this work for a real file that is 
"growing" locally *without* having to re-open the file each time new stuff is 
added to the file? I mean, forget the HTTP part of the problem and see if you 
can get a libhdf5 caller to *behave* the way you want when the underlying file 
is being "assembled" in the way you plan. I think there may be ways of getting 
it to work if you treat it really as multiple HDF5 files, either by using things like 
a) external datasets, b) mounting files or c) the 'family' virtual file 
driver (vfd). However, all of these approaches *will* have some requirement for 
at least *some* a priori knowledge of the file's contents.

Hope that helps.

Mark


From: Hdf-forum <[email protected]> on behalf of Lion Krischer <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Monday, March 28, 2016 6:03 AM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Assembling HDF5 file piece by piece

Hi all,

we are planning on developing a web service that assembles potentially
very large HDF5 files on the fly and sends them over HTTP. We cannot
create the files and then send them once they are created - for one the
required disc space might not be available on the server and the latency
until the first byte is sent would also not be acceptable. Thus we would
need to assemble the files piece by piece and send each piece once it is
available.

Is this possible at all with the HDF5 format? We don't mind creating the
binary files by hand but the final results must be readable by libhdf5.

Initial tries with some simple scripts and a hex editor already failed
at the superblock level, namely the "End of File Address". If it's set to
0 (or some other wrong number that is not the file size) libhdf5 raises
an error and refuses to read the file (note that I did adjust the
superblock checksum every time). If we assemble the file piece by piece
we don't know the final file size at the time the superblock is created
and streamed over HTTP. We also don't know what ends up in the final
file before we would have to send the first chunk - it might be possible
to determine an upper bound ahead of time but it would add significant
complexity to the point where we might not be able to execute the project.
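
To make the superblock issue above concrete, here is a minimal Python sketch that pulls the "End of File Address" out of the first bytes of a file. It assumes a version 2 or 3 superblock located at offset 0 (the format also allows it at 512, 1024, ...), and it does not verify the superblock checksum. The names `read_eof_address` and `make_fake_superblock` are made up for this illustration and are not part of libhdf5; the fake superblock is only a byte layout for testing the parser, not a valid HDF5 file.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_eof_address(header: bytes) -> int:
    """Return the End of File Address from a version 2/3 superblock.

    Assumes the superblock starts at byte 0 of `header`.
    """
    if header[:8] != HDF5_SIGNATURE:
        raise ValueError("HDF5 signature not found at offset 0")
    version = header[8]
    size_of_offsets = header[9]
    if version not in (2, 3):
        raise ValueError(f"unsupported superblock version {version}")
    # Layout after the 8-byte signature: version, size of offsets,
    # size of lengths, consistency flags (1 byte each), then the base
    # address, the superblock extension address and the End of File
    # Address, each `size_of_offsets` bytes wide, little-endian.
    start = 12 + 2 * size_of_offsets
    field = header[start:start + size_of_offsets]
    if len(field) != size_of_offsets:
        raise ValueError("header too short to contain the EOF address")
    return int.from_bytes(field, "little")


def make_fake_superblock(eof_address: int, size_of_offsets: int = 8) -> bytes:
    """Build a synthetic version 2 superblock layout (checksum left as zeros)."""
    undefined = (1 << (8 * size_of_offsets)) - 1  # "undefined address" marker
    superblock_size = 12 + 4 * size_of_offsets + 4

    def addr(value: int) -> bytes:
        return value.to_bytes(size_of_offsets, "little")

    return (
        HDF5_SIGNATURE
        + bytes([2, size_of_offsets, 8, 0])  # version, offsets, lengths, flags
        + addr(0)                # base address
        + addr(undefined)        # no superblock extension
        + addr(eof_address)      # End of File Address
        + addr(superblock_size)  # pretend root group header follows directly
        + bytes(4)               # checksum placeholder, NOT a real checksum
    )


if __name__ == "__main__":
    fake = make_fake_superblock(4096)
    print(read_eof_address(fake))
```

On a real file one would pass in the first few dozen bytes and compare the result with the size of the file on disk. The point of the sketch is only that this field sits near the very start of the file, so it has to be written before the rest of the stream exists.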

Is there any way around this? Some magic "End of File Address" number or
some way to move the superblock to the end of the file? The HDF5 spec is
really long and I did not yet read everything so maybe I am missing
something.

Assuming this can be dealt with: Are there any other potential
roadblocks we might stumble into? If not: Any chance of changing the
HDF5 spec/libhdf5 so it interprets an "End of File Address" of 0 as
"unspecified"?

Looking forward to your thoughts and advice. Thanks a lot!

Lion

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5

