Hi Markus,
Sounds like there is more to investigate here. :-/ Unfortunately, I’m
very time-constrained right now and can’t spend more hours in this direction.
I spoke with Elena, and she’s going to see about getting some HDF Group staff
to look into the issue.
Quincey
> On Sep 18, 2017, at 11:41 PM, Krug, Markus <[email protected]> wrote:
>
> Dear Quincey,
>
> yes, the file gets corrupted if you add the 4th object. However, the problem
> I’m observing is not related to the number of objects you add to the file or
> the number already in it. It happens because the HDF file spec does not
> specify the location of the different blocks within the file. The entire
> structure is a linked list that originates at the superblock, and the
> superblock itself is the only block with rules about its location. So, in my
> understanding, all software that handles HDF files in any way should first
> explore the linked-list structure and then identify the locations that are
> not yet used and can therefore hold additional content if requested. From
> what I observe, the HDFlib implementation behaves differently: it has an
> algorithm that decides where to locate the different blocks, and this
> algorithm does not consider whether those locations are already occupied.
> As long as you use only the HDFlib implementation, this behavior does not
> lead to any problems, because it is self-consistent. The problem shows up
> when you generate HDF files with one tool and afterwards modify them with a
> tool that is based on HDFlib.
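
The approach described above (walk the structure from the superblock, record which byte ranges are in use, then place new content only in unused space) could look roughly like the C sketch below. It is a minimal illustration under stated assumptions, not HDF5 library code: it assumes the traversal has already produced a list of occupied ranges sorted by offset, and the names byte_range, align_up and find_free_offset are hypothetical. It uses a simple first-fit search; a real writer might choose a different placement policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one occupied region [offset, offset + length) recorded
 * while walking the file's structure from the superblock. */
struct byte_range {
    uint64_t offset;
    uint64_t length;
};

/* Round addr up to the next multiple of alignment (alignment >= 1). */
static uint64_t align_up(uint64_t addr, uint64_t alignment)
{
    assert(alignment > 0);
    return ((addr + alignment - 1) / alignment) * alignment;
}

/* First-fit search: return the lowest aligned offset at which `size` bytes
 * fit without overlapping any occupied range. `used` must be sorted by
 * offset and non-overlapping. If no gap is large enough, the new block is
 * placed after the last occupied byte, i.e. the file grows. */
static uint64_t find_free_offset(const struct byte_range *used, size_t n,
                                 uint64_t size, uint64_t alignment)
{
    uint64_t candidate = 0;
    for (size_t i = 0; i < n; i++) {
        candidate = align_up(candidate, alignment);
        if (candidate + size <= used[i].offset)
            return candidate;            /* fits in the gap before used[i] */
        uint64_t end = used[i].offset + used[i].length;
        if (end > candidate)
            candidate = end;             /* skip past the occupied range */
    }
    return align_up(candidate, alignment);
}
```

With occupied ranges {0, 96}, {96, 200} and {400, 50}, a 100-byte block fits in the gap at offset 296, while a 150-byte block does not fit anywhere and goes to offset 450.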
>
> Actually, I’m quite surprised that this behavior hasn’t been observed before.
> I guess the reason is that not many projects use HDF files in embedded
> systems (small 16- or 32-bit microcontrollers with significantly less than
> 1 MB of program memory and no, or only a small, real-time operating system).
> Additionally, even in applications where computing power and memory are not
> a major concern, people use the HDFlib code or binaries to save the time it
> would take to rewrite it. Nevertheless, I’m almost sure I found a ‘hole’ in
> the specification that needs to be fixed, either in the file specification
> or in the HDFlib implementation.
>
> I did not use h5check or h5debug. Is it necessary to compile the
> corresponding code before I can use them? I’m also not sure they will give
> me new results, because the file I’m generating is accepted by HDFview with
> no problem at all. Do you think HDFview would accept files that do not
> follow the HDF standard?
>
> Best Regards
> Markus
> From: Hdf-forum [mailto:[email protected]] On Behalf Of
> Quincey Koziol
> Sent: Monday, September 18, 2017 17:34
> To: HDF Users Discussion List <[email protected]>
> Subject: Re: [Hdf-forum] HDF lib incompatible with HDF file spec?
>
> Hi Markus,
> I’ve looked at the files you’ve produced, and it seems like the
> first object is getting corrupted when you add the 4th object. Can you check
> whether that’s the case? Also, have you been using the h5debug tool (in the
> tools directory) or h5check to look at your files?
>
> Regards,
> Quincey
>
> On Sep 18, 2017, at 5:03 AM, Krug, Markus <[email protected]> wrote:
>
> Dear all,
>
> I just want to come back to my question about the incompatibility between
> the HDFlib and the HDF file spec concerning the actual physical layout of an
> HDF file. Can anyone confirm my observation that this can lead to corrupt
> files if they are first generated by a ‘non-HDFlib-based’ application that
> complies with the HDF file spec and then altered by an ‘HDFlib-based’
> application such as HDFview?
>
> Best Regards
> Markus
> From: Krug, Markus
> Sent: Wednesday, September 6, 2017 17:56
> To: 'HDF Users Discussion List' <[email protected]>
> Subject: RE: [Hdf-forum] HDF lib incompatible with HDF file spec?
>
> Dear Mark,
>
> completely correct. I wrote some routines that generate HDF files. However,
> only a small subset of the functionality is used: more or less only
> compressed compound data types, with at most 5 of them in a file, and very
> likely no more than two groups. I follow this paper
> (http://www.ep.liu.se/ecp/076/050/ecp12076050.pdf) concerning the HDF file
> layout because I need to write ‘time series’ in my embedded application.
>
> You are right. The HDF file spec is highly complex. Even for my reduced
> feature set, it took me significantly more time than I had planned to gain
> an understanding. In the meantime, I think I understand what I need for my
> purpose. However, I’m not saying that the files I can generate so far are
> 100% correct in terms of the HDF file spec. But at least HDFview can read
> them without problems, so they cannot be that wrong.
>
> Best Regards
> Markus
>
> From: Hdf-forum [mailto:[email protected]] On Behalf Of
> Miller, Mark C.
> Sent: Tuesday, September 5, 2017 19:22
> To: HDF Users Discussion List <[email protected]>
> Subject: Re: [Hdf-forum] HDF lib incompatible with HDF file spec?
>
> Hmm. If I understand you, you have written code that you believe produces an
> HDF5 file according to the 3.0 file version specification,
> https://support.hdfgroup.org/HDF5/doc/H5.format.html, but nevertheless does
> NOT use the HDF5 library to do it. Furthermore, where 'extended padding' is
> concerned, your implementation does business differently than the HDF5
> implementation.
>
> You can show that HDF5 tools will *read* the file OK. But in a
> read-modify-write scenario, the file is getting corrupted by the HDF5
> library due to the difference in how the two implementations handle the
> extended padding -- a feature that you explain is '...not defined at all --
> not even recommended'.
>
> Is that about right?
>
> If so, it does indeed sound like a potential issue in the file format
> specification for HDF5.
>
> Your scenario sounds like a super useful test case...does a wholly
> independent implementation produce a file the HDF5 library can "handle"?
>
> I wonder if there are settings in the HDF5 library you may need to set (such
> as alignment or block size or something) such that read-modify-write will
> indeed work OK? I wonder if there is some metadata missing from your file
> that would inform the HDF5 library what specific settings it must use to
> properly read and write to the file? I wonder if there is some boot-block
> information you have neglected to include so that the HDF5 library is not
> aware of all the parameters affecting the file's layout.
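
The alignment settings Mark mentions do exist in the HDF5 library as file-access properties; for example, H5Pset_alignment(fapl, threshold, alignment) is documented to place any file object of at least threshold bytes at an address that is a multiple of alignment (the default, threshold = 1 and alignment = 1, means no extra alignment). The plain C function below only models that documented rule so it runs without the library; it is not the library's allocator, and whether any such setting avoids the corruption discussed in this thread is not established here. aligned_address is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the documented H5Pset_alignment() placement rule: an object of
 * at least `threshold` bytes starts at a multiple of `alignment`; smaller
 * objects (or alignment == 1) simply start at the next free address. */
static uint64_t aligned_address(uint64_t next_free, uint64_t object_size,
                                uint64_t threshold, uint64_t alignment)
{
    assert(alignment > 0);
    if (object_size < threshold || alignment == 1)
        return next_free;
    return ((next_free + alignment - 1) / alignment) * alignment;
}
```

With threshold = 1024 and alignment = 512, a 4096-byte object whose next free address is 1000 would start at 1024, leaving a 24-byte gap, while a 100-byte object would start at 1000. Gaps of this kind are one way two writers can disagree about a file's layout.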
>
> The only reason for calling into question many aspects of your
> implementation is that the HDF5 file format is fairly complex. I don't think
> it is easily duplicated without using the library itself. So, I think it's
> highly likely you may be overlooking some important features of the format
> necessary for the HDF5 library to fully handle it.
>
> All that said, I commend your courage for attempting it and hope others can
> chime in with more detailed thoughts on what to do about it.
>
> Mark
>
>
>
> "Hdf-forum on behalf of Krug, Markus" wrote:
>
> Dear all,
>
> I just came across an interesting issue.
> I implemented the writing of HDF files on an embedded system. The
> functionality I implemented is significantly less than what the HDF lib
> offers; it is tailored to my needs. I implemented everything based on the
> HDF 3.0 file spec. One goal of my tailoring was to optimize the file size.
> Therefore, I write every internal block in the HDF files packed byte-by-byte
> against the previous one, or padded to the address alignment where the HDF
> file specification requires it. The HDF files generated by HDFview or Matlab
> have plenty of space between the internal blocks, sometimes a few hundred
> bytes. As far as I can tell from the HDF file specification, this ‘extended
> padding’ is not defined at all, not even recommended.
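
The size-optimized layout described above can be sketched in C as follows: each block starts at the first address after the previous block that satisfies the alignment the specification requires for it (1 meaning none), so the only gaps are mandatory padding. This is an illustration, not Markus's actual code; block_spec and pack_blocks are hypothetical names, and the sizes in the example are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one internal block to be written. */
struct block_spec {
    uint64_t size;       /* bytes occupied by the block */
    uint64_t alignment;  /* required address alignment; 1 = none */
};

/* Size-optimized layout: place each block at the first address after the
 * previous block that satisfies its required alignment. Stores each start
 * address in offsets[i] and returns the resulting end-of-file address. */
static uint64_t pack_blocks(const struct block_spec *blocks, size_t n,
                            uint64_t start, uint64_t *offsets)
{
    uint64_t addr = start;
    for (size_t i = 0; i < n; i++) {
        uint64_t a = blocks[i].alignment;
        assert(a > 0);
        addr = ((addr + a - 1) / a) * a;  /* pad only when required */
        offsets[i] = addr;
        addr += blocks[i].size;
    }
    return addr;
}
```

For blocks of 96, 40, 13 and 24 bytes, where the second and fourth require 8-byte alignment, this yields start addresses 0, 96, 136 and 152 and a file end of 176: only 3 bytes of padding, compared with the hundreds of bytes of slack described in HDF-lib-generated files.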
> However, this ‘extended padding’ performed by the HDF lib leads to a
> behavior that I would consider an incompatibility with itself. To
> demonstrate this, I attached two HDF files to this email. The first
> (sizeoptimized.h5) is generated by my embedded software and is optimized
> for file size. It contains three compounds, each of which has 2 elements.
> You should be able to open that file in HDFview or similar tools and read
> all its contents.
> The second file (sizeoptimizedextended.h5) was generated by HDFview by
> adding a fourth compound after the sizeoptimized.h5 file was opened in
> HDFview. You can see that the file is partly corrupted. The reason is that
> HDFview (and therefore, I guess, the HDF lib) does not really take care of
> the positions of the internal blocks in a file it is writing to. It seems to
> me it has some internal mapping of those blocks, and this mapping is applied
> even if it collides with, and therefore corrupts, the existing blocks.
> If my observation is correct, I think either the HDF lib needs a bug fix or
> the HDF file spec needs a description of how the internal blocks are allowed
> to be positioned within an HDF file.
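
One way to check the collision described above is to compare the byte ranges of the blocks in the original file with the ranges a modifying tool wrote. The C sketch below assumes both lists are already available (obtaining them, for example by walking the file from the superblock, is not shown) and reports the first existing block that a write overlaps. extent, extents_overlap and first_collision are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a block occupying the half-open range [offset, offset + length). */
struct extent {
    uint64_t offset;
    uint64_t length;
};

/* Two non-empty half-open ranges overlap iff each starts before the other ends. */
static bool extents_overlap(struct extent x, struct extent y)
{
    return x.length > 0 && y.length > 0 &&
           x.offset < y.offset + y.length &&
           y.offset < x.offset + x.length;
}

/* Return the index of the first existing block that `write` would
 * overwrite, or -1 if the write lands only in unused space. */
static long first_collision(const struct extent *existing, size_t n,
                            struct extent write)
{
    for (size_t i = 0; i < n; i++)
        if (extents_overlap(existing[i], write))
            return (long)i;
    return -1;
}
```

Blocks that merely touch (one ending exactly where the next begins) are not treated as colliding, which matches the byte-by-byte packed layout described in the message.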
> I forgot to mention that I tried to take the HDF lib sources and compile them
> for my system. However, I gave up after a couple of days because the sources
> are not at all suitable for adapting to an embedded system that runs a
> simplified file system and a real-time operating system, all of which has to
> fit into a few hundred kilobytes.
>
> Can anyone comment on my observation?
>
>
> Best Regards
> Markus
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> Twitter: https://twitter.com/hdf5