Karl,

What options are you creating your datasets with? When you say writing the smaller HDF5 files, this implies some filters are being used (like gzip). That would be a no-no for real-time expectations. It would be useful if you could post your code, or a filtered-down version of it, just in case. But your very first experiment should be to try against 1.8.16, and perhaps also try H5Pset_libver_bounds with H5F_LIBVER_LATEST for both the low and high arguments. If you are using 1.8.02, you're using a copy from 2009 and missing many bug fixes.
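For concreteness, here is roughly what I mean (C API, 1.8-era; an untested sketch, and the function names are my own):

#include "hdf5.h"

/* Open a new file with the newest on-disk format the library supports;
 * with 1.8.x this generally makes object creation and metadata cheaper. */
hid_t open_latest(const char *path)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    hid_t file = H5Fcreate(path, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);
    return file;
}

/* Create a 1-D extendible dataset that is chunked but has NO filter
 * pipeline (no H5Pset_deflate), so each append costs only the raw write. */
hid_t create_plain(hid_t file, const char *name)
{
    hsize_t dims[1] = {0}, maxdims[1] = {H5S_UNLIMITED}, chunk[1] = {1024};
    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}

If your dataset creation property list has H5Pset_deflate (or any other filter) applied, that alone can account for a lot of per-write variance.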
I would generally append to one dataset; that is actually what the packet table does, and it also keeps the table it is writing to open between appends, unlike H5TBappend_records, which reopens the dataset on every call. But as I said, for 6 Hz up to maybe 100 Hz it should work; I just don't know how much jitter to expect.
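Roughly like this with the packet-table API from the high-level library (again an untested sketch; the 20-bin record layout and chunk size are placeholders for whatever your data actually look like):

#include "hdf5.h"
#include "hdf5_hl.h"                 /* H5PT* is in the high-level library */

typedef struct { double bins[20]; } record_t;  /* one integrated measurement */

int main(void)
{
    hid_t file = H5Fcreate("daq.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Fixed-size record type: an array of 20 doubles. */
    hsize_t adims[1] = {20};
    hid_t rtype = H5Tarray_create2(H5T_NATIVE_DOUBLE, 1, adims);

    /* Create the packet table once; chunk of 512 records, -1 = no filter. */
    hid_t pt = H5PTcreate_fl(file, "/measurements", rtype, 512, -1);

    record_t rec = {{0}};
    for (int i = 0; i < 1000; i++)   /* stand-in for your 6 Hz DAQ loop */
        H5PTappend(pt, 1, &rec);     /* table handle stays open throughout */

    H5PTclose(pt);
    H5Tclose(rtype);
    H5Fclose(file);
    return 0;
}

The point is that the dataset is created exactly once, and every measurement afterwards is a cheap append to it rather than a fresh dataset with all of its metadata.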
By the way, tmpfs = RAM; it is very useful to test against it to see whether this is the result of fsyncs or something similar.

-Jason

On Mon, May 9, 2016 at 4:42 PM, Karl Hoover <[email protected]> wrote:

> Greetings,
>
> Jason, thanks very much for the suggestions. I have tried different hard
> disk hardware, different computers, different file system types, and
> turning off journaling. The result has always been that the raw binary
> files can be written at full speed, while writing the smaller HDF5 files
> introduces the constantly growing delay.
>
> It does seem that my approach of making each measurement a new HDF5
> dataset might not be very good. But it is convenient, and my whole
> software 'system' expects the data to be laid out that way.
>
> Best regards,
> Karl Hoover
>
> ------------------------------
> *From:* Hdf-forum <[email protected]> on behalf of
> Jason Newton <[email protected]>
> *Sent:* Monday, May 9, 2016 1:24 PM
> *To:* HDF Users Discussion List
> *Subject:* Re: [Hdf-forum] Performance problem writing small datasets
> hdf5.8.0.2
>
> Hi,
>
> I've used HDF5 in soft real-time before, where I mostly met timings you'd
> likely agree with, with more data, writing to SSD, but I used the packet
> table interface (C/C++, no Java). So it is possible, though I will also
> tell you that I used POSIX file I/O to get more safety, performance, and
> determinism, then converted to HDF5 afterwards (once DAQ was complete)
> for analysis and long-term storage. For your use case, though, it sounds
> fine.
>
> You might want to test with a tmpfs partition like /dev/shm/ or /tmp,
> assuming they are mounted as tmpfs, to separate out hardware and
> filesystem performance.
>
> I think you mean HDF5 1.8.02 (not really sure), but it should be stated
> that you should test and USE the latest stable HDF5. It is very possible
> this will change your observations greatly (in a positive fashion,
> usually). You'd definitely want to keep files and datasets open while you
> append to them, as closes imply flushes.
>
> Also, if this example file is indicative of your typical collections,
> creating datasets very frequently is high overhead. I'm pretty sure you
> could do 10-100 Hz without much trouble on most Linux filesystems (ext4,
> xfs), with occasional jitter if under high load.
>
> This constant growth in timing sounds very similar to a bug I came across
> a few years ago, but I can't quite remember its cause. I do remember it
> was fixed, though. I'll follow up if I recall more.
>
> -Jason
>
> On Mon, May 9, 2016 at 1:58 PM, Karl Hoover <[email protected]>
> wrote:
>
>> We're developing software for the control of a scientific instrument: at
>> an overall rate of about 6 Hz, it acquires 2.8 milliseconds' worth of
>> 18-bit samples at 250 kHz on up to 24 channels. These data are shipped
>> back over gigabit Ethernet to a Linux PC running a simple Java program.
>> These data can reliably be written as a byte stream to disk at full
>> speed with extremely regular timing. Thus we are certain that our data
>> acquisition, Ethernet transport, Linux PC software, and file system are
>> working fine.
>>
>> However, the users want the data in a more portable, summarized format,
>> and we selected HDF5. The 700 or so 18-bit samples of each channel are
>> integrated into 18 to 20 time bins. The resulting datasets are thus not
>> very large at all. I've attached a screenshot of a region of a typical
>> file of typical size, and example data (much smaller than a typical
>> file).
>>
>> The instrument operates in two distinct modes. In one mode the
>> instrument is stationary over the region of interest. This is working
>> flawlessly. In the other mode, the instrument is moved around and about
>> in arbitrary paths. In this mode the precise time of the data
>> acquisition obviously is critical. What we observe is that the
>> performance of the system is very stable at 6 Hz, except that every 9
>> seconds a delay occurs, starting at about 10 ms and growing without
>> bound to hundreds of milliseconds. There is nothing in my software that
>> knows anything about a 9-second interval. And I've found that this delay
>> only occurs when I *write* the HDF5 file. All other processing,
>> including creating the HDF5 file, can be performed without any
>> performance problem. It makes no difference whether I keep the HDF5 file
>> open or close it each time. I'm using HDF5.8.02 and the JNI/Java
>> library. Any suggestions about how to fix this problem would be
>> appreciated.
>>
>> Best regards,
>> Karl Hoover
>>
>> Senior Software Engineer
>> Geometrics
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
