Karl,

What options are you creating your datasets with? When you say writing the smaller HDF5 files, this implies some filters are being used (like gzip). That would be a no-no for real-time expectations. It would be useful if you could post your code, or a filtered-down version of it, just in case. But your very first experiment should be to try against 1.8.16, and perhaps also try H5Pset_libver_bounds with H5F_LIBVER_LATEST for both the low and high arguments. If you are using 1.8.02, you're using a copy from 2009 and missing many bug fixes.
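For concreteness, here is roughly what I mean (C API, 1.8-era; an untested sketch, and the function names are my own):

#include "hdf5.h"

/* Open a new file with the newest on-disk format the library supports;
 * with 1.8.x this generally makes object creation and metadata cheaper. */
hid_t open_latest(const char *path)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    hid_t file = H5Fcreate(path, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    H5Pclose(fapl);
    return file;
}

/* Create a 1-D extendible dataset that is chunked but has NO filter
 * pipeline (no H5Pset_deflate), so each append costs only the raw write. */
hid_t create_plain(hid_t file, const char *name)
{
    hsize_t dims[1] = {0}, maxdims[1] = {H5S_UNLIMITED}, chunk[1] = {1024};
    hid_t space = H5Screate_simple(1, dims, maxdims);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset = H5Dcreate2(file, name, H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}

If your dataset creation property list has H5Pset_deflate (or any other filter) applied, that alone can account for a lot of per-write variance.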
I would generally append to one dataset; that is actually what the packet table does, and it also keeps the table it is writing to open between appends, unlike H5TBappend_records, which reopens the dataset on every call. But as I said, for 6 Hz up to maybe 100 Hz it should work; I just don't know how much jitter to expect.
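Roughly like this with the packet-table API from the high-level library (again an untested sketch; the 20-bin record layout and chunk size are placeholders for whatever your data actually look like):

#include "hdf5.h"
#include "hdf5_hl.h"                 /* H5PT* is in the high-level library */

typedef struct { double bins[20]; } record_t;  /* one integrated measurement */

int main(void)
{
    hid_t file = H5Fcreate("daq.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

    /* Fixed-size record type: an array of 20 doubles. */
    hsize_t adims[1] = {20};
    hid_t rtype = H5Tarray_create2(H5T_NATIVE_DOUBLE, 1, adims);

    /* Create the packet table once; chunk of 512 records, -1 = no filter. */
    hid_t pt = H5PTcreate_fl(file, "/measurements", rtype, 512, -1);

    record_t rec = {{0}};
    for (int i = 0; i < 1000; i++)   /* stand-in for your 6 Hz DAQ loop */
        H5PTappend(pt, 1, &rec);     /* table handle stays open throughout */

    H5PTclose(pt);
    H5Tclose(rtype);
    H5Fclose(file);
    return 0;
}

The point is that the dataset is created exactly once, and every measurement afterwards is a cheap append to it rather than a fresh dataset with all of its metadata.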
By the way, tmpfs = RAM; it is very useful to test against it to see whether this is the result of fsyncs or something similar.

-Jason

On Mon, May 9, 2016 at 4:42 PM, Karl Hoover <[email protected]> wrote:

> Greetings,
>
> Jason, thanks very much for the suggestions. I have tried different hard
> disk hardware, different computers, different file system types, and
> turning off journaling. The result has always been that the raw binary
> files can be written at full speed, while writing the smaller HDF5 files
> introduces the constantly growing delay.
>
> It does seem that my approach of making each measurement a new HDF5
> dataset might not be very good. But it is convenient, and my whole
> software 'system' expects the data to be laid out that way.
>
> Best regards,
> Karl Hoover
>
> ------------------------------
> *From:* Hdf-forum <[email protected]> on behalf of
> Jason Newton <[email protected]>
> *Sent:* Monday, May 9, 2016 1:24 PM
> *To:* HDF Users Discussion List
> *Subject:* Re: [Hdf-forum] Performance problem writing small datasets
> hdf5.8.0.2
>
> Hi,
>
> I've used HDF5 in soft real-time before, where I mostly met timings you'd
> likely agree with, with more data, writing to SSD, but I used the packet
> table interface (C/C++, no Java). So it is possible, though I will also
> tell you that I used POSIX file I/O to get more safety, performance, and
> determinism, then converted to HDF5 afterwards (once DAQ was complete)
> for analysis and long-term storage. For your use case, though, it sounds
> fine.
>
> You might want to test with a tmpfs partition like /dev/shm/ or /tmp,
> assuming they are mounted as tmpfs, to separate out hardware and
> filesystem performance.
>
> I think you mean HDF5 1.8.02 (not really sure), but it should be stated
> that you should test and USE the latest stable HDF5. It is very possible
> this will change your observations greatly (in a positive fashion,
> usually). You'd definitely want to keep files and datasets open while you
> append to them, as closes imply flushes.
>
> Also, if this example file is indicative of your typical collections,
> creating datasets very frequently is high overhead. I'm pretty sure you
> could do 10-100 Hz without much trouble on most Linux filesystems (ext4,
> xfs), with occasional jitter if under high load.
>
> This constant growth in timing sounds very similar to a bug I came across
> a few years ago, but I can't quite remember its cause. I do remember it
> was fixed, though. I'll follow up if I recall more.
>
> -Jason
>
> On Mon, May 9, 2016 at 1:58 PM, Karl Hoover <[email protected]>
> wrote:
>
>> We're developing software for the control of a scientific instrument: at
>> an overall rate of about 6 Hz, it acquires 2.8 milliseconds' worth of
>> 18-bit samples at 250 kHz on up to 24 channels. These data are shipped
>> back over gigabit Ethernet to a Linux PC running a simple Java program.
>> These data can reliably be written as a byte stream to disk at full
>> speed with extremely regular timing. Thus we are certain that our data
>> acquisition, Ethernet transport, Linux PC software, and file system are
>> working fine.
>>
>> However, the users want the data in a more portable, summarized format,
>> and we selected HDF5. The 700 or so 18-bit samples of each channel are
>> integrated into 18 to 20 time bins. The resulting datasets are thus not
>> very large at all. I've attached a screenshot of a region of a typical
>> file of typical size, and example data (much smaller than a typical
>> file).
>>
>> The instrument operates in two distinct modes. In one mode the
>> instrument is stationary over the region of interest. This is working
>> flawlessly. In the other mode, the instrument is moved around and about
>> in arbitrary paths. In this mode the precise time of the data
>> acquisition obviously is critical. What we observe is that the
>> performance of the system is very stable at 6 Hz, except that every 9
>> seconds a delay occurs, starting at about 10 ms and growing without
>> bound to hundreds of milliseconds. There is nothing in my software that
>> knows anything about a 9-second interval. And I've found that this delay
>> only occurs when I *write* the HDF5 file. All other processing,
>> including creating the HDF5 file, can be performed without any
>> performance problem. It makes no difference whether I keep the HDF5 file
>> open or close it each time. I'm using HDF5.8.02 and the JNI/Java
>> library. Any suggestions about how to fix this problem would be
>> appreciated.
>>
>> Best regards,
>> Karl Hoover
>>
>> Senior Software Engineer
>> Geometrics
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
