2016-10-28 20:24 GMT+02:00 Miller, Mark C. <[email protected]>:
> Can I just clarify some of this discussion...
>
> It reads like you are talking about compression ratios around 1.6x, less
> than 2:1. Is that correct?
Yes, in our case we only do lossless compression so far, but we have been
talking about lossy. We just haven't taken any steps in that direction yet,
and I didn't even know about ZFP before. It looks very interesting.

> FYI.. ZFP demonstrates results far beyond that (10-30x and better) at the
> expense of (some) loss.

Yes, ZFP is of course in a completely different ball game, compression-ratio
wise, than the codecs I compared in my benchmark (which are all lossless).
It looks very impressive from reading the material on the site and skimming
the paper.

> However, current efforts indicate that losses are tolerable in many
> post-processing analysis workflows.

Right, we need to investigate, or rather I need to discuss with our
physicists how much error we can tolerate (I'm not doing any analysis
myself, only visualization). Our data is single-precision float to begin
with. For the visualization part, I'm sure we could get away with quite a
bit of loss.

> We think the key to achieving good compression on floating point data,
> going forward, is to allow for some well-controlled loss.

Yes, and it seems that ZFP has several knobs for controlling that loss,
which look really useful.

> See this page on the effect of ZFP's losses when, for example, taking
> derivatives...
>
> http://computation.llnl.gov/projects/floating-point-compression/zfp-and-derivatives
>
> as compared to other compression methods.

Thanks for the pointer.

> We already face loss-like noise in floating point results when dealing
> with system differences, either between current systems and software
> stacks or over time as systems and software evolve.

Indeed. We simply need to have a look at how much error we can tolerate.

Elvis

> Mark
>
> --
> Mark C. Miller, LLNL
>
> From: Hdf-forum <[email protected]> on behalf of Elvis
> Stansvik <[email protected]>
> Reply-To: HDF Users Discussion List <[email protected]>
> Date: Friday, October 28, 2016 at 11:08 AM
> To: "[email protected]" <[email protected]>
> Cc: HDF Users Discussion List <[email protected]>
> Subject: Re: [Hdf-forum] New HDF5 compression plugin
>
> 2016-10-28 18:14 GMT+02:00 Francesc Alted <[email protected]>:
> >
> > 2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[email protected]>:
> > >
> > > 2016-10-28 17:53 GMT+02:00 Francesc Alted <[email protected]>:
>>
>> 2016-10-28 17:20 GMT+02:00 Elvis Stansvik <[email protected]>:
>>>
>>> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[email protected]>:
>>> >
>>> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik <[email protected]>:
>>> >>
>>> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[email protected]>:
>>> >> > I second this request big time and would add zstd, if we are
>>> >> > already trying out various encoders. ;)
>>> >>
>>> >> This may not be of interest, and does not include zstd, but I'm
>>> >> attaching an excerpt from some of the results I got back when we
>>> >> did our basic benchmarking of some algorithms (all lossless).
>>> >>
>>> >> It was based on those results that we settled on Blosc_LZ4HC at
>>> >> level 4, since we were looking for very fast decompression times,
>>> >> while longer compression times and a slightly larger file size
>>> >> were acceptable up to a point. The gzip results are included
>>> >> mostly because that's what we were using at the time and I wanted
>>> >> them as a comparison, but we knew we wanted something else. The
>>> >> input for those benchmarks was a 500x300x300 float dataset
>>> >> containing a tomographic 3D image.
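Given single-precision data like the tomography volume described above, one
cheap way to probe the "how much error can we tolerate" question from
earlier in the thread, before wiring in a real lossy codec like ZFP, is to
truncate float32 mantissa bits and measure the resulting error. A minimal
sketch in pure Python; this is a crude stand-in for lossy compression, not
ZFP's actual algorithm, and the sample values are made up:

```python
import struct

def truncate_mantissa(values, keep_bits):
    """Zero the low-order (23 - keep_bits) mantissa bits of each float32
    value -- a crude stand-in for lossy compression, not ZFP itself."""
    drop = 23 - keep_bits
    mask = (0xFFFFFFFF >> drop) << drop
    out = []
    for v in values:
        (bits,) = struct.unpack("<I", struct.pack("<f", v))
        (trunc,) = struct.unpack("<f", struct.pack("<I", bits & mask))
        out.append(trunc)
    return out

# A few single-precision samples standing in for voxel values.
data = [0.1, 1.5, -3.14159, 1e-3, 2.5e4]
for keep in (16, 8, 4):
    approx = truncate_mantissa(data, keep)
    max_rel = max(abs(a - v) / abs(v) for a, v in zip(approx, data))
    print(f"keep {keep:2d} mantissa bits -> max relative error {max_rel:.1e}")
```

Sweeping `keep_bits` gives a rough feel for how many significant bits the
visualization (or the physicists' analysis) actually needs, which then maps
onto whichever error-control knob a real lossy codec exposes.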
>>> >
>>> > Zstd was included in Blosc a while ago:
>>> >
>>> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
>>> >
>>> > and its performance really shines, even on real data:
>>> >
>>> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
>>> >
>>> > (although there, the data being 1-byte integers, only the BITSHUFFLE
>>> > filter is used, not the faster SHUFFLE).
>>> >
>>> > As Blosc offers the same API for a number of codecs, trying it in
>>> > combination with Zstd should be really easy.
>>>
>>> Zstd indeed looks very well-balanced. The reason I didn't include it
>>> back when I did those benchmarks was that we were really focused on
>>> decompression speed in our application; compression speed was very
>>> much secondary. So I included mostly LZ4 codecs.
>>
>> Yes, that makes sense, but I think you should give it a try, at least
>> at the lowest compression levels for Blosc+Zstd (1, 2 and probably 3
>> too). For these low compression levels, Blosc chooses a block size that
>> comfortably fits in L2. Also, note that the benchmarks above were for
>> in-memory data, so for a typical disk-based workflow using HDF5,
>> Blosc+Zstd can still perform well enough.
>
> Alright, thanks for the tip. I read the benchmarks too fast and didn't
> realize it was all in-memory. I should definitely look at Zstd.
>
> In our use case it's always from disk (or well, SSD), and sometimes
> even slow-ish network mounts.
>
> > Cool. Keep us informed. I am definitely interested.
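Since the benchmark script itself isn't reproduced in the thread, here is a
minimal sketch of such a harness, producing CSV rows in the same
compressor/ctime/rtime/size format as the results quoted below. It uses
stdlib codecs (zlib, bz2, lzma) as stand-ins for the Blosc codecs; with
python-blosc one would instead call something like
`blosc.compress(data, typesize=4, clevel=3, cname='zstd')`:

```python
import bz2
import lzma
import statistics
import time
import zlib

# One uniform (compress, decompress) pair per codec, mirroring how Blosc
# exposes several codecs behind a single call. These stdlib codecs are
# stand-ins for the Blosc codecs benchmarked in the thread.
CODECS = {
    "zlib_6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bz2_9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "lzma_1": (lambda d: lzma.compress(d, preset=1), lzma.decompress),
}

def bench(name, data, repeats=3):
    """Time one codec and return a CSV row:
    compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)"""
    compress, decompress = CODECS[name]
    ctimes, rtimes = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        comp = compress(data)
        ctimes.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        out = decompress(comp)
        rtimes.append(time.perf_counter() - t0)
        assert out == data  # lossless round-trip check
    return ",".join([
        name,
        f"{statistics.mean(ctimes):.5f}", f"{statistics.stdev(ctimes):.5f}",
        f"{statistics.mean(rtimes):.5f}", f"{statistics.stdev(rtimes):.5f}",
        str(len(comp)),
    ])

if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB stand-in for the float dataset
    print("compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)")
    for name in CODECS:
        print(bench(name, payload))
```

The real measurements would of course be taken on the actual float dataset
through HDF5, but the shape of the harness (repeat, round-trip check, mean
and standard deviation of both directions, compressed size) is the same.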
> I found the old input file, and I quickly ran the benchmark again with
> Blosc_ZSTD with byte-based shuffling at compression levels 1, 2 and 3:
>
> compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
> blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
> blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
> blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
>
> Unfortunately I can't find the spreadsheet where I made those diagrams,
> so I can't make a new, updated one (at least not easily right now).
>
> But this shows that Zstd is very competitive. It achieves a slightly
> better compression ratio than Blosc_LZ4HC at level 4 (the original file
> size was 189378052 bytes), which is what we picked, and the compression
> is much faster. But Blosc_LZ4HC still wins out on decompression time, so
> I think in the end we picked the right one.
>
> Our use case is essentially compress once, decompress many, many times.
> And during decompression the user will sit there and wait. That's why
> decompression time was so important to us.
>
> Anyway, thanks a lot for making me have a look at Zstd; we may yet use
> it somewhere else.
>
> And I now remember the real reason I didn't include it the first time
> around: we're basing our product on Ubuntu 16.04, where Blosc 1.7 is the
> packaged version (1.10 is where Zstd support was added), so I lazily
> just skipped it :)
>
> Elvis
>
>>> >> I might try to dig up the script I used for the benchmark and see
>>> >> if we still have the input I used, and do a test with lossy ZFP. It
>>> >> could be very interesting for creating 3D "thumbnails" in our
>>> >> application.
>>> >
>>> > It would be nice if your benchmark code (and dataset) could be made
>>> > publicly available, so as to serve others as a good comparison.
>>>
>>> The dataset is unfortunately confidential and not something I can
>>> release.
>>> I'm attaching the script I used, though; it's very simple.
>>>
>>> But, a disclaimer: the benchmarks I did were not really thorough. They
>>> were also internal and never really meant to be published. It was
>>> mostly a quick and dirty test to see which of these LZ4 codecs would
>>> be in the right ballpark for us.
>>
>> Ok. Thanks anyway.
>>>
>>> Elvis
>>>
>>> >> Elvis
>>> >> >
>>> >> > P
>>> >> >
>>> >> > On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
>>> >> >>
>>> >> >> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[email protected]>:
>>> >> >>>
>>> >> >>> Hi All,
>>> >> >>>
>>> >> >>> Just wanted to mention a new HDF5 floating point compression
>>> >> >>> plugin available on github...
>>> >> >>>
>>> >> >>> https://github.com/LLNL/H5Z-ZFP
>>> >> >>>
>>> >> >>> This plugin will come embedded in the next release of the Silo
>>> >> >>> library as well.
>>> >> >>
>>> >> >> Thanks for the pointer. That's very interesting. I had not heard
>>> >> >> about ZFP before. The ability to set a bound on the error in the
>>> >> >> lossy case seems very useful.
>>> >> >>
>>> >> >> Do you know if there have been any comparative benchmarks of ZFP
>>> >> >> against other compressors?
>>> >> >>
>>> >> >> After some basic benchmarking, we recently settled on Blosc_LZ4HC
>>> >> >> at level 4 for our datasets (3D float tomography data), but maybe
>>> >> >> it would be worthwhile to look at ZFP as well..
>>> >> >>
>>> >> >> Best regards,
>>> >> >> Elvis
>>> >> >>>
>>> >> >>> --
>>> >> >>> Mark C. Miller, LLNL
>>> >
>>> > --
>>> > Francesc Alted
>>
>> --
>> Francesc Alted
>
> --
> Francesc Alted

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
