This is something that I am very interested in trying out on Blue Waters, where, with my current lossless compression (gzip), I would end up generating several PB of data over the next couple of years. There are certain variables in my data that I would be happy to compress lossily. With the compression ratios being reported, this could be a game changer for me.
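For a rough sense of scale, here is a minimal Python sketch of the arithmetic. The byte counts are the ones Elvis quotes further down in this thread for his 500x300x300 float dataset. The 3 PB volume, the 1.6x lossless ratio, and the 10x lossy ratio in the projection are placeholder assumptions for illustration only, not measurements from Blue Waters, and the helper names are made up for this example.

```python
# Sizes in bytes, copied from the Blosc+Zstd benchmark quoted later in this thread.
ORIGINAL_SIZE = 189378052
COMPRESSED_SIZES = {
    "blosc_zstd_1": 116666294,
    "blosc_zstd_2": 114666454,
    "blosc_zstd_3": 113485801,
}


def compression_ratio(original_bytes, compressed_bytes):
    """Return the compression ratio (original size divided by compressed size)."""
    return original_bytes / compressed_bytes


def projected_size(current_size, current_ratio, new_ratio):
    """Estimate the stored size of the same raw data at a different ratio.

    current_size is the size on disk at current_ratio; the raw size is
    current_size * current_ratio, which is then divided by new_ratio.
    """
    return current_size * current_ratio / new_ratio


ratios = {
    name: compression_ratio(ORIGINAL_SIZE, size)
    for name, size in COMPRESSED_SIZES.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")

# Hypothetical projection (all three numbers are assumptions):
# 3 PB stored at a 1.6x lossless ratio, re-estimated at a 10x lossy ratio.
projected_pb = projected_size(3.0, 1.6, 10.0)
print(f"projected size: {projected_pb:.2f} PB")
```

In practice only some variables would be compressed lossily, so a real estimate would apply the lossy ratio to that fraction of the data and the lossless ratio to the rest.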
Is there a Fortran interface consistent with the other Fortranized routines in HDF?

Leigh

On Fri, Oct 28, 2016 at 1:45 PM Elvis Stansvik <[email protected]> wrote:

> 2016-10-28 20:24 GMT+02:00 Miller, Mark C. <[email protected]>:
> > Can I just clarify some of this discussion...
> >
> > It reads like you are talking about compression ratios around 1.6x, less
> > than 2:1. Is that correct?
>
> Yes, in our case we only do lossless compression so far, but we have
> been talking about lossy. Just haven't taken any steps yet, and I
> didn't even know about ZFP before. It looks very interesting.
>
> > FYI..ZFP demonstrates results far beyond that (10-30x and better) at the
> > expense of (some) loss.
>
> Yes, ZFP is of course in a completely different ball game compression
> ratio wise than the codecs I compared in my benchmark (which are all
> lossless). It looks very impressive from reading the material on the
> site and skimming the paper.
>
> > However, current efforts indicate that losses are tolerable in many
> > post-processing analysis workflows.
>
> Right, we need to investigate, or rather I need to have a discussion
> with our physicists on how much error we can tolerate (I'm not doing
> any analysis myself, only visualization). Our data is single precision
> float to begin with. For the visualization part I'm sure we could get
> away with quite a bit of loss.
>
> > We think the key to achieving good compression on floating point data, going
> > forward, is to allow for some well controlled loss.
>
> Yes, and it seems that ZFP has several knobs for controlling that loss
> which look really useful.
>
> > See this page on the effect of ZFP losses on, for example, taking derivatives...
> >
> > http://computation.llnl.gov/projects/floating-point-compression/zfp-and-derivatives
> >
> > as compared to other compression methods.
>
> Thanks for the pointer.
>
> > We already face loss-like noise in floating point results when dealing with
> > system differences either between current systems and software stacks or
> > over time as systems and software evolve.
>
> Indeed.
>
> We simply need to have a look at how much error we can tolerate.
>
> Elvis
>
> > Mark
> >
> > --
> > Mark C. Miller, LLNL
> >
> > From: Hdf-forum <[email protected]> on behalf of Elvis
> > Stansvik <[email protected]>
> > Reply-To: HDF Users Discussion List <[email protected]>
> > Date: Friday, October 28, 2016 at 11:08 AM
> > To: "[email protected]" <[email protected]>
> > Cc: HDF Users Discussion List <[email protected]>
> > Subject: Re: [Hdf-forum] New HDF5 compression plugin
> >
> > 2016-10-28 18:14 GMT+02:00 Francesc Alted <[email protected]>:
> >
> > 2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[email protected]>:
> >
> > 2016-10-28 17:53 GMT+02:00 Francesc Alted <[email protected]>:
> >>
> >> 2016-10-28 17:20 GMT+02:00 Elvis Stansvik <[email protected]>:
> >>>
> >>> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[email protected]>:
> >>> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik <[email protected]>:
> >>> >>
> >>> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[email protected]>:
> >>> >> > I second this request big time and would add zstd, if we are already
> >>> >> > trying out various encoders. ;)
> >>> >>
> >>> >> This may not be of interest, and does not include zstd, but I'm
> >>> >> attaching an excerpt from some of the results I got back when
> >>> >> doing our basic benchmarking of some algorithms (all lossless).
> >>> >>
> >>> >> It was based on those that we settled on Blosc_LZ4HC at level 4, since
> >>> >> we were looking for very fast decompression times, while longer
> >>> >> compression times and slightly larger file sizes were acceptable up to
> >>> >> a certain point.
> >>> >> The gzip results are included mostly because that's
> >>> >> what we were using at the time and I wanted them as a comparison, but
> >>> >> we knew we wanted something else. The input for those benchmarks was a
> >>> >> 500x300x300 float dataset containing a tomographic 3D image.
> >>> >
> >>> > Zstd was included in Blosc a while ago:
> >>> >
> >>> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
> >>> >
> >>> > and its performance really shines, even on real data:
> >>> >
> >>> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
> >>> >
> >>> > (although here, being only integers of 1 byte, only the BITSHUFFLE
> >>> > filter is used, but not the faster SHUFFLE).
> >>> >
> >>> > As Blosc offers the same API for a number of codecs, trying it in
> >>> > combination with Zstd should be really easy.
> >>>
> >>> Zstd indeed looks very well-balanced. The reason I didn't include it
> >>> back when I did those benchmarks was that we were really focused on
> >>> decompression speed in our application; compression speed was very
> >>> much secondary. So I included mostly LZ4 codecs.
> >>
> >> Yes, that makes sense, but I think you should give it a try, at least at the
> >> lowest compression levels for Blosc+Zstd (1, 2 and probably 3 too). For
> >> these low compression levels Blosc chooses a block size that comfortably
> >> fits in L2. Also, note that the benchmarks above were for in-memory data,
> >> so for a typical disk-based workflow using HDF5, Blosc+Zstd can still
> >> perform well enough.
> >
> > Alright, thanks for the tip. I read the benchmarks too fast and didn't
> > realize it was all in-memory. I should definitely look at Zstd.
> >
> > In our use case it's always from disk (or well, SSD), and sometimes
> > even slow-ish network mounts.
> >
> > Cool. Keep us informed. I am definitely interested.
> >
> > I found the old input file and very quickly I ran the benchmark again
> > with Blosc_ZSTD with byte-based shuffling at compression levels 1, 2
> > and 3:
> >
> > compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
> > blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
> > blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
> > blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
> >
> > Unfortunately I can't find the spreadsheet where I made those
> > diagrams, so can't make a new updated one (at least not easily right
> > now).
> >
> > But this shows that Zstd is very competitive. It achieves a slightly
> > better compression ratio than Blosc_LZ4HC at level 4 (the original
> > file size was 189378052 bytes), which is what we picked, and the
> > compression is much faster. But Blosc_LZ4HC still wins out in the
> > decompression time, so I think in the end we picked the right one.
> >
> > Our use case is essentially compress once, decompress many many times.
> > And during the decompression the user will sit there and wait. That's
> > why decompression time was so important to us.
> >
> > Anyway, thanks for making me have a look at Zstd, we may yet use it
> > somewhere else.
> >
> > And I now remember the real reason I didn't include it the first time
> > around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
> > the packaged version (1.10 is where Zstd support was added), so I
> > lazily just skipped it :)
> >
> > Elvis
> >
> > Elvis
> >
> >>> >>
> >>> >> I might try to dig up the script I used for the benchmark and see if
> >>> >> we still have the input I used, and do a test with lossy ZFP. It could
> >>> >> be very interesting for creating 3D "thumbnails" in our application.
> >>> >
> >>> > It would be nice if your benchmark code (and dataset) can be made publicly
> >>> > available so as to serve to others as a good comparison.
> >>>
> >>> The dataset is unfortunately confidential and not something I can
> >>> release. I'm attaching the script I used though, it's very simple.
> >>>
> >>> But, a disclaimer: The benchmarks I did were not really thorough. They
> >>> were also internal and never really meant to be published. It was
> >>> mostly a quick and dirty test to see which of these LZ4 codecs would
> >>> be in the right ballpark for us.
> >>
> >> Ok. Thanks anyway.
> >>
> >>> Elvis
> >>>
> >>> >> Elvis
> >>> >>
> >>> >> > P
> >>> >> >
> >>> >> > On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
> >>> >> >>
> >>> >> >> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[email protected]>:
> >>> >> >>>
> >>> >> >>> Hi All,
> >>> >> >>>
> >>> >> >>> Just wanted to mention a new HDF5 floating point compression plugin
> >>> >> >>> available on github...
> >>> >> >>>
> >>> >> >>> https://github.com/LLNL/H5Z-ZFP
> >>> >> >>>
> >>> >> >>> This plugin will come embedded in the next release of the Silo library
> >>> >> >>> as well.
> >>> >> >>
> >>> >> >> Thanks for the pointer. That's very interesting. I had not heard about
> >>> >> >> ZFP before. The ability to set a bound on the error in the lossy
> >>> >> >> case seems very useful.
> >>> >> >>
> >>> >> >> Do you know if there have been any comparative benchmarks of ZFP
> >>> >> >> against other compressors?
> >>> >> >>
> >>> >> >> After some basic benchmarking, we recently settled on Blosc_LZ4HC at
> >>> >> >> level 4 for our datasets (3D float tomography data), but maybe it
> >>> >> >> would be worthwhile to look at ZFP as well.
> >>> >> >>
> >>> >> >> Best regards,
> >>> >> >> Elvis
> >>> >> >>
> >>> >> >>> --
> >>> >> >>> Mark C. Miller, LLNL
_______________________________________________ Hdf-forum is for HDF software users discussion. [email protected] http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org Twitter: https://twitter.com/hdf5
