This is something that I am very interested in trying out on Blue Waters,
where, with my current lossless compression (gzip), I would end up generating
several PB of data over the next couple of years. There are certain
variables in my data that I would be happy to compress lossily. With the
compression ratios being reported, this could be a game changer for me.

Is there a Fortran interface consistent with the other Fortranized routines
in HDF?

Leigh

On Fri, Oct 28, 2016 at 1:45 PM Elvis Stansvik <[email protected]>
wrote:

> 2016-10-28 20:24 GMT+02:00 Miller, Mark C. <[email protected]>:
> > Can I just clarify some of this discussion...
> >
> > It reads like you are talking about compression ratios around 1.6x, less
> > than 2:1. Is that correct?
>
> Yes, in our case we only do lossless compression so far, but we have
> been talking about lossy. We just haven't taken any steps yet, and I
> hadn't even heard of ZFP before. It looks very interesting.
>
> >
> > FYI..ZFP demonstrates results far beyond that (10-30x and better) at the
> > expense of (some) loss.
>
> Yes, ZFP is of course in a completely different ball game compression
> ratio wise than the codecs I compared in my benchmark (which are all
> lossless). It looks very impressive from reading the material on the
> site and skimming the paper.
>
> >
> > However, current efforts indicate that losses are tolerable in many
> > post-processing analysis workflows.
>
> Right, we need to investigate, or rather I need to have a discussion
> with our physicists on how much error we can tolerate (I'm not doing
> any analysis myself, only visualization). Our data is single precision
> float to begin with. For the visualization part I'm sure we could get
> away with quite a bit of loss.
>
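
A minimal sketch of how the "how much error can we tolerate" question could be put into numbers: compress lossily, reconstruct, and report the maximum absolute error and the RMS error. The lossy step here is plain uniform quantization to an absolute tolerance, used only as a stand-in; it is not ZFP and says nothing about ZFP's actual error behaviour. The function names are hypothetical, and the sketch uses Python's double-precision floats rather than the single-precision data mentioned above.

```python
import math


def quantize(values, tolerance):
    """Stand-in lossy step (NOT ZFP): snap each value to a grid of
    spacing 2 * tolerance, so the absolute error is at most tolerance."""
    step = 2.0 * tolerance
    return [round(v / step) * step for v in values]


def error_stats(original, reconstructed):
    """Return (max absolute error, root-mean-square error)."""
    diffs = [abs(a - b) for a, b in zip(original, reconstructed)]
    max_abs = max(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max_abs, rmse


if __name__ == "__main__":
    # Synthetic smooth signal, purely for illustration.
    data = [math.sin(0.01 * i) for i in range(1000)]
    for tol in (1e-2, 1e-3, 1e-4):
        max_abs, rmse = error_stats(data, quantize(data, tol))
        print(f"tolerance={tol:g}  max_abs={max_abs:.3g}  rmse={rmse:.3g}")
```

With a real lossy codec, only the `quantize` call would be swapped for a compress/decompress round trip; the error report stays the same.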
> >
> > We think the key to achieving good compression on floating point data,
> > going forward, is to allow for some well controlled loss.
>
> Yes, and it seems that ZFP has several knobs for controlling that loss
> which look really useful.
>
> >
> > See this page on the effect of ZFP losses on, for example, taking derivatives...
> >
> > http://computation.llnl.gov/projects/floating-point-compression/zfp-and-derivatives
> >
> > as compared to other compression methods.
>
> Thanks for the pointer.
>
> >
> > We already face loss-like noise in floating point results when dealing with
> > system differences either between current systems and software stacks or
> > over time as systems and software evolve.
>
> Indeed.
>
> We simply need to have a look at how much error we can tolerate.
>
> Elvis
>
> >
> > Mark
> >
> > --
> > Mark C. Miller, LLNL
> >
> > From: Hdf-forum <[email protected]> on behalf of Elvis
> > Stansvik <[email protected]>
> > Reply-To: HDF Users Discussion List <[email protected]>
> > Date: Friday, October 28, 2016 at 11:08 AM
> > To: "[email protected]" <[email protected]>
> > Cc: HDF Users Discussion List <[email protected]>
> > Subject: Re: [Hdf-forum] New HDF5 compression plugin
> >
> > 2016-10-28 18:14 GMT+02:00 Francesc Alted <[email protected]>:
> >
> >
> >
> > 2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[email protected]>:
> >
> >
> > 2016-10-28 17:53 GMT+02:00 Francesc Alted <[email protected]>:
> >>
> >>
> >> 2016-10-28 17:20 GMT+02:00 Elvis Stansvik
> >> <[email protected]>:
> >>>
> >>> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[email protected]>:
> >>> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik
> >>> > <[email protected]>:
> >>> >>
> >>> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[email protected]>:
> >>> >> > I second this request big time and would add zstd, if we are
> >>> >> > already trying out various encoders. ;)
> >>> >>
> >>> >> This may not be of interest, and does not include zstd, but I'm
> >>> >> attaching an excerpt from some of the results I got back when
> >>> >> doing our basic benchmarking of some algorithms (all lossless).
> >>> >>
> >>> >> Based on those results we settled on Blosc_LZ4HC at level 4, since
> >>> >> we were looking for very fast decompression times, while longer
> >>> >> compression times and slightly larger file sizes were acceptable up
> >>> >> to a point. The gzip results are included mostly because that's
> >>> >> what we were using at the time and I wanted them as a comparison,
> >>> >> but we knew we wanted something else. The input for those benchmarks
> >>> >> was a 500x300x300 float dataset containing a tomographic 3D image.
> >>> >
> >>> >
> >>> > Zstd was included in Blosc a while ago:
> >>> >
> >>> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
> >>> >
> >>> > and its performance really shines, even on real data:
> >>> >
> >>> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
> >>> >
> >>> > (although here, since the data are 1-byte integers, only the BITSHUFFLE
> >>> > filter is used, not the faster SHUFFLE).
> >>> >
> >>> > As Blosc offers the same API for a number of codecs, trying it in
> >>> > combination with Zstd should be really easy.
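
For readers unfamiliar with the SHUFFLE filter mentioned above, here is a minimal pure-Python sketch of byte shuffling: byte 0 of every element is stored first, then byte 1 of every element, and so on, which tends to group similar bytes together before the codec sees them. This only illustrates the idea; it is not Blosc's implementation, the function names are hypothetical, and zlib stands in for the codec simply because it ships with Python.

```python
import struct
import zlib


def shuffle(buf: bytes, typesize: int) -> bytes:
    """Byte shuffle: gather byte i of every element, for i = 0..typesize-1."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    return b"".join(buf[i::typesize] for i in range(typesize))


def unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of shuffle: scatter each byte plane back into the elements."""
    if len(buf) % typesize:
        raise ValueError("buffer length must be a multiple of typesize")
    n = len(buf) // typesize
    out = bytearray(len(buf))
    for i in range(typesize):
        out[i::typesize] = buf[i * n:(i + 1) * n]
    return bytes(out)


if __name__ == "__main__":
    # Synthetic, smoothly varying float32 values (an assumption for the demo).
    values = [1000.0 + 0.001 * i for i in range(20000)]
    raw = struct.pack(f"<{len(values)}f", *values)
    shuffled = shuffle(raw, 4)
    assert unshuffle(shuffled, 4) == raw
    print("raw bytes:           ", len(raw))
    print("zlib, no shuffle:    ", len(zlib.compress(raw)))
    print("zlib, after shuffle: ", len(zlib.compress(shuffled)))
```

Whether shuffling helps, and by how much, depends on the data, so it is worth running on a sample of the real dataset rather than trusting a synthetic one.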
> >>>
> >>> Zstd indeed looks very well-balanced. The reason I didn't include it
> >>> back when I did those benchmarks was that we were really focused on
> >>> decompression speed in our application, compression speed was very
> >>> much secondary. So I included mostly LZ4 codecs.
> >>
> >>
> >> Yes, that makes sense, but I think you should at least try the lowest
> >> compression levels for Blosc+Zstd (1, 2 and probably 3 too).  For
> >> these low compression levels Blosc chooses a block size that comfortably
> >> fits in L2.  Also, note that the benchmarks above were for in-memory
> >> data, so for a typical disk-based workflow using HDF5, Blosc+Zstd can
> >> still perform well enough.
> >
> > Alright, thanks for the tip. I read the benchmarks too fast and didn't
> > realize it was all in-memory. I should definitely look at Zstd.
> >
> > In our use case it's always from disk (or well, SSD), and sometimes
> > even slow-ish network mounts.
> >
> >
> >
> > Cool.  Keep us informed.  I am definitely interested.
> >
> >
> > I found the old input file and very quickly I ran the benchmark again
> > with Blosc_ZSTD with byte-based shuffling at compression levels 1, 2
> > and 3:
> >
> > compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
> > blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
> > blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
> > blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
> >
> > Unfortunately I can't find the spreadsheet where I made those
> > diagrams, so can't make a new updated one (at least not easily right
> > now).
> >
> > But this shows that Zstd is very competitive. It achieves slightly
> > better compression ratio than Blosc_LZ4HC at level 4 (the original
> > file size was 189378052 bytes), which is what we picked, and the
> > compression is much faster. But Blosc_LZ4HC still wins out in the
> > decompression time, so I think in the end we picked the right one.
> >
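
The numbers quoted above can be turned into compression ratios and rough throughput with a few lines of Python. This is only a sketch: the CSV rows and the original file size are copied from the message, while reading `ctime` as compression time and `rtime` as read (decompression) time is an assumption, as are the function and key names.

```python
import csv
import io

# Uncompressed file size in bytes, as quoted in the message.
ORIGINAL_SIZE = 189378052

# Benchmark rows copied verbatim from the message.
RESULTS = """\
compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
"""


def summarize(csv_text, original_size):
    """Compute compression ratio and throughput (MB/s of uncompressed data)."""
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "compressor": rec["compressor"],
            "ratio": original_size / int(rec["size(B)"]),
            # Assumption: ctime = compression time, rtime = read time.
            "write_MBps": original_size / float(rec["ctime_mean(s)"]) / 1e6,
            "read_MBps": original_size / float(rec["rtime_mean(s)"]) / 1e6,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS, ORIGINAL_SIZE):
        print(f'{row["compressor"]}: ratio {row["ratio"]:.2f}x, '
              f'write {row["write_MBps"]:.0f} MB/s, '
              f'read {row["read_MBps"]:.0f} MB/s')
```

The ratios come out a little above 1.6x for all three levels, in line with the "around 1.6x" figure mentioned earlier in the thread.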
> > Our use case is essentially compress once, decompress many many times.
> > And during the decompression the user will sit there and wait. That's
> > why decompression time was so important to us.
> >
> > Anyway, thanks a lot for making me have a look at Zstd, we may yet use it
> > somewhere else.
> >
> > And I now remember the real reason I didn't include it the first time
> > around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
> > the packaged version (1.10 is where Zstd support was added), so I
> > lazily just skipped it :)
> >
> > Elvis
> >
> >
> >
> >
> > Elvis
> >
> >>
> >>
> >>>
> >>>
> >>> >
> >>> >>
> >>> >> I might try to dig up the script I used for the benchmark and see if
> >>> >> we still have the input I used, and do a test with lossy ZFP. It
> >>> >> could be very interesting for creating 3D "thumbnails" in our
> >>> >> application.
> >>> >
> >>> >
> >>> > It would be nice if your benchmark code (and dataset) could be made
> >>> > publicly available so as to serve others as a good comparison.
> >>>
> >>> The dataset is unfortunately confidential and not something I can
> >>> release. I'm attaching the script I used though, it's very simple.
> >>>
> >>> But, a disclaimer: The benchmarks I did were not really thorough. They
> >>> were also internal and never really meant to be published. It was
> >>> mostly a quick and dirty test to see which of these LZ4 codecs would
> >>> be in the right ballpark for us.
> >>
> >>
> >> Ok.  Thanks anyway.
> >>
> >>>
> >>>
> >>> Elvis
> >>>
> >>> >
> >>> >>
> >>> >>
> >>> >> Elvis
> >>> >>
> >>> >> >
> >>> >> > P
> >>> >> >
> >>> >> >
> >>> >> > On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
> >>> >> >>
> >>> >> >> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[email protected]>:
> >>> >> >>>
> >>> >> >>> Hi All,
> >>> >> >>>
> >>> >> >>> Just wanted to mention a new HDF5 floating point compression
> >>> >> >>> plugin available on github...
> >>> >> >>>
> >>> >> >>> https://github.com/LLNL/H5Z-ZFP
> >>> >> >>>
> >>> >> >>> This plugin will come embedded in the next release of the Silo
> >>> >> >>> library as well.
> >>> >> >>
> >>> >> >>
> >>> >> >> Thanks for the pointer. That's very interesting. I had not heard
> >>> >> >> about ZFP before. The ability to set a bound on the error in the
> >>> >> >> lossy case seems very useful.
> >>> >> >>
> >>> >> >> Do you know if there have been any comparative benchmarks of ZFP
> >>> >> >> against other compressors?
> >>> >> >>
> >>> >> >> After some basic benchmarking, we recently settled on Blosc_LZ4HC
> >>> >> >> at level 4 for our datasets (3D float tomography data), but maybe
> >>> >> >> it would be worthwhile to look at ZFP as well..
> >>> >> >>
> >>> >> >> Best regards,
> >>> >> >> Elvis
> >>> >> >>
> >>> >> >>>
> >>> >> >>> --
> >>> >> >>> Mark C. Miller, LLNL
> >>> >> >>>
> >>> >> >>> _______________________________________________
> >>> >> >>> Hdf-forum is for HDF software users discussion.
> >>> >> >>> [email protected]
> >>> >> >>>
> >>> >> >>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> >>> >> >>> Twitter: https://twitter.com/hdf5
> >>> >> >>
> >>> >> >>
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Francesc Alted
> >>> >
> >>
> >>
> >>
> >>
> >> --
> >> Francesc Alted
> >
> >
> >
> >
> >
> > --
> > Francesc Alted
> >
> >