Can I just clarify some of this discussion...

It reads like you are talking about compression ratios around 1.6x, i.e.
less than 2:1. Is that correct?

FYI: ZFP demonstrates results far beyond that (10-30x and better), at the
expense of some loss.

However, current efforts indicate that losses are tolerable in many 
post-processing analysis workflows.

We think the key to achieving good compression on floating point data,
going forward, is to allow for some well-controlled loss.

See this page on the effect of ZFP's losses when, for example, taking
derivatives...

http://computation.llnl.gov/projects/floating-point-compression/zfp-and-derivatives

as compared to other compression methods.
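
For example, through h5py the fixed-accuracy mode looks roughly like this
(a minimal sketch: I am assuming the hdf5plugin Python package here, which
registers an HDF5 ZFP filter on import; the shape, chunking and the 1e-3
tolerance are made-up examples):

    import h5py
    import hdf5plugin  # assumed: registers a ZFP filter for HDF5 on import
    import numpy as np

    data = np.random.rand(500, 300, 300).astype(np.float32)  # stand-in data

    with h5py.File("example.h5", "w") as f:
        # Fixed-accuracy mode: absolute error is bounded by the tolerance.
        f.create_dataset("tomo", data=data, chunks=(50, 300, 300),
                         **hdf5plugin.Zfp(accuracy=1e-3))

    with h5py.File("example.h5", "r") as f:
        restored = f["tomo"][...]
        # The loss is well controlled: error stays within the bound.
        assert np.max(np.abs(restored - data)) <= 1e-3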

We already face loss-like noise in floating point results from system
differences, whether across current systems and software stacks or over
time as systems and software evolve.

Mark

--
Mark C. Miller, LLNL

From: Hdf-forum <[email protected]> on behalf of Elvis Stansvik
<[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Friday, October 28, 2016 at 11:08 AM
To: "[email protected]" <[email protected]>
Cc: HDF Users Discussion List <[email protected]>
Subject: Re: [Hdf-forum] New HDF5 compression plugin

2016-10-28 18:14 GMT+02:00 Francesc Alted <[email protected]>:


2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[email protected]>:

2016-10-28 17:53 GMT+02:00 Francesc Alted <[email protected]>:
>
>
> 2016-10-28 17:20 GMT+02:00 Elvis Stansvik <[email protected]>:
>>
>> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[email protected]>:
>> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik <[email protected]>:
>> >>
>> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[email protected]>:
>> >> > I second this request big time and would add zstd, if we are
>> >> > already trying out various encoders. ;)
>> >>
>> >> This may not be of interest, and does not include zstd, but I'm
>> >> attaching an excerpt from some of the results I got back when doing
>> >> our basic benchmarking of some algorithms (all lossless).
>> >>
>> >> It was based on those results that we settled on Blosc_LZ4HC at
>> >> level 4, since we were looking for very fast decompression times,
>> >> while longer compression times and a slightly larger file size were
>> >> acceptable up to a point. The gzip results are included mostly
>> >> because that's what we were using at the time and I wanted them as
>> >> a comparison, but we knew we wanted something else. The input for
>> >> those benchmarks was a 500x300x300 float dataset containing a
>> >> tomographic 3D image.
>> >
>> >
>> > Zstd was included in Blosc a while ago:
>> >
>> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
>> >
>> > and its performance really shines, even on real data:
>> >
>> >
>> >
>> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
>> >
>> > (although here, since the data are 1-byte integers, only the
>> > BITSHUFFLE filter is used, not the faster SHUFFLE).
>> >
>> > As Blosc offers the same API for a number of codecs, trying it in
>> > combination with Zstd should be really easy.
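>> >
>> > For instance, with the python-blosc wrapper it is a one-parameter
>> > change (a sketch; the array is made up, and only cname would differ
>> > between codec runs):
>> >
>> >     import blosc
>> >     import numpy as np
>> >
>> >     a = np.random.rand(500, 300, 300).astype(np.float32)  # made-up data
>> >     # Identical call for every codec; swap cname ('lz4hc', 'zstd', ...).
>> >     packed = blosc.compress(a.tobytes(), typesize=a.itemsize, clevel=1,
>> >                             shuffle=blosc.SHUFFLE, cname='zstd')
>> >     restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)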
>>
>> Zstd indeed looks very well-balanced. The reason I didn't include it
>> back when I did those benchmarks was that we were really focused on
>> decompression speed in our application; compression speed was very
>> much secondary. So I included mostly LZ4 codecs.
>
>
> Yes, that makes sense, but I think you should give Blosc+Zstd a try, at
> least at the lowest compression levels (1, 2 and probably 3 too).  For
> these low compression levels Blosc chooses a block size that comfortably
> fits in L2.  Also, note that the benchmarks above were for in-memory
> data, so for a typical disk-based workflow using HDF5, Blosc+Zstd can
> still perform well enough.

Alright, thanks for the tip. I read the benchmarks too fast and didn't
realize it was all in-memory. I should definitely try Zstd.

In our use case it's always from disk (or well, SSD), and sometimes
even slow-ish network mounts.


Cool.  Keep us informed.  I am definitely interested.

I found the old input file and quickly ran the benchmark again with
Blosc_ZSTD using byte-based shuffling at compression levels 1, 2 and 3:

compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
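
(The numbers come from a loop along these lines; this is a from-memory
sketch, not the original script, and it uses random stand-in data since
the real dataset is confidential:)

    import time

    import blosc
    import numpy as np

    data = np.random.rand(500, 300, 300).astype(np.float32).tobytes()

    for clevel in (1, 2, 3):
        ctimes, rtimes = [], []
        for _ in range(10):  # repeat runs to get mean/std
            t0 = time.perf_counter()
            packed = blosc.compress(data, typesize=4, clevel=clevel,
                                    shuffle=blosc.SHUFFLE, cname='zstd')
            t1 = time.perf_counter()
            blosc.decompress(packed)
            t2 = time.perf_counter()
            ctimes.append(t1 - t0)
            rtimes.append(t2 - t1)
        print("blosc_zstd_%d,%.5f,%.5f,%.5f,%.5f,%d" % (
            clevel, np.mean(ctimes), np.std(ctimes),
            np.mean(rtimes), np.std(rtimes), len(packed)))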

Unfortunately I can't find the spreadsheet where I made those diagrams,
so I can't make a new updated one (at least not easily right now).

But this shows that Zstd is very competitive. It achieves a slightly
better compression ratio than Blosc_LZ4HC at level 4 (the original file
size was 189378052 bytes), which is what we picked, and compresses much
faster. But Blosc_LZ4HC still wins on decompression time, so I think in
the end we picked the right one.

Our use case is essentially compress once, decompress many many times.
And during the decompression the user will sit there and wait. That's
why decompression time was so important to us.

Anyway, thanks for making me take a look at Zstd; we may yet use it
somewhere else.

And I now remember the real reason I didn't include it the first time
around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
the packaged version (1.10 is where Zstd support was added), so I
lazily just skipped it :)

Elvis





>
>
>>
>>
>> >
>> >>
>> >> I might try to dig up the script I used for the benchmark and see
>> >> if we still have the input I used, and do a test with lossy ZFP. It
>> >> could be very interesting for creating 3D "thumbnails" in our
>> >> application.
>> >
>> >
>> > It would be nice if your benchmark code (and dataset) could be made
>> > publicly available, so as to serve as a good point of comparison for
>> > others.
>>
>> The dataset is unfortunately confidential and not something I can
>> release. I'm attaching the script I used, though; it's very simple.
>>
>> But, a disclaimer: The benchmarks I did were not really thorough. They
>> were also internal and never really meant to be published. It was
>> mostly a quick and dirty test to see which of these LZ4 codecs would
>> be in the right ballpark for us.
>
>
> Ok.  Thanks anyway.
>
>>
>>
>> Elvis
>>
>> >
>> >>
>> >>
>> >> Elvis
>> >>
>> >> >
>> >> > P
>> >> >
>> >> >
>> >> > On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
>> >> >>
>> >> >> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[email protected]>:
>> >> >>>
>> >> >>> Hi All,
>> >> >>>
>> >> >>> Just wanted to mention a new HDF5 floating point compression
>> >> >>> plugin available on GitHub...
>> >> >>>
>> >> >>> https://github.com/LLNL/H5Z-ZFP
>> >> >>>
>> >> >>> This plugin will come embedded in the next release of the Silo
>> >> >>> library as well.
>> >> >>
>> >> >>
>> >> >> Thanks for the pointer. That's very interesting. I had not heard
>> >> >> about ZFP before. The ability to set a bound on the error in the
>> >> >> lossy case seems very useful.
>> >> >>
>> >> >> Do you know if there have been any comparative benchmarks of ZFP
>> >> >> against other compressors?
>> >> >>
>> >> >> After some basic benchmarking, we recently settled on Blosc_LZ4HC
>> >> >> at level 4 for our datasets (3D float tomography data), but maybe
>> >> >> it would be worthwhile to look at ZFP as well...
>> >> >>
>> >> >> Best regards,
>> >> >> Elvis
>> >> >>
>> >> >>>
>> >> >>> --
>> >> >>> Mark C. Miller, LLNL
>> >> >>>
>> >
>> >
>> >
>> >
>> > --
>> > Francesc Alted
>> >
>
>
>
>
> --
> Francesc Alted




--
Francesc Alted
