2016-10-28 18:14 GMT+02:00 Francesc Alted <[email protected]>:
>
>
> 2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[email protected]>:
>>
>> 2016-10-28 17:53 GMT+02:00 Francesc Alted <[email protected]>:
>> >
>> >
>> > 2016-10-28 17:20 GMT+02:00 Elvis Stansvik
>> > <[email protected]>:
>> >>
>> >> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[email protected]>:
>> >> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik
>> >> > <[email protected]>:
>> >> >>
>> >> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[email protected]>:
>> >> >> > I second this request big time and would add zstd, if we are
>> >> >> > already trying out various encoders. ;)
>> >> >>
>> >> >> This may not be of interest, and does not include zstd, but I'm
>> >> >> attaching an excerpt from some of the results I got back when
>> >> >> doing our basic benchmarking of some algorithms (all lossless).
>> >> >>
>> >> >> It was based on those results that we settled on Blosc_LZ4HC at
>> >> >> level 4, since we were looking for very fast decompression times,
>> >> >> while longer compression times and slightly larger file sizes were
>> >> >> acceptable up to a certain point. The gzip results are included
>> >> >> mostly because that's what we were using at the time and I wanted
>> >> >> them as a comparison, but we knew we wanted something else. The
>> >> >> input for those benchmarks was a 500x300x300 float dataset
>> >> >> containing a tomographic 3D image.
>> >> >
>> >> >
>> >> > Zstd was included in Blosc a while ago:
>> >> >
>> >> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
>> >> >
>> >> > and its performance really shines, even on real data:
>> >> >
>> >> > http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
>> >> >
>> >> > (although here, since the data are 1-byte integers, only the
>> >> > BITSHUFFLE filter is used, not the faster SHUFFLE).
>> >> >
>> >> > As Blosc offers the same API for a number of codecs, trying it in
>> >> > combination with Zstd should be really easy.
>> >>
>> >> Zstd indeed looks very well-balanced. The reason I didn't include it
>> >> back when I did those benchmarks was that we were really focused on
>> >> decompression speed in our application, compression speed was very
>> >> much secondary. So I included mostly LZ4 codecs.
>> >
>> >
>> > Yes, that makes sense, but I think you should give it a try, at least at
>> > the lowest compression levels for Blosc+Zstd (1, 2 and probably 3 too).
>> > For these low compression levels Blosc chooses a block size that
>> > comfortably fits in L2.  Also, note that the benchmarks above were for
>> > in-memory data, so for a typical disk-based workflow using HDF5,
>> > Blosc+Zstd can still perform well enough.
>>
>> Alright, thanks for the tip. I read the benchmarks too fast and didn't
>> realize they were all in-memory. I should definitely look at Zstd.
>>
>> In our use case it's always from disk (or well, SSD), and sometimes
>> even slow-ish network mounts.
>
>
> Cool.  Keep us informed.  I am definitely interested.

I found the old input file and quickly ran the benchmark again with
Blosc_ZSTD, using byte-based shuffling, at compression levels 1, 2
and 3:

compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
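The actual benchmark script was attached to the thread and isn't reproduced here. As a rough, hypothetical sketch of the kind of harness that produces rows in the layout above, the following times compression and decompression over several repeats and reports the mean, standard deviation and compressed size. Since python-blosc may not be installed, it uses a pure-Python byte shuffle (the idea behind Blosc's SHUFFLE filter) followed by stdlib zlib as a stand-in codec; with a Blosc build that includes Zstd, the two steps would roughly collapse into one `blosc.compress(..., cname='zstd', shuffle=blosc.SHUFFLE)` call. The function names and the synthetic input are illustrative only.

```python
import array
import csv
import io
import math
import statistics
import time
import zlib


def byte_shuffle(buf: bytes, typesize: int) -> bytes:
    """Regroup bytes so byte 0 of every element comes first, then byte 1, etc."""
    n = len(buf) // typesize
    body = b"".join(buf[i:n * typesize:typesize] for i in range(typesize))
    return body + buf[n * typesize:]  # leftover bytes are kept unshuffled


def byte_unshuffle(buf: bytes, typesize: int) -> bytes:
    """Inverse of byte_shuffle."""
    n = len(buf) // typesize
    out = bytearray(len(buf))
    for i in range(typesize):
        out[i:n * typesize:typesize] = buf[i * n:(i + 1) * n]
    out[n * typesize:] = buf[n * typesize:]
    return bytes(out)


def bench(name: str, data: bytes, typesize: int, level: int, repeats: int = 5) -> list:
    """Time compress/decompress `repeats` times and return one CSV row."""
    ctimes, rtimes = [], []
    for _ in range(repeats):
        t0 = time.perf_counter()
        packed = zlib.compress(byte_shuffle(data, typesize), level)
        ctimes.append(time.perf_counter() - t0)

        t0 = time.perf_counter()
        restored = byte_unshuffle(zlib.decompress(packed), typesize)
        rtimes.append(time.perf_counter() - t0)
        if restored != data:
            raise RuntimeError("round trip is not lossless")
    stats = [statistics.mean(ctimes), statistics.stdev(ctimes),
             statistics.mean(rtimes), statistics.stdev(rtimes)]
    return [name, *(f"{x:.5f}" for x in stats), len(packed)]


def run(data: bytes, typesize: int = 4, levels=(1, 2, 3)) -> str:
    """Benchmark each level and return CSV text with the same columns as above."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["compressor", "ctime_mean(s)", "ctime_std(s)",
                     "rtime_mean(s)", "rtime_std(s)", "size(B)"])
    for level in levels:
        writer.writerow(bench(f"shuffle_zlib_{level}", data, typesize, level))
    return out.getvalue()


if __name__ == "__main__":
    # Synthetic float32 data, NOT the confidential tomography volume.
    volume = array.array("f", (math.sin(i / 100.0) for i in range(100_000)))
    print(run(volume.tobytes(), typesize=volume.itemsize))
```

With `typesize=1` the byte shuffle leaves the data unchanged, which is why it brings nothing for 1-byte integers and only a bit-level shuffle helps there.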

Unfortunately I can't find the spreadsheet where I made those
diagrams, so I can't make an updated one (at least not easily right
now).

But this shows that Zstd is very competitive. It achieves a slightly
better compression ratio than Blosc_LZ4HC at level 4, which is what we
picked (the original file size was 189378052 bytes), and it compresses
much faster. But Blosc_LZ4HC still wins on decompression time, so I
think in the end we picked the right one.
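For reference, the compression ratios implied by the sizes above (original size divided by compressed size) can be worked out directly. The LZ4HC numbers from the earlier excerpt aren't part of this message, so only the Zstd rows are included in this small calculation:

```python
ORIGINAL_SIZE = 189_378_052  # bytes, uncompressed input file (from the message)

# Compressed sizes in bytes, copied from the size(B) column above.
compressed_sizes = {
    "blosc_zstd_1": 116_666_294,
    "blosc_zstd_2": 114_666_454,
    "blosc_zstd_3": 113_485_801,
}


def compression_ratio(original: int, packed: int) -> float:
    """Original size divided by compressed size (higher is better)."""
    return original / packed


for name, size in compressed_sizes.items():
    ratio = compression_ratio(ORIGINAL_SIZE, size)
    print(f"{name}: ratio {ratio:.2f}, {100 * size / ORIGINAL_SIZE:.1f}% of original")
```

This gives ratios of about 1.62, 1.65 and 1.67 for levels 1, 2 and 3, i.e. the files shrink to roughly 60% of their original size.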

Our use case is essentially compress once, decompress many many times.
And during the decompression the user will sit there and wait. That's
why decompression time was so important to us.
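The "compress once, decompress many times" argument can be made concrete with a deliberately simplified cost model (an assumption, not something measured in the thread: it uses only the mean timings from the table and ignores disk I/O and variance). The total time after N reads is T(N) = ctime + N * rtime, so a setting that compresses more slowly but decompresses faster pays off once N exceeds the extra compression time divided by the per-read saving. The helper names below are hypothetical:

```python
import math

# Mean compression / decompression times in seconds, from the table above.
timings = {
    "blosc_zstd_1": (0.73083, 0.29489),
    "blosc_zstd_2": (1.40672, 0.28097),
    "blosc_zstd_3": (1.48507, 0.26451),
}


def total_time(ctime: float, rtime: float, n_reads: int) -> float:
    """Compress once, then decompress n_reads times."""
    return ctime + n_reads * rtime


def break_even_reads(cheap: tuple, fast_read: tuple) -> int:
    """Smallest number of reads after which `fast_read` (slower to compress,
    faster to decompress) has a lower total time than `cheap`."""
    (c_a, r_a), (c_b, r_b) = cheap, fast_read
    if r_b >= r_a:
        raise ValueError("fast_read must decompress faster than cheap")
    # c_b + N*r_b < c_a + N*r_a  <=>  N > (c_b - c_a) / (r_a - r_b)
    return math.floor((c_b - c_a) / (r_a - r_b)) + 1


print(break_even_reads(timings["blosc_zstd_1"], timings["blosc_zstd_3"]))
```

With these numbers, level 3 overtakes level 1 after 25 reads of the same file. Plugging in the LZ4HC timings from the earlier excerpt would quantify the same trade-off between LZ4HC and Zstd.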

Anyway, thanks for making me have a look at Zstd; we may yet use it
somewhere else.

And I now remember the real reason I didn't include it the first time
around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
the packaged version (1.10 is where Zstd support was added), so I
lazily just skipped it :)
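When an application has to work with whichever Blosc its distribution ships, a check along the following lines could gate the Zstd option. This is a hypothetical sketch: it assumes a plain numeric dotted version string, takes the 1.10 threshold from the message above, and leaves out how the version is obtained at runtime. Comparing the strings directly would get it wrong, since "1.7.0" sorts after "1.10.0" lexicographically.

```python
# Per the message above: Zstd support was added in Blosc 1.10.
ZSTD_MIN_BLOSC_VERSION = (1, 10)


def parse_version(version: str) -> tuple:
    """Turn a plain dotted version such as '1.10.0' into (1, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


def blosc_has_zstd(version: str) -> bool:
    """Hypothetical check: does this Blosc version include the Zstd codec?"""
    return parse_version(version) >= ZSTD_MIN_BLOSC_VERSION


for v in ("1.7.0", "1.10.0"):
    print(v, blosc_has_zstd(v))
```

Versions with suffixes such as "rc1" would make `int()` raise a `ValueError`; a real check would need a more forgiving parser.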

Elvis

>
>>
>>
>> Elvis
>>
>> >
>> >
>> >>
>> >>
>> >> >
>> >> >>
>> >> >> I might try to dig up the script I used for the benchmark and see if
>> >> >> we still have the input I used, and do a test with lossy ZFP. It
>> >> >> could be very interesting for creating 3D "thumbnails" in our
>> >> >> application.
>> >> >
>> >> >
>> >> > It would be nice if your benchmark code (and dataset) could be made
>> >> > publicly available to serve others as a good comparison.
>> >>
>> >> The dataset is unfortunately confidential and not something I can
>> >> release. I'm attaching the script I used though, it's very simple.
>> >>
>> >> But, a disclaimer: The benchmarks I did were not really thorough. They
>> >> were also internal and never really meant to be published. It was
>> >> mostly a quick and dirty test to see which of these LZ4 codecs would
>> >> be in the right ballpark for us.
>> >
>> >
>> > Ok.  Thanks anyway.
>> >
>> >>
>> >>
>> >> Elvis
>> >>
>> >> >
>> >> >>
>> >> >>
>> >> >> Elvis
>> >> >>
>> >> >> >
>> >> >> > P
>> >> >> >
>> >> >> >
>> >> >> > On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
>> >> >> >>
>> >> >> >> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[email protected]>:
>> >> >> >>>
>> >> >> >>> Hi All,
>> >> >> >>>
>> >> >> >>> Just wanted to mention a new HDF5 floating point compression
>> >> >> >>> plugin available on GitHub...
>> >> >> >>>
>> >> >> >>> https://github.com/LLNL/H5Z-ZFP
>> >> >> >>>
>> >> >> >>> This plugin will come embedded in the next release of the Silo
>> >> >> >>> library as well.
>> >> >> >>
>> >> >> >>
>> >> >> >> Thanks for the pointer. That's very interesting. I had not heard
>> >> >> >> about ZFP before. The ability to set a bound on the error in the
>> >> >> >> lossy case seems very useful.
>> >> >> >>
>> >> >> >> Do you know if there have been any comparative benchmarks of ZFP
>> >> >> >> against other compressors?
>> >> >> >>
>> >> >> >> After some basic benchmarking, we recently settled on Blosc_LZ4HC
>> >> >> >> at level 4 for our datasets (3D float tomography data), but maybe
>> >> >> >> it would be worthwhile to look at ZFP as well.
>> >> >> >>
>> >> >> >> Best regards,
>> >> >> >> Elvis
>> >> >> >>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> Mark C. Miller, LLNL
>> >> >> >>>
>> >> >> >>> _______________________________________________
>> >> >> >>> Hdf-forum is for HDF software users discussion.
>> >> >> >>> [email protected]
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>> >> >> >>> Twitter: https://twitter.com/hdf5
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Francesc Alted
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > Francesc Alted
>
>
>
>
> --
> Francesc Alted
