Mark,

Yes indeed, the model I use is written in Fortran 95 so it would be
convenient to be able to activate the ZFP filter like any of HDF5's
existing filters. I looked at the Fortran interface code and it does look
pretty easy to add Fortran wrappers.

In the past I have played with the scale/offset and N-bit filters, but ended
up just going with lossless gzip for my current work. They all have Fortran
hooks. Here are a few lines from my code that choose gzip at compression
level 1 (the fastest) after turning on chunking:

call h5pcreate_f(H5P_DATASET_CREATE_F, chunk_id, ierr); check_err(ierr)
call h5pset_chunk_f(chunk_id, rank, chunkdims, ierr); check_err(ierr)
call h5pset_deflate_f(chunk_id, 1, ierr); check_err(ierr) ! gzip level 1
call h5dcreate_f(f_id, trim(varname), H5T_NATIVE_REAL, dspace_id, dset_id, ierr, chunk_id); check_err(ierr)
call h5dwrite_f(dset_id, H5T_NATIVE_REAL, MCM3d, dims, ierr); check_err(ierr)

The Fortran interface for HDF5 is typically very similar to the C
interface, with an extra argument for a return value (ierr above) and a
trailing _f on the subroutine name. Flags and IDs are all integers in the
Fortran interface. Looking at H5Pff.f90 in fortran/src, which contains
the h5pset_deflate_f code:

! Fortran90 Interface:
  SUBROUTINE h5pset_deflate_f(prp_id, level, hdferr)
    IMPLICIT NONE
    INTEGER(HID_T), INTENT(IN) :: prp_id ! Property list identifier
    INTEGER, INTENT(IN) :: level         ! Compression level
    INTEGER, INTENT(OUT) :: hdferr       ! Error code
                                         ! 0 on success and -1 on failure

  hdferr = h5pset_deflate_c(prp_id, level)

  END SUBROUTINE h5pset_deflate_f

(I took out the interface block required by Windows).

And the C wrapper code is found in fortran/src/H5Pf.c:

int_f
h5pset_deflate_c ( hid_t_f *prp_id , int_f *level)
{
  int ret_value = 0;
  hid_t c_prp_id;
  unsigned c_level;
  herr_t status;

  c_prp_id = (hid_t)*prp_id;
  c_level = (unsigned)*level;
  status = H5Pset_deflate(c_prp_id, c_level);
  if ( status < 0  ) ret_value = -1;
  return ret_value;
}

I think it would be pretty straightforward to just emulate these types of
calls for setting compression parameters for ZFP in the Fortran interface.

If this filter is all it's cracked up to be (and, based on the paper that
describes it, I have no reason to assume it isn't), it would definitely have
a big impact on the data footprint of my output (and that of the many others
who choose to use it). I say this as someone who is currently waiting for
about 100 TB of archived data to make its way from tape storage to scratch
on Blue Waters, which will take about a week. At the 10x and better ratios
being reported for ZFP, that 100 TB becomes 10 TB or less, so at the same
transfer rate the move could be done overnight.

I'm fairly certain folks on machines like Blue Waters would be very
interested in ZFP being part of HDF5. I would love to see how it performs
at scale.

I have a conference to prepare for this week but I can work on this myself
the following week.

Leigh

On Sun, Oct 30, 2016 at 12:36 AM Miller, Mark C. <[email protected]> wrote:

> Hi Leigh,
>
> Hmm. Fortran interface, eh? You mean to the HDF5 filter we've made
> available or to the ZFP compression library?
>
> You mentioned "...Fortranized routines in HDF...", so I am assuming the
> HDF5 filter.
>
> Well, short answer is at present, no we don't have those. But, very easy
> to add.
>
> I've never used HDF5's fortran interface. Do you have, or can you point
> me to, example(s) that use filters already?
>
> If so, we could probably come up with what you would need and test it
> pretty quickly.
>
> Mark
>
>
>
> --
> Mark C. Miller, LLNL
>
> From: Hdf-forum <[email protected]> on behalf of Leigh
> Orf <[email protected]>
>
> Reply-To: HDF Users Discussion List <[email protected]>
> Date: Saturday, October 29, 2016 at 4:47 PM
> To: HDF Users Discussion List <[email protected]>
>
> Subject: Re: [Hdf-forum] New HDF5 compression plugin
>
> This is something that I am very interested in trying out on Blue Waters,
> where, with my current lossless compression (gzip), I would end up making
> several PB of data over the next couple of years. There are certain
> variables in my data that I would be happy to compress lossily. With the
> compression ratios being reported, this could be a game-changer for me.
>
> Is there a Fortran interface consistent with the other Fortranized
> routines in HDF?
>
> Leigh
>
> On Fri, Oct 28, 2016 at 1:45 PM Elvis Stansvik <
> [email protected]> wrote:
>
> 2016-10-28 20:24 GMT+02:00 Miller, Mark C. <[email protected]>:
> > Can I just clarify some of this discussion...
> >
> > It reads like you are talking about compression ratios around 1.6x, less
> > than 2:1. Is that correct?
>
> Yes, in our case we only do lossless compression so far, but we have
> been talking about lossy. Just haven't taken any steps yet, and I
> didn't even know about ZFP from before. It looks very interesting.
>
> >
> > FYI, ZFP demonstrates results far beyond that (10-30x and better) at the
> > expense of (some) loss.
>
> > Yes, ZFP is of course in a completely different ball game,
> > compression-ratio-wise, than the codecs I compared in my benchmark
> > (which are all lossless). It looks very impressive from reading the
> > material on the site and skimming the paper.
>
> >
> > However, current efforts indicate that losses are tolerable in many
> > post-processing analysis workflows.
>
> Right, we need to investigate, or rather I need to have a discussion
> with our physicists on how much error we can tolerate (I'm not doing
> any analysis myself, only visualization). Our data is single precision
> float to begin with. For the visualization part I'm sure we could get
> away with quite a bit of loss.
>
> >
> > We think the key to achieving good compression on floating point data,
> going
> > forward, is to allow for some well controlled loss.
>
> Yes, and it seems that ZFP has several knobs for controlling that loss
> which look really useful.
>
> >
> > See this page on the effect of ZFP losses on, for example, taking derivatives...
> >
> >
> http://computation.llnl.gov/projects/floating-point-compression/zfp-and-derivatives
> >
> > as compared to other compression methods.
>
> Thanks for the pointer.
>
> >
> > We already face loss-like noise in floating point results when dealing
> with
> > system differences either between current systems and software stacks or
> > over time as systems and software evolve.
>
> Indeed.
>
> We simply need to have a look at how much error we can tolerate.
>
> Elvis
>
> >
> > Mark
> >
> > --
> > Mark C. Miller, LLNL
> >
> > From: Hdf-forum <[email protected]> on behalf of
> Elvis
> > Stansvik <[email protected]>
> > Reply-To: HDF Users Discussion List <[email protected]>
> > Date: Friday, October 28, 2016 at 11:08 AM
> > To: "[email protected]" <[email protected]>
> > Cc: HDF Users Discussion List <[email protected]>
> > Subject: Re: [Hdf-forum] New HDF5 compression plugin
> >
> > 2016-10-28 18:14 GMT+02:00 Francesc Alted <[email protected]>:
> >
> >
> >
> > 2016-10-28 18:04 GMT+02:00 Elvis Stansvik <[email protected]>:
> >
> >
> > 2016-10-28 17:53 GMT+02:00 Francesc Alted <[email protected]>:
> >>
> >>
> >> 2016-10-28 17:20 GMT+02:00 Elvis Stansvik
> >> <[email protected]>:
> >>>
> >>> 2016-10-28 16:33 GMT+02:00 Francesc Alted <[email protected]>:
> >>> > 2016-10-28 13:59 GMT+02:00 Elvis Stansvik
> >>> > <[email protected]>:
> >>> >>
> >>> >> 2016-10-28 13:23 GMT+02:00 Peter Steinbach <[email protected]>:
> >>> >> > I second this request big time and would add zstd, if we are
> >>> >> > already
> >>> >> > trying
> >>> >> > out various encoders. ;)
> >>> >>
> >>> >> This may not be of interest, and does not include zstd, but I'm
> >>> >> attaching an excerpt from some of the results I got when back when
> >>> >> doing our basic benchmarking of some algorithms (all lossless).
> >>> >>
> >>> >> It was based on those results that we settled on Blosc_LZ4HC at
> >>> >> level 4, since we were looking for very fast decompression times,
> >>> >> while longer compression times and slightly larger file sizes were
> >>> >> acceptable up to a point.
> >>> >> what we were using at the time and I wanted them as a comparison,
> >>> >> but
> >>> >> we knew we wanted something else. The input for those benchmarks was
> >>> >> a
> >>> >> 500x300x300 float dataset containing a tomographic 3D image.
> >>> >
> >>> >
> >>> > Zstd was included in Blosc a while ago:
> >>> >
> >>> > http://blosc.org/blog/zstd-has-just-landed-in-blosc.html
> >>> >
> >>> > and its performance really shines, even on real data:
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> http://alimanfoo.github.io/2016/09/21/genotype-compression-benchmark.html
> >>> >
> >>> > (although here, the data being 1-byte integers, only the BITSHUFFLE
> >>> > filter is used, not the faster SHUFFLE).
> >>> >
> >>> > As Blosc offers the same API for a number of codecs, trying it in
> >>> > combination with Zstd should be really easy.
> >>>
> >>> Zstd indeed looks very well-balanced. The reason I didn't include it
> >>> back when I did those benchmarks was that we were really focused on
> >>> decompression speed in our application, compression speed was very
> >>> much secondary. So I included mostly LZ4 codecs.
> >>
> >>
> >> Yes, that makes sense, but I think you should give it a try at least at the
> >> lowest compression levels for Blosc+Zstd (1, 2 and probably 3 too).  For
> >> these low compression levels Blosc chooses a block size that comfortably
> >> fits in L2.  Also, note that the benchmarks above were for in-memory
> >> data,
> >> so for a typical disk-based workflow using HDF5, Blosc+Zstd can still
> >> perform well enough.
> >
> > Alright, thanks for the tip. I read the benchmarks too fast and didn't
> > realize it was all in-memory. I should definitely look at Zstd.
> >
> > In our use case it's always from disk (or well, SSD), and sometimes
> > even slow-ish network mounts.
> >
> >
> >
> > Cool.  Keep us informed.  I am definitely interested.
> >
> >
> > I found the old input file and very quickly I ran the benchmark again
> > with Blosc_ZSTD with byte-based shuffling at compression levels 1, 2
> > and 3:
> >
> > compressor,ctime_mean(s),ctime_std(s),rtime_mean(s),rtime_std(s),size(B)
> > blosc_zstd_1,0.73083,0.00104,0.29489,0.00338,116666294
> > blosc_zstd_2,1.40672,0.00164,0.28097,0.00220,114666454
> > blosc_zstd_3,1.48507,0.01872,0.26451,0.00208,113485801
> >
> > Unfortunately I can't find the spreadsheet where I made those
> > diagrams, so can't make a new updated one (at least not easily right
> > now).
> >
> > But this shows that Zstd is very competitive. It achieves slightly
> > better compression ratio than Blosc_LZ4HC at level 4 (the original
> > file size was 189378052 bytes), which is what we picked, and the
> > compression is much faster. But Blosc_LZ4HC still wins out in the
> > decompression time, so I think in the end we picked the right one.
> >
> > Our use case is essentially compress once, decompress many many times.
> > And during the decompression the user will sit there and wait. That's
> > why decompression time was so important to us.
> >
> > Anyway, thanks for making me have a look at Zstd, we may yet use it
> > somewhere else.
> >
> > And I now remember the real reason I didn't include it the first time
> > around: We're basing our product on Ubuntu 16.04, where Blosc 1.7 is
> > the packaged version (1.10 is where Zstd support was added), so I
> > lazily just skipped it :)
> >
> > Elvis
> >
> >>
> >>
> >>>
> >>>
> >>> >
> >>> >>
> >>> >> I might try to dig up the script I used for the benchmark and see if
> >>> >> we still have the input I used, and do a test with lossy ZFP. It
> >>> >> could
> >>> >> be very interesting for creating 3D "thumbnails" in our application.
> >>> >
> >>> >
> >>> > It would be nice if your benchmark code (and dataset) can be made
> >>> > publicly
> >>> > available so as to serve to others as a good comparison.
> >>>
> >>> The dataset is unfortunately confidential and not something I can
> >>> release. I'm attaching the script I used though, it's very simple.
> >>>
> >>> But, a disclaimer: The benchmarks I did were not really thorough. They
> >>> were also internal and never really meant to be published. It was
> >>> mostly a quick and dirty test to see which of these LZ4 codecs would
> >>> be in the right ballpark for us.
> >>
> >>
> >> Ok.  Thanks anyway.
> >>
> >>>
> >>>
> >>> Elvis
> >>>
> >>> >
> >>> >>
> >>> >>
> >>> >> Elvis
> >>> >>
> >>> >> >
> >>> >> > P
> >>> >> >
> >>> >> >
> >>> >> > On 10/28/2016 01:12 PM, Elvis Stansvik wrote:
> >>> >> >>
> >>> >> >> 2016-10-28 1:53 GMT+02:00 Miller, Mark C. <[email protected]>:
> >>> >> >>>
> >>> >> >>> Hi All,
> >>> >> >>>
> >>> >> >>> Just wanted to mention a new HDF5 floating point compression
> >>> >> >>> plugin
> >>> >> >>> available on github...
> >>> >> >>>
> >>> >> >>> https://github.com/LLNL/H5Z-ZFP
> >>> >> >>>
> >>> >> >>> This plugin will come embedded in the next release of the Silo
> >>> >> >>> library
> >>> >> >>> as
> >>> >> >>> well.
> >>> >> >>
> >>> >> >>
> >>> >> >> Thanks for the pointer. That's very interesting. I had not heard
> >>> >> >> about
> >>> >> >> ZFP before. The ability to set a bound on the error in the
> >>> >> >> lossless
> >>> >> >> case seems very useful.
> >>> >> >>
> >>> >> >> Do you know if there has been any comparative benchmarks of ZFP
> >>> >> >> against other compressors?
> >>> >> >>
> >>> >> >> After some basic benchmarking, we recently settled on Blosc_LZ4HC
> >>> >> >> at
> >>> >> >> level 4 for our datasets (3D float tomography data), but maybe it
> >>> >> >> would be worthwhile to look at ZFP as well..
> >>> >> >>
> >>> >> >> Best regards,
> >>> >> >> Elvis
> >>> >> >>
> >>> >> >>>
> >>> >> >>> --
> >>> >> >>> Mark C. Miller, LLNL
> >>> >> >>>
> >>> >> >>
> >>> >> >
> >>> >>
> >>> >
> >>> > --
> >>> > Francesc Alted
> >>> >
> >>
> >
> >
> >
> >
> >
>
>
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
