Hi Andrei,
        Sounds like a good suggestion to me. I'll put it into our issue tracker
and we can get it scheduled for an upcoming release. (It might not make it
into the 1.8.8 release in November, though.)

        Thanks for the idea,
                Quincey

On Sep 6, 2011, at 7:46 PM, Salnikov, Andrei A. wrote:

> Hi all,
> 
> I was playing today with the different compression options
> for our data, trying to find the best settings. Our data
> are 16-bit images whose dynamic range is limited most
> of the time, so I expected the shuffle filter to give
> us some improvement over plain zlib
> compression. To my surprise, enabling shuffle did not
> change the compression factor at all. Looking at the shuffle
> filter code, it seems the reason is the structure
> of our data. The dataset that contains the images has a
> 1-dimensional dataspace, with each element containing a
> 2- or 3-dimensional image stack:
> 
> DATASET "..." {
>   DATATYPE  H5T_ARRAY { [32][185][388] H5T_STD_I16LE }
>   DATASPACE  SIMPLE { ( 2132 ) / ( H5S_UNLIMITED ) }
>   STORAGE_LAYOUT {
>      CHUNKED ( 1 )
>      SIZE 6608778714 (1.482:1 COMPRESSION)
>   }
>   FILTERS {
>      COMPRESSION DEFLATE { LEVEL 1 }
>   }
> 
> The image arrays are quite large, so most of the time
> each chunk holds just a single array.
> 
> My understanding is that the shuffle algorithm re-orders
> bytes from multiple objects in a chunk, but because
> there is just one object (the array) in this case,
> it does not do anything at all. What I would like shuffle
> to do in this case is to shuffle the 16-bit words in the array:
> not to treat the array as a single object, but to look inside it.
> 
> I did some experimenting, and with a small change
> to the code I managed to convince it to shuffle things correctly.
> The diff is below this message. It does indeed improve
> compression of the image data, and the data can be read back
> correctly after de-shuffling with the standard code (h5dump
> shows identical results).
> 
> It would be really helpful for us if something like this could
> be added to the HDF5 library. I do not particularly care about other
> types such as compounds, but for datasets whose elements are
> plain arrays it can probably be done without breaking compatibility.
> OTOH, if more options were added to shuffle to control
> shuffling of arrays and other types of data, it could become even
> more useful.
> 
> Thanks,
> Andy
> 
> ----------------------------------------------------------------------
> This is the change, applied to the 1.8.6 code:
> 
> *** H5Zshuffle.c.orig   2011-02-14 08:23:19.000000000 -0800
> --- H5Zshuffle.c        2011-09-06 17:18:13.022259993 -0700
> ***************
> *** 88,93 ****
> --- 88,98 ----
>      if(H5P_get_filter_by_id(dcpl_plist, H5Z_FILTER_SHUFFLE, &flags, &cd_nelmts, cd_values, (size_t)0, NULL, NULL) < 0)
>        HGOTO_ERROR(H5E_PLINE, H5E_CANTGET, FAIL, "can't get shuffle parameters")
> 
> +     /* If object is an array use its base type */
> +     while (H5T_get_class(type, FALSE) == H5T_ARRAY) {
> +         type = H5T_get_super(type);
> +     }
> + 
>      /* Set "local" parameter for this dataset */
>      if((cd_values[H5Z_SHUFFLE_PARM_SIZE] = (unsigned)H5T_get_size(type)) == 0)
>        HGOTO_ERROR(H5E_PLINE, H5E_BADTYPE, FAIL, "bad datatype size")
> 
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

