Re: [Hdf-forum] Shuffle filter for array data

Salnikov, Andrei A. Wed, 07 Sep 2011 09:17:35 -0700

Hi Quincey,

thank you so much, we really appreciate this.


Cheers,
Andy


Quincey Koziol wrote on 2011-09-07:
> Hi Andrei,
>       Sounds like a good suggestion to me, I'll put it into our issue
> tracker and we can get it scheduled for an upcoming release.  (It might
> not make it into the 1.8.8 release in November though)
> 
>       Thanks for the idea,
>               Quincey
> 
> On Sep 6, 2011, at 7:46 PM, Salnikov, Andrei A. wrote:
> 
>> Hi all,
>> 
>> I was experimenting today with different compression options
>> for our data, trying to find optimal settings. Our data
>> are 16-bit images whose dynamic range is limited most
>> of the time, so I expected the shuffle filter to
>> give us some improvement over plain zlib
>> compression. To my surprise, enabling shuffle did not
>> change the compression factor at all. Looking at the shuffle
>> filter code, the reason seems to be the structure
>> of our data. The dataset that contains the images has a
>> 1-dimensional dataspace, with each element holding a
>> 2- or 3-dimensional image stack:
>> 
>> DATASET "..." {
>>   DATATYPE  H5T_ARRAY { [32][185][388] H5T_STD_I16LE }
>>   DATASPACE  SIMPLE { ( 2132 ) / ( H5S_UNLIMITED ) }
>>   STORAGE_LAYOUT {
>>      CHUNKED ( 1 )
>>      SIZE 6608778714 (1.482:1 COMPRESSION)
>>    }
>>   FILTERS {
>>      COMPRESSION DEFLATE { LEVEL 1 }
>>   }
>> 
>> The image arrays are quite big, so each chunk holds
>> just a single array most of the time.
>> 
>> My understanding is that the shuffle algorithm tries to
>> re-order bytes from multiple objects in a chunk, but because
>> there is just one object (the array) in this case,
>> it does not do anything at all. What I would like shuffle
>> to do here is to shuffle the 16-bit words inside the array,
>> rather than treat the array as a single object.
>> 
>> I did some experimenting with the code and with a small change
>> to the code I managed to convince it to shuffle things correctly.
>> The diff is below this message. It does indeed improve
>> compression of the image data and the data can be read back
>> correctly after de-shuffling with the standard code (h5dump
>> shows identical results).
>> 
>> It would be really helpful for us if something like this could
>> be added to the HDF5 library. I do not particularly care about other
>> types such as compounds, but for datasets whose elements are
>> plain arrays it can probably be done without breaking compatibility.
>> OTOH, if options were added to shuffle to control how
>> arrays and other types of data are shuffled, it could become even
>> more useful.
>> 
>> Thanks,
>> Andy
>> 
>> ----------------------------------------------------------------------
>> This is the change applied to 1.8.6 code:
>> 
>> *** H5Zshuffle.c.orig   2011-02-14 08:23:19.000000000 -0800
>> --- H5Zshuffle.c        2011-09-06 17:18:13.022259993 -0700
>> ***************
>> *** 88,93 ****
>> --- 88,98 ----
>>      if(H5P_get_filter_by_id(dcpl_plist, H5Z_FILTER_SHUFFLE, &flags, &cd_nelmts, cd_values, (size_t)0, NULL, NULL) < 0)
>>        HGOTO_ERROR(H5E_PLINE, H5E_CANTGET, FAIL, "can't get shuffle parameters")
>> 
>> +     /* If object is an array use its base type */
>> +     while (H5T_get_class(type, FALSE) == H5T_ARRAY) {
>> +         type = H5T_get_super(type);
>> +     }
>> +
>>      /* Set "local" parameter for this dataset */
>>      if((cd_values[H5Z_SHUFFLE_PARM_SIZE] = (unsigned)H5T_get_size(type)) == 0)
>>        HGOTO_ERROR(H5E_PLINE, H5E_BADTYPE, FAIL, "bad datatype size")
>> 
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
> 
> 
