Hi Quincey,
Thank you so much, we really appreciate this.
Cheers,
Andy
Quincey Koziol wrote on 2011-09-07:
> Hi Andrei,
> Sounds like a good suggestion to me. I'll put it into our issue
> tracker and we can get it scheduled for an upcoming release. (It might
> not make it into the 1.8.8 release in November, though.)
>
> Thanks for the idea,
> Quincey
>
> On Sep 6, 2011, at 7:46 PM, Salnikov, Andrei A. wrote:
>
>> Hi all,
>>
>> I was playing today with the different compression options
>> for our data, trying to get some optimal numbers. Our data
>> are 16-bit images whose dynamic range is limited most of the
>> time, so I expected the shuffle filter to give us some
>> improvement over plain zlib compression. To my surprise,
>> enabling shuffle did not change the compression factor at
>> all. Looking at the shuffle filter code, the reason seems to
>> be the structure of our data. The dataset which contains the
>> images has a 1-dimensional dataspace, with each element
>> being an array that holds a 2- or 3-dimensional image stack:
>>
>> DATASET "..." {
>>    DATATYPE H5T_ARRAY { [32][185][388] H5T_STD_I16LE }
>>    DATASPACE SIMPLE { ( 2132 ) / ( H5S_UNLIMITED ) }
>>    STORAGE_LAYOUT {
>>       CHUNKED ( 1 )
>>       SIZE 6608778714 (1.482:1 COMPRESSION)
>>    }
>>    FILTERS {
>>       COMPRESSION DEFLATE { LEVEL 1 }
>>    }
>>
>> The image arrays are quite big, so a chunk holds just one
>> single array most of the time.
>>
>> My understanding is that the shuffle algorithm tries to
>> re-order bytes from multiple objects in a chunk, but because
>> there is just one object (the array) in this case,
>> it does not do anything at all. What I would like shuffle
>> to do in this case is to shuffle the 16-bit words of the array:
>> not treat the array as a single object, but look inside it.
>>
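
The effect described above can be reproduced outside HDF5. Below is a
minimal sketch of a byte shuffle, assuming the filter simply groups
byte 0 of every element, then byte 1 of every element, and so on. It
is not the code from H5Zshuffle.c, and shuffle_bytes is a hypothetical
helper written for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of HDF5): byte-shuffle `nbytes` of `src`
 * into `dst`, treating the buffer as consecutive elements of `elem_size`
 * bytes. Output layout: byte 0 of every element, then byte 1 of every
 * element, etc. Bytes that do not fill a whole element are copied as-is.
 */
static void shuffle_bytes(const unsigned char *src, unsigned char *dst,
                          size_t nbytes, size_t elem_size)
{
    size_t nelem, used, i, j;

    assert(elem_size > 0);
    nelem = nbytes / elem_size;   /* number of whole elements in the buffer */
    used = nelem * elem_size;     /* bytes covered by whole elements */

    for (j = 0; j < elem_size; j++)
        for (i = 0; i < nelem; i++)
            dst[j * nelem + i] = src[i * elem_size + j];

    /* Copy any trailing partial element unchanged. */
    memcpy(dst + used, src + used, nbytes - used);
}
```

For a buffer of eight little-endian 16-bit pixels with values 10..17,
calling shuffle_bytes with elem_size equal to the whole buffer size
(one array element per chunk) copies the bytes unchanged, because there
is only one element to interleave. With elem_size = 2 the eight low
bytes come first and the eight zero high bytes come last, which gives
deflate a long run of identical bytes to work with.
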
>> I did some experimenting with the code and with a small change
>> to the code I managed to convince it to shuffle things correctly.
>> The diff is below this message. It does indeed improve
>> compression of the image data and the data can be read back
>> correctly after de-shuffling with the standard code (h5dump
>> shows identical results).
>>
>> It would be really helpful for us if something like this could
>> be added to the HDF5 library. I do not particularly care about other
>> types such as compounds, but for datasets whose elements are
>> plain arrays it can probably be done without breaking compatibility.
>> OTOH, if more options could be added to shuffle to control the
>> shuffling of arrays and other types of data, it could become even
>> more useful.
>>
>> Thanks,
>> Andy
>>
>> ----------------------------------------------------------------------
>> This is the change applied to 1.8.6 code:
>>
>> *** H5Zshuffle.c.orig 2011-02-14 08:23:19.000000000 -0800
>> --- H5Zshuffle.c 2011-09-06 17:18:13.022259993 -0700
>> ***************
>> *** 88,93 ****
>> --- 88,98 ----
>> if(H5P_get_filter_by_id(dcpl_plist, H5Z_FILTER_SHUFFLE, &flags, &cd_nelmts, cd_values, (size_t)0, NULL, NULL) < 0)
>> HGOTO_ERROR(H5E_PLINE, H5E_CANTGET, FAIL, "can't get shuffle parameters")
>>
>> + /* If object is an array use its base type */
>> + while (H5T_get_class(type, FALSE) == H5T_ARRAY) {
>> + type = H5T_get_super(type);
>> + }
>> +
>> /* Set "local" parameter for this dataset */
>> if((cd_values[H5Z_SHUFFLE_PARM_SIZE] = (unsigned)H5T_get_size(type)) == 0)
>> HGOTO_ERROR(H5E_PLINE, H5E_BADTYPE, FAIL, "bad datatype size")
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org