Hi,

I just ran a test with Intel QuickAssist Technology (QAT), which provides hardware acceleration for GZIP. You can find details here: https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html
Here are the benchmark results, run on an Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with a single thread:

lzbench 1.7.2 (64-bit Linux)   Assembled by P.Skibinski

| Compressor name | Compression | Decompression | Compressed size (bytes) | Ratio | Filename            |
| --------------- | ----------- | ------------- | ----------------------- | ----- | ------------------- |
| memcpy          | 4942 MB/s   | 5688 MB/s     | 3263523                 | 1.00  | calgary/calgary.tar |
| qat 1.0.0       | 2312 MB/s   | 3538 MB/s     | 1274379                 | 2.56  | calgary/calgary.tar |
| snappy 1.1.4    | 283 MB/s    | 1144 MB/s     | 1686240                 | 1.94  | calgary/calgary.tar |
| lz4 1.7.5       | 453 MB/s    | 2514 MB/s     | 1685795                 | 1.94  | calgary/calgary.tar |
| zstd 1.3.1 -1   | 279 MB/s    | 723 MB/s      | 1187211                 | 2.75  | calgary/calgary.tar |
| zlib 1.2.11 -1  | 79 MB/s     | 261 MB/s      | 1240838                 | 2.63  | calgary/calgary.tar |

Thanks,
XieQi

-----Original Message-----
From: Wes McKinney <[email protected]>
Sent: Thursday, October 22, 2020 9:58 AM
To: dev <[email protected]>
Cc: [email protected]; Xu, Cheng A <[email protected]>; Dong, Xin <[email protected]>; Zhang, Jie1 <[email protected]>; Xie, Qi <[email protected]>
Subject: Re: [Discuss] Provide pluggable APIs to support user customized compression codec

Yes, I think he's asking about the motivation for the project. My understanding is that Snappy is used more often than Gzip with Parquet.

On Wed, Oct 21, 2020 at 8:53 PM Xie, Qi <[email protected]> wrote:
>
> Hi, Antoine
>
> Do you mean the performance data for HW-GZIP compared with LZ4/ZSTD?
>
> Thanks,
> XieQi
>
> -----Original Message-----
> From: Antoine Pitrou <[email protected]>
> Sent: Tuesday, October 20, 2020 10:38 PM
> To: [email protected]; Xie, Qi <[email protected]>
> Cc: Xu, Cheng A <[email protected]>; Dong, Xin <[email protected]>; Zhang, Jie1 <[email protected]>
> Subject: Re: [Discuss] Provide pluggable APIs to support user customized compression codec
>
>
>
> On 20/10/2020 at 12:09, Xie, Qi wrote:
> > Hi, Wes
> >
> > Yes, currently the key-value metadata is only a hint indicating that the
> > Parquet file was compressed by a plugin, so that the Parquet reader can load
> > the plugin library and use it to decompress the file. There are many optimized
> > GZIP implementations, and some may not be compatible with standard gzip. For
> > example, due to hardware limits, the HW-GZIP history window size may be smaller
> > than that of standard gzip, so HW-GZIP can't decompress a file compressed by
> > standard gzip. Because we still use Compression::GZIP as the Compression::type,
> > we need that metadata to distinguish it from standard gzip.
>
> What does it bring over ZSTD or LZ4 exactly?
>
> Regards
>
> Antoine.
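The reader-side mechanism Xie Qi describes (the codec field stays Compression::GZIP, and a key-value metadata entry tells the reader to load a plugin instead of using standard gzip) can be sketched roughly as below. This is a minimal illustration under assumptions, not the proposed Arrow/Parquet API: the metadata key `compression.plugin`, the in-process registry standing in for "load the plugin library", and the functions `register_plugin` and `choose_gzip_decompressor` are all hypothetical, and Python's standard `gzip` module stands in for the standard codec.

```python
import gzip
from typing import Callable, Dict, Mapping

# Hypothetical metadata key: the thread does not say which key is actually used.
PLUGIN_KEY = "compression.plugin"

Decompressor = Callable[[bytes], bytes]

# Hypothetical in-process registry standing in for loading a plugin library.
_PLUGINS: Dict[str, Decompressor] = {}


def register_plugin(name: str, decompress: Decompressor) -> None:
    """Make a plugin decompressor available under the name stored in metadata."""
    _PLUGINS[name] = decompress


def choose_gzip_decompressor(kv_metadata: Mapping[str, str]) -> Decompressor:
    """Select a decompressor for data whose declared codec is GZIP.

    The codec field alone cannot tell standard gzip apart from a plugin's
    possibly incompatible variant, so the key-value metadata hint decides.
    """
    plugin_name = kv_metadata.get(PLUGIN_KEY)
    if plugin_name is None:
        # No hint present: treat the data as standard gzip.
        return gzip.decompress
    if plugin_name not in _PLUGINS:
        # The hint says a plugin is required; fail loudly instead of guessing.
        raise RuntimeError(
            f"file requires compression plugin {plugin_name!r}, which is not loaded"
        )
    return _PLUGINS[plugin_name]


if __name__ == "__main__":
    payload = gzip.compress(b"parquet page bytes")

    # Standard file: no hint, so the stdlib gzip decoder is chosen.
    print(choose_gzip_decompressor({})(payload))

    # Plugin file: a stand-in "hw-gzip" plugin (here just a wrapper around gzip).
    register_plugin("hw-gzip", lambda data: gzip.decompress(data))
    print(choose_gzip_decompressor({PLUGIN_KEY: "hw-gzip"})(payload))
```

Raising when the named plugin is not loaded is a choice made for this sketch; the thread does not say what a reader should do in that case.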
