Hi Jacky and Ravindra,

We have tested ZSTD vs snappy again with the latest code on a 3-node Spark
2.3 cluster on HDFS with TPCH 500 GB data.
Below is the summary:

*1.  ZSTD store is 28.8% smaller compared to snappy*
*2.  Overall query time with ZSTD is 18.35% slower than with snappy*
*3.  Load time with ZSTD shows a negligible degradation of 0.7% compared to snappy*

Based on this, I think we cannot make ZSTD the default, because of the
large degradation in query time.

Thanks,
Ajantha
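
For reference, percentages like the ones above are relative differences against the snappy baseline. The 500 GB raw figures are not posted in this thread, so the minimal Python sketch below uses the totals from the smaller lineitem run quoted further down instead. The helper name `percent_change` is hypothetical, and reading 1.2GB as 1200 MB is an assumption.

```python
# Totals copied from the quoted TPCH lineitem run (carbon-zstd vs carbon-snappy).
zstd = {"load_s": 53.0, "size_mb": 795.0, "query_s": 283.844}
# Assumption: the posted "1.2GB" is taken as 1200 MB.
snappy = {"load_s": 51.0, "size_mb": 1200.0, "query_s": 290.704}


def percent_change(new, base):
    """Signed change of `new` relative to `base`, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


# Negative values mean zstd is smaller or faster than snappy.
changes = {metric: percent_change(zstd[metric], snappy[metric]) for metric in zstd}

for metric, change in changes.items():
    print(f"{metric}: {change:+.2f}% (zstd relative to snappy)")
```

With these inputs the size change comes out at roughly a third smaller, which matches the 33% quoted in the original proposal.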




On Fri, Feb 7, 2020 at 4:54 PM Ravindra Pesala <ravi.pes...@gmail.com>
wrote:

> Hi Jacky,
>
> As per the original PR
> https://github.com/apache/carbondata/pull/2628 , query performance
> decreased by 20% ~ 50% compared to snappy, so I am concerned about the
> performance. Please prepare a proper TPCH performance report on the
> regular cluster, as we do for every version, and decide based on that.
>
> Regards,
> Ravindra.
>
> On Fri, 7 Feb 2020 at 10:40 AM, Jacky Li <jacky.li...@qq.com> wrote:
>
> > Hi Ajantha,
> >
> >
> > Yes, the decoder will use the compressorName stored in ChunkCompressionMeta
> > from the file header,
> > but I think it is better to put it in the file name so that the user can
> > see the compressor from the shell without launching an engine to read it.
> >
> >
> > In Spark, for parquet/orc the file name written
> > is: part-00115-e2758995-4b10-4bd2-bf15-b4c176e587fe-c000.snappy.orc
> >
> >
> > In PR3606, I will handle the compatibility.
> >
> >
> > Regards,
> > Jacky
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ajantha Bhat"<ajanthab...@gmail.com>;
> > Sent: Thursday, February 6, 2020, 11:51 PM
> > To: "dev"<dev@carbondata.apache.org>;
> >
> > Subject: Re: Discussion: change default compressor to ZSTD
> >
> >
> >
> > Hi,
> >
> > 33% is a huge reduction in store size. If there is negligible difference
> > in load and query time, we should definitely go for it.
> >
> > And does the user really need to know which compression is used? A change
> > in the file name may need compatibility handling.
> > The thrift *FileHeader, ChunkCompressionMeta* already stores the
> > compressor name, so query-time decoding can be based on this.
> >
> > Thanks,
> > Ajantha
> >
> >
> > On Thu, Feb 6, 2020 at 4:27 PM Jacky Li <jacky.li...@qq.com> wrote:
> >
> > > Hi,
> > >
> > >
> > > I compared snappy and zstd compressor using TPCH for carbondata.
> > >
> > >
> > > For TPCH lineitem table:
> > >               carbon-zstd   carbon-snappy
> > > loading (s)   53            51
> > > size          795MB         1.2GB
> > >
> > > TPCH-query:
> > >               carbon-zstd   carbon-snappy
> > > Q1            4.289         8.29
> > > Q2            12.609        12.986
> > > Q3            14.902        14.458
> > > Q4            6.276         5.954
> > > Q5            23.147        21.946
> > > Q6            1.12          0.945
> > > Q7            23.017        28.007
> > > Q8            14.554        15.077
> > > Q9            28.472        27.473
> > > Q10           24.067        24.682
> > > Q11           3.321         3.79
> > > Q12           5.311         5.185
> > > Q13           14.08         11.84
> > > Q14           2.262         2.087
> > > Q15           5.496         4.772
> > > Q16           29.919        29.833
> > > Q17           7.018         7.057
> > > Q18           17.367        17.795
> > > Q19           2.931         2.865
> > > Q20           11.347        10.937
> > > Q21           26.416        28.414
> > > Q22           5.923         6.311
> > > sum           283.844       290.704
> > >
> > >
> > > As you can see, with zstd the table size is 33% smaller than with
> > > snappy, and the difference in data loading and query time is negligible.
> > > So I suggest changing the default compressor in carbondata from snappy
> > > to zstd.
> > >
> > >
> > > To change the default compressor, we need to:
> > > 1. Append the compressor name to the carbondata file name, so that the
> > > user can tell from the file name which compressor is used.
> > > For example, the file name will change from
> > > part-0-0_batchno0-0-0-1580982686749.carbondata
> > > to part-0-0_batchno0-0-0-1580982686749.snappy.carbondata
> > > or part-0-0_batchno0-0-0-1580982686749.zstd.carbondata
> > >
> > >
> > > 2. Change the compressor constant in the CarbonCommonConstants.java file
> > > to use zstd as the default compressor.
> > >
> > >
> > > What do you think?
> > >
> > >
> > > Regards,
> > > Jacky
>
> --
> Thanks & Regards,
> Ravi
>
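
The file-naming proposal discussed in this thread (`<name>.<compressor>.carbondata`, with older files keeping the plain `<name>.carbondata` form) can be illustrated with a small Python sketch. This is not CarbonData code: the function `compressor_from_name` and the set `KNOWN_COMPRESSORS` are hypothetical names used for illustration only, and the set lists just the two compressors mentioned in the thread.

```python
import os

# Assumption: only the two compressor names discussed in this thread.
KNOWN_COMPRESSORS = {"snappy", "zstd"}


def compressor_from_name(path):
    """Return the compressor embedded in a .carbondata file name, or None.

    None means the name follows the older scheme without a compressor
    suffix, so a reader would have to fall back to the compressor name
    stored in the file header.
    """
    parts = os.path.basename(path).split(".")
    # Proposed shape: <base>.<compressor>.carbondata
    if len(parts) >= 3 and parts[-1] == "carbondata" and parts[-2] in KNOWN_COMPRESSORS:
        return parts[-2]
    return None


for name in (
    "part-0-0_batchno0-0-0-1580982686749.carbondata",
    "part-0-0_batchno0-0-0-1580982686749.snappy.carbondata",
    "part-0-0_batchno0-0-0-1580982686749.zstd.carbondata",
):
    print(name, "->", compressor_from_name(name))
```

Returning None for the old-style name reflects the compatibility concern raised above: files written before the change carry no suffix, so the file name can only be a convenience and the header metadata stays the source of truth for decoding.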
