OK, thanks for the test. Then for PR3606, I will only add the compressor name to the file name and will not change the default compressor to ZSTD.

Regards,
Jacky

> On Feb 20, 2020, at 12:52 PM, Ajantha Bhat <ajanthab...@gmail.com> wrote:
>
> Hi Jacky and Ravindra,
>
> We have tested ZSTD vs snappy again with the latest code on a 3-node Spark
> 2.3 cluster on HDFS with TPCH 500 GB data.
> Below is the summary:
>
> *1. ZSTD store is 28.8% smaller compared to snappy*
> *2. Overall query time is degraded by 18.35% in ZSTD compared to snappy*
> *3. Load time in ZSTD has negligible degradation of 0.7% compared to
> snappy*
>
> Based on this, I guess we cannot use ZSTD as default due to the huge
> degradation in query time.
>
> Thanks,
> Ajantha
>
> On Fri, Feb 7, 2020 at 4:54 PM Ravindra Pesala <ravi.pes...@gmail.com>
> wrote:
>
>> Hi Jacky,
>>
>> As per the original PR
>> https://github.com/apache/carbondata/pull/2628 , query performance
>> decreased by 20% ~ 50% compared to snappy. So I am concerned about the
>> performance. It would be better to have a proper TPCH performance report
>> on the regular cluster, like we do for every version, and decide based
>> on that.
>>
>> Regards,
>> Ravindra.
>>
>> On Fri, 7 Feb 2020 at 10:40 AM, Jacky Li <jacky.li...@qq.com> wrote:
>>
>>> Hi Ajantha,
>>>
>>> Yes, the decoder will use the compressorName stored in
>>> ChunkCompressionMeta from the file header,
>>> but I think it is better to put it in the name so that the user can
>>> know the compressor from the shell without launching an engine to
>>> read it.
>>>
>>> In Spark, for parquet/orc the file name written
>>> is: part-00115-e2758995-4b10-4bd2-bf15-b4c176e587fe-c000.snappy.orc
>>>
>>> In PR3606, I will handle the compatibility.
>>>
>>> Regards,
>>> Jacky
>>>
>>> ------------------ Original Message ------------------
>>> From: "Ajantha Bhat"<ajanthab...@gmail.com>;
>>> Sent: Thursday, February 6, 2020, 11:51 PM
>>> To: "dev"<dev@carbondata.apache.org>;
>>>
>>> Subject: Re: Discussion: change default compressor to ZSTD
>>>
>>> Hi,
>>>
>>> 33% is a huge reduction in store size. If there is negligible
>>> difference in load and query time, we should definitely go for it.
>>>
>>> And does the user really need to know what compression is used? A
>>> change in the file name may need compatibility handling.
>>> The thrift *FileHeader, ChunkCompressionMeta* already stores the
>>> compressor name. Query-time decoding can be based on this.
>>>
>>> Thanks,
>>> Ajantha
>>>
>>> On Thu, Feb 6, 2020 at 4:27 PM Jacky Li <jacky.li...@qq.com> wrote:
>>>
>>> > Hi,
>>> >
>>> > I compared the snappy and zstd compressors using TPCH for carbondata.
>>> >
>>> > For TPCH lineitem table:
>>> >                carbon-zstd    carbon-snappy
>>> > loading (s)    53             51
>>> > size           795MB          1.2GB
>>> >
>>> > TPCH-query:
>>> > Q1             4.289          8.29
>>> > Q2             12.609         12.986
>>> > Q3             14.902         14.458
>>> > Q4             6.276          5.954
>>> > Q5             23.147         21.946
>>> > Q6             1.12           0.945
>>> > Q7             23.017         28.007
>>> > Q8             14.554         15.077
>>> > Q9             28.472         27.473
>>> > Q10            24.067         24.682
>>> > Q11            3.321          3.79
>>> > Q12            5.311          5.185
>>> > Q13            14.08          11.84
>>> > Q14            2.262          2.087
>>> > Q15            5.496          4.772
>>> > Q16            29.919         29.833
>>> > Q17            7.018          7.057
>>> > Q18            17.367         17.795
>>> > Q19            2.931          2.865
>>> > Q20            11.347         10.937
>>> > Q21            26.416         28.414
>>> > Q22            5.923          6.311
>>> > sum            283.844        290.704
>>> >
>>> > As you can see, after using zstd, table size is reduced by 33%
>>> > compared to snappy, and the difference in data loading and query
>>> > time is negligible. So I suggest changing the default compressor in
>>> > carbondata from snappy to zstd.
>>> >
>>> > To change the default compressor, we need to:
>>> > 1. Append the compressor name to the carbondata file name, so that
>>> > the user can tell from the file name which compressor is used.
>>> > For example, the file name will be changed from
>>> > part-0-0_batchno0-0-0-1580982686749.carbondata
>>> > to part-0-0_batchno0-0-0-1580982686749.snappy.carbondata
>>> > or part-0-0_batchno0-0-0-1580982686749.zstd.carbondata
>>> >
>>> > 2. Change the compressor constant in the CarbonCommonConstants.java
>>> > file to use zstd as the default compressor.
>>> >
>>> > What do you think?
>>> >
>>> > Regards,
>>> > Jacky
>>
>> --
>> Thanks & Regards,
>> Ravi
>>
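
For readers following the file-naming part of this thread, here is a minimal sketch of the scheme discussed above: the compressor name is placed before the .carbondata extension, and older file names without it remain valid. The helper names, the KNOWN_COMPRESSORS set, and the choice to return None for legacy names are assumptions made for illustration only; they are not CarbonData APIs and do not describe how PR3606 actually handles compatibility.

```python
from typing import Optional

SUFFIX = ".carbondata"
# Assumption: only the two compressors compared in this thread.
KNOWN_COMPRESSORS = {"snappy", "zstd"}


def add_compressor_to_file_name(file_name: str, compressor: str) -> str:
    """Insert the compressor name before the .carbondata extension."""
    if not file_name.endswith(SUFFIX):
        raise ValueError(f"not a carbondata file name: {file_name}")
    stem = file_name[: -len(SUFFIX)]
    return f"{stem}.{compressor}{SUFFIX}"


def compressor_from_file_name(file_name: str) -> Optional[str]:
    """Return the compressor named in the file name, if any.

    None means the name follows the older scheme; in that case the
    compressor would have to be taken from the file header instead.
    """
    if not file_name.endswith(SUFFIX):
        raise ValueError(f"not a carbondata file name: {file_name}")
    stem = file_name[: -len(SUFFIX)]
    _, dot, tag = stem.rpartition(".")
    if dot and tag in KNOWN_COMPRESSORS:
        return tag
    return None


if __name__ == "__main__":
    old_name = "part-0-0_batchno0-0-0-1580982686749.carbondata"
    new_name = add_compressor_to_file_name(old_name, "zstd")
    print(new_name)
    print(compressor_from_file_name(new_name))
    print(compressor_from_file_name(old_name))
```

Returning None for an old-style name, instead of guessing a default, keeps the file header as the source of truth for decoding, which matches the point made in the thread that the decoder reads the compressor name from ChunkCompressionMeta.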