Hi,

*I was able to decrease the memory usage in TLAB from 68 GB to 29.94 GB for
the same TPCH data, without disabling adaptive encoding.*
*There is also about a 5% improvement in insert.* Please check the PR:
https://github.com/apache/carbondata/pull/3682

Before the change:
[image: Screenshot from 2020-03-26 16-45-12]
<https://user-images.githubusercontent.com/5889404/77640947-380c0e80-6f81-11ea-97ff-f1b8942d99c6.png>

After the change:
[image: Screenshot from 2020-03-26 16-51-31]
<https://user-images.githubusercontent.com/5889404/77641533-34c55280-6f82-11ea-8a60-bfb6c8d8f52a.png>

Thanks,
Ajantha

On Wed, Mar 25, 2020 at 2:51 PM Ravindra Pesala <ravi.pes...@gmail.com> wrote:

> Hi Ajantha,
>
> I think it is better to fix the problem instead of disabling things. It
> has already been observed that the store size increases proportionally.
> If my data has more columns, then it will be exponential. Store size
> directly impacts query performance in the object-store world. It is
> better to find a way to fix it rather than removing things.
>
> Regards,
> Ravindra.
>
> On Wed, 25 Mar 2020 at 5:04 PM, Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
> > Hi Ravi, please find the performance readings below.
> >
> > On TPCH 10 GB data, for a carbon-to-carbon insert on an HDFS standalone
> > cluster, *by disabling adaptive encoding for float and double*:
> >
> > insert is *more than 10% faster* [before: 139 seconds, after: 114
> > seconds],
> > it *saves 25% memory in TLAB*, and
> > store size *has increased by 10%* [before: 2.3 GB, after: 2.55 GB].
> >
> > Also, we have the below check: if the data has more than 5 digits of
> > decimal precision, we don't apply adaptive encoding for double/float.
> > So, I am not sure how useful it is for real-world double-precision
> > data.
> >
> > [image: Screenshot from 2020-03-25 14-27-07.png]
> >
> > *The bottleneck is finding the decimal count of every float and double
> > value [*PrimitivePageStatsCollector.getDecimalCount(double)*],*
> > *where we convert the value to a string and use substring().*
> >
> > So I want to disable adaptive encoding for double and float by default.
> >
> > Thanks,
> > Ajantha
> >
> > On Wed, Mar 25, 2020 at 11:37 AM Ravindra Pesala <ravi.pes...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> It increases the store size. Can you give me performance figures with
> >> and without these changes? And also provide how much the store size is
> >> impacted if we disable it.
> >>
> >> Regards,
> >> Ravindra.
> >>
> >> On Wed, 25 Mar 2020 at 1:51 PM, Ajantha Bhat <ajanthab...@gmail.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I have profiled the insert-into flow using JMC with the latest code
> >> > [with the new optimized insert flow].
> >> >
> >> > It seems that for a *2.5 GB* carbon-to-carbon insert, the double and
> >> > float stats collector has used *68.36 GB* [*25%* of the TLAB
> >> > (thread-local allocation buffer)].
> >> >
> >> > [image: Screenshot from 2020-03-25 11-18-04.png]
> >> >
> >> > *The problem is that for every double and float value in every row,
> >> > we call PrimitivePageStatsCollector.getDecimalCount(), which makes
> >> > new objects every time.*
> >> >
> >> > So, I want to disable adaptive encoding for float and double by
> >> > default. *I will make this configurable.*
> >> >
> >> > If some user has a well-sorted double or float column and wants to
> >> > apply adaptive encoding on that, they can enable it to reduce store
> >> > size.
> >> >
> >> > Thanks,
> >> > Ajantha
> >> >
> >> --
> >> Thanks & Regards,
> >> Ravi
> >>
> --
> Thanks & Regards,
> Ravi
>
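[Editor's note] The getDecimalCount() bottleneck discussed in this thread can be sketched roughly as below. This is a hypothetical illustration, not CarbonData's actual code: the class and method names are invented, the string-based variant only mimics the allocation pattern the thread describes, and the allocation-free variant is just one possible alternative.

```java
// Hypothetical sketch of the decimal-count bottleneck discussed in this
// thread. Names are invented for illustration; NOT CarbonData's real code.
public class DecimalCountSketch {

    // String-based counting, similar in spirit to what the thread describes
    // for PrimitivePageStatsCollector.getDecimalCount(double): every call
    // allocates a String (and a substring), which is what shows up as TLAB
    // pressure when it runs once per double value per row. (For brevity this
    // ignores values Double.toString prints in scientific notation, e.g.
    // 1.0E-7.)
    static int decimalCountViaString(double value) {
        String s = Double.toString(Math.abs(value)); // allocation per call
        int dot = s.indexOf('.');
        String fraction = s.substring(dot + 1);      // another allocation
        // Double.toString always prints at least one fractional digit,
        // so whole numbers come out as "5.0" -> fraction "0".
        return fraction.equals("0") ? 0 : fraction.length();
    }

    // One possible allocation-free alternative: scale by 10 until the value
    // becomes integral, with a cap so floating-point noise cannot loop
    // forever. A production version would need careful validation against
    // the string-based behavior before replacing it.
    static int decimalCountNoAlloc(double value) {
        double v = Math.abs(value);
        int count = 0;
        while (count < 15 && v != Math.floor(v)) {
            v *= 10.0;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Values chosen to be exactly representable in binary, so both
        // approaches agree. Run with -ea to enable assertions.
        assert decimalCountViaString(3.25) == 2;
        assert decimalCountNoAlloc(3.25) == 2;
        assert decimalCountViaString(0.5) == 1;
        assert decimalCountNoAlloc(0.5) == 1;
        assert decimalCountViaString(5.0) == 0;
        assert decimalCountNoAlloc(5.0) == 0;
    }
}
```

Either way, the per-value string allocation is what inflates the TLAB figures profiled with JMC above; the PR linked at the top of the thread reports cutting that allocation without disabling adaptive encoding.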