Hi,

*I was able to decrease the memory usage in TLAB from 68 GB to 29.94 GB for
the same TPCH data*, *without disabling adaptive encoding*.

*Insert is also about 5% faster*. Please check the PR:

https://github.com/apache/carbondata/pull/3682
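
For reference, one way to cut the per-value allocations is to derive the
decimal count from indexOf() and length() instead of substring(). This is a
minimal sketch of the idea only, not necessarily what the PR does:

  // Sketch: same result as the substring() version, with one less object
  // allocated per value. Values that print in scientific notation
  // ("1.0E-7") would still need separate handling, as before.
  private static int getDecimalCount(double value) {
    String strValue = String.valueOf(Math.abs(value));
    return strValue.length() - strValue.indexOf('.') - 1;
  }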

Before the change:
[image: Screenshot from 2020-03-26 16-45-12]
<https://user-images.githubusercontent.com/5889404/77640947-380c0e80-6f81-11ea-97ff-f1b8942d99c6.png>

After the change:

[image: Screenshot from 2020-03-26 16-51-31]
<https://user-images.githubusercontent.com/5889404/77641533-34c55280-6f82-11ea-8a60-bfb6c8d8f52a.png>

Thanks,

Ajantha


On Wed, Mar 25, 2020 at 2:51 PM Ravindra Pesala <ravi.pes...@gmail.com>
wrote:

> Hi Ajantha,
>
> I think it is better to fix the problem instead of disabling things. We
> have already observed that the store size increases proportionally, and if
> the data has more columns the increase will be even larger. Store size
> directly impacts query performance in the object-store world, so it is
> better to find a way to fix it rather than removing things.
>
> Regards,
> Ravindra.
>
> On Wed, 25 Mar 2020 at 5:04 PM, Ajantha Bhat <ajanthab...@gmail.com>
> wrote:
>
> > Hi Ravi, please find the performance readings below.
> >
> > On TPCH 10GB data, carbon-to-carbon insert on an HDFS standalone cluster:
> >
> >
> > *By disabling adaptive encoding for float and double:*
> > insert is *more than 10% faster* [before: 139 seconds, after: 114
> > seconds], it *saves 25% of the memory in TLAB*, and the store size *has
> > increased by 10%* [before: 2.3 GB, after: 2.55 GB].
> >
> > Also, we have the check below: if the data has more than 5 digits of
> > decimal precision, we don't apply adaptive encoding for double/float.
> > So, I am not sure how useful it is for real-world double-precision data.
> >
> > [image: Screenshot from 2020-03-25 14-27-07.png]
> >
> >
> > *The bottleneck is finding the decimal count of every float and double
> > value [*PrimitivePageStatsCollector.getDecimalCount(double)*], where we
> > convert the value to a string and call substring().*
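> >
> > For illustration, a minimal sketch of that pattern (the shape of the
> > method is assumed here; the real collector may differ in details):
> >
> >   private int getDecimalCount(double value) {
> >     // Allocates a new String plus a substring object for every single
> >     // row value -- this is what shows up as the TLAB hotspot in JMC.
> >     String strValue = String.valueOf(Math.abs(value));
> >     return strValue.substring(strValue.indexOf(".") + 1).length();
> >   }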
> >
> > So, I want to disable adaptive encoding for double and float by default.
> >
> > Thanks,
> > Ajantha
> >
> > On Wed, Mar 25, 2020 at 11:37 AM Ravindra Pesala <ravi.pes...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> It increases the store size. Can you give me the performance figures
> >> with and without these changes, and also how much the store size is
> >> impacted if we disable it?
> >>
> >>
> >> Regards,
> >> Ravindra.
> >>
> >> On Wed, 25 Mar 2020 at 1:51 PM, Ajantha Bhat <ajanthab...@gmail.com>
> >> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I have profiled the insert-into flow using JMC with the latest code
> >> > [with the new optimized insert flow].
> >> >
> >> > It seems that for a *2.5 GB* carbon-to-carbon insert, the double and
> >> > float stats collector has allocated *68.36 GB* [*25%* of TLAB (Thread
> >> > Local Allocation Buffer)].
> >> >
> >> > [image: Screenshot from 2020-03-25 11-18-04.png]
> >> > *The problem is that for every double and float value in every row,
> >> > we call PrimitivePageStatsCollector.getDecimalCount(), which creates
> >> > new objects every time.*
> >> >
> >> > So, I want to disable adaptive encoding for float and double by
> >> > default. *I will make this configurable.*
> >> >
> >> > If a user has a well-sorted double or float column and wants to apply
> >> > adaptive encoding to it, they can enable it to reduce the store size.
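> >> >
> >> > For example, roughly like this (a sketch only; the property name here
> >> > is illustrative until the change lands):
> >> >
> >> >   CarbonProperties.getInstance().addProperty(
> >> >       "carbon.enable.adaptive.encoding.for.floating.point", "true");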
> >> >
> >> > Thanks,
> >> > Ajantha
> >> >
> >> --
> >> Thanks & Regards,
> >> Ravi
> >>
> --
> Thanks & Regards,
> Ravi
>
