[GitHub] [carbondata] ajantha-bhat opened a new pull request #3682: [CARBONDATA-3753] optimize double/float stats collector

GitBox Thu, 26 Mar 2020 04:16:11 -0700

ajantha-bhat opened a new pull request #3682: [CARBONDATA-3753] optimize 
double/float stats collector
URL: https://github.com/apache/carbondata/pull/3682
 
 
    ### Why is this PR needed?
   `PrimitivePageStatsCollector.getDecimalCount(double value)` is called for 
   every value of every double/float column. Each call creates a new 
   `BigDecimal` object and a new plain string object, which leads to high 
   memory usage during insert.
   
    ### What changes were proposed in this PR?
   Create only the `BigDecimal` object and take the decimal count from its scale.
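
The idea can be sketched as below. This is a minimal illustration, not the code from the patch: the class name `DecimalCountSketch` and both method names are hypothetical, and returning 0 for NaN/Infinity is an assumption made here so the sketch never throws. The first method builds a `BigDecimal` plus a plain `String` per value, the second reads `BigDecimal.scale()` directly and clamps negative scales to 0.

```java
import java.math.BigDecimal;

// Hypothetical sketch; names are illustrative and not taken from CarbonData.
final class DecimalCountSketch {

  // String-based approach: allocates a BigDecimal and a plain String per
  // value, then counts the characters after the decimal point.
  static int decimalCountViaString(double value) {
    if (!Double.isFinite(value)) {
      return 0; // assumption: NaN/Infinity contribute no decimal digits
    }
    String plain = BigDecimal.valueOf(Math.abs(value)).toPlainString();
    int dot = plain.indexOf('.');
    return dot == -1 ? 0 : plain.length() - dot - 1;
  }

  // Scale-based approach: allocates only the BigDecimal and reads its scale.
  // A negative scale (e.g. for 1.0E10) means there are no fractional digits.
  static int decimalCountViaScale(double value) {
    if (!Double.isFinite(value)) {
      return 0; // same assumption as above
    }
    return Math.max(BigDecimal.valueOf(value).scale(), 0);
  }

  public static void main(String[] args) {
    double[] samples = {1.5, 12.345, -2.25, 1.0E10, 1.0E-5};
    for (double v : samples) {
      System.out.println(v + " -> string: " + decimalCountViaString(v)
          + ", scale: " + decimalCountViaScale(v));
    }
  }
}
```

Both methods should agree for finite inputs, because `toPlainString()` prints exactly `scale()` fractional digits when the scale is positive and no decimal point otherwise; the scale-based version simply skips the intermediate `String`.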
       
    ### Does this PR introduce any user interface change?
    - No
    
    ### Is any new testcase added?
    - No
    
   Before the change:
   ![Screenshot from 2020-03-26 16-45-12](https://user-images.githubusercontent.com/5889404/77640947-380c0e80-6f81-11ea-97ff-f1b8942d99c6.png)
   
   
   After the change:
   ![Screenshot from 2020-03-26 16-30-27](https://user-images.githubusercontent.com/5889404/77640863-16128c00-6f81-11ea-8af6-1b60cc7a4ab8.png)
   
   There is about a 5% improvement in insert performance for the TPC-H lineitem 
   table with 10 GB of data, without any change in store size.


