Hi all,
   Recently, I created two tables, one in ORC and one in CarbonData. Each
contains one hundred million records.
Then I submitted aggregate queries to Presto such as [Select count(*) from
tableB where attributeA = 'xxx'],
and CarbonData performed better than ORC.

However, when I submit queries like [Select attributeA, count(*) from
tableB group by attributeA], CarbonData performs poorly. This query
results in a full scan, so QueryModel needs to rebuild every record from the
related columns, and this step takes a lot of time.
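To make the cost difference concrete, here is a minimal Python sketch on a tiny hypothetical in-memory columnar table (one list per column). It is not CarbonData's storage format, QueryModel, or any real API. It only contrasts the filtered count, which reads a single column, with a GROUP BY that first rebuilds every record as a row object before aggregating, and with one that aggregates directly on the needed column.

```python
from collections import Counter

# Hypothetical columnar "table": one Python list per column.
# This only models the idea; it is NOT CarbonData's format or API.
table = {
    "attributeA": ["x", "y", "x", "z", "x", "y"],
    "attributeB": [1, 2, 3, 4, 5, 6],
}

def count_where(table, col, value):
    """SELECT count(*) FROM t WHERE col = value: touches only one column."""
    return sum(1 for v in table[col] if v == value)

def group_count_row_based(table, col):
    """GROUP BY col, count(*) that first materializes every record as a
    dict (one object per row), modelling the costly row-rebuild step."""
    names = list(table)
    rows = [dict(zip(names, values)) for values in zip(*(table[n] for n in names))]
    return Counter(row[col] for row in rows)

def group_count_columnar(table, col):
    """GROUP BY col, count(*) computed directly on the single column,
    without building any per-row objects."""
    return Counter(table[col])

print(count_where(table, "attributeA", "x"))
print(group_count_columnar(table, "attributeA"))
```

Both GROUP BY variants return the same counts; the row-based one simply does extra work per record, which grows with the number of rows (one hundred million here).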

So I would like to know: are there any optimization techniques for this kind
of problem in Spark?



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Aggregate-performace-tp7440.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.
