Hi all,

I recently created two tables, one stored as ORC and one as CarbonData. Each of them contains one hundred million records. When I submit aggregate queries to Presto such as [Select count(*) from tableB where attributeA = 'xxx'], CarbonData performs better than ORC.
However, when I submit queries such as [Select attributeA, count(*) from tableB group by attributeA], the performance of CarbonData is poor. This query obviously results in a full scan, so the QueryModel needs to rebuild every record with the related columns, and this step takes a lot of time. Are there any optimization techniques for this kind of problem in Spark?
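
To make the two query shapes concrete, here is a minimal sketch that runs both against a toy in-memory table. It uses Python's sqlite3 purely as a stand-in so that it runs anywhere; it is not Presto, CarbonData, or ORC, and it says nothing about their performance. Only tableB and attributeA come from the queries above; the column attributeB and the sample rows are made up for illustration.

```python
import sqlite3

# Toy stand-in for the real table. sqlite3 is an assumption made only so the
# sketch is runnable; attributeB and the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableB (attributeA TEXT, attributeB INTEGER)")
rows = [("xxx", 1), ("yyy", 2), ("xxx", 3), ("zzz", 4), ("yyy", 5), ("xxx", 6)]
conn.executemany("INSERT INTO tableB VALUES (?, ?)", rows)

# Query shape 1: filtered count. Only rows matching the predicate contribute.
filtered_count = conn.execute(
    "SELECT count(*) FROM tableB WHERE attributeA = 'xxx'"
).fetchone()[0]

# Query shape 2: grouped count. Every row contributes to exactly one group,
# so the whole attributeA column has to be read.
grouped_counts = dict(
    conn.execute(
        "SELECT attributeA, count(*) FROM tableB GROUP BY attributeA"
    ).fetchall()
)

print(filtered_count)
print(sorted(grouped_counts.items()))
```

The per-group counts always sum to the total row count, which is another way of seeing that the grouped query cannot skip any rows, whereas the filtered count can in principle skip everything that does not match 'xxx'.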