Hi

Regarding 4) Lazy decoding of the dictionary: I just tested 18 million rows
of data with this query:
"select province,sum(age),count(*) from presto_carbondata group by province
order by province"

The Spark integration module has "dictionary lazy decode"; the Presto
integration does not. In this test Presto took about 9 seconds and Spark
about 2.1 seconds, roughly a 4x difference, so "dictionary lazy decode"
might help a lot to improve aggregation performance.
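To make the idea concrete, here is a minimal sketch of why lazy decoding can help a group-by. It is not the CarbonData or Presto code; the dictionary, the rows, and the function names are all made up for illustration. The eager version does one dictionary lookup per row, while the lazy version aggregates on the integer surrogate keys and decodes only once per group at the end.

```python
from collections import defaultdict

# Hypothetical dictionary: surrogate key -> original string value.
dictionary = {1: "AB", 2: "BC", 3: "ON"}

# Hypothetical rows as stored on disk: (province surrogate key, age).
encoded_rows = [(1, 30), (3, 41), (1, 25), (2, 52), (3, 19)]


def aggregate_eager(rows, dictionary):
    """Decode every row first, then group by the decoded string."""
    groups = defaultdict(lambda: [0, 0])  # province -> [sum(age), count]
    decodes = 0
    for key, age in rows:
        province = dictionary[key]  # one dictionary lookup per row
        decodes += 1
        groups[province][0] += age
        groups[province][1] += 1
    result = sorted((p, s, c) for p, (s, c) in groups.items())
    return result, decodes


def aggregate_lazy(rows, dictionary):
    """Group by the surrogate key, decode once per group at the end."""
    groups = defaultdict(lambda: [0, 0])  # surrogate key -> [sum(age), count]
    for key, age in rows:
        groups[key][0] += age
        groups[key][1] += 1
    # Only the distinct group keys are decoded.
    decoded = [(dictionary[k], s, c) for k, (s, c) in groups.items()]
    return sorted(decoded), len(decoded)


if __name__ == "__main__":
    print(aggregate_eager(encoded_rows, dictionary))
    print(aggregate_lazy(encoded_rows, dictionary))
```

Both functions return the same grouped result; only the number of dictionary lookups differs. In the test above that would be one lookup per row (18 million) versus one per distinct province (13), assuming the engine can group on the surrogate keys directly.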

The detailed test results are below:

*1. Presto+CarbonData: 9 seconds*
presto:default> select province,sum(age),count(*) from presto_carbondata
group by province order by province;
 province |  _col1   |  _col2
----------+----------+---------
 AB       | 57442740 | 1385010
 BC       | 57488826 | 1385580
 MB       | 57564702 | 1386510
 NB       | 57599520 | 1386960
 NL       | 57446592 | 1383774
 NS       | 57448734 | 1384272
 NT       | 57534228 | 1386936
 NU       | 57506844 | 1385346
 ON       | 57484956 | 1384470
 PE       | 57325164 | 1379802
 QC       | 57467886 | 1385076
 SK       | 57385152 | 1382364
 YT       | 57377556 | 1383900
(13 rows)

Query 20170720_022833_00004_c9ky2, FINISHED, 1 node
Splits: 55 total, 55 done (100.00%)
0:09 [18M rows, 34.3MB] [1.92M rows/s, 3.65MB/s]

*2. Spark+CarbonData: about 2 seconds*
scala> benchmark { carbon.sql("select province,sum(age),count(*) from
presto_carbondata group by province order by province").show }
+--------+--------+--------+
|province|sum(age)|count(1)|
+--------+--------+--------+
|      AB|57442740| 1385010|
|      BC|57488826| 1385580|
|      MB|57564702| 1386510|
|      NB|57599520| 1386960|
|      NL|57446592| 1383774|
|      NS|57448734| 1384272|
|      NT|57534228| 1386936|
|      NU|57506844| 1385346|
|      ON|57484956| 1384470|
|      PE|57325164| 1379802|
|      QC|57467886| 1385076|
|      SK|57385152| 1382364|
|      YT|57377556| 1383900|
+--------+--------+--------+

2109.346231ms
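As a quick sanity check on the numbers above, the per-province counts and the two timings can be plugged into a few lines of Python. The counts and the Spark time are copied from the output; the Presto time is the "0:09" shown by the CLI, which is rounded to whole seconds, so the ratio is only approximate.

```python
# count(*) per province, copied from the query output above.
counts = [
    1385010, 1385580, 1386510, 1386960, 1383774, 1384272, 1386936,
    1385346, 1384470, 1379802, 1385076, 1382364, 1383900,
]

total_rows = sum(counts)

presto_seconds = 9.0           # "0:09" from the Presto CLI, rounded
spark_seconds = 2109.346231 / 1000.0  # benchmark output, in ms

ratio = presto_seconds / spark_seconds

print("total rows:", total_rows)  # total rows: 18000000
print("Presto / Spark time ratio: %.2f" % ratio)
```

The counts add up to exactly 18 million rows, which matches the "18M rows" reported by Presto.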



