Hi Liang,

I see that the province column data is small, so I suspect lazy decoding
makes little difference in this scenario. Can you run one more test with
the province column excluded from the dictionary in both the Presto and
Spark integrations? That will tell us whether the gap really comes from
lazy decoding or not.
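
To make the comparison concrete, here is a minimal Scala sketch of what
"lazy decoding" means for this GROUP BY. It assumes a hypothetical
in-memory dictionary-encoded column (EncodedColumn, eagerAggregate and
lazyAggregate are illustrative names, not CarbonData or Presto classes).
The eager variant looks up the string for every row before grouping. The
lazy variant groups on the integer surrogate keys and decodes only once
per distinct group at the end.

```scala
// Hypothetical sketch, not CarbonData's or Presto's actual implementation.
object LazyDecodeSketch {
  // A dictionary-encoded column: each row stores an Int surrogate key,
  // and dictionary(key) holds the original string value.
  final case class EncodedColumn(keys: Array[Int], dictionary: Array[String])

  // Eager decoding: one dictionary lookup per row, then hash
  // aggregation on String keys. Result: value -> (sum, count).
  def eagerAggregate(col: EncodedColumn, ages: Array[Long]): Map[String, (Long, Long)] = {
    val acc = scala.collection.mutable.HashMap.empty[String, (Long, Long)]
    var i = 0
    while (i < col.keys.length) {
      val value = col.dictionary(col.keys(i)) // decode every row
      val (s, c) = acc.getOrElse(value, (0L, 0L))
      acc(value) = (s + ages(i), c + 1)
      i += 1
    }
    acc.toMap
  }

  // Lazy decoding: aggregate directly on the Int keys (dense arrays
  // indexed by key, assuming a small dictionary), then decode only the
  // keys that actually occur, once each.
  def lazyAggregate(col: EncodedColumn, ages: Array[Long]): Map[String, (Long, Long)] = {
    val sums = new Array[Long](col.dictionary.length)
    val counts = new Array[Long](col.dictionary.length)
    var i = 0
    while (i < col.keys.length) {
      val k = col.keys(i)
      sums(k) += ages(i)
      counts(k) += 1
      i += 1
    }
    col.dictionary.indices.collect {
      case k if counts(k) > 0 => col.dictionary(k) -> (sums(k), counts(k))
    }.toMap
  }
}
```

Both variants return the same groups; with 13 distinct provinces the lazy
one does 13 lookups instead of one per row. Whether that saving matters
for a column this small is exactly what the suggested test would show.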

Regards,
Ravindra

On 20 July 2017 at 08:04, Liang Chen <chenliang6...@gmail.com> wrote:

> Hi
>
> Regarding item 4) lazy decoding of the dictionary, I just tested 18 million
> rows of data with the script:
> "select province,sum(age),count(*) from presto_carbondata group by province
> order by province"
>
> The Spark integration module has "dictionary lazy decode" but Presto does
> not, and the performance differs by roughly 4.5 times (about 9 s vs. 2.1 s),
> so "dictionary lazy decode" might help a lot to improve aggregation
> performance.
>
> The detailed test results are below:
>
> *1. Presto+CarbonData: 9 seconds*
> presto:default> select province,sum(age),count(*) from presto_carbondata
> group by province order by province;
>  province |  _col1   |  _col2
> ----------+----------+---------
>  AB       | 57442740 | 1385010
>  BC       | 57488826 | 1385580
>  MB       | 57564702 | 1386510
>  NB       | 57599520 | 1386960
>  NL       | 57446592 | 1383774
>  NS       | 57448734 | 1384272
>  NT       | 57534228 | 1386936
>  NU       | 57506844 | 1385346
>  ON       | 57484956 | 1384470
>  PE       | 57325164 | 1379802
>  QC       | 57467886 | 1385076
>  SK       | 57385152 | 1382364
>  YT       | 57377556 | 1383900
> (13 rows)
>
> Query 20170720_022833_00004_c9ky2, FINISHED, 1 node
> Splits: 55 total, 55 done (100.00%)
> 0:09 [18M rows, 34.3MB] [1.92M rows/s, 3.65MB/s]
>
> *2. Spark+CarbonData: 2 seconds*
> scala> benchmark { carbon.sql("select province,sum(age),count(*) from
> presto_carbondata group by province order by province").show }
> +--------+--------+--------+
> |province|sum(age)|count(1)|
> +--------+--------+--------+
> |      AB|57442740| 1385010|
> |      BC|57488826| 1385580|
> |      MB|57564702| 1386510|
> |      NB|57599520| 1386960|
> |      NL|57446592| 1383774|
> |      NS|57448734| 1384272|
> |      NT|57534228| 1386936|
> |      NU|57506844| 1385346|
> |      ON|57484956| 1384470|
> |      PE|57325164| 1379802|
> |      QC|57467886| 1385076|
> |      SK|57385152| 1382364|
> |      YT|57377556| 1383900|
> +--------+--------+--------+
>
> 2109.346231ms
>
>
>
> --
> View this message in context: http://apache-carbondata-dev-
> mailing-list-archive.1130556.n5.nabble.com/Presto-
> CarbonData-optimization-work-discussion-tp18509p18522.html
> Sent from the Apache CarbonData Dev Mailing List archive mailing list
> archive at Nabble.com.
>



-- 
Thanks & Regards,
Ravi
