+1.

Thanks to Xiaoxiang for raising this; Kylin has some advanced but hidden
features. As these features become stable, we should enable them by default
to benefit all users.

Please raise similar discussions if there are other good features that you
would like to see enabled by default.
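
For anyone who wants to try the feature before the default changes, here is
a minimal sketch of the override. It assumes the setting goes into
conf/kylin.properties under the Kylin installation directory; the property
name is the one proposed in this thread.

```properties
# Assumption: this line is added to $KYLIN_HOME/conf/kylin.properties.
# The property comes from KYLIN-3491 and is disabled by default.
kylin.dictionary.shrunken-from-global-enabled=true
```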

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Zhong, Yanghong <yangzh...@ebay.com.invalid> wrote on Mon, Mar 18, 2019 at 10:39 AM:

> +1.
>
> Best regards,
> Yanghong Zhong
>
> On 2019/3/18, 10:27 AM, "Xiaoxiang Yu" <xiaoxiang...@kyligence.io> wrote:
>
>     Dear all,
>     I suggest enabling "kylin.dictionary.shrunken-from-global-enabled" by
> default (it is currently disabled), because I found that enabling it speeds
> up the cube build process when a cube has a count distinct (bitmap) measure
> on a high-cardinality column. This feature was contributed in KYLIN-3491.
>
>     When a count distinct (bitmap) measure is used on a high-cardinality
> column (which requires a global dictionary), the BuildBaseCuboid step needs
> frequent cache swaps, so it cannot finish within a reasonable period.
> KYLIN-3491 adds a new step that builds a separate dictionary for each
> InputSplit before the BuildBaseCuboid step. Each mapper of BuildBaseCuboid
> then only has to fetch a smaller dictionary of its own (without unused
> values) instead of the larger global dictionary. This reduces cache swapping
> and lets the BuildBaseCuboid step run much faster.
>
>     In my test environment, the Hadoop cluster is a CDH cluster with 56
> vcores and 110 GB of memory. I created a model with a fact table
> (153326740 rows) and three dimension tables. It has three count distinct
> (bitmap) measures, and the largest cardinality of a single column is
> 55200325. With ShrunkenDict disabled, the BuildBaseCuboid step could not
> complete within 22 hours. By comparison, with ShrunkenDict enabled, the
> build completed in a reasonable time (the extra dictionary step took 5
> minutes and Build Base Cuboid took 5 minutes).
>
>
> https://user-images.githubusercontent.com/14030549/54363305-ad25e200-46a5-11e9-8bc7-fe2c385c0278.png
>
>     If you want to know more, please check
> https://issues.apache.org/jira/browse/KYLIN-3491.
> If you have any suggestions, please let me know.
>
>     ----------------
>     Best wishes,
>     Xiaoxiang Yu
>
>
>
>
