+1
Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
________________________________
From: Billy Liu <billy...@apache.org>
Sent: Monday, March 18, 2019 11:50:49 AM
To: dev
Cc: Xiaoxiang Yu
Subject: Re: [Discussion] Enable shrunken dictionary by default

22 hours to 5 minutes, incredible progress. +1

With Warm regards

Billy Liu

ShaoFeng Shi <shaofeng...@apache.org> wrote on Mon, Mar 18, 2019 at 2:59 AM:
>
> +1.
>
> Thanks to Xiaoxiang for raising this; Kylin has some advanced but hidden
> features. As these functions become stable, we should enable them by default
> to benefit all users.
>
> Please also raise similar discussions if you wish to enable other good
> features.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>
>
> Zhong, Yanghong <yangzh...@ebay.com.invalid> wrote on Mon, Mar 18, 2019 at 10:39 AM:
>
> > +1.
> >
> > Best regards,
> > Yanghong Zhong
> >
> > On 2019/3/18, 10:27 AM, "Xiaoxiang Yu" <xiaoxiang...@kyligence.io> wrote:
> >
> > Dear all,
> > I suggest enabling "kylin.dictionary.shrunken-from-global-enabled" by
> > default (it is currently disabled by default), because I found that
> > enabling it speeds up the cube build process when a cube has a count
> > distinct (bitmap) measure on a large-cardinality column. This feature
> > was contributed in KYLIN-3491.
> >
> > When a count distinct (bitmap) measure is used on a large-cardinality
> > column (which requires a global dictionary), the Build Base Cuboid step
> > needs frequent cache swaps, so it cannot finish within a reasonable
> > period. KYLIN-3491 adds a new step that builds a separate dictionary for
> > each InputSplit before the BuildBaseCuboid step, so each mapper of the
> > BuildBaseCuboid step only has to fetch a smaller dictionary of its own
> > (without unused values) instead of the larger global dictionary.
> > This reduces cache swapping and makes the BuildBaseCuboid step run as
> > quickly as possible.
> >
> > In my test environment, the Hadoop cluster is a CDH cluster with 56
> > vcores and 110 GB of memory. I created a model with a fact table
> > (153326740 rows) and three dimension tables. It has three count distinct
> > (bitmap) measures, and the largest cardinality of a single column is
> > 55200325. With ShrunkenDict disabled, the BuildBaseCuboid step could not
> > complete in 22 hours. By comparison, with ShrunkenDict enabled, the build
> > completed in a reasonable time (Extra Dictionary took 5 minutes, Build
> > Base Cuboid took 5 minutes).
> >
> > https://user-images.githubusercontent.com/14030549/54363305-ad25e200-46a5-11e9-8bc7-fe2c385c0278.png
> >
> > If you want to know more, please check
> > https://issues.apache.org/jira/browse/KYLIN-3491.
> > If you have any suggestions, please let me know.
> >
> > ----------------
> > Best wishes,
> > Xiaoxiang Yu
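
For readers new to the idea, the per-InputSplit dictionary described in the original message can be illustrated with a small sketch. This is not Kylin's implementation: the function names (build_global_dict, shrink_for_split, encode_split) and the toy data are hypothetical, and plain Python dicts stand in for the global dictionary. It only shows why a mapper that loads the entries for its own split needs far less dictionary data than one that loads the whole global dictionary.

```python
from typing import Dict, Iterable, List


def build_global_dict(values: Iterable[str]) -> Dict[str, int]:
    """Assign one integer ID per distinct value (toy stand-in for a global dictionary)."""
    global_dict: Dict[str, int] = {}
    for value in values:
        if value not in global_dict:
            global_dict[value] = len(global_dict)
    return global_dict


def shrink_for_split(global_dict: Dict[str, int], split: Iterable[str]) -> Dict[str, int]:
    """Keep only the entries that occur in one input split, preserving global IDs."""
    return {value: global_dict[value] for value in set(split)}


def encode_split(shrunken: Dict[str, int], split: Iterable[str]) -> List[int]:
    """Encode a split using only its own shrunken dictionary."""
    return [shrunken[value] for value in split]


if __name__ == "__main__":
    # Hypothetical data: three splits of a high-cardinality column.
    splits = [["u1", "u2", "u1"], ["u3", "u2"], ["u4", "u4", "u5"]]
    global_dict = build_global_dict(v for s in splits for v in s)
    for i, split in enumerate(splits):
        shrunken = shrink_for_split(global_dict, split)
        print(i, len(shrunken), "of", len(global_dict), encode_split(shrunken, split))
```

Because the IDs in each shrunken dictionary are the global IDs, the encoded values remain comparable across splits, which is what a bitmap-based count distinct needs. In Kylin itself the behaviour is controlled by the "kylin.dictionary.shrunken-from-global-enabled" setting discussed in this thread.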