[ https://issues.apache.org/jira/browse/KYLIN-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fangyuan Deng updated KYLIN-3729:
---------------------------------
    Attachment: KYLIN-3729.patch

> CLUSTER BY CAST(field AS STRING) will accelerate base cuboid build with UHC global dict
> ---------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3729
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3729
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.5.2
>            Reporter: Fangyuan Deng
>            Assignee: Fangyuan Deng
>            Priority: Minor
>         Attachments: KYLIN-3729.patch, image-2018-12-19-12-01-20-430.png, image-2018-12-19-12-02-08-913.png
>
>
> As we know, the global dict is a sliced appendTrieTree that is loaded through a cache-loader, so when values are converted to ids using the global dict, ordered input values help.
> We can already set kylin.source.hive.flat-table-cluster-by-dict-column to the uhc column, so that the source data is CLUSTER BY uhc-column, which improves things.
> However, the appendTrieTree is ordered by string, so we can CLUSTER BY CAST(uhc-column AS STRING) to get the most benefit.
> The attached screenshots show the hdfs bytes read (mostly global dict) reduced to 30%.
> !image-2018-12-19-12-01-20-430.png!!image-2018-12-19-12-02-08-913.png!
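
To illustrate why string order matters here, the following is a minimal Python sketch, not Kylin code. It assumes a hypothetical dictionary whose keys are split into four slices by string order, and it counts how often consecutive lookups land in a different slice, as a rough stand-in for how often a new slice would have to be loaded. The function name slice_switches, the slice size of 50, and the id range 1..200 are all made up for the illustration.

```python
from bisect import bisect_right


def slice_switches(values, boundaries):
    """Count how often consecutive lookups land in a different slice.

    `boundaries` holds the first key of every slice except the first one,
    in string order (a hypothetical layout, chosen for illustration only).
    """
    switches = 0
    previous = None
    for value in values:
        # Keys are compared as strings, like the string-ordered dictionary.
        current = bisect_right(boundaries, str(value))
        if previous is not None and current != previous:
            switches += 1
        previous = current
    return switches


# Hypothetical numeric UHC column with ids 1..200.
ids = list(range(1, 201))

# Slice the string-sorted keys into 4 slices of 50 keys each.
keys = sorted(str(i) for i in ids)
boundaries = [keys[i] for i in range(50, len(keys), 50)]

# CLUSTER BY uhc-column on a numeric column: numeric order.
numeric_order = sorted(ids)
# CLUSTER BY CAST(uhc-column AS STRING): string order.
string_order = sorted(ids, key=str)

numeric_switches = slice_switches(numeric_order, boundaries)
string_switches = slice_switches(string_order, boundaries)

print("slice switches, numeric order:", numeric_switches)
print("slice switches, string order: ", string_switches)
```

With string-ordered input each slice is visited once, in sequence, while numerically ordered input jumps back and forth between slices (for example "9" and "10" sit far apart in string order). This is only a toy model of the effect described above; the real gain depends on the actual data and slice sizes.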