[ https://issues.apache.org/jira/browse/KYLIN-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yaqian Zhang updated KYLIN-4941:
--------------------------------
    Fix Version/s:     (was: v3.1.3)
                   Future

> Support encoding raw data to base cuboid column-by-column
> ---------------------------------------------------------
>
>                 Key: KYLIN-4941
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4941
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v3.1.1
>            Reporter: Shengjun Zheng
>            Assignee: Shengjun Zheng
>            Priority: Major
>             Fix For: Future
>
>
> When building with the Spark engine, the first step is to encode the Hive
> table's rows into base cuboid data.
> The existing implementation encodes row by row. If the cube has several
> dictionary-encoded measures, all of their dictionaries must be in use at the
> same time to encode a single row. This causes heavy memory usage and a low
> hit ratio in the dictionary cache.
> We optimized this case by encoding column by column, which brought a
> significant improvement for cubes with several high-cardinality
> dictionary-encoded measures.
> We will refine the implementation based on Kylin 3.x and share it.
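
The difference between the two encoding orders described above can be sketched with a toy model. This is not Kylin's implementation, which had not been shared at the time of this update. `DictionaryCache`, `encode_row_by_row`, `encode_column_by_column`, the column names, and the cache capacity of one dictionary are all hypothetical, chosen only to show why visiting the data column by column causes fewer dictionary loads when the cache cannot hold every dictionary at once.

```python
from collections import OrderedDict


class DictionaryCache:
    """Toy LRU cache holding at most `capacity` dictionaries (hypothetical)."""

    def __init__(self, dictionaries, capacity):
        self._source = dictionaries        # column name -> {value: id}
        self._capacity = capacity
        self._resident = OrderedDict()     # dictionaries currently "in memory"
        self.loads = 0                     # number of cache misses

    def get(self, column):
        if column in self._resident:
            self._resident.move_to_end(column)   # mark as most recently used
            return self._resident[column]
        self.loads += 1                          # miss: load the dictionary
        self._resident[column] = self._source[column]
        if len(self._resident) > self._capacity:
            self._resident.popitem(last=False)   # evict least recently used
        return self._resident[column]


def encode_row_by_row(rows, columns, cache):
    """Encode each row fully before moving on; touches every dictionary per row."""
    return [
        [cache.get(col)[row[i]] for i, col in enumerate(columns)]
        for row in rows
    ]


def encode_column_by_column(rows, columns, cache):
    """Encode one column for all rows, then the next; one dictionary at a time."""
    encoded = [[None] * len(columns) for _ in rows]
    for i, col in enumerate(columns):
        dictionary = cache.get(col)              # fetched once per column
        for r, row in enumerate(rows):
            encoded[r][i] = dictionary[row[i]]
    return encoded


columns = ["user_id", "device_id"]
dictionaries = {
    "user_id": {"u1": 0, "u2": 1},
    "device_id": {"d1": 0, "d2": 1},
}
rows = [("u1", "d2"), ("u2", "d1"), ("u1", "d1")]

# The cache can hold only one dictionary, mimicking memory pressure.
row_cache = DictionaryCache(dictionaries, capacity=1)
col_cache = DictionaryCache(dictionaries, capacity=1)

row_result = encode_row_by_row(rows, columns, row_cache)
col_result = encode_column_by_column(rows, columns, col_cache)

print(row_result == col_result)          # both orders give the same encoding
print(row_cache.loads, col_cache.loads)  # dictionary loads per strategy
```

In this toy setup both functions produce identical encoded rows, but the row-by-row version reloads a dictionary on every cell, while the column-by-column version loads each dictionary once. A real job would also have to buffer a partition's rows to revisit them per column, a trade-off this sketch ignores.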