[jira] [Created] (KYLIN-4941) Support encoding raw data to base cuboid column-by-column

ShengJun Zheng (Jira) Sun, 21 Mar 2021 19:21:08 -0700

ShengJun Zheng created KYLIN-4941:
-------------------------------------

             Summary: Support encoding raw data to base cuboid column-by-column
                 Key: KYLIN-4941
                 URL: https://issues.apache.org/jira/browse/KYLIN-4941
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
    Affects Versions: v3.1.1
            Reporter: ShengJun Zheng
             Fix For: v3.1.2



When building with the Spark engine, the first step is to encode each row of 
the Hive table into base cuboid data.

The existing implementation encodes row by row. If the cube has several 
dictionary-encoded measures, all of their dictionaries must be in use at the 
same time to encode a single row. This causes heavy memory usage and a low hit 
ratio in the dictionary cache.

We optimized this case by encoding column by column, which brought a 
significant improvement for cubes with several high-cardinality 
dictionary-encoded measures.
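
The difference between the two strategies can be illustrated with a small, 
self-contained Python sketch. It is not Kylin's implementation: the 
DictionaryCache class, the column names, and the sample data are all 
hypothetical, and a bounded LRU cache merely stands in for limited memory. 
The sketch only shows that both strategies produce the same encoded rows while 
loading dictionaries a different number of times.

```python
from collections import OrderedDict


class DictionaryCache:
    """Hypothetical LRU cache that keeps at most `capacity` dictionaries loaded."""

    def __init__(self, dictionaries, capacity):
        self._source = dictionaries      # stands in for dictionaries on disk
        self._capacity = capacity
        self._loaded = OrderedDict()
        self.loads = 0                   # counts cache misses (dictionary loads)

    def get(self, column):
        if column in self._loaded:
            self._loaded.move_to_end(column)         # mark as most recently used
        else:
            self.loads += 1
            self._loaded[column] = self._source[column]
            if len(self._loaded) > self._capacity:
                self._loaded.popitem(last=False)     # evict least recently used
        return self._loaded[column]


def encode_row_by_row(rows, columns, cache):
    """Encode each row fully before moving on; touches every dictionary per row."""
    return [[cache.get(col)[row[i]] for i, col in enumerate(columns)]
            for row in rows]


def encode_column_by_column(rows, columns, cache):
    """Encode one column for all rows, then the next; one dictionary at a time."""
    encoded = [[None] * len(columns) for _ in rows]
    for i, col in enumerate(columns):
        dictionary = cache.get(col)      # fetched once per column
        for r, row in enumerate(rows):
            encoded[r][i] = dictionary[row[i]]
    return encoded


# Hypothetical sample data: three dictionary-encoded columns, cache holds two.
columns = ["user_id", "item_id", "device_id"]
dictionaries = {
    "user_id": {"u1": 0, "u2": 1},
    "item_id": {"i1": 0, "i2": 1},
    "device_id": {"d1": 0, "d2": 1},
}
rows = [("u1", "i2", "d1"), ("u2", "i1", "d2"), ("u1", "i1", "d1")]

row_cache = DictionaryCache(dictionaries, capacity=2)
col_cache = DictionaryCache(dictionaries, capacity=2)
by_row = encode_row_by_row(rows, columns, row_cache)
by_col = encode_column_by_column(rows, columns, col_cache)

print(by_row == by_col)                  # True: same encoded output
print(row_cache.loads, col_cache.loads)  # 9 3
```

With three dictionaries and room for only two, the row-by-row pass misses the 
cache on every lookup (9 loads for 3 rows), while the column-by-column pass 
loads each dictionary exactly once.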

We will refine the implementation based on Kylin 3.x and share it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
