[jira] [Updated] (KYLIN-4941) Support encoding raw data to base cuboid column-by-column

Yaqian Zhang (Jira) Wed, 03 Nov 2021 01:02:41 -0700


     [ 
https://issues.apache.org/jira/browse/KYLIN-4941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yaqian Zhang updated KYLIN-4941:
--------------------------------
    Fix Version/s:     (was: v3.1.3)
                   Future

> Support encoding raw data to base cuboid column-by-column
> ---------------------------------------------------------
>
>                 Key: KYLIN-4941
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4941
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v3.1.1
>            Reporter: Shengjun Zheng
>            Assignee: Shengjun Zheng
>            Priority: Major
>             Fix For: Future
>
>
> When building with the Spark engine, the first step is to encode the Hive 
> table's rows into base cuboid data.
> The existing implementation encodes row by row. If the cube has several 
> dictionary-encoded measures, all of their dictionaries must be in use at the 
> same time to encode a single row. This causes heavy memory usage and a low 
> hit ratio in the dictionary cache.
> We optimized this case by encoding column by column, which brought a 
> significant improvement for cubes with several high-cardinality 
> dictionary-encoded measures.
> We will refine the implementation based on Kylin 3.x and share it.
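
The difference between the two encoding orders can be illustrated with a toy model. The sketch below is not Kylin code: `DictionaryCache`, `encode_row_by_row`, `encode_column_by_column`, the column names, and the cache capacity of one dictionary are all hypothetical, chosen only to show why a row-wise pass can keep reloading dictionaries when the cache cannot hold all of them, while a column-wise pass touches each dictionary once.

```python
from collections import OrderedDict


class DictionaryCache:
    """Hypothetical LRU cache that counts how often a dictionary is (re)loaded."""

    def __init__(self, dictionaries, capacity):
        self._source = dictionaries      # stands in for dictionaries kept on disk
        self._capacity = capacity        # how many dictionaries fit in memory
        self._cache = OrderedDict()
        self.loads = 0

    def get(self, column):
        if column in self._cache:
            self._cache.move_to_end(column)      # cache hit: mark as recently used
        else:
            self.loads += 1                      # cache miss: simulate a load
            self._cache[column] = self._source[column]
            if len(self._cache) > self._capacity:
                self._cache.popitem(last=False)  # evict the least recently used
        return self._cache[column]


def encode_row_by_row(rows, columns, cache):
    """Encode each row in turn; every row needs every dictionary."""
    return [
        [cache.get(col)[row[i]] for i, col in enumerate(columns)]
        for row in rows
    ]


def encode_column_by_column(rows, columns, cache):
    """Encode one column for all rows, then move on to the next column."""
    encoded_columns = []
    for i, col in enumerate(columns):
        dictionary = cache.get(col)              # fetched once per column
        encoded_columns.append([dictionary[row[i]] for row in rows])
    # Transpose the per-column results back into rows.
    return [list(r) for r in zip(*encoded_columns)]


# Made-up sample data: two dictionary-encoded columns, three rows.
columns = ["user_id", "device_id"]
dictionaries = {
    "user_id": {"u1": 0, "u2": 1},
    "device_id": {"d1": 0, "d2": 1},
}
rows = [("u1", "d1"), ("u2", "d2"), ("u1", "d2")]

# Assume memory only has room for one dictionary at a time.
row_cache = DictionaryCache(dictionaries, capacity=1)
col_cache = DictionaryCache(dictionaries, capacity=1)

row_result = encode_row_by_row(rows, columns, row_cache)
col_result = encode_column_by_column(rows, columns, col_cache)

print(row_result == col_result)          # both orders yield the same encoding
print(row_cache.loads, col_cache.loads)  # dictionary loads for each order
```

In this toy setup both orders produce identical encoded rows, but the row-wise pass reloads a dictionary for every cell while the column-wise pass loads each dictionary once. The actual gain in Kylin depends on dictionary sizes, cache configuration, and how the Spark job partitions the data.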



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
