[ https://issues.apache.org/jira/browse/KYLIN-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yerui Sun updated KYLIN-1323:
-----------------------------
    Attachment: KYLIN-1323-1.x-staging.2.patch

Updated the patch for the 1.x-staging branch.
It now uses the config 'kylin.hbase.hfile.size.gb' instead of
'kylin.hbase.hfile.per.region', which is clearer to users.
I also added a test in BuildCubeWithEngineTest to cover this feature in CI. The
trick is that when the system property 'useSandbox' is set to true, the cuboid
data is scaled from 1KB to 1MB to simulate regions larger than 1GB.
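As a rough illustration of what the new setting controls, here is a minimal sketch. It assumes that 'kylin.hbase.hfile.size.gb' is the target size of each hfile, so the number of hfiles (and of 'convert to hfile' reducers) is the total cuboid size divided by that value, while the region count still follows from the per-region size. The class and method names are hypothetical and are not Kylin's actual code.

```java
// Hypothetical sketch, not Kylin code: derive region and hfile counts
// from a per-region size and an assumed target hfile size.
public class HFileSplitPlan {

    /** Number of regions, assuming each region holds at most regionSizeGB. */
    static int regionCount(double totalGB, double regionSizeGB) {
        return Math.max(1, (int) Math.ceil(totalGB / regionSizeGB));
    }

    /** Number of hfiles (one per reducer), assuming each holds at most hfileSizeGB. */
    static int hfileCount(double totalGB, double hfileSizeGB) {
        return Math.max(1, (int) Math.ceil(totalGB / hfileSizeGB));
    }

    public static void main(String[] args) {
        double totalGB = 100;     // cuboid output size from the issue's example
        double regionSizeGB = 10; // per-region size from the issue's example
        double hfileSizeGB = 1;   // assumed value of kylin.hbase.hfile.size.gb
        System.out.println("regions: " + regionCount(totalGB, regionSizeGB));
        System.out.println("hfiles/reducers: " + hfileCount(totalGB, hfileSizeGB));
    }
}
```

With these example values the sketch yields 10 regions and 100 hfiles, matching the numbers in the issue description.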

> Improve performance of converting data to hfile
> -----------------------------------------------
>
>                 Key: KYLIN-1323
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1323
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.2
>            Reporter: Yerui Sun
>            Assignee: Yerui Sun
>             Fix For: v2.0, v1.3
>
>         Attachments: KYLIN-1323-1.x-staging.2.patch, 
> KYLIN-1323-1.x-staging.patch
>
>
> Suppose we get 100GB of data after cuboid building, with a setting of
> 10GB per region. Currently, 10 split keys are calculated, 10 regions are
> created, and 10 reducers are used in the ‘convert to hfile’ step.
> With this optimization, we could calculate 100 (or more) split keys, use all
> of them in the ‘convert to hfile’ step, and sample 10 of them to create regions.
> The result is still 10 regions, but 100 reducers are used in the ‘convert to
> hfile’ step. Accordingly, 100 hfiles are created and 10 files are loaded per
> region. That should be fine and shouldn't affect query performance
> dramatically.
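The key-sampling idea can be sketched as follows. The sketch assumes the split keys are sorted row-key boundaries, where k boundaries define k+1 key ranges: every tenth fine-grained boundary is kept as a region boundary, so each region receives ten hfiles. The class and method names are hypothetical, and integers stand in for HBase row keys. This is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the patch: pick region boundaries from hfile boundaries.
public class SplitKeySampler {

    /**
     * Keeps every hfilesPerRegion-th boundary (indices hfilesPerRegion-1,
     * 2*hfilesPerRegion-1, ...) so that each region covers hfilesPerRegion hfiles.
     * Input keys are assumed to be sorted.
     */
    static <T> List<T> sampleRegionSplits(List<T> hfileSplitKeys, int hfilesPerRegion) {
        if (hfilesPerRegion < 1) {
            throw new IllegalArgumentException("hfilesPerRegion must be >= 1");
        }
        List<T> regionSplitKeys = new ArrayList<>();
        for (int i = hfilesPerRegion - 1; i < hfileSplitKeys.size(); i += hfilesPerRegion) {
            regionSplitKeys.add(hfileSplitKeys.get(i));
        }
        return regionSplitKeys;
    }

    public static void main(String[] args) {
        // 99 boundaries define 100 key ranges, i.e. 100 hfiles / reducers.
        List<Integer> hfileSplitKeys = new ArrayList<>();
        for (int i = 0; i < 99; i++) {
            hfileSplitKeys.add(i);
        }
        List<Integer> regionSplitKeys = sampleRegionSplits(hfileSplitKeys, 10);
        System.out.println("reducers: " + (hfileSplitKeys.size() + 1));
        System.out.println("regions: " + (regionSplitKeys.size() + 1));
    }
}
```

Because the region boundaries are a subset of the hfile boundaries, every hfile falls entirely inside one region, which should let the bulk load assign each file to a single region without splitting it.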



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
