[ 
https://issues.apache.org/jira/browse/KYLIN-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733307#comment-14733307
 ] 

liyang commented on KYLIN-1007:
-------------------------------

Currently MR_V2 does not generate cuboid files; instead it generates HFiles 
directly. This saves one build step, but when it comes to merging, the data has 
to be loaded through the HBase scan API, which puts a heavy load on the HBase 
region servers.

Ideally, merge should load from HDFS directly without going through the HBase 
region servers. This can be done in two ways.

1. Use TableSnapshotInputFormat. However, this input format does not currently 
support multiple HTables.
2. Keep generating the cuboid files. This requires an additional step in the 
build process and takes extra storage, but it is the only workable way for the 
moment.

For a smooth transition, merge will check for the existence of cuboid files and 
use them when present, falling back to an HTable scan otherwise.
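
The check-then-fallback decision could look roughly like the sketch below. It 
is only an illustration, not the actual Kylin code: the class name 
MergeInputChooser, the Source enum and the choose() method are hypothetical, 
and it uses the local file system (java.nio.file) as a stand-in for HDFS so 
that it runs without a cluster. It also assumes that "cuboid files exist" means 
the segment's cuboid directory is present and non-empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kylin: picks the input source for a merge.
final class MergeInputChooser {

    // The two ways a merge can read the data of an existing segment.
    enum Source { CUBOID_FILES, HTABLE_SCAN }

    // Prefer cuboid files when the directory exists and holds at least one
    // entry; otherwise fall back to scanning the HTable.
    static Source choose(Path cuboidDir) throws IOException {
        if (cuboidDir != null && Files.isDirectory(cuboidDir)) {
            // Files.list returns a lazy stream that must be closed.
            try (Stream<Path> entries = Files.list(cuboidDir)) {
                if (entries.findAny().isPresent()) {
                    return Source.CUBOID_FILES;
                }
            }
        }
        return Source.HTABLE_SCAN;
    }

    public static void main(String[] args) throws IOException {
        // A segment built before this change: no cuboid files on disk.
        Path dir = Files.createTempDirectory("cuboid");
        System.out.println(choose(dir));

        // A segment built after this change: cuboid files are present.
        Files.createFile(dir.resolve("part-r-00000"));
        System.out.println(choose(dir));
    }
}
```

Making the choice per segment means segments built before and after the change 
can coexist in the same cube.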




> Engine MR_V2 also generates cuboid files and merge from it
> ----------------------------------------------------------
>
>                 Key: KYLIN-1007
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1007
>             Project: Kylin
>          Issue Type: Sub-task
>          Components: General
>    Affects Versions: v2.0
>            Reporter: liyang
>            Assignee: liyang
>             Fix For: v2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
