[ 
https://issues.apache.org/jira/browse/KYLIN-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15114871#comment-15114871
 ] 

Zhong Yanghong commented on KYLIN-1082:
---------------------------------------

In my implementation, there are two ways to distribute the dependent jars to 
the datanodes. One is to append the related jars to the property 
"kylin.job.mr.lib.dir"; the other is to set the property "kylin.job.mr.lib.dir" 
and copy the related jars into the specified directory. Hive is supposed to be 
installed on every machine running Kylin, so a shell script called 
"find-hive-dependency.sh" finds the Hive dependencies and sets the property 
"kylin.hive.dependency". To avoid uploading jars that Kylin jobs will not use, 
a filter inside "AbstractHadoopJob.java" keeps only the jars that are actually 
needed. 
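
To illustrate the filtering step, here is a minimal sketch of how such a 
filter could look. It is not the code from the patch: the class name, the 
method name, the jar-name pattern, and the assumption that 
"kylin.hive.dependency" holds a colon-separated list of paths are all 
hypothetical and only serve the example. 

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; NOT the actual filter in AbstractHadoopJob.java.
final class HiveJarFilterSketch {

    // Assumed pattern: keep only jars whose name looks like a Hive
    // metastore, exec, or hcatalog-core jar. The real filter may differ.
    private static final Pattern NEEDED_JAR = Pattern.compile(
            ".*hive-(metastore|exec|hcatalog-core)[0-9A-Za-z.\\-]*\\.jar");

    // Takes a colon-separated, classpath-style string (the assumed format
    // of "kylin.hive.dependency") and returns only the matching jar paths.
    static List<String> filterJars(String hiveDependency) {
        List<String> kept = new ArrayList<>();
        if (hiveDependency == null) {
            return kept;
        }
        for (String entry : hiveDependency.split(":")) {
            String path = entry.trim();
            if (!path.isEmpty() && NEEDED_JAR.matcher(path).matches()) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example input: a conf directory, one Hive jar, one unrelated jar.
        String dep = "/usr/lib/hive/conf"
                + ":/usr/lib/hive/lib/hive-exec-1.2.1.jar"
                + ":/usr/lib/hive/lib/guava-14.0.1.jar";
        System.out.println(filterJars(dep));
    }
}
```

In this sketch, directories and unrelated jars are dropped, so only the jars 
matching the assumed pattern would be shipped with the MR job. 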

> Hive dependencies should be add to tmpjars
> ------------------------------------------
>
>                 Key: KYLIN-1082
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1082
>             Project: Kylin
>          Issue Type: Bug
>          Components: Environment , Job Engine
>            Reporter: liyang
>            Assignee: Zhong Yanghong
>              Labels: newbie
>             Fix For: v2.1, v1.3
>
>         Attachments: auto_hive_tmpjars_1_x_staging.patch, 
> auto_hive_tmpjars_2_x_staging.patch
>
>
> Currently Kylin assumes all data nodes have Hive deployed at exactly the same 
> FS location. However, a better position is to treat Hive as a client-side app. 
> Then we need to ship the Hive jars with the MR job every time.
> This makes deploying Kylin a lot easier in clusters that do not have Hive on 
> all data nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
