[
https://issues.apache.org/jira/browse/KYLIN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739061#comment-14739061
]
fengYu commented on KYLIN-1021:
-------------------------------
First of all, I do not know what the general solution to this problem is;
maybe I should ask our hadoop administrator tomorrow.
However, I still think automation is always the better choice, for these
reasons:
1、If you want to deploy two Kylin environments on different servers (with
different metadata tables) that depend on the same hadoop cluster, but
hadoop/hbase/hive is not installed at the same location on each server, then
with your approach I need to copy all the jars to every hadoop node twice.
This means the location of the jars on the hadoop nodes restricts where I can
deploy the hadoop/hbase/hive clients.
2、As mentioned by [[email protected]], if the jar files of Kylin and its
dependencies vary from version to version, we would have to update the jars on
every hadoop node (for example, when I want to upgrade my hive version). But
if we upload those jars only when we need them, it will be much easier.
3、At first, I uploaded all jar files and added a parameter in
kylin.properties pointing to that directory, just as [~Shaofengshi] suggested.
But I did not know which jars Kylin needs and which it does not until I
checked the source code: Kylin only sets the hive and hbase dependencies on
the classpath, and uses job.jar (a combined jar made up of kylin-job.jar and
its dependency jars) as the job jar. So I changed my implementation: since the
hive/hbase dependencies are only known after Kylin starts, I upload them to
HDFS the first time they are needed, so we know exactly what we upload.
I think tmpjars is the better choice, just as this blog post describes:
http://dongxicheng.org/mapreduce/run-hadoop-job-problems/
I will upload my patch tomorrow; please review it then.
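The approach described above can be sketched roughly as follows. This is a hypothetical illustration, not Kylin's actual patch: the class name, method name, and directory parameters are assumptions. It uploads the local dependency jars to an HDFS directory only when that directory is empty (i.e. on first use), then sets their HDFS paths as the comma-separated "tmpjars" property, which MapReduce reads to ship jars to task nodes via the distributed cache.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the mechanism described in this comment.
public class TmpJarsUploader {

    public static void setTmpJars(Configuration conf, String localJarDir,
                                  String hdfsJarDir) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path remoteDir = new Path(hdfsJarDir);

        // Upload only on first use, i.e. when the HDFS directory is
        // missing or empty.
        if (!fs.exists(remoteDir) || fs.listStatus(remoteDir).length == 0) {
            fs.mkdirs(remoteDir);
            for (File jar : new File(localJarDir).listFiles()) {
                if (jar.getName().endsWith(".jar")) {
                    fs.copyFromLocalFile(new Path(jar.getAbsolutePath()),
                                         new Path(remoteDir, jar.getName()));
                }
            }
        }

        // Collect the HDFS locations and set them as "tmpjars", the
        // comma-separated list MapReduce uses for distributed-cache jars
        // (the same property -libjars populates).
        List<String> jarPaths = new ArrayList<>();
        for (FileStatus status : fs.listStatus(remoteDir)) {
            jarPaths.add(status.getPath().toString());
        }
        conf.set("tmpjars", String.join(",", jarPaths));
    }
}
```

This mirrors how Kylin already sets tmpfiles before submitting a job, so no jars need to pre-exist on the cluster nodes.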
> upload dependent jars of kylin to HDFS and set tmpjars
> ------------------------------------------------------
>
> Key: KYLIN-1021
> URL: https://issues.apache.org/jira/browse/KYLIN-1021
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.0
> Reporter: fengYu
>
> As [~Shaofengshi] says on the mailing list: Regarding your question about
> the jar files being located on local disk instead of HDFS, yes, the
> hadoop/hive/hbase jars should exist on local disk on each machine of the
> hadoop cluster, in the same locations; Kylin will not upload those jars.
> Please check and ensure the consistency of your hadoop cluster.
> However, our hadoop cluster is managed by a hadoop administrator, and we
> have no right to log in to those machines. Even if we had the right,
> copying all the files to hundreds of machines would be a painful job (I do
> not know whether any tools do this well).
> By the way, I cannot find any documentation about your approach (if you
> have a document, please point me to it)...
> I changed my source code to create a directory under the Kylin working
> directory (kylin.hdfs.working.dir/kylin_metadata) and upload all jars to
> that directory if it is empty (this only happens the first time) when
> submitting a mapreduce job, then set those locations as the tmpjars of the
> mapreduce job (just like Kylin sets tmpfiles before submitting a job).
> This is automated and makes deploying Kylin easier.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)