[
https://issues.apache.org/jira/browse/KYLIN-1021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739061#comment-14739061
]
fengYu commented on KYLIN-1021:
-------------------------------
First of all, I do not know what the general solution to this problem is;
maybe I should ask our hadoop administrator tomorrow.
However, I still think automation is always the better choice, for these
reasons:
1、If you want to deploy two Kylin environments on different servers (with
different metadata tables) that depend on the same hadoop cluster, but
hadoop/hbase/hive is not installed at the same location on each server, then
with your approach I need to copy all the jars to every hadoop node twice.
This means the location of the jars on the hadoop nodes restricts where I can
deploy the hadoop/hbase/hive clients.
2、As mentioned by [[email protected]], if the jar files of Kylin and its
dependencies vary from version to version, we would have to update the jars on
every hadoop node (for example, when I want to upgrade my hive version). But
if we upload those jars only when we need them, it will be much easier.
3、At first, I uploaded all jar files and added a parameter in
kylin.properties pointing to that directory, just as [~Shaofengshi] suggested.
But I did not know which jars Kylin needs and which it does not until I
checked the source code: Kylin only sets the hive and hbase dependencies on
the classpath, and uses job.jar (a combined jar made up of kylin-job.jar and
its dependency jars) as the job jar. So I changed my implementation: since the
hive/hbase dependencies are only known after Kylin starts, I upload them to
HDFS the first time they are needed, so we know exactly what we upload.
I think tmpjars is the better choice, just as this blog post describes:
http://dongxicheng.org/mapreduce/run-hadoop-job-problems/
I will upload my patch tomorrow; please review it then.
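The approach described above can be sketched roughly as follows. This is a hypothetical illustration, not Kylin's actual patch: the class name, method name, and directory parameters are assumptions. It uploads the local dependency jars to an HDFS directory only when that directory is empty (i.e. on first use), then sets their HDFS paths as the comma-separated "tmpjars" property, which MapReduce reads to ship jars to task nodes via the distributed cache.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the mechanism described in this comment.
public class TmpJarsUploader {

    public static void setTmpJars(Configuration conf, String localJarDir,
                                  String hdfsJarDir) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        Path remoteDir = new Path(hdfsJarDir);

        // Upload only on first use, i.e. when the HDFS directory is
        // missing or empty.
        if (!fs.exists(remoteDir) || fs.listStatus(remoteDir).length == 0) {
            fs.mkdirs(remoteDir);
            for (File jar : new File(localJarDir).listFiles()) {
                if (jar.getName().endsWith(".jar")) {
                    fs.copyFromLocalFile(new Path(jar.getAbsolutePath()),
                                         new Path(remoteDir, jar.getName()));
                }
            }
        }

        // Collect the HDFS locations and set them as "tmpjars", the
        // comma-separated list MapReduce uses for distributed-cache jars
        // (the same property -libjars populates).
        List<String> jarPaths = new ArrayList<>();
        for (FileStatus status : fs.listStatus(remoteDir)) {
            jarPaths.add(status.getPath().toString());
        }
        conf.set("tmpjars", String.join(",", jarPaths));
    }
}
```

This mirrors how Kylin already sets tmpfiles before submitting a job, so no jars need to pre-exist on the cluster nodes.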
> upload dependent jars of kylin to HDFS and set tmpjars
> ------------------------------------------------------
>
> Key: KYLIN-1021
> URL: https://issues.apache.org/jira/browse/KYLIN-1021
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.0
> Reporter: fengYu
>
> As [~Shaofengshi] says on the mailing list: Regarding your question about
> the jar files being located on local disk instead of HDFS, yes, the
> hadoop/hive/hbase jars should exist on local disk on each machine of the
> hadoop cluster, in the same locations; Kylin will not upload those jars.
> Please check and ensure the consistency of your hadoop cluster.
> However, our hadoop cluster is managed by a hadoop administrator, and we
> have no right to log in to those machines. Even if we had the right,
> copying all the files to hundreds of machines would be a painful job (I do
> not know whether any tools do this well).
> By the way, I cannot find any documentation about your approach (if you
> have a document, please point me to it)...
> I changed my source code to create a directory under the Kylin working
> directory (kylin.hdfs.working.dir/kylin_metadata) and upload all jars to
> that directory if it is empty (this only happens the first time) when
> submitting a mapreduce job, then set those locations as the tmpjars of the
> mapreduce job (just like Kylin sets tmpfiles before submitting a job).
> This is automated and makes deploying Kylin easier.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)