[ 
https://issues.apache.org/jira/browse/SAMZA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Fang updated SAMZA-307:
---------------------------

    Description: 
Currently, we have two ways of deploying a Samza job to a YARN cluster: from 
[HDFS|https://samza.incubator.apache.org/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html]
 or over [HTTP|https://samza.incubator.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html],
 but neither of them works out of the box. Users have to go through the 
tutorial, add dependencies, recompile, publish the job package to HDFS or an 
HTTP server, and only then run the job. This feels a little cumbersome. We may 
be able to provide a simpler way to deploy jobs.

When users have YARN and HDFS in the same cluster (such as CDH5), we can 
provide a job-submit script which does the following:
1. take the cluster configuration
2. call some Java code to upload the assembly (all the jars Samza needs, 
already compiled) and the user's job jar (which changes frequently) to HDFS
3. run the job as usual. 
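
The steps above could be sketched roughly as a shell script like this. The 
script name, paths, and defaults are my assumptions, not an existing Samza 
tool; only `yarn.package.path` and `run-job.sh` come from Samza itself. 
DRY_RUN=1 (the default here) prints the commands instead of running them, 
for trying it out where no HDFS is available:

```shell
#!/usr/bin/env bash
# Hypothetical job-submit sketch. Set DRY_RUN=0 to actually run the commands.
set -e

CLUSTER_CONF=${1:-/etc/hadoop/conf}            # 1. cluster configuration dir
JOB_PROPERTIES=${2:-config/my-job.properties}  # the job's .properties file
HDFS_DIR=hdfs:///samza/packages
ASSEMBLY=target/samza-assembly.tar.gz          # Samza jars, already compiled
JOB_JAR=target/my-job.jar                      # user code, changes frequently
DRY_RUN=${DRY_RUN:-1}

run() { if [ "$DRY_RUN" = 1 ]; then echo "$@"; else "$@"; fi; }

# Point the Hadoop client tools at the cluster configuration.
export HADOOP_CONF_DIR="$CLUSTER_CONF"

# 2. Upload the assembly and the user's job jar to HDFS.
run hdfs dfs -mkdir -p "$HDFS_DIR"
run hdfs dfs -put -f "$ASSEMBLY" "$JOB_JAR" "$HDFS_DIR"

# 3. Run the job as usual, with yarn.package.path pointing at the upload.
run bin/run-job.sh \
  --config-path="file://$PWD/$JOB_PROPERTIES" \
  --config yarn.package.path="$HDFS_DIR/$(basename "$ASSEMBLY")"
```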

Therefore, users only need to run one command *instead of*:
1. walking through the tutorial step by step for their first job
2. manually assembling and uploading everything to HDFS every time they change 
their job. 

(Yes, I learned this from [Spark's YARN 
deploy|http://spark.apache.org/docs/latest/running-on-yarn.html] and [their 
code|https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala].)

When users only have YARN, I think they have no option but to start an HTTP 
server as described in the tutorial. 
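
For that YARN-only case, a minimal sketch of serving the job package over HTTP 
(assuming Python 3 is available on the box holding the package; the port, 
directory, and filename here are arbitrary):

```shell
# Serve the directory containing the job package over HTTP. The job config
# would then use: yarn.package.path=http://<this-host>:8000/my-job.tar.gz
mkdir -p /tmp/samza-pkg
echo demo-package > /tmp/samza-pkg/my-job.tar.gz  # stand-in for the real tarball
( cd /tmp/samza-pkg && exec python3 -m http.server 8000 >/dev/null 2>&1 ) &
SERVER_PID=$!
sleep 1                                           # give the server a moment
BODY=$(curl -s http://localhost:8000/my-job.tar.gz)
kill "$SERVER_PID"
```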

What do you think? Does this simplification make sense, or would Samza run into 
difficulties (issues) if we deploy jobs this way? Thank you.

 

  was:
Currently, we have two ways of deploying a Samza job to a YARN cluster: from 
[HDFS|https://samza.incubator.apache.org/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html]
 or over [HTTP|https://samza.incubator.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html],
 but neither of them works out of the box. Users have to go through the 
tutorial, add dependencies, recompile, publish the job package to HDFS or an 
HTTP server, and only then run the job.

This feels a little cumbersome. We may be able to provide a simpler way to 
deploy jobs.

1. When users have YARN and HDFS in the same cluster (such as CDH5), we can 
provide a job-submit script which takes the cluster configuration, calls some 
Java code to upload the assembly (all the jars Samza needs, already compiled) 
along with the user's job jar (which changes frequently) to HDFS, and then runs 
the job as usual. (Yes, I learned this from [Spark's YARN 
deploy|http://spark.apache.org/docs/latest/running-on-yarn.html])

2.  
 

 


> Simplify YARN deploy procedure 
> -------------------------------
>
>                 Key: SAMZA-307
>                 URL: https://issues.apache.org/jira/browse/SAMZA-307
>             Project: Samza
>          Issue Type: Improvement
>            Reporter: Yan Fang
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
