[
https://issues.apache.org/jira/browse/SAMZA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057072#comment-14057072
]
Chris Riccomini commented on SAMZA-307:
---------------------------------------
I'd say (1) should remain independent of any wrapper script. If we want to have
a single script that runs (2) and then (3), I'm fine with that, provided that
the run-job.sh script sticks around.
It sounds like the script would default to the user's home directory if no
directory is specified, upload the package with `hadoop fs -put`, and then call
run-job.sh with the resulting path.
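A minimal sketch of such a wrapper, for illustration only. The function names,
the DRY_RUN switch, and the /user/<name> home directory layout are assumptions,
not part of Samza. It also assumes that run-job.sh accepts a `--config key=value`
override for yarn.package.path and that `hdfs:///path` resolves against the
cluster's default namenode.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: upload a job package to HDFS, then call run-job.sh.

# Run a command, or only print it when DRY_RUN=1 (handy for checking the paths).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Default target: the user's HDFS home directory (assumed to be /user/<name>).
default_hdfs_dir() {
  echo "/user/${USER:-$(id -un)}"
}

# Print the HDFS destination for a package; falls back to the default directory
# when no directory is given.
hdfs_package_path() (
  dir="${2:-$(default_hdfs_dir)}"
  echo "${dir%/}/$(basename "$1")"
)

# Usage: deploy_job <package.tar.gz> <job.properties> [hdfs-dir]
deploy_job() (
  package="$1"
  config="$2"
  dest="$(hdfs_package_path "$package" "${3:-}")"
  # Upload the package, overwriting any earlier copy.
  run hadoop fs -put -f "$package" "$dest" || exit 1
  # Hand the uploaded path to the existing run-job.sh (assumed override flag).
  run bin/run-job.sh \
    --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
    --config-path="file://$config" \
    --config "yarn.package.path=hdfs://$dest"
)
```

Running `DRY_RUN=1 deploy_job /tmp/job.tar.gz /tmp/job.properties` prints the two
commands without touching the cluster, and run-job.sh stays usable on its own.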
> Simplify YARN deploy procedure
> -------------------------------
>
> Key: SAMZA-307
> URL: https://issues.apache.org/jira/browse/SAMZA-307
> Project: Samza
> Issue Type: Improvement
> Reporter: Yan Fang
>
> Currently, we have two ways of deploying a Samza job to a YARN cluster: from
> [HDFS|https://samza.incubator.apache.org/learn/tutorials/0.7.0/deploy-samza-job-from-hdfs.html]
> and from
> [HTTP|https://samza.incubator.apache.org/learn/tutorials/0.7.0/run-in-multi-node-yarn.html].
> Neither works out of the box: users have to go through the tutorial,
> add dependencies, recompile, put the job package on HDFS or an HTTP server, and
> then finally run the job. This can be cumbersome. We may be able to
> provide a simpler way to deploy the job.
> When users have YARN and HDFS in the same cluster (such as CDH5), we can
> provide a job-submit script that does the following:
> 1. takes the cluster configuration
> 2. calls some Java code to upload the assembly (all the jars Samza needs, which
> are already compiled) and the user's job jar (which changes frequently) to HDFS
> 3. runs the job as usual.
> Users would then only need to run a single command *instead of*:
> 1. going through the tutorial step by step for their first job
> 2. assembling all the code and uploading it to HDFS manually every time they
> change their job.
> (Yes, I learnt it from [Spark's YARN
> deploy|http://spark.apache.org/docs/latest/running-on-yarn.html] and [their
> code|https://github.com/apache/spark/blob/master/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala].)
> When users only have YARN, I think they have no choice but to start the HTTP
> server as described in the tutorial.
> What do you think? Does the simplification make sense? Or would Samza run into
> difficulties if we deploy this way? Thank you.
>