db tsai,
if in yarn-cluster mode the driver runs inside yarn, how can you do an
rdd.collect and bring the results back to your application?
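
for instance, in yarn-cluster mode something like this runs inside the yarn
application master, so the collected array lives in that remote JVM, not in
the JVM that submitted the job (a minimal sketch; the app name is a
placeholder):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("inside-the-AM"))
  // collect() returns to *this* driver, which in yarn-cluster mode is a
  // thread in the yarn application master on the cluster:
  val results: Array[Int] = sc.parallelize(1 to 100).collect()
  sc.stop()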


On Thu, Jun 19, 2014 at 2:33 PM, DB Tsai <dbt...@stanford.edu> wrote:

> We are submitting the spark job in our tomcat application using
> yarn-cluster mode with great success. As Kevin said, yarn-client mode
> runs the driver in your local JVM, and it can have really bad network
> overhead when you do a reduce action, which pulls all the results from
> the executors to your local JVM. Also, since you can only have one
> SparkContext object per JVM, it is tricky to run multiple spark jobs
> concurrently in yarn-client mode.
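>
> For illustration, a minimal sketch of that overhead (the app name is a
> placeholder):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   // In yarn-client mode this driver JVM is local to your application;
>   // collect() streams every partition over the network back to it.
>   val sc = new SparkContext(
>     new SparkConf().setMaster("yarn-client").setAppName("collect-overhead"))
>   val results: Array[Int] = sc.parallelize(1 to 1000000).collect()
>   sc.stop()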
>
> This is how we submit a spark job in yarn-cluster mode. Please use the
> current master code; otherwise, after the job finishes, spark will
> kill the JVM and exit your app.
>
> We set up the configuration of spark in a scala map.
>
>   // Build the argument array that the yarn Client expects from the conf map.
>   def getArgsFromConf(conf: Map[String, String]): Array[String] = {
>     Array[String](
>       "--jar", conf.get("app.jar").getOrElse(""),
>       "--addJars", conf.get("spark.addJars").getOrElse(""),
>       "--class", conf.get("spark.mainClass").getOrElse(""),
>       "--num-executors", conf.get("spark.numWorkers").getOrElse("1"),
>       "--driver-memory", conf.get("spark.masterMemory").getOrElse("1g"),
>       "--executor-memory", conf.get("spark.workerMemory").getOrElse("1g"),
>       "--executor-cores", conf.get("spark.workerCores").getOrElse("1"))
>   }
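>
> For example, the map we pass in might look like this (the values here
> are placeholders):
>
>   val conf = Map(
>     "app.jar"            -> "hdfs:///apps/myapp/myapp-assembly.jar",
>     "spark.addJars"      -> "",
>     "spark.mainClass"    -> "com.example.MySparkJob",
>     "spark.numWorkers"   -> "4",
>     "spark.masterMemory" -> "2g",
>     "spark.workerMemory" -> "2g",
>     "spark.workerCores"  -> "2")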
>
>       import org.apache.spark.SparkConf
>       import org.apache.spark.deploy.yarn.{Client, ClientArguments}
>
>       // hadoopConfig is our org.apache.hadoop.conf.Configuration,
>       // built elsewhere.
>       System.setProperty("SPARK_YARN_MODE", "true")
>       val sparkConf = new SparkConf
>       val args = getArgsFromConf(conf)
>       new Client(new ClientArguments(args, sparkConf), hadoopConfig, sparkConf).run()
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Thu, Jun 19, 2014 at 11:22 AM, Kevin Markey <kevin.mar...@oracle.com>
> wrote:
> > Yarn client mode is much like Spark client mode, except that the
> > executors run in Yarn containers managed by the Yarn resource manager
> > on the cluster instead of as Spark workers managed by the Spark master.
> > The driver executes as a local client in your local JVM. It
> > communicates with the workers on the cluster. Transformations are
> > scheduled on the cluster by the driver's logic. Actions involve
> > communication between the local driver and the remote cluster
> > executors, so there is some additional network overhead, especially if
> > the driver is not co-located on the cluster. In yarn-cluster mode, in
> > contrast, the driver executes as a thread in a Yarn application master
> > on the cluster.
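> >
> > For example, launching yarn-client mode from your own application only
> > needs the master URL (a minimal sketch; the app name is a placeholder):
> >
> >   import org.apache.spark.{SparkConf, SparkContext}
> >
> >   // The SparkContext, and hence the driver, lives in this JVM; only
> >   // the executors run in Yarn containers on the cluster.
> >   val sc = new SparkContext(
> >     new SparkConf().setMaster("yarn-client").setAppName("my-app"))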
> >
> > In either case, the assembly JAR must be available to the application
> > on the cluster. It is best to copy it to HDFS and specify its location
> > by exporting it as SPARK_JAR.
> >
> > Kevin Markey
> >
> >
> > On 06/19/2014 11:22 AM, Koert Kuipers wrote:
> >
> > i am trying to understand how yarn-client mode works. i am not using
> > spark-submit, but instead launching a spark job from within my own
> > application.
> >
> > i can see my application contacting yarn successfully, but then in yarn i
> > get an immediate error:
> >
> > Application application_1403117970283_0014 failed 2 times due to AM
> > Container for appattempt_1403117970283_0014_000002 exited with exitCode:
> > -1000 due to: File file:/home/koert/test-assembly-0.1-SNAPSHOT.jar does
> > not exist
> > .Failing this attempt.. Failing the application.
> >
> > why is yarn trying to fetch my jar, and why as a local file? i would
> > expect the jar to be sent to yarn over the wire upon job submission?
> >
>
