Spark doesn't currently offer you anything special to do this.  I.e. if you
want to write a Spark application that fires off jobs on behalf of remote
processes, you would need to implement the communication between those
remote processes and your Spark application code yourself.


On Wed, Jul 9, 2014 at 10:41 AM, John Omernik <j...@omernik.com> wrote:

> Thank you for the link.  In that link the following is written:
>
> For those familiar with the Spark API, an application corresponds to an
> instance of the SparkContext class. An application can be used for a
> single batch job, an interactive session with multiple jobs spaced apart,
> or a long-lived server continually satisfying requests
>
> So, if I wanted to use "a long-lived server continually satisfying
> requests" and then start a shell that connected to that context, how would
> I do that in Yarn? That's the problem I am having right now: I just want
> there to be a long-lived service that I can utilize.
>
> Thanks!
>
>
> On Wed, Jul 9, 2014 at 11:14 AM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
>
>> To add to Ron's answer, this post explains what it means to run Spark
>> against a YARN cluster, the difference between yarn-client and yarn-cluster
>> mode, and the reason spark-shell only works in yarn-client mode.
>>
>> http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
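>>
>> As a quick illustration (exact flags depend on your Spark version), the
>> shell is launched in yarn-client mode with something like
>>
>>   ./bin/spark-shell --master yarn-client
>>
>> whereas a packaged application can be submitted in yarn-cluster mode via
>> spark-submit, as Ron describes below.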
>>
>> -Sandy
>>
>>
>> On Wed, Jul 9, 2014 at 9:09 AM, Ron Gonzalez <zlgonza...@yahoo.com>
>> wrote:
>>
>>> The idea behind YARN is that you can run different application types
>>> like MapReduce, Storm and Spark.
>>>
>>> I would recommend that you build your spark jobs in the main method
>>> without specifying how to deploy them. Then you can use spark-submit to
>>> tell Spark how you want to deploy them, using yarn-cluster as the master.
>>> The key point here is that once you have YARN set up, the spark client
>>> connects to it using the $HADOOP_CONF_DIR that contains the resource
>>> manager address. In particular, this needs to be accessible from the
>>> classpath of the submitter since it implicitly uses this when it
>>> instantiates a YarnConfiguration instance. If you want more details, read
>>> org.apache.spark.deploy.yarn.Client.scala.
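>>>
>>> To sketch what I mean (the class name, app name, jar name and paths below
>>> are just placeholders), the main method builds its SparkContext without
>>> calling setMaster:
>>>
>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>
>>>   object MyJob {
>>>     def main(args: Array[String]): Unit = {
>>>       // No setMaster() here; the master is supplied at submit time.
>>>       val sc = new SparkContext(new SparkConf().setAppName("my-job"))
>>>       println(sc.textFile(args(0)).count())  // stand-in for real work
>>>       sc.stop()
>>>     }
>>>   }
>>>
>>> and then you submit it with something along the lines of
>>>
>>>   HADOOP_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit \
>>>     --master yarn-cluster --class MyJob my-job.jar hdfs:///some/input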
>>>
>>> You should be able to download a standalone YARN cluster from any of the
>>> Hadoop providers like Cloudera or Hortonworks. Once you have that, the
>>> spark programming guide describes what I mention above in sufficient detail
>>> for you to proceed.
>>>
>>> Thanks,
>>> Ron
>>>
>>> Sent from my iPad
>>>
>>> > On Jul 9, 2014, at 8:31 AM, John Omernik <j...@omernik.com> wrote:
>>> >
>>> > I am trying to get my head around using Spark on Yarn from a cluster
>>> > perspective. I can start a Spark Shell in Yarn with no issues; it works
>>> > easily. This is done in yarn-client mode and it all works well.
>>> >
>>> > In multiple examples, I see instances where people have set up Spark
>>> > clusters in standalone mode, and then in the examples they "connect" to
>>> > that cluster in standalone mode. This is often done using the spark://
>>> > string for the connection.  Cool.
>>> > But what I don't understand is: how do I set up a Yarn instance that I
>>> > can "connect" to? I.e. I tried running Spark Shell in yarn-cluster mode
>>> > and it gave me an error telling me to use yarn-client.  I see information
>>> > on using spark-class or spark-submit.  But what I'd really like is an
>>> > instance I can connect a spark-shell to, and have the instance stay up.
>>> > I'd like to be able to run other things on that instance, etc. Is that
>>> > possible with Yarn? I know there may be long-running job challenges with
>>> > Yarn, but I am just testing. I am just curious whether I am looking at
>>> > something completely bonkers here, or just missing something simple.
>>> >
>>> > Thanks!
>>> >
>>> >
>>>
>>
>>
>
