Re: Spark on Yarn: Connecting to Existing Instance

2014-08-21 Thread Chris Fregly
Perhaps the author is referring to Spark Streaming applications? They're
examples of long-running applications.

The application/domain-level protocol still needs to be implemented
yourself, as Sandy pointed out.


Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
I am trying to get my head around using Spark on Yarn from the perspective
of a cluster. I can start a Spark Shell in Yarn with no issues. Works
easily. This is done in yarn-client mode and it all works well.

In multiple examples, I see instances where people have set up Spark
clusters in standalone mode, and then in the examples they connect to this
cluster in standalone mode. This is often done using the spark:// string
for the connection. Cool.
But what I don't understand is: how do I set up a Yarn instance that I can
connect to? I.e., I tried running Spark Shell in yarn-cluster mode and it
gave me an error, telling me to use yarn-client. I see information on
using spark-class or spark-submit. But what I'd really like is an instance
I can connect a spark-shell to, and have the instance stay up. I'd like to
be able to run other things on that instance, etc. Is that possible with
Yarn? I know there may be long-running job challenges with Yarn, but I am
just testing; I am just curious if I am looking at something completely
bonkers here, or just missing something simple.

Thanks!


Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread Ron Gonzalez
The idea behind YARN is that you can run different application types like 
MapReduce, Storm and Spark.

I would recommend that you build your Spark jobs in the main method without
specifying how you deploy them. Then you can use spark-submit to tell Spark how
you want to deploy them, using yarn-cluster as the master. The key point here
is that once you have YARN set up, the Spark client connects to it using the
$HADOOP_CONF_DIR that contains the resource manager address. In particular,
this needs to be accessible from the classpath of the submitter, since it is
used implicitly when a YarnConfiguration instance is created. If you want more
details, read org.apache.spark.deploy.yarn.Client.scala.
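A minimal sketch of that shape (the job body and paths are made up for
illustration):

    import org.apache.spark.{SparkConf, SparkContext}

    object MyJob {
      def main(args: Array[String]): Unit = {
        // no setMaster() call: spark-submit decides how this is deployed
        val sc = new SparkContext(new SparkConf().setAppName("MyJob"))
        sc.textFile(args(0))                // input path from the command line
          .flatMap(_.split(" "))
          .map((_, 1))
          .reduceByKey(_ + _)
          .saveAsTextFile(args(1))          // output path from the command line
        sc.stop()
      }
    }

    // Submitted with HADOOP_CONF_DIR pointing at the cluster config, e.g.:
    //   spark-submit --master yarn-cluster --class MyJob my-job.jar hdfs:///in hdfs:///out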

You should be able to download a standalone YARN cluster from any of the Hadoop
providers like Cloudera or Hortonworks. Once you have that, the Spark
programming guide describes what I mention above in sufficient detail for you
to proceed.

Thanks,
Ron

Sent from my iPad


Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread Sandy Ryza
To add to Ron's answer, this post explains what it means to run Spark
against a YARN cluster, the difference between yarn-client and yarn-cluster
mode, and the reason spark-shell only works in yarn-client mode.
http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/

-Sandy



Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
Thank you for the link.  In that link the following is written:

"For those familiar with the Spark API, an application corresponds to an
instance of the SparkContext class. An application can be used for a single
batch job, an interactive session with multiple jobs spaced apart, or a
long-lived server continually satisfying requests."
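As a concrete sketch of that correspondence: one SparkContext is one
application, and each action on it runs as a separate job inside that same
application (names and the input path below are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("MultiJobApp"))
    val data = sc.textFile("hdfs:///some/input").cache()
    val total    = data.count()                     // job 1
    val nonEmpty = data.filter(_.nonEmpty).count()  // job 2, same application
    // the application lives exactly as long as sc does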

So, if I wanted to use a "long-lived server continually satisfying
requests" and then start a shell that connected to that context, how would
I do that in Yarn? That's the problem I am having right now: I just want
there to be that long-lived service that I can utilize.

Thanks!



Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
So basically, I have Spark on Yarn running (spark-shell); how do I connect
to it with another tool I am trying to test, which expects a spark://IP:7077
URL? If that won't work with spark-shell or yarn-client mode, how do I set
up Spark on Yarn to be able to handle that?

Thanks!





Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread Sandy Ryza
Spark doesn't currently offer you anything special to do this. I.e., if you
want to write a Spark application that fires off jobs on behalf of remote
processes, you would need to implement the communication between those
remote processes and your Spark application code yourself.
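For illustration, a minimal sketch of that do-it-yourself communication, here
as a bare TCP loop (the port and the one-path-per-connection protocol are
invented for the example):

    import java.io.PrintWriter
    import java.net.ServerSocket
    import scala.io.Source
    import org.apache.spark.{SparkConf, SparkContext}

    object JobServer {
      def main(args: Array[String]): Unit = {
        // one long-lived application: the SparkContext outlives any single request
        val sc = new SparkContext(new SparkConf().setAppName("JobServer"))
        val server = new ServerSocket(9999)  // arbitrary port
        while (true) {
          val socket = server.accept()
          // invented protocol: the client sends one input path per connection
          val path = Source.fromInputStream(socket.getInputStream).getLines().next()
          val count = sc.textFile(path).count()  // one Spark job per request
          new PrintWriter(socket.getOutputStream, true).println("count=" + count)
          socket.close()
        }
      }
    }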



Re: Spark on Yarn: Connecting to Existing Instance

2014-07-09 Thread John Omernik
So how do I run the "long-lived server continually satisfying requests"
described in the Cloudera post? I am very confused by that at this point.

