Thanks. This is getting a bit confusing. I have these modes for using Spark:
1. Spark local. Everything runs on the same host --> --master local[n]. No need to start master and slaves; it uses resources as you submit the job.

2. Spark Standalone. Uses a simple cluster manager included with Spark that makes it easy to set up a cluster --> --master spark://<HOSTNAME>:7077. Can run on different hosts. Does not rely on Yarn; it looks after scheduling itself. Need to start master and slaves.

3. Spark on YARN. The doc says:

There are two deploy modes that can be used to launch Spark applications *on YARN*. *In cluster mode*, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. *In client mode*, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike Spark standalone <http://spark.apache.org/docs/latest/spark-standalone.html> and Mesos <http://spark.apache.org/docs/latest/running-on-mesos.html> modes, in which the master's address is specified in the --master parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the --master parameter is yarn.

So either we have --> --master yarn --deploy-mode cluster, OR --> --master yarn-client (i.e. --master yarn --deploy-mode client). A side-by-side sketch of these invocations is at the end of this thread.

So I am not sure running Spark on Yarn in either yarn-client or yarn-cluster mode is going to make much difference. It sounds like yarn-cluster supersedes yarn-client?

Any comments welcome.

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 7 June 2016 at 15:40, Sebastian Piu <sebastian....@gmail.com> wrote:

> If you run that job then the driver will ALWAYS run in the machine from
> where you are issuing the spark-submit command (e.g. some edge node with
> the clients installed), no matter where the resource manager is running.
>
> If you change yarn-client for yarn-cluster then your driver will start
> somewhere else in the cluster, as will the workers, and the spark-submit
> command will return before the program finishes.
>
> On Tue, 7 Jun 2016, 14:53 Jacek Laskowski, <ja...@japila.pl> wrote:
>
>> Hi,
>>
>> --master yarn-client is deprecated and you should use --master yarn
>> --deploy-mode client instead. There are two deploy modes: client
>> (default) and cluster. See
>> http://spark.apache.org/docs/latest/cluster-overview.html.
>>
>> Regards,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> On Tue, Jun 7, 2016 at 2:50 PM, Mich Talebzadeh
>> <mich.talebza...@gmail.com> wrote:
>> > ok, thanks.
>> >
>> > So I start SparkSubmit or a similar Spark app on the Yarn resource
>> > manager node.
>> >
>> > What you are stating is that Yarn may decide to start the driver
>> > program in another node as opposed to the resource manager node
>> >
>> > ${SPARK_HOME}/bin/spark-submit \
>> >     --driver-memory=4G \
>> >     --num-executors=5 \
>> >     --executor-memory=4G \
>> >     --master yarn-client \
>> >     --executor-cores=4 \
>> >
>> > due to lack of resources in the resource manager node? What is the
>> > likelihood of that? The resource manager node is the de facto master
>> > node, in all probability much more powerful than the other nodes. Also,
>> > the node running the resource manager is running one of the node
>> > managers as well.
>> > So in theory maybe, in practice maybe not?
>> >
>> > HTH
>> >
>> > Dr Mich Talebzadeh
>> >
>> > LinkedIn
>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> >
>> > http://talebzadehmich.wordpress.com
>> >
>> > On 7 June 2016 at 13:20, Sebastian Piu <sebastian....@gmail.com> wrote:
>> >>
>> >> What you are explaining is right for yarn-client mode, but the question
>> >> is about yarn-cluster, in which case the spark driver is also submitted
>> >> and run in one of the node managers.
>> >>
>> >> On Tue, 7 Jun 2016, 13:45 Mich Talebzadeh, <mich.talebza...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Can you elaborate on the above statement please?
>> >>>
>> >>> When you start Yarn you start the resource manager daemon only on the
>> >>> resource manager node:
>> >>>
>> >>> yarn-daemon.sh start resourcemanager
>> >>>
>> >>> Then you start the nodemanager daemons on all nodes:
>> >>>
>> >>> yarn-daemon.sh start nodemanager
>> >>>
>> >>> A Spark app has to start somewhere. That is SparkSubmit, and that is
>> >>> deterministic. I start SparkSubmit, which talks to the Yarn Resource
>> >>> Manager, which initialises and registers an Application Master. The
>> >>> crucial point is the Yarn Resource Manager, which is basically a
>> >>> resource scheduler. It optimizes for cluster resource utilization to
>> >>> keep all resources in use all the time. However, the resource manager
>> >>> itself is on the resource manager node.
>> >>>
>> >>> Now I always start my Spark app on the same node as the resource
>> >>> manager node and let Yarn take care of the rest.
>> >>>
>> >>> Thanks
>> >>>
>> >>> Dr Mich Talebzadeh
>> >>>
>> >>> LinkedIn
>> >>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> >>>
>> >>> http://talebzadehmich.wordpress.com
>> >>>
>> >>> On 7 June 2016 at 12:17, Jacek Laskowski <ja...@japila.pl> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> It's not possible. YARN uses CPU and memory for resource constraints
>> >>>> and places the AM on any node available. Same for executors (unless
>> >>>> data locality constrains the placement).
>> >>>>
>> >>>> Jacek
>> >>>>
>> >>>> On 6 Jun 2016 1:54 a.m., "Saiph Kappa" <saiph.ka...@gmail.com> wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> In yarn-cluster mode, is there any way to specify on which node I
>> >>>>> want the driver to run?
>> >>>>>
>> >>>>> Thanks.
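To make the modes discussed above concrete, here is a minimal sketch of the corresponding spark-submit invocations. The application jar (app.jar), main class (com.example.MyApp) and core counts are placeholders for illustration only, not anything taken from this thread:

# Local mode: driver and executors run in a single JVM on the submitting host
${SPARK_HOME}/bin/spark-submit \
  --master local[4] \
  --class com.example.MyApp \
  app.jar

# Standalone mode: the master's address is given explicitly on the command line
${SPARK_HOME}/bin/spark-submit \
  --master spark://<HOSTNAME>:7077 \
  --class com.example.MyApp \
  app.jar

# YARN client mode (replaces the deprecated --master yarn-client):
# the driver runs inside the spark-submit process on the submitting host
${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode client \
  --class com.example.MyApp \
  app.jar

# YARN cluster mode (replaces the deprecated --master yarn-cluster):
# the driver runs inside the YARN application master on a node chosen by
# YARN, and spark-submit can return before the application finishes
${SPARK_HOME}/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  app.jar

In both YARN variants the ResourceManager's address is not passed via --master at all; it is picked up from the Hadoop client configuration (HADOOP_CONF_DIR / YARN_CONF_DIR), which is why the value is simply yarn.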