ok thanks

so I start SparkSubmit or a similar Spark app on the Yarn resource manager
node.

What you are stating is that Yarn may decide to start the driver program on
another node, as opposed to the resource manager node

${SPARK_HOME}/bin/spark-submit \
    --driver-memory=4G \
    --num-executors=5 \
    --executor-memory=4G \
    --master yarn-client \
    --executor-cores=4 \
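For comparison, here is a minimal sketch of the yarn-cluster equivalent (the
application jar name is just a placeholder; --master yarn --deploy-mode
cluster is the newer spelling of the older yarn-cluster shorthand):

${SPARK_HOME}/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory=4G \
    --num-executors=5 \
    --executor-memory=4G \
    --executor-cores=4 \
    myapp.jar

In that mode the driver runs inside the Application Master on whichever node
manager Yarn picks, rather than on the node where spark-submit was invoked.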

Due to lack of resources on the resource manager node? What is the
likelihood of that? The resource manager node is the de facto master node,
in all probability much more powerful than the other nodes. Also, the node
running the resource manager is running one of the node managers as well.
So in theory maybe, but in practice maybe not?
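
One rough way to see where the driver actually landed (a sketch, assuming
the standard Yarn CLI is available; <application_id> is a placeholder) is:

yarn application -list
yarn application -status <application_id>

The application report from -status includes the AM host, which in
yarn-cluster mode is the node where the driver is running.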

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 7 June 2016 at 13:20, Sebastian Piu <sebastian....@gmail.com> wrote:

> What you are explaining is right for yarn-client mode, but the question is
> about yarn-cluster, in which case the Spark driver is also submitted and run
> on one of the node managers
>
>
> On Tue, 7 Jun 2016, 13:45 Mich Talebzadeh, <mich.talebza...@gmail.com>
> wrote:
>
>> can you elaborate on the above statement please.
>>
>> When you start Yarn, you start the resource manager daemon only on the
>> resource manager node:
>>
>> yarn-daemon.sh start resourcemanager
>>
>> Then you start node manager daemons on all nodes:
>>
>> yarn-daemon.sh start nodemanager
>>
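>> As a quick sanity check (a sketch, assuming the JDK's jps tool is installed
>> on each node), you can confirm which daemon runs where:
>>
>> jps
>>
>> On the resource manager node you would expect to see ResourceManager (and
>> NodeManager, if one runs there as well); on the worker nodes, just
>> NodeManager.
>>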
>> A Spark app has to start somewhere. That is SparkSubmit, and that is
>> deterministic. I start SparkSubmit, which talks to the Yarn Resource
>> Manager, which in turn initialises and registers an Application Master. The
>> crucial point is the Yarn Resource Manager, which is basically a resource
>> scheduler. It optimizes for cluster resource utilization, to keep all
>> resources in use all the time. However, the resource manager itself is on
>> the resource manager node.
>>
>> Now I always start my Spark app on the same node as the resource manager
>> and let Yarn take care of the rest.
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn:
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 7 June 2016 at 12:17, Jacek Laskowski <ja...@japila.pl> wrote:
>>
>>> Hi,
>>>
>>> It's not possible. YARN uses CPU and memory for resource constraints and
>>> places the AM on any available node. The same goes for executors (unless
>>> data locality constrains the placement).
>>>
>>> Jacek
>>> On 6 Jun 2016 1:54 a.m., "Saiph Kappa" <saiph.ka...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> In yarn-cluster mode, is there any way to specify on which node I want
>>>> the driver to run?
>>>>
>>>> Thanks.
>>>>
>>>
>>
