I run my Spark on YARN jobs as:

HADOOP_CONF_DIR=/etc/hadoop/conf/ /app/data/v606014/dist/bin/spark-submit \
  --master yarn --jars test-job.jar --executor-cores 4 --num-executors 10 \
  --executor-memory 16g --driver-memory 4g --class TestClass test.jar

spark-submit picks up HADOOP_CONF_DIR to find the YARN cluster and schedule
executors, and I get the number of executors I ask for (assuming other
MapReduce jobs aren't taking up the cluster)...
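
If you want to double-check what YARN actually granted, the stock YARN CLI is
enough; a quick sketch (the application ID is just a placeholder you copy out
of the list output):

# list running YARN applications and note the Spark app's ID
yarn application -list

# print that application's report (state, queue, tracking URL, ...)
yarn application -status <application_id>

The tracking URL in that report points at the Spark UI, and its Executors tab
shows exactly which nodes the executors landed on.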

Large, memory-intensive jobs like ALS still run into issues on YARN, but
simple jobs run fine...
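
For those memory-heavy cases, one knob that often matters is the off-heap
headroom YARN reserves on top of the executor heap. A rough sketch only
(spark.yarn.executor.memoryOverhead is in MB, and 2048 is just an illustrative
value, not something tuned for this thread):

# same submit as above, plus extra container headroom (value is illustrative)
HADOOP_CONF_DIR=/etc/hadoop/conf/ /app/data/v606014/dist/bin/spark-submit \
  --master yarn --jars test-job.jar --executor-cores 4 --num-executors 10 \
  --executor-memory 16g --driver-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --class TestClass test.jar

Without enough overhead, YARN will kill executors for exceeding the
container's physical memory limit, which is a common way memory-intensive
jobs fall over.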

Mine is also an internal CDH cluster...

On Tue, Nov 18, 2014 at 10:03 AM, Alan Prando <a...@scanboo.com.br> wrote:

> Hi Folks!
>
> I'm running Spark on YARN cluster installed with Cloudera Manager Express.
> The cluster has 1 master and 3 slaves, each machine with 32 cores and 64G
> RAM.
>
> My Spark job is working fine; however, it seems that only 2 of the 3 slaves
> are doing work (htop shows 2 slaves at 100% on all 32 cores, and 1 slave
> without any processing).
>
> I'm using this command:
> ./spark-submit --master yarn --num-executors 3 --executor-cores 32 \
>   --executor-memory 32g feature_extractor.py -r 390
>
> Additionally, Spark's log shows communication with only 2 slaves:
> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
> Actor[akka.tcp://sparkExecutor@ip-172-31-13-180.ec2.internal:33177/user/Executor#-113177469]
> with ID 1
> 14/11/18 17:19:38 INFO RackResolver: Resolved
> ip-172-31-13-180.ec2.internal to /default
> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
> Actor[akka.tcp://sparkExecutor@ip-172-31-13-179.ec2.internal:51859/user/Executor#-323896724]
> with ID 2
> 14/11/18 17:19:38 INFO RackResolver: Resolved
> ip-172-31-13-179.ec2.internal to /default
> 14/11/18 17:19:38 INFO BlockManagerMasterActor: Registering block manager
> ip-172-31-13-180.ec2.internal:50959 with 16.6 GB RAM
> 14/11/18 17:19:39 INFO BlockManagerMasterActor: Registering block manager
> ip-172-31-13-179.ec2.internal:53557 with 16.6 GB RAM
> 14/11/18 17:19:51 INFO YarnClientSchedulerBackend: SchedulerBackend is
> ready for scheduling beginning after waiting
> maxRegisteredResourcesWaitingTime: 30000(ms)
>
> Is there a configuration that would make a Spark job on the YARN cluster use
> all of the slaves?
>
> Thanks in advance! =]
>
> ---
> Regards
> Alan Vidotti Prando.
>
>
>
