Can you check in your RM's web UI how much of each resource YARN thinks you have available? You can also check that directly in the YARN configuration.
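As a rough sanity check, here is a back-of-envelope sketch of what the spark-submit command below asks YARN for. The overhead formula and ApplicationMaster size are assumptions (they vary by Spark version and configuration), not values read from your cluster:

```python
# Back-of-envelope math for the spark-submit request in this thread.
# Values marked ASSUMED are illustrative defaults, not read from any cluster.

num_executors = 3              # --num-executors 3
executor_mem_gb = 32.0         # --executor-memory 32g
overhead_gb = max(0.384, 0.07 * executor_mem_gb)  # ASSUMED per-executor overhead
am_mem_gb = 1.0                # ASSUMED ApplicationMaster container size

per_executor_gb = executor_mem_gb + overhead_gb
total_containers = num_executors + 1      # the executors plus the AM container
total_mem_gb = num_executors * per_executor_gb + am_mem_gb

print("containers requested:", total_containers)
print("memory per executor container: %.2f GB" % per_executor_gb)
print("total memory requested: %.2f GB" % total_mem_gb)
```

If the per-container request (here a bit over 34 GB) exceeds what the NodeManagers actually advertise, or the vcore request exceeds the per-node vcore limit, the scheduler will simply hold back one of the executors.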
Perhaps it's not configured to use all of the available resources. (If it was set up with Cloudera Manager, CM will reserve some room for daemons that need to run on each machine, so it won't tell YARN to make all 32 cores / 64 GB available for applications.)

Also remember that Spark needs to start "num executors + 1" containers when adding up all the needed resources. The extra container (the ApplicationMaster) generally requires fewer resources than the executors, but it still needs to be allocated by the RM.

On Tue, Nov 18, 2014 at 10:03 AM, Alan Prando <a...@scanboo.com.br> wrote:
> Hi Folks!
>
> I'm running Spark on a YARN cluster installed with Cloudera Manager
> Express. The cluster has 1 master and 3 slaves, each machine with 32
> cores and 64 GB RAM.
>
> My Spark job is working fine; however, it seems that only 2 of the 3
> slaves are working (htop shows 2 slaves working at 100% on 32 cores,
> and 1 slave without any processing).
>
> I'm using this command:
> ./spark-submit --master yarn --num-executors 3 --executor-cores 32 \
>     --executor-memory 32g feature_extractor.py -r 390
>
> Additionally, Spark's log shows communication with only 2 slaves:
> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
> Actor[akka.tcp://sparkExecutor@ip-172-31-13-180.ec2.internal:33177/user/Executor#-113177469]
> with ID 1
> 14/11/18 17:19:38 INFO RackResolver: Resolved ip-172-31-13-180.ec2.internal
> to /default
> 14/11/18 17:19:38 INFO YarnClientSchedulerBackend: Registered executor:
> Actor[akka.tcp://sparkExecutor@ip-172-31-13-179.ec2.internal:51859/user/Executor#-323896724]
> with ID 2
> 14/11/18 17:19:38 INFO RackResolver: Resolved ip-172-31-13-179.ec2.internal
> to /default
> 14/11/18 17:19:38 INFO BlockManagerMasterActor: Registering block manager
> ip-172-31-13-180.ec2.internal:50959 with 16.6 GB RAM
> 14/11/18 17:19:39 INFO BlockManagerMasterActor: Registering block manager
> ip-172-31-13-179.ec2.internal:53557 with 16.6 GB RAM
> 14/11/18 17:19:51 INFO YarnClientSchedulerBackend: SchedulerBackend is ready
> for scheduling beginning after waiting maxRegisteredResourcesWaitingTime:
> 30000(ms)
>
> Is there a configuration to run a Spark job on a YARN cluster using all
> slaves?
>
> Thanks in advance! =]
>
> ---
> Regards,
> Alan Vidotti Prando.

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org