[ https://issues.apache.org/jira/browse/SPARK-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340195#comment-14340195 ]
Thomas Graves commented on SPARK-6050:
--------------------------------------

Thanks for investigating this more. This happens because CPU scheduling isn't turned on on that Hadoop cluster. When it's not turned on, it defaults to 1 core. @mridulm, are you requesting 2 cores in your config? I didn't see it in your example SparkPi command, but going by the log line I assume you are:

15/02/27 06:37:33 INFO YarnAllocator: Will request 1 executor containers, each with 2 cores and 32870 MB memory including 2150 MB overhead

If that is the case, just remove the config or change it to 1 core. Other than that, I don't know that there is anything Spark can do, as it doesn't know how Hadoop is configured. The change that was made is that we now use more of the Hadoop AMRMClient, which adds the matching for the container requests. We just weren't checking that the cores matched before in Spark 1.2. We could theoretically look at the Hadoop configs, but that could get pretty hairy quickly, as there are different schedulers, and each scheduler handles memory/cores differently. Many clusters also don't have scheduler configs on gateway boxes.

One thing we should do is add better logging as to why the requests don't match, but the match routine is in the Hadoop AMRMClient code, so it's mostly a matter of us printing exactly what came back with the allocate response. I can also file a Hadoop JIRA to log information when it doesn't match. The other option would be to write our own match routine. Thoughts?

> Spark on YARN does not work when --executor-cores is specified
> --------------------------------------------------------------
>
>                 Key: SPARK-6050
>                 URL: https://issues.apache.org/jira/browse/SPARK-6050
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.3.0
>         Environment: 2.5 based YARN cluster.
>            Reporter: Mridul Muralidharan
>            Priority: Blocker
>
> There are multiple issues here (which I will detail as comments), but to
> reproduce: running the following ALWAYS hangs in our cluster with the 1.3 RC
>
> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-cores 8 --num-executors 15 --driver-memory 4g --executor-memory 2g --queue webmap lib/spark-examples*.jar 10

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
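To make the failure mode above concrete, here is a small illustrative Python sketch (not Spark's or Hadoop's actual code; the class and function names are hypothetical) of the matching behavior described in the comment: AMRMClient-style matching only hands an allocated container back to the caller if its capability equals the outstanding request, so when the cluster's scheduler ignores CPU and normalizes every allocation to 1 vcore, a request for 2 or more cores is never matched and the application appears to hang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for a YARN container capability."""
    memory_mb: int
    vcores: int


def match_allocated(requested: Resource, allocated: Resource) -> bool:
    # Sketch of AMRMClient-style matching: an allocation is only
    # returned to the caller if its capability equals the request.
    return requested == allocated


# Spark asks for 2 cores per executor (plus memory overhead), as in
# the YarnAllocator log line quoted above.
requested_2_cores = Resource(memory_mb=32870, vcores=2)
requested_1_core = Resource(memory_mb=32870, vcores=1)

# On a cluster without CPU scheduling enabled, every allocated
# container comes back with 1 vcore regardless of the ask.
allocated = Resource(memory_mb=32870, vcores=1)

print(match_allocated(requested_2_cores, allocated))  # never matches: hang
print(match_allocated(requested_1_core, allocated))   # matches: works
```

This is why dropping the config (or setting 1 core) works around the hang: the request then matches what the cluster actually allocates.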