The document is right. Because of a bug introduced in https://issues.apache.org/jira/browse/SPARK-9092, this configuration fails to work.
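For context, the check that short-circuits executor requests lives in ExecutorAllocationManager.addExecutors(). Below is a minimal, self-contained Scala sketch of that logic; it is paraphrased from the DEBUG output quoted later in this thread, not the exact 1.5.0 source, and the object name AddExecutorsSketch is mine:

  object AddExecutorsSketch {
    // Returns the number of additional executors requested
    // (0 here; in real Spark, the delta sent to YARN).
    def addExecutors(numExecutorsTarget: Int, maxNumExecutors: Int): Int = {
      if (numExecutorsTarget >= maxNumExecutors) {
        // This is the branch that produces the DEBUG message quoted below.
        println(s"Not adding executors because our current target total " +
          s"is already $numExecutorsTarget (limit $maxNumExecutors)")
        return 0
      }
      // ...otherwise Spark raises the target and requests the difference...
      1
    }

    def main(args: Array[String]): Unit = {
      // With minExecutors == maxExecutors == 50, the initial target is
      // already 50, so the guard fires and no containers are requested.
      addExecutors(numExecutorsTarget = 50, maxNumExecutors = 50)
    }
  }

On 1.5.0, the workaround reported later in this thread is to set spark.dynamicAllocation.initialExecutors explicitly. A sketch of the submit command, assuming the external shuffle service is already configured on your NodeManagers:

  ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --driver-memory 4g --executor-memory 8g \
    --conf spark.dynamicAllocation.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.dynamicAllocation.minExecutors=50 \
    --conf spark.dynamicAllocation.maxExecutors=50 \
    --conf spark.dynamicAllocation.initialExecutors=50 \
    lib/spark-examples*.jar 200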
It is fixed in https://issues.apache.org/jira/browse/SPARK-10790, so you could upgrade to a newer version of Spark.

On Tue, Nov 24, 2015 at 5:12 PM, 谢廷稳 <xieting...@gmail.com> wrote:

> @Sab Thank you for your reply, but the cluster has 6 nodes which contain 300 cores, and the Spark application did not request resources from YARN.
>
> @SaiSai I have run it successfully with "spark.dynamicAllocation.initialExecutors" set to 50, but http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation says that "spark.dynamicAllocation.initialExecutors" defaults to "spark.dynamicAllocation.minExecutors". So I think something is wrong here.
>
> Thanks.
>
> 2015-11-24 16:47 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>
>> Did you set the configuration "spark.dynamicAllocation.initialExecutors"?
>>
>> You can set spark.dynamicAllocation.initialExecutors to 50 and try again.
>>
>> I guess you might be hitting this issue since you're running 1.5.0: https://issues.apache.org/jira/browse/SPARK-9092. But it still cannot explain why 49 executors worked.
>>
>> On Tue, Nov 24, 2015 at 4:42 PM, Sabarish Sasidharan <sabarish.sasidha...@manthan.com> wrote:
>>
>>> If YARN has only 50 cores, then it can support at most 49 executors plus 1 driver application master.
>>>
>>> Regards
>>> Sab
>>>
>>> On 24-Nov-2015 1:58 pm, "谢廷稳" <xieting...@gmail.com> wrote:
>>>
>>>> OK, yarn.scheduler.maximum-allocation-mb is 16384.
>>>>
>>>> I have run it again; the command to run it is:
>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>>>>
>>>>> 15/11/24 16:15:56 INFO yarn.ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
>>>>> 15/11/24 16:15:57 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1447834709734_0120_000001
>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization
>>>>> 15/11/24 16:15:58 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
>>>>> 15/11/24 16:15:58 INFO spark.SparkContext: Running Spark version 1.5.0
>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing view acls to: hdfs-test
>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: Changing modify acls to: hdfs-test
>>>>> 15/11/24 16:15:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs-test); users with modify permissions: Set(hdfs-test)
>>>>> 15/11/24 16:15:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>>> 15/11/24 16:15:59 INFO Remoting: Starting remoting
>>>>> 15/11/24 16:15:59 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@X.X.X.X]
>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'sparkDriver' on port 61904.
>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering MapOutputTracker
>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering BlockManagerMaster
>>>>> 15/11/24 16:15:59 INFO storage.DiskBlockManager: Created local directory at /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/blockmgr-33fbe6c4-5138-4eff-83b4-fb0c886667b7
>>>>> 15/11/24 16:15:59 INFO storage.MemoryStore: MemoryStore started with capacity 1966.1 MB
>>>>> 15/11/24 16:15:59 INFO spark.HttpFileServer: HTTP File server directory is /data1/hadoop/nm-local-dir/usercache/hdfs-test/appcache/application_1447834709734_0120/spark-fbbfa2bd-6d30-421e-a634-4546134b3b5f/httpd-e31d7b8e-ca8f-400e-8b4b-d2993fb6f1d1
>>>>> 15/11/24 16:15:59 INFO spark.HttpServer: Starting HTTP Server
>>>>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:14692
>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 14692.
>>>>> 15/11/24 16:15:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
>>>>> 15/11/24 16:15:59 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
>>>>> 15/11/24 16:15:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>> 15/11/24 16:15:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:15948
>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'SparkUI' on port 15948.
>>>>> 15/11/24 16:15:59 INFO ui.SparkUI: Started SparkUI at X.X.X.X
>>>>> 15/11/24 16:15:59 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
>>>>> 15/11/24 16:15:59 WARN metrics.MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
>>>>> 15/11/24 16:15:59 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41830.
>>>>> 15/11/24 16:15:59 INFO netty.NettyBlockTransferService: Server created on 41830
>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMasterEndpoint: Registering block manager X.X.X.X:41830 with 1966.1 MB RAM, BlockManagerId(driver, 10.12.30.2, 41830)
>>>>> 15/11/24 16:15:59 INFO storage.BlockManagerMaster: Registered BlockManager
>>>>> 15/11/24 16:16:00 INFO scheduler.EventLoggingListener: Logging events to hdfs:///tmp/latest-spark-events/application_1447834709734_0120_1
>>>>> 15/11/24 16:16:00 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as AkkaRpcEndpointRef(Actor[akka://sparkDriver/user/YarnAM#293602859])
>>>>> 15/11/24 16:16:00 INFO client.RMProxy: Connecting to ResourceManager at X.X.X.X
>>>>> 15/11/24 16:16:00 INFO yarn.YarnRMClient: Registering the ApplicationMaster
>>>>> 15/11/24 16:16:00 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
>>>>> 15/11/24 16:16:29 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000(ms)
>>>>> 15/11/24 16:16:29 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
>>>>> 15/11/24 16:16:29 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:36
>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:36) with 200 output partitions
>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(reduce at SparkPi.scala:36)
>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Parents of final stage: List()
>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Missing parents: List()
>>>>> 15/11/24 16:16:29 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32), which has no missing parents
>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1888) called with curMem=0, maxMem=2061647216
>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1888.0 B, free 1966.1 MB)
>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: ensureFreeSpace(1202) called with curMem=1888, maxMem=2061647216
>>>>> 15/11/24 16:16:30 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1202.0 B, free 1966.1 MB)
>>>>> 15/11/24 16:16:30 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on X.X.X.X:41830 (size: 1202.0 B, free: 1966.1 MB)
>>>>> 15/11/24 16:16:30 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:861
>>>>> 15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
>>>>> 15/11/24 16:16:30 INFO cluster.YarnClusterScheduler: Adding task set 0.0 with 200 tasks
>>>>> 15/11/24 16:16:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>> 15/11/24 16:17:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>> 15/11/24 16:17:15 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>> 15/11/24 16:17:30 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>> 15/11/24 16:17:45 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>> 15/11/24 16:18:00 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
>>>>
>>>> 2015-11-24 15:14 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>
>>>>> What about this configuration in YARN: "yarn.scheduler.maximum-allocation-mb"?
>>>>>
>>>>> I'm curious why 49 executors worked but 50 failed. Would you provide your application master log? If a container request is issued, there will be log lines like:
>>>>>
>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Will request 2 executor containers, each with 1 cores and 1408 MB memory including 384 MB overhead
>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>>>>> 15/10/14 17:35:37 INFO yarn.YarnAllocator: Container request (host: Any, capability: <memory:1408, vCores:1>)
>>>>>
>>>>> On Tue, Nov 24, 2015 at 2:56 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>
>>>>>> OK, the YARN conf is listed in the following:
>>>>>>
>>>>>> yarn.nodemanager.resource.memory-mb: 115200
>>>>>> yarn.nodemanager.resource.cpu-vcores: 50
>>>>>>
>>>>>> I think the YARN resource is sufficient. In my previous letter I said that I think the Spark application didn't request resources from YARN.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> 2015-11-24 14:30 GMT+08:00 cherrywayb...@gmail.com <cherrywayb...@gmail.com>:
>>>>>>
>>>>>>> Can you show your parameter values in your env?
>>>>>>> yarn.nodemanager.resource.cpu-vcores
>>>>>>> yarn.nodemanager.resource.memory-mb
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> cherrywayb...@gmail.com
>>>>>>>
>>>>>>> From: 谢廷稳 <xieting...@gmail.com>
>>>>>>> Date: 2015-11-24 12:13
>>>>>>> To: Saisai Shao <sai.sai.s...@gmail.com>
>>>>>>> CC: spark users <user@spark.apache.org>
>>>>>>> Subject: Re: A Problem About Running Spark 1.5 on YARN with Dynamic Allocation
>>>>>>>
>>>>>>> OK, the YARN cluster was used only by myself; it has 6 nodes which can run over 100 executors, and the YARN RM logs showed that the Spark application did not request resources from it.
>>>>>>>
>>>>>>> Is this a bug? Should I create a JIRA for this problem?
>>>>>>>
>>>>>>> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>
>>>>>>>> OK, so this looks like your YARN cluster does not allocate the containers, which you supposed should be 50. Does the YARN cluster have enough resources left after allocating the AM container? If not, that is the problem.
>>>>>>>>
>>>>>>>> From my reading of your description, the problem does not lie in dynamic allocation. As I said, min and max executors set to the same number is fine.
>>>>>>>>
>>>>>>>> On Tue, Nov 24, 2015 at 11:54 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Saisai,
>>>>>>>>> I'm sorry I did not describe it clearly. The YARN debug log said I have 50 executors, but the ResourceManager showed that I only have 1 container, for the AppMaster.
>>>>>>>>>
>>>>>>>>> I have checked the YARN RM logs; after the AppMaster changed state from ACCEPTED to RUNNING, there were no more logs about this job. So the problem is that I did not have any executors, but the ExecutorAllocationManager thinks I do. Would you mind running a test in your cluster environment?
>>>>>>>>> Thanks,
>>>>>>>>> Weber
>>>>>>>>>
>>>>>>>>> 2015-11-24 11:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>>
>>>>>>>>>> I think this behavior is expected: since you already have 50 executors launched, there is no need to acquire additional executors. Your change is not solid; it is just hiding the log.
>>>>>>>>>>
>>>>>>>>>> Again, I think you should check the logs of YARN and Spark to see if the executors are started correctly, and why resources are still not enough when you already have 50 executors.
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 24, 2015 at 10:48 AM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi SaiSai,
>>>>>>>>>>> I have changed "if (numExecutorsTarget >= maxNumExecutors)" to "if (numExecutorsTarget > maxNumExecutors)" in the first line of ExecutorAllocationManager#addExecutors() and it ran well.
>>>>>>>>>>>
>>>>>>>>>>> In my opinion, when I set minExecutors equal to maxExecutors, then the first time executors are to be added, numExecutorsTarget already equals maxNumExecutors, and it repeatedly prints "DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)".
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Weber
>>>>>>>>>>>
>>>>>>>>>>> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Tingwen,
>>>>>>>>>>>>
>>>>>>>>>>>> Would you mind sharing your changes in ExecutorAllocationManager#addExecutors()?
>>>>>>>>>>>>
>>>>>>>>>>>> From my understanding and testing, dynamic allocation works when you set the min and max number of executors to the same number.
>>>>>>>>>>>>
>>>>>>>>>>>> Please check your Spark and YARN logs to make sure the executors are correctly started; the warning log means resources are currently not enough to submit tasks.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Saisai
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Nov 23, 2015 at 8:41 PM, 谢廷稳 <xieting...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> I ran a SparkPi on YARN with dynamic allocation enabled and set spark.dynamicAllocation.maxExecutors equal to spark.dynamicAllocation.minExecutors. Then I submitted an application using:
>>>>>>>>>>>>> ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --driver-memory 4g --executor-memory 8g lib/spark-examples*.jar 200
>>>>>>>>>>>>>
>>>>>>>>>>>>> The application was submitted successfully, but the AppMaster kept saying "15/11/23 20:13:08 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources", and when I turned on DEBUG logging, I found "15/11/23 20:24:00 DEBUG ExecutorAllocationManager: Not adding executors because our current target total is already 50 (limit 50)" in the console.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have fixed it by modifying code in ExecutorAllocationManager.addExecutors. Is this a bug, or was it designed so that we can't set maxExecutors equal to minExecutors?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Weber