Re: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
I give up, communication must be blocked by the complex EC2 network topology (though the error message could indeed use some improvement). It doesn't make sense to run a client thousands of miles away communicating frequently with workers. I have moved everything to EC2 now.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/TaskSchedulerImpl-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-that-woy-tp8247p8444.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Try to use --executor-memory 12g with spark-submit. Or you can set it in conf/spark-defaults.conf and rsync it to all workers and then restart. -Xiangrui

On Fri, Jun 27, 2014 at 1:05 PM, Peng Cheng pc...@uow.edu.au wrote:
> I give up, communication must be blocked by the complex EC2 network
> topology (though the error message could indeed use some improvement).
> It doesn't make sense to run a client thousands of miles away
> communicating frequently with workers. I have moved everything to EC2 now.
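The two fixes suggested above can be sketched as follows. The 12g value comes from the reply; the class, master URL, jar path, and worker hostnames are assumptions for a standard standalone deployment, not confirmed details from this thread:

```shell
# Option 1: request the memory per executor at submit time
$SPARK_HOME/bin/spark-submit \
  --class org.tribbloid.spookystuff.example.GoogleImage \
  --master spark://ec2-54-88-40-125.compute-1.amazonaws.com:7077 \
  --executor-memory 12g \
  ./target/spookystuff-example-assembly-0.1.0-SNAPSHOT.jar

# Option 2: make it the cluster-wide default, sync, and restart
echo "spark.executor.memory 12g" >> $SPARK_HOME/conf/spark-defaults.conf
for host in worker1 worker2; do   # hypothetical worker hostnames
  rsync -av $SPARK_HOME/conf/ $host:$SPARK_HOME/conf/
done
$SPARK_HOME/sbin/stop-all.sh && $SPARK_HOME/sbin/start-all.sh
```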
TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
I'm running a very small job (16 partitions, 2 stages) on a 2-node cluster, each node with 15 GB of memory. The master page looks all normal:

    URL: spark://ec2-54-88-40-125.compute-1.amazonaws.com:7077
    Workers: 1
    Cores: 2 Total, 2 Used
    Memory: 13.9 GB Total, 512.0 MB Used
    Applications: 1 Running, 0 Completed
    Drivers: 0 Running, 1 Completed
    Status: ALIVE

    Workers
    Id:      worker-20140625083124-ip-172-31-35-57.ec2.internal-54548
    Address: ip-172-31-35-57.ec2.internal:54548
    State:   ALIVE
    Cores:   2 (2 Used)
    Memory:  13.9 GB (512.0 MB Used)

    Running Applications
    ID:              app-20140625083158-
    Name:            org.tribbloid.spookystuff.example.GoogleImage$
    Cores:           2
    Memory per Node: 512.0 MB
    Submitted Time:  2014/06/25 08:31:58
    User:            peng
    State:           RUNNING
    Duration:        17 min

However, when submitting the job in client mode:

    $SPARK_HOME/bin/spark-submit \
      --class org.tribbloid.spookystuff.example.GoogleImage \
      --master spark://ec2-54-88-40-125.compute-1.amazonaws.com:7077 \
      --deploy-mode client \
      ./../../../target/spookystuff-example-assembly-0.1.0-SNAPSHOT.jar

it is never picked up by any worker, despite 13.4 GB of memory and 2 cores being available in total. The driver log repeatedly shows:

    14/06/25 04:46:29 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

Looks like it's either a bug or misinformation. Can someone confirm this so I can submit a JIRA?
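For readers debugging the same warning: besides insufficient memory or cores, it can also appear when executors launch but cannot connect back to the driver, which is the failure mode that client mode across networks tends to hit (and what this thread eventually concluded). A sketch of a quick check, where the driver hostname and port are placeholders rather than values from the thread; spark.driver.host and spark.driver.port are standard Spark properties:

```shell
# Pin the driver's otherwise-random listening port so it can be tested
# and opened in the EC2 security group (example values).
$SPARK_HOME/bin/spark-submit \
  --class org.tribbloid.spookystuff.example.GoogleImage \
  --master spark://ec2-54-88-40-125.compute-1.amazonaws.com:7077 \
  --deploy-mode client \
  --conf spark.driver.host=my-driver.example.com \
  --conf spark.driver.port=51000 \
  ./target/spookystuff-example-assembly-0.1.0-SNAPSHOT.jar

# Then, from a worker node, verify the executors can reach the driver:
nc -vz my-driver.example.com 51000
```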
Re: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Expanded to 4 nodes and changed the workers to listen on the public DNS, but it still shows the same error (which is obviously wrong). I can't believe I'm the first to encounter this issue.
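One related setting worth noting for EC2 standalone clusters (an assumption about this setup, not something confirmed in the thread): the standalone daemons advertise the address given by SPARK_PUBLIC_DNS in conf/spark-env.sh, which on EC2 can be pulled from the instance metadata service:

```shell
# conf/spark-env.sh on each EC2 node (sketch; SPARK_PUBLIC_DNS is a real
# Spark environment variable, the metadata URL is the standard EC2 endpoint)
export SPARK_PUBLIC_DNS=$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)
```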