Re: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-06-27 Thread Peng Cheng
I give up; communication must be blocked by the complex EC2 network topology
(though the error message does need some improvement). It doesn't make
sense to run a client thousands of miles away that communicates frequently with
the workers. I have moved everything to EC2 now.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/TaskSchedulerImpl-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-that-woy-tp8247p8444.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-06-27 Thread Xiangrui Meng
Try using --executor-memory 12g with spark-submit. Or you can set
spark.executor.memory in conf/spark-defaults.conf, rsync it to all workers,
and then restart. -Xiangrui
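
A minimal sketch of the first suggestion, reusing the class, master URL and jar path from the original post. The helper name build_submit_args is hypothetical, and 12g assumes the worker really has about 13.9 GB available, as the master page reports. It only prints the arguments (a dry run) so they can be inspected before submitting:

```shell
# Sketch only: adds an explicit executor memory size to the original command.
# build_submit_args is a hypothetical helper; class, master URL and jar path
# are copied from the original post. 12g must fit within the worker's memory.
EXECUTOR_MEMORY="12g"

build_submit_args() {
  # Print one argument per line so the command can be reviewed first.
  printf '%s\n' \
    --class org.tribbloid.spookystuff.example.GoogleImage \
    --master spark://ec2-54-88-40-125.compute-1.amazonaws.com:7077 \
    --deploy-mode client \
    --executor-memory "$EXECUTOR_MEMORY" \
    ./../../../target/spookystuff-example-assembly-0.1.0-SNAPSHOT.jar
}

# Dry run: show the arguments. To submit for real (none of the arguments
# contain spaces, so plain word splitting is enough here):
#   "$SPARK_HOME/bin/spark-submit" $(build_submit_args)
build_submit_args
```

The config-file route would instead add a line such as "spark.executor.memory 12g" to conf/spark-defaults.conf on each node before restarting.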

On Fri, Jun 27, 2014 at 1:05 PM, Peng Cheng pc...@uow.edu.au wrote:
 I give up, communication must be blocked by the complex EC2 network topology
 (though the error information indeed need some improvement). It doesn't make
 sense to run a client thousands miles away to communicate frequently with
 workers. I have moved everything to EC2 now.





TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-06-25 Thread Peng Cheng
I'm running a very small job (16 partitions, 2 stages) on a 2-node cluster,
each node with 15 GB of memory. The master page looks normal:

URL: spark://ec2-54-88-40-125.compute-1.amazonaws.com:7077
Workers: 1
Cores: 2 Total, 2 Used
Memory: 13.9 GB Total, 512.0 MB Used
Applications: 1 Running, 0 Completed
Drivers: 0 Running, 1 Completed
Status: ALIVE

Workers

Id: worker-20140625083124-ip-172-31-35-57.ec2.internal-54548
Address: ip-172-31-35-57.ec2.internal:54548
State: ALIVE
Cores: 2 (2 Used)
Memory: 13.9 GB (512.0 MB Used)

Running Applications

ID: app-20140625083158-
Name: org.tribbloid.spookystuff.example.GoogleImage$
Cores: 2
Memory per Node: 512.0 MB
Submitted Time: 2014/06/25 08:31:58
User: peng
State: RUNNING
Duration: 17 min

However, when I submit the job in client mode:

$SPARK_HOME/bin/spark-submit \
--class org.tribbloid.spookystuff.example.GoogleImage \
--master spark://ec2-54-88-40-125.compute-1.amazonaws.com:7077 \
--deploy-mode client \
./../../../target/spookystuff-example-assembly-0.1.0-SNAPSHOT.jar

it is never picked up by any worker, even though 13.4 GB of memory and 2 cores
in total are available. The driver log repeatedly shows:

14/06/25 04:46:29 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
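
As a rough sanity check on the numbers quoted from the master page, the sketch below recomputes what is free. The variable names are illustrative, and it assumes the page reflects the same run as the log. It reproduces the 13.4 GB figure, and it also shows that both cores are already counted as used by the one running application, which would suggest the master did hand out resources and the warning may not be about capacity:

```shell
# Figures copied from the master page quoted above.
total_gb=13.9
used_mb=512.0
total_cores=2
used_cores=2

# Free memory in GB: total minus used (MB converted to GB).
free_gb=$(awk -v t="$total_gb" -v u="$used_mb" 'BEGIN { printf "%.1f", t - u / 1024 }')
# Free cores: whatever the master has not yet assigned to an application.
free_cores=$((total_cores - used_cores))

echo "free memory: ${free_gb} GB"
echo "free cores: ${free_cores}"
```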

Looks like it's either a bug or a misleading message. Can someone confirm this
so I can submit a JIRA?





Re: TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

2014-06-25 Thread Peng Cheng
Expanded to 4 nodes and changed the workers to listen on their public DNS
names, but it still shows the same error (which is obviously wrong). I can't
believe I'm the first to encounter this issue.


