Re: Spark: All masters are unresponsive!
Are you sure your master URL is spark://pzxnvm2018:7077? You can look it up in the top-left corner of the web UI (usually http://pzxnvm2018:8080). Also make sure you are able to telnet pzxnvm2018 7077 from the machines where you are running the Spark shell.

Thanks
Best Regards

On Tue, Jul 8, 2014 at 12:21 PM, Sameer Tilak ssti...@live.com wrote:

> Hi All,
>
> I am having a few issues with stability and scheduling. When I use the Spark shell to submit my application, I get the following error message and the Spark shell crashes. I have a small 4-node cluster for a PoC. I tried both a manual and a script-based cluster setup, and I also tried using the FQDN to specify the master node, but no luck.
>
> 14/07/07 23:44:35 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[6] at map at JaccardScore.scala:83)
> 14/07/07 23:44:35 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
> 14/07/07 23:44:35 INFO TaskSetManager: Starting task 1.0:0 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
> 14/07/07 23:44:35 INFO TaskSetManager: Serialized task 1.0:0 as 2322 bytes in 0 ms
> 14/07/07 23:44:35 INFO TaskSetManager: Starting task 1.0:1 as TID 2 on executor localhost: localhost (PROCESS_LOCAL)
> 14/07/07 23:44:35 INFO TaskSetManager: Serialized task 1.0:1 as 2322 bytes in 0 ms
> 14/07/07 23:44:35 INFO Executor: Running task ID 1
> 14/07/07 23:44:35 INFO Executor: Running task ID 2
> 14/07/07 23:44:35 INFO BlockManager: Found block broadcast_1 locally
> 14/07/07 23:44:35 INFO BlockManager: Found block broadcast_1 locally
> 14/07/07 23:44:35 INFO HadoopRDD: Input split: hdfs://pzxnvm2018:54310/data/sameer_7-2-2014_3mm_sentences.tsv:0+97239389
> 14/07/07 23:44:35 INFO HadoopRDD: Input split: hdfs://pzxnvm2018:54310/data/sameer_7-2-2014_3mm_sentences.tsv:97239389+97239390
> 14/07/07 23:44:54 INFO AppClient$ClientActor: Connecting to master spark://pzxnvm2018:7077...
> 14/07/07 23:45:14 INFO AppClient$ClientActor: Connecting to master spark://pzxnvm2018:7077...
> 14/07/07 23:45:35 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
> 14/07/07 23:45:35 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
> 14/07/07 23:45:35 WARN HadoopRDD: Exception in RecordReader.close()
> java.io.IOException: Filesystem closed
>   at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
>   at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2135)
>   at java.io.FilterInputStream.close(FilterInputStream.java:181)
>   at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
>   at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:168)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:208)
>   at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:193)
>   at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
>   at org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113)
>   at org.apache.spark.scheduler.Task.run(Task.scala:51)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>   at java.lang.Thread.run(Thread.java:722)
> 14/07/07 23:45:35 ERROR Executor: Exception in task ID 2
> java.io.IOException: Filesystem closed
>   at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
>   at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
>   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2213)
>   at java.io.DataInputStream.read(DataInputStream.java:100)
>   at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>   at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:133)
>   at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:198)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:181)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>   at
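The telnet check suggested above can also be scripted when you need to verify the master's RPC port from several machines at once. A minimal sketch in Python follows; the hostname pzxnvm2018 and port 7077 are the values from this thread and should be replaced with the host and port from your own spark:// master URL:

```python
import socket


def master_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and connection refusals alike.
        return False


if __name__ == "__main__":
    # Host and port taken from the thread; substitute your own master URL parts.
    host, port = "pzxnvm2018", 7077
    status = "reachable" if master_reachable(host, port) else "unreachable"
    print(f"spark://{host}:{port} is {status}")
```

If this reports unreachable from a worker or driver machine while the master's web UI shows the same spark:// URL, the problem is network-level (DNS, firewall, or a master bound to a different interface) rather than a Spark misconfiguration.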
Re: Spark: All masters are unresponsive!
app-20140708100139-/2 removed: Command exited with code 1
14/07/08 10:01:43 INFO AppClient$ClientActor: Executor added: app-20140708100139-/5 on worker-20140708095559-pzxnvm2022.x.y.name.org-41826 (pzxnvm2022.dcld.pldc.kp.org:41826) with 1 cores
14/07/08 10:01:43 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-/5 on hostPort pzxnvm2022.x.y.name.org:41826 with 1 cores, 512.0 MB RAM
14/07/08 10:01:43 INFO AppClient$ClientActor: Executor updated: app-20140708100139-/5 is now RUNNING
14/07/08 10:01:44 INFO AppClient$ClientActor: Executor updated: app-20140708100139-/3 is now FAILED (Command exited with code 1)
14/07/08 10:01:44 INFO SparkDeploySchedulerBackend: Executor app-20140708100139-/3 removed: Command exited with code 1
14/07/08 10:01:44 INFO AppClient$ClientActor: Executor added: app-20140708100139-/6 on worker-20140708095558-pzxnvm2024.x.y.name.org-50218 (pzxnvm2024.dcld.pldc.kp.org:50218) with 4 cores
14/07/08 10:01:44 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140708100139-/6 on hostPort pzxnvm2024.x.y.name.org:50218 with 4 cores, 512.0 MB RAM
14/07/08 10:01:44 INFO AppClient$ClientActor: Executor updated: app-20140708100139-/6 is now RUNNING
14/07/08 10:01:45 INFO AppClient$ClientActor: Executor updated: app-20140708100139-/4 is now FAILED (Command exited with code 1)
14/07/08 10:01:45 INFO SparkDeploySchedulerBackend: Executor app-20140708100139-/4 removed: Command exited with code 1
14/07/08 10:01:45 INFO AppClient$ClientActor: Executor added: app-20140708100139-/7 on worker-20140708095559-pzxnvm2023.x.y.name.org-38294 (pzxnvm2023.dcld.pldc.kp.org:38294) with 4 cores

--
Date: Tue, 8 Jul 2014 12:29:21 +0530
Subject: Re: Spark: All masters are unresponsive!
From: ak...@sigmoidanalytics.com
To: user@spark.apache.org