Re: Spark: All masters are unresponsive!

2014-07-08 Thread Akhil Das
Are you sure spark://pzxnvm2018:7077 is actually your master URL?

You can look it up in the web UI (usually http://pzxnvm2018:8080), in the
top-left corner. Also make sure you are able to telnet to pzxnvm2018 on port
7077 from the machines where you are running the Spark shell.
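That reachability check is also easy to script if you want to run it from every machine at once. A minimal Python sketch (the hostname and port are the ones from this thread; nothing here is Spark-specific):

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check the master port reported in the driver logs, e.g.:
# print(can_connect("pzxnvm2018", 7077))
```

If this returns False from a worker or driver machine, the problem is network/firewall level rather than anything inside Spark.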

Thanks
Best Regards


On Tue, Jul 8, 2014 at 12:21 PM, Sameer Tilak ssti...@live.com wrote:

 Hi All,

 I am having a few issues with stability and scheduling. When I use the Spark
 shell to submit my application, I get the following error message and the
 Spark shell crashes. I have a small 4-node cluster for a PoC. I tried both
 manual and script-based cluster setup, and I also tried using the FQDN to
 specify the master node, but no luck.
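A quick first check when all masters look unresponsive is that the master URL is at least well-formed (scheme spark://, a hostname, and an explicit port). A small standard-library sketch, using the URL from the logs in this thread:

```python
from urllib.parse import urlparse

def parse_master_url(url):
    """Split a spark://host:port master URL into (host, port); raise on malformed input."""
    parsed = urlparse(url)
    if parsed.scheme != "spark" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"not a spark://host:port URL: {url!r}")
    return parsed.hostname, parsed.port

print(parse_master_url("spark://pzxnvm2018:7077"))  # → ('pzxnvm2018', 7077)
```

The host and port this returns should match exactly what the master prints in its own web UI; a short hostname here versus an FQDN in the master's self-reported URL is a common cause of this symptom in standalone mode.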

 14/07/07 23:44:35 INFO DAGScheduler: Submitting 2 missing tasks from Stage
 1 (MappedRDD[6] at map at JaccardScore.scala:83)
 14/07/07 23:44:35 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
 14/07/07 23:44:35 INFO TaskSetManager: Starting task 1.0:0 as TID 1 on
 executor localhost: localhost (PROCESS_LOCAL)
 14/07/07 23:44:35 INFO TaskSetManager: Serialized task 1.0:0 as 2322 bytes
 in 0 ms
 14/07/07 23:44:35 INFO TaskSetManager: Starting task 1.0:1 as TID 2 on
 executor localhost: localhost (PROCESS_LOCAL)
 14/07/07 23:44:35 INFO TaskSetManager: Serialized task 1.0:1 as 2322 bytes
 in 0 ms
 14/07/07 23:44:35 INFO Executor: Running task ID 1
 14/07/07 23:44:35 INFO Executor: Running task ID 2
 14/07/07 23:44:35 INFO BlockManager: Found block broadcast_1 locally
 14/07/07 23:44:35 INFO BlockManager: Found block broadcast_1 locally
 14/07/07 23:44:35 INFO HadoopRDD: Input split:
 hdfs://pzxnvm2018:54310/data/sameer_7-2-2014_3mm_sentences.tsv:0+97239389
 14/07/07 23:44:35 INFO HadoopRDD: Input split:
 hdfs://pzxnvm2018:54310/data/sameer_7-2-2014_3mm_sentences.tsv:97239389+97239390
 14/07/07 23:44:54 INFO AppClient$ClientActor: Connecting to master
 spark://pzxnvm2018:7077...
 14/07/07 23:45:14 INFO AppClient$ClientActor: Connecting to master
 spark://pzxnvm2018:7077...
 14/07/07 23:45:35 ERROR SparkDeploySchedulerBackend: Application has been
 killed. Reason: All masters are unresponsive! Giving up.
 14/07/07 23:45:35 ERROR TaskSchedulerImpl: Exiting due to error from
 cluster scheduler: All masters are unresponsive! Giving up.
 14/07/07 23:45:35 WARN HadoopRDD: Exception in RecordReader.close()
 java.io.IOException: Filesystem closed
 at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
 at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.close(DFSClient.java:2135)
 at java.io.FilterInputStream.close(FilterInputStream.java:181)
 at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
 at
 org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:168)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.close(HadoopRDD.scala:208)
 at org.apache.spark.util.NextIterator.closeIfNeeded(NextIterator.scala:63)
 at
 org.apache.spark.rdd.HadoopRDD$$anon$1$$anonfun$1.apply$mcV$sp(HadoopRDD.scala:193)
 at
 org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
 at
 org.apache.spark.TaskContext$$anonfun$executeOnCompleteCallbacks$1.apply(TaskContext.scala:63)
 at
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
 at
 org.apache.spark.TaskContext.executeOnCompleteCallbacks(TaskContext.scala:63)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:113)
 at org.apache.spark.scheduler.Task.run(Task.scala:51)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 14/07/07 23:45:35 ERROR Executor: Exception in task ID 2
 java.io.IOException: Filesystem closed
 at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:264)
 at org.apache.hadoop.hdfs.DFSClient.access$1100(DFSClient.java:74)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2213)
 at java.io.DataInputStream.read(DataInputStream.java:100)
 at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
 at
 org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:133)
 at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:38)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:198)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:181)
 at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 at
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
 at 

Re: Spark: All masters are unresponsive!

2014-07-08 Thread Andrew Or
 app-20140708100139-/2 removed: Command exited with code 1
 14/07/08 10:01:43 INFO AppClient$ClientActor: Executor added:
 app-20140708100139-/5 on
 worker-20140708095559-pzxnvm2022.x.y.name.org-41826 (
 pzxnvm2022.dcld.pldc.kp.org:41826) with 1 cores
 14/07/08 10:01:43 INFO SparkDeploySchedulerBackend: Granted executor ID
 app-20140708100139-/5 on hostPort pzxnvm2022.x.y.name.org:41826 with
 1 cores, 512.0 MB RAM
 14/07/08 10:01:43 INFO AppClient$ClientActor: Executor updated:
 app-20140708100139-/5 is now RUNNING
 14/07/08 10:01:44 INFO AppClient$ClientActor: Executor updated:
 app-20140708100139-/3 is now FAILED (Command exited with code 1)
 14/07/08 10:01:44 INFO SparkDeploySchedulerBackend: Executor
 app-20140708100139-/3 removed: Command exited with code 1
 14/07/08 10:01:44 INFO AppClient$ClientActor: Executor added:
 app-20140708100139-/6 on
 worker-20140708095558-pzxnvm2024.x.y.name.org-50218 (
 pzxnvm2024.dcld.pldc.kp.org:50218) with 4 cores
 14/07/08 10:01:44 INFO SparkDeploySchedulerBackend: Granted executor ID
 app-20140708100139-/6 on hostPort pzxnvm2024.x.y.name.org:50218 with
 4 cores, 512.0 MB RAM
 14/07/08 10:01:44 INFO AppClient$ClientActor: Executor updated:
 app-20140708100139-/6 is now RUNNING
 14/07/08 10:01:45 INFO AppClient$ClientActor: Executor updated:
 app-20140708100139-/4 is now FAILED (Command exited with code 1)
 14/07/08 10:01:45 INFO SparkDeploySchedulerBackend: Executor
 app-20140708100139-/4 removed: Command exited with code 1
 14/07/08 10:01:45 INFO AppClient$ClientActor: Executor added:
 app-20140708100139-/7 on
 worker-20140708095559-pzxnvm2023.x.y.name.org-38294 (
 pzxnvm2023.dcld.pldc.kp.org:38294) with 4 cores
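With executors being added and then failing with exit code 1 in a loop like this, it helps to tally the state transitions in the driver log before digging into individual worker logs. A small sketch that counts the "is now STATE" lines (the sample text mirrors the log lines above):

```python
import re

LOG = """\
14/07/08 10:01:43 INFO AppClient$ClientActor: Executor updated: app-20140708100139-/5 is now RUNNING
14/07/08 10:01:44 INFO AppClient$ClientActor: Executor updated: app-20140708100139-/3 is now FAILED (Command exited with code 1)
14/07/08 10:01:45 INFO AppClient$ClientActor: Executor updated: app-20140708100139-/4 is now FAILED (Command exited with code 1)
"""

def count_states(log_text):
    """Tally 'Executor updated: ... is now STATE' driver-log lines by state."""
    counts = {}
    for state in re.findall(r"is now (\w+)", log_text):
        counts[state] = counts.get(state, 0) + 1
    return counts

print(count_states(LOG))  # → {'RUNNING': 1, 'FAILED': 2}
```

A high FAILED count with no executor ever reaching a stable RUNNING state points at the executor launch command itself; the actual cause of exit code 1 is in the stderr files under each worker's work directory.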


 --
 Date: Tue, 8 Jul 2014 12:29:21 +0530
 Subject: Re: Spark: All masters are unresponsive!
 From: ak...@sigmoidanalytics.com
 To: user@spark.apache.org

