[jira] [Commented] (SPARK-7525) Could not read data from write ahead log record when Receiver failed and WAL is stored in Tachyon

2015-05-12 Thread Dibyendu Bhattacharya (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540325#comment-14540325 ]

Dibyendu Bhattacharya commented on SPARK-7525:
--

I guess this has something to do with the lack of append support in Tachyon.

java.lang.IllegalStateException: File exists and there is no append support!
  at org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:33)
  at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$apache$spark$streaming$util$FileBasedWriteAheadLogWriter$$stream$lzycompute(FileBasedWriteAheadLogWriter.scala:33)
  at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.org$apache$spark$streaming$util$FileBasedWriteAheadLogWriter$$stream(FileBasedWriteAheadLogWriter.scala:33)
  at org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.init(FileBasedWriteAheadLogWriter.scala:41)
  at org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:194)
  at org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:81)
  at org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:44)
  at org.apache.spark.streaming.receiver.WriteAheadLogBasedBlockHandler$$anonfun$5.apply(ReceivedBlockHandler.scala:178)
  at org.apache.spark.streaming.receiver.WriteAheadLogBasedBlockHandler$$anonfun$5.apply(ReceivedBlockHandler.scala:178)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:744)
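For reference, the exception above is thrown by the append check in Spark's WAL writer: when the log file already exists, HdfsUtils can only hand back an output stream by appending, and Tachyon 0.6.x's Hadoop client has no append support. A rough sketch of that check (a paraphrase of Spark 1.x's HdfsUtils.getOutputStream, not the exact source):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataOutputStream, Path, RawLocalFileSystem}

// Paraphrase of the Spark 1.x check: if the WAL file already exists, the
// writer must append to it; with no append support it can only give up.
def getOutputStream(path: String, conf: Configuration): FSDataOutputStream = {
  val dfsPath = new Path(path)
  val dfs = dfsPath.getFileSystem(conf)
  if (dfs.isFile(dfsPath)) {
    if (conf.getBoolean("hdfs.append.support", false) || dfs.isInstanceOf[RawLocalFileSystem]) {
      dfs.append(dfsPath)
    } else {
      // This is the branch a restarted receiver hits on Tachyon 0.6.4.
      throw new IllegalStateException("File exists and there is no append support!")
    }
  } else {
    dfs.create(dfsPath)
  }
}
{code}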


[jira] [Commented] (SPARK-7525) Could not read data from write ahead log record when Receiver failed and WAL is stored in Tachyon

2015-05-11 Thread Sean Owen (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537824#comment-14537824 ]

Sean Owen commented on SPARK-7525:
--

Related to SPARK-7139?

 Could not read data from write ahead log record when Receiver failed and WAL is stored in Tachyon
 --------------------------------------------------------------------------------------------------

 Key: SPARK-7525
 URL: https://issues.apache.org/jira/browse/SPARK-7525
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.4.0
 Environment: AWS, Spark Streaming 1.4 with Tachyon 0.6.4
Reporter: Dibyendu Bhattacharya

 I was testing the fault-tolerance behaviour of Spark Streaming with the 
 checkpoint directory stored in Tachyon. Spark Streaming is able to recover from 
 driver failure, but when a receiver failed, Spark Streaming was not able to read 
 the WAL files written by the failed receiver. Below is the exception thrown when 
 the receiver failed.
 INFO : org.apache.spark.scheduler.DAGScheduler - Executor lost: 2 (epoch 1)
 INFO : org.apache.spark.storage.BlockManagerMasterEndpoint - Trying to remove executor 2 from BlockManagerMaster.
 INFO : org.apache.spark.storage.BlockManagerMasterEndpoint - Removing block manager BlockManagerId(2, 10.252.5.54, 45789)
 INFO : org.apache.spark.storage.BlockManagerMaster - Removed 2 successfully in removeExecutor
 INFO : org.apache.spark.streaming.scheduler.ReceiverTracker - Registered receiver for stream 2 from 10.252.5.62:47255
 WARN : org.apache.spark.scheduler.TaskSetManager - Lost task 2.1 in stage 103.0 (TID 421, 10.252.5.62): org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment(tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/log-1431341091711-1431341151711,645603894,10891919)
   at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:144)
   at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
   at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:168)
   at scala.Option.getOrElse(Option.scala:120)
   at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:168)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
   at org.apache.spark.scheduler.Task.run(Task.scala:70)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: java.lang.IllegalArgumentException: Seek position is past EOF: 645603894, fileSize = 0
   at tachyon.hadoop.HdfsFileInputStream.seek(HdfsFileInputStream.java:239)
   at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:37)
   at org.apache.spark.streaming.util.FileBasedWriteAheadLogRandomReader.read(FileBasedWriteAheadLogRandomReader.scala:37)
   at org.apache.spark.streaming.util.FileBasedWriteAheadLog.read(FileBasedWriteAheadLog.scala:104)
   at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:141)
   ... 15 more
 INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 2.2 in stage 103.0 (TID 422, 10.252.5.61, ANY, 1909 bytes)
 INFO : org.apache.spark.scheduler.TaskSetManager - Lost task 2.2 in stage 103.0 (TID 422) on executor 10.252.5.61: org.apache.spark.SparkException (Could not read data from write ahead log record FileBasedWriteAheadLogSegment(tachyon-ft://10.252.5.113:19998/tachyon/checkpoint/receivedData/2/log-1431341091711-1431341151711,645603894,10891919)) [duplicate 1]
 INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 2.3 in stage 103.0 (TID 423, 10.252.5.62, ANY, 1909 bytes)
 INFO : org.apache.spark.deploy.client.AppClient$ClientActor - Executor updated: app-20150511104442-0048/2 is now LOST (worker lost)
 INFO : org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend - Executor app-20150511104442-0048/2 

[jira] [Commented] (SPARK-7525) Could not read data from write ahead log record when Receiver failed and WAL is stored in Tachyon

2015-05-11 Thread Dibyendu Bhattacharya (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537914#comment-14537914 ]

Dibyendu Bhattacharya commented on SPARK-7525:
--


This issue does not happen when checkpointing to HDFS.

If the checkpoint directory is in Tachyon, the issue shows up when a receiver fails.

For the driver-failure case, Spark Streaming can recover even when the checkpoint directory is in Tachyon.
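The read side fails at the very first seek: recovery opens the log file and seeks to the offset recorded in the WAL segment handle, but Tachyon reports the failed receiver's file with fileSize = 0, so the seek lands past EOF (the "Seek position is past EOF: 645603894, fileSize = 0" frame above). A rough sketch of that read path (a paraphrase of Spark 1.x's FileBasedWriteAheadLogRandomReader, with a hypothetical wrapper class name, not the exact source):

{code:scala}
import java.io.Closeable
import java.nio.ByteBuffer
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical wrapper paraphrasing the Spark 1.x random reader: seek to the
// segment's offset, then read one length-prefixed record.
class WALRandomReader(path: String, conf: Configuration) extends Closeable {
  private val instream = {
    val p = new Path(path)
    p.getFileSystem(conf).open(p)
  }

  def read(offset: Long): ByteBuffer = {
    // On Tachyon 0.6.x the lost receiver's log comes back with length 0,
    // so this seek throws IllegalArgumentException("Seek position is past EOF").
    instream.seek(offset)
    val nextLength = instream.readInt() // records are stored as [length][payload]
    val buffer = new Array[Byte](nextLength)
    instream.readFully(buffer)
    ByteBuffer.wrap(buffer)
  }

  override def close(): Unit = instream.close()
}
{code}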
