Spark Streaming in 1 hour batch duration RDD files gets lost

Jeoffrey Lim Thu, 11 Sep 2014 16:48:08 -0700

Hi,


Our Spark Streaming app is configured to pull data from Kafka with a 1-hour
batch duration. It aggregates the data by specific keys and stores the
related RDDs to HDFS in the transform phase. We have tried setting a
checkpoint of 7 days on the Kafka DStream to ensure that the generated
stream does not expire or get lost.

The first hourly batch completes, but every succeeding batch fails with
this exception:


Job aborted due to stage failure: Task 39.0:1 failed 64 times, most recent
failure: Exception failure in TID 27578 on host X.ec2.internal:
java.io.FileNotFoundException: /data/run/spark/work/spark-local-20140911175744-4ddf/0d/shuffle_3_1_311 (No such file or directory)
        java.io.FileOutputStream.open(Native Method)
        java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        org.apache.spark.storage.DiskBlockObjectWriter.open(BlockObjectWriter.scala:116)
        org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:177)
        org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:161)
        org.apache.spark.scheduler.ShuffleMapTask$$anonfun$runTask$1.apply(ShuffleMapTask.scala:158)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
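
In case it helps with reading the exception: as far as I understand, the
missing file sits under spark.local.dir in a directory named
spark-local-<timestamp>-<hex suffix>, and the file name itself follows the
pattern shuffle_<shuffleId>_<mapId>_<reduceId>. The small Python sketch below
only splits the path from the trace into those parts. The helper name
parse_shuffle_path and the regular expression are illustrative assumptions
based on that naming, not code from our job.

```python
import re
from datetime import datetime

# Path copied verbatim from the FileNotFoundException above.
PATH = "/data/run/spark/work/spark-local-20140911175744-4ddf/0d/shuffle_3_1_311"

# Assumed layout (not taken from our application code):
#   spark-local-<yyyyMMddHHmmss>-<hex>/<hex subdir>/shuffle_<shuffleId>_<mapId>_<reduceId>
PATTERN = re.compile(
    r"spark-local-(?P<stamp>\d{14})-(?P<suffix>[0-9a-f]+)"
    r"/(?P<subdir>[0-9a-f]+)"
    r"/shuffle_(?P<shuffle>\d+)_(?P<map>\d+)_(?P<reduce>\d+)$"
)


def parse_shuffle_path(path):
    """Split a shuffle file path into its parts; return None if it does not match."""
    m = PATTERN.search(path)
    if m is None:
        return None
    return {
        # Timestamp embedded in the local directory name.
        "dir_created": datetime.strptime(m.group("stamp"), "%Y%m%d%H%M%S"),
        "subdir": m.group("subdir"),
        "shuffle_id": int(m.group("shuffle")),
        "map_id": int(m.group("map")),
        "reduce_id": int(m.group("reduce")),
    }


if __name__ == "__main__":
    print(parse_shuffle_path(PATH))
```

Read this way, the task was writing map output 1 of shuffle 3 for reduce
partition 311, into a local directory whose name carries the timestamp
2014-09-11 17:57:44.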


Environment:

CDH version: 2.3.0-cdh5.1.0
Spark version: 1.0.0-cdh5.1.0


Spark settings:

spark.io.compression.codec : org.apache.spark.io.SnappyCompressionCodec
spark.serializer : org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.mb : 2
spark.local.dir : /data/run/spark/work/
spark.scheduler.mode : FAIR
spark.rdd.compress : false
spark.task.maxFailures : 64
spark.shuffle.use.netty : false
spark.shuffle.spill : true
spark.streaming.checkpoint.dir : hdfs://X.ec2.internal:8020/user/spark/checkpoints/event-storage
spark.akka.threads : 4
spark.cores.max : 4
spark.executor.memory : 3g
spark.shuffle.consolidateFiles : false
spark.streaming.unpersist : true
spark.logConf : true
spark.shuffle.spill.compress : true


Thanks,

JL




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-in-1-hour-batch-duration-RDD-files-gets-lost-tp14027.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
