Re: Spark checkpoint restore failure due to s3 consistency issue
That won't really help. What we need to see is the lifecycle of the file before the failure, so we need the log4j logs.

On Fri, Oct 9, 2015 at 2:34 PM, Spark Newbie wrote:
> Unfortunately I don't have the before-stop logs anymore since the log was
> overwritten in my next run.
>
> I created an rdd-_$folder$ file in S3 which was missing compared to the
> other checkpointed rdd- directories. The app started without the
> IllegalArgumentException. Do you still need the after-restart log4j logs?
> I can send them if that will help dig into the root cause.
>
> [rest of quoted thread trimmed]
Re: Spark checkpoint restore failure due to s3 consistency issue
Unfortunately I don't have the before-stop logs anymore since the log was overwritten in my next run.

I created an rdd-_$folder$ file in S3 which was missing compared to the other checkpointed rdd- directories. The app started without the IllegalArgumentException. Do you still need the after-restart log4j logs? I can send them if that will help dig into the root cause.

On Fri, Oct 9, 2015 at 2:18 PM, Tathagata Das wrote:
> Can you provide the before stop and after restart log4j logs for this?
>
> [rest of quoted thread trimmed]
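For anyone who lands in the same state, the workaround described above (recreating the missing `_$folder$` marker object so `S3NativeFileSystem` recognizes the checkpoint directory) can be done with a zero-byte put from the AWS CLI. This is only a sketch: the bucket name, key prefix, and rdd number below are hypothetical placeholders, not values from this thread.

```shell
# Recreate the empty "_$folder$" directory-marker object that
# S3NativeFileSystem looks for when listing a checkpointed RDD directory.
# Substitute the actual bucket/prefix/rdd number from the error message.
aws s3api put-object \
  --bucket my-checkpoint-bucket \
  --key 'streaming-checkpoints/rdd-42_$folder$'
```

Whether this is safe depends on the checkpoint data actually being intact under that prefix; it only papers over the missing marker, not over missing part files.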
Spark checkpoint restore failure due to s3 consistency issue
Hi Spark Users,

I'm seeing checkpoint restore failures causing the application startup to fail with the exception below. When I do "ls" on the S3 path, I sometimes see the key listed and sometimes not. There are no part files (checkpointed files) in the specified S3 path. This is possible because I killed the app and restarted it as part of my testing to see whether the kinesis-asl library's implementation of lossless Kinesis receivers works.

Has anyone seen the exception below before? If so, is there a recommended way to handle this case?

15/10/09 21:02:09 DEBUG S3NativeFileSystem: getFileStatus could not find key ''
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist: <S3 path to the checkpointed rdd>
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.rdd.ReliableCheckpointRDD.<init>(ReliableCheckpointRDD.scala:45)
        at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1218)
        at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1218)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
        at org.apache.spark.SparkContext.checkpointFile(SparkContext.scala:1217)
        at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:112)
        at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:109)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at org.apache.spark.streaming.dstream.DStreamCheckpointData.restore(DStreamCheckpointData.scala:109)
        at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:487)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
        at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
        at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.streaming.DStreamGraph.restoreCheckpointData(DStreamGraph.scala:153)
        at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:158)
        at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
        at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
        at scala.Option.map(Option.scala:145)
        at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:837)
        at foo$.getStreamingContext(foo.scala:72)

Thanks,
Bharath
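For context, the restore path in the trace is the standard Spark Streaming checkpoint-recovery pattern via StreamingContext.getOrCreate. A minimal sketch of that pattern follows; the object name, app name, batch interval, and checkpoint path are hypothetical stand-ins, not the poster's actual code (`foo.scala`).

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointRestoreSketch {
  // Hypothetical checkpoint location; in the thread above it is an S3 path.
  val checkpointDir = "s3://my-bucket/streaming-checkpoints"

  // Builds a fresh context; only invoked when no usable checkpoint exists.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpoint-restore-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)
    // ... define input DStreams and transformations here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // getOrCreate first tries to deserialize the checkpoint in checkpointDir
    // and rebuild the DStream graph (DStreamGraph.restoreCheckpointData in
    // the stack trace); that restore is where the missing/eventually-
    // consistent S3 listing surfaces as the IllegalArgumentException.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The point of the sketch is just to show where the failure sits: it happens inside getOrCreate, before user code runs, so it cannot be caught and retried from inside the streaming job itself.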
Re: Spark checkpoint restore failure due to s3 consistency issue
Can you provide the before stop and after restart log4j logs for this?

On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie wrote:
> Hi Spark Users,
>
> [quoted original post and stack trace trimmed; see the full message above]