Re: Spark checkpoint restore failure due to s3 consistency issue

2015-10-09 Thread Tathagata Das
That won't really help. What we need to see is the lifecycle of the file before
the failure, so we need the log4j logs.
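
If it helps to capture them on the next run, bumping the relevant log levels
in the driver should do it. A minimal sketch, assuming the log4j 1.x API that
Spark ships; the S3 filesystem logger name is an assumption and may differ on
EMR, which bundles its own S3NativeFileSystem:

import org.apache.log4j.{Level, Logger}

// Capture the lifecycle of checkpoint files as Spark Streaming writes,
// finalizes, and restores them.
Logger.getLogger("org.apache.spark.streaming").setLevel(Level.DEBUG)
// The s3n filesystem implementation in Apache Hadoop lives in this package;
// adjust the logger name for your distribution's implementation.
Logger.getLogger("org.apache.hadoop.fs.s3native").setLevel(Level.DEBUG)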

On Fri, Oct 9, 2015 at 2:34 PM, Spark Newbie 
wrote:

> Unfortunately I don't have the before-stop logs anymore, since the log was
> overwritten in my next run.
>
> I created the rdd-_$folder$ file in S3 that was missing, compared with the
> other checkpointed rdd- directories. The app then started without the
> IllegalArgumentException. Do you still need the after-restart log4j logs? I
> can send them if that will help dig into the root cause.
>
> On Fri, Oct 9, 2015 at 2:18 PM, Tathagata Das  wrote:
>
>> Can you provide the before-stop and after-restart log4j logs for this?
>>
>> On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie 
>> wrote:
>>
>>> Hi Spark Users,
>>>
>>> I'm seeing checkpoint restore failures that cause application startup
>>> to fail with the exception below. When I run "ls" on the S3 path, the
>>> key is sometimes listed and sometimes not. There are no part files
>>> (checkpointed files) in the specified S3 path. This is possible because I
>>> killed the app and restarted it as part of my testing to see whether the
>>> kinesis-asl library's implementation of lossless Kinesis receivers works.
>>>
>>> Has anyone seen the exception below before? If so, is there a recommended
>>> way to handle this case?
>>>
>>> 15/10/09 21:02:09 DEBUG S3NativeFileSystem: getFileStatus could not find
>>> key ''
>>> Exception in thread "main" java.lang.IllegalArgumentException:
>>> requirement failed: Checkpoint directory does not exist: <s3 path to the
>>> checkpointed rdd>
>>> [stack trace snipped; the full trace is in the original message in this
>>> thread]

Re: Spark checkpoint restore failure due to s3 consistency issue

2015-10-09 Thread Spark Newbie
Unfortunately I don't have the before-stop logs anymore, since the log was
overwritten in my next run.

I created the rdd-_$folder$ file in S3 that was missing, compared with the
other checkpointed rdd- directories. The app then started without the
IllegalArgumentException. Do you still need the after-restart log4j logs? I
can send them if that will help dig into the root cause.
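
In case it is useful to others hitting this, the missing marker can also be
recreated through the Hadoop FileSystem API rather than by hand. A minimal
sketch, with a placeholder bucket and rdd id (substitute the checkpoint
directory the exception names):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder path; use the directory the exception reports as missing.
val missingDir = new Path("s3n://my-bucket/checkpoint/rdd-42")
val fs = FileSystem.get(missingDir.toUri, new Configuration())
// On the S3 native filesystem, mkdirs writes the zero-byte "<dir>_$folder$"
// key, which is what getFileStatus checks when resolving a directory.
if (!fs.exists(missingDir)) {
  fs.mkdirs(missingDir)
}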

On Fri, Oct 9, 2015 at 2:18 PM, Tathagata Das  wrote:

> Can you provide the before-stop and after-restart log4j logs for this?
>
> On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie 
> wrote:
>
>> Hi Spark Users,
>>
>> I'm seeing checkpoint restore failures that cause application startup to
>> fail with the exception below. When I run "ls" on the S3 path, the key is
>> sometimes listed and sometimes not. There are no part files
>> (checkpointed files) in the specified S3 path. This is possible because I
>> killed the app and restarted it as part of my testing to see whether the
>> kinesis-asl library's implementation of lossless Kinesis receivers works.
>>
>> Has anyone seen the exception below before? If so, is there a recommended
>> way to handle this case?
>>
>> 15/10/09 21:02:09 DEBUG S3NativeFileSystem: getFileStatus could not find
>> key ''
>> Exception in thread "main" java.lang.IllegalArgumentException:
>> requirement failed: Checkpoint directory does not exist: <s3 path to the
>> checkpointed rdd>
>> [stack trace snipped; the full trace is in the original message in this
>> thread]

Spark checkpoint restore failure due to s3 consistency issue

2015-10-09 Thread Spark Newbie
Hi Spark Users,

I'm seeing checkpoint restore failures that cause application startup to
fail with the exception below. When I run "ls" on the S3 path, the key is
sometimes listed and sometimes not. There are no part files
(checkpointed files) in the specified S3 path. This is possible because I
killed the app and restarted it as part of my testing to see whether the
kinesis-asl library's implementation of lossless Kinesis receivers works.

Has anyone seen the exception below before? If so, is there a recommended
way to handle this case?

15/10/09 21:02:09 DEBUG S3NativeFileSystem: getFileStatus could not find
key ''
Exception in thread "main" java.lang.IllegalArgumentException: requirement
failed: Checkpoint directory does not exist: <s3 path to the checkpointed rdd>
at scala.Predef$.require(Predef.scala:233)
at org.apache.spark.rdd.ReliableCheckpointRDD.<init>(ReliableCheckpointRDD.scala:45)
at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1218)
at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
at org.apache.spark.SparkContext.checkpointFile(SparkContext.scala:1217)
at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:112)
at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:109)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
at org.apache.spark.streaming.dstream.DStreamCheckpointData.restore(DStreamCheckpointData.scala:109)
at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:487)
at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.streaming.DStreamGraph.restoreCheckpointData(DStreamGraph.scala:153)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:158)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
at scala.Option.map(Option.scala:145)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:837)
at foo$.getStreamingContext(foo.scala:72)
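
For reference, the foo$.getStreamingContext frame at the bottom corresponds
to the usual checkpoint-recovery pattern. A minimal sketch; the app name,
batch interval, and paths below are placeholders, not the actual job:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "s3n://my-bucket/checkpoint" // placeholder

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("kinesis-app"), Seconds(10))
  ssc.checkpoint(checkpointDir)
  // ... create the Kinesis DStreams and output operations here ...
  ssc
}

// On restart this reads the checkpoint and runs
// DStreamGraph.restoreCheckpointData, which is where the
// "Checkpoint directory does not exist" requirement fails
// if the S3 listing has not yet become consistent.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)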

Thanks,
Bharath


Re: Spark checkpoint restore failure due to s3 consistency issue

2015-10-09 Thread Tathagata Das
Can you provide the before-stop and after-restart log4j logs for this?

On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie 
wrote:

> Hi Spark Users,
>
> I'm seeing checkpoint restore failures that cause application startup to
> fail with the exception below. When I run "ls" on the S3 path, the key is
> sometimes listed and sometimes not. There are no part files
> (checkpointed files) in the specified S3 path. This is possible because I
> killed the app and restarted it as part of my testing to see whether the
> kinesis-asl library's implementation of lossless Kinesis receivers works.
>
> Has anyone seen the exception below before? If so, is there a recommended
> way to handle this case?
>
> 15/10/09 21:02:09 DEBUG S3NativeFileSystem: getFileStatus could not find
> key ''
> Exception in thread "main" java.lang.IllegalArgumentException: requirement
> failed: Checkpoint directory does not exist: <s3 path to the checkpointed
> rdd>
> [stack trace snipped; the full trace is in the original message in this
> thread]
>
> Thanks,
> Bharath
>