Are you accessing the RDDs of the raw data blocks and running independent
Spark jobs on them (that is, outside the DStream operations)? If so, this
can happen because Spark Streaming cleans up the raw data based on the
DStream operations (if there is a window op of 15 mins, it will keep the
data around for at least 15 mins). So independent Spark jobs that access
older data may fail. The solution is to use DStream.remember() on the raw
input stream to make sure the data is kept around long enough.
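
For example, here is a minimal sketch of that idea, assuming a 15-minute
window and using the context-level remember(), which applies to all DStreams
created from that StreamingContext; the app name, batch interval and the
20-minute duration are only illustrative:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

    // Hypothetical setup, shown only for illustration.
    val conf = new SparkConf().setAppName("streaming-remember-example")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Keep generated RDDs/blocks around for at least 20 minutes, comfortably
    // longer than the 15-minute window, so jobs that read the raw input
    // blocks later do not race the automatic cleanup.
    ssc.remember(Minutes(20))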

Not sure if this was the problem or not. For more info, can you tell me
whether you are running Spark 0.9 or 1.0?



TD

On Fri, Aug 1, 2014 at 10:55 AM, Kanwaldeep <kanwal...@gmail.com> wrote:
> We are using Spark 1.0 for Spark Streaming on a Spark standalone cluster and
> are seeing the following error.
>
>         Job aborted due to stage failure: Task 3475.0:15 failed 4 times, most
> recent failure: Exception failure in TID 216394 on host
> hslave33102.sjc9.service-now.com: java.lang.Exception: Could not compute
> split, block input-0-1406869340000 not found
> org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>
> We are using the MEMORY_AND_DISK_SER storage level for the input stream, and
> the stream is also being persisted since we have multiple transformations
> happening on it.
>
>
>     val lines = KafkaUtils.createStream[String, Array[Byte], StringDecoder,
>       DefaultDecoder](ssc, kafkaParams, topicpMap, StorageLevel.MEMORY_AND_DISK_SER)
>
>     lines.persist(StorageLevel.MEMORY_AND_DISK_SER)
>
> We are aggregating data every 15 minutes as well as every hour. We have set
> spark.streaming.blockInterval=10000 to minimize the number of blocks read.
>
> The problem started with the 15-minute aggregation, but since last night I'm
> seeing it happen every hour as well.
>
> Any suggestions?
>
> Thanks
> Kanwal
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Could-not-compute-split-block-not-found-tp11186.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
