Hi Rafeeq,

I think this exception usually occurs when your Spark Streaming application is 
running in an abnormal state. Would you mind checking the job processing time 
in the WebUI or the logs: is the total of job processing time + job scheduling 
delay larger than the batch duration? If your Spark Streaming application is in 
that situation, you will hit this exception.
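
For example, here is a rough way to watch those numbers from the driver (just a 
sketch, assuming the StreamingListener API available in Spark 1.x; the listener 
class name is only illustrative):

import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Illustrative listener: logs scheduling delay and processing time for every
// finished batch, so they can be compared against the configured batch duration.
class DelayLogger extends StreamingListener {
  override def onBatchCompleted(completed: StreamingListenerBatchCompleted) {
    val info = completed.batchInfo
    println("batch " + info.batchTime +
      ": schedulingDelay=" + info.schedulingDelay.getOrElse(-1L) + " ms" +
      ", processingDelay=" + info.processingDelay.getOrElse(-1L) + " ms")
  }
}

// Register it on the StreamingContext before ssc.start():
// ssc.addStreamingListener(new DelayLogger)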

Normally the reason this happens is that Spark Streaming jobs are processed one 
by one by default. If one job is blocked for a long time, the next job has to 
wait until the previous one finishes, but the input block is deleted after a 
timeout, so when that job finally starts it cannot find the block it needs and 
throws this exception.
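
If the root cause is simply that batches take longer than the batch interval, 
the usual fix is to make the jobs faster or the batch interval larger, so blocks 
are consumed before they are cleaned up. A minimal sketch of what the receiving 
side might look like (assuming the Spark 1.0 Kafka receiver API; the ZooKeeper 
address, consumer group and topic are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("KafkaStreamingTest")
// Keep the batch interval comfortably above the observed processing + scheduling time.
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder ZooKeeper quorum, consumer group and topic map (1 receiver thread).
val stream = KafkaUtils.createStream(
  ssc, "zk-host:2181", "test-group", Map("test-topic" -> 1),
  StorageLevel.MEMORY_AND_DISK_SER_2) // serialized, disk-backed, replicated blocks

stream.count().print()
ssc.start()
ssc.awaitTermination()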

You can also hit this exception in other abnormal situations. In any case, it 
means your application is not keeping up, so you should pay attention to your 
job execution time.

You can check the "Monitoring Applications" section of the Spark Streaming 
doc<http://spark.apache.org/docs/latest/streaming-programming-guide.html> for 
the details.

Thanks
Jerry

From: Rafeeq S [mailto:rafeeq.ec...@gmail.com]
Sent: Thursday, September 18, 2014 2:43 PM
To: u...@spark.incubator.apache.org
Subject: Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs 
have Failed..

Hi,

I am testing a Kafka-Spark Streaming application which throws the error below 
after a few seconds. The configuration below is used for the Spark Streaming 
test environment.

Kafka version: 0.8.1
Spark version: 1.0.1

SPARK_MASTER_MEMORY="1G"
SPARK_DRIVER_MEMORY="1G"
SPARK_WORKER_INSTANCES="1"
SPARK_EXECUTOR_INSTANCES="1"
SPARK_WORKER_MEMORY="1G"
SPARK_EXECUTOR_MEMORY="1G"
SPARK_WORKER_CORES="2"
SPARK_EXECUTOR_CORES="1"

ERROR:

14/09/12 17:30:23 WARN TaskSetManager: Loss was due to java.lang.Exception
java.lang.Exception: Could not compute split, block input-4-1410542878200 not found
        at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:33)
        at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:74)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:77)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
        at org.apache.spark.scheduler.Task.run(Task.scala:51)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

Please suggest a solution.

Regards,
Rafeeq S
(“What you do is what matters, not what you think or say or plan.” )
