Hi, can you attach more logs so we can see whether there are any entries from ContextCleaner?
I ran into a very similar issue before, but it was never resolved.

Best,

--
Nan Zhu

On Thursday, September 11, 2014 at 10:13 AM, Dibyendu Bhattacharya wrote:
> Dear All,
>
> Not sure if this is a false alarm, but I wanted to raise it to understand
> what is happening.
>
> I am testing the Kafka receiver I have written
> (https://github.com/dibbhatt/kafka-spark-consumer), which is basically a
> low-level Kafka consumer implemented as custom Receivers, one for each
> Kafka topic partition, pulling data in parallel. The individual streams
> from all topic partitions are then merged into a union stream, which is
> used for further processing.
>
> The custom receiver works fine under normal load, with no issues. But when
> I tested it against a huge backlog of messages in Kafka (50 million+
> messages), I see a couple of major issues in Spark Streaming, and I wanted
> to get some opinions on them.
>
> I am using the latest Spark 1.1 built from source, running in standalone
> mode on a 3-node m1.xlarge Spark cluster on Amazon EMR.
>
> Below are my two main questions.
>
> 1. When I run Spark Streaming with my Kafka consumer against a huge
> backlog in Kafka (around 50 million messages), Spark is completely busy
> with the receiving tasks and hardly schedules any processing tasks. Can
> you tell me whether this is expected? If there is a large backlog, Spark
> will take a long time pulling it, but why is Spark not doing any
> processing? Is it because of a resource limitation (say, all cores are
> busy pulling), or is it by design? I am setting executor-memory to 10G
> and driver-memory to 4G.
>
> 2. This issue seems more serious. I have attached the driver trace to this
> email. I can see, very frequently, that blocks are selected to be removed;
> these entries are all over the place. But when a block is removed, the
> problem below happens. This may also be the cause of issue 1, i.e. why no
> jobs are getting processed.
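On question 1: in Spark Streaming, each Receiver occupies one executor core for its entire lifetime, so running one receiver per topic partition on a small cluster can leave few or no cores for processing tasks. A minimal sketch of the arithmetic, assuming the standard Spark Streaming 1.1 API; `numPartitions` and `KafkaReceiver` are hypothetical placeholders, not the actual consumer's API:

```scala
// Sketch only: each ssc.receiverStream(...) pins one core for the lifetime
// of that receiver, so the total core count must exceed the receiver count
// or no cores remain for processing. `KafkaReceiver` is a hypothetical
// stand-in for the custom consumer's receiver class.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("kafka-backlog-test")
  // 3 x m1.xlarge = 12 cores in total; some must stay free for processing
  .set("spark.cores.max", "12")

val ssc = new StreamingContext(conf, Seconds(10))

val numPartitions = 8 // hypothetical: one receiver per Kafka topic partition
val streams = (0 until numPartitions).map { p =>
  ssc.receiverStream(new KafkaReceiver(p)) // hypothetical receiver class
}
// With 8 receivers pinned, only 12 - 8 = 4 cores are left for processing,
// which matches the symptom of receiving tasks crowding out job execution.
val unioned = ssc.union(streams)
```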
>
> INFO : org.apache.spark.storage.MemoryStore - 1 blocks selected for dropping
> INFO : org.apache.spark.storage.BlockManager - Dropping block
> input-0-1410443074600 from memory
> INFO : org.apache.spark.storage.MemoryStore - Block input-0-1410443074600 of
> size 12651900 dropped from memory (free 21220667)
> INFO : org.apache.spark.storage.BlockManagerInfo - Removed
> input-0-1410443074600 on ip-10-252-5-113.asskickery.us:53752 in memory
> (size: 12.1 MB, free: 100.6 MB)
>
> ...........
>
> INFO : org.apache.spark.storage.BlockManagerInfo - Removed
> input-0-1410443074600 on ip-10-252-5-62.asskickery.us:37033 in memory
> (size: 12.1 MB, free: 154.6 MB)
>
> ...........
>
> WARN : org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 7.0
> (TID 118, ip-10-252-5-62.asskickery.us): java.lang.Exception: Could not
> compute split, block input-0-1410443074600 not found
>
> ...........
>
> INFO : org.apache.spark.scheduler.TaskSetManager - Lost task 0.1 in stage 7.0
> (TID 126) on executor ip-10-252-5-62.asskickery.us: java.lang.Exception
> (Could not compute split, block input-0-1410443074600 not found) [duplicate 1]
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
> stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0
> (TID 139, ip-10-252-5-62.asskickery.us): java.lang.Exception: Could not
> compute split, block input-0-1410443074600 not found
>         org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:744)
>
> Regards,
> Dibyendu
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
> Attachments:
> - driver-trace.txt
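On issue 2: blocks received with a memory-only storage level are dropped when the MemoryStore fills up, and a dropped input block cannot be recomputed (a BlockRDD has no lineage back to Kafka), which is exactly the "Could not compute split, block ... not found" failure in the trace. One common mitigation is to construct the receiver with a storage level that spills to disk instead of discarding. A sketch, assuming Spark 1.1's Receiver API; `KafkaReceiver` is a hypothetical stand-in for the custom consumer's receiver:

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// Sketch: constructing the receiver with MEMORY_AND_DISK_SER_2 lets the
// BlockManager spill received blocks to disk (serialized, replicated to
// 2 nodes) instead of dropping them outright when memory fills up.
class KafkaReceiver(partition: Int) // hypothetical custom-consumer receiver
  extends Receiver[Array[Byte]](StorageLevel.MEMORY_AND_DISK_SER_2) {

  def onStart(): Unit = {
    // start a background thread that pulls from this Kafka partition
    // and hands each message to store(...)
  }

  def onStop(): Unit = {
    // signal the pulling thread to shut down
  }
}
```

This trades some throughput (serialization and disk I/O) for not losing blocks under backlog pressure, which should also stop the cascading stage failures.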