Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2015-01-01 Thread zgm
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Some-Serious-Issue-with-Spark-Streaming-Blocks-Getting-Removed-and-Jobs-have-Failed-tp14241p20932.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2015-01-01 Thread Sean Owen


Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Dibyendu Bhattacharya
Dear all,

I am sorry. This was a false alarm.

There was an issue in my RDD processing logic that led to a large
backlog. Once I fixed the processing logic, I could see all
messages being pulled nicely without any "Block Removed" error. I still need to
tune certain configurations in my Kafka consumer to adjust the data rate
and the batch size.
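
A rough way to see why slow processing logic leads to a backlog: if each batch takes longer to process than the batch interval, unprocessed batches pile up while the receivers keep storing new blocks. The Python sketch below is only an illustration under simplifying assumptions (one batch per interval, batches processed one at a time). The function name and all numbers are hypothetical and are not taken from this thread or from Spark.

```python
def pending_batches(batch_interval_s, processing_time_s, elapsed_s):
    """Estimate how many batches are still queued after elapsed_s seconds.

    Assumes one batch is produced every batch_interval_s seconds and that
    batches are processed one at a time, each taking processing_time_s.
    """
    produced = elapsed_s // batch_interval_s
    # Processing can never get ahead of what has been produced.
    processed = min(produced, elapsed_s // processing_time_s)
    return int(produced - processed)


# Hypothetical numbers: 2 s batches over one minute.
slow = pending_batches(batch_interval_s=2, processing_time_s=5, elapsed_s=60)
fast = pending_batches(batch_interval_s=2, processing_time_s=1, elapsed_s=60)
print(slow, fast)
```

When processing keeps up with the batch interval the queue stays empty; when it does not, the queue grows for as long as data keeps arriving, which is why lowering the ingest rate or fixing the processing time both help.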

Sorry again.


Regards,
Dibyendu


Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Jeoffrey Lim







Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Dibyendu Bhattacharya




Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-12 Thread Tim Smith



Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
Hi,   

Can you attach more logs so we can see whether there are any entries from ContextCleaner?

I ran into a very similar issue before, but it hasn't been resolved.

Best,  

--  
Nan Zhu


On Thursday, September 11, 2014 at 10:13 AM, Dibyendu Bhattacharya wrote:

 Dear All,

 I am not sure whether this is a false alarm, but I wanted to raise it to
 understand what is happening.

 I am testing the Kafka Receiver I have written
 (https://github.com/dibbhatt/kafka-spark-consumer). It is a low-level
 Kafka consumer that implements a custom Receiver for every Kafka topic
 partition and pulls data in parallel. The individual streams from all topic
 partitions are then merged into a union stream, which is used for further
 processing.

 The custom Receiver works fine under normal load. But when I tested it with
 a huge backlog of messages in Kafka (50 million+ messages), I saw a couple
 of major issues in Spark Streaming, and I wanted to get some opinions on
 them.

 I am using the latest Spark 1.1, built from source, running on Amazon EMR:
 a Spark cluster of 3 m1.xlarge nodes in standalone mode.

 Below are the two main questions I have.

 1. When I run Spark Streaming with my Kafka consumer against a huge backlog
 in Kafka (around 50 million messages), Spark is completely busy performing
 the receiving tasks and hardly schedules any processing tasks. Can you let
 me know if this is expected? With a large backlog, Spark will take a long
 time to pull the messages, but why is it not doing any processing? Is it
 because of a resource limitation (say, all cores are busy pulling), or is it
 by design? I am setting executor-memory to 10G and driver-memory to 4G.

 2. This issue seems to be more serious. I have attached the driver trace to
 this email. Very frequently I see blocks being selected for removal; entries
 of this kind are all over the place. But when a block is removed, the
 problem below happens. Maybe this issue causes issue 1, where no jobs are
 getting processed.
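
A minimal sketch for question 1, assuming the behaviour described in the Spark Streaming programming guide: each receiver runs as a long-lived task that occupies one core, so an application needs more cores than receivers to have any left for processing. The function name and the numbers are hypothetical; the thread does not state how many Kafka partitions (and hence receivers) or cores were in use.

```python
def cores_left_for_processing(total_cores, num_receivers):
    """Cores available for batch processing once every receiver holds one.

    Assumption: each receiver pins exactly one core for its whole lifetime.
    """
    return max(total_cores - num_receivers, 0)


# Hypothetical cluster with 12 cores in total.
print(cores_left_for_processing(12, 12))  # every core is busy receiving
print(cores_left_for_processing(12, 4))   # some cores remain for processing
```

If the first case applies, receiving would proceed while processing tasks wait, which would look like the symptom described in question 1.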
  
  
 INFO : org.apache.spark.storage.MemoryStore - 1 blocks selected for dropping
 INFO : org.apache.spark.storage.BlockManager - Dropping block input-0-1410443074600 from memory
 INFO : org.apache.spark.storage.MemoryStore - Block input-0-1410443074600 of size 12651900 dropped from memory (free 21220667)
 INFO : org.apache.spark.storage.BlockManagerInfo - Removed input-0-1410443074600 on ip-10-252-5-113.asskickery.us:53752 in memory (size: 12.1 MB, free: 100.6 MB)

 ...

 INFO : org.apache.spark.storage.BlockManagerInfo - Removed input-0-1410443074600 on ip-10-252-5-62.asskickery.us:37033 in memory (size: 12.1 MB, free: 154.6 MB)
 ..

 WARN : org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 7.0 (TID 118, ip-10-252-5-62.asskickery.us): java.lang.Exception: Could not compute split, block input-0-1410443074600 not found

 ...

 INFO : org.apache.spark.scheduler.TaskSetManager - Lost task 0.1 in stage 7.0 (TID 126) on executor ip-10-252-5-62.asskickery.us: java.lang.Exception (Could not compute split, block input-0-1410443074600 not found) [duplicate 1]

 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 139, ip-10-252-5-62.asskickery.us): java.lang.Exception: Could not compute split, block input-0-1410443074600 not found
 org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:51)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
 org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
 org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
 org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
 org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:744)
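
The sequence in the log above (a block is dropped from memory, then a task fails because the block is not found) can be pictured with a toy model. The Python sketch below is not Spark's MemoryStore; it is a deliberately simplified, hypothetical store that holds blocks in memory only and drops the oldest ones when a new block does not fit. The class name, the capacity and the block ids are made up for illustration; only the block size of 12651900 bytes comes from the log.

```python
from collections import OrderedDict


class MemoryOnlyStore:
    """Toy block store: drops the oldest blocks when a new one does not fit."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.blocks = OrderedDict()  # block id -> size in bytes, oldest first

    def used(self):
        return sum(self.blocks.values())

    def put(self, block_id, size):
        """Store a block, returning the ids of any blocks dropped to make room."""
        dropped = []
        while self.blocks and self.used() + size > self.capacity:
            old_id, _ = self.blocks.popitem(last=False)
            dropped.append(old_id)
        self.blocks[block_id] = size
        return dropped

    def get(self, block_id):
        if block_id not in self.blocks:
            raise KeyError("block %s not found" % block_id)
        return self.blocks[block_id]


store = MemoryOnlyStore(capacity_bytes=30_000_000)  # hypothetical capacity
store.put("input-0-a", 12_651_900)
store.put("input-0-b", 12_651_900)
dropped = store.put("input-0-c", 12_651_900)  # no room left: oldest block goes
print(dropped)
```

In this model, a job scheduled late enough to ask for "input-0-a" would fail in `get`, which mirrors the "Could not compute split, block ... not found" failure when processing falls behind receiving.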
  
  
 Regards,  
 Dibyendu
  
  
  
  
  
  
 Attachments:  
 - driver-trace.txt
  




Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2014-09-11 Thread Nan Zhu
This is my case about broadcast variable:  

14/07/21 19:49:13 INFO Executor: Running task ID 4
14/07/21 19:49:13 INFO DAGScheduler: Completed ResultTask(0, 2)
14/07/21 19:49:13 INFO TaskSetManager: Finished TID 2 in 95 ms on localhost (progress: 3/106)
14/07/21 19:49:13 INFO TableOutputFormat: Created table instance for hdfstest_customers
14/07/21 19:49:13 INFO Executor: Serialized size of result for 3 is 596
14/07/21 19:49:13 INFO Executor: Sending result for 3 directly to driver
14/07/21 19:49:13 INFO BlockManager: Found block broadcast_0 locally
14/07/21 19:49:13 INFO Executor: Finished task ID 3
14/07/21 19:49:13 INFO TaskSetManager: Starting task 0.0:5 as TID 5 on executor localhost: localhost (PROCESS_LOCAL)
14/07/21 19:49:13 INFO TaskSetManager: Serialized task 0.0:5 as 11885 bytes in 0 ms
14/07/21 19:49:13 INFO Executor: Running task ID 5
14/07/21 19:49:13 INFO BlockManager: Removing broadcast 0
14/07/21 19:49:13 INFO DAGScheduler: Completed ResultTask(0, 3)
14/07/21 19:49:13 INFO ContextCleaner: Cleaned broadcast 0
14/07/21 19:49:13 INFO TaskSetManager: Finished TID 3 in 97 ms on localhost (progress: 4/106)
14/07/21 19:49:13 INFO BlockManager: Found block broadcast_0 locally
14/07/21 19:49:13 INFO BlockManager: Removing block broadcast_0
14/07/21 19:49:13 INFO MemoryStore: Block broadcast_0 of size 202564 dropped from memory (free 886623436)
14/07/21 19:49:13 INFO ContextCleaner: Cleaned shuffle 0
14/07/21 19:49:13 INFO ShuffleBlockManager: Deleted all files for shuffle 0
14/07/21 19:49:13 INFO HadoopRDD: Input split: hdfs://172.31.34.184:9000/etltest/hdfsData/customer.csv:25+5
14/07/21 19:49:13 INFO HadoopRDD: Input split: hdfs://172.31.34.184:9000/etltest/hdfsData/customer.csv:20+5
14/07/21 19:49:13 INFO TableOutputFormat: Created table instance for hdfstest_customers
14/07/21 19:49:13 INFO Executor: Serialized size of result for 4 is 596
14/07/21 19:49:13 INFO Executor: Sending result for 4 directly to driver
14/07/21 19:49:13 INFO Executor: Finished task ID 4
14/07/21 19:49:13 INFO TaskSetManager: Starting task 0.0:6 as TID 6 on executor localhost: localhost (PROCESS_LOCAL)
14/07/21 19:49:13 INFO TaskSetManager: Serialized task 0.0:6 as 11885 bytes in 0 ms
14/07/21 19:49:13 INFO Executor: Running task ID 6
14/07/21 19:49:13 INFO DAGScheduler: Completed ResultTask(0, 4)
14/07/21 19:49:13 INFO TaskSetManager: Finished TID 4 in 80 ms on localhost (progress: 5/106)
14/07/21 19:49:13 INFO TableOutputFormat: Created table instance for hdfstest_customers
14/07/21 19:49:13 INFO Executor: Serialized size of result for 5 is 596
14/07/21 19:49:13 INFO Executor: Sending result for 5 directly to driver
14/07/21 19:49:13 INFO Executor: Finished task ID 5
14/07/21 19:49:13 INFO TaskSetManager: Starting task 0.0:7 as TID 7 on executor localhost: localhost (PROCESS_LOCAL)
14/07/21 19:49:13 INFO TaskSetManager: Serialized task 0.0:7 as 11885 bytes in 0 ms
14/07/21 19:49:13 INFO Executor: Running task ID 7
14/07/21 19:49:13 INFO DAGScheduler: Completed ResultTask(0, 5)
14/07/21 19:49:13 INFO TaskSetManager: Finished TID 5 in 77 ms on localhost (progress: 6/106)
14/07/21 19:49:13 INFO HttpBroadcast: Started reading broadcast variable 0
14/07/21 19:49:13 INFO HttpBroadcast: Started reading broadcast variable 0
14/07/21 19:49:13 ERROR Executor: Exception in task ID 6
java.io.FileNotFoundException: http://172.31.34.174:52070/broadcast_0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
    at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:196)
    at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:89)
    at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
    at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
    at