RE: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Rachana Srivastava
…message without saveAs
INFO : org.apache.spark.executor.Executor - Finished task 0.0 in stage 1.0 (TID 1). 987 bytes result sent to driver
INFO : org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (foreachRDD at KafkaURLStreaming.java:90) finished in 0.103 s
INFO : org.apache.spark.sche…

Re: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Jean-Baptiste Onofré
Hi Rachana,

Don't you have two messages on the Kafka broker?

Regards
JB

On 01/05/2016 05:14 PM, Rachana Srivastava wrote:
> I have a very simple two-line program. I get input from Kafka, save the input to a file, and count the input received. My code looks like this; when I run…
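
[Editor's note: the sketch below is not from the thread; it is one hedged way to test JB's hypothesis by draining the topic from the beginning and counting what the broker actually holds. It assumes the Kafka 0.9+ Java consumer API; the broker address, group id, and topic name "test" are placeholders, not values from the original post.]

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TopicMessageCount {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");            // placeholder broker
            props.put("group.id", "debug-count-" + System.currentTimeMillis());
            props.put("auto.offset.reset", "earliest");                  // read from the beginning
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            long count = 0;
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("test"));   // placeholder topic
                // A single poll; a large topic would need a loop until no records return.
                ConsumerRecords<String, String> records = consumer.poll(5000);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println("offset=" + record.offset() + " value=" + record.value());
                    count++;
                }
            }
            System.out.println("messages seen: " + count);
        }
    }

The same check can be done from the shell with the kafka-console-consumer tool that ships with Kafka.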

Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Rachana Srivastava
I have a very simple two-line program. I get input from Kafka, save the input to a file, and count the input received. My code looks like this; when I run it, I get two accumulator counts for each input.

HashMap kafkaParams = new HashMap…
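
[Editor's note: the preview cuts off at the Kafka parameters, so the full program is not shown. Below is a minimal sketch, in the Spark 1.x Java API, of one plausible reconstruction suggested by the logs elsewhere in the thread: a map (KafkaURLStreaming.java:83) that updates the accumulator, followed by a foreachRDD (line 90) that performs more than one action. The class name, topic, broker address, and output path are placeholders, not values from the original post.]

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;

    import org.apache.spark.Accumulator;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    import kafka.serializer.StringDecoder;

    public class KafkaCountSketch {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("KafkaCountSketch").setMaster("local[2]");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // Driver-side accumulator that the executors increment.
            Accumulator<Integer> counter = jssc.sparkContext().accumulator(0);

            HashMap<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "localhost:9092");     // placeholder broker
            HashSet<String> topics = new HashSet<>(Arrays.asList("test")); // placeholder topic

            JavaPairInputDStream<String, String> messages = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            // The increment lives in a transformation, not an action.
            JavaDStream<String> lines = messages.map(tuple -> {
                counter.add(1);
                return tuple._2();
            });

            lines.foreachRDD(rdd -> {
                // Two actions on the same uncached RDD: each action re-runs the
                // lineage, so the map above (and its increment) executes twice
                // per record -- one plausible source of the doubled count.
                rdd.saveAsTextFile("/tmp/out-" + System.currentTimeMillis()); // action #1
                System.out.println("batch size: " + rdd.count()               // action #2
                        + ", accumulator: " + counter.value());
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }

If this is indeed the shape of the program, each Kafka record passes through the map once per action, which would double every accumulator update; a sketch of the usual remedy follows Shixiong's reply below.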

Re: Double Counting When Using Accumulators with Spark Streaming

2016-01-05 Thread Shixiong(Ryan) Zhu
> …tasks from ResultStage 1 (MapPartitionsRDD[3] at map at KafkaURLStreaming.java:83)
> INFO : org.apache.spark.scheduler.TaskSchedulerImpl - Adding task set 1.0 with 1 tasks
> INFO : org.apache.spark.scheduler.TaskSetManager - Starting task 0.0 in stage 1.0 (TID 1, localhost, AN…
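
[Editor's note: Shixiong's reply is truncated above, so the thread's conclusion is not visible here. The quoted log does show ResultStage 1 scheduling the map at KafkaURLStreaming.java:83 again, which fits the recomputation explanation. The sketch below is the common remedy under that assumption, not a confirmed quote from the thread: persist the RDD before the first action so the map, and therefore the accumulator update, runs only once per batch. The helper name is hypothetical.]

    import org.apache.spark.api.java.JavaRDD;

    public class SingleCountFix {
        // Hypothetical helper: write a batch and report its size without
        // recomputing the upstream map that updates the accumulator.
        static void writeAndCount(JavaRDD<String> rdd, String outputDir) {
            rdd.cache();                    // materialized once, on the first action
            rdd.saveAsTextFile(outputDir);  // action #1 computes the lineage
            long n = rdd.count();           // action #2 reads the cached blocks
            System.out.println("records in batch: " + n);
            rdd.unpersist();                // free the cached blocks before the next batch
        }
    }

An alternative with the same effect is to drop the increment from the map entirely and call counter.add((int) rdd.count()) inside foreachRDD, so the update happens exactly once per batch on the driver.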