Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-15 Thread Sean Owen
The problem is not ReduceWords, since it is already Serializable by implementing Function2. Indeed the error tells you just what is unserializable: KafkaStreamingWordCount, your driver class. Something is causing a reference to the containing class to be serialized in the closure. The best fix is
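The failure mode Sean describes can be reproduced outside Spark: an anonymous inner class defined in an instance method carries a hidden reference to its enclosing instance, so serializing it drags the enclosing class along. A minimal sketch, with hypothetical names (`Fn`, `Add`, `ClosureDemo` are illustrative, not from the thread):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureDemo {
    // Stand-in for Spark's Function2: a serializable two-argument function.
    interface Fn extends Serializable { int apply(int a, int b); }

    // Static nested class: no hidden reference to ClosureDemo, serializes cleanly.
    static class Add implements Fn {
        private static final long serialVersionUID = 1L;
        public int apply(int a, int b) { return a + b; }
    }

    // Anonymous inner class built in an instance method captures `this`
    // (the enclosing ClosureDemo instance) via a synthetic this$0 field.
    Fn anonymous() {
        return new Fn() {
            public int apply(int a, int b) { return a + b; }
        };
    }

    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            // The exception names the enclosing class, not the function itself.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new Add()));                     // static class: ok
        System.out.println(serializes(new ClosureDemo().anonymous())); // captures outer: fails
    }
}
```

This mirrors the fix implied above: define the function as a top-level or static nested class so nothing in the driver class is captured.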

Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-15 Thread Abraham Jacob
Hi All, I figured out what the problem was. Thank you Sean for pointing me in the right direction. All the jibber jabber about the empty DStream / RDD was just pure nonsense. I guess the sequence of events (the fact that Spark Streaming started crashing just after I implemented the

Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
Hi All, I am trying to understand what is going on in my simple WordCount Spark Streaming application. Here is the setup - I have a Kafka producer that is streaming words (lines of text). On the flip side, I have a Spark Streaming application that uses the high-level Kafka/Spark connector to
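For readers following along, the per-key reduction that reduceByKey performs on the word pairs can be sketched locally in plain Java, without a Spark cluster. This is only a model of the semantics (the names `WordCountSketch` and `reduceByKey` here are illustrative, not Spark's distributed implementation):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class WordCountSketch {
    // What reduceByKey does per key, sketched locally:
    // merge all values sharing a key with the supplied reduce function.
    static Map<String, Integer> reduceByKey(List<Map.Entry<String, Integer>> pairs,
                                            BinaryOperator<Integer> reduce) {
        Map<String, Integer> out = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            out.merge(p.getKey(), p.getValue(), reduce);
        }
        return out;
    }

    public static void main(String[] args) {
        // Emit (word, 1) pairs, as the streaming word count would per batch.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : "the quick the lazy the".split(" ")) {
            pairs.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        Map<String, Integer> counts = reduceByKey(pairs, Integer::sum);
        System.out.println(counts.get("the")); // 3
    }
}
```

In the actual application, `Integer::sum` plays the role that ReduceWords plays in the thread below.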

Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
Yeah... it totally should be... There is nothing fancy in there -

import org.apache.spark.api.java.function.Function2;

public class ReduceWords implements Function2<Integer, Integer, Integer> {
    private static final long serialVersionUID = -6076139388549335886L;
    public Integer call(Integer

Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
The results are no different -

import org.apache.spark.api.java.function.Function2;
import java.io.Serializable;

public class ReduceWords implements Serializable, Function2<Integer, Integer, Integer> {
    private static final long serialVersionUID = -6076139388549335886L;
    public Integer