Re: Direct Kafka input stream and window(…) function

2016-03-29 Thread Martin Soch
Hi Cody, thanks for your answer. I have finally managed to create a simple code sample. Here it is:

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import

Re: Direct Kafka input stream and window(…) function

2016-03-24 Thread Cody Koeninger
If this is related to https://issues.apache.org/jira/browse/SPARK-14105 , are you windowing before doing any transformations at all? Try using map to extract the data you care about before windowing. On Tue, Mar 22, 2016 at 12:24 PM, Cody Koeninger wrote: > I definitely have
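The workaround Cody describes, applying map() to pull out the payload before calling window(), might look like the following Java sketch (Java matches Martin's sample). The broker address, topic name, class name, and durations are invented for illustration, and this assumes the Spark 1.x spark-streaming-kafka direct API:

```java
import java.util.*;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class MapBeforeWindow {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("map-before-window");
        JavaStreamingContext jssc =
                new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "localhost:9092");
        Set<String> topics = new HashSet<>(Arrays.asList("events"));

        JavaPairInputDStream<String, String> direct =
                KafkaUtils.createDirectStream(
                        jssc, String.class, String.class,
                        StringDecoder.class, StringDecoder.class,
                        kafkaParams, topics);

        // Extract the values with map() first, so that window() runs on a
        // plain DStream rather than directly on the direct Kafka stream
        // (the situation SPARK-14105 describes).
        JavaDStream<String> values = direct.map(tuple -> tuple._2());
        JavaDStream<String> windowed =
                values.window(Durations.seconds(60), Durations.seconds(20));

        // Per Cody's earlier advice: force every partition instead of
        // relying on print(), which samples only the first partition.
        windowed.foreachRDD(rdd -> {
            rdd.foreach(line -> System.out.println(line));
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```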

Re: Direct Kafka input stream and window(…) function

2016-03-22 Thread Cody Koeninger
I definitely have direct stream jobs that use window() without problems... Can you post a minimal code example that reproduces the problem? Using print() will confuse the issue, since print() will try to use only the first partition. Use foreachRDD { rdd => rdd.foreach(println) } or something

Direct Kafka input stream and window(…) function

2016-03-22 Thread Martin Soch
Hi all, I am using a direct Kafka input stream in my Spark app. When I use the window(...) function in the chain, it causes the processing pipeline to stall: when I open the Spark UI I can see that the streaming batches are being queued and the pipeline reports to process one of the first
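As background on what the window(...) call in this thread computes: each windowed batch is the union of the source batches falling inside the window length, emitted once per slide interval. A toy pure-Java simulation of that semantics (no Spark involved; batch contents and the class name are invented, and window length and slide are expressed in batch counts for simplicity):

```java
import java.util.*;

public class WindowDemo {
    // Simulate DStream.window(windowLength, slideInterval) over a fixed
    // list of batches: every `slide` batches, emit the union of the last
    // `windowLen` batches.
    static List<List<String>> window(List<List<String>> batches,
                                     int windowLen, int slide) {
        List<List<String>> out = new ArrayList<>();
        for (int end = windowLen; end <= batches.size(); end += slide) {
            List<String> merged = new ArrayList<>();
            for (List<String> b : batches.subList(end - windowLen, end)) {
                merged.addAll(b);
            }
            out.add(merged);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a"), Arrays.asList("b"),
                Arrays.asList("c"), Arrays.asList("d"));
        // Window of 2 batches, sliding by 1 batch.
        System.out.println(window(batches, 2, 1));
        // prints [[a, b], [b, c], [c, d]]
    }
}
```

Because each window re-reads several underlying batches, a stall in the source stream (as described above) quickly shows up as a growing queue of pending windowed batches in the Spark UI.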