Hi Cody,
thanks for your answer. I have finally managed to put together a simple
code sample. Here it is:
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import
If this is related to
https://issues.apache.org/jira/browse/SPARK-14105 , are you windowing
before doing any transformations at all? Try using map to extract the
data you care about before windowing.
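As a concrete illustration of "map before windowing", here is a minimal sketch in the style of the Java imports above. It assumes the Kafka 0.8 direct-stream API (spark-streaming-kafka, Spark 1.6 era); the broker address, topic name, and class name are hypothetical placeholders, not from the original thread:

```java
import java.util.*;
import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

public class MapBeforeWindow {
  public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf().setAppName("MapBeforeWindow");
    JavaStreamingContext jssc =
        new JavaStreamingContext(conf, Durations.seconds(5));

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "localhost:9092"); // assumed broker
    Set<String> topics = Collections.singleton("events");      // assumed topic

    JavaPairInputDStream<String, String> stream =
        KafkaUtils.createDirectStream(
            jssc, String.class, String.class,
            StringDecoder.class, StringDecoder.class,
            kafkaParams, topics);

    // First map down to just the data you care about (here, the value)...
    JavaDStream<String> values = stream.map(Tuple2::_2);

    // ...and only then apply window() to the transformed stream.
    JavaDStream<String> windowed =
        values.window(Durations.seconds(30), Durations.seconds(10));

    windowed.foreachRDD(rdd -> rdd.foreach(v -> System.out.println(v)));

    jssc.start();
    jssc.awaitTermination();
  }
}
```

The point is that the window operates on a plain mapped DStream rather than directly on the raw direct Kafka stream, which is the workaround suggested for SPARK-14105.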
On Tue, Mar 22, 2016 at 12:24 PM, Cody Koeninger wrote:
I definitely have direct stream jobs that use window() without
problems... Can you post a minimal code example that reproduces the
problem?
Using print() will confuse the issue, since print() will try to use
only the first partition.
Use foreachRDD { rdd => rdd.foreach(println) }
or something similar.
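Cody's snippet above is Scala; since the sample code in this thread is Java, a minimal Java equivalent might look like the sketch below. It assumes you already have a `JavaDStream<String>` (the method and parameter names are made up for illustration):

```java
import org.apache.spark.streaming.api.java.JavaDStream;

class DebugOutput {
  // Unlike print(), which only pulls a handful of elements from the
  // first partition(s), foreachRDD + foreach visits every partition
  // of every batch, so all consumed records are printed.
  static void dumpEveryElement(JavaDStream<String> stream) {
    stream.foreachRDD(rdd -> rdd.foreach(value -> System.out.println(value)));
  }
}
```

Note that the println runs on the executors, so in a real cluster the output lands in the executor logs, not the driver console.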
Hi all,
I am using a direct Kafka input stream in my Spark app. When I use the
window(...) function in the chain, it causes the processing pipeline
to stop: when I open the Spark UI I can see that the streaming batches
are being queued and the pipeline reports that it is processing one of the first