If this is related to
https://issues.apache.org/jira/browse/SPARK-14105, are you windowing
before doing any transformations at all?  Try using map to extract the
data you care about before windowing.
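
Something like this, in Java against the 1.x direct Kafka API (a rough
sketch, assuming Spark 1.6; jssc, kafkaParams and topics are placeholders
for your own setup):

import kafka.serializer.StringDecoder;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.kafka.KafkaUtils;

// Direct stream of (key, message) pairs from Kafka.
JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
    kafkaParams, topics);

// Extract the payload with a plain transformation first...
JavaDStream<String> messages = stream.map(record -> record._2());

// ...and only then window the transformed stream.
JavaDStream<String> windowed =
    messages.window(Durations.seconds(10), Durations.seconds(5));

// Consume with foreachRDD (print() would only touch the first partition).
windowed.foreachRDD(rdd -> rdd.foreach(record -> System.out.println(record)));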

On Tue, Mar 22, 2016 at 12:24 PM, Cody Koeninger <c...@koeninger.org> wrote:
> I definitely have direct stream jobs that use window() without
> problems... Can you post a minimal code example that reproduces the
> problem?
>
> Using print() will confuse the issue, since print() will try to use
> only the first partition.
>
> Use foreachRDD { rdd => rdd.foreach(println) }
>
> or something comparable
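>
> e.g. with the Java API (assuming Spark 1.6, where foreachRDD takes a
> VoidFunction):
>
> s.foreachRDD(rdd -> rdd.foreach(record -> System.out.println(record)));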
>
> On Tue, Mar 22, 2016 at 10:14 AM, Martin Soch <martin.s...@oracle.com> wrote:
>> Hi all,
>>
>> I am using a direct Kafka input stream in my Spark app. When I use the
>> window(...) function in the chain, the processing pipeline stops: when I
>> open the Spark UI I can see that the streaming batches are queueing up
>> while the UI reports that one of the first batches is still being
>> processed.
>>
>> To be more precise: the issue happens only when the windows overlap
>> (i.e. when sliding_interval < window_length). Otherwise the system
>> behaves as expected.
>>
>> Derivatives of the window(..) function, such as reduceByKeyAndWindow(..),
>> also work as expected; the pipeline doesn't stop. The same applies when
>> using a different type of stream.
>>
>> Is this a known limitation of the window(..) function when used with the
>> direct Kafka input stream?
>>
>> Java pseudo code:
>> org.apache.spark.streaming.kafka.DirectKafkaInputDStream s;
>> s.window(Durations.seconds(10)).print();  // the pipeline will stop
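>>
>> With an explicit slide interval the two cases look like this (pseudo
>> code again; the durations are just illustrative):
>>
>> s.window(Durations.seconds(10), Durations.seconds(5)).print();   // slide < length: windows overlap, pipeline stops
>> s.window(Durations.seconds(10), Durations.seconds(10)).print();  // slide == length: works as expected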
>>
>> Thanks
>> Martin

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
