It looked from your graphs like you had a 10 second batch time, but
that your processing time was consistently 11 seconds. If that's correct,
then yes, your delay is going to keep growing. You'd need to either
increase your batch time or get your processing time down (either by
adding more
That's correct, I have a 10 second batch.
The problem is actually the processing time: it keeps increasing no
matter how small or large my window duration is.
I am trying to prepare some example code to clarify my use case.
--
Yandex.Mail - reliable mail
Thank you very much for the great answer!
08.09.2015, 23:53, "Cody Koeninger" :
> Yeah, that's the general idea.
>
> When you say hard code topic name, do you mean Set(topicA, topicB, topicC)?
>
Oh my, I implemented one directStream instead of a union of three, but it is
still growing exponentially with the window method.
Yeah, that's the general idea.
When you say hard code topic name, do you mean Set(topicA, topicB, topicC)?
You should be able to use a variable for that - read it from a config
file, whatever.
If you're talking about the match statement, yeah you'd need to hardcode
your cases.
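[Editor's note: a minimal sketch of the hardcoded match described above. The topic names ("A", "B"), the Edge alias, and the stand-in decoders are assumptions for illustration; the real job would decode Avro bytes here instead.]

```scala
object TopicDispatch {
  type Edge = (String, String)

  // Hypothetical per-topic decoders standing in for the real Avro
  // deserialization of each topic's payload.
  def decodeA(payload: Array[Byte]): Edge = (new String(payload, "UTF-8"), "from-A")
  def decodeB(payload: Array[Byte]): Edge = (new String(payload, "UTF-8"), "from-B")

  // The hardcoded cases: one per topic, failing loudly on anything unknown.
  def decode(topic: String, payload: Array[Byte]): Edge = topic match {
    case "A"   => decodeA(payload)
    case "B"   => decodeB(payload)
    case other => throw new IllegalArgumentException(s"no decoder for topic $other")
  }
}
```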
On Tue, Sep
Ok.
Spark 1.4.1 on yarn
Here is my application
I have 4 different Kafka topics (different object streams)
type Edge = (String,String)
val a = KafkaUtils.createDirectStream[...](sc,"A",params).filter( nonEmpty ).map( toEdge )
val b = KafkaUtils.createDirectStream[...](sc,"B",params).filter( nonEmpty ).map( toEdge )
The thing is that these topics contain completely different Avro
objects (Array[Byte]) that I need to deserialize into different Java (Scala)
objects, filter, and then map to the tuple (String, String). So I have 3 streams
with different Avro objects in them. I need to cast them (using some business
I'm not 100% sure what's going on there, but why are you doing a union in
the first place?
If you want multiple topics in a stream, just pass them all in the set of
topics to one call to createDirectStream
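[Editor's note: a sketch of keeping the topic list out of the code, assuming a comma-separated config value; the config key and format are hypothetical. The single createDirectStream call is shown only as a comment, since it needs a running StreamingContext and the Kafka integration jar.]

```scala
object TopicConfig {
  // Parse a comma-separated topic list (e.g. read from a config file)
  // into the Set that one createDirectStream call accepts.
  def parseTopics(raw: String): Set[String] =
    raw.split(",").map(_.trim).filter(_.nonEmpty).toSet
}

// With Spark 1.4's Kafka integration the whole set is passed once, roughly:
//   val topics = TopicConfig.parseTopics(config.getString("kafka.topics"))
//   val stream = KafkaUtils.createDirectStream[String, Array[Byte],
//     StringDecoder, DefaultDecoder](ssc, kafkaParams, topics)
```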
On Tue, Sep 8, 2015 at 10:52 AM, Alexey Ponkin wrote:
> Ok.
> Spark
That doesn't really matter. With the direct stream you'll get all objects
for a given topic-partition in the same Spark partition. You know what
topic it's from via HasOffsetRanges. Then you can deserialize
appropriately based on topic.
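[Editor's note: the pattern described above can be sketched as below. The real code reads rdd.asInstanceOf[HasOffsetRanges].offsetRanges inside transform, which needs a live StreamingContext, so this models the per-partition dispatch over plain collections; all names are assumptions.]

```scala
object PartitionDispatch {
  // Stand-in for Kafka's OffsetRange: with the direct stream, partition i
  // of the batch RDD holds exactly the messages of offsetRanges(i), so the
  // topic is known per partition.
  case class Range(topic: String)

  // Route each partition's payloads through a topic-aware decoder,
  // mirroring rdd.mapPartitionsWithIndex over offsetRanges in the real job.
  def dispatch(ranges: Array[Range],
               partitions: Seq[Seq[Array[Byte]]],
               decode: (String, Array[Byte]) => (String, String)): Seq[Seq[(String, String)]] =
    partitions.zipWithIndex.map { case (msgs, i) =>
      msgs.map(m => decode(ranges(i).topic, m))
    }
}

// In the actual Spark 1.4 job the same idea reads, roughly:
//   stream.transform { rdd =>
//     val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
//     rdd.mapPartitionsWithIndex { (i, iter) =>
//       iter.map { case (_, bytes) => decode(ranges(i).topic, bytes) }
//     }
//   }
```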
On Tue, Sep 8, 2015 at 11:16 AM, Понькин Алексей
Hi,
I have an application with 2 streams, which are joined together.
Stream1 is a simple DStream (relatively small batches).
Stream2 is a windowed DStream (with a duration of, for example, 60 seconds).
Stream1 and Stream2 are both Kafka direct streams.
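[Editor's note: a pure-collection model of what joining a batch of Stream1 against a windowed Stream2 computes, for illustration only; the function and parameter names are hypothetical. With a 10 second batch and a 60 second window, each Stream1 batch is joined against the union of the last 6 Stream2 batches, which is why window size directly multiplies the work per batch.]

```scala
object WindowJoinModel {
  // Join one batch of keyed records against the flattened contents of the
  // last window/batch batches of the other stream, as a window-join does.
  def joinWithWindow[K, V, W](batch1: Seq[(K, V)],
                              recentBatches2: Seq[Seq[(K, W)]]): Seq[(K, (V, W))] = {
    val windowed: Seq[(K, W)] = recentBatches2.flatten
    for {
      (k, v)  <- batch1
      (k2, w) <- windowed
      if k == k2
    } yield (k, (v, w))
  }
}
```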
The problem is that according to logs window
Can you provide more info (what version of spark, code example)?
On Tue, Sep 8, 2015 at 8:18 AM, Alexey Ponkin wrote:
> Hi,
>
> I have an application with 2 streams, which are joined together.
> Stream1 is a simple DStream (relatively small batches).
> Stream2 - is a