Hi Tim, I have not tried persisting the RDD.
There is some discussion on rate limiting Spark Streaming input from Kafka in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-rate-limiting-from-kafka-td8590.html

There is a pull request (https://github.com/apache/spark/pull/945/files) that addresses this rate limiting issue at the BlockGenerator level, but when testing under heavy load that fix did not solve my problem, so I had to build rate limiting into the Kafka consumer itself. I will make it configurable soon. Without it, I can see blocks getting dropped, which leads to job failure. I have raised this in another thread (https://mail.google.com/mail/u/1/?tab=wm#search/Serious/148650fd829cd239) but have not yet received an answer on whether this is a bug (blocks getting dropped and the job failing).

Dib

On Mon, Sep 15, 2014 at 10:33 PM, Tim Smith <secs...@gmail.com> wrote:
> Hi Dibyendu,
>
> I am a little confused about the need for rate limiting input from
> kafka. If the stream coming in from kafka has a higher message/second
> rate than what a Spark job can process, then it should simply build a
> backlog in Spark if the RDDs are cached on disk using persist().
> Right?
>
> Thanks,
>
> Tim
>
>
> On Mon, Sep 15, 2014 at 4:33 AM, Dibyendu Bhattacharya
> <dibyendu.bhattach...@gmail.com> wrote:
> > Hi Alon,
> >
> > No, this does not guarantee that the same set of messages will come
> > in the same RDD. This fix just re-plays the messages from the last
> > processed offset in the same order. Again, this is just an interim fix
> > we needed to solve our use case. If you do not need this message
> > re-play feature, just do not perform the ack (acknowledgement) call in
> > the Driver code. Then the processed messages will not be written to
> > ZK, and hence replay will not happen.
> >
> > Regards,
> > Dibyendu
> >
> > On Mon, Sep 15, 2014 at 4:48 PM, Alon Pe'er <alo...@supersonicads.com>
> > wrote:
> >>
> >> Hi Dibyendu,
> >>
> >> Thanks for your great work!
> >>
> >> I'm new to Spark Streaming, so I just want to make sure I understand
> >> the Driver failure issue correctly.
> >>
> >> In my use case, I want to make sure that messages coming in from
> >> Kafka are always broken into the same set of RDDs, meaning that if a
> >> set of messages is assigned to one RDD, and the Driver dies before
> >> this RDD is processed, then once the Driver recovers, the same set of
> >> messages is assigned to a single RDD, instead of arbitrarily
> >> repartitioning the messages across different RDDs.
> >>
> >> Does your Receiver guarantee this behavior, until the problem is
> >> fixed in Spark 1.2?
> >>
> >> Regards,
> >> Alon
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p14233.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
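To clarify what I mean by rate limiting in the consumer: the idea is simply to cap how many messages per second the receiver pulls from Kafka, so that blocks are not generated faster than Spark can store and process them. This is only a rough sketch of that shape, not the actual consumer code (the class and parameter names here are made up for illustration):

```python
import time

class RateLimiter:
    """Simple token-bucket limiter: allows at most `rate` messages per second."""

    def __init__(self, rate, burst=None):
        self.rate = float(rate)                # tokens refilled per second
        self.capacity = float(burst or rate)   # maximum bucket size
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self, n=1):
        """Block until n tokens are available, then consume them."""
        while True:
            now = time.monotonic()
            # Refill tokens based on elapsed time, capped at bucket capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            # Not enough tokens yet: sleep until the deficit is refilled.
            time.sleep((n - self.tokens) / self.rate)

# In the consumer's fetch loop, each message would pass through the limiter,
# e.g.:  limiter.acquire(); store(message)
limiter = RateLimiter(rate=1000)   # cap at 1000 messages/sec (to be configurable)
```

The point of the token bucket over a fixed sleep per message is that short bursts up to the bucket capacity go through immediately, while the sustained rate stays bounded.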
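And to illustrate the ack/replay behaviour I described to Alon: the consumer only advances the saved offset when the driver performs the ack, so after a restart it re-reads from the last acknowledged offset, in the same order. A simplified sketch of that bookkeeping, with an in-memory dict standing in for ZK and all names invented for illustration:

```python
class OffsetTracker:
    """Replay-from-last-acked-offset bookkeeping; a dict stands in for ZK."""

    def __init__(self, store):
        self.store = store  # persistent offset store (ZK in the real consumer)

    def start_offset(self, partition):
        # On (re)start, resume from the offset after the last acknowledged one.
        return self.store.get(partition, -1) + 1

    def ack(self, partition, offset):
        # Called from the driver after a batch is fully processed. If ack is
        # never called, nothing is written, so the same messages are
        # re-played after a restart.
        self.store[partition] = offset

# Example: the driver dies before acking the second batch.
zk = {}
tracker = OffsetTracker(zk)
log = list(range(10))                            # messages at offsets 0..9
first = log[tracker.start_offset(0):tracker.start_offset(0) + 5]
tracker.ack(0, 4)                                # batch [0..4] processed, acked
# crash + restart: batch [5..9] was fetched but never acked
replay = log[tracker.start_offset(0):]           # resumes from offset 5
```

Note this only guarantees replay from the last acked offset in the same order; as I said above, it does not guarantee that the replayed messages land in the same RDD boundaries as before.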