Hi Tim, I have not tried persisting the RDD.
There is some discussion on rate limiting Spark Streaming input from Kafka in this thread: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-rate-limiting-from-kafka-td8590.html

There is a pull request (https://github.com/apache/spark/pull/945/files) that addresses this rate limiting issue at the BlockGenerator level, but when testing under heavy load that fix did not solve my problem, so I had to build rate limiting into the Kafka consumer itself. I will make it configurable soon. Without it, I can see blocks getting dropped, which leads to job failure. I have raised this in another thread (https://mail.google.com/mail/u/1/?tab=wm#search/Serious/148650fd829cd239) but have not yet received an answer on whether this is a bug (blocks getting dropped and the job failing).

Dib

On Mon, Sep 15, 2014 at 10:33 PM, Tim Smith <secs...@gmail.com> wrote:
> Hi Dibyendu,
>
> I am a little confused about the need for rate limiting input from
> kafka. If the stream coming in from kafka has a higher message/second
> rate than what a Spark job can process, then it should simply build a
> backlog in Spark if the RDDs are cached on disk using persist().
> Right?
>
> Thanks,
>
> Tim
>
>
> On Mon, Sep 15, 2014 at 4:33 AM, Dibyendu Bhattacharya
> <dibyendu.bhattach...@gmail.com> wrote:
> > Hi Alon,
> >
> > No, this does not guarantee that the same set of messages will come
> > in the same RDD. This fix just re-plays the messages from the last
> > processed offset in the same order. Again, this is just an interim fix
> > we needed to solve our use case. If you do not need this message
> > re-play feature, just do not perform the ack (acknowledgement) call in
> > the Driver code. Then the processed messages will not be written to
> > ZK, and hence replay will not happen.
> >
> > Regards,
> > Dibyendu
> >
> > On Mon, Sep 15, 2014 at 4:48 PM, Alon Pe'er <alo...@supersonicads.com>
> > wrote:
> >>
> >> Hi Dibyendu,
> >>
> >> Thanks for your great work!
> >>
> >> I'm new to Spark Streaming, so I just want to make sure I understand
> >> the Driver failure issue correctly.
> >>
> >> In my use case, I want to make sure that messages coming in from
> >> Kafka are always broken into the same set of RDDs, meaning that if a
> >> set of messages is assigned to one RDD, and the Driver dies before
> >> this RDD is processed, then once the Driver recovers, the same set of
> >> messages is assigned to a single RDD, instead of arbitrarily
> >> repartitioning the messages across different RDDs.
> >>
> >> Does your Receiver guarantee this behavior, until the problem is
> >> fixed in Spark 1.2?
> >>
> >> Regards,
> >> Alon
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p14233.html
> >> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
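To clarify what I mean by rate limiting in the consumer: the idea is simply to cap how many messages per second the receiver pulls from Kafka, so that blocks are not generated faster than Spark can store and process them. This is only a rough sketch of that shape, not the actual consumer code (the class and parameter names here are made up for illustration):

```python
import time

class RateLimiter:
    """Simple token-bucket limiter: allows at most `rate` messages per second."""

    def __init__(self, rate, burst=None):
        self.rate = float(rate)                # tokens refilled per second
        self.capacity = float(burst or rate)   # maximum bucket size
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self, n=1):
        """Block until n tokens are available, then consume them."""
        while True:
            now = time.monotonic()
            # Refill tokens based on elapsed time, capped at bucket capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            # Not enough tokens yet: sleep until the deficit is refilled.
            time.sleep((n - self.tokens) / self.rate)

# In the consumer's fetch loop, each message would pass through the limiter,
# e.g.:  limiter.acquire(); store(message)
limiter = RateLimiter(rate=1000)   # cap at 1000 messages/sec (to be configurable)
```

The point of the token bucket over a fixed sleep per message is that short bursts up to the bucket capacity go through immediately, while the sustained rate stays bounded.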
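And to illustrate the ack/replay behaviour I described to Alon: the consumer only advances the saved offset when the driver performs the ack, so after a restart it re-reads from the last acknowledged offset, in the same order. A simplified sketch of that bookkeeping, with an in-memory dict standing in for ZK and all names invented for illustration:

```python
class OffsetTracker:
    """Replay-from-last-acked-offset bookkeeping; a dict stands in for ZK."""

    def __init__(self, store):
        self.store = store  # persistent offset store (ZK in the real consumer)

    def start_offset(self, partition):
        # On (re)start, resume from the offset after the last acknowledged one.
        return self.store.get(partition, -1) + 1

    def ack(self, partition, offset):
        # Called from the driver after a batch is fully processed. If ack is
        # never called, nothing is written, so the same messages are
        # re-played after a restart.
        self.store[partition] = offset

# Example: the driver dies before acking the second batch.
zk = {}
tracker = OffsetTracker(zk)
log = list(range(10))                            # messages at offsets 0..9
first = log[tracker.start_offset(0):tracker.start_offset(0) + 5]
tracker.ack(0, 4)                                # batch [0..4] processed, acked
# crash + restart: batch [5..9] was fetched but never acked
replay = log[tracker.start_offset(0):]           # resumes from offset 5
```

Note this only guarantees replay from the last acked offset in the same order; as I said above, it does not guarantee that the replayed messages land in the same RDD boundaries as before.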