Can you be a bit more specific about what “blow up” means? Also, what do you 
mean by “messed up” brokers? An imbalance? Broker(s) dead?

We’re also using the direct consumer and so far nothing dramatic has happened 
(a rough sketch of our setup follows the list):
- on READ, it automatically reads from replicas if the leader is dead (machine gone)
- on READ, if there is a huge imbalance (partitions/leaders), the job might slow 
down if you don’t have enough cores on the machine hosting many partitions
- on WRITE, we’ve seen a weird delay of ~7 seconds that I don’t know how to 
re-configure; there’s a timeout that delays the job, but it eventually writes the 
data to a replica
- the job only died when there were no brokers left and there were partitions 
without a leader. This happened when almost half the cluster was dead during a 
reliability test
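
Roughly what the read side of our setup looks like, for reference. This is only a 
minimal sketch: the broker addresses, topic name and retry/backoff values are 
placeholders you’d tune for your own cluster, and it doesn’t address the write-path 
delay mentioned above.

import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("kafka-direct-example")
  // let the direct API retry fetching leader offsets a few times
  // before failing the batch (default is 1)
  .set("spark.streaming.kafka.maxRetries", "5")

val ssc = new StreamingContext(conf, Seconds(10))

val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092,broker2:9092",
  // how long to back off before re-fetching metadata after a leader is lost
  "refresh.leader.backoff.ms" -> "2000")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("my-topic"))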

Regardless, I would look at the source and try to monitor the Kafka cluster for 
things like partitions without leaders or big imbalances; a small check for that is 
sketched below.
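
Something along these lines can spot leaderless partitions. It’s a rough sketch 
against the 0.8.x Scala client that the direct stream already uses; the broker host, 
port and topic are placeholders. Running kafka-topics.sh --describe with 
--unavailable-partitions should give you the same information from the command line.

import kafka.api.TopicMetadataRequest
import kafka.consumer.SimpleConsumer

// Any one reachable broker can serve metadata for the whole cluster.
val consumer = new SimpleConsumer("broker1", 9092, 10000, 64 * 1024, "leader-monitor")
try {
  val response = consumer.send(new TopicMetadataRequest(Seq("my-topic"), 0))
  for {
    topic     <- response.topicsMetadata
    partition <- topic.partitionsMetadata
    if partition.leader.isEmpty   // no leader elected for this partition
  } println(s"ALERT: ${topic.topic}/${partition.partitionId} has no leader")
} finally {
  consumer.close()
}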

Hope this helps,
-adrian





On 11/9/15, 8:26 PM, "swetha" <swethakasire...@gmail.com> wrote:

>Hi,
>
>How to recover Kafka Direct automatically when there is a problem with
>Kafka brokers? Sometimes our Kafka brokers get messed up and the entire
>Streaming job blows up, unlike some other consumers which do recover
>automatically. How can I make sure that Kafka Direct recovers automatically
>when the broker fails for some time, say 30 minutes? What kind of monitors
>should be in place to recover the job?
>
>Thanks,
>Swetha 
>
>
>
>--
>View this message in context: 
>http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Direct-does-not-recover-automatically-when-the-Kafka-Stream-gets-messed-up-tp25331.html
>Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
