Hi Kyle, I added new documentation, which will hopefully help. Please take a look here: https://issues.apache.org/jira/browse/KAFKA-1555
I've heard rumors that you are very, very good at documenting, so I'm looking forward to your comments. Note that I'm completely ignoring the acks>1 case since we are about to remove it.

Gwen

On Wed, Oct 15, 2014 at 1:21 PM, Kyle Banker <kyleban...@gmail.com> wrote:
> Thanks very much for these clarifications, Gwen.
>
> I'd recommend modifying the following phrase describing "acks=-1":
>
> "This option provides the best durability, we guarantee that no messages
> will be lost as long as at least one in sync replica remains."
>
> The "as long as at least one in sync replica remains" is such a huge
> caveat. It should be noted that "acks=-1" provides no actual durability
> guarantees unless min.isr is also used to specify a majority of replicas.
>
> In addition, I was curious if you might comment on my other recent posting,
> "Consistency and Availability on Node Failures", and possibly add this
> scenario to the docs. With acks=-1 and min.isr=2 and a 3-replica topic in a
> 12-node Kafka cluster, there's a relatively high probability that losing 2
> nodes from this cluster will result in an inability to write to the cluster.
>
> On Tue, Oct 14, 2014 at 4:50 PM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> acks=2 *will* throw an exception when there's only one node in ISR.
>>
>> The problem with acks=2 is that if you have 3 replicas and you got acks
>> from 2 of them, the one replica which did not get the message can
>> still be in ISR and get elected as leader, leading to the loss of the
>> message. If you specify acks=3, you can't tolerate the failure of a
>> single replica. Not amazing either.
>>
>> To make things even worse, when specifying the number of acks you
>> want, you don't always know how many replicas the topic should have,
>> so it's difficult to pick the correct number.
>>
>> acks=-1 solves that problem (since all messages need to get acked by
>> all replicas), but introduces the new problem of not getting an
>> exception if the ISR shrank to 1 replica.
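[Editor's note: the ack semantics described above can be sketched as a toy model. This is not Kafka code; `try_write` and its parameters are hypothetical names invented here purely to illustrate how the acks setting and a minimum-ISR check interact.]

```python
# Toy model (NOT Kafka code) of the ack semantics discussed in this thread.
def try_write(isr_count, acks, min_isr=1):
    """Return 'ack' if the write would be acknowledged, else raise."""
    if isr_count < min_isr:
        # With a min-ISR setting, the producer gets an error instead of
        # a silently under-replicated write.
        raise RuntimeError("NotEnoughReplicas: ISR=%d < min_isr=%d"
                           % (isr_count, min_isr))
    if acks == -1:
        return "ack"  # every *current* ISR member acked -- even if ISR is just 1
    if isr_count < acks:
        # Gwen's first point: acks=2 with a 1-node ISR does throw.
        raise RuntimeError("cannot gather %d acks from an ISR of %d"
                           % (acks, isr_count))
    return "ack"

# The failure mode above: acks=-1, ISR shrunk to 1, write still succeeds.
print(try_write(isr_count=1, acks=-1))        # -> ack
# Same situation with min_isr=2: the producer sees an exception instead.
# try_write(isr_count=1, acks=-1, min_isr=2)  # raises RuntimeError
```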
>>
>> That's why the min.isr configuration was added.
>>
>> I hope this clarifies things :)
>> I'm planning to add this to the docs in a day or two, so let me know
>> if there are any additional explanations or scenarios you think we
>> need to include.
>>
>> Gwen
>>
>> On Tue, Oct 14, 2014 at 12:27 PM, Scott Reynolds <sreyno...@twilio.com> wrote:
>> > A question about 0.8.1.1 and acks. I was under the impression that setting
>> > acks to 2 will not throw an exception when there is only one node in ISR.
>> > Am I incorrect? Thus the need for min_isr.
>> >
>> > On Tue, Oct 14, 2014 at 11:50 AM, Kyle Banker <kyleban...@gmail.com> wrote:
>> >
>> >> It's quite difficult to infer from the docs the exact techniques required
>> >> to ensure consistency and durability in Kafka. I propose that we add a doc
>> >> section detailing these techniques. I would be happy to help with this.
>> >>
>> >> The basic question is this: assuming that I can afford to temporarily halt
>> >> production to Kafka, how do I ensure that no message written to Kafka is
>> >> ever lost under typical failure scenarios (i.e., the loss of a single
>> >> broker)?
>> >>
>> >> Here's my understanding of this for Kafka v0.8.1.1:
>> >>
>> >> 1. Create a topic with a replication factor of 3.
>> >> 2. Use a sync producer and set acks to 2. (Setting acks to -1 may
>> >> successfully write even in a case where the data is written only to a
>> >> single node.)
>> >>
>> >> Even with these two precautions, there's always the possibility of an
>> >> "unclean leader election." Can data loss still occur in this scenario? Is
>> >> it possible to achieve this level of durability on v0.8.1.1?
>> >>
>> >> In Kafka v0.8.2, in addition to the above:
>> >>
>> >> 3. Ensure that the triple-replicated topic also disallows unclean leader
>> >> election (https://issues.apache.org/jira/browse/KAFKA-1028).
>> >>
>> >> 4. Set the min.isr value of the producer to 2 and acks to -1
>> >> (https://issues.apache.org/jira/browse/KAFKA-1555). The producer will then
>> >> throw an exception if data can't be written to 2 out of 3 nodes.
>> >>
>> >> In addition to producer configuration and usage, there are also monitoring
>> >> and operations considerations for achieving durability and consistency. As
>> >> those are rather nuanced, it'd probably be easiest to just start iterating
>> >> on a document to flesh those out.
>> >>
>> >> If anyone has any advice on how to better specify this, or how to get
>> >> started on improving the docs, I'm happy to help out.
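[Editor's note: Kyle's four-step recipe, and his availability caveat about losing 2 of 3 replicas, can be summarized in a small sketch. The config key names below (`min.insync.replicas`, `unclean.leader.election.enable`, `acks`) are the ones that eventually shipped in later Kafka releases; the 0.8.x-era names in the thread (`min.isr`, `request.required.acks`) differ, so treat these as assumptions, not a definitive 0.8.1.1 reference.]

```python
# Toy summary (NOT Kafka code) of the durability recipe in this thread.
topic_config = {
    "replication.factor": 3,                  # step 1: triple replication
    "unclean.leader.election.enable": False,  # step 3: KAFKA-1028
    "min.insync.replicas": 2,                 # step 4: KAFKA-1555 ("min.isr")
}
producer_config = {
    "acks": -1,  # step 4: wait for the full ISR, not a fixed replica count
}

# Durability/availability trade-off: writes keep succeeding only while no
# more than this many of the topic's replicas are down.
tolerated_failures = (topic_config["replication.factor"]
                      - topic_config["min.insync.replicas"])
print(tolerated_failures)  # -> 1: losing a 2nd replica makes the topic
                           # unwritable, which is Kyle's caveat upthread
```

The point of the arithmetic: with RF=3 and a minimum ISR of 2, the cluster trades write availability under a second replica failure for the guarantee that every acked message lives on at least 2 nodes.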