Re: [jira] [Commented] (KAFKA-1555) provide strong consistency with reasonable availability

Robert Withers Wed, 23 Jul 2014 19:06:28 -0700

Hi Jun,

Yes, that's what I am thinking. It allows maintaining a pool of offline but 
current and consistent replica shadows, ready to be flipped into the ISR. Being 
outside the ISR keeps them from being counted in the quorum, yet they are ready 
to go, so there is no impact on the producers.


Looked at through algebra sunglasses, we would establish a secondary space of 
replication, with a different dimensional projection into the parent meta 
space. That parent is the current ISR replication space, itself projected into 
the consumers' meta space as the leader partition. I am thinking it adds 
another layer of depth, to shore up the defenses.

- Rob

> On Jul 23, 2014, at 7:46 PM, "Jun Rao (JIRA)" <[email protected]> wrote:
> 
> 
>    [ 
> https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072690#comment-14072690
>  ] 
> 
> Jun Rao commented on KAFKA-1555:
> --------------------------------
> 
> Rob,
> 
> Is that any different from just running with a higher replication factor?
> 
>> provide strong consistency with reasonable availability
>> -------------------------------------------------------
>> 
>>                Key: KAFKA-1555
>>                URL: https://issues.apache.org/jira/browse/KAFKA-1555
>>            Project: Kafka
>>         Issue Type: Improvement
>>         Components: controller
>>   Affects Versions: 0.8.1.1
>>           Reporter: Jiang Wu
>>           Assignee: Neha Narkhede
>> 
>> In a mission critical application, we expect that a Kafka cluster with 3 
>> brokers can satisfy two requirements:
>> 1. When 1 broker is down, no message loss or service blocking happens.
>> 2. In worse cases, such as when two brokers are down, service can be 
>> blocked, but no message loss happens.
>> We found that the current Kafka version (0.8.1.1) cannot meet these 
>> requirements because of three behaviors:
>> 1. When choosing a new leader from 2 followers in ISR, the one with fewer 
>> messages may be chosen as the leader.
>> 2. Even when replica.lag.max.messages=0, a follower can stay in ISR when it 
>> has fewer messages than the leader.
>> 3. ISR can contain only 1 broker, therefore acknowledged messages may be 
>> stored on only 1 broker.
>> The following is an analytical proof. 
>> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
>> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
>> sync, i.e., they have the same messages and are all in ISR.
>> According to the value of request.required.acks (acks for short), there are 
>> the following cases.
>> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
>> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
>> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
>> be elected as the new leader, and consumers will miss m.
>> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
>> message m to A, and receives an acknowledgement. Disk failure happens in A 
>> before B and C replicate m. Message m is lost.
>> In summary, no existing configuration can satisfy the requirements.
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
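
The two failure cases in the quoted proof (acks=2 and acks=-1) can be replayed 
with a toy model. This is only a sketch in Python, not Kafka code: the 
Partition class and its methods are hypothetical names made up for 
illustration, and leader election is reduced to "any replica still in ISR may 
be picked", which is the behavior the report describes.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: a log per replica plus an ISR set."""
    logs: dict = field(default_factory=lambda: {"A": [], "B": [], "C": []})
    isr: set = field(default_factory=lambda: {"A", "B", "C"})
    leader: str = "A"

    def produce(self, msg, replicated_to):
        """Append msg on the leader and on the followers that fetched it."""
        for broker in {self.leader, *replicated_to}:
            self.logs[broker].append(msg)

    def fail_leader(self, new_leader):
        """Lose the leader and its log, then promote a replica still in ISR."""
        assert new_leader in self.isr and new_leader != self.leader
        self.isr.discard(self.leader)
        del self.logs[self.leader]
        self.leader = new_leader

    def readable(self):
        """Messages a consumer can read from the current leader."""
        return list(self.logs[self.leader])


def case_acks_2():
    """acks=2: m is acknowledged by A and B while C lags but stays in ISR."""
    p = Partition()
    p.produce("m", replicated_to={"B"})
    p.fail_leader("C")  # C may be elected although it has fewer messages
    return "m" in p.readable()


def case_acks_all():
    """acks=-1: B and C restarted and left ISR, so the ISR is A alone."""
    p = Partition()
    p.isr = {"A"}
    p.produce("m", replicated_to=set())  # acknowledged by the whole ISR: just A
    del p.logs["A"]  # disk failure on A before B and C replicate m
    survivors = [m for log in p.logs.values() for m in log]
    return "m" in survivors


if __name__ == "__main__":
    print("acks=2  keeps m:", case_acks_2())
    print("acks=-1 keeps m:", case_acks_all())
```

In both cases the acknowledged message m does not survive, which matches the 
conclusion of the proof. The model ignores timing, fetch requests and the 
controller, so it shows only the ordering of events, not real broker behavior.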
