[ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075038#comment-14075038
 ] 

saurabh agarwal commented on KAFKA-1555:
----------------------------------------

Jun, 

I agree that acks=-1 works fine for most use cases. Here is another 
suggestion that might address our use case. Can we add an additional producer 
property, "min.isr.required" (similar to dfs.replication.min), for 
durability? "acks=-1" ensures that every replica in the ISR receives the 
message before the producer gets the ack, and "min.isr.required=2" would 
ensure that at least two replicas are in the ISR before a message can be 
published. Otherwise the publish would fail with an exception such as 
"Number of the required replicas is not in ISR". 
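
A rough sketch of the proposed check, in Python for brevity. This is only an 
illustration of the idea: "min.isr.required" is the option suggested above, 
and the names NotEnoughReplicasInIsrError and append_with_min_isr are made up 
for this example; they are not existing Kafka APIs. 

```python
class NotEnoughReplicasInIsrError(Exception):
    """Hypothetical error raised by the proposed min.isr.required check."""


def append_with_min_isr(log, isr, message, min_isr_required=1):
    """Append `message` to `log` only if the ISR is large enough.

    `isr` is the set of broker ids currently in sync (leader included).
    Returns the offset of the appended message.
    """
    if len(isr) < min_isr_required:
        raise NotEnoughReplicasInIsrError(
            "Number of the required replicas is not in ISR: "
            f"have {len(isr)}, need {min_isr_required}"
        )
    log.append(message)
    return len(log) - 1
```

With min_isr_required=2, a publish while only the leader is in the ISR is 
rejected instead of being acknowledged with a single copy. 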

Here is an example where this would be very useful. Take a scenario where the 
producer is publishing at a very high rate and we bring down the two follower 
replicas. When we bring those replicas back up, they take a while to catch up 
because messages are still being published at the high rate. So for a while 
there is no replica in the ISR other than the leader. During this time, if the 
disk on the leader fails, no remaining replica has those messages. 
It would be good to have more than one copy in the ISR at all times. This 
would address our use case, where we need strong consistency and high 
durability with reasonable availability. 
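
The scenario can be traced with a toy model. This is a simplified simulation 
under stated assumptions, not Kafka code: brokers are plain lists, the ISR is 
a set, and publish, surviving_copies and run_scenario are hypothetical helpers 
written for this example. 

```python
def publish(replicas, isr, message, min_isr_required=1):
    """Toy acks=-1 publish: returns True if acked, False if rejected.

    replicas: dict of broker id -> list of messages stored on that broker.
    isr: set of broker ids currently in sync (always contains the leader).
    """
    if len(isr) < min_isr_required:
        return False  # rejected: too few in-sync replicas
    for broker in isr:  # acks=-1: every replica in the ISR stores the message
        replicas[broker].append(message)
    return True


def surviving_copies(replicas, failed_broker, message):
    """Count brokers other than `failed_broker` that hold `message`."""
    return sum(
        1
        for broker, log in replicas.items()
        if broker != failed_broker and message in log
    )


def run_scenario(min_isr_required):
    replicas = {"A": [], "B": [], "C": []}
    isr = {"A"}  # B and C were restarted and are still catching up
    acked = publish(replicas, isr, "m", min_isr_required)
    # The disk on leader A fails before B and C replicate "m".
    return acked, surviving_copies(replicas, "A", "m")
```

In this model run_scenario(1) acknowledges the message although no copy 
survives the leader's disk failure, while run_scenario(2) rejects the publish, 
so the producer knows the message was never accepted. 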

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Neha Narkhede
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases, such as when two brokers are down, service can be 
> blocked, but no message loss happens.
> We found that the current kafka version (0.8.1.1) cannot achieve the 
> requirements due to three of its behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with fewer 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has fewer messages than the leader.
> 3. ISR can contain only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, no existing configuration can satisfy the requirements.
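
The acks=2 case in the quoted description can be traced with a small model. 
This is an illustrative sketch only: the log contents and ISR membership are 
hard-coded to match the description, and messages_visible_after_failover is a 
hypothetical helper, not part of Kafka. 

```python
def messages_visible_after_failover(new_leader):
    """Return the messages consumers can read once `new_leader` takes over."""
    # acks=2: leader A and follower B have stored m, follower C has not.
    logs = {"A": ["m"], "B": ["m"], "C": []}
    isr = {"A", "B", "C"}  # C is still counted as in sync
    isr.discard("A")  # leader A is killed
    if new_leader not in isr:
        raise ValueError(f"{new_leader} is not an eligible ISR member")
    return logs[new_leader]
```

If B is elected the acknowledged message m is still readable, but if C is 
elected the returned log is empty and consumers miss m. 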



--
This message was sent by Atlassian JIRA
(v6.2#6252)
