[ 
https://issues.apache.org/jira/browse/KAFKA-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076286#comment-14076286
 ] 

Jun Rao commented on KAFKA-1555:
--------------------------------

Hmm, I have to understand the semantics and the implementation of 
dfs.replication.min a bit more. For now, this seems more like the ack > 1 case 
in Kafka. Basically, if you have R replicas and you set dfs.replication.min to 
a value smaller than R, you can get an ack back sooner. However, there is a 
slight possibility that the acked data is lost with fewer than R-1 failures.

A couple of other thoughts on "min.isr.required". 

1. Let's say we do support this mode. However, immediately after a message is 
successfully published, some replicas could go down and bring the ISR below 
"min.isr.required". There is still no guarantee that you have more than one 
copy of the data at that point. So, "min.isr.required" can only be enforced at 
the time of publishing. Is that what you want?

2. I image the implementation could be when a message is committed, we check 
the ISR size and return an error if |ISR| is less than "min.isr.required". 
However, the message is still committed and will be exposed to the consumer. 
The error is just to tell the producer that the message may be under 
replicated. Does that match what you expect?

> provide strong consistency with reasonable availability
> -------------------------------------------------------
>
>                 Key: KAFKA-1555
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1555
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>    Affects Versions: 0.8.1.1
>            Reporter: Jiang Wu
>            Assignee: Neha Narkhede
>
> In a mission critical application, we expect a kafka cluster with 3 brokers 
> can satisfy two requirements:
> 1. When 1 broker is down, no message loss or service blocking happens.
> 2. In worse cases such as two brokers are down, service can be blocked, but 
> no message loss happens.
> We found that current kafka versoin (0.8.1.1) cannot achieve the requirements 
> due to its three behaviors:
> 1. when choosing a new leader from 2 followers in ISR, the one with less 
> messages may be chosen as the leader.
> 2. even when replica.lag.max.messages=0, a follower can stay in ISR when it 
> has less messages than the leader.
> 3. ISR can contains only 1 broker, therefore acknowledged messages may be 
> stored in only 1 broker.
> The following is an analytical proof. 
> We consider a cluster with 3 brokers and a topic with 3 replicas, and assume 
> that at the beginning, all 3 replicas, leader A, followers B and C, are in 
> sync, i.e., they have the same messages and are all in ISR.
> According to the value of request.required.acks (acks for short), there are 
> the following cases.
> 1. acks=0, 1, 3. Obviously these settings do not satisfy the requirement.
> 2. acks=2. Producer sends a message m. It's acknowledged by A and B. At this 
> time, although C hasn't received m, C is still in ISR. If A is killed, C can 
> be elected as the new leader, and consumers will miss m.
> 3. acks=-1. B and C restart and are removed from ISR. Producer sends a 
> message m to A, and receives an acknowledgement. Disk failure happens in A 
> before B and C replicate m. Message m is lost.
> In summary, any existing configuration cannot satisfy the requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to