[ 
https://issues.apache.org/jira/browse/KAFKA-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058568#comment-14058568
 ] 

Andrey Stepachev edited comment on KAFKA-1530 at 7/11/14 9:00 AM:
------------------------------------------------------------------

Looks like [~ovgolovin]'s problem with wrong replica election could be fixed by 
adding a notion of min-replicas somewhere around this code 
https://github.com/apache/kafka/blob/3c4ca854fd2da5e5fcecdaf0856a38a9ebe4763c/core/src/main/scala/kafka/cluster/Partition.scala#L165:
we could restrict leader election/re-election to partitions whose ISR has the 
configured minimum size.
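
A rough sketch of that guard (purely illustrative: minIsrSize, inSyncReplicas and 
maybeElectLeader are made-up names, not the actual Partition.scala API):

{code:scala}
// Illustrative only: refuse to elect a leader while the ISR is below a
// configured minimum, and never pick a candidate from outside the ISR.
class PartitionSketch(val topic: String,
                      val partitionId: Int,
                      val minIsrSize: Int) {

  @volatile var inSyncReplicas: Set[Int] = Set.empty // broker ids currently in sync
  @volatile var leaderId: Option[Int] = None

  def maybeElectLeader(candidate: Int): Boolean = {
    if (inSyncReplicas.size < minIsrSize) {
      // Too few in-sync replicas: electing a leader now could silently drop
      // data held only by the replicas that are missing from the ISR.
      false
    } else if (inSyncReplicas.contains(candidate)) {
      leaderId = Some(candidate)
      true
    } else {
      // Candidate is outside the ISR (an "unclean" election), so refuse it.
      false
    }
  }
}
{code}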

Regarding [~renew]'s situation: it is not realistic to lose data when the leader 
is stopped and one of the replicas becomes the leader, _if_ the required acks are 
greater than 1. Kafka maintains a 'high watermark' for each partition, and for 
each request it waits for the required replicas to catch up with the leader before 
responding to the client. So if it is not a correlated failure (where we lose 2 
replicas at once), it works correctly. If there were 2 replicas in the ISR and 1 
replica outside the ISR, and both ISR members die, then it is possible to bring up 
the third replica as leader, and the newer data held only by the original ISR 
replicas will be lost.
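
A tiny sketch of the high-watermark idea described above (the field and method 
names here are invented for the example, not Kafka's internals):

{code:scala}
// The leader only acknowledges (and exposes to consumers) messages up to the
// minimum log-end-offset of the in-sync replicas, so losing a single follower
// cannot lose acknowledged data.
object HighWatermarkSketch {

  // High watermark = smallest log-end-offset across the in-sync replicas.
  def highWatermark(replicaLogEndOffsets: Map[Int, Long]): Long =
    if (replicaLogEndOffsets.isEmpty) 0L
    else replicaLogEndOffsets.values.min

  // A produce request asking for `requiredAcks` acknowledgements is complete
  // once at least that many replicas have caught up to the request's offset.
  def isSatisfied(replicaLogEndOffsets: Map[Int, Long],
                  requestOffset: Long,
                  requiredAcks: Int): Boolean =
    replicaLogEndOffsets.values.count(_ >= requestOffset) >= requiredAcks

  def main(args: Array[String]): Unit = {
    val leos = Map(1 -> 100L, 2 -> 100L, 3 -> 95L) // broker id -> log end offset
    println(highWatermark(leos))        // 95: replica 3 still lags
    println(isSatisfied(leos, 98L, 2))  // true: replicas 1 and 2 are at >= 98
  }
}
{code}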

Just to be clear, Kafka is a 'primary backup' replication system, so it doesn't 
tolerate correlated failures, as opposed to a quorum system, but it gives higher 
throughput in return. That's how it stands :)


> howto update continuously
> -------------------------
>
>                 Key: KAFKA-1530
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1530
>             Project: Kafka
>          Issue Type: Wish
>            Reporter: Stanislav Gilmulin
>            Assignee: Guozhang Wang
>            Priority: Minor
>              Labels: operating_manual, performance
>
> Hi,
>  
> Could I ask you a question about the Kafka update procedure?
> Is there a way to update the software that doesn't require service interruption 
> or lead to data loss?
> We can't stop message brokering during the update as we have a strict SLA.
>  
> Best regards
> Stanislav Gilmulin



--
This message was sent by Atlassian JIRA
(v6.2#6252)
