Author: junrao
Date: Sun Jan 26 17:35:33 2014
New Revision: 1561522

URL: http://svn.apache.org/r1561522
Log:
minor change to unclean leader election

Modified:
    kafka/site/08/design.html

Modified: kafka/site/08/design.html
URL: 
http://svn.apache.org/viewvc/kafka/site/08/design.html?rev=1561522&r1=1561521&r2=1561522&view=diff
==============================================================================
--- kafka/site/08/design.html (original)
+++ kafka/site/08/design.html Sun Jan 26 17:35:33 2014
@@ -213,15 +213,15 @@ Another important design distinction is 
 
 <h4>Unclean leader election: What if they all die?</h4>
 
-Note that Kafka's guarantee with respect to data loss is predicated on at 
least on replica remaining in sync. If all the nodes replicating a partition 
die, this guarantee no longer holds.
+Note that Kafka's guarantee with respect to data loss is predicated on at 
least one replica remaining in sync. If the current leader dies and no remaining 
live replicas are in the ISR, this guarantee no longer holds. If you have more 
than one replica assigned to a partition, this should be relatively rare since 
at least two brokers have to fail for this to happen.
 <p>
-However a practical system needs to do something reasonable when all the 
replicas die. If you are unlucky enough to have this occur, it is important to 
consider what will happen. There are two behaviors that could be implemented:
+However, a practical system needs to do something reasonable when all in-sync 
replicas die. If you are unlucky enough to have this occur, it is important to 
consider what will happen. There are two behaviors that could be implemented:
 <ol>
        <li>Wait for a replica in the ISR to come back to life and choose this 
replica as the leader (hopefully it still has all its data).
        <li>Choose the first replica (not necessarily in the ISR) that comes 
back to life as the leader.
 </ol>
 <p>
-This is a simple tradeoff between availability and consistency. If we wait for 
replicas in the ISR, then we will remain unavailable as long as those replicas 
are down. If such replicas were destroyed or their data was lost, then we are 
permanently down. If, on the other hand, a non-in-sync replica comes back to 
life and we allow it to become leader, then its log becomes the source of truth 
even though it is not guaranteed to have every committed message. In our 
current release we choose the second strategy and favor choosing a potentially 
inconsistent replica when all replicas in the ISR are dead. In the future, we 
would like to make this configurable to better support use cases where downtime 
is preferable to inconsistency.
+This is a simple tradeoff between availability and consistency. If we wait for 
replicas in the ISR, then we will remain unavailable as long as those replicas 
are down. If such replicas were destroyed or their data was lost, then we are 
permanently down. If, on the other hand, a non-in-sync replica comes back to 
life and we allow it to become the leader, then its log becomes the source of 
truth even though it is not guaranteed to have every committed message. In our 
current release we choose the second strategy and favor choosing a potentially 
inconsistent replica when all replicas in the ISR are dead. In the future, we 
would like to make this configurable to better support use cases where downtime 
is preferable to inconsistency.
 <p>
 This dilemma is not specific to Kafka. It exists in any quorum-based scheme. 
For example in a majority voting scheme, if a majority of servers suffer a 
permanent failure, then you must either choose to lose 100% of your data or 
violate consistency by taking what remains on an existing server as your new 
source of truth.
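The availability-versus-consistency choice described in the patched paragraph was later exposed as a broker (and per-topic) setting in subsequent Kafka releases. A minimal sketch, assuming the `unclean.leader.election.enable` property documented in later Kafka versions (not yet configurable in the 0.8 release this diff targets):

```properties
# Hedged sketch of the configuration later Kafka releases expose for this
# tradeoff; the property name is assumed from current Kafka documentation.
#
# false = wait for a replica in the ISR to come back (consistency over
#         availability; the partition stays offline until one returns)
# true  = allow the first replica that comes back, even if out of sync,
#         to become leader (availability over consistency)
unclean.leader.election.enable=false
```

Setting it to `false` corresponds to strategy 1 in the list above (wait for an ISR replica), while `true` corresponds to strategy 2 (elect a potentially inconsistent replica).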
 

