[
https://issues.apache.org/jira/browse/KAFKA-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761547#comment-13761547
]
Justin SB commented on KAFKA-1050:
----------------------------------
You're definitely right, Jay, that I'm conflating a few ideas. I may also be
more deeply confused :-)
#2 is the truly unacceptable scenario in my book. For my use case, I can't
allow a replica outside the ISR to become the leader, because that is certain
to involve data loss (by definition, I think).
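
A minimal sketch of the #2 behaviour, assuming a selector that is handed the
partition's ISR and the set of live brokers. The class, method, and exception
names here are hypothetical and are not taken from Kafka's actual
PartitionLeaderSelector code; the point is only that the "no live ISR member"
case raises an error instead of falling back to an out-of-sync replica.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Kafka's PartitionLeaderSelector.
final class StrictLeaderSelector {

    /** Raised when every in-sync replica is down, so any election could lose data. */
    static final class NoLiveIsrException extends RuntimeException {
        NoLiveIsrException(String message) {
            super(message);
        }
    }

    /**
     * Returns the first broker in the ISR that is currently alive.
     * Never falls back to a live replica outside the ISR: if no ISR member
     * is alive, the partition stays leaderless (unavailable) instead.
     */
    static int selectLeader(List<Integer> isr, Set<Integer> liveBrokers) {
        for (int brokerId : isr) {
            if (liveBrokers.contains(brokerId)) {
                return brokerId;
            }
        }
        throw new NoLiveIsrException(
            "No live replica in ISR " + isr + "; refusing to elect an out-of-sync leader");
    }
}
```

With an ISR of [1, 2, 3] and only brokers 2 and 3 alive, this picks broker 2.
With an ISR of [1, 2] and only broker 3 alive, it throws, which trades
availability for consistency.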
You're right that #1 just ensures that we can still tolerate failures once we
impose #2. Without it, we'd likely end up in a scenario where, e.g., only one
node is alive; if we allowed it to make progress, we wouldn't be able to
recover from the failure of that node.
I think you're right that I'm really trying to get majority-vote semantics. I
don't see why I'd be intentionally failing successful writes though. Does the
leader count in "request.required.acks"? If I can write to 3/5, I do want to
treat that as a success. I also want 2/5 to be considered a failure. I think
I get that by setting request.required.acks=3, though maybe I need to set
request.required.acks=2 if the leader is not counted as an ack. And maybe I'm
just reading the ack-counting code wrong generally...
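
The arithmetic above can be written down as a small sketch. The helper names
are hypothetical (nothing here is Kafka API), and whether the leader counts
toward request.required.acks is left as a parameter because that is exactly
the open question.

```java
// Hypothetical helpers for the quorum arithmetic; not part of Kafka.
final class QuorumMath {

    /** Smallest number of replicas that forms a strict majority. */
    static int majority(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replicationFactor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /**
     * Value request.required.acks would need for majority semantics.
     * If the leader's own write is not counted as an ack, one fewer
     * follower ack is needed to reach the same number of copies.
     */
    static int requiredAcks(int replicationFactor, boolean leaderCountsAsAck) {
        int quorum = majority(replicationFactor);
        return leaderCountsAsAck ? quorum : quorum - 1;
    }

    /** True if a write held by `copies` replicas (leader included) reached a majority. */
    static boolean reachedQuorum(int copies, int replicationFactor) {
        return copies >= majority(replicationFactor);
    }
}
```

For 5 replicas this gives requiredAcks(5, true) = 3 and requiredAcks(5, false)
= 2, and a write on 3 of 5 replicas counts as a success while 2 of 5 does not.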
It has also occurred to me that I would probably need to add rollback, as the
current Kafka model wouldn't ever roll back a write on the leader because of a
lack of sufficient acks (it would just remove the replicas from the ISR
instead)?
> Support for "no data loss" mode
> -------------------------------
>
> Key: KAFKA-1050
> URL: https://issues.apache.org/jira/browse/KAFKA-1050
> Project: Kafka
> Issue Type: Task
> Reporter: Justin SB
>
> I'd love to use Apache Kafka, but for my application data loss is not
> acceptable, even at the expense of availability (i.e. I need C, not A, in
> CAP).
> I think there are two things that I need to change to get a quorum model:
> 1) Make sure I set request.required.acks to 2 (for a 3 node cluster) or 3
> (for a 5 node cluster) on every request, so that I can only write if a quorum
> is active.
> 2) Prevent the behaviour where a non-ISR can become the leader if all ISRs
> die. I think this is as easy as tweaking
> core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala,
> essentially to throw an exception around line 64 in the "data loss" case.
> I haven't yet implemented / tested this. I'd love to get some input from the
> Kafka-experts on whether my plan is:
> (a) correct - will this work?
> (b) complete - have I missed any cases?
> (c) recommended - is this a terrible idea :-)
> Thanks for any pointers!