I think the question of the default broker-level configs is a good one. I don't think we need to touch the min.insync.replicas config or the replication factor to satisfy 'exactly-once' going by the definition laid out earlier. On the broker side, I think the only thing we should change is to disable unclean leader election.
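As a sketch, the broker-side piece amounts to a single line in server.properties:

    # server.properties -- the only broker-side change proposed here:
    # an out-of-sync replica must never be elected leader, since that
    # would silently discard acknowledged messages.
    unclean.leader.election.enable=false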
As I mentioned earlier, the guarantee I am proposing is that 'every acknowledged message appears in the log exactly once, in order per partition'.

Now, if we have replication-factor=1, the acks setting makes no difference: there is no chance of leader failover, and you are signing up for low availability. However, short of losing the disk (and hence the actual log), every acknowledged message will appear in the log exactly once and in order whenever the log is available.

Now assume we have disabled unclean leader election and have replication factor > 1. With acks=1, you can lose acknowledged messages during leader failover (which is not altogether rare). In particular, the leader can fail after acknowledging a message but before it has been replicated. The other replicas would still be in the ISR but would not have the acknowledged message, so it would be lost. With acks=all, this scenario is not possible.

The min.insync.replicas setting only really helps protect your data in the event of hard failures where you actually lose a disk. With min.insync.replicas=1, you can keep producing even when only the leader is up. During that period you lose availability if the leader fails, and you lose data if the leader's disk fails. However, once the other replicas come back online, they can become leader only after they are back in sync, so you can't lose data through leader failover and bad truncation.

So enable.idempotence=true, acks=all, retries=Integer.MAX_VALUE, and unclean.leader.election.enable=false are sufficient defaults to give strong delivery guarantees, though not strong durability guarantees. I think it is important to distinguish between the two. (A producer snippet putting these together follows at the end of this message.)

Some specific responses inline:

> From users' perspective, when idempotence=true and
> max.in.flight.requests.per.connection > 0, ideally what acks=1 should
> really mean is that "as long as there is no hardware failure, my message
> is sent exactly once". Do you think this semantic is good enough as a
> default configuration to ship? It is unfortunate this statement is not
> true today as even when we do leader migration without any broker
> failure, the leader will naively truncate the data that has not been
> replicated. It is a long existing issue and we should try to fix that.

I don't see what's wrong here. This is exactly what you should expect with acks=1, and having stronger behavior for acks=1 would be extremely hard (if not impossible) to achieve. Making sure acks=all and disabling unclean leader election is the right fix if you want stronger semantics where data integrity is guaranteed to be maintained across leader failover.

> For max.in.flight.requests.per.connection, can you elaborate a little on
> "Given the nature of the idempotence feature, we have to bound it."?
> What is the concern here? It seems that when nothing goes wrong,
> pipelining should just work. And the memory is bounded by the memory
> buffer pool anyway. Sure, one has to resend all the subsequent batches
> if one batch is out of sequence, but that should be rare and we probably
> should not optimize for that.

The problem is described in https://issues.apache.org/jira/browse/KAFKA-5494. Specifically, when you have max.in.flight=N, the broker needs to retain the offset/sequence/timestamp of the last N appended batches per partition, because it could receive a duplicate of any of them. That's why we have to bound the value of max.in.flight when idempotence is enabled; otherwise there is no way to handle duplicates correctly and efficiently on the broker.
Thanks,
Apurva

On Sat, Aug 12, 2017 at 2:32 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi all,
>
> I will send a more detailed email later; some quick comments:
>
> 1. It's unlikely that defaults will suit everyone. I think the question
> is: what is the most likely configuration for a typical Kafka user
> _today_? Kafka's usage is growing well beyond its original use cases,
> and correctness is often more important than achieving the maximum
> possible performance. Large-scale users have the time, knowledge, and
> resources to tweak settings so that they can achieve the latter. 1.0.0
> is a good time to be thinking about this.
>
> 2. This KIP focuses on producer configs. A few people (in the discussion
> thread and offline) have asked whether the default min.insync.replicas
> should be changed as well. I think that's a fair question, and we should
> address it in the KIP.
>
> 3. It is true that the current configs are generally too complicated.
> The notion of "profiles" has been raised previously, where one could
> state their intent and the configs would be set to match. Becket
> suggested a semantics config (exactly_once, at_most_once,
> at_least_once), which is similar to an existing Kafka Streams config.
> The user could also specify whether they would like to optimise for
> latency or throughput, for example (this has been raised before). I
> think we should pursue these ideas in a separate KIP.
>
> One comment inline.
>
> On Sat, Aug 12, 2017 at 1:26 AM, Becket Qin <becket....@gmail.com> wrote:
>
> > From users' perspective, when idempotence=true and
> > max.in.flight.requests.per.connection > 0, ideally what acks=1 should
> > really mean is that "as long as there is no hardware failure, my
> > message is sent exactly once".
>
> I don't understand this statement. Hardware failures are common, and we
> should be striving for defaults that work correctly under failure.
>
> Ismael