I think the question of the default broker-level configs is a good one. I
don't think we need to touch the min.isr config or the replication factor
to satisfy 'exactly-once' going by the definition laid out earlier. On the
broker side, I think the only thing we should change is to disable unclean
leader election.
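
For concreteness, disabling unclean leader election is a single
broker-side setting in server.properties:

    # server.properties (broker side)
    # Only replicas in the ISR may be elected leader: this trades
    # availability for consistency during failover.
    unclean.leader.election.enable=false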

As I mentioned earlier, the guarantee I am proposing is that 'every
acknowledged message appears in the log exactly once, in order per
partition'.

Now, if we have replication-factor=1, the acks setting makes no difference.
In this case, there is no chance of leader failover, and you are signing up
for low availability. However, unless you lose the disk (and hence the
actual log), every acknowledged message will appear in the log exactly once
and in order whenever the log is available.

Now, assume we have disabled unclean leader election and have replication
factor > 1.

With an acks=1 setting, you can lose acknowledged messages during leader
failover (which is not altogether rare). In particular, the leader can fail
after acknowledging a message but before the followers replicate it. The
other replicas would still be in the ISR but without the acknowledged
message, so one of them can be elected leader and the message is lost.

With acks=all, the above scenario is not possible.

The min.isr setting only really helps protect your data in the event of
hard failures where you actually lose a disk. If you have min.isr=1, you
can keep producing even though only the leader is up. In this period, you
lose availability if the leader fails, and you lose data if the leader's
disk fails. However, once the replicas come back online, they can only
become leaders after they are in sync, so you can't lose data to leader
failover or bad truncation.
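
To make the trade-off concrete, here is a sketch of creating a topic with
min.isr=2 (the topic name is just a placeholder):

    # With replication-factor=3 and min.insync.replicas=2, acks=all
    # produce requests fail with NotEnoughReplicasException once the ISR
    # shrinks below 2: you trade availability for durability. With
    # min.insync.replicas=1, you keep producing with only the leader up,
    # with the consequences described above.
    bin/kafka-topics.sh --zookeeper localhost:2181 --create \
      --topic my-topic --partitions 3 --replication-factor 3 \
      --config min.insync.replicas=2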

So, enable.idempotence=true, acks=all, retries=MAX_INT, and
unclean.leader.election.enable=false are sufficient defaults to give
strong delivery guarantees, though not strong durability guarantees. I
think it is important to distinguish between the two.
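
For reference, a minimal Java sketch of a producer configured with these
proposed defaults (the bootstrap address and topic name are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
        StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
        StringSerializer.class.getName());
    // The proposed defaults for strong delivery guarantees:
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.RETRIES_CONFIG,
        Integer.toString(Integer.MAX_VALUE));

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
        // Every acknowledged send appears in the log exactly once, in
        // order per partition (with unclean leader election disabled).
        producer.send(new ProducerRecord<>("my-topic", "key", "value"));
    }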

Some specific responses in line:

> From users' perspective, when idempotence=true and
> max.in.flight.requests.per.connection > 0, ideally what acks=1 should
> really mean is that "as long as there is no hardware failure, my message is
> sent exactly once". Do you think this semantic is good enough as a default
> configuration to ship? It is unfortunate this statement is not true today
> as even when we do leader migration without any broker failure, the leader
> will naively truncate the data that has not been replicated. It is a long
> existing issue and we should try to fix that.


I don't see what's wrong here. This is exactly what you should expect with
acks=1, and providing stronger behavior for acks=1 would be extremely hard
(if not impossible) to achieve. Setting acks=all and disabling unclean
leader election is the right fix if you want stronger semantics where data
integrity is guaranteed across leader failover.

> For the max.in.flight.requests.per.connection, can you elaborate a little
> on "Given the nature of the idempotence feature, we have to bound it.".
> What is the concern here? It seems that when nothing wrong happens,
> pipelining should just work. And the memory is bounded by the memory buffer
> pool anyways. Sure one has to resend all the subsequent batches if one
> batch is out of sequence, but that should be rare and we probably should
> not optimize for that.


The problem is described in
https://issues.apache.org/jira/browse/KAFKA-5494.

Specifically, when you have max.in.flight=N, the broker needs to retain
the offset/sequence/timestamp of the last N appended batches, because a
duplicate of any of them could still arrive. That is why we have to bound
the value of max.in.flight when idempotence is enabled; otherwise there is
no way to handle duplicates correctly and efficiently on the broker.
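
As a rough illustration (hypothetical names, not the actual broker code),
this is the per-producer state the broker has to cache in order to answer
a duplicate of any of the last N batches:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Hypothetical sketch of per-producer dedup state on the broker.
    class BatchMetadata {
        final int firstSeq;      // first sequence number in the batch
        final int lastSeq;       // last sequence number in the batch
        final long firstOffset;  // offset assigned on the original append
        final long timestamp;    // timestamp from the original append

        BatchMetadata(int firstSeq, int lastSeq, long firstOffset,
                      long timestamp) {
            this.firstSeq = firstSeq;
            this.lastSeq = lastSeq;
            this.firstOffset = firstOffset;
            this.timestamp = timestamp;
        }
    }

    // At most N entries are kept per producer ID, where N bounds
    // max.in.flight. A retried batch whose sequence range matches a
    // cached entry is acknowledged with the original offset/timestamp
    // instead of being appended again. An unbounded max.in.flight would
    // make this cache unbounded too.
    Deque<BatchMetadata> lastBatches = new ArrayDeque<>();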

Thanks,
Apurva


On Sat, Aug 12, 2017 at 2:32 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi all,
>
> I will send a more detailed email later; some quick comments:
>
> 1. It's unlikely that defaults will suit everyone. I think the question is:
> what is the most likely configuration for a typical Kafka user _today_?
> Kafka's usage is growing well beyond its original use cases and correctness
> is often more important than achieving the maximum possible performance.
> Large scale users have the time, knowledge and resources to tweak settings
> so that they can achieve the latter. 1.0.0 is a good time to be thinking
> about this.
>
> 2. This KIP focuses on producer configs. A few people (in the discussion
> thread and offline) have asked whether the default min.insync.replicas
> should be changed as well. I think that's a fair question and we should
> address it in the KIP.
>
> 3. It is true that the current configs are generally too complicated. The
> notion of "profiles" has been raised previously where one could state their
> intent and the configs would be set to match. Becket suggested a semantics
> config (exactly_once, at_most_once, at_least_once), which is similar to an
> existing Kafka Streams config. The user could also specify if they would
> like to optimise for latency or throughput, for example (this has been
> raised before). I think we should pursue these ideas in a separate KIP.
>
> One comment inline.
>
> On Sat, Aug 12, 2017 at 1:26 AM, Becket Qin <becket....@gmail.com> wrote:
>
> > From users' perspective, when idempotence=true and
> > max.in.flight.requests.per.connection > 0, ideally what acks=1 should
> > really mean is that "as long as there is no hardware failure, my message
> is
> > sent exactly once".
>
>
> I don't understand this statement. Hardware failures are common and we
> should be striving for defaults that work correctly under failure.
>
> Ismael
>
