Hey Jun,

This is a very good example. After thinking it through in detail, I agree
that we need to commit the offset together with the leader epoch in order
to address this example.

I think the remaining question is how to handle the scenario in which the
topic is deleted and re-created. One possible solution is to commit the
offset with both the leader epoch and the metadata version. The logic and
implementation of this solution do not require a new concept (e.g.
partition epoch), and they do not require any change to the message format
or the leader epoch. It also allows us to order the metadata in a
straightforward manner, which may be useful in the future. So it may be a
better solution than generating a random partition epoch every time we
create a partition. Does this sound reasonable?
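
To make the idea concrete, here is a rough sketch of what a committed
position carrying both values could look like. The class name, field names
and the usableWith check are all hypothetical and do not exist in the Kafka
client. The sketch also assumes that the metadata version only ever
increases, so that a committed position should only be validated against
metadata at least as new as the metadata it was committed under.

```java
// Hypothetical sketch only; none of these names exist in the Kafka client.
final class CommittedPosition {
    final long offset;          // next offset to consume
    final int leaderEpoch;      // leader epoch of the last consumed message
    final long metadataVersion; // metadata version seen at commit time (assumed monotonic)

    CommittedPosition(long offset, int leaderEpoch, long metadataVersion) {
        this.offset = offset;
        this.leaderEpoch = leaderEpoch;
        this.metadataVersion = metadataVersion;
    }

    // True if the consumer's current metadata is at least as new as the
    // metadata this position was committed under. If it is older, the
    // consumer would refresh metadata before trusting the leader epoch.
    boolean usableWith(long currentMetadataVersion) {
        return currentMetadataVersion >= metadataVersion;
    }
}
```

Because the metadata version is a plain number, ordering two metadata
snapshots is a single comparison, which a random partition epoch would not
give us.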

Previously, one concern with using the metadata version was that the
consumer would be forced to refresh metadata even if the metadata version
increased due to topics that the consumer is not interested in. Now I
realize that this is probably not a problem. Currently the client refreshes
metadata either due to an InvalidMetadataException in the response from the
broker or due to metadata expiry. The addition of the metadata version
should not increase the overhead of metadata refresh caused by
InvalidMetadataException. If the client refreshes metadata due to expiry
and it receives metadata whose version is lower than the current metadata
version, we can reject the metadata but still reset the metadata age, which
essentially keeps the existing behavior in the client.
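
The expiry case could be sketched roughly as follows. This is a minimal
illustration, not the actual client code: VersionedMetadataCache and its
methods are hypothetical names, and the timestamps are passed in explicitly
to keep the example self-contained.

```java
// Hypothetical sketch only; not the real client metadata class.
final class VersionedMetadataCache {
    private long version = -1L;     // no metadata received yet
    private long lastRefreshMs = 0L;

    // Handles a metadata response obtained because of metadata expiry.
    // Returns true if the response was accepted, false if it was rejected
    // for carrying a lower version than the metadata already cached.
    boolean onExpiryRefresh(long responseVersion, long nowMs) {
        // Reset the age in both cases so that a stale response does not
        // trigger an immediate retry.
        lastRefreshMs = nowMs;
        if (responseVersion < version) {
            return false; // keep the newer metadata we already have
        }
        version = responseVersion;
        return true;
    }

    long version() {
        return version;
    }

    long ageMs(long nowMs) {
        return nowMs - lastRefreshMs;
    }
}
```

With this shape, a stale response costs nothing beyond the refresh that
expiry would have triggered anyway.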

Thanks much,
Dong
