[ https://issues.apache.org/jira/browse/KAFKA-10357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176582#comment-17176582 ]
Guozhang Wang commented on KAFKA-10357:
---------------------------------------
I've thought about relying on the committed offsets, but that is not 100%
reliable either, since it is possible that the commit has not been sent while
some data has already been sent to the repartition topics and is hence lost due
to topic deletion. I agree that KAFKA-3370 is not theoretically sound, but I
think it is sufficient for the near term. For a longer-term solution, I feel
we'd have to push this to the user's control (via {{#initialize}} for example).
> Handle accidental deletion of repartition-topics as exceptional failure
> -----------------------------------------------------------------------
>
> Key: KAFKA-10357
> URL: https://issues.apache.org/jira/browse/KAFKA-10357
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Assignee: Bruno Cadonna
> Priority: Major
>
> Repartition topics are both written by the Streams producer and read by the
> Streams consumer, so when they are accidentally deleted both clients may be
> notified. In practice, however, the consumer would react much more quickly
> than the producer, since the latter has a delivery timeout expiration period
> (see https://issues.apache.org/jira/browse/KAFKA-10356). When the consumer
> reacts, it re-joins the group since the metadata changed, and during the
> triggered rebalance it auto-recreates the topic and continues, causing
> silent data loss.
> One idea is to create all repartition topics only *once*, in the first
> rebalance, and not auto-create them in future rebalances; instead, a missing
> topic would be treated similarly to the INCOMPLETE_SOURCE_TOPIC_METADATA
> error code (https://issues.apache.org/jira/browse/KAFKA-10355).
> The challenging part is how to determine whether it is the first-ever
> rebalance, and there are several wild ideas I'd like to throw out here:
> 1) change the thread state transition diagram so that the STARTING state
> would not transition to PARTITION_REVOKED but only to PARTITION_ASSIGNED; then
> in the assign function we can check if the state is still in CREATED and not
> RUNNING.
> 2) augment the subscriptionInfo to encode whether or not this is the
> first-ever rebalance.
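
The "create once, then fail" idea in the description could be sketched roughly as below. This is only an illustration under stated assumptions, not Kafka Streams code: RepartitionTopicGuard, MissingRepartitionTopicException, and onRebalance are hypothetical names, the broker's topic list is simulated with a plain Set, and the first-rebalance flag lives in memory only, so it does not address how to recognize the first-ever rebalance across restarts (the open question raised above).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names exist in Kafka Streams. */
final class RepartitionTopicGuard {

    /** Hypothetical stand-in for an error similar to INCOMPLETE_SOURCE_TOPIC_METADATA. */
    static final class MissingRepartitionTopicException extends RuntimeException {
        MissingRepartitionTopicException(String message) {
            super(message);
        }
    }

    // Assumption: tracked in memory only; a real solution would need a durable signal.
    private boolean firstRebalanceDone = false;

    /**
     * @param required repartition topics the topology needs
     * @param existing topics currently present (mutated to simulate topic creation)
     */
    void onRebalance(Set<String> required, Set<String> existing) {
        Set<String> missing = new HashSet<>(required);
        missing.removeAll(existing);

        if (!firstRebalanceDone) {
            // First rebalance: creating the missing repartition topics is expected.
            existing.addAll(missing);
            firstRebalanceDone = true;
            return;
        }

        if (!missing.isEmpty()) {
            // Later rebalance: a missing topic is treated as a failure, not recreated.
            throw new MissingRepartitionTopicException(
                "Repartition topics missing after the first rebalance: " + missing);
        }
    }
}
```

With this shape, a topic deleted after startup surfaces as an exception on the next rebalance instead of being silently recreated.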