Guozhang Wang created KAFKA-10357:
-------------------------------------

             Summary: Handle accidental deletion of repartition-topics as 
exceptional failure
                 Key: KAFKA-10357
                 URL: https://issues.apache.org/jira/browse/KAFKA-10357
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Guozhang Wang
            Assignee: Bruno Cadonna


Repartition topics are both written by the Streams producer and read by the 
Streams consumer, so when they are accidentally deleted both clients may be 
notified. In practice, however, the consumer reacts much more quickly than the 
producer, since the latter only fails once its delivery timeout expires (see 
https://issues.apache.org/jira/browse/KAFKA-10356). When the consumer reacts, 
it re-joins the group because the metadata changed, and during the triggered 
rebalance the topic is silently auto-recreated and processing continues, 
causing silent data loss. 

One idea is to create all repartition topics only *once*, in the first 
rebalance, and not auto-create them in any later rebalance; instead, a missing 
repartition topic would be treated similarly to the 
INCOMPLETE_SOURCE_TOPIC_METADATA error code 
(https://issues.apache.org/jira/browse/KAFKA-10355).
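
A minimal sketch of the "create once, fail afterwards" idea, for illustration 
only. Every name here (RepartitionTopicGuard, topicsToCreate, 
MissingRepartitionTopicException) is hypothetical and not part of Kafka 
Streams, and the set of existing topics is assumed to come from some metadata 
lookup that is not shown.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names exist in Kafka Streams.
public class RepartitionTopicGuard {

    /** Hypothetical error raised when a previously created topic has vanished. */
    public static class MissingRepartitionTopicException extends RuntimeException {
        public MissingRepartitionTopicException(String message) {
            super(message);
        }
    }

    // Assumption: an in-memory flag is enough to tell the first rebalance apart.
    private boolean initialCreationDone = false;

    /**
     * @param required repartition topics the topology needs
     * @param existing topics currently known to exist (assumed metadata lookup)
     * @return the topics the caller should create now
     * @throws MissingRepartitionTopicException if a topic is missing after the first rebalance
     */
    public Set<String> topicsToCreate(Set<String> required, Set<String> existing) {
        Set<String> missing = new HashSet<>(required);
        missing.removeAll(existing);

        if (!initialCreationDone) {
            // First rebalance: creating the missing topics is expected.
            initialCreationDone = true;
            return missing;
        }
        if (!missing.isEmpty()) {
            // Later rebalance: a missing topic means it was deleted, so fail loudly.
            throw new MissingRepartitionTopicException(
                "Repartition topics missing after initial creation: " + missing);
        }
        return missing; // empty: nothing to create
    }
}
```

The flag in this sketch only covers a single running instance, which is why 
deciding what counts as the first-ever rebalance is the hard part.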

The challenging part is how to determine whether it is the first-ever 
rebalance, and there are several wild ideas I'd like to throw out here:

1) Change the thread state transition diagram so that the CREATED state does 
not transition to PARTITIONS_REVOKED but only to PARTITIONS_ASSIGNED; then in 
the assign function we can check whether the state is still CREATED rather than 
RUNNING.

2) Augment the subscriptionInfo to encode whether or not this is the 
first-ever rebalance.
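
Idea 1 could look roughly like the following. This is a simplified, 
hypothetical model of the thread state machine, not the real StreamThread 
code: the class name, the callback names, and the reduced set of states are 
all assumptions made for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of the thread state machine; not the real StreamThread.
public class ThreadStateSketch {

    public enum State { CREATED, PARTITIONS_REVOKED, PARTITIONS_ASSIGNED, RUNNING }

    private State state = State.CREATED;

    // Allowed transitions; CREATED may only move to PARTITIONS_ASSIGNED.
    private static Set<State> validNext(State from) {
        switch (from) {
            case CREATED:
                return EnumSet.of(State.PARTITIONS_ASSIGNED);
            case PARTITIONS_REVOKED:
                return EnumSet.of(State.PARTITIONS_ASSIGNED);
            case PARTITIONS_ASSIGNED:
                return EnumSet.of(State.PARTITIONS_REVOKED, State.RUNNING);
            default: // RUNNING
                return EnumSet.of(State.PARTITIONS_REVOKED);
        }
    }

    private void transitionTo(State next) {
        if (!validNext(state).contains(next)) {
            throw new IllegalStateException("Invalid transition " + state + " -> " + next);
        }
        state = next;
    }

    /** Revocation before the first assignment is ignored, so the thread stays in CREATED. */
    public void onPartitionsRevoked() {
        if (state == State.CREATED) {
            return;
        }
        transitionTo(State.PARTITIONS_REVOKED);
    }

    /** @return true if this is the first assignment this thread has seen */
    public boolean onPartitionsAssigned() {
        boolean firstAssignment = (state == State.CREATED);
        transitionTo(State.PARTITIONS_ASSIGNED);
        return firstAssignment;
    }

    public void onRunning() {
        transitionTo(State.RUNNING);
    }

    public State state() {
        return state;
    }
}
```

In this sketch the check only tells whether the thread itself has been 
assigned partitions before; a restarted instance would start in CREATED again.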



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
