[jira] [Commented] (KAFKA-10357) Handle accidental deletion of repartition-topics as exceptional failure

Sophie Blee-Goldman (Jira) Wed, 12 Aug 2020 11:46:09 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-10357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17176535#comment-17176535
 ]


Sophie Blee-Goldman commented on KAFKA-10357:
---------------------------------------------

I think the elegant way to shut down the whole application is pretty 
straightforward: we can just trigger a rebalance and encode an error 
(like we do for missing source topics, but less silently). I'm not so sure 
about the rest. If we want to solve this "right away" then breaking 
compatibility isn't really an option; if it can wait for 3.0 then the 
"KafkaStreams#initialize" type solution is on the table.

The KAFKA-3370 idea is intriguing but also doesn't seem perfectly safe. Maybe 
we first need to decide if it's acceptable to solve this problem for only 99% 
of cases (or whatever number less than 100).

On the other hand, we just need some way to infer whether the app is new or not 
from some kind of persisted information. Can we leverage the committed offsets 
somehow? It seems like if the repartition topics don't exist but the group has 
committed offsets for them, then they must have been deleted.
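
To make the committed-offsets idea concrete, here is a rough sketch of the check in plain Java. The class and method names are hypothetical and this is not existing Streams code. It assumes the caller has already fetched the set of existing topics and the group's committed offsets (for example via Admin#listTopics and Admin#listConsumerGroupOffsets), and it assumes the committed offsets are still readable after the topic is gone, which would need to be verified. Offset retention could also expire them, so this would still be a "less than 100%" answer.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not part of Kafka Streams.
class RepartitionTopicCheck {

    /**
     * Returns the repartition topics that look deleted rather than
     * never created: the group has committed offsets for them, yet they
     * are absent from the set of topics that currently exist.
     */
    static Set<String> likelyDeleted(Set<String> expectedRepartitionTopics,
                                     Set<String> existingTopics,
                                     Map<String, Long> committedOffsetsByTopic) {
        Set<String> deleted = new TreeSet<>();
        for (String topic : expectedRepartitionTopics) {
            boolean missing = !existingTopics.contains(topic);
            boolean hasCommittedOffsets = committedOffsetsByTopic.containsKey(topic);
            if (missing && hasCommittedOffsets) {
                deleted.add(topic);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        // Made-up topic names and offsets, purely for demonstration.
        Set<String> expected = Set.of("app-a-repartition", "app-b-repartition");
        Set<String> existing = Set.of("app-b-repartition", "input");
        Map<String, Long> committed = Map.of("app-a-repartition", 42L,
                                             "app-b-repartition", 7L);
        System.out.println(likelyDeleted(expected, existing, committed));
    }
}
```

A brand-new app has no committed offsets, so the result is empty and the topics can be created as usual; a non-empty result would be the signal to encode an error in the rebalance instead of recreating them.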

> Handle accidental deletion of repartition-topics as exceptional failure
> -----------------------------------------------------------------------
>
>                 Key: KAFKA-10357
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10357
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Bruno Cadonna
>            Priority: Major
>
> Repartition topics are both written by Streams' producer and read by Streams' 
> consumer, so when they are accidentally deleted both clients may be notified. 
> But in practice the consumer would react much more quickly than the producer, 
> since the latter has a delivery timeout expiration period (see 
> https://issues.apache.org/jira/browse/KAFKA-10356). When the consumer reacts, 
> it will re-join the group since the metadata changed, and during the triggered 
> rebalance it would silently auto-recreate the topic and continue, causing 
> silent data loss. 
> One idea is to create all repartition topics only *once*, in the first 
> rebalance, and not auto-create them in future rebalances; instead, a missing 
> topic would be treated similarly to the INCOMPLETE_SOURCE_TOPIC_METADATA error 
> code (https://issues.apache.org/jira/browse/KAFKA-10355).
> The challenging part would be how to determine whether it is the first-ever 
> rebalance, and there are several wild ideas I'd like to throw out here:
> 1) change the thread state transition diagram so that the STARTING state would 
> not transition to PARTITION_REVOKED but only to PARTITION_ASSIGNED; then in the 
> assign function we can check if the state is still in CREATED and not RUNNING.
> 2) augment the subscriptionInfo to encode whether or not this is the 
> first-ever rebalance.
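
For reference, the "create only once" rule from the description boils down to a small decision table. The sketch below is only an illustration of that rule: the class, enum, and method names are hypothetical, and how the `firstRebalance` flag is actually obtained is exactly the open question in the two ideas above.

```java
// Hypothetical illustration of the "create once, otherwise fail" rule;
// none of these names exist in Kafka Streams.
class RepartitionTopicPolicy {

    enum Decision { PROCEED, CREATE_TOPICS, FAIL_MISSING_TOPICS }

    /**
     * @param firstRebalance     whether this is the first-ever rebalance of the app
     * @param anyTopicMissing    whether at least one repartition topic does not exist
     */
    static Decision onRebalance(boolean firstRebalance, boolean anyTopicMissing) {
        if (!anyTopicMissing) {
            return Decision.PROCEED;            // nothing to create, nothing lost
        }
        return firstRebalance
            ? Decision.CREATE_TOPICS            // new app: create the topics once
            : Decision.FAIL_MISSING_TOPICS;     // later rebalance: treat as an error
    }

    public static void main(String[] args) {
        System.out.println(onRebalance(true, true));
        System.out.println(onRebalance(false, true));
    }
}
```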



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
