[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242772#comment-17242772 ]

Guozhang Wang commented on KAFKA-10688:
---

The main reason I did not go the consumer route (i.e. KAFKA-3370) is that it does not help resolve case 2.b). If the application, for whatever reason, did not commit before shutting down, then the shutdown was not graceful (otherwise we would always commit). In that case, neither resetting to earliest, which may cause data duplicates, nor resetting to latest, which may cause data loss, seems ideal. Hence I propose to still treat these as fatal cases.

> Handle accidental truncation of repartition topics as exceptional failure
> -
>
> Key: KAFKA-10688
> URL: https://issues.apache.org/jira/browse/KAFKA-10688
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
> Priority: Major
>
> Today we always handle InvalidOffsetException from the main consumer by the
> resetting policy assuming they are for source topics. But repartition topics
> are also source topics and should never be truncated and hence cause
> InvalidOffsetException.
> We should differentiate these repartition topics from external source topics
> and treat the InvalidOffsetException from repartition topics as fatal and
> close the whole application.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242571#comment-17242571 ]

Matthias J. Sax commented on KAFKA-10688:
---

1) As the topic is empty, it does not really matter: both "earliest" and "latest" would result in offset zero.

2b) There might be one corner case (maybe we could ignore it, though): the application, for whatever reason, did not commit offsets, but there was also no truncation. For this case, it seems OK to just reset to earliest?

Btw: IIRC, at the moment we set the consumer config to `none` if and only if there are topics with different policies. If there is no topic-specific config (i.e., only the global config), or if all topic-specific configs and the global config are the same, we just pass it into the consumer.

What I am wondering, though, is: why don't we just try to tackle KAFKA-3370 directly? It seems to be a good improvement.
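
The rule recalled in this comment (flagged there as "IIRC", so not verified against the Streams source) can be sketched as below. This is a minimal illustration in plain Java: the class `ConsumerResetConfig` and the method `resolve` are hypothetical names, and policies are modeled as the plain strings accepted by the consumer's `auto.offset.reset` setting.

```java
import java.util.Collection;

/** Hypothetical helper illustrating the rule; not a Kafka Streams class. */
final class ConsumerResetConfig {
    private ConsumerResetConfig() {}

    /**
     * Returns the value that would be handed to the consumer as auto.offset.reset.
     * If any topic-specific policy differs from the global one, the consumer gets
     * "none" and resetting has to be done by the caller; otherwise the global
     * policy is passed through unchanged.
     */
    static String resolve(String globalPolicy, Collection<String> topicPolicies) {
        for (String policy : topicPolicies) {
            if (!policy.equals(globalPolicy)) {
                return "none";
            }
        }
        return globalPolicy;
    }
}
```

With a global policy of `earliest` and no topic-specific overrides, `resolve` returns `earliest`; adding a single topic configured as `latest` makes it return `none`.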
[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241992#comment-17241992 ]

Guozhang Wang commented on KAFKA-10688:
---

Had some more discussions with [~cadonna] about different scenarios, and I think we can potentially enlarge the scope of this ticket to include all the following cases:

1) When starting the application for the first time, the repartition topic is newly created. In this case we should set the starting offset on the repartition topics according to the global reset policy.

2) When restarting the application, the repartition topic already exists and may have some data. In this case we would try to read the committed offset and start from there.

2.a) If the committed offset is already out of range, i.e. a truncation happened before restarting the application, we should treat it as a fatal error.

2.b) If there is no committed offset, then either the application was not shut down gracefully before (since otherwise the committed offset would be found), or the committed offset was somehow lost. We should treat it as a fatal error.

3) During normal processing, the consumer suddenly finds itself out of range, i.e. a truncation happened concurrently. We should treat it as a fatal error.

The challenge today is that we cannot easily distinguish case 1) from cases 2) and 3), since the consumer would throw the same invalid offset exception and Streams would handle it uniformly. Instead of relying on the consumer to improve (KAFKA-3370), we can do it at the Streams layer only, as follows:

* Whenever we create the repartition topic, we commit offset 0 regardless of the global offset reset policy, since under either earliest or latest it would be 0 anyway.
* Whenever we get an invalid offset exception (note we still keep the consumer's configuration as `none`), we check whether it is from a repartition topic. If yes, we always treat it as a fatal error; if not, we apply the reset policy of the corresponding source topic.
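
The two bullet points above can be sketched roughly as follows. This is a minimal illustration in plain Java that does not use the Kafka client library: `ResetPolicy`, `FatalRepartitionOffsetException`, `InvalidOffsetHandler`, and both method names are hypothetical, and offsets are tracked per topic rather than per partition to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the reset policies discussed in the thread. */
enum ResetPolicy { EARLIEST, LATEST }

/** Hypothetical fatal error raised for repartition topics. */
final class FatalRepartitionOffsetException extends RuntimeException {
    FatalRepartitionOffsetException(String topic) {
        super("Invalid offset on repartition topic " + topic + "; was it truncated?");
    }
}

/** Hypothetical sketch of the Streams-layer handling; not actual Streams code. */
final class InvalidOffsetHandler {
    private final Set<String> repartitionTopics;
    private final ResetPolicy globalPolicy;
    private final Map<String, ResetPolicy> topicPolicies;

    InvalidOffsetHandler(Set<String> repartitionTopics,
                         ResetPolicy globalPolicy,
                         Map<String, ResetPolicy> topicPolicies) {
        this.repartitionTopics = repartitionTopics;
        this.globalPolicy = globalPolicy;
        this.topicPolicies = topicPolicies;
    }

    /** First bullet: a newly created (empty) repartition topic gets offset 0 committed. */
    static Map<String, Long> initialCommits(Set<String> newlyCreatedRepartitionTopics) {
        Map<String, Long> commits = new HashMap<>();
        for (String topic : newlyCreatedRepartitionTopics) {
            commits.put(topic, 0L);
        }
        return commits;
    }

    /**
     * Second bullet: an invalid offset on a repartition topic is fatal; for any
     * other source topic, return the policy to reset with (topic-specific if
     * configured, otherwise the global one).
     */
    ResetPolicy onInvalidOffset(String topic) {
        if (repartitionTopics.contains(topic)) {
            throw new FatalRepartitionOffsetException(topic);
        }
        return topicPolicies.getOrDefault(topic, globalPolicy);
    }
}
```

Because offset 0 is committed when the topic is created, case 1) no longer produces an invalid offset exception at all, which is what lets every such exception on a repartition topic be treated as fatal.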
[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228844#comment-17228844 ]

Guozhang Wang commented on KAFKA-10688:
---

Normally the repartition topic should never have an invalid offset after the offset has been set initially, since the repartition topic's retention should be infinite and we only truncate it via delete-records. This ticket is for guarding against abnormal cases, e.g. if users accidentally truncated the repartition topics.
[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228465#comment-17228465 ]

Bruno Cadonna commented on KAFKA-10688:
---

Maybe I misunderstood your previous comment. In 1) and 2) of your proposal, aren't you proposing to reset repartition topics by using the global policy?

When would a repartition topic not have a valid committed offset after an offset was committed for the first time (i.e. the first commit after a fresh start of the Streams application)? Isn't the fact that a repartition topic does not have a valid committed offset enough to throw a fatal error? Why should we reset the repartition topics in points 1) and 2) of your proposal?
[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227545#comment-17227545 ]

Guozhang Wang commented on KAFKA-10688:
---

No, we don't actually :) In StreamThread we handle InvalidOffsetException from the main consumer (which consumes from repartition topics) by always trying to reset according to the policy. The reason is that we explicitly set the policy on the consumer to `none` and hence have to do all the explicit resetting at the Streams layer itself.
[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227236#comment-17227236 ]

Bruno Cadonna commented on KAFKA-10688:
---

[~guozhang], Thank you for the proposal. Shouldn't we always throw a fatal error for an {{InvalidOffsetException}} on a repartition topic, since this should never happen? How do 1) and 2) differ? Could you please clarify?
[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure
[ https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227080#comment-17227080 ]

Guozhang Wang commented on KAFKA-10688:
---

Without KAFKA-3370, we have to implement the desired behavior at the Streams layer itself. That is:

1) Upon task assignment, explicitly set the starting offset for the main consumer based on the per-topic / global reset policy. For repartition topics, the reset policy would be `latest`.

2) Upon task revival (for corrupted exception handling), do the same thing as in 1).

3) During normal processing, if an InvalidOffsetException is thrown from the main consumer, we differentiate these cases:

3.a) for source topics: log a warning and reset accordingly;

3.b) for repartition topics: throw as a fatal error.

We could potentially be stricter and require that all topics have committed offsets, failing if only some of them have committed positions. But for extensibility I'm going to hold off on doing that for now.
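
Steps 1) and 2) above, choosing a starting offset at the Streams layer, can be sketched roughly as follows. This is a minimal illustration in plain Java without the Kafka client library: `OffsetReset`, `StartingOffsets`, and the method names are hypothetical, and the beginning and end offsets of a partition are assumed to be already known to the caller.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the reset policies discussed in the thread. */
enum OffsetReset { EARLIEST, LATEST }

/** Hypothetical sketch of picking a starting offset; not actual Streams code. */
final class StartingOffsets {
    private StartingOffsets() {}

    /**
     * Repartition topics always use LATEST, as stated in step 1); other topics
     * use their topic-specific policy if one is configured, else the global one.
     */
    static OffsetReset effectivePolicy(String topic,
                                       Set<String> repartitionTopics,
                                       Map<String, OffsetReset> topicPolicies,
                                       OffsetReset globalPolicy) {
        if (repartitionTopics.contains(topic)) {
            return OffsetReset.LATEST;
        }
        return topicPolicies.getOrDefault(topic, globalPolicy);
    }

    /** Maps a policy to a concrete offset given the partition's current bounds. */
    static long startingOffset(OffsetReset policy, long beginningOffset, long endOffset) {
        return policy == OffsetReset.EARLIEST ? beginningOffset : endOffset;
    }
}
```

For a partition whose log currently spans offsets 5 to 42, `startingOffset` yields 5 under `EARLIEST` and 42 under `LATEST`; on a freshly created, empty repartition topic both bounds are 0, so the choice makes no difference there.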