[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-12-02 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242772#comment-17242772
 ] 

Guozhang Wang commented on KAFKA-10688:
---

The main reason I did not go through the consumer route (i.e. KAFKA-3370) is 
that it does not help resolve case 2.b).

If the application for whatever reason did not commit before shutting down, 
then the shutdown was not graceful (otherwise we would always commit). In 
this case neither option seems ideal: resetting to earliest may cause data 
duplicates, and resetting to latest may cause data loss. Hence I propose to 
still treat these as fatal cases.

> Handle accidental truncation of repartition topics as exceptional failure
> -
>
> Key: KAFKA-10688
> URL: https://issues.apache.org/jira/browse/KAFKA-10688
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>
> Today we always handle InvalidOffsetException from the main consumer by the 
> resetting policy assuming they are for source topics. But repartition topics 
> are also source topics and should never be truncated and hence cause 
> InvalidOffsetException.
> We should differentiate these repartition topics from external source topics 
> and treat the InvalidOffsetException from repartition topics as fatal and 
> close the whole application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-12-02 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17242571#comment-17242571
 ] 

Matthias J. Sax commented on KAFKA-10688:
-

1) As the topic is empty, it does not really matter: both "earliest" and 
"latest" would result in offset zero.

2b) There seems to be one corner case (maybe we could ignore it, though): the 
application for whatever reason did not commit offsets, but there was also no 
truncation. For this case, it seems OK to just reset to earliest?

Btw: IIRC, at the moment we only set the consumer config to `none` iff there 
are topics with different policies. If there is no topic-specific config (i.e. 
only the global config), or if all topic-specific configs and the global 
config are the same, we just pass it into the consumer.
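
To make the rule in the last paragraph concrete, here is a minimal sketch of it as recalled above ("IIRC"), not the actual Streams code. The class and method names are hypothetical; the only real Kafka names are the `auto.offset.reset` values `earliest`, `latest` and `none`.

```java
import java.util.Collection;

// Hypothetical helper, not part of Kafka Streams.
final class ConsumerResetConfig {

    // Returns the value that would be handed to the consumer's
    // auto.offset.reset config: the global policy if every topic-specific
    // policy agrees with it (or there are none), otherwise "none" so that
    // the Streams layer has to do the resetting itself.
    static String effectiveConsumerPolicy(String globalPolicy,
                                          Collection<String> perTopicPolicies) {
        for (String policy : perTopicPolicies) {
            if (!policy.equals(globalPolicy)) {
                return "none";
            }
        }
        return globalPolicy;
    }
}
```

With a global policy of `earliest` and no topic-specific overrides, the sketch passes `earliest` straight through; a single topic configured with `latest` switches the consumer to `none`.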

What I am wondering though is: why don't we just try to tackle KAFKA-3370 
directly? Seems to be a good improvement.



[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-12-01 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241992#comment-17241992
 ] 

Guozhang Wang commented on KAFKA-10688:
---

Had some more discussions with [~cadonna] about different scenarios, and I 
think we can potentially enlarge the scope of this ticket to include all the 
following cases:

1) When starting the application for the first time, the repartition topic is 
newly created. In this case we should set the starting offset on the 
repartition topic according to the global reset policy.

2) When restarting the application, the repartition topic already exists and 
may have some data. In this case we would try to read the committed offset 
and start from there.
2.a) If the committed offset is already out of range (i.e. a truncation 
happened before restarting the application), we should treat it as a fatal 
error.
2.b) If there is no committed offset, then either the application was not 
gracefully shut down before (since otherwise the committed offset should be 
found), or the committed offset was somehow lost. We should treat it as a 
fatal error.

3) During normal processing, the consumer suddenly finds itself out of range 
(i.e. a truncation happened concurrently). We should treat it as a fatal 
error.

The challenge today is that we cannot easily distinguish case 1) from cases 2) 
and 3), since the consumer would throw the same invalid offset exception and 
Streams would handle it uniformly. Instead of relying on the consumer to 
improve (KAFKA-3370), we can do it at the Streams layer only, as follows:

* Whenever we create the repartition topic, we commit offset 0 regardless of 
the global offset reset policy, since with either earliest or latest it would 
just be 0.
* Whenever we get an invalid offset exception (note we still keep the 
consumer's configuration as `none`), we check whether it is from a 
repartition topic. If yes, we always treat it as a fatal error; if not, we 
apply the reset policy of the corresponding source topic.
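
The second bullet can be sketched roughly as below. This is only an illustration of the proposed decision and not the Streams implementation: `InvalidOffsetTriage`, `Action` and `triage` are hypothetical names, and topics are represented by plain strings rather than Kafka's partition types.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch, not part of Kafka Streams.
final class InvalidOffsetTriage {

    enum Action { FATAL, RESET_EARLIEST, RESET_LATEST }

    // For every topic reported in an invalid offset exception, decide what
    // to do: repartition topics are always fatal, external source topics
    // fall back to their own reset policy or, if none is set, the global one.
    static Map<String, Action> triage(Set<String> topicsWithInvalidOffsets,
                                      Set<String> repartitionTopics,
                                      Map<String, Action> perTopicPolicy,
                                      Action globalPolicy) {
        Map<String, Action> decisions = new TreeMap<>();
        for (String topic : topicsWithInvalidOffsets) {
            if (repartitionTopics.contains(topic)) {
                decisions.put(topic, Action.FATAL);
            } else {
                decisions.put(topic, perTopicPolicy.getOrDefault(topic, globalPolicy));
            }
        }
        return decisions;
    }
}
```

The first bullet (committing offset 0 when the repartition topic is created) is what makes this safe: after that commit, a missing or out-of-range offset on a repartition topic can only come from cases 2) and 3).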



[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-11-09 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228844#comment-17228844
 ] 

Guozhang Wang commented on KAFKA-10688:
---

Normally the repartition topic should never have an invalid offset after the 
offset is set initially, since the repartition topic's retention should be 
infinite and we only truncate it via delete-records requests. This ticket is 
for guarding against abnormal cases, e.g. if users accidentally truncate the 
repartition topics.



[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-11-09 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228465#comment-17228465
 ] 

Bruno Cadonna commented on KAFKA-10688:
---

Maybe I misunderstood your previous comment.

In points 1) and 2) of your proposal, aren't you proposing to reset 
repartition topics by using the global policy?

When would a repartition topic not have a valid committed offset after an 
offset was committed for the first time (i.e. first commit after a fresh start 
of the Streams application)?

Isn't the fact that a repartition topic does not have a valid committed 
offset enough to throw a fatal error? Why should we reset the repartition 
topics in points 1) and 2) of your proposal?



[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-11-06 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227545#comment-17227545
 ] 

Guozhang Wang commented on KAFKA-10688:
---

No, we don't actually :) In StreamThread we handle InvalidOffsetException from 
the main consumer (which consumes from repartition topics) by always trying to 
reset according to the policy. The reason is that we explicitly set the policy 
on the consumer to `none` and hence have to do all the explicit resetting at 
the Streams layer itself.



[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-11-06 Thread Bruno Cadonna (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227236#comment-17227236
 ] 

Bruno Cadonna commented on KAFKA-10688:
---

[~guozhang], Thank you for the proposal.

Shouldn't we always throw a fatal error for an {{InvalidOffsetException}} 
on a repartition topic, since this should never happen? How do 1) and 2) 
differ? Could you please clarify? 



[jira] [Commented] (KAFKA-10688) Handle accidental truncation of repartition topics as exceptional failure

2020-11-05 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227080#comment-17227080
 ] 

Guozhang Wang commented on KAFKA-10688:
---

Without KAFKA-3370, we have to implement the desired behavior at the Streams 
layer itself. That is:

1) Upon task assignment, explicitly set the starting offset for the main 
consumer based on the per-topic / global reset policy. For repartition topics, 
the reset policy would be `latest`.

2) Upon task revival (when handling corrupted-task exceptions), do the same 
thing as 1).

3) During normal processing, if an InvalidOffsetException is thrown from the 
main consumer, we differentiate these cases:
3.a) for source topics: log a warning and reset accordingly;
3.b) for repartition topics: throw a fatal error.

We could potentially be stricter and require that all topics have committed 
offsets, failing if only some of them have committed positions. But for 
extensibility I'm going to hold off on doing that for now.
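
As a rough illustration of step 1), the starting position for one partition could be picked as below. This is a sketch under assumptions, not Streams code: all names are hypothetical, and it assumes that an existing committed offset takes precedence over any reset policy, which the steps above imply but do not spell out.

```java
import java.util.OptionalLong;

// Hypothetical sketch, not part of Kafka Streams.
final class StartingOffsets {

    enum Policy { EARLIEST, LATEST }

    // Picks the offset the main consumer would start from for one partition.
    // Assumption: a committed offset, if present, always wins. Otherwise
    // repartition topics use LATEST and external source topics use their
    // per-topic policy, falling back to the global one (resolved by the
    // caller into topicOrGlobalPolicy).
    static long startingOffset(OptionalLong committedOffset,
                               long beginningOffset,
                               long endOffset,
                               boolean isRepartitionTopic,
                               Policy topicOrGlobalPolicy) {
        if (committedOffset.isPresent()) {
            return committedOffset.getAsLong();
        }
        Policy policy = isRepartitionTopic ? Policy.LATEST : topicOrGlobalPolicy;
        return policy == Policy.EARLIEST ? beginningOffset : endOffset;
    }
}
```

Once the position is set explicitly like this, any InvalidOffsetException seen later belongs to step 3) and can be classified by topic type.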
