[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605945#comment-17605945 ]

Guozhang Wang commented on KAFKA-10575:
---------------------------------------

I've just created a new KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility

> StateRestoreListener#onRestoreEnd should always be triggered
> ------------------------------------------------------------
>
>                 Key: KAFKA-10575
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10575
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: highluck
>            Priority: Major
>
> Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete
> the restoration of an active task and transit it to the running state.
> However the restoration can also be stopped when the restoring task gets
> closed (because it gets migrated to another client, for example). We should
> also trigger the callback indicating its progress when the restoration
> stopped in any scenarios.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605911#comment-17605911 ]

Guozhang Wang commented on KAFKA-10575:
---------------------------------------

Hello [~nicktelford], thanks for your input! Yes, I'm now thinking about adding a new API to `StateRestoreListener` for the paused scenarios, and about creating a KIP for that new API as well as a couple of related metrics changes that will be introduced by KAFKA-10199.

Regarding the TaskStateChangeListener, I think it is worth a separate discussion thread of its own. Personally, I think only very advanced users would make use of it, since {{Tasks}} are a concept that the Streams library wants to more or less abstract away from common users: they should not have to worry too much about the unit of parallelism, after all.
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605885#comment-17605885 ]

Nicholas Telford commented on KAFKA-10575:
------------------------------------------

On a related note, based on what [~ableegoldman] said, it might be a good idea to introduce a {{TaskStateChangeListener}}, which is notified of changes to a Task's state. What do you think?
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605878#comment-17605878 ]

Nicholas Telford commented on KAFKA-10575:
------------------------------------------

Hi [~guozhang], I ran into this issue today and have some thoughts on it.

I agree that {{onRestoreEnd}} currently implies that the restoration _completed_, since that's what the documentation says.

I suggest we add a new method, {{StateRestoreListener#onRestoreAbort}} (or {{onRestoreSuspend}} or {{onRestorePaused}}, etc.), which handles the case where restoration was stopped before it could complete (i.e. because the {{Task}} was closed).

It should be enough to simply call this new method in {{StoreChangelogReader#unregister}}, which is called when {{Task}}s are closed/migrated.

For backwards compatibility, this new method should have a {{default}} implementation in the {{StateRestoreListener}} interface that is a no-op.

What do you think? And since this involves adding a new method, do we need a KIP for this?
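To illustrate the backwards-compatibility argument in the comment above, here is a minimal, self-contained Java sketch. It does not use the real Kafka classes: {{SketchRestoreListener}} is a simplified stand-in for {{StateRestoreListener}} (string partitions, no {{onBatchRestored}}), {{onRestoreAbort}} is only the name proposed in this thread, and {{RestoreAbortDemo.unregister}} is a hypothetical placeholder for the call site in {{StoreChangelogReader#unregister}}.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for Kafka Streams' StateRestoreListener (not the real interface):
// changelog partitions are plain strings and onBatchRestored is left out.
interface SketchRestoreListener {
    void onRestoreStart(String changelogPartition, String storeName, long startingOffset, long endingOffset);

    void onRestoreEnd(String changelogPartition, String storeName, long totalRestored);

    // Hypothetical new callback. The no-op default means implementations written
    // before it existed still compile and simply ignore aborted restorations.
    default void onRestoreAbort(String changelogPartition, String storeName, long totalRestored) {
    }
}

// A listener written against the "old" interface; it does not override onRestoreAbort.
class LegacyListener implements SketchRestoreListener {
    final List<String> events = new ArrayList<>();

    @Override
    public void onRestoreStart(String changelogPartition, String storeName, long startingOffset, long endingOffset) {
        events.add("start:" + storeName);
    }

    @Override
    public void onRestoreEnd(String changelogPartition, String storeName, long totalRestored) {
        events.add("end:" + storeName);
    }
}

// A listener that opts in to the new callback.
class AbortAwareListener extends LegacyListener {
    @Override
    public void onRestoreAbort(String changelogPartition, String storeName, long totalRestored) {
        events.add("abort:" + storeName + ":" + totalRestored);
    }
}

class RestoreAbortDemo {
    // Hypothetical stand-in for the unregister path: restoration stops before the end offset.
    static void unregister(SketchRestoreListener listener, String changelogPartition,
                           String storeName, long restoredSoFar) {
        listener.onRestoreAbort(changelogPartition, storeName, restoredSoFar);
    }

    public static void main(String[] args) {
        LegacyListener legacy = new LegacyListener();
        AbortAwareListener aware = new AbortAwareListener();
        for (LegacyListener listener : new LegacyListener[] {legacy, aware}) {
            listener.onRestoreStart("app-counts-changelog-0", "counts", 0L, 100L);
            unregister(listener, "app-counts-changelog-0", "counts", 40L);
        }
        System.out.println(legacy.events);
        System.out.println(aware.events);
    }
}
```

The legacy listener records only the start event, while the opt-in listener also records the abort together with the number of records restored so far. Whether the real method would carry the same parameters as {{onRestoreEnd}} is an assumption of this sketch, not something settled in the thread.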
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553806#comment-17553806 ]

Guozhang Wang commented on KAFKA-10575:
---------------------------------------

I took another look at this ticket while working on KAFKA-10199 in parallel, and here are some updates:

1. I can confirm that today we only call `onRestoreEnd` for case 1 above, and not for cases 2/3 from [~ableegoldman] (in fact, case 3 is just a special case of case 2, since we would first transition to CLOSED anyway).

2. On second thought, different groups of users may have different expectations of the semantics of these callbacks, for example:

a) The original complaint that drove this ticket is based on the expectation that each `onRestoreStart` would always be paired with an `onRestoreEnd`. This is not actually the case because of cases 2/3 above.
b) Others may expect that `onRestoreEnd` is only triggered when the restoration has actually completed. In fact, this is what we explicitly state in the javadocs.

So, if we just call `onRestoreEnd` for cases 2/3 above, we may make the users in a) happier, but we would break compatibility for the users in b). In addition, for the same topic-partition and store name there might be multiple restoration processes happening at the same time (e.g. when there are standby replicas), so it's not very straightforward to pair each `onRestoreStart` with a unique `onRestoreEnd`.

With those thoughts, I'm now leaning towards not calling `onRestoreEnd` for cases 2/3, but instead introducing a new API, e.g. `onRestorePaused`, for cases 2/3, and also documenting clearly that not every `onRestoreStart` will be paired with exactly one `onRestoreEnd/Paused`, to avoid unrealistic expectations.

Thoughts?
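The pairing concern in the comment above can be made concrete with a small listener-side sketch. It is an illustration only: `RestoreTracker` is a hypothetical class, `onRestorePaused` is just the name floated in this thread, and partitions are plain strings rather than Kafka's `TopicPartition`. Because several restorations may be open for the same partition and store, it counts open restorations per key instead of assuming a strict start/end pairing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical listener-side bookkeeping; none of these names come from Kafka itself.
class RestoreTracker {
    // Number of restorations currently open for each "partition/store" key.
    private final Map<String, Integer> open = new HashMap<>();
    private int completed;
    private int paused;

    private static String key(String changelogPartition, String storeName) {
        return changelogPartition + "/" + storeName;
    }

    void onRestoreStart(String changelogPartition, String storeName) {
        open.merge(key(changelogPartition, storeName), 1, Integer::sum);
    }

    // Restoration ran to completion.
    void onRestoreEnd(String changelogPartition, String storeName) {
        completed++;
        release(key(changelogPartition, storeName));
    }

    // Restoration stopped early, e.g. because the task was closed or recycled.
    void onRestorePaused(String changelogPartition, String storeName) {
        paused++;
        release(key(changelogPartition, storeName));
    }

    // Decrement the open count, dropping the key at zero. An end or pause without
    // a matching start is tolerated, since strict pairing is not guaranteed.
    private void release(String k) {
        open.computeIfPresent(k, (ignored, n) -> n > 1 ? n - 1 : null);
    }

    int openCount(String changelogPartition, String storeName) {
        return open.getOrDefault(key(changelogPartition, storeName), 0);
    }

    int completedCount() {
        return completed;
    }

    int pausedCount() {
        return paused;
    }

    public static void main(String[] args) {
        RestoreTracker tracker = new RestoreTracker();
        // Two restorations of the same store are open at the same time.
        tracker.onRestoreStart("app-counts-changelog-0", "counts");
        tracker.onRestoreStart("app-counts-changelog-0", "counts");
        tracker.onRestorePaused("app-counts-changelog-0", "counts");
        System.out.println("open=" + tracker.openCount("app-counts-changelog-0", "counts")
                + " completed=" + tracker.completedCount()
                + " paused=" + tracker.pausedCount());
    }
}
```

A tracker like this cannot tell which of the two open restorations was paused, which is exactly why the comment suggests documenting that callbacks are not paired one-to-one.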
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308931#comment-17308931 ]

A. Sophie Blee-Goldman commented on KAFKA-10575:
------------------------------------------------

[~high.lee] this ticket is aimed at making sure the #onRestoreEnd callback is always invoked (if it exists) any time a task ends ongoing restoration. Specifically, we probably want to invoke it in all of these scenarios:
# transition from RESTORING to RUNNING: i.e., restoration has actually completed
# transition from RESTORING to CLOSED: this one might be a bit tricky, since technically a closing task must transition as RESTORING -> SUSPENDED -> CLOSED, but we do not want to invoke this callback if the task is going to be resumed after suspension instead of closed. However, a suspended task is only ever resumed under EAGER rebalancing, which is likely not used in the latest versions and will be removed soon anyway, so maybe it's fine to just invoke it during the RESTORING -> SUSPENDED transition. WDYT [~guozhang]? Also note that this transition will cover the case of a restoring task that hits TaskCorrupted and must be revived
# RESTORING active task is recycled from active to standby

As of today, I believe we only trigger this callback in the 1st case, and possibly the 3rd. So this ticket would encompass confirming that we do this for recycled tasks and implementing it for closing tasks.
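The three scenarios listed above can be summarised as a small decision function. This is a sketch under simplifying assumptions, not Kafka's task lifecycle code: `TaskState` models only the states named in the comment, `RestoreOutcome` and `RestoreTransitions.classify` are hypothetical names, and the RESTORING -> SUSPENDED transition is treated as the point where restoration stops, as the comment suggests.

```java
// Only the task states mentioned in the comment are modelled here.
enum TaskState { RESTORING, RUNNING, SUSPENDED, CLOSED }

// What a listener would be told about the restoration, if anything.
enum RestoreOutcome { COMPLETED, STOPPED_EARLY, NONE }

class RestoreTransitions {
    // Hypothetical helper: classify a state change of an active task.
    static RestoreOutcome classify(TaskState from, TaskState to, boolean recycledToStandby) {
        if (from != TaskState.RESTORING) {
            return RestoreOutcome.NONE;          // the task was not restoring
        }
        if (recycledToStandby) {
            return RestoreOutcome.STOPPED_EARLY; // scenario 3: active task recycled to standby
        }
        switch (to) {
            case RUNNING:
                return RestoreOutcome.COMPLETED;     // scenario 1: restoration finished
            case SUSPENDED:
                return RestoreOutcome.STOPPED_EARLY; // scenario 2: first step of closing or reviving
            default:
                return RestoreOutcome.NONE;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(TaskState.RESTORING, TaskState.RUNNING, false));
        System.out.println(classify(TaskState.RESTORING, TaskState.SUSPENDED, false));
        System.out.println(classify(TaskState.SUSPENDED, TaskState.CLOSED, false));
    }
}
```

Reporting at RESTORING -> SUSPENDED means the later SUSPENDED -> CLOSED step yields `NONE`, so a closing task is reported once rather than twice.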
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308411#comment-17308411 ]

highluck commented on KAFKA-10575:
----------------------------------

[~ableegoldman] Yes, I am willing to work on it. However, I am not sure what to do. Could you tell me more about the direction you would like this ticket to take?
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17308309#comment-17308309 ]

A. Sophie Blee-Goldman commented on KAFKA-10575:
------------------------------------------------

[~high.lee] have you been able to work on this ticket? If you have other PRs that you are currently focusing on, you can always unassign this one and re-assign it once you're ready to pick it up.

(No pressure, I've just been trying to clean up some tickets and make sure the owners are active)
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245318#comment-17245318 ]

highluck commented on KAFKA-10575:
----------------------------------

[~Yohan123] Are you working on this task?
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231123#comment-17231123 ]

Richard Yu commented on KAFKA-10575:
------------------------------------

Thanks for letting me know! I was already looking at StoreChangelogReader since that was probably one of only two places where onRestoreEnd was called. Hopefully, I will be able to pull together a PR that can tackle this issue.
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229436#comment-17229436 ]

Guozhang Wang commented on KAFKA-10575:
---------------------------------------

Of course. You can take a look at the StoreChangelogReader class as the entry point for a fix.
[jira] [Commented] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered
[ https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17228890#comment-17228890 ]

Richard Yu commented on KAFKA-10575:
------------------------------------

[~guozhang] I'm interested in picking this one up. May I try my hand at it?