[jira] [Commented] (FLINK-33187) Don't record duplicate event if no change
[ https://issues.apache.org/jira/browse/FLINK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1327#comment-1327 ] Clara Xiong commented on FLINK-33187:
-
Anyone have opinions on the default interval? Is 30 min a reasonable duration?

> Don't record duplicate event if no change
> -
>
> Key: FLINK-33187
> URL: https://issues.apache.org/jira/browse/FLINK-33187
> Project: Flink
> Issue Type: Improvement
> Components: Autoscaler
> Affects Versions: 1.17.1
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
> Labels: pull-request-available
> Fix For: kubernetes-operator-1.7.0
>
> Problem:
> Some events, such as ScalingReport when autoscaling is not enabled, are recorded repeatedly; they constitute 99% of all events in our prod env. This wastes resources and causes performance problems downstream.
> Proposal:
> Suppress duplicate events within an interval defined by a new operator config "scaling.report.interval", in seconds, defaulting to 1800.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
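The suppression described in the issue can be sketched as a small interval-based deduplication check. This is a minimal illustration, not the operator's actual implementation; the class and method names (`EventDeduplicator`, `shouldRecord`) are hypothetical, and only the config semantics (interval in seconds, default 1800) come from the issue text.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of interval-based event suppression: an event with the
// same key and an unchanged message is dropped if one was already recorded
// within the configured interval (e.g. scaling.report.interval, default 1800s).
public class EventDeduplicator {
    private final Duration interval;
    private final Map<String, Instant> lastRecorded = new HashMap<>();
    private final Map<String, String> lastMessage = new HashMap<>();

    public EventDeduplicator(Duration interval) {
        this.interval = interval;
    }

    /** Returns true if the event should be recorded, false if suppressed. */
    public boolean shouldRecord(String key, String message, Instant now) {
        Instant last = lastRecorded.get(key);
        boolean changed = !message.equals(lastMessage.get(key));
        if (last == null || changed || now.isAfter(last.plus(interval))) {
            lastRecorded.put(key, now);
            lastMessage.put(key, message);
            return true;
        }
        return false; // duplicate within the interval, no change
    }
}
```

With the proposed default of 1800 seconds, an unchanged ScalingReport would be recorded at most once every 30 minutes per resource, while any change in the message is still recorded immediately.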
[jira] [Updated] (FLINK-33187) Don't record duplicate event if no change
[ https://issues.apache.org/jira/browse/FLINK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated FLINK-33187:

Description:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are recorded repeatedly; they constitute 99% of all events in our prod env. This wastes resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator config "scaling.report.interval", in seconds, defaulting to 1800.

was:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator config "scaling.report.interval", in seconds, defaulting to 1800.
[jira] [Updated] (FLINK-33187) Don't record duplicate event if no change
[ https://issues.apache.org/jira/browse/FLINK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated FLINK-33187:

Summary: Don't record duplicate event if no change (was: Don't send duplicate event to Kafka if no change)
[jira] [Updated] (FLINK-33187) Don't send duplicate event to Kafka if no change
[ https://issues.apache.org/jira/browse/FLINK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated FLINK-33187:

Description:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator dynamic config "scaling.report.interval", in seconds, defaulting to 1800.

was:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator dynamic config "suppress-event.interval", in seconds, defaulting to 0.
[jira] [Updated] (FLINK-33187) Don't send duplicate event to Kafka if no change
[ https://issues.apache.org/jira/browse/FLINK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated FLINK-33187:

Description:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator config "scaling.report.interval", in seconds, defaulting to 1800.

was:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator dynamic config "scaling.report.interval", in seconds, defaulting to 1800.
[jira] [Updated] (FLINK-33187) Don't send duplicate event to Kafka if no change
[ https://issues.apache.org/jira/browse/FLINK-33187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated FLINK-33187:

Description:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator dynamic config "suppress-event.interval", in seconds, defaulting to 0.

was:
Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator dynamic config "suppress-event.interval", in seconds, defaulting to 30 min.
[jira] [Created] (FLINK-33187) Don't send duplicate event to Kafka if no change
Clara Xiong created FLINK-33187:
---
Summary: Don't send duplicate event to Kafka if no change
Key: FLINK-33187
URL: https://issues.apache.org/jira/browse/FLINK-33187
Project: Flink
Issue Type: Improvement
Components: Autoscaler
Affects Versions: 1.17.1
Reporter: Clara Xiong

Problem: Some events, such as ScalingReport when autoscaling is not enabled, are sent to Kafka repeatedly; they constitute 99% of all Kafka events in our prod env. This wastes Kafka resources and causes performance problems downstream.
Proposal: Suppress duplicate events within an interval defined by a new operator dynamic config "suppress-event.interval", in seconds, defaulting to 30 min.
[jira] [Closed] (FLINK-30119) Breaking change: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
[ https://issues.apache.org/jira/browse/FLINK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong closed FLINK-30119.
---
Release Note: Given the impact vs. benefit of the change, we can revisit next time we get a chance to make major design changes.
Resolution: Won't Fix

> Breaking change: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
> --
>
> Key: FLINK-30119
> URL: https://issues.apache.org/jira/browse/FLINK-30119
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Reporter: Clara Xiong
> Assignee: Clara Xiong
> Priority: Major
> Labels: pull-request-available, stale-assigned
>
> End user experience proposal:
> Users can see the properties of the last savepoint, pending or completed, and can get one of three states for its status: PENDING, SUCCEEDED, and FAILED. If no savepoint has ever been taken or attempted, it is empty. Completed savepoints (manual, periodic, and upgrade) are included in savepoint history, merged with savepoints from the Flink job.
> Users can see this savepoint with PENDING status once one is triggered. Once completed, users can see the last savepoint's status change to SUCCEEDED and the savepoint included in savepoint history, or FAILED and not in savepoint history. If another savepoint is triggered after completion, before the user checks, the user cannot see the status of the one they triggered, but they can check whether the savepoint is in the history.
> Currently lastSavepoint only stores the last completed one, duplicating savepoint history. To expose the properties of the currently pending savepoint or the last savepoint that failed, we need to expose that info in separate fields in SavepointInfo. The internal logic of the Operator uses those fields for triggering and retries and creates compatibility issues with the client. It also uses more space against the etcd size limit.
>
> Code change proposal:
> Use lastSavepoint to store the last completed/attempted one and deprecate SavepointInfo.triggerTimestamp, SavepointInfo.triggerType, and SavepointInfo.formatType. This will simplify the CRD and logic.
> Add a SavepointInfo::retrieveLastSavepoint method to return the last succeeded one.
> Update getLastSavepointStatus to simplify the logic.
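The three-state status described in the proposal can be sketched as follows. The types and field names here (`SavepointStatus`, `LastSavepoint`, `attemptDone`) are illustrative assumptions, not the operator's actual CRD shape; only the PENDING/SUCCEEDED/FAILED semantics and the "null when never taken or attempted" behavior come from the issue text.

```java
// Hypothetical sketch of the proposed savepoint status model: lastSavepoint
// stores the last completed/attempted savepoint, and its status can be
// derived as one of three states, or null if nothing was ever attempted.
enum SavepointStatus { PENDING, SUCCEEDED, FAILED }

class LastSavepoint {
    final String location;      // assumed null while pending or after failure
    final boolean attemptDone;  // assumed true once the trigger finished

    LastSavepoint(String location, boolean attemptDone) {
        this.location = location;
        this.attemptDone = attemptDone;
    }

    /** Returns null when no savepoint was ever taken or attempted. */
    static SavepointStatus statusOf(LastSavepoint sp) {
        if (sp == null) return null;
        if (!sp.attemptDone) return SavepointStatus.PENDING;
        return sp.location != null ? SavepointStatus.SUCCEEDED : SavepointStatus.FAILED;
    }
}
```

Storing the pending/failed attempt in the same field is what removes the need for the separate trigger fields the proposal deprecates.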
[jira] [Commented] (FLINK-30119) Breaking change: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
[ https://issues.apache.org/jira/browse/FLINK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760994#comment-17760994 ] Clara Xiong commented on FLINK-30119:
-
Given the impact and benefit of the change, we can revisit next time we get a chance to make major design changes.
> Code change proposal: > Use lastSavepoint to store the last completed/attempted one and deprecate > SavepointInfo.triggerTimstamp, SavepointInfo.triggerType and > SavepointInfo.formatType. This will simplify the CRD and logic. > Add SavepointInfo::retrieveLastSavepoint method to return the last succeeded > one. > Update getLastSavepointStatus to simplify the logic. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-30119) Breaking change: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
[ https://issues.apache.org/jira/browse/FLINK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642168#comment-17642168 ] Clara Xiong commented on FLINK-30119:
-
This is a breaking change:
* Existing applications using lastSavepoint or trigger-related fields in SavepointInfo, except triggerId, may break.
* For code depending on triggerId, prefer SavepointUtils.savepointInProgress, which is guaranteed to work.
* savepointHistory is not changed and still keeps only the completed savepoints. Unit tests for savepoint history also cover a few scenarios.
[jira] [Commented] (FLINK-30119) Breaking change: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
[ https://issues.apache.org/jira/browse/FLINK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642162#comment-17642162 ] Clara Xiong commented on FLINK-30119:
-
Updating the title to reflect the fact that this could break existing applications relying on some fields of SavepointInfo, although there are potential benefits to simplifying the logic and reducing the size of the etcd object, which is hitting its upper limit.
[jira] [Updated] (FLINK-30119) Breaking change: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
[ https://issues.apache.org/jira/browse/FLINK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated FLINK-30119:

Summary: Breaking change: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending (was: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending)
[jira] [Updated] (FLINK-30119) Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
[ https://issues.apache.org/jira/browse/FLINK-30119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clara Xiong updated FLINK-30119:

Summary: Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending (was: (Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending)
[jira] [Created] (FLINK-30119) (Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
Clara Xiong created FLINK-30119:
---
Summary: (Flink Kubernetes Operator should store last savepoint in the SavepointInfo.lastSavepoint field whether it is completed or pending
Key: FLINK-30119
URL: https://issues.apache.org/jira/browse/FLINK-30119
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Reporter: Clara Xiong

End user experience proposal:
Users can see the properties of the last savepoint, pending or completed, and can get one of three states for its status: PENDING, SUCCEEDED, and FAILED. If no savepoint has ever been taken or attempted, it is empty. Completed savepoints (manual, periodic, and upgrade) are included in savepoint history, merged with savepoints from the Flink job.
Users can see this savepoint with PENDING status once one is triggered. Once completed, users can see the last savepoint's status change to SUCCEEDED and the savepoint included in savepoint history, or FAILED and not in savepoint history. If another savepoint is triggered after completion, before the user checks, the user cannot see the status of the one they triggered, but they can check whether the savepoint is in the history.
Currently lastSavepoint only stores the last completed one, duplicating savepoint history. To expose the properties of the currently pending savepoint or the last savepoint that failed, we need to expose that info in separate fields in SavepointInfo. The internal logic of the Operator uses those fields for triggering and retries and creates compatibility issues with the client. It also uses more space against the etcd size limit.

Code change proposal:
Use lastSavepoint to store the last completed/attempted one and deprecate SavepointInfo.triggerTimestamp, SavepointInfo.triggerType, and SavepointInfo.formatType. This will simplify the CRD and logic.
Add a SavepointInfo::retrieveLastSavepoint method to return the last succeeded one.
Update getLastSavepointStatus to simplify the logic.
[jira] [Created] (FLINK-30047) getLastSavepointStatus should return null when there is never savepoint completed or pending
Clara Xiong created FLINK-30047:
---
Summary: getLastSavepointStatus should return null when there is never savepoint completed or pending
Key: FLINK-30047
URL: https://issues.apache.org/jira/browse/FLINK-30047
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Reporter: Clara Xiong

Currently SUCCEEDED is returned in this case, but null should be returned instead to distinguish it from real success.
[jira] [Created] (FLINK-29819) Record an error event when savepoint fails within grace period
Clara Xiong created FLINK-29819:
---
Summary: Record an error event when savepoint fails within grace period
Key: FLINK-29819
URL: https://issues.apache.org/jira/browse/FLINK-29819
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Reporter: Clara Xiong

As of now, SavepointObserver retries if a savepoint fails within the grace period, until a success or until a failure happens after the grace period. The grace period applies to each retry. If the underlying problem behind a quick failure is not transient, such as a misconfigured path or a persistent storage failure, retries keep going without recording any error event. We should first add logic to record an error event per failed attempt. We can consider capping the retries if they become a pain for users.
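The proposed behavior can be sketched as a retry loop that surfaces every failed attempt. This is a minimal illustration under stated assumptions: the class name `SavepointRetrier`, the `Supplier<Boolean>`-based attempt, and the capped-attempt parameter are all hypothetical, not SavepointObserver's actual structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: each failed savepoint attempt records an error event
// instead of retrying silently, as proposed in the issue.
public class SavepointRetrier {
    final List<String> events = new ArrayList<>();

    /** Retries up to maxAttempts, recording an error event per failed attempt. */
    public boolean run(Supplier<Boolean> attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.get()) {
                return true;
            }
            // Proposed change: surface every failed attempt to the user.
            events.add("savepoint attempt " + i + " failed");
        }
        return false; // the cap itself is the issue's optional follow-up idea
    }
}
```

With a persistent failure (e.g. a misconfigured path), the user now sees one error event per attempt rather than silent retries.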
[jira] [Created] (FLINK-29695) Create a utility to report the status of the last savepoint
Clara Xiong created FLINK-29695:
---
Summary: Create a utility to report the status of the last savepoint
Key: FLINK-29695
URL: https://issues.apache.org/jira/browse/FLINK-29695
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Reporter: Clara Xiong

Users want to know the status of the last savepoint, especially for manually triggered ones, to manage savepoints. Currently, users can infer the status of the last savepoint (PENDING, SUCCEEDED, or ABANDONED) from jobStatus.triggerId, lastSavepoint.triggerNonce, spec.job.savepointTriggerNonce, and the savepointTriggerNonce from the last reconciliation. If the last savepoint was not manually triggered, there is no ABANDONED status, only PENDING or SUCCEEDED. Creating a utility will encapsulate the internal logic of the Flink operator and guard against regressions from any future version changes.
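The inference such a utility would encapsulate might look like the following. The field names (jobStatus.triggerId, lastSavepoint.triggerNonce, spec.job.savepointTriggerNonce) come from the issue text, but the decision logic and the utility's name are my reading of it, not the operator's actual implementation.

```java
// Hypothetical sketch of inferring the last savepoint's status from the
// trigger id and nonces, as the issue says users do by hand today.
public class SavepointStatusUtil {
    public enum Status { PENDING, SUCCEEDED, ABANDONED }

    public static Status infer(
            String triggerId,  // jobStatus.triggerId, assumed non-empty while in progress
            Long lastNonce,    // lastSavepoint.triggerNonce of the last completed savepoint
            Long specNonce) {  // spec.job.savepointTriggerNonce requested by the user
        if (triggerId != null && !triggerId.isEmpty()) {
            return Status.PENDING;   // a savepoint is currently in progress
        }
        if (specNonce != null && !specNonce.equals(lastNonce)) {
            return Status.ABANDONED; // requested nonce never showed up as completed
        }
        return Status.SUCCEEDED;
    }
}
```

Wrapping this in a utility is exactly the point of the issue: callers stop depending on how these fields interact across operator versions.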
[jira] [Commented] (FLINK-29588) Add Flink Version and Application Version to Savepoint properties
[ https://issues.apache.org/jira/browse/FLINK-29588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17616041#comment-17616041 ] Clara Xiong commented on FLINK-29588:
-
appVersion is the application version of the FlinkDeployment. If a user doesn't find a problem until after a couple of rounds of upgrades and decides to go back to re-process or troubleshoot, they have already had many savepoints taken. The timestamp is one way to look for the right one, but they probably need a way to keep a chronological history of versions.

> Add Flink Version and Application Version to Savepoint properties
> -
>
> Key: FLINK-29588
> URL: https://issues.apache.org/jira/browse/FLINK-29588
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Reporter: Clara Xiong
> Priority: Major
>
> It is common that users need to upgrade long-running FlinkDeployments. An application upgrade or a major Flink upgrade might break the application due to schema incompatibilities, especially the latter ([link|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#compatibility-table]). In this case, users need to manually restore a savepoint that is compatible with the version they want to try or re-process. Currently the Flink Operator returns a list of completed Savepoints for a FlinkDeployment. It would be helpful if the Savepoints in SavepointHistory had properties for the Flink version and application version so users can easily determine which savepoint to use.
>
> * {{String flinkVersion}}
> * {{String appVersion}}
[jira] [Created] (FLINK-29588) Add Flink Version and Application Version to Savepoint properties
Clara Xiong created FLINK-29588:
---
Summary: Add Flink Version and Application Version to Savepoint properties
Key: FLINK-29588
URL: https://issues.apache.org/jira/browse/FLINK-29588
Project: Flink
Issue Type: Improvement
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.0
Reporter: Clara Xiong

It is common that users need to upgrade long-running FlinkDeployments. An application upgrade or a major Flink upgrade might break the application due to schema incompatibilities, especially the latter ([link|https://nightlies.apache.org/flink/flink-docs-master/docs/ops/upgrading/#compatibility-table]). In this case, users need to manually restore a savepoint that is compatible with the version they want to try or re-process. Currently the Flink Operator returns a list of completed Savepoints for a FlinkDeployment. It would be helpful if the Savepoints in SavepointHistory had properties for the Flink version and application version so users can easily determine which savepoint to use.

* {{String flinkVersion}}
* {{String appVersion}}
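The proposed properties would enable selection like the following sketch. Only the two fields (`flinkVersion`, `appVersion`) come from the issue; the `Savepoint` record shape, the `SavepointPicker` class, and the selection policy (latest matching by timestamp) are illustrative assumptions, not the operator's CRD or API.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: with flinkVersion/appVersion stored per savepoint,
// picking a restorable savepoint for a given Flink version becomes a filter.
public class SavepointPicker {
    public record Savepoint(String path, long timestamp, String flinkVersion, String appVersion) {}

    /** Latest savepoint taken with the given Flink version, if any. */
    public static Optional<Savepoint> latestCompatible(List<Savepoint> history, String flinkVersion) {
        return history.stream()
                .filter(sp -> flinkVersion.equals(sp.flinkVersion()))
                .max((a, b) -> Long.compare(a.timestamp(), b.timestamp()));
    }
}
```

Without these properties, users are left correlating savepoint timestamps against deployment history by hand, which is the pain the issue describes.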