[jira] [Created] (FLINK-36521) Introduce TtlAwareSerializer to resolve the compatibility check between ttlSerializer and original serializer
xiangyu feng created FLINK-36521: Summary: Introduce TtlAwareSerializer to resolve the compatibility check between ttlSerializer and original serializer Key: FLINK-36521 URL: https://issues.apache.org/jira/browse/FLINK-36521 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: xiangyu feng -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35780) Support state migration between disabling and enabling ttl in RocksDBKeyedStateBackend
[ https://issues.apache.org/jira/browse/FLINK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881913#comment-17881913 ] xiangyu feng commented on FLINK-35780: -- Also cc: [~tzulitai] [~masteryhx] > Support state migration between disabling and enabling ttl in > RocksDBKeyedStateBackend > -- > > Key: FLINK-35780 > URL: https://issues.apache.org/jira/browse/FLINK-35780 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-35780) Support state migration between disabling and enabling ttl in RocksDBKeyedStateBackend
[ https://issues.apache.org/jira/browse/FLINK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881851#comment-17881851 ] xiangyu feng commented on FLINK-35780: -- Hi [~Zakelly], I've finished the first version of the implementation. Would you kindly review the PR when you have time? > Support state migration between disabling and enabling ttl in RocksDBKeyedStateBackend > -- > > Key: FLINK-35780 > URL: https://issues.apache.org/jira/browse/FLINK-35780 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available
[jira] [Updated] (FLINK-35780) Support state migration between disabling and enabling ttl in RocksDBKeyedStateBackend
[ https://issues.apache.org/jira/browse/FLINK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-35780: - Summary: Support state migration between disabling and enabling ttl in RocksDBKeyedStateBackend (was: Support state migration between disabling and enabling ttl in RocksDBState) > Support state migration between disabling and enabling ttl in > RocksDBKeyedStateBackend > -- > > Key: FLINK-35780 > URL: https://issues.apache.org/jira/browse/FLINK-35780 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-35780) Support state migration between disabling and enabling ttl in RocksDBState
[ https://issues.apache.org/jira/browse/FLINK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-35780: - Summary: Support state migration between disabling and enabling ttl in RocksDBState (was: Support state migration from disabling to enabling ttl in RocksDBState) > Support state migration between disabling and enabling ttl in RocksDBState > -- > > Key: FLINK-35780 > URL: https://issues.apache.org/jira/browse/FLINK-35780 > Project: Flink > Issue Type: Sub-task > Components: Runtime / State Backends >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-35780) Support state migration from disabling to enabling ttl in RocksDBState
xiangyu feng created FLINK-35780: Summary: Support state migration from disabling to enabling ttl in RocksDBState Key: FLINK-35780 URL: https://issues.apache.org/jira/browse/FLINK-35780 Project: Flink Issue Type: Sub-task Components: Runtime / State Backends Reporter: xiangyu feng -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32955) Support state compatibility between enabling TTL and disabling TTL
[ https://issues.apache.org/jira/browse/FLINK-32955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855474#comment-17855474 ] xiangyu feng commented on FLINK-32955: -- [~lijinzhong] thx, [~Zakelly] [~masteryhx] would you kindly assign this issue to me? > Support state compatibility between enabling TTL and disabling TTL > -- > > Key: FLINK-32955 > URL: https://issues.apache.org/jira/browse/FLINK-32955 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Jinzhong Li >Assignee: Jinzhong Li >Priority: Major > > Currently, restoring state that was previously configured without TTL using a TTL-enabled descriptor, or vice versa, leads to a compatibility failure and a StateMigrationException. > In some scenarios, a user may enable state TTL and restore from old state that was previously configured without TTL, or vice versa. > It would be useful for users if we supported state compatibility between enabling TTL and disabling TTL.
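As a rough illustration of why state written without TTL cannot simply be read back through a TTL-enabled descriptor, here is a minimal Java sketch. It assumes, purely for illustration, that TTL state stores a last-access timestamp in front of each user value; the class and method names (TtlLayoutSketch, encodePlain, encodeWithTtl, migratePlainToTtl, migrateTtlToPlain) are hypothetical and are not Flink APIs, and the real serializers and migration path in Flink are more involved.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: models the idea that TTL state keeps a timestamp next to
// each user value, so its serialized bytes differ from those of non-TTL state.
final class TtlLayoutSketch {

    // Non-TTL layout (assumed for this sketch): just the 8-byte user value.
    static byte[] encodePlain(long userValue) {
        return ByteBuffer.allocate(Long.BYTES).putLong(userValue).array();
    }

    // TTL layout (assumed for this sketch): 8-byte timestamp, then the user value.
    static byte[] encodeWithTtl(long lastAccessTimestamp, long userValue) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(lastAccessTimestamp)
                .putLong(userValue)
                .array();
    }

    // Hypothetical migration when TTL is newly enabled: prefix the old bytes
    // with a timestamp chosen at migration time.
    static byte[] migratePlainToTtl(byte[] plain, long migrationTimestamp) {
        return ByteBuffer.allocate(Long.BYTES + plain.length)
                .putLong(migrationTimestamp)
                .put(plain)
                .array();
    }

    // Hypothetical migration when TTL is disabled: drop the timestamp prefix.
    static byte[] migrateTtlToPlain(byte[] withTtl) {
        return Arrays.copyOfRange(withTtl, Long.BYTES, withTtl.length);
    }
}
```

In this toy model the two layouts have different lengths, which is the kind of mismatch a compatibility check would have to detect and bridge in both directions.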
[jira] [Commented] (FLINK-32955) Support state compatibility between enabling TTL and disabling TTL
[ https://issues.apache.org/jira/browse/FLINK-32955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855415#comment-17855415 ] xiangyu feng commented on FLINK-32955: -- Hi [~lijinzhong], this issue hasn't been updated for a long time. I would like to work on it if that is OK with you. > Support state compatibility between enabling TTL and disabling TTL > -- > > Key: FLINK-32955 > URL: https://issues.apache.org/jira/browse/FLINK-32955 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Reporter: Jinzhong Li >Assignee: Jinzhong Li >Priority: Major > > Currently, restoring state that was previously configured without TTL using a TTL-enabled descriptor, or vice versa, leads to a compatibility failure and a StateMigrationException. > In some scenarios, a user may enable state TTL and restore from old state that was previously configured without TTL, or vice versa. > It would be useful for users if we supported state compatibility between enabling TTL and disabling TTL.
[jira] [Closed] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng closed FLINK-34452. > Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Assignee: Weijie Guo >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > > Test Suggestion: > # Prepare configuration options: > ** taskmanager.numberOfTaskSlots = 2, > ** slotmanager.number-of-slots.min = 7, > ** slot.idle.timeout = 5 > # Set up a Flink session cluster on YARN or Native Kubernetes based on the following docs: > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#starting-a-flink-session-on-yarn] > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes] > # Verify that 4 TaskManagers are registered even though no jobs have been submitted > # Verify that these TaskManagers are not destroyed after 50 seconds.
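The expected count of 4 TaskManagers in the test suggestion follows from simple arithmetic: with 2 slots per TaskManager, at least ceil(7 / 2) = 4 TaskManagers are needed to provide 7 slots. The sketch below only reproduces that calculation; the class and method names are hypothetical and it assumes the minimum is met by starting whole TaskManagers, which is not taken from Flink's actual slot manager code.

```java
// Illustrative only: reproduces the arithmetic behind the test suggestion.
final class MinSlotsSketch {

    // Smallest number of TaskManagers whose combined slots reach minSlots.
    static int requiredTaskManagers(int minSlots, int slotsPerTaskManager) {
        if (minSlots < 0 || slotsPerTaskManager <= 0) {
            throw new IllegalArgumentException("minSlots must be >= 0 and slotsPerTaskManager > 0");
        }
        // Integer ceiling division: ceil(minSlots / slotsPerTaskManager).
        return (minSlots + slotsPerTaskManager - 1) / slotsPerTaskManager;
    }

    public static void main(String[] args) {
        // slotmanager.number-of-slots.min = 7, taskmanager.numberOfTaskSlots = 2
        System.out.println(requiredTaskManagers(7, 2)); // prints 4
    }
}
```

Four TaskManagers provide 8 slots, one more than the configured minimum, because slots are only added in units of a whole TaskManager.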
[jira] [Resolved] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng resolved FLINK-34452. -- Resolution: Fixed > Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Assignee: Weijie Guo >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > > Test Suggestion: > # Prepare configuration options: > ** taskmanager.numberOfTaskSlots = 2, > ** slotmanager.number-of-slots.min = 7, > ** slot.idle.timeout = 5 > # Set up a Flink session cluster on YARN or Native Kubernetes based on the following docs: > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#starting-a-flink-session-on-yarn] > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes] > # Verify that 4 TaskManagers are registered even though no jobs have been submitted > # Verify that these TaskManagers are not destroyed after 50 seconds.
[jira] [Commented] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819097#comment-17819097 ] xiangyu feng commented on FLINK-34452: -- [~lincoln.86xy] Hi, I will close this ticket as resolved. > Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Assignee: Weijie Guo >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > > Test Suggestion: > # Prepare configuration options: > ** taskmanager.numberOfTaskSlots = 2, > ** slotmanager.number-of-slots.min = 7, > ** slot.idle.timeout = 5 > # Set up a Flink session cluster on YARN or Native Kubernetes based on the following docs: > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#starting-a-flink-session-on-yarn] > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes] > # Verify that 4 TaskManagers are registered even though no jobs have been submitted > # Verify that these TaskManagers are not destroyed after 50 seconds.
[jira] [Commented] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819096#comment-17819096 ] xiangyu feng commented on FLINK-34452: -- [~Weijie Guo] Thx for your work! > Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Assignee: Weijie Guo >Priority: Blocker > Labels: release-testing > Fix For: 1.19.0 > > > Test Suggestion: > # Prepare configuration options: > ** taskmanager.numberOfTaskSlots = 2, > ** slotmanager.number-of-slots.min = 7, > ** slot.idle.timeout = 5 > # Set up a Flink session cluster on YARN or Native Kubernetes based on the following docs: > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#starting-a-flink-session-on-yarn] > ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes] > # Verify that 4 TaskManagers are registered even though no jobs have been submitted > # Verify that these TaskManagers are not destroyed after 50 seconds.
[jira] [Updated] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-34452: - Description: Test Suggestion: # Prepare configuration options: ** taskmanager.numberOfTaskSlots = 2, ** slotmanager.number-of-slots.min = 7, ** slot.idle.timeout = 5 # Setup a Flink session Cluster on Yarn or Native Kubernetes based on following docs: ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#starting-a-flink-session-on-yarn] ** [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. was: Test Suggestion: # Prepare configuration options: ** taskmanager.numberOfTaskSlots = 2, ** slotmanager.number-of-slots.min = 7, ** slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. 
> Release Testing: Verify FLINK-15959 Add min number of slots configuration to > limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Priority: Blocker > Labels: release-testing > > Test Suggestion: > # Prepare configuration options: > ** taskmanager.numberOfTaskSlots = 2, > ** slotmanager.number-of-slots.min = 7, > ** slot.idle.timeout = 5 > # Setup a Flink session Cluster on Yarn or Native Kubernetes based on > following docs: > ** > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/yarn/#starting-a-flink-session-on-yarn] > ** > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#starting-a-flink-session-on-kubernetes] > # Verify that 4 TaskManagers will be registered even though no jobs has been > submitted > # Verify that these TaskManagers will not be destroyed after 50 seconds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34391) Release Testing Instructions: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17818219#comment-17818219 ] xiangyu feng commented on FLINK-34391: -- [~lincoln.86xy] You can close this ticket now. I've created a separate ticket for testing: https://issues.apache.org/jira/browse/FLINK-34452 > Release Testing Instructions: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots > - > > Key: FLINK-34391 > URL: https://issues.apache.org/jira/browse/FLINK-34391 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: xiangyu feng >Priority: Blocker > Fix For: 1.19.0 > > Attachments: screenshot-1.png
[jira] [Updated] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-34452: - Description: Test Suggestion: # Prepare configuration options: ** taskmanager.numberOfTaskSlots = 2, ** slotmanager.number-of-slots.min = 7, ** slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. was: Test Suggestion: # Prepare configuration options: taskmanager.numberOfTaskSlots = 2, slotmanager.number-of-slots.min = 7, slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. 
> Release Testing: Verify FLINK-15959 Add min number of slots configuration to > limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Priority: Blocker > Labels: release-testing > > Test Suggestion: > # Prepare configuration options: > ** taskmanager.numberOfTaskSlots = 2, > ** slotmanager.number-of-slots.min = 7, > ** slot.idle.timeout = 5 > # Setup a Flink session Cluster according to: > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] > # Verify that 4 TaskManagers will be registered even though no jobs has been > submitted > # Verify that these TaskManagers will not be destroyed after 50 seconds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-34452: - Description: Test Suggestion: # Prepare configuration options: ** h5. taskmanager.numberOfTaskSlots = 2 ** h5. slotmanager.number-of-slots.min = 7 ** h5. slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. was: Test Suggestion: # Prepare configuration options: ## h5. taskmanager.numberOfTaskSlots = 2 ## h5. slotmanager.number-of-slots.min = 7 ## h5. slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. > Release Testing: Verify FLINK-15959 Add min number of slots configuration to > limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Priority: Blocker > Labels: release-testing > > Test Suggestion: > # Prepare configuration options: > ** > h5. taskmanager.numberOfTaskSlots = 2 > ** > h5. slotmanager.number-of-slots.min = 7 > ** > h5. 
slot.idle.timeout = 5 > # Setup a Flink session Cluster according to: > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] > # Verify that 4 TaskManagers will be registered even though no jobs has been > submitted > # Verify that these TaskManagers will not be destroyed after 50 seconds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-34452: - Description: Test Suggestion: # Prepare configuration options: taskmanager.numberOfTaskSlots = 2, slotmanager.number-of-slots.min = 7, slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. was: Test Suggestion: # Prepare configuration options: ** h5. taskmanager.numberOfTaskSlots = 2 ** h5. slotmanager.number-of-slots.min = 7 ** h5. slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. 
> Release Testing: Verify FLINK-15959 Add min number of slots configuration to > limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Priority: Blocker > Labels: release-testing > > Test Suggestion: > # Prepare configuration options: taskmanager.numberOfTaskSlots = 2, > slotmanager.number-of-slots.min = 7, slot.idle.timeout = 5 > # Setup a Flink session Cluster according to: > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] > # Verify that 4 TaskManagers will be registered even though no jobs has been > submitted > # Verify that these TaskManagers will not be destroyed after 50 seconds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-34452: - Description: Test Suggestion: # Prepare configuration options: ## h5. taskmanager.numberOfTaskSlots = 2 ## h5. slotmanager.number-of-slots.min = 7 ## h5. slot.idle.timeout = 5 # Setup a Flink session Cluster according to: [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] # Verify that 4 TaskManagers will be registered even though no jobs has been submitted # Verify that these TaskManagers will not be destroyed after 50 seconds. > Release Testing: Verify FLINK-15959 Add min number of slots configuration to > limit total number of slots > > > Key: FLINK-34452 > URL: https://issues.apache.org/jira/browse/FLINK-34452 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Priority: Blocker > Labels: release-testing > > Test Suggestion: > # Prepare configuration options: > ## > h5. taskmanager.numberOfTaskSlots = 2 > ## > h5. slotmanager.number-of-slots.min = 7 > ## > h5. slot.idle.timeout = 5 > # Setup a Flink session Cluster according to: > [https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#basic-setup|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/overview/#session-mode] > # Verify that 4 TaskManagers will be registered even though no jobs has been > submitted > # Verify that these TaskManagers will not be destroyed after 50 seconds. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34452) Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
xiangyu feng created FLINK-34452: Summary: Release Testing: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots Key: FLINK-34452 URL: https://issues.apache.org/jira/browse/FLINK-34452 Project: Flink Issue Type: Sub-task Components: Runtime / Coordination Affects Versions: 1.19.0 Reporter: xiangyu feng -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-34391) Release Testing Instructions: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-34391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17815115#comment-17815115 ] xiangyu feng commented on FLINK-34391: -- [~lincoln.86xy] Hi, I think this feature needs cross-team testing because it is a user-facing feature. > Release Testing Instructions: Verify FLINK-15959 Add min number of slots configuration to limit total number of slots > - > > Key: FLINK-34391 > URL: https://issues.apache.org/jira/browse/FLINK-34391 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Affects Versions: 1.19.0 >Reporter: lincoln lee >Assignee: xiangyu feng >Priority: Blocker > Fix For: 1.19.0 > > Attachments: screenshot-1.png
[jira] [Created] (FLINK-34226) Add the backoff-multiplier configuration in ExponentialBackoffRetryStrategy
xiangyu feng created FLINK-34226: Summary: Add the backoff-multiplier configuration in ExponentialBackoffRetryStrategy Key: FLINK-34226 URL: https://issues.apache.org/jira/browse/FLINK-34226 Project: Flink Issue Type: Sub-task Reporter: xiangyu feng -- This message was sent by Atlassian Jira (v8.20.10#820010)
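For readers unfamiliar with the term in the ticket title, a backoff multiplier is the factor by which the wait between retries grows: the delay before retry n is min(initialDelay * multiplier^n, maxDelay). The following is a minimal sketch of that formula only. The class name, method name, and parameters are hypothetical and do not describe the signature of Flink's ExponentialBackoffRetryStrategy or the configuration keys this ticket introduces.

```java
import java.time.Duration;

// Illustrative only: computes a capped exponential backoff delay.
final class BackoffSketch {

    // attempt is zero-based: attempt 0 waits initialDelay, attempt 1 waits
    // initialDelay * multiplier, and so on, never exceeding maxDelay.
    static Duration delayForAttempt(
            Duration initialDelay, double multiplier, Duration maxDelay, int attempt) {
        if (multiplier < 1.0 || attempt < 0) {
            throw new IllegalArgumentException("multiplier must be >= 1 and attempt >= 0");
        }
        double millis = initialDelay.toMillis() * Math.pow(multiplier, attempt);
        long capped = (long) Math.min(millis, (double) maxDelay.toMillis());
        return Duration.ofMillis(capped);
    }
}
```

With an initial delay of 100 ms, a multiplier of 2.0, and a cap of 1 s, the delays for attempts 0 to 4 are 100, 200, 400, 800, and 1000 ms; making the multiplier configurable lets operators trade faster recovery against load on the remote system.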
[jira] [Commented] (FLINK-33683) FLIP-407 Improve Flink Client performance in interactive scenarios
[ https://issues.apache.org/jira/browse/FLINK-33683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810282#comment-17810282 ] xiangyu feng commented on FLINK-33683: -- [~guoyangze] Hi, this FLIP (https://cwiki.apache.org/confluence/display/FLINK/FLIP-407%3A+Improve+Flink+Client+performance+in+interactive+scenarios) has been approved. Would you kindly assign this Jira to me? > FLIP-407 Improve Flink Client performance in interactive scenarios > -- > > Key: FLINK-33683 > URL: https://issues.apache.org/jira/browse/FLINK-33683 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission, Table SQL / Client >Reporter: xiangyu feng >Priority: Major > > Currently, there is a lot of unnecessary overhead involved in submitting jobs to and fetching results from a long-running Flink cluster. This works well for streaming and batch jobs, because in these scenarios users do not frequently submit jobs to or fetch results from a running cluster. > > But in the OLAP scenario, users continuously submit many short-lived jobs to the running cluster. In this situation, the overhead has a huge impact on E2E performance. Here are some examples of unnecessary overhead: > * Each `RemoteExecutor` creates a new `StandaloneClusterDescriptor` when executing a job on the same remote cluster > * `StandaloneClusterDescriptor` always creates a new `RestClusterClient` when retrieving an existing Flink cluster > * Each `RestClusterClient` creates a new `ClientHighAvailabilityServices`, which might contain a resource-consuming HA client (ZKClient or KubeClient) and a time-consuming leader retrieval operation > * `RestClient` creates a new connection for every request, which costs extra connection establishment time > > Therefore, I suggest creating this ticket and the following subtasks to improve performance. This ticket also relates to FLINK-25318.
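The overhead listed in the FLINK-33683 description comes from rebuilding the same client objects for every job. One generic way to avoid that is to cache one client per cluster and reuse it, sketched below in plain Java. This is an illustration of the idea only: ClientCacheSketch, getOrCreate, and the cluster-address key are hypothetical names, and the sketch does not describe how FLIP-407 actually changes RemoteExecutor, StandaloneClusterDescriptor, or RestClusterClient.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative only: reuses one expensive client object per cluster address.
final class ClientCacheSketch<C> {

    private final Map<String, C> clientsByCluster = new ConcurrentHashMap<>();
    private final Function<String, C> clientFactory;
    private final AtomicInteger createdCount = new AtomicInteger();

    ClientCacheSketch(Function<String, C> clientFactory) {
        this.clientFactory = clientFactory;
    }

    // Returns the cached client for this cluster, creating it on first use.
    C getOrCreate(String clusterAddress) {
        return clientsByCluster.computeIfAbsent(clusterAddress, address -> {
            createdCount.incrementAndGet();
            return clientFactory.apply(address);
        });
    }

    // Number of clients actually constructed, useful for checking reuse.
    int createdCount() {
        return createdCount.get();
    }
}
```

In an OLAP-style workload that submits many short-lived jobs to the same cluster, every call after the first returns the existing client, so the cost of construction, leader retrieval, and connection setup is paid once rather than per job. A real implementation would also need to close cached clients and handle leader changes, which this sketch leaves out.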
[jira] [Updated] (FLINK-33683) FLIP-407 Improve Flink Client performance in interactive scenarios
[ https://issues.apache.org/jira/browse/FLINK-33683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33683: - Summary: FLIP-407 Improve Flink Client performance in interactive scenarios (was: Improve the performance of submitting jobs and fetching results to a running flink cluster) > FLIP-407 Improve Flink Client performance in interactive scenarios > -- > > Key: FLINK-33683 > URL: https://issues.apache.org/jira/browse/FLINK-33683 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission, Table SQL / Client >Reporter: xiangyu feng >Priority: Major > > Currently, there is a lot of unnecessary overhead involved in submitting jobs to and fetching results from a long-running Flink cluster. This works well for streaming and batch jobs, because in these scenarios users do not frequently submit jobs to or fetch results from a running cluster. > > But in the OLAP scenario, users continuously submit many short-lived jobs to the running cluster. In this situation, the overhead has a huge impact on E2E performance. Here are some examples of unnecessary overhead: > * Each `RemoteExecutor` creates a new `StandaloneClusterDescriptor` when executing a job on the same remote cluster > * `StandaloneClusterDescriptor` always creates a new `RestClusterClient` when retrieving an existing Flink cluster > * Each `RestClusterClient` creates a new `ClientHighAvailabilityServices`, which might contain a resource-consuming HA client (ZKClient or KubeClient) and a time-consuming leader retrieval operation > * `RestClient` creates a new connection for every request, which costs extra connection establishment time > > Therefore, I suggest creating this ticket and the following subtasks to improve performance. This ticket also relates to FLINK-25318.
[jira] [Commented] (FLINK-33932) Support Retry Mechanism in RocksDBStateDataTransfer
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805398#comment-17805398 ] xiangyu feng commented on FLINK-33932: -- [~masteryhx] Thx for ur reply, I've talked to [~dianer17] to update the description of this issue. Would you kindly assign this issue to me? Also, I would like to hear more from you about this issue in the discussion thread of FLIP-414: [https://lists.apache.org/thread/om4kgd6trx2lctwm6x92q2kdjngxtz9k] > Support Retry Mechanism in RocksDBStateDataTransfer > --- > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available > > Currently, there is no retry mechanism for downloading and uploading RocksDB > state files. Any jittering of remote filesystem might lead to a checkpoint > failure. By supporting retry mechanism in RocksDBStateDataTransfer, we can > significantly reduce the failure rate of checkpoint during asynchronous phase. > The exception is as below: > {noformat} > > 2023-12-19 08:46:00,197 INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Decline > checkpoint 2 by task > 5b008c2c048fa8534d648455e69f9497_fbcce2f96483f8f15e87dc6c9afd372f_183_0 of > job a025f19e at > application-6f1c6e3d-1702480803995-5093022-taskmanager-1-1 @ > fdbd:dc61:1a:101:0:0:0:36 (dataPort=38789). > org.apache.flink.util.SerializedThrowable: > org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task > checkpoint failed. 
> at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:320) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:155) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:829) [?:?] > Caused by: org.apache.flink.util.SerializedThrowable: java.lang.Exception: > Could not materialize checkpoint 2 for operator GlobalGroupAggregate[132] -> > Calc[133] (184/500)#0. > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:298) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > ... 4 more > Caused by: org.apache.flink.util.SerializedThrowable: > java.util.concurrent.ExecutionException: java.io.IOException: Could not flush > to file and close the file system output stream to > hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the > stream state handle > at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] > at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] 
> at > org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > ... 3 more > Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: > Could not flush to file and close the file system output stream to > hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the > stream state handle > at > org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:516) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:157) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:113) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.ja
[jira] [Commented] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804666#comment-17804666 ] xiangyu feng commented on FLINK-33932: -- [~masteryhx] Hi Hangxiang, what do u think about this? I'd like to hear more from you. > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available > > Rocksdb uploader will throw exception and decline the current checkpoint if > there are errors when uploading to remote file system like hdfs. > The exception is as below: > {noformat} > > 2023-12-19 08:46:00,197 INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Decline > checkpoint 2 by task > 5b008c2c048fa8534d648455e69f9497_fbcce2f96483f8f15e87dc6c9afd372f_183_0 of > job a025f19e at > application-6f1c6e3d-1702480803995-5093022-taskmanager-1-1 @ > fdbd:dc61:1a:101:0:0:0:36 (dataPort=38789). > org.apache.flink.util.SerializedThrowable: > org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task > checkpoint failed. > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:320) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:155) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:829) [?:?] 
> Caused by: org.apache.flink.util.SerializedThrowable: java.lang.Exception: > Could not materialize checkpoint 2 for operator GlobalGroupAggregate[132] -> > Calc[133] (184/500)#0. > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:298) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > ... 4 more > Caused by: org.apache.flink.util.SerializedThrowable: > java.util.concurrent.ExecutionException: java.io.IOException: Could not flush > to file and close the file system output stream to > hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the > stream state handle > at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] > at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] > at > org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > ... 
3 more > Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: > Could not flush to file and close the file system output stream to > hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the > stream state handle > at > org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:516) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:157) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:113) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) > ~[?:?] > ... 3 more > Caused by: org.apache.flink.util.SerializedThrowable: > java.net.ConnectException: Connection timed out > at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:?] > at sun.nio.ch.SocketChannelImp
[jira] [Commented] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803901#comment-17803901 ] xiangyu feng commented on FLINK-33932: -- [~pnowojski] Also, I will open a FLIP to discuss adding this retry configuration. > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available
[jira] [Commented] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803900#comment-17803900 ] xiangyu feng commented on FLINK-33932: -- [~pnowojski] Hi Piotr, thx for ur feedback. After a closer look at the code, I think we also need a retry mechanism in `RocksDBStateDownloader`. IMHO, we can add default retry configurations in `RocksDBOptions` and implement the retry in `RocksDBStateDataTransfer`. In this way, both `RocksDBStateUploader` and `RocksDBStateDownloader` can reuse this retry mechanism. Can u kindly assign this Jira to me and grant me permission to change its title and description? > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available
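The comment above proposes one retry mechanism that both the uploader and the downloader can share. A minimal sketch of such a shared helper, assuming a fixed delay between attempts and retrying only on `IOException`, could look like this. The class name `FixedDelayRetrySketch`, the method `callWithRetry`, and its parameters are hypothetical; they are not the code from the linked PR and not part of `RocksDBStateDataTransfer`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch of a shared fixed-delay retry helper; NOT the actual Flink implementation. */
final class FixedDelayRetrySketch {

    /**
     * Runs the action up to maxAttempts times, sleeping delayMillis after each failed attempt
     * except the last. Only IOException is retried; the last failure is rethrown.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Only a fully successful attempt returns a result to the caller.
                return action.call();
            } catch (IOException e) {
                lastFailure = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated transfer that fails twice with a transient error, then succeeds.
        String handle = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("simulated connection timeout");
            }
            return "remote-handle";
        }, 3, 10L);
        System.out.println(handle + " after " + calls[0] + " attempts");
    }
}
```

Because the helper takes a plain `Callable`, an upload and a download can both be wrapped in the same way, and the attempt count and delay are the two values that would come from configuration.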
[jira] [Commented] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801162#comment-17801162 ] xiangyu feng commented on FLINK-33932: -- [~martijnvisser] AFAIK, this retry is not about retrying the checkpoint itself but about ensuring that the checkpoint can successfully upload local files to remote storage during the asynchronous phase. Files that are partially uploaded but eventually fail won't be considered part of this checkpoint, because no file path is returned for them. This retry mechanism has worked well in our production environment for the past two years without any correctness issues. > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available
[jira] [Comment Edited] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800258#comment-17800258 ] xiangyu feng edited comment on FLINK-33932 at 12/29/23 12:24 PM: - Hi [~dianer17], thanks for creating this. IMHO, this retry is very useful to improve the success rate of checkpoint. We can introduce a default fixed delay retry mechanism here. I have implemented a poc in this pr: [https://github.com/apache/flink/pull/23986], WDYT? was (Author: JIRAUSER301129): Hi [~dianer17], thanks for creating this. IMHO, this retry is very useful to improve the success rate of checkpoint. We can introduce a default fixed delay retry mechanism without adding any configuration, thus we don't need a flip for this jira. I have implemented a poc in this pr: [https://github.com/apache/flink/pull/23986], WDYT? > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available
[jira] [Commented] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800910#comment-17800910 ] xiangyu feng commented on FLINK-33932: -- [~martijnvisser] Yes, like the default `ExponentialBackoffRetryStrategy` defined in `RpcGatewayRetriever` and `ExponentialWaitStrategy` defined in `RestClusterClient`. > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available
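The comment above points to exponential strategies as precedent for a built-in default. As a rough illustration of the idea only, the delay before each retry can be computed by doubling an initial delay up to a cap. The class `ExponentialBackoffSketch`, the method `delayMillis`, and the example values are assumptions for this sketch; it does not reproduce the API or defaults of Flink's `ExponentialBackoffRetryStrategy` or `ExponentialWaitStrategy`.

```java
/** Hypothetical sketch of exponential backoff delays; NOT Flink's own retry strategy classes. */
final class ExponentialBackoffSketch {

    /** Delay before retry number `attempt` (0-based): initialMillis * 2^attempt, capped at maxMillis. */
    static long delayMillis(int attempt, long initialMillis, long maxMillis) {
        if (attempt < 0 || initialMillis <= 0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = initialMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            // Double the delay, jumping straight to the cap when doubling would exceed it
            // (this also avoids long overflow for large attempt numbers).
            delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
        }
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        // Assumed example values: start at 100 ms, never wait longer than 2000 ms.
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println(
                    "retry " + attempt + " waits " + delayMillis(attempt, 100L, 2000L) + " ms");
        }
    }
}
```

Compared with a fixed delay, this keeps the first retries fast for short glitches while backing off when the remote filesystem stays unavailable for longer.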
[jira] [Comment Edited] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800258#comment-17800258 ] xiangyu feng edited comment on FLINK-33932 at 12/25/23 8:55 AM: Hi [~dianer17], thanks for creating this. IMHO, this retry is very useful to improve the success rate of checkpoint. We can introduce a default fixed delay retry mechanism without adding any configuration, thus we don't need a flip for this jira. I have implemented a poc in this pr: [https://github.com/apache/flink/pull/23986], WDYT? was (Author: JIRAUSER301129): Hi [~dianer17], thanks for creating this. IMHO, this retry is very useful to improve the success rate of checkpoint. We can introduce a default fixed delay retry mechanism without adding any configuration, thus we don't need a flip for this jira. I have implemented a poc in this pr: [https://github.com/apache/flink/pull/23986,] WDYT? > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available
[jira] [Commented] (FLINK-33932) Support retry mechanism for rocksdb uploader
[ https://issues.apache.org/jira/browse/FLINK-33932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800258#comment-17800258 ] xiangyu feng commented on FLINK-33932: -- Hi [~dianer17], thanks for creating this. IMHO, this retry is very useful to improve the success rate of checkpoint. We can introduce a default fixed delay retry mechanism without adding any configuration, thus we don't need a flip for this jira. I have implemented a poc in this pr: [https://github.com/apache/flink/pull/23986,] WDYT? > Support retry mechanism for rocksdb uploader > > > Key: FLINK-33932 > URL: https://issues.apache.org/jira/browse/FLINK-33932 > Project: Flink > Issue Type: Improvement > Components: Runtime / Checkpointing >Reporter: Guojun Li >Priority: Major > Labels: pull-request-available > > Rocksdb uploader will throw exception and decline the current checkpoint if > there are errors when uploading to remote file system like hdfs. > The exception is as below: > 2023-12-19 08:46:00,197 INFO > org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Decline > checkpoint 2 by task > 5b008c2c048fa8534d648455e69f9497_fbcce2f96483f8f15e87dc6c9afd372f_183_0 of > job a025f19e at > application-6f1c6e3d-1702480803995-5093022-taskmanager-1-1 @ > fdbd:dc61:1a:101:0:0:0:36 (dataPort=38789). > org.apache.flink.util.SerializedThrowable: > org.apache.flink.runtime.checkpoint.CheckpointException: Asynchronous task > checkpoint failed. > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:320) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:155) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] 
> at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:829) [?:?] > Caused by: org.apache.flink.util.SerializedThrowable: java.lang.Exception: > Could not materialize checkpoint 2 for operator GlobalGroupAggregate[132] -> > Calc[133] (184/500)#0. > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.handleExecutionException(AsyncCheckpointRunnable.java:298) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > ... 4 more > Caused by: org.apache.flink.util.SerializedThrowable: > java.util.concurrent.ExecutionException: java.io.IOException: Could not flush > to file and close the file system output stream to > hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the > stream state handle > at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:?] > at java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[?:?] > at > org.apache.flink.util.concurrent.FutureUtils.runIfNotDoneAndGet(FutureUtils.java:544) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.(OperatorSnapshotFinalizer.java:54) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.finalizeNonFinishedSnapshots(AsyncCheckpointRunnable.java:191) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.streaming.runtime.tasks.AsyncCheckpointRunnable.run(AsyncCheckpointRunnable.java:124) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > ... 
3 more > Caused by: org.apache.flink.util.SerializedThrowable: java.io.IOException: > Could not flush to file and close the file system output stream to > hdfs://xxx/shared/97329cbd-fb90-45ca-92de-edf52d6e6fc0 in order to obtain the > stream state handle > at > org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:516) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.contrib.streaming.state.RocksDBStateUploader.uploadLocalFileToCheckpointFs(RocksDBStateUploader.java:157) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.contrib.streaming.state.RocksDBStateUploader.lambda$createUploadFutures$0(RocksDBStateUploader.java:113) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:32) > ~[flink-dist-1.17-xxx-SNAPSHOT.jar:1.17-xxx-SNAPSHOT] > at > java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) >
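The default fixed-delay retry proposed in the comment above could look roughly like the following. This is a minimal sketch only, not the code from the linked PR: `FixedDelayRetry`, `callWithRetry`, `maxAttempts`, and `delayMillis` are hypothetical names, and the upload is represented by a plain `Callable`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an I/O action a bounded number of times with a constant delay. */
final class FixedDelayRetry {

    private FixedDelayRetry() {}

    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long delayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                // Only I/O failures (such as a connection timeout) are retried in this sketch;
                // any other exception propagates immediately.
                lastFailure = e;
                if (attempt < maxAttempts && delayMillis > 0) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // Reached only after every attempt failed with an IOException.
        throw lastFailure;
    }
}
```

Because the attempt count and delay are constants, a helper of this shape needs no new configuration option, which matches the intent stated in the comment.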
[jira] [Commented] (FLINK-33819) Support setting CompressType in RocksDBStateBackend
[ https://issues.apache.org/jira/browse/FLINK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17796977#comment-17796977 ] xiangyu feng commented on FLINK-33819: -- +1 for this. It will save a lot of CPU for jobs that have little state but a high state access frequency. > Support setting CompressType in RocksDBStateBackend > --- > > Key: FLINK-33819 > URL: https://issues.apache.org/jira/browse/FLINK-33819 > Project: Flink > Issue Type: Improvement > Components: Runtime / State Backends >Affects Versions: 1.18.0 >Reporter: Yue Ma >Priority: Major > Fix For: 1.19.0 > > Attachments: image-2023-12-14-11-32-32-968.png, > image-2023-12-14-11-35-22-306.png > > > Currently, RocksDBStateBackend does not support setting the compression > type, and Snappy is used for compression by default. But we have some > scenarios where compression uses a lot of CPU resources. Turning off > compression can significantly reduce CPU overhead. So we may need to support > a parameter for users to set the CompressType of RocksDB. > !image-2023-12-14-11-35-22-306.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33686) Reuse ClusterDescriptor in AbstractSessionClusterExecutor when executing jobs on the same cluster
[ https://issues.apache.org/jira/browse/FLINK-33686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33686: - Summary: Reuse ClusterDescriptor in AbstractSessionClusterExecutor when executing jobs on the same cluster (was: Reuse StandaloneClusterDescriptor in RemoteExecutor when executing jobs on the same cluster) > Reuse ClusterDescriptor in AbstractSessionClusterExecutor when executing jobs > on the same cluster > - > > Key: FLINK-33686 > URL: https://issues.apache.org/jira/browse/FLINK-33686 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > Multiple `RemoteExecutor` instances can reuse the same > `StandaloneClusterDescriptor` when executing jobs to a same running cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
[ https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyu feng updated FLINK-33698:
---------------------------------
    Description:
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` is problematic due to the lack of a reset when reusing the object.

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // equivalent to initial delay
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // reset to initialDelay
        this.lastRetryDelay = initialDelay;
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
}{code}

  was:
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should consider currentAttempts.

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // equivalent to initial delay
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // reset to initialDelay
        this.lastRetryDelay = initialDelay;
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
}{code}

> Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
> ------------------------------------------------------------------------
>
>                 Key: FLINK-33698
>                 URL: https://issues.apache.org/jira/browse/FLINK-33698
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>            Reporter: xiangyu feng
>            Assignee: xiangyu feng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.19.0
>
>
> The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` is
> problematic due to the lack of a reset when reusing the object.
>
> Current Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // equivalent to initial delay
>         return lastRetryDelay;
>     }
>     long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
>     this.lastRetryDelay = backoff;
>     return backoff;
> } {code}
> Fixed Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // reset to initialDelay
>         this.lastRetryDelay = initialDelay;
>         return lastRetryDelay;
>     }
>     long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
>     this.lastRetryDelay = backoff;
>     return backoff;
> }{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
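To make the reuse problem concrete, here is a minimal, self-contained sketch of the "Fixed Version" logic. The class `ReusableExponentialBackoff` and its constructor are hypothetical stand-ins written for illustration; only the field names and the method body follow the snippet in the ticket, and this is not the actual Flink class.

```java
/** Simplified stand-in for the strategy discussed in the ticket; not the Flink class itself. */
final class ReusableExponentialBackoff {

    private final long initialDelay;
    private final long maxRetryDelay;
    private final double multiplier;
    private long lastRetryDelay;

    ReusableExponentialBackoff(long initialDelay, long maxRetryDelay, double multiplier) {
        this.initialDelay = initialDelay;
        this.maxRetryDelay = maxRetryDelay;
        this.multiplier = multiplier;
        this.lastRetryDelay = initialDelay;
    }

    /** Follows the "Fixed Version": the first attempt resets the remembered delay. */
    long getBackoffTimeMillis(int currentAttempts) {
        if (currentAttempts <= 1) {
            // Without this reset, a reused object would start from the delay
            // left behind by the previous retry sequence.
            this.lastRetryDelay = initialDelay;
            return lastRetryDelay;
        }
        long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
        this.lastRetryDelay = backoff;
        return backoff;
    }
}
```

With `initialDelay = 100`, `multiplier = 2.0`, and `maxRetryDelay = 1000`, a first sequence of attempts 1, 2, 3 yields 100, 200, 400, and a second sequence on the same object starts at 100 again. With the "Current Version", the second sequence would start at 400, because attempt 1 simply returns whatever `lastRetryDelay` was left at.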
[jira] [Updated] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
[ https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyu feng updated FLINK-33698:
---------------------------------
    Description:
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should consider currentAttempts.

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // equivalent to initial delay
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // reset to initialDelay
        this.lastRetryDelay = initialDelay;
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
}{code}

  was:
The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should consider currentAttempts.

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // equivalent to initial delay
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // equivalent to initial delay
        return initialDelay;
    }
    long backoff =
            Math.min(
                    (long) (initialDelay * Math.pow(multiplier, currentAttempts - 1)),
                    maxRetryDelay);
    return backoff;
} {code}

> Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
> ------------------------------------------------------------------------
>
>                 Key: FLINK-33698
>                 URL: https://issues.apache.org/jira/browse/FLINK-33698
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>            Reporter: xiangyu feng
>            Assignee: xiangyu feng
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.19.0
>
>
> The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should
> consider currentAttempts.
>
> Current Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // equivalent to initial delay
>         return lastRetryDelay;
>     }
>     long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
>     this.lastRetryDelay = backoff;
>     return backoff;
> } {code}
> Fixed Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // reset to initialDelay
>         this.lastRetryDelay = initialDelay;
>         return lastRetryDelay;
>     }
>     long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
>     this.lastRetryDelay = backoff;
>     return backoff;
> }{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33702) Add IncrementalDelayRetryStrategy implementation of RetryStrategy
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33702: - Summary: Add IncrementalDelayRetryStrategy implementation of RetryStrategy (was: Add IncrementalDelayRetryStrategy in AsyncRetryStrategies ) > Add IncrementalDelayRetryStrategy implementation of RetryStrategy > - > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > > RetryStrategy now supports FixedRetryStrategy and > ExponentialBackoffRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action more timely. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
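One possible reading of "increase the retry delay at a fixed rate for each attempt" is a linear schedule, delay(n) = min(initialDelay + (n - 1) * increment, maxDelay). The sketch below assumes that reading. `IncrementalDelaySketch`, `delayMillis`, `increment`, and `maxDelay` are hypothetical names chosen for illustration, and the code does not implement Flink's `RetryStrategy` interface.

```java
/** Hypothetical illustration of a linearly increasing retry delay with an upper bound. */
final class IncrementalDelaySketch {

    private IncrementalDelaySketch() {}

    /** Returns the delay in milliseconds before the given 1-based attempt. */
    static long delayMillis(long initialDelay, long increment, long maxDelay, int attempt) {
        if (initialDelay < 0 || increment < 0 || maxDelay < 0) {
            throw new IllegalArgumentException("delays must be non-negative");
        }
        if (attempt <= 1) {
            return Math.min(initialDelay, maxDelay);
        }
        long steps = attempt - 1L;
        // Compare against the remaining headroom first so that
        // initialDelay + steps * increment can never overflow.
        if (increment > 0 && steps > (maxDelay - initialDelay) / increment) {
            return maxDelay;
        }
        return Math.min(initialDelay + steps * increment, maxDelay);
    }
}
```

Compared with an exponential schedule, the delay grows slowly for the first few attempts, which is the property the ticket describes as performing the action in a more timely way.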
[jira] [Updated] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33702: - Description: RetryStrategy now supports FixedRetryStrategy and ExponentialBackoffRetryStrategy. In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce the retry count and perform the action more timely. IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate for each attempt. was: AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy and ExponentialBackoffDelayRetryStrategy. In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce the retry count and perform the action more timely. IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate for each attempt. > Add IncrementalDelayRetryStrategy in AsyncRetryStrategies > -- > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > > RetryStrategy now supports FixedRetryStrategy and > ExponentialBackoffRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action more timely. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng reopened FLINK-33702: -- `RetryStrategy` in org.apache.flink.util.concurrent can be used in `CollectResultFetcher`. > Add IncrementalDelayRetryStrategy in AsyncRetryStrategies > -- > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > > AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy > and ExponentialBackoffDelayRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action more timely. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng closed FLINK-33702. Resolution: Not A Problem `AsyncRetryStrategy` is designed for use by AsyncWaitOperator. It may not be suitable for `CollectResultFetcher`. > Add IncrementalDelayRetryStrategy in AsyncRetryStrategies > -- > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > > AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy > and ExponentialBackoffDelayRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action in a more timely manner. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791883#comment-17791883 ] xiangyu feng commented on FLINK-33702: -- Hi [~huwh], would you kindly assign this to me and help review the PR? > Add IncrementalDelayRetryStrategy in AsyncRetryStrategies > -- > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > > AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy > and ExponentialBackoffDelayRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action in a more timely manner. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791508#comment-17791508 ] xiangyu feng commented on FLINK-33702: -- Hi [~lincoln.86xy], I'll also work on this if it is ok from your side. > Add IncrementalDelayRetryStrategy in AsyncRetryStrategies > -- > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > > AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy > and ExponentialBackoffDelayRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action more timely. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33702: - Issue Type: Improvement (was: Bug) > Add IncrementalDelayRetryStrategy in AsyncRetryStrategies > -- > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Improvement > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > > AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy > and ExponentialBackoffDelayRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action more timely. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
xiangyu feng created FLINK-33702: Summary: Add IncrementalDelayRetryStrategy in AsyncRetryStrategies Key: FLINK-33702 URL: https://issues.apache.org/jira/browse/FLINK-33702 Project: Flink Issue Type: Bug Components: API / DataStream Reporter: xiangyu feng AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy and ExponentialBackoffDelayRetryStrategy. In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce the retry count and perform the action more timely. IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33702) Add IncrementalDelayRetryStrategy in AsyncRetryStrategies
[ https://issues.apache.org/jira/browse/FLINK-33702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33702: - Description: AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy and ExponentialBackoffDelayRetryStrategy. In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce the retry count and perform the action more timely. IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate for each attempt. was: AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy and ExponentialBackoffDelayRetryStrategy. In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce the retry count and perform the action more timely. IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate for each attempt. > Add IncrementalDelayRetryStrategy in AsyncRetryStrategies > -- > > Key: FLINK-33702 > URL: https://issues.apache.org/jira/browse/FLINK-33702 > Project: Flink > Issue Type: Bug > Components: API / DataStream >Reporter: xiangyu feng >Priority: Major > > AsyncRetryStrategies now supports NoRetryStrategy, FixedDelayRetryStrategy > and ExponentialBackoffDelayRetryStrategy. > In certain scenarios, we also need IncrementalDelayRetryStrategy to reduce > the retry count and perform the action more timely. > IncrementalDelayRetryStrategy will increase the retry delay at a fixed rate > for each attempt. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33684) Improve the retry strategy in CollectResultFetcher
[ https://issues.apache.org/jira/browse/FLINK-33684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyu feng updated FLINK-33684:
---------------------------------
    Description:
Currently, CollectResultFetcher uses a fixed retry interval.
{code:java}
private void sleepBeforeRetry() {
    if (retryMillis <= 0) {
        return;
    }
    try {
        // TODO a more proper retry strategy?
        Thread.sleep(retryMillis);
    } catch (InterruptedException e) {
        LOG.warn("Interrupted when sleeping before a retry", e);
    }
} {code}
This can be improved with a RetryStrategy.

  was:
Currently CollectResultFetcher use a fixed retry interval.
{code:java}
private void sleepBeforeRetry() {
    if (retryMillis <= 0) {
        return;
    }
    try {
        // TODO a more proper retry strategy?
        Thread.sleep(retryMillis);
    } catch (InterruptedException e) {
        LOG.warn("Interrupted when sleeping before a retry", e);
    }
} {code}
This can be improved with a WaitStrategy.

> Improve the retry strategy in CollectResultFetcher
> --------------------------------------------------
>
>                 Key: FLINK-33684
>                 URL: https://issues.apache.org/jira/browse/FLINK-33684
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / DataStream
>            Reporter: xiangyu feng
>            Priority: Major
>
> Currently, CollectResultFetcher uses a fixed retry interval.
> {code:java}
> private void sleepBeforeRetry() {
>     if (retryMillis <= 0) {
>         return;
>     }
>     try {
>         // TODO a more proper retry strategy?
>         Thread.sleep(retryMillis);
>     } catch (InterruptedException e) {
>         LOG.warn("Interrupted when sleeping before a retry", e);
>     }
> } {code}
> This can be improved with a RetryStrategy.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
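As a rough illustration of what "improved with a RetryStrategy" might mean for `sleepBeforeRetry`, the sketch below replaces the fixed `retryMillis` with a per-attempt delay function. Everything here is an assumption made for illustration: `RetrySleeper`, `delayForAttempt`, and `reset` are hypothetical names, and the code deliberately uses a plain `IntToLongFunction` rather than Flink's own `RetryStrategy` type.

```java
import java.util.function.IntToLongFunction;

/** Hypothetical helper that sleeps for a delay computed from the current attempt number. */
final class RetrySleeper {

    private final IntToLongFunction delayForAttempt;
    private int attempt;

    RetrySleeper(IntToLongFunction delayForAttempt) {
        this.delayForAttempt = delayForAttempt;
    }

    /** Sleeps before the next retry and returns the delay that was requested, in milliseconds. */
    long sleepBeforeRetry() {
        long millis = delayForAttempt.applyAsLong(++attempt);
        if (millis <= 0) {
            return 0L;
        }
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            // Design choice in this sketch: restore the interrupt flag so callers can react,
            // instead of only logging as the snippet in the ticket does.
            Thread.currentThread().interrupt();
        }
        return millis;
    }

    /** Call after a successful fetch so the next failure starts from the first delay again. */
    void reset() {
        attempt = 0;
    }
}
```

A fixed interval is still expressible as `attempt -> retryMillis`, so the existing behavior would remain available as a special case.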
[jira] [Commented] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
[ https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791405#comment-17791405 ]

xiangyu feng commented on FLINK-33698:
--------------------------------------

[~lincoln.86xy] Sure, would you kindly assign this to me?

> Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
> ------------------------------------------------------------------------
>
>                 Key: FLINK-33698
>                 URL: https://issues.apache.org/jira/browse/FLINK-33698
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>            Reporter: xiangyu feng
>            Priority: Major
>
> The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should
> consider currentAttempts.
>
> Current Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // equivalent to initial delay
>         return lastRetryDelay;
>     }
>     long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
>     this.lastRetryDelay = backoff;
>     return backoff;
> } {code}
> Fixed Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // equivalent to initial delay
>         return initialDelay;
>     }
>     long backoff =
>             Math.min(
>                     (long) (initialDelay * Math.pow(multiplier, currentAttempts - 1)),
>                     maxRetryDelay);
>     return backoff;
> } {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
[ https://issues.apache.org/jira/browse/FLINK-33698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791252#comment-17791252 ]

xiangyu feng commented on FLINK-33698:
--------------------------------------

Hi [~lincoln.86xy], what do you think about this?

> Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
> ------------------------------------------------------------------------
>
>                 Key: FLINK-33698
>                 URL: https://issues.apache.org/jira/browse/FLINK-33698
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataStream
>            Reporter: xiangyu feng
>            Priority: Major
>
> The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should
> consider currentAttempts.
>
> Current Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // equivalent to initial delay
>         return lastRetryDelay;
>     }
>     long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
>     this.lastRetryDelay = backoff;
>     return backoff;
> } {code}
> Fixed Version:
> {code:java}
> @Override
> public long getBackoffTimeMillis(int currentAttempts) {
>     if (currentAttempts <= 1) {
>         // equivalent to initial delay
>         return initialDelay;
>     }
>     long backoff =
>             Math.min(
>                     (long) (initialDelay * Math.pow(multiplier, currentAttempts - 1)),
>                     maxRetryDelay);
>     return backoff;
> } {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33698) Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
xiangyu feng created FLINK-33698:
------------------------------------

             Summary: Fix the backoff time calculation in ExponentialBackoffDelayRetryStrategy
                 Key: FLINK-33698
                 URL: https://issues.apache.org/jira/browse/FLINK-33698
             Project: Flink
          Issue Type: Bug
          Components: API / DataStream
            Reporter: xiangyu feng


The backoff time calculation in `ExponentialBackoffDelayRetryStrategy` should consider currentAttempts.

Current Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // equivalent to initial delay
        return lastRetryDelay;
    }
    long backoff = Math.min((long) (lastRetryDelay * multiplier), maxRetryDelay);
    this.lastRetryDelay = backoff;
    return backoff;
} {code}
Fixed Version:
{code:java}
@Override
public long getBackoffTimeMillis(int currentAttempts) {
    if (currentAttempts <= 1) {
        // equivalent to initial delay
        return initialDelay;
    }
    long backoff =
            Math.min(
                    (long) (initialDelay * Math.pow(multiplier, currentAttempts - 1)),
                    maxRetryDelay);
    return backoff;
} {code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
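The "Fixed Version" in this message computes the delay purely from the attempt number, delay(n) = min(initialDelay * multiplier^(n - 1), maxRetryDelay), so it keeps no state between calls. A small standalone sketch of that formula follows. The class name `StatelessExponentialBackoff` and the static method `backoffMillis` are hypothetical and exist only to make the arithmetic easy to try out.

```java
/** Hypothetical standalone version of the attempt-based formula from the ticket. */
final class StatelessExponentialBackoff {

    private StatelessExponentialBackoff() {}

    static long backoffMillis(
            long initialDelay, double multiplier, long maxRetryDelay, int currentAttempts) {
        if (currentAttempts <= 1) {
            // equivalent to initial delay
            return initialDelay;
        }
        // A double that exceeds the long range saturates to Long.MAX_VALUE when cast,
        // so Math.min still caps very large attempt numbers at maxRetryDelay.
        return Math.min(
                (long) (initialDelay * Math.pow(multiplier, currentAttempts - 1)),
                maxRetryDelay);
    }
}
```

For `initialDelay = 100`, `multiplier = 2.0`, and `maxRetryDelay = 1000`, attempts 1 through 5 give 100, 200, 400, 800, and 1000. Since nothing is stored on the object, the same instance can serve several retry sequences without any reset.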
[jira] [Updated] (FLINK-33683) Improve the performance of submitting jobs and fetching results to a running flink cluster
[ https://issues.apache.org/jira/browse/FLINK-33683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyu feng updated FLINK-33683:
---------------------------------
    Description:
Currently, there is a lot of unnecessary overhead involved in submitting jobs to, and fetching results from, a long-running Flink cluster. This is acceptable for streaming and batch jobs, because in those scenarios users do not frequently submit jobs to or fetch results from a running cluster.

In the OLAP scenario, however, users continuously submit many short-lived jobs to the running cluster. In this situation, the overhead has a huge impact on E2E performance. Here are some examples of unnecessary overhead:
 * Each `RemoteExecutor` creates a new `StandaloneClusterDescriptor` when executing a job on the same remote cluster
 * `StandaloneClusterDescriptor` always creates a new `RestClusterClient` when retrieving an existing Flink cluster
 * Each `RestClusterClient` creates a new `ClientHighAvailabilityServices`, which might contain a resource-consuming HA client (ZKClient or KubeClient) and a time-consuming leader retrieval operation
 * `RestClient` creates a new connection for every request, which costs extra connection establishment time

Therefore, I suggest creating this ticket and the following subtasks to improve performance. This ticket also relates to FLINK-25318.

  was:
There is now a lot of unnecessary overhead involved in submitting jobs and fetching results to a long-running flink cluster. This works well for streaming and batch job, because in these scenarios users will not frequently submit jobs and fetch result to a running cluster.

But in OLAP scenario, users will continuously submit lots of short-lived jobs to the running cluster. In this situation, these overhead will have a huge impact on the E2E performance. Here are some examples of unnecessary overhead:
 * Each `RemoteExecutor` will create a new `StandaloneClusterDescriptor` when executing a job on the same remote cluster
 * `StandaloneClusterDescriptor` will always create a new `RestClusterClient` when retrieving an existing Flink Cluster
 * Each `RestClusterClient` will create a new `ClientHighAvailabilityServices` which might contains a resource-consuming ha client(ZKClient or KubeClient) and a time-consuming leader retrieval operation
 * `RestClient` will create a new connection for every request which costs extra connection establishment time

Therefore, I suggest creating this ticket and following subtasks to improve this performance. This ticket is also relates to FLINK-25318.

> Improve the performance of submitting jobs and fetching results to a running flink cluster
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-33683
>                 URL: https://issues.apache.org/jira/browse/FLINK-33683
>             Project: Flink
>          Issue Type: Improvement
>          Components: Client / Job Submission, Table SQL / Client
>            Reporter: xiangyu feng
>            Priority: Major
>
> Currently, there is a lot of unnecessary overhead involved in submitting jobs
> to, and fetching results from, a long-running Flink cluster. This is acceptable
> for streaming and batch jobs, because in those scenarios users do not frequently
> submit jobs to or fetch results from a running cluster.
>
> In the OLAP scenario, however, users continuously submit many short-lived jobs
> to the running cluster. In this situation, the overhead has a huge impact on
> E2E performance. Here are some examples of unnecessary overhead:
>  * Each `RemoteExecutor` creates a new `StandaloneClusterDescriptor` when
> executing a job on the same remote cluster
>  * `StandaloneClusterDescriptor` always creates a new `RestClusterClient`
> when retrieving an existing Flink cluster
>  * Each `RestClusterClient` creates a new `ClientHighAvailabilityServices`,
> which might contain a resource-consuming HA client (ZKClient or KubeClient)
> and a time-consuming leader retrieval operation
>  * `RestClient` creates a new connection for every request, which costs
> extra connection establishment time
>
> Therefore, I suggest creating this ticket and the following subtasks to improve
> performance. This ticket also relates to FLINK-25318.

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices when creating RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Parent: FLINK-33683 Issue Type: Sub-task (was: Improvement) > Reuse ClientHighAvailabilityServices when creating RestClusterClient > > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > Currently, every newly built RestClusterClient will create a new > ClientHighAvailabilityServices which is both unnecessary and resource > consuming. For example, each ZooKeeperClientHAServices contains a ZKClient > which holds a connection to ZK server and several related threads. > By reusing ClientHighAvailabilityServices across multiple RestClusterClient > instances, we can save system resources(threads, connections), connection > establish time and leader retrieval time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices when creating RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Parent: (was: FLINK-25318) Issue Type: Improvement (was: Sub-task) > Reuse ClientHighAvailabilityServices when creating RestClusterClient > > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Improvement > Components: Client / Job Submission >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > Currently, every newly built RestClusterClient will create a new > ClientHighAvailabilityServices which is both unnecessary and resource > consuming. For example, each ZooKeeperClientHAServices contains a ZKClient > which holds a connection to ZK server and several related threads. > By reusing ClientHighAvailabilityServices across multiple RestClusterClient > instances, we can save system resources(threads, connections), connection > establish time and leader retrieval time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33688) Reuse Channels in RestClient to save connection establish time
xiangyu feng created FLINK-33688: Summary: Reuse Channels in RestClient to save connection establish time Key: FLINK-33688 URL: https://issues.apache.org/jira/browse/FLINK-33688 Project: Flink Issue Type: Sub-task Components: Client / Job Submission Reporter: xiangyu feng `RestClient` can reuse its connections to the Dispatcher when submitting HTTP requests to a long-running Flink cluster.
[jira] [Created] (FLINK-33687) Reuse RestClusterClient in StandaloneClusterDescriptor to avoid frequent thread create/destroy
xiangyu feng created FLINK-33687: Summary: Reuse RestClusterClient in StandaloneClusterDescriptor to avoid frequent thread create/destroy Key: FLINK-33687 URL: https://issues.apache.org/jira/browse/FLINK-33687 Project: Flink Issue Type: Sub-task Components: Client / Job Submission Reporter: xiangyu feng `RestClusterClient` can also be reused when submitting programs to a long-running Flink cluster.
[jira] [Created] (FLINK-33686) Reuse StandaloneClusterDescriptor in RemoteExecutor when executing jobs on the same cluster
xiangyu feng created FLINK-33686: Summary: Reuse StandaloneClusterDescriptor in RemoteExecutor when executing jobs on the same cluster Key: FLINK-33686 URL: https://issues.apache.org/jira/browse/FLINK-33686 Project: Flink Issue Type: Sub-task Components: Client / Job Submission Reporter: xiangyu feng Multiple `RemoteExecutor` instances can reuse the same `StandaloneClusterDescriptor` when executing jobs on the same running cluster.
[jira] [Created] (FLINK-33685) StandaloneClusterId need to distinguish different remote clusters
xiangyu feng created FLINK-33685: Summary: StandaloneClusterId need to distinguish different remote clusters Key: FLINK-33685 URL: https://issues.apache.org/jira/browse/FLINK-33685 Project: Flink Issue Type: Sub-task Components: Client / Job Submission Reporter: xiangyu feng `StandaloneClusterId` is a singleton, which means `StandaloneClusterDescriptor` cannot distinguish different remote running clusters. -- This message was sent by Atlassian Jira (v8.20.10#820010)
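One way to picture the problem is to contrast a singleton ID with a value-based one. The sketch below uses a hypothetical class, `RemoteClusterId`, keyed by host and port; the name and the choice of fields are assumptions for illustration and do not describe how Flink resolves this issue.

```java
import java.util.Objects;

/**
 * Hypothetical value-based cluster ID (not a Flink class): two IDs are equal
 * only if they point at the same host and port, so they can serve as map keys
 * for per-cluster resources.
 */
final class RemoteClusterId {

    private final String host;
    private final int port;

    private RemoteClusterId(String host, int port) {
        this.host = Objects.requireNonNull(host, "host");
        this.port = port;
    }

    static RemoteClusterId of(String host, int port) {
        return new RemoteClusterId(host, port);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof RemoteClusterId)) {
            return false;
        }
        RemoteClusterId other = (RemoteClusterId) o;
        return port == other.port && host.equals(other.host);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, port);
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}
```

With an ID like this, a cache of descriptors or clients can hold one entry per remote cluster, which a single shared instance cannot express.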
[jira] [Created] (FLINK-33684) Improve the retry strategy in CollectResultFetcher
xiangyu feng created FLINK-33684: Summary: Improve the retry strategy in CollectResultFetcher Key: FLINK-33684 URL: https://issues.apache.org/jira/browse/FLINK-33684 Project: Flink Issue Type: Sub-task Components: API / DataStream Reporter: xiangyu feng

Currently, `CollectResultFetcher` uses a fixed retry interval.
{code:java}
private void sleepBeforeRetry() {
    if (retryMillis <= 0) {
        return;
    }

    try {
        // TODO a more proper retry strategy?
        Thread.sleep(retryMillis);
    } catch (InterruptedException e) {
        LOG.warn("Interrupted when sleeping before a retry", e);
    }
}
{code}
This can be improved with a WaitStrategy.
[jira] [Created] (FLINK-33683) Improve the performance of submitting jobs and fetching results to a running flink cluster
xiangyu feng created FLINK-33683: Summary: Improve the performance of submitting jobs and fetching results to a running flink cluster Key: FLINK-33683 URL: https://issues.apache.org/jira/browse/FLINK-33683 Project: Flink Issue Type: Improvement Components: Client / Job Submission, Table SQL / Client Reporter: xiangyu feng

There is now a lot of unnecessary overhead involved in submitting jobs and fetching results to a long-running Flink cluster. This works well for streaming and batch jobs, because in these scenarios users will not frequently submit jobs and fetch results to a running cluster.

But in the OLAP scenario, users will continuously submit lots of short-lived jobs to the running cluster. In this situation, this overhead will have a huge impact on the E2E performance. Here are some examples of unnecessary overhead:
* Each `RemoteExecutor` will create a new `StandaloneClusterDescriptor` when executing a job on the same remote cluster
* `StandaloneClusterDescriptor` will always create a new `RestClusterClient` when retrieving an existing Flink cluster
* Each `RestClusterClient` will create a new `ClientHighAvailabilityServices`, which might contain a resource-consuming HA client (ZKClient or KubeClient) and a time-consuming leader retrieval operation
* `RestClient` will create a new connection for every request, which costs extra connection establishment time

Therefore, I suggest creating this ticket and the following subtasks to improve this performance. This ticket is also related to FLINK-25318.
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices when creating RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Summary: Reuse ClientHighAvailabilityServices when creating RestClusterClient (was: Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs) > Reuse ClientHighAvailabilityServices when creating RestClusterClient > > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > Currently, every newly built RestClusterClient will create a new > ClientHighAvailabilityServices which is both unnecessary and resource > consuming. For example, each ZooKeeperClientHAServices contains a ZKClient > which holds a connection to ZK server and several related threads. > By reusing ClientHighAvailabilityServices across multiple RestClusterClient > instances, we can save system resources(threads, connections), connection > establish time and leader retrieval time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32756) Reuse ClientHighAvailabilityServices when creating RestClusterClient
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790544#comment-17790544 ] xiangyu feng commented on FLINK-32756: -- [~guoyangze] Hi, would you kindly assign this Jira to me? > Reuse ClientHighAvailabilityServices when creating RestClusterClient > > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > Currently, every newly built RestClusterClient will create a new > ClientHighAvailabilityServices which is both unnecessary and resource > consuming. For example, each ZooKeeperClientHAServices contains a ZKClient > which holds a connection to ZK server and several related threads. > By reusing ClientHighAvailabilityServices across multiple RestClusterClient > instances, we can save system resources(threads, connections), connection > establish time and leader retrieval time.
[jira] [Updated] (FLINK-32756) Reuse ClientHighAvailabilityServices in RestClusterClient when submitting OLAP jobs
[ https://issues.apache.org/jira/browse/FLINK-32756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32756: - Description: Currently, every newly built RestClusterClient will create a new ClientHighAvailabilityServices which is both unnecessary and resource consuming. For example, each ZooKeeperClientHAServices contains a ZKClient which holds a connection to ZK server and several related threads. By reusing ClientHighAvailabilityServices across multiple RestClusterClient instances, we can save system resources(threads, connections), connection establish time and leader retrieval time. was: In OLAP scenario, we submit queries to flink session cluster through the flink-sql-gateway service. When receiving queries, the gateway service will create sessions to handle the query, each session will create a new RestClusterClient and a new ClientHAServices. In our production usage, we have enabled JobManager ZK HA and use ZooKeeperClientHAServices to do service discovery. Each ZKClientHAServices will establish a network connection with ZK and create four ZK related threads. When QPS reaches 200, more than 1000 sessions are created in a single flink-sql-gateway instance, which means more than 1000 ZK connections and 4000 ZK related threads are created simultaneously. This will raise a significant stability risk in production. To address this problem, we have implemented SharedZKClientHAService for different sessions to share a ZK connection and ZKClient. This works well in our production. > Reuse ClientHighAvailabilityServices in RestClusterClient when submitting > OLAP jobs > --- > > Key: FLINK-32756 > URL: https://issues.apache.org/jira/browse/FLINK-32756 > Project: Flink > Issue Type: Sub-task > Components: Client / Job Submission >Reporter: xiangyu feng >Priority: Major > > Currently, every newly built RestClusterClient will create a new > ClientHighAvailabilityServices which is both unnecessary and resource > consuming. 
For example, each ZooKeeperClientHAServices contains a ZKClient > which holds a connection to ZK server and several related threads. > By reusing ClientHighAvailabilityServices across multiple RestClusterClient > instances, we can save system resources(threads, connections), connection > establish time and leader retrieval time. -- This message was sent by Atlassian Jira (v8.20.10#820010)
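The sharing idea described in this issue (many client instances reusing one expensive HA client) can be sketched as a reference-counted cache. `SharedResourceCache` and its methods are hypothetical names chosen for illustration; this is neither Flink's implementation nor the `SharedZKClientHAService` mentioned in the earlier description, only a minimal sketch assuming the shared resource is an `AutoCloseable`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical reference-counted cache: the resource for a key is created on
 * the first acquire() and closed when the last holder calls release().
 */
final class SharedResourceCache<K, R extends AutoCloseable> {

    private static final class Entry<T> {
        final T resource;
        int refCount;

        Entry(T resource) {
            this.resource = resource;
        }
    }

    private final Map<K, Entry<R>> entries = new HashMap<>();
    private final Function<K, R> factory;

    SharedResourceCache(Function<K, R> factory) {
        this.factory = factory;
    }

    /** Returns the shared resource for the key, creating it if necessary. */
    synchronized R acquire(K key) {
        Entry<R> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(factory.apply(key));
            entries.put(key, entry);
        }
        entry.refCount++;
        return entry.resource;
    }

    /** Drops one reference and closes the resource once nobody holds it. */
    synchronized void release(K key) {
        Entry<R> entry = entries.get(key);
        if (entry == null) {
            throw new IllegalStateException("No resource acquired for key: " + key);
        }
        entry.refCount--;
        if (entry.refCount == 0) {
            entries.remove(key);
            try {
                entry.resource.close();
            } catch (Exception e) {
                throw new IllegalStateException("Failed to close resource for key: " + key, e);
            }
        }
    }

    synchronized int size() {
        return entries.size();
    }
}
```

In this sketch each session would call `acquire` with the cluster's address when it starts and `release` when it ends, so the number of underlying connections tracks the number of distinct clusters instead of the number of sessions.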
[jira] [Commented] (FLINK-33458) Add env.java.opts.gateway option in flink-conf.yaml
[ https://issues.apache.org/jira/browse/FLINK-33458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17782983#comment-17782983 ] xiangyu feng commented on FLINK-33458: -- Hi [~agsharath], thanks for reporting this. However, I think this is resolved in https://issues.apache.org/jira/browse/FLINK-33203. > Add env.java.opts.gateway option in flink-conf.yaml > --- > > Key: FLINK-33458 > URL: https://issues.apache.org/jira/browse/FLINK-33458 > Project: Flink > Issue Type: Improvement > Components: Table SQL / Gateway >Affects Versions: 1.18.0 >Reporter: Sharath Avadoot Gururaj >Priority: Minor > Labels: easyfix > > Currently, JobManager, TaskManager, HistoryServer, Client have their own > configuration keys to set java options in flink-conf.yaml > However it is missing for SQL Gateway. It should be added for completeness. > It can be useful, for example, to add Remote Debugging options. > I propose this config key > > {noformat} > env.java.opts.gateway{noformat} > in flink-conf.yaml which will be applied to SQL Gateway. I already have a > working patch. Its a pretty small one. If the community is OK, I will raise a > PR
[jira] [Resolved] (FLINK-33235) Improve the Quickstart guide for Flink OLAP and translate to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng resolved FLINK-33235. -- Fix Version/s: 1.19.0 Resolution: Fixed > Improve the Quickstart guide for Flink OLAP and translate to Chinese > > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Add new introduced options; > 5, Translate to Chinese; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33235) Improve the Quickstart guide for Flink OLAP and translate to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33235: - Description: 1, Follow the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Add new introduced options; 5, Translate to Chinese; was: 1, Follow the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Translate to Chinese; > Improve the Quickstart guide for Flink OLAP and translate to Chinese > > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Add new introduced options; > 5, Translate to Chinese; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33235) Improve the Quickstart guide for Flink OLAP and translate to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774866#comment-17774866 ] xiangyu feng commented on FLINK-33235: -- Hi [~martijnvisser], excuse me for a minute. I will take your advice and continue this work if that is OK with you. > Improve the Quickstart guide for Flink OLAP and translate to Chinese > > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Translate to Chinese;
[jira] [Commented] (FLINK-33235) Improve the Quickstart guide for Flink OLAP and translate to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774466#comment-17774466 ] xiangyu feng commented on FLINK-33235: -- [~martijnvisser] Sorry, I didn't make myself clear. I mean this document might be released in 1.19 along with the features it relies on. After that, we might add new features to the master branch, and then we should update this document accordingly. > Improve the Quickstart guide for Flink OLAP and translate to Chinese > > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Translate to Chinese;
[jira] [Closed] (FLINK-33209) Translate Flink OLAP quick start guide to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng closed FLINK-33209. Resolution: Duplicate (duplicate of FLINK-33235) > Translate Flink OLAP quick start guide to Chinese > - > > Key: FLINK-33209 > URL: https://issues.apache.org/jira/browse/FLINK-33209 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > Translate Flink OLAP quick start guide to Chinese
[jira] [Updated] (FLINK-33235) Improve the Quickstart guide for Flink OLAP and translate to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33235: - Summary: Improve the Quickstart guide for Flink OLAP and translate to Chinese (was: Improve the Quickstart guide for Flink OLAP) > Improve the Quickstart guide for Flink OLAP and translate to Chinese > > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Translate to Chinese; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33235) Improve the Quickstart guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774461#comment-17774461 ] xiangyu feng commented on FLINK-33235: -- [~martijnvisser] I see that now. When this documentation is released, all the referenced features should also be released. I should only update this document in the next release cycle, right? > Improve the Quickstart guide for Flink OLAP > --- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Add notes to explain that some OLAP features may have not been released > yet, user can choose to build from the master branch; > 5, Translate to Chinese
[jira] [Updated] (FLINK-33235) Improve the Quickstart guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33235: - Description: 1, Follow the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Translate to Chinese; was: 1, Follow the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Add notes to explain that some OLAP features may have not been released yet, user can choose to build from the master branch; 5, Translate to Chinese > Improve the Quickstart guide for Flink OLAP > --- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Translate to Chinese; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33235) Improve the Quickstart guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33235: - Description: 1, Follow the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Add notes to explain that some OLAP features may have not been released yet, user can choose to build from the master branch; 5, Translate to Chinese was: 1, Follow the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Add notes to explain that some OLAP features may have not been released yet, user can choose to build from the master branch; > Improve the Quickstart guide for Flink OLAP > --- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Add notes to explain that some OLAP features may have not been released > yet, user can choose to build from the master branch; > 5, Translate to Chinese -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33235) Improve the Quickstart guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774452#comment-17774452 ] xiangyu feng edited comment on FLINK-33235 at 10/12/23 11:37 AM: - [~martijnvisser] Hi Martijn, thanks for your comment. I see your point now and I agree with you on this. However, I still want to improve this doc in the following ways: 1, After checking the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] we should use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in this doc; 3, Fix typos and misconfigured options in this doc; 4, Add notes to explain that some OLAP features may not have been released yet; users can choose to build from the master branch; I will change the title and description accordingly. WDYT? was (Author: JIRAUSER301129): [~martijnvisser] Hi Martijn, thanks for your comment. I see your point now and I agree with you on this. However, I still want to improve this doc in the following ways: 1, After checking the [Documentation Guide|[https://flink.apache.org/how-to-contribute/documentation-style-guide/],] we should use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in this doc; 3, Fix typos and misconfigured options in this doc; 4, Add notes to explain that some OLAP features may not have been released yet; users can choose to build from the master branch; I will change the title and description accordingly. WDYT?
> Improve the Quickstart guide for Flink OLAP > --- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the [Documentation > Guide|[https://flink.apache.org/how-to-contribute/documentation-style-guide/]], > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Add notes to explain that some OLAP features may have not been released > yet, user can choose to build from the master branch; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33235) Improve the Quickstart guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33235: - Description: 1, Follow the [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Add notes to explain that some OLAP features may have not been released yet, user can choose to build from the master branch; was: 1, Follow the [Documentation Guide|[https://flink.apache.org/how-to-contribute/documentation-style-guide/]], use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Add notes to explain that some OLAP features may have not been released yet, user can choose to build from the master branch; > Improve the Quickstart guide for Flink OLAP > --- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the > [https://flink.apache.org/how-to-contribute/documentation-style-guide/,|https://flink.apache.org/how-to-contribute/documentation-style-guide/] > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Add notes to explain that some OLAP features may have not been released > yet, user can choose to build from the master branch; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33235) Improve the Quickstart guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33235: - Description: 1, Follow the [Documentation Guide|[https://flink.apache.org/how-to-contribute/documentation-style-guide/]], use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2, Add architecture graphs in the doc; 3, Fix typos and misconfigured options in the doc; 4, Add notes to explain that some OLAP features may have not been released yet, user can choose to build from the master branch; was:Many features required by OLAP session cluster are still in master branch or in-progress and not released yet, for example: FLIP-295, FLIP-362, FLIP-374. We need to address this in the document and show users how to quickly build OLAP Session Cluster from master branch. > Improve the Quickstart guide for Flink OLAP > --- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > 1, Follow the [Documentation > Guide|[https://flink.apache.org/how-to-contribute/documentation-style-guide/]], > use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; > 2, Add architecture graphs in the doc; > 3, Fix typos and misconfigured options in the doc; > 4, Add notes to explain that some OLAP features may have not been released > yet, user can choose to build from the master branch; -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33235) Improve the Quickstart guide for Flink OLAP
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33235: - Summary: Improve the Quickstart guide for Flink OLAP (was: Quickstart guide for Flink OLAP should support building from master branch) > Improve the Quickstart guide for Flink OLAP > --- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > Many features required by OLAP session cluster are still in master branch or > in-progress and not released yet, for example: FLIP-295, FLIP-362, FLIP-374. > We need to address this in the document and show users how to quickly build > OLAP Session Cluster from master branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33235) Quickstart guide for Flink OLAP should support building from master branch
[ https://issues.apache.org/jira/browse/FLINK-33235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774452#comment-17774452 ] xiangyu feng commented on FLINK-33235: -- [~martijnvisser] Hi Martijn, thanks for your comment. I see your point now and I agree with you on this. However, I still want to improve this doc in the following ways: 1. According to the [Documentation Guide|https://flink.apache.org/how-to-contribute/documentation-style-guide/], we should use 'active voice' and 'you' instead of 'passive voice' and 'we' in the doc; 2. Add architecture graphs to this doc; 3. Fix typos and misconfigured options in this doc; 4. Add notes explaining that some OLAP features may not have been released yet, and that users can choose to build from the master branch. I will change the title and description accordingly, WDYT? > Quickstart guide for Flink OLAP should support building from master branch > -- > > Key: FLINK-33235 > URL: https://issues.apache.org/jira/browse/FLINK-33235 > Project: Flink > Issue Type: Sub-task > Components: Documentation >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > > Many features required by OLAP session cluster are still in master branch or > in-progress and not released yet, for example: FLIP-295, FLIP-362, FLIP-374. > We need to address this in the document and show users how to quickly build > OLAP Session Cluster from master branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (FLINK-32746) Using ZGC in JDK17 to solve long time class unloading STW
[ https://issues.apache.org/jira/browse/FLINK-32746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng closed FLINK-32746. Resolution: Done JDK17 is already supported by https://issues.apache.org/jira/browse/FLINK-15736; users can enable ZGC by customizing the `env.java.opts.all` parameter. There is already an example in the OLAP QuickStart docs: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/olap_quickstart/ > Using ZGC in JDK17 to solve long time class unloading STW > - > > Key: FLINK-32746 > URL: https://issues.apache.org/jira/browse/FLINK-32746 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Runtime >Reporter: xiangyu feng >Priority: Major > > In an OLAP session cluster, a TM needs to frequently create new classloaders > and generate new classes. These classes accumulate in metaspace. > When metaspace usage reaches a threshold, a FullGC with a long > Stop-the-World pause is triggered. Currently, SerialGC, ParallelGC and > G1GC all perform Stop-the-World class unloading. Only ZGC supports concurrent > class unloading, see more in > [https://bugs.openjdk.org/browse/JDK-8218905|https://bugs.openjdk.org/browse/JDK-8218905]. > > In our scenario, unloading classes from a 2GB metaspace with 5 million classes > stops the application for more than 40 seconds. After switching to ZGC, the > maximum STW of the application has been reduced to less than 10ms. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
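Editor's note on the resolution above: assuming ZGC is switched on by putting the standard JVM flag `-XX:+UseZGC` into `env.java.opts.all`, one way to confirm which collector a JVM actually runs is to list its garbage collector MXBeans. The sketch below is not taken from Flink; the class name `ActiveGcCheck` and the rule "a bean name containing ZGC means ZGC is active" are assumptions made for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: reports the collectors of the current JVM.
class ActiveGcCheck {

    /** Returns the names of the garbage collectors registered in the running JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    /** Assumption: a collector name that contains "ZGC" indicates ZGC is in use. */
    static boolean usesZgc(List<String> names) {
        for (String name : names) {
            if (name.contains("ZGC")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names + ", ZGC active: " + usesZgc(names));
    }
}
```

Running the same class once with and once without `-XX:+UseZGC` on the command line shows the difference; inside a TaskManager the check would have to run in that process to be meaningful.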
[jira] [Created] (FLINK-33235) Quickstart guide for Flink OLAP should support building from master branch
xiangyu feng created FLINK-33235: Summary: Quickstart guide for Flink OLAP should support building from master branch Key: FLINK-33235 URL: https://issues.apache.org/jira/browse/FLINK-33235 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: xiangyu feng Many features required by OLAP session cluster are still in master branch or in-progress and not released yet, for example: FLIP-295, FLIP-362, FLIP-374. We need to address this in the document and show users how to quickly build OLAP Session Cluster from master branch. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-33228) Fix the total current resource calculation when fulfilling requirements
[ https://issues.apache.org/jira/browse/FLINK-33228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773597#comment-17773597 ] xiangyu feng edited comment on FLINK-33228 at 10/10/23 9:01 AM: Hi [~guoyangze], I've fixed this issue. Would you kindly assign this Jira to me? was (Author: JIRAUSER301129): Hi [~guoyangze], I fixed this issue by changing the totalCurrentResources from `ResourceProfile` to `InternalResourceInfo`. Would you kindly assign this Jira to me? > Fix the total current resource calculation when fulfilling requirements > --- > > Key: FLINK-33228 > URL: https://issues.apache.org/jira/browse/FLINK-33228 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Affects Versions: 1.19.0 >Reporter: xiangyu feng >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available > Fix For: 1.19.0 > > Attachments: image-2023-10-10-16-09-23-635.png > > > Currently, the `totalCurrentResources` calculation in > `DefaultResourceAllocationStrategy#tryFulfillRequirements` is wrong. > `ResourceProfile.merge` will not change the original `ResourceProfile`. > !image-2023-10-10-16-09-23-635.png|width=564,height=286! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33228) Fix the total current resource calculation when fulfilling requirements
[ https://issues.apache.org/jira/browse/FLINK-33228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773597#comment-17773597 ] xiangyu feng commented on FLINK-33228: -- Hi [~guoyangze], I fixed this issue by changing the totalCurrentResources from `ResourceProfile` to `InternalResourceInfo`. Would you kindly assign this Jira to me? > Fix the total current resource calculation when fulfilling requirements > --- > > Key: FLINK-33228 > URL: https://issues.apache.org/jira/browse/FLINK-33228 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: xiangyu feng >Priority: Major > Labels: pull-request-available > Attachments: image-2023-10-10-16-09-23-635.png > > > Currently, the `totalCurrentResources` calculation in > `DefaultResourceAllocationStrategy#tryFulfillRequirements` is wrong. > `ResourceProfile.merge` will not change the original `ResourceProfile`. > !image-2023-10-10-16-09-23-635.png|width=564,height=286! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33228) Fix the total current resource calculation when fulfilling requirements
[ https://issues.apache.org/jira/browse/FLINK-33228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33228: - Description: Currently, the `totalCurrentResources` calculation in `DefaultResourceAllocationStrategy#tryFulfillRequirements` is wrong. `ResourceProfile.merge` will not change the original `ResourceProfile`. !image-2023-10-10-16-09-23-635.png|width=564,height=286! > Fix the total current resource calculation when fulfilling requirements > --- > > Key: FLINK-33228 > URL: https://issues.apache.org/jira/browse/FLINK-33228 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: xiangyu feng >Priority: Major > Attachments: image-2023-10-10-16-09-23-635.png > > > Currently, the `totalCurrentResources` calculation in > `DefaultResourceAllocationStrategy#tryFulfillRequirements` is wrong. > `ResourceProfile.merge` will not change the original `ResourceProfile`. > !image-2023-10-10-16-09-23-635.png|width=564,height=286! -- This message was sent by Atlassian Jira (v8.20.10#820010)
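Editor's note on the description above: the attached screenshot is not available here, so the following is only a sketch of the general pitfall the description names, namely calling a `merge` that returns a new object and discarding the result. `Profile` and `MergeDemo` are hypothetical stand-ins, not Flink's `ResourceProfile` or `DefaultResourceAllocationStrategy`, and the reassignment shown is a generic remedy rather than the change made in the actual pull request.

```java
import java.util.List;

// Hypothetical immutable value type standing in for a resource profile.
final class Profile {
    static final Profile ZERO = new Profile(0.0, 0L);

    final double cpuCores;
    final long memoryMb;

    Profile(double cpuCores, long memoryMb) {
        this.cpuCores = cpuCores;
        this.memoryMb = memoryMb;
    }

    /** Returns a NEW profile holding the sum; the receiver is left unchanged. */
    Profile merge(Profile other) {
        return new Profile(cpuCores + other.cpuCores, memoryMb + other.memoryMb);
    }
}

final class MergeDemo {

    /** Buggy pattern: the result of merge() is thrown away, so the total stays ZERO. */
    static Profile buggyTotal(List<Profile> profiles) {
        Profile total = Profile.ZERO;
        for (Profile p : profiles) {
            total.merge(p); // return value ignored
        }
        return total;
    }

    /** Generic remedy: reassign the accumulator on every step. */
    static Profile correctTotal(List<Profile> profiles) {
        Profile total = Profile.ZERO;
        for (Profile p : profiles) {
            total = total.merge(p);
        }
        return total;
    }
}
```

With two profiles of (1 core, 100 MB) and (2 cores, 200 MB), `buggyTotal` still reports zero resources while `correctTotal` reports 3 cores and 300 MB.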
[jira] [Updated] (FLINK-33228) Fix the total current resource calculation when fulfilling requirements
[ https://issues.apache.org/jira/browse/FLINK-33228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33228: - Attachment: image-2023-10-10-16-09-23-635.png > Fix the total current resource calculation when fulfilling requirements > --- > > Key: FLINK-33228 > URL: https://issues.apache.org/jira/browse/FLINK-33228 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: xiangyu feng >Priority: Major > Attachments: image-2023-10-10-16-09-23-635.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33228) Fix the total current resource calculation when fulfilling requirements
[ https://issues.apache.org/jira/browse/FLINK-33228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33228: - Summary: Fix the total current resource calculation when fulfilling requirements (was: Fix the total current resource calculation in fulfill requirements) > Fix the total current resource calculation when fulfilling requirements > --- > > Key: FLINK-33228 > URL: https://issues.apache.org/jira/browse/FLINK-33228 > Project: Flink > Issue Type: Bug > Components: Runtime / Coordination >Reporter: xiangyu feng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33228) Fix the total current resource calculation in fulfill requirements
xiangyu feng created FLINK-33228: Summary: Fix the total current resource calculation in fulfill requirements Key: FLINK-33228 URL: https://issues.apache.org/jira/browse/FLINK-33228 Project: Flink Issue Type: Bug Components: Runtime / Coordination Reporter: xiangyu feng -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-33209) Translate Flink OLAP quick start guide to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773093#comment-17773093 ] xiangyu feng commented on FLINK-33209: -- Hi [~shengbo], thanks for volunteering. We are still improving both the English and Chinese versions of this document since OLAP is still evolving quickly, so we'd rather keep this work in house. Does that work for you? > Translate Flink OLAP quick start guide to Chinese > - > > Key: FLINK-33209 > URL: https://issues.apache.org/jira/browse/FLINK-33209 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation >Reporter: xiangyu feng >Priority: Major > > Translate Flink OLAP quick start guide to Chinese -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-33209) Translate Flink OLAP quick start guide to Chinese
[ https://issues.apache.org/jira/browse/FLINK-33209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-33209: - Priority: Major (was: Minor) > Translate Flink OLAP quick start guide to Chinese > - > > Key: FLINK-33209 > URL: https://issues.apache.org/jira/browse/FLINK-33209 > Project: Flink > Issue Type: Sub-task > Components: chinese-translation >Reporter: xiangyu feng >Priority: Major > > Translate Flink OLAP quick start guide to Chinese -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-33209) Translate Flink OLAP quick start guide to Chinese
xiangyu feng created FLINK-33209: Summary: Translate Flink OLAP quick start guide to Chinese Key: FLINK-33209 URL: https://issues.apache.org/jira/browse/FLINK-33209 Project: Flink Issue Type: Sub-task Components: chinese-translation Reporter: xiangyu feng Translate Flink OLAP quick start guide to Chinese -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-15959) Add min number of slots configuration to limit total number of slots
[ https://issues.apache.org/jira/browse/FLINK-15959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17770291#comment-17770291 ] xiangyu feng commented on FLINK-15959: -- [~zhuzh] Hi, [FLIP-362|https://cwiki.apache.org/confluence/display/FLINK/FLIP-362%3A+Support+minimum+resource+limitation] has been accepted. We will continue this work. > Add min number of slots configuration to limit total number of slots > > > Key: FLINK-15959 > URL: https://issues.apache.org/jira/browse/FLINK-15959 > Project: Flink > Issue Type: Sub-task > Components: Runtime / Coordination >Affects Versions: 1.11.0 >Reporter: YufeiLiu >Assignee: xiangyu feng >Priority: Major > Labels: auto-deprioritized-major, auto-deprioritized-minor, > auto-unassigned, pull-request-available > > Flink removed the `-n` option after FLIP-6; the ResourceManager now starts a new > worker when required. But I think maintaining a certain number of slots is > necessary. These workers would start immediately when the ResourceManager starts > and would not be released even if all slots are free. > Here are some reasons: > # Users actually know how many resources are needed when running a single job; > initializing all workers when the cluster starts can speed up the startup process. > # Jobs are scheduled in topology order: the next operator won't be scheduled until the prior > execution's slot is allocated. The TaskExecutors will start in several batches in > some cases, which might slow down startup. > # Flink supports > [FLINK-12122|https://issues.apache.org/jira/browse/FLINK-12122] [Spread out > tasks evenly across all available registered TaskManagers], but it only > takes effect if all TMs are registered. Starting all TMs at the beginning can solve this > problem. > *suggestion:* > * Add config "taskmanager.minimum.numberOfTotalSlots" and > "taskmanager.maximum.numberOfTotalSlots"; the default behavior stays as > before. 
> * Start enough workers to satisfy the minimum slots when the > ResourceManager accepts leadership (subtracting recovered workers). > * Don't complete slot requests until the minimum number of slots is registered, > and throw an exception when the maximum is exceeded. > *update* > Finally, we'd like to introduce three config options related to the minimum > resources restriction: > * slotmanager.min-total-resource.cpu > * slotmanager.min-total-resource.memory > * slotmanager.number-of-slots.min > Note that these configurations do not take effect for standalone clusters, > where how many slots are allocated is not controlled by Flink. These configs > are best effort and Flink will not block the job progress even if the min > resources are not guaranteed. -- This message was sent by Atlassian Jira (v8.20.10#820010)
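Editor's note on the suggestion above: the step "start workers to satisfy the minimum slots, subtracting recovered workers" boils down to a ceiling division. The sketch below only illustrates that arithmetic; `MinSlotsSketch`, `workersToStart` and all parameter names are hypothetical, and Flink's actual slot manager logic may differ.

```java
// Hypothetical illustration of the arithmetic only; not Flink code.
final class MinSlotsSketch {

    /**
     * Number of additional workers needed so that the total slot count reaches the minimum.
     *
     * @param minTotalSlots  configured minimum number of slots
     * @param recoveredSlots slots already provided by recovered workers
     * @param slotsPerWorker slots offered by each newly started worker
     */
    static int workersToStart(int minTotalSlots, int recoveredSlots, int slotsPerWorker) {
        if (slotsPerWorker <= 0) {
            throw new IllegalArgumentException("slotsPerWorker must be positive");
        }
        // Nothing to do when recovered workers already cover the minimum.
        int missingSlots = Math.max(0, minTotalSlots - recoveredSlots);
        // Ceiling division: a partially needed worker still has to be started.
        return (missingSlots + slotsPerWorker - 1) / slotsPerWorker;
    }

    public static void main(String[] args) {
        System.out.println(workersToStart(10, 3, 4));
    }
}
```

For a minimum of 10 slots, 3 recovered slots and 4 slots per worker, 7 slots are missing, so 2 workers are requested. In line with the best-effort note in the update, a real implementation would not block job progress while these workers are still pending.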
[jira] [Comment Edited] (FLINK-25015) job name should not always be `collect` submitted by sql client
[ https://issues.apache.org/jira/browse/FLINK-25015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768410#comment-17768410 ] xiangyu feng edited comment on FLINK-25015 at 9/24/23 4:14 PM: --- [~zjureel] [~libenchao] Hi, I've created a PR for this issue and verified this change manually. Would you kindly help review this PR? !https://user-images.githubusercontent.com/3021821/270177041-6ceea6af-b6ab-4424-8bdf-efce50feed9f.png|width=563,height=352! was (Author: JIRAUSER301129): [~zjureel] [~libenchao] Hi, I've created a PR for this issue and verified this change manually. Would you kindly help review this PR? !image-2023-09-25-00-13-08-681.png|width=736,height=459! > job name should not always be `collect` submitted by sql client > --- > > Key: FLINK-25015 > URL: https://issues.apache.org/jira/browse/FLINK-25015 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Affects Versions: 1.14.0 >Reporter: KevinyhZou >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available, stale-assigned > Attachments: image-2021-11-23-20-15-32-459.png, > image-2021-11-23-20-16-21-932.png > > > I use the Flink SQL client to submit different SQL queries to a Flink session > cluster, and the SQL job name is always `collect`, as below > !image-2021-11-23-20-16-21-932.png! > which makes no sense to users. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-25015) job name should not always be `collect` submitted by sql client
[ https://issues.apache.org/jira/browse/FLINK-25015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17768410#comment-17768410 ] xiangyu feng commented on FLINK-25015: -- [~zjureel] [~libenchao] Hi, I've created a PR for this issue and verified this change manually. Would you kindly help review this PR? !image-2023-09-25-00-13-08-681.png|width=736,height=459! > job name should not always be `collect` submitted by sql client > --- > > Key: FLINK-25015 > URL: https://issues.apache.org/jira/browse/FLINK-25015 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / Client >Affects Versions: 1.14.0 >Reporter: KevinyhZou >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available, stale-assigned > Attachments: image-2021-11-23-20-15-32-459.png, > image-2021-11-23-20-16-21-932.png > > > I use the Flink SQL client to submit different SQL queries to a Flink session > cluster, and the SQL job name is always `collect`, as below > !image-2021-11-23-20-16-21-932.png! > which makes no sense to users. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (FLINK-32488) Introduce configuration to control ExecutionGraph cache in REST API
[ https://issues.apache.org/jira/browse/FLINK-32488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763989#comment-17763989 ] xiangyu feng edited comment on FLINK-32488 at 9/12/23 2:51 AM: --- Hi [~guoyangze], [~hejufang001] has created a PR (https://github.com/apache/flink/pull/23387) for this issue. Would you kindly assign this issue to him? was (Author: JIRAUSER301129): Hi [~guoyangze], [~hejufang001] has created a [PR|https://github.com/apache/flink/pull/23387] for this issue. Would you kindly assign this issue to him? > Introduce configuration to control ExecutionGraph cache in REST API > --- > > Key: FLINK-32488 > URL: https://issues.apache.org/jira/browse/FLINK-32488 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST >Affects Versions: 1.16.2, 1.17.1 >Reporter: Hong Liang Teoh >Assignee: Jufang He >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > *What* > Currently, REST handlers that inherit from AbstractExecutionGraphHandler > serve information derived from a cached ExecutionGraph. > This ExecutionGraph cache currently derives its timeout from > {*}web.refresh-interval{*}. The *web.refresh-interval* controls both the > refresh rate of the Flink dashboard and the ExecutionGraph cache timeout. > We should introduce a new configuration to control the ExecutionGraph cache, > namely {*}rest.cache.execution-graph.expiry{*}. > *Why* > Sharing configuration between REST handler and Flink dashboard is a sign that > we are coupling the two. > Ideally, we want our REST API behaviour to be independent of the Flink dashboard > (e.g. to support programmatic access). > > Mailing list discussion: > https://lists.apache.org/thread/7o330hfyoqqkkrfhtvz3kp448jcspjrm > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32488) Introduce configuration to control ExecutionGraph cache in REST API
[ https://issues.apache.org/jira/browse/FLINK-32488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763989#comment-17763989 ] xiangyu feng commented on FLINK-32488: -- Hi [~guoyangze], [~hejufang001] has created a [PR|https://github.com/apache/flink/pull/23387] for this issue. Would you kindly assign this issue to him? > Introduce configuration to control ExecutionGraph cache in REST API > --- > > Key: FLINK-32488 > URL: https://issues.apache.org/jira/browse/FLINK-32488 > Project: Flink > Issue Type: Improvement > Components: Runtime / REST >Affects Versions: 1.16.2, 1.17.1 >Reporter: Hong Liang Teoh >Assignee: xiangyu feng >Priority: Major > Labels: pull-request-available > Fix For: 1.18.0 > > > *What* > Currently, REST handlers that inherit from AbstractExecutionGraphHandler > serve information derived from a cached ExecutionGraph. > This ExecutionGraph cache currently derives its timeout from > {*}web.refresh-interval{*}. The *web.refresh-interval* controls both the > refresh rate of the Flink dashboard and the ExecutionGraph cache timeout. > We should introduce a new configuration to control the ExecutionGraph cache, > namely {*}rest.cache.execution-graph.expiry{*}. > *Why* > Sharing configuration between REST handler and Flink dashboard is a sign that > we are coupling the two. > Ideally, we want our REST API behaviour to be independent of the Flink dashboard > (e.g. to support programmatic access). > > Mailing list discussion: > https://lists.apache.org/thread/7o330hfyoqqkkrfhtvz3kp448jcspjrm > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762319#comment-17762319 ] xiangyu feng commented on FLINK-32794: -- [~Sergey Nuyanzin] Hi Sergey, IMHO it's ok to confirm that this feature was tested. > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png, > image-2023-09-04-15-48-19-665.png, image-2023-09-04-15-51-17-161.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761726#comment-17761726 ] xiangyu feng commented on FLINK-32794: -- Hi [~renqs], I have verified this feature with both a JDBC tool (SqlLine) and a Java application. > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png, > image-2023-09-04-15-48-19-665.png, image-2023-09-04-15-51-17-161.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32794: - Attachment: image-2023-09-04-15-51-17-161.png > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png, > image-2023-09-04-15-48-19-665.png, image-2023-09-04-15-51-17-161.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761722#comment-17761722 ] xiangyu feng commented on FLINK-32794: -- Verified using the Datasource Connection !image-2023-09-04-15-51-17-161.png|width=1017,height=540! > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png, > image-2023-09-04-15-48-19-665.png, image-2023-09-04-15-51-17-161.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761720#comment-17761720 ] xiangyu feng commented on FLINK-32794: -- Verified with a Java application !image-2023-09-04-15-48-19-665.png|width=818,height=403! > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png, > image-2023-09-04-15-48-19-665.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32794: - Attachment: image-2023-09-04-15-48-19-665.png > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png, > image-2023-09-04-15-48-19-665.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761715#comment-17761715 ] xiangyu feng commented on FLINK-32794: -- Verified with SqlLine !image-2023-09-04-15-33-33-074.png|width=726,height=615! > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (FLINK-32794) Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway
[ https://issues.apache.org/jira/browse/FLINK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyu feng updated FLINK-32794: - Attachment: image-2023-09-04-15-33-33-074.png > Release Testing: Verify FLIP-293: Introduce Flink Jdbc Driver For Sql Gateway > - > > Key: FLINK-32794 > URL: https://issues.apache.org/jira/browse/FLINK-32794 > Project: Flink > Issue Type: Sub-task > Components: Tests >Affects Versions: 1.18.0 >Reporter: Qingsheng Ren >Assignee: xiangyu feng >Priority: Major > Fix For: 1.18.0 > > Attachments: image-2023-09-04-15-33-33-074.png > > > Document for jdbc driver: > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/jdbcdriver/ -- This message was sent by Atlassian Jira (v8.20.10#820010)