[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237810#comment-17237810 ]

Yang Wang commented on FLINK-20113:
---

cc @[~ksp0422] Please share your test results here. If it is really a valid issue, we need to create a ticket to track.

> Test K8s High Availability Service
> --
>
> Key: FLINK-20113
> URL: https://issues.apache.org/jira/browse/FLINK-20113
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0
> Reporter: Robert Metzger
> Assignee: Guowei Ma
> Priority: Critical
> Fix For: 1.12.0
>
>
> Added in https://issues.apache.org/jira/browse/FLINK-12884
>
> [General Information about the Flink 1.12 release testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand
> - Are the error messages, log messages, APIs etc. easy to understand
> - Is the feature working as expected under normal conditions
> - Is the feature working / failing as expected with invalid input, induced errors etc.
> If you find a problem during testing, please file a ticket (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
> During the testing, and once you are finished, please write a short summary of all things you have tested.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237595#comment-17237595 ]

Robert Metzger commented on FLINK-20113:

I'm closing this ticket since the testing is done and we are tracking all findings in separate tickets.
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235209#comment-17235209 ]

Robert Metzger commented on FLINK-20113:

Thanks a lot for the detailed test report and the tickets you've filed.
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234734#comment-17234734 ]

Guowei Ma commented on FLINK-20113:
---

I tested four scenarios:

# Kubernetes
## Session Cluster
### Deploy a session cluster to K8s
### Access the JobManager web UI
### Check that the master has the KubernetesLeaderElector log
### Submit a StateMachineExample.jar job
### Verify that there are some completed checkpoints
### Kill the jobmaster pod
### Verify that the job can recover from the previous checkpoint
## Per-job Cluster
### Build a per-job image: registry.cn-beijing.aliyuncs.com/streamcompute/flink:k8s-ha-per-job
### Deploy the per-job cluster
### Access the JobManager web UI
### Check that the master has the KubernetesLeaderElector log
### Verify that there are some completed checkpoints
### Kill the pod
### Verify that the job can recover from the previous checkpoint
# Native Kubernetes
## Session Cluster
### Start a native K8s session
### Access the JobManager web UI
### Check the KubernetesLeaderElector log
### Submit a StateMachineExample.jar job
### Verify that there are some completed checkpoints
### Kill the pod
### Verify that the job can recover from the previous checkpoint
## Application Cluster
### Start a Flink application
### Access the JobManager web UI
### Check the KubernetesLeaderElector log
### Kill the pod
### Verify that the job can recover from the previous checkpoint

In general, the new HA service works. Most of the problems I found concern the logs and the documentation.
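Every scenario in the test report above runs a cluster with the Kubernetes HA service enabled. As a minimal sketch, assuming the three options documented for Flink 1.12 (`kubernetes.cluster-id`, `high-availability`, `high-availability.storageDir`), the corresponding `flink-conf.yaml` entries could be generated as below. The helper names, the cluster ID `k8s-ha-test`, and the storage path are hypothetical placeholders; the report does not state the values that were actually used.

```python
from typing import Dict

# Factory class that selects the Kubernetes HA services (Flink 1.12 docs).
K8S_HA_FACTORY = (
    "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory"
)


def k8s_ha_options(cluster_id: str, storage_dir: str) -> Dict[str, str]:
    """Return the Flink options that enable the Kubernetes HA service.

    cluster_id and storage_dir are supplied by the caller; the values used
    in the example at the bottom are placeholders, not taken from the report.
    """
    if not cluster_id:
        raise ValueError("cluster_id must not be empty")
    if not storage_dir:
        raise ValueError("storage_dir must not be empty")
    return {
        "kubernetes.cluster-id": cluster_id,
        "high-availability": K8S_HA_FACTORY,
        "high-availability.storageDir": storage_dir,
    }


def render_flink_conf(options: Dict[str, str]) -> str:
    """Render the options as 'key: value' lines for flink-conf.yaml."""
    return "".join(f"{key}: {value}\n" for key, value in options.items())


if __name__ == "__main__":
    # Hypothetical cluster ID and storage directory, for illustration only.
    print(render_flink_conf(k8s_ha_options("k8s-ha-test", "hdfs:///flink/recovery")), end="")
```

The same three keys can also be passed as `-D` dynamic properties when starting a native Kubernetes session or application; which form was used during this test is not recorded in the ticket.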
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234284#comment-17234284 ]

Guowei Ma commented on FLINK-20113:
---

[~fly_in_gis] ok.
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234209#comment-17234209 ]

Yang Wang commented on FLINK-20113:
---

[~maguowei] Thanks for volunteering to do the K8s HA service test. Ping me if you need any help building the image or running the session/application cluster with HA configured.
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233506#comment-17233506 ]

Robert Metzger commented on FLINK-20113:

Note, according to an offline discussion the testing will start on Wednesday or Thursday.
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231234#comment-17231234 ]

Robert Metzger commented on FLINK-20113:

Awesome, thanks a lot!
[jira] [Commented] (FLINK-20113) Test K8s High Availability Service
[ https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231080#comment-17231080 ]

Guowei Ma commented on FLINK-20113:
---

Could you assign this task to me? I can work on this.