[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-23 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237810#comment-17237810
 ] 

Yang Wang commented on FLINK-20113:
---

cc @[~ksp0422]

Please share your test results here. If it is really a valid issue, we need to 
create a ticket to track it.

> Test K8s High Availability Service
> --
>
> Key: FLINK-20113
> URL: https://issues.apache.org/jira/browse/FLINK-20113
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0
> Reporter: Robert Metzger
> Assignee: Guowei Ma
> Priority: Critical
> Fix For: 1.12.0
>
>
> Added in https://issues.apache.org/jira/browse/FLINK-12884
> 
> [General Information about the Flink 1.12 release 
> testing|https://cwiki.apache.org/confluence/display/FLINK/1.12+Release+-+Community+Testing]
> When testing a feature, consider the following aspects:
> - Is the documentation easy to understand
> - Are the error messages, log messages, APIs etc. easy to understand
> - Is the feature working as expected under normal conditions
> - Is the feature working / failing as expected with invalid input, induced 
> errors etc.
> If you find a problem during testing, please file a ticket 
> (Priority=Critical; Fix Version = 1.12.0), and link it in this testing ticket.
> During the testing, and once you are finished, please write a short summary 
> of all things you have tested.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-23 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237595#comment-17237595
 ] 

Robert Metzger commented on FLINK-20113:


I'm closing this ticket since the testing is done and we are tracking all 
findings in separate tickets.



[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-18 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17235209#comment-17235209
 ] 

Robert Metzger commented on FLINK-20113:


Thanks a lot for the detailed test report and the tickets you've filed.



[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-18 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234734#comment-17234734
 ] 

Guowei Ma commented on FLINK-20113:
---

I tested four scenarios:
 # Kubernetes
 ## Session Cluster
 ### Deploy a session cluster to K8s
 ### Access the JobManager web UI
 ### Check that the master has the KubernetesLeaderElector log
 ### Submit a StateMachineExample.jar job
 ### Verify that there are some completed checkpoints
 ### Kill the jobmaster pod
 ### Verify that the job can recover from the previous checkpoint
 ## Per-job Cluster
 ### Build a per-job image 
registry.cn-beijing.aliyuncs.com/streamcompute/flink:k8s-ha-per-job
 ### Deploy the per-job cluster
 ### Access the JobManager web UI
 ### Check that the master has the KubernetesLeaderElector log
 ### Verify that there are some completed checkpoints
 ### Kill the pod
 ### Verify that the job can recover from the previous checkpoint
 # Native Kubernetes
 ## Session Cluster
 ### Start a native K8s session
 ### Access the JobManager web UI
 ### Check the KubernetesLeaderElector log
 ### Submit a StateMachineExample.jar job
 ### Verify that there are some completed checkpoints
 ### Kill the pod
 ### Verify that the job can recover from the previous checkpoint
 ## Application Cluster
 ### Start a Flink application
 ### Access the JobManager web UI
 ### Check the KubernetesLeaderElector log
 ### Kill the pod
 ### Verify that the job can recover from the previous checkpoint
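
For readers who want to repeat the native Kubernetes session scenario, here is a minimal sketch of the commands involved. It only builds the argument lists and does not run anything. The option keys and the factory class name are the ones documented for the Kubernetes HA service in Flink 1.12; the cluster id, storage directory, and pod name are hypothetical placeholders, and the exact commands used in this test run were not posted in the ticket.

```python
from typing import Dict, List

# Factory class name documented for the Kubernetes HA service in Flink 1.12.
K8S_HA_FACTORY = (
    "org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory"
)


def ha_options(cluster_id: str, storage_dir: str) -> Dict[str, str]:
    """Return the Flink options that enable the Kubernetes HA service."""
    return {
        "kubernetes.cluster-id": cluster_id,
        "high-availability": K8S_HA_FACTORY,
        "high-availability.storageDir": storage_dir,
    }


def as_dynamic_properties(options: Dict[str, str]) -> List[str]:
    """Render options as -Dkey=value arguments for the Flink CLI scripts."""
    return [f"-D{key}={value}" for key, value in options.items()]


def native_session_commands(
    cluster_id: str, storage_dir: str, jm_pod: str
) -> List[List[str]]:
    """Build the commands for the native session scenario, in order.

    jm_pod is the JobManager pod name, which has to be looked up first
    (for example with `kubectl get pods`); it is an input here, not derived.
    """
    props = as_dynamic_properties(ha_options(cluster_id, storage_dir))
    return [
        # 1. Start a native K8s session with HA enabled.
        ["./bin/kubernetes-session.sh", *props],
        # 2. Submit the example job to that session in detached mode.
        [
            "./bin/flink", "run", "--detached",
            "--target", "kubernetes-session",
            f"-Dkubernetes.cluster-id={cluster_id}",
            "./examples/streaming/StateMachineExample.jar",
        ],
        # 3. Kill the JobManager pod to trigger a failover.
        ["kubectl", "delete", "pod", jm_pod],
    ]


if __name__ == "__main__":
    # Placeholder values, chosen only for illustration.
    for cmd in native_session_commands(
        "ha-test-session", "s3://my-bucket/flink-ha", "ha-test-session-abc123"
    ):
        print(" ".join(cmd))
```

The remaining steps (checking the KubernetesLeaderElector log, confirming completed checkpoints, and verifying recovery) are manual checks in the JobManager log and web UI and are not covered by this sketch.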

-

In general, the new HA service works. Most of the problems I found are about the 
logs and documentation.



[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-17 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234284#comment-17234284
 ] 

Guowei Ma commented on FLINK-20113:
---

[~fly_in_gis] ok.



[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-17 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17234209#comment-17234209
 ] 

Yang Wang commented on FLINK-20113:
---

[~maguowei] Thanks for volunteering to do the K8s HA service test. Ping me if 
you need any help building the image or running the session/application cluster 
with HA configured.



[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-17 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233506#comment-17233506
 ] 

Robert Metzger commented on FLINK-20113:


Note, according to an offline discussion the testing will start on Wednesday or 
Thursday.



[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-13 Thread Robert Metzger (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231234#comment-17231234
 ] 

Robert Metzger commented on FLINK-20113:


Awesome, thanks a lot!



[jira] [Commented] (FLINK-20113) Test K8s High Availability Service

2020-11-12 Thread Guowei Ma (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231080#comment-17231080
 ] 

Guowei Ma commented on FLINK-20113:
---

Could you assign this task to me? I can work on this.

> Test K8s High Availability Service
> --
>
> Key: FLINK-20113
> URL: https://issues.apache.org/jira/browse/FLINK-20113
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0
> Reporter: Robert Metzger
> Priority: Critical
> Fix For: 1.12.0
>
>



