[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error

2021-01-26 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272607#comment-17272607
 ] 

Xintong Song commented on FLINK-21144:
--

After taking a closer look, I found that supporting periodic scheduling in 
`MainThreadExecutor` is not trivial.

Periodic scheduling can be supported through 
`RpcService#getScheduledExecutor()`. However, exposing the `ScheduledExecutor` 
to the `RpcEndpoint#MainThreadExecutor` requires invasive changes to the 
current encapsulation.

For now, I think we should implement only `MainThreadExecutor#schedule(callable, 
delay, unit)`, which should be enough to fix the `tryResetPodCreationCoolDown` 
problem.
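
To illustrate the idea of a one-shot `schedule(callable, delay, unit)` that still runs the callable on the main thread, here is a minimal sketch using only JDK classes. It is not the code from the PR: `SketchMainThreadExecutor`, `runDemo`, and the thread name are hypothetical, and returning a `CompletableFuture` is a simplification chosen for the sketch. A timer thread only waits out the delay and then hands the work over to the single main thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class MainThreadScheduleSketch {

    /** Hypothetical stand-in for a main-thread executor; not Flink's actual class. */
    static final class SketchMainThreadExecutor {
        private final Executor mainThread;            // single thread that owns all state
        private final ScheduledExecutorService timer; // only used to wait out the delay

        SketchMainThreadExecutor(Executor mainThread, ScheduledExecutorService timer) {
            this.mainThread = mainThread;
            this.timer = timer;
        }

        /** Runs the callable on the main thread once the delay has elapsed. */
        <V> CompletableFuture<V> schedule(Callable<V> callable, long delay, TimeUnit unit) {
            CompletableFuture<V> result = new CompletableFuture<>();
            Runnable runOnMainThread = () -> {
                try {
                    result.complete(callable.call());
                } catch (Throwable t) {
                    result.completeExceptionally(t);
                }
            };
            // The timer thread never executes the callable itself; it only hands it over.
            Runnable handOver = () -> mainThread.execute(runOnMainThread);
            timer.schedule(handOver, delay, unit);
            return result;
        }
    }

    /** Schedules a callable and returns the name of the thread that executed it. */
    static String runDemo() {
        ExecutorService mainThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "main-thread-sketch"));
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            SketchMainThreadExecutor executor = new SketchMainThreadExecutor(mainThread, timer);
            return executor
                    .schedule(() -> Thread.currentThread().getName(), 50, TimeUnit.MILLISECONDS)
                    .get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            timer.shutdownNow();
            mainThread.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints "main-thread-sketch"
    }
}
```

Periodic variants are deliberately left out of the sketch, in line with the limited scope described above.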

I've opened a PR. Please take a look. [~fly_in_gis]

> KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
> --
>
> Key: FLINK-21144
> URL: https://issues.apache.org/jira/browse/FLINK-21144
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.1
> Reporter: Yang Wang
> Assignee: Xintong Song
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.12.2
>
>
> {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} calls an 
> unimplemented method, {{RpcEndpoint.MainThreadExecutor#schedule(Callable 
> callable, long delay, TimeUnit unit)}}. This causes a fatal error and 
> makes the JobManager terminate abnormally.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error

2021-01-26 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272049#comment-17272049
 ] 

Yang Wang commented on FLINK-21144:
---

Great. +1 to implementing these methods to avoid similar issues in the future.



[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error

2021-01-26 Thread Xintong Song (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271981#comment-17271981
 ] 

Xintong Song commented on FLINK-21144:
--

Nice catch!

This escaped our tests because we use a fully implemented executor in the tests.

I think the proper fix might be to implement the missing methods in 
`MainThreadExecutor`. Per the error message, there is no strong reason against 
implementing those methods, other than that they were not used. We might also 
want to do that for the master branch, to avoid similar problems in the future.
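
The gap between tests and production can be sketched as follows. This is a hedged illustration, not Flink code: `SketchScheduler`, `PartialScheduler`, `FullScheduler`, and the exception message are hypothetical names and text made up for the example. The same caller succeeds against a fully implemented executor but fails against one whose `schedule` method is left unimplemented, so a test that only uses the former never sees the error.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class UnimplementedScheduleSketch {

    /** Hypothetical scheduling surface that the caller depends on. */
    interface SketchScheduler {
        <V> ScheduledFuture<V> schedule(Callable<V> callable, long delay, TimeUnit unit);
    }

    /** Production-like executor: the method exists but is not implemented. */
    static final class PartialScheduler implements SketchScheduler {
        @Override
        public <V> ScheduledFuture<V> schedule(Callable<V> callable, long delay, TimeUnit unit) {
            // Hypothetical message; the real error text is not quoted in this thread.
            throw new UnsupportedOperationException("schedule(Callable) is not implemented");
        }
    }

    /** Test-like executor: fully implemented by delegating to the JDK. */
    static final class FullScheduler implements SketchScheduler {
        private final ScheduledExecutorService delegate;

        FullScheduler(ScheduledExecutorService delegate) {
            this.delegate = delegate;
        }

        @Override
        public <V> ScheduledFuture<V> schedule(Callable<V> callable, long delay, TimeUnit unit) {
            return delegate.schedule(callable, delay, unit);
        }
    }

    /** Stand-in for a caller that schedules a delayed reset. */
    static String outcome(SketchScheduler scheduler) {
        try {
            scheduler.schedule(() -> "reset", 10, TimeUnit.MILLISECONDS).get(5, TimeUnit.SECONDS);
            return "ok";
        } catch (UnsupportedOperationException e) {
            return "unsupported";
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Runs the same caller against both executors. */
    static String runDemo() {
        ScheduledExecutorService jdk = Executors.newSingleThreadScheduledExecutor();
        try {
            return outcome(new PartialScheduler()) + "," + outcome(new FullScheduler(jdk));
        } finally {
            jdk.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints "unsupported,ok"
    }
}
```

In the sketch the exception is caught and reported; in the reported bug it surfaces as a fatal error instead.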



[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error

2021-01-25 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271929#comment-17271929
 ] 

Yang Wang commented on FLINK-21144:
---

Since {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} has 
already been replaced with {{ThresholdMeter}} in FLINK-10868, maybe we need to 
backport it to 1.12 or add a quick fix just for the 1.12 branch.

> KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
> --
>
> Key: FLINK-21144
> URL: https://issues.apache.org/jira/browse/FLINK-21144
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.1
> Reporter: Yang Wang
> Priority: Major
> Fix For: 1.13.0, 1.12.2
>
>
> {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} calls an 
> unimplemented method, {{RpcEndpoint.MainThreadExecutor#schedule(Callable 
> callable, long delay, TimeUnit unit)}}. This causes a fatal error and 
> makes the JobManager terminate abnormally.





[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error

2021-01-25 Thread Yang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271922#comment-17271922
 ] 

Yang Wang commented on FLINK-21144:
---

cc [~xintongsong]
