[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
[ https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272607#comment-17272607 ]

Xintong Song commented on FLINK-21144:
--------------------------------------

After taking a closer look, I found that it is not trivial to support scheduling periodic executions in `MainThreadExecutor`. Periodic executions can be scheduled through `RpcService#getScheduledExecutor()`. However, exposing the `ScheduledExecutor` to the `RpcEndpoint#MainThreadExecutor` requires invasive changes to the current encapsulation.

For now, I think we can implement only `MainThreadExecutor#schedule(callable, delay, unit)`, which should be enough to fix the `tryResetPodCreationCoolDown` problem.

I've opened a PR. Please take a look. [~fly_in_gis]

> KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-21144
>                 URL: https://issues.apache.org/jira/browse/FLINK-21144
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.1
>            Reporter: Yang Wang
>            Assignee: Xintong Song
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.2
>
> {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} is calling an unimplemented method {{RpcEndpoint.MainThreadExecutor#schedule(Callable callable, long delay, TimeUnit unit)}}. This will cause a fatal error and make the JobManager terminate exceptionally.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
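To illustrate the one-shot `schedule(callable, delay, unit)` idea discussed in the comment above, here is a minimal sketch. It is not the code from the PR and not Flink's actual `MainThreadExecutor`. The class name `MainThreadSchedulingSketch`, the thread names, and the choice to return a `CompletableFuture` are all assumptions made for illustration. The point it shows: a separate timer thread only waits out the delay, and the callable itself is handed back to a single "main" thread for execution.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a main-thread executor; NOT Flink's actual class. */
final class MainThreadSchedulingSketch {
    // Single thread that plays the role of the endpoint's main thread.
    private final ExecutorService mainThread =
            Executors.newSingleThreadExecutor(daemon("main-thread-sketch"));

    // Timer thread: it only waits for the delay and never runs the callable itself.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(daemon("timer-sketch"));

    /** One-shot delayed execution: the callable runs on the main thread after the delay. */
    <V> CompletableFuture<V> schedule(Callable<V> callable, long delay, TimeUnit unit) {
        CompletableFuture<V> result = new CompletableFuture<>();
        Runnable runOnMainThread = () -> {
            try {
                result.complete(callable.call());
            } catch (Throwable t) {
                result.completeExceptionally(t);
            }
        };
        // When the delay elapses, hand the work over to the main thread.
        Runnable handOff = () -> mainThread.execute(runOnMainThread);
        timer.schedule(handOff, delay, unit);
        return result;
    }

    void shutdown() {
        timer.shutdownNow();
        mainThread.shutdownNow();
    }

    // Daemon threads so the sketch never keeps the JVM alive.
    private static ThreadFactory daemon(String name) {
        return runnable -> {
            Thread t = new Thread(runnable, name);
            t.setDaemon(true);
            return t;
        };
    }
}
```

A caller could schedule a cool-down reset with something like `sketch.schedule(() -> resetCoolDown(), 5, TimeUnit.SECONDS)`, where `resetCoolDown` is a hypothetical method. Periodic scheduling is deliberately left out, matching the scope proposed in the comment.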
[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
[ https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17272049#comment-17272049 ]

Yang Wang commented on FLINK-21144:
-----------------------------------

Great. +1 to implement these methods to avoid similar issues in the future.

> KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-21144
>                 URL: https://issues.apache.org/jira/browse/FLINK-21144
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.1
>            Reporter: Yang Wang
>            Assignee: Xintong Song
>            Priority: Major
>             Fix For: 1.12.2
>
> {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} is calling an unimplemented method {{RpcEndpoint.MainThreadExecutor#schedule(Callable callable, long delay, TimeUnit unit)}}. This will cause a fatal error and make the JobManager terminate exceptionally.
[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
[ https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271981#comment-17271981 ]

Xintong Song commented on FLINK-21144:
--------------------------------------

Nice catch! This escaped our tests because we use a fully implemented executor in the tests.

I think the proper fix might be to implement the missing methods in `MainThreadExecutor`. Per the error message, there is no strong reason against implementing those methods, except that they were not used. We might also want to do that for the master branch, to avoid similar problems in the future.

> KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-21144
>                 URL: https://issues.apache.org/jira/browse/FLINK-21144
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.1
>            Reporter: Yang Wang
>            Priority: Major
>             Fix For: 1.12.2
>
> {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} is calling an unimplemented method {{RpcEndpoint.MainThreadExecutor#schedule(Callable callable, long delay, TimeUnit unit)}}. This will cause a fatal error and make the JobManager terminate exceptionally.
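The point about the tests using a fully implemented executor can be sketched as follows. Everything here is hypothetical: the interface `DelayedScheduler`, both executor classes, the exception message, and `tryScheduleReset` are illustrative names, not Flink code. The sketch only shows the general pattern: the same caller succeeds against a complete test executor and fails against an executor whose `schedule` method exists but is not implemented.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration; none of these types are Flink classes. */
final class UnimplementedScheduleSketch {

    /** Minimal scheduling interface assumed for this sketch. */
    interface DelayedScheduler {
        void schedule(Callable<?> callable, long delay, TimeUnit unit);
    }

    /** Mimics a production executor whose method exists but is not implemented. */
    static final class ProductionLikeExecutor implements DelayedScheduler {
        @Override
        public void schedule(Callable<?> callable, long delay, TimeUnit unit) {
            throw new UnsupportedOperationException("schedule(Callable, long, TimeUnit) is not implemented");
        }
    }

    /** Mimics a test executor that is fully implemented on top of a real scheduler. */
    static final class FullyImplementedExecutor implements DelayedScheduler {
        private final ScheduledExecutorService delegate =
                Executors.newSingleThreadScheduledExecutor(runnable -> {
                    Thread t = new Thread(runnable, "test-scheduler-sketch");
                    t.setDaemon(true); // never keeps the JVM alive
                    return t;
                });

        @Override
        public void schedule(Callable<?> callable, long delay, TimeUnit unit) {
            delegate.schedule(callable, delay, unit);
        }
    }

    /** Caller under test: returns false if the executor cannot schedule the task. */
    static boolean tryScheduleReset(DelayedScheduler executor) {
        try {
            executor.schedule(() -> null, 10, TimeUnit.MILLISECONDS);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }
}
```

A test that only ever passes `FullyImplementedExecutor` to `tryScheduleReset` would stay green, while the production-like executor fails at the first call. In the reported issue the failure is not caught and terminates the JobManager; the sketch catches it only to make the difference observable.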
[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
[ https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271929#comment-17271929 ]

Yang Wang commented on FLINK-21144:
-----------------------------------

Since {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} has already been replaced with {{ThresholdMeter}} in FLINK-10868, maybe we need to backport it to 1.12 or add a quick fix just for the 1.12 branch.

> KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-21144
>                 URL: https://issues.apache.org/jira/browse/FLINK-21144
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.1
>            Reporter: Yang Wang
>            Priority: Major
>             Fix For: 1.13.0, 1.12.2
>
> {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} is calling an unimplemented method {{RpcEndpoint.MainThreadExecutor#schedule(Callable callable, long delay, TimeUnit unit)}}. This will cause a fatal error and make the JobManager terminate exceptionally.
[jira] [Commented] (FLINK-21144) KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
[ https://issues.apache.org/jira/browse/FLINK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271922#comment-17271922 ]

Yang Wang commented on FLINK-21144:
-----------------------------------

cc [~xintongsong]

> KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-21144
>                 URL: https://issues.apache.org/jira/browse/FLINK-21144
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.1
>            Reporter: Yang Wang
>            Priority: Major
>             Fix For: 1.13.0, 1.12.2
>
> {{KubernetesResourceManagerDriver#tryResetPodCreationCoolDown}} is calling an unimplemented method {{RpcEndpoint.MainThreadExecutor#schedule(Callable callable, long delay, TimeUnit unit)}}. This will cause a fatal error and make the JobManager terminate exceptionally.