Miuler commented on code in PR #252:
URL: https://github.com/apache/flink-kubernetes-operator/pull/252#discussion_r889533284

##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/FlinkService.java:
##########
@@ -229,14 +229,12 @@ private JarRunResponseBody runJar(
                                 ? RestoreMode.DEFAULT
                                 : null);
         LOG.info("Submitting job: {} to session cluster.", jobID.toHexString());
+        var clientTimeout =
+                configManager.getOperatorConfiguration().getFlinkClientTimeout().toSeconds();
+        LOG.debug("clientTimeout: {}", clientTimeout);

Review Comment:
> I have a question: which property sets this clientTimeout?
>
> I added this to the values.yaml:
>
> ```
> flink-conf.yaml: |+
>   # Flink Config Overrides
>   client.timeout: 4 MINUTE
> ```
>
> and in my pod I see this:
>
> ```
> exec -ti migration-cosmosdb-wape-5578f9948c-9t6xm -- bash
>
> root@migration-cosmosdb-wape-5578f9948c-9t6xm:/opt/flink# grep time conf/flink-conf.yaml
> client.timeout: 4 MINUTE
> ```
>
> but in my log I see this:
>
> ```
> 2022-06-04 12:20:30,505 o.a.f.k.o.c.FlinkConfigManager [INFO ] Updating default configuration to {blob.server.port=6124, taskmanager.memory.process.size=1728m, client.timeout=4 MINUTE, jobmanager.memory.process.size=1600m, jobmanager.rpc.port=6123, taskmanager.rpc.port=6122, queryable-state.proxy.ports=6125, parallelism.default=2, taskmanager.numberOfTaskSlots=2, kubernetes.operator.metrics.reporter.slf4j.interval=5 MINUTE, kubernetes.operator.observer.progress-check.interval=5 s, kubernetes.operator.metrics.reporter.slf4j.factory.class=org.apache.flink.metrics.slf4j.Slf4jReporterFactory, kubernetes.operator.reconciler.reschedule.interval=15 s}
> ...
> ...
> 2022-06-04 12:20:30,762 i.j.o.a.c.ExecutorServiceManager [DEBUG] Initialized ExecutorServiceManager executor: class java.util.concurrent.ThreadPoolExecutor, timeout: 10
> ...
> ...
> 2022-06-04 12:28:12,124 o.a.f.k.o.s.FlinkService [DEBUG][flink-wape-02/migration-cosmosdb-wape-sessionjob] clientTimeout: 10
> ...
> ...
> 2022-06-04 12:26:52,685 i.j.o.p.e.ReconciliationDispatcher [ERROR][flink-wape-02/migration-cosmosdb-wape-sessionjob] Error during event processing ExecutionScope{ resource id: CustomResourceID{name='migration-cosmosdb-wape-sessionjob', namespace='flink-wape-02'}, version: null} failed.
> org.apache.flink.kubernetes.operator.exception.ReconciliationException: org.apache.flink.util.FlinkRuntimeException: java.util.concurrent.TimeoutException
>     at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:117)
>     at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:59)
>     at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:101)
>     at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:76)
>     at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
>     at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:75)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:143)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:109)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:74)
>     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:50)
>     at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:349)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.flink.util.FlinkRuntimeException: java.util.concurrent.TimeoutException
>     at org.apache.flink.kubernetes.operator.service.FlinkService.runJar(FlinkService.java:240)
>     at org.apache.flink.kubernetes.operator.service.FlinkService.submitJobToSessionCluster(FlinkService.java:198)
>     at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.submitAndInitStatus(FlinkSessionJobReconciler.java:164)
>     at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.reconcile(FlinkSessionJobReconciler.java:88)
>     at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.reconcile(FlinkSessionJobReconciler.java:48)
>     at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:115)
>     ... 13 more
> Caused by: java.util.concurrent.TimeoutException
>     at java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source)
>     at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
>     at org.apache.flink.kubernetes.operator.service.FlinkService.runJar(FlinkService.java:237)
>     ... 18 more
> ```
>
> I see my `client.timeout=4 MINUTE`, but also a `clientTimeout` of 10 seconds. All my pipelines are affected by this timeout :(

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
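
For readers following the trace above: the innermost frames (`CompletableFuture.get` calling `timedGet`) suggest a bounded wait on a future, and the diff reads that bound from `getOperatorConfiguration().getFlinkClientTimeout()`, that is, from the operator's own configuration and not necessarily from Flink's `client.timeout`. The sketch below only reproduces that failure mode in plain Java; it is not the operator's code. The class `ClientTimeoutSketch` and the helper `awaitWithTimeout` are hypothetical names, and milliseconds are used so the example finishes quickly. Which configuration key feeds `getFlinkClientTimeout()` is not shown in this message; it may be `kubernetes.operator.flink.client.timeout`, but treat that as an assumption to check against the operator's configuration reference.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ClientTimeoutSketch {

    // Hypothetical helper: waits for a response, but never longer than `timeout`.
    // A timeout or a failed future is rethrown wrapped in a RuntimeException,
    // similar in shape to the nested "Caused by" chain in the trace.
    static String awaitWithTimeout(CompletableFuture<String> response, Duration timeout) {
        try {
            return response.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            // Restore the interrupt flag before propagating.
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stands in for a job submission that takes longer than the configured bound.
        CompletableFuture<String> slowSubmission = new CompletableFuture<>();
        try {
            awaitWithTimeout(slowSubmission, Duration.ofMillis(100));
        } catch (RuntimeException e) {
            System.out.println("Submission failed: " + e.getCause());
        }
    }
}
```

If the bound is shorter than the time the session cluster needs to accept the job, every submission fails this way regardless of other timeout settings, which would match a logged `clientTimeout: 10` alongside `client.timeout=4 MINUTE`.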