Miuler commented on code in PR #252:
URL: 
https://github.com/apache/flink-kubernetes-operator/pull/252#discussion_r889533284


##########
flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/service/FlinkService.java:
##########
@@ -229,14 +229,12 @@ private JarRunResponseBody runJar(
                                     ? RestoreMode.DEFAULT
                                     : null);
             LOG.info("Submitting job: {} to session cluster.", jobID.toHexString());
+            var clientTimeout =
+                    configManager.getOperatorConfiguration().getFlinkClientTimeout().toSeconds();
+            LOG.debug("clientTimeout: {}", clientTimeout);

Review Comment:
   > I have a question: what is the property for this clientTimeout?
   > 
   > I added this to the values.yaml:
   > 
   > ```
   >   flink-conf.yaml: |+
   >     # Flink Config Overrides
   >     client.timeout: 4 MINUTE
   > ```
   > 
   > and in my pod I see this
   > 
   > ```
   > exec -ti  migration-cosmosdb-wape-5578f9948c-9t6xm  -- bash
   > 
   > root@migration-cosmosdb-wape-5578f9948c-9t6xm:/opt/flink# grep time conf/flink-conf.yaml
   > client.timeout: 4 MINUTE
   > ```
   > 
   > but in my log I see this
   > 
   > ```
   > 2022-06-04 12:20:30,505 o.a.f.k.o.c.FlinkConfigManager [INFO ] Updating default configuration to {blob.server.port=6124, taskmanager.memory.process.size=1728m, client.timeout=4 MINUTE, jobmanager.memory.process.size=1600m, jobmanager.rpc.port=6123, taskmanager.rpc.port=6122, queryable-state.proxy.ports=6125, parallelism.default=2, taskmanager.numberOfTaskSlots=2, kubernetes.operator.metrics.reporter.slf4j.interval=5 MINUTE, kubernetes.operator.observer.progress-check.interval=5 s, kubernetes.operator.metrics.reporter.slf4j.factory.class=org.apache.flink.metrics.slf4j.Slf4jReporterFactory, kubernetes.operator.reconciler.reschedule.interval=15 s}
   > ...
   > ...
   > 2022-06-04 12:20:30,762 i.j.o.a.c.ExecutorServiceManager [DEBUG] Initialized ExecutorServiceManager executor: class java.util.concurrent.ThreadPoolExecutor, timeout: 10
   > ...
   > ...
   > 2022-06-04 12:28:12,124 o.a.f.k.o.s.FlinkService       [DEBUG][flink-wape-02/migration-cosmosdb-wape-sessionjob] clientTimeout: 10
   > ...
   > ...
   > 2022-06-04 12:26:52,685 i.j.o.p.e.ReconciliationDispatcher [ERROR][flink-wape-02/migration-cosmosdb-wape-sessionjob] Error during event processing ExecutionScope{ resource id: CustomResourceID{name='migration-cosmosdb-wape-sessionjob', namespace='flink-wape-02'}, version: null} failed.
   > org.apache.flink.kubernetes.operator.exception.ReconciliationException: org.apache.flink.util.FlinkRuntimeException: java.util.concurrent.TimeoutException
   >     at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:117)
   >     at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:59)
   >     at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:101)
   >     at io.javaoperatorsdk.operator.processing.Controller$2.execute(Controller.java:76)
   >     at io.javaoperatorsdk.operator.api.monitoring.Metrics.timeControllerExecution(Metrics.java:34)
   >     at io.javaoperatorsdk.operator.processing.Controller.reconcile(Controller.java:75)
   >     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:143)
   >     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:109)
   >     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:74)
   >     at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:50)
   >     at io.javaoperatorsdk.operator.processing.event.EventProcessor$ControllerExecution.run(EventProcessor.java:349)
   >     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   >     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   >     at java.base/java.lang.Thread.run(Unknown Source)
   > Caused by: org.apache.flink.util.FlinkRuntimeException: java.util.concurrent.TimeoutException
   >     at org.apache.flink.kubernetes.operator.service.FlinkService.runJar(FlinkService.java:240)
   >     at org.apache.flink.kubernetes.operator.service.FlinkService.submitJobToSessionCluster(FlinkService.java:198)
   >     at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.submitAndInitStatus(FlinkSessionJobReconciler.java:164)
   >     at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.reconcile(FlinkSessionJobReconciler.java:88)
   >     at org.apache.flink.kubernetes.operator.reconciler.sessionjob.FlinkSessionJobReconciler.reconcile(FlinkSessionJobReconciler.java:48)
   >     at org.apache.flink.kubernetes.operator.controller.FlinkSessionJobController.reconcile(FlinkSessionJobController.java:115)
   >     ... 13 more
   > Caused by: java.util.concurrent.TimeoutException
   >     at java.base/java.util.concurrent.CompletableFuture.timedGet(Unknown Source)
   >     at java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)
   >     at org.apache.flink.kubernetes.operator.service.FlinkService.runJar(FlinkService.java:237)
   >     ... 18 more
   > ```
   > 
   > I see my `client.timeout=4 MINUTE` in the configuration, but the log also shows `clientTimeout: 10` (seconds)?
   
   All my pipelines are multiplied by this timeout :(
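
A minimal sketch of what may be happening here, assuming the operator reads its REST client timeout from its own key rather than from Flink's `client.timeout`. The key name `kubernetes.operator.flink.client.timeout`, the 10-second default, the class `TimeoutLookupSketch`, and the simplified duration parser are all assumptions for illustration; this is not the operator's actual code.

```java
import java.time.Duration;
import java.util.Map;

public class TimeoutLookupSketch {
    // Assumed operator-specific key; the log above does not confirm it.
    public static final String OPERATOR_KEY = "kubernetes.operator.flink.client.timeout";
    // Assumed default, chosen to match the "clientTimeout: 10" seen in the log.
    public static final Duration DEFAULT_TIMEOUT = Duration.ofSeconds(10);

    /** Parses values such as "4 MINUTE" or "10 s" (only the units used here). */
    public static Duration parse(String text) {
        String[] parts = text.trim().split("\\s+");
        long amount = Long.parseLong(parts[0]);
        String unit = parts[1].toLowerCase();
        if (unit.startsWith("min")) {
            return Duration.ofMinutes(amount);
        }
        if (unit.equals("s") || unit.startsWith("sec")) {
            return Duration.ofSeconds(amount);
        }
        throw new IllegalArgumentException("Unsupported unit: " + parts[1]);
    }

    /** Looks up only the operator key; any other key in the map is ignored. */
    public static Duration operatorClientTimeout(Map<String, String> conf) {
        String raw = conf.get(OPERATOR_KEY);
        return raw == null ? DEFAULT_TIMEOUT : parse(raw);
    }

    public static void main(String[] args) {
        // Only Flink's client.timeout is set, as in the values.yaml above.
        Map<String, String> current = Map.of("client.timeout", "4 MINUTE");
        System.out.println(operatorClientTimeout(current).toSeconds()); // prints 10

        // The assumed operator key is set as well.
        Map<String, String> withOperatorKey = Map.of(
                "client.timeout", "4 MINUTE",
                OPERATOR_KEY, "4 MINUTE");
        System.out.println(operatorClientTimeout(withOperatorKey).toSeconds()); // prints 240
    }
}
```

If that assumption holds, adding the operator key to the same `flink-conf.yaml` override would be the thing to try; the exact key name should be checked against the operator's configuration reference.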



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
