Hello all,
Our team encounters *akka.pattern.AskTimeoutException* when starting the
JobManager. Based on the error message, we tried setting *akka.ask.timeout*
and *web.timeout* to 360s, but neither setting helped.
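For reference, here is a minimal sketch of how these two options might be written in flink-conf.yaml. It assumes that akka.ask.timeout accepts a duration string and that web.timeout is read as a plain number of milliseconds, so 360 seconds would be 360000. The values are illustrative and may differ from what the loaded configuration further down reports.

```yaml
# Sketch only: illustrative values for the two timeouts discussed above.
# akka.ask.timeout takes a duration string.
akka.ask.timeout: 360s
# Assumption: web.timeout is a number of milliseconds, so 360 s = 360000.
web.timeout: 360000
```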
We suspect the issue may be caused by *FileSource.forRecordFileFormat*. The
application loads files in batch mode to rebuild our historical data.
The job runs normally on a small batch, but it fails when run
over a large number of files (around 3 files distributed in 1500 folders).
The Flink application runs on Kubernetes in application mode, and the files are
stored in Google Cloud Storage.
Our questions are:
1. How do we increase akka.ask.timeout correctly in our case?
2. Is the issue caused by FileSource? If so, could you suggest how to
prevent it?
Our settings are as follows.
```
2021-06-10 03:44:14,317 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: kubernetes.container.image, */:**.*.**
2021-06-10 03:44:14,317 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: fs.hdfs.hadoopconfig, /opt/flink/conf
2021-06-10 03:44:14,317 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: taskmanager.numberOfTaskSlots, 4
2021-06-10 03:44:14,317 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: kubernetes.rest-service.exposed.type, ClusterIP
2021-06-10 03:44:14,317 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: high-availability.jobmanager.port, 6123
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: akka.ask.timeout, 360s
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: state.backend.rocksdb.memory.write-buffer-ratio, 0.7
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: metrics.reporter.prom.class,
org.apache.flink.metrics.prometheus.PrometheusReporter
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: state.storage.fs.memory-threshold, 1048576
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: execution.checkpointing.unaligned, true
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: web.timeout, 100
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: execution.target, kubernetes-application
2021-06-10 03:44:14,318 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: restart-strategy.fixed-delay.attempts, 2147483647
2021-06-10 03:44:14,319 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: jobmanager.memory.process.size, 8g
2021-06-10 03:44:14,319 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: taskmanager.rpc.port, 6122
2021-06-10 03:44:14,319 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: akka.framesize, 104857600b
2021-06-10 03:44:14,319 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: containerized.master.env.HADOOP_CLASSPATH,
/opt/flink/conf:/opt/hadoop-3.1.1/share/hadoop/common/lib/*:/opt/hadoop-3.1.1/share/hadoop/common/*:/opt/hadoop-3.1.1/share/hadoop/hdfs:/opt/hadoop-3.1.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.1/share/hadoop/hdfs/*:/opt/hadoop-3.1.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.1/share/hadoop/mapreduce/*:/opt/hadoop-3.1.1/share/hadoop/yarn:/opt/hadoop-3.1.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.1.1/share/hadoop/yarn/*:/contrib/capacity-scheduler/*.jar
2021-06-10 03:44:14,319 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: execution.attached, true
2021-06-10 03:44:14,319 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: internal.cluster.execution-mode, NORMAL
2021-06-10 03:44:14,319 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property: high-availability,
org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
2021-06-10 03:44:14,320 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property:
execution.checkpointing.externalized-checkpoint-retention,
DELETE_ON_CANCELLATION
2021-06-10 03:44:14,320 INFO
org.apache.flink.configuration.GlobalConfiguration [] - Loading
configuration property