WarsenLiu opened a new issue, #7106: URL: https://github.com/apache/seatunnel/issues/7106
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22) and found no similar issues.

### What happened

I built a SeaTunnel Engine cluster on three 8u32g servers (8 CPU cores, 32 GB memory each). Checkpoints sometimes time out, and checkpoint files are always written to the first node, which leaves that node with too many inodes in use.

### SeaTunnel Version

2.3.5

### SeaTunnel Config

```conf
seatunnel:
  engine:
    classloader-cache-mode: true
    history-job-expire-minutes: 180
    backup-count: 1
    queue-type: blockingqueue
    print-execution-info-interval: 60
    print-job-metrics-info-interval: 60
    slot-service:
      dynamic-slot: true
    checkpoint:
      interval: 300000
      timeout: 600000
      storage:
        type: hdfs
        max-retained: 3
        plugin-config:
          namespace: /data/apache-seatunnel-2.3.5/checkpoint
          # namespace: /tmp/seatunnel/checkpoint_snapshot
          storage.type: hdfs
          fs.defaultFS: file:///data/apache-seatunnel-2.3.5/
```

### Running Command

```shell
used ds:
env {
  parallelism = 1
  job.mode = "STREAMING"
  checkpoint.interval = 60000
  job.name = "z003"
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://xxx:3306/xxx?autoReconnect=true"
    username = "root"
    password = "xxx"
    table-names = ["xxx.xxx"]
    startup.mode = "initial"
    result_table_name = "source_table_2"
    query = "select xxx from xxx"
  }
}

transform {
  Sql {
    source_table_name = "source_table_2"
    result_table_name = "target_table_2"
    query = "select xxx from source_table_2"
  }
  Sql {
    source_table_name = "target_table_2"
    result_table_name = "target_table_log_2"
    query = "select xxx from target_table_2"
  }
}

sink {
  Jdbc {
    url = "jdbc:mysql://xxx:3306/xxx?autoReconnect=true"
    driver= "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "xxx"
    database = "xxx"
    source_table_name = "target_table_2"
    generate_sink_sql = true
    table = "xxx"
    batch_size = 10
    primary_keys = ["xxx"]
  }
  Jdbc {
    url = "jdbc:mysql://xxx:3306/xxx?autoReconnect=true"
    driver= "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "xxx"
    database = "xxx"
    source_table_name = "target_table_log_2"
    batch_size = 10
    query = "insert into xxx(xxx) values(?) ON DUPLICATE KEY UPDATE field= VALUES(field);"
  }
}
```

### Error Exception

```log
[INFO] 2024-07-04 13:37:16.704 +0800 - ->
2024-07-04 13:37:15,926 INFO org.apache.seatunnel.engine.client.job.ClientJobProxy - Job (861093691492663301) end with state FAILED
2024-07-04 13:37:15,927 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2024-07-04 13:37:15,936 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: [10.60.162.14]:5801:aed286d6-1625-4ff4-91b6-34028db07da3, connection: ClientConnection{alive=false, connectionId=2, channel=NioChannel{/10.60.162.35:52531->/10.60.162.14:5801}, remoteAddress=[10.60.162.14]:5801, lastReadTime=2024-07-04 13:37:08.482, lastWriteTime=2024-07-04 13:37:08.481, closedTime=2024-07-04 13:37:15.932, connected server version=5.1}
2024-07-04 13:37:15,940 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: [10.60.162.16]:5801:a3bbbfda-b0bc-4738-a148-a17337bdb588, connection: ClientConnection{alive=false, connectionId=3, channel=NioChannel{/10.60.162.35:47319->/10.60.162.16:5801}, remoteAddress=[10.60.162.16]:5801, lastReadTime=2024-07-04 13:37:13.484, lastWriteTime=2024-07-04 13:37:13.482, closedTime=2024-07-04 13:37:15.937, connected server version=5.1}
2024-07-04 13:37:15,942 INFO com.hazelcast.client.impl.connection.ClientConnectionManager - hz.client_1 [seatunnel] [5.1] Removed connection to endpoint: [10.60.162.31]:5801:b9abbcd8-ac93-41d8-9d85-25664ce23716, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/10.60.162.35:49005->/10.60.162.31:5801}, remoteAddress=[10.60.162.31]:5801, lastReadTime=2024-07-04 13:37:15.906, lastWriteTime=2024-07-04 13:37:13.328, closedTime=2024-07-04 13:37:15.940, connected server version=5.1}
2024-07-04 13:37:15,942 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2024-07-04 13:37:15,946 INFO com.hazelcast.core.LifecycleService - hz.client_1 [seatunnel] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2024-07-04 13:37:15,946 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed SeaTunnel client......
2024-07-04 13:37:15,946 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - Closed metrics executor service ......
2024-07-04 13:37:15,946 ERROR org.apache.seatunnel.core.starter.SeaTunnel - ===============================================================================
2024-07-04 13:37:15,947 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Fatal Error,
2024-07-04 13:37:15,947 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Please submit bug report in https://github.com/apache/seatunnel/issues
2024-07-04 13:37:15,947 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Reason:SeaTunnel job executed failed
2024-07-04 13:37:15,949 ERROR org.apache.seatunnel.core.starter.SeaTunnel - Exception StackTrace:org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
	at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
	at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
	at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml or jobConfig env.
	at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:274)
	at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:590)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
	at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
	... 2 more
2024-07-04 13:37:15,949 ERROR org.apache.seatunnel.core.starter.SeaTunnel - ===============================================================================
Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
	at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:202)
	at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
	at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.engine.server.checkpoint.CheckpointException: Checkpoint expired before completing. Please increase checkpoint timeout in the seatunnel.yaml or jobConfig env.
	at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.handleCoordinatorError(CheckpointCoordinator.java:274)
	at org.apache.seatunnel.engine.server.checkpoint.CheckpointCoordinator.lambda$null$9(CheckpointCoordinator.java:590)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
	at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:194)
	... 2 more
2024-07-04 13:37:15,951 INFO org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand - run shutdown hook because get close signal
[INFO] 2024-07-04 13:37:16.706 +0800 - process has exited. execute path:/data/hubs/dolphinscheduler/tmp/exec/process/default/13823066283424/13831054200608_19/164485/311103, processId:560021 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1
[INFO] 2024-07-04 13:37:16.707 +0800 - ***********************************************************************************************
[INFO] 2024-07-04 13:37:16.707 +0800 - ********************************* Finalize task instance ************************************
[INFO] 2024-07-04 13:37:16.707 +0800 - ***********************************************************************************************
[INFO] 2024-07-04 13:37:16.707 +0800 - Upload output files: [] successfully
[INFO] 2024-07-04 13:37:16.707 +0800 - Send task execute status: FAILURE to master : 10.60.162.35:1234
[INFO] 2024-07-04 13:37:16.708 +0800 - Remove the current task execute context from worker cache
[INFO] 2024-07-04 13:37:16.708 +0800 - The current execute mode isn't develop mode, will clear the task execute file: /data/hubs/dolphinscheduler/tmp/exec/process/default/13823066283424/13831054200608_19/164485/311103
[INFO] 2024-07-04 13:37:16.708 +0800 - Success clear the task execute file: /data/hubs/dolphinscheduler/tmp/exec/process/default/13823066283424/13831054200608_19/164485/311103
[INFO] 2024-07-04 13:37:16.708 +0800 - FINALIZE_SESSION
```

### Zeta or Flink or Spark Version

Zeta

### Java or Scala Version

java version "1.8.0_401"
Java(TM) SE Runtime Environment (build 1.8.0_401-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.401-b10, mixed mode)

### Screenshots

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
