tomma-a commented on issue #10193:
URL: https://github.com/apache/seatunnel/issues/10193#issuecomment-3695580038

   > > > [@tomma-a](https://github.com/tomma-a) [@yzeng1618](https://github.com/yzeng1618) hi, will it cause the checkpoint restore to fail, or will it just prevent the job id from being obtained?
   > > 
   > > 
   > > Hello [@Carl-Zhou-CN](https://github.com/Carl-Zhou-CN) , 
[@yzeng1618](https://github.com/yzeng1618)
   > > From my test, the job could not be restored from a Flink checkpoint/savepoint; Flink is stuck in a loop of trying to restore and failing.
   > > For some jobs (it depends on the specific job), if Flink's jobmanager.execution.failover-strategy is set to "Restart pipelined region", the job might start running again, but I think it loses its internal state rather than restoring from the last run's state.
   > > Really appreciate all your help here. SeaTunnel is an awesome tool/framework, and I really appreciate all your contributions!
   > > Tom
   > 
   > hi, [@tomma-a](https://github.com/tomma-a) could you provide more logs? This doesn't seem to be the root cause.
   
   
   
   Hey @Carl-Zhou-CN ,
   
   I think that even though it is only a 'WARN' message, the problem can prevent the Flink job (SeaTunnel job) from running normally, since the task state cannot be restored.
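   Looking at the stack trace in the log below, the `IllegalStateException: Initialize flink job-id failed` is caused by an NPE inside `FlinkSourceSplitEnumeratorContext.getJobIdForV15` ("Cannot invoke Object.getClass() because obj is null"), i.e. a reflective lookup dereferences an object that is still null when the enumerator context is constructed during savepoint restore. A minimal sketch of that failure pattern (all class and field names below are hypothetical illustrations, not SeaTunnel's actual internals):

   ```java
   import java.lang.reflect.Field;

   // Minimal reproduction of the failure shape in the stack trace: a reflective
   // lookup that dereferences a null intermediate object during restore.
   public class JobIdLookupSketch {

       // Stand-in for the enumerator context created while restoring from a
       // savepoint; the nested handle may not be populated yet at that point.
       static class EnumeratorContext {
           Object coordinatorHandle; // still null during restore
       }

       static String getJobId(EnumeratorContext ctx) {
           try {
               Object obj = ctx.coordinatorHandle;
               // Same shape as the failing line: obj.getClass() throws an NPE
               // when obj is null.
               Field f = obj.getClass().getDeclaredField("jobId");
               f.setAccessible(true);
               return String.valueOf(f.get(obj));
           } catch (Exception e) {
               // Wrapped the same way the log shows:
               // IllegalStateException caused by NullPointerException.
               throw new IllegalStateException("Initialize flink job-id failed", e);
           }
       }

       public static void main(String[] args) {
           try {
               getJobId(new EnumeratorContext());
           } catch (IllegalStateException e) {
               System.out.println(e.getMessage() + " caused by " + e.getCause());
           }
       }
   }
   ```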
   
   
   SeaTunnel job config:

   ```json
   {
       "env" : {
           "parallelism" : 3,
           "job.mode" : "STREAMING",
           "checkpoint.interval" : 60000,
           "flink.execution.checkpointing.mode" : "EXACTLY_ONCE",
           "flink.pipeline.max-parallelism" : 64,
           "flink.execution.checkpointing.timeout" : 600000
       },
       "source" : [
           {
               "plugin_output" : "fake2",
               "topic" : "info",
               "consumer.group" : "testr",
               "bootstrap.servers" : "tom-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092",
               "commit_on_checkpoint" : false,
               "format" : "json",
               "plugin_name" : "Kafka"
           }
       ],
       "sink" : [
           {
               "plugin_input" : "fake2",
               "topic" : "topic",
               "bootstrap.servers" : "tom-cluster1-kafka-bootstrap.kafka.svc.cluster.local:9092",
               "format" : "json",
               "kafka.request.timeout.ms" : 60000,
               "semantics" : "EXACTLY_ONCE",
               "plugin_name" : "Kafka"
           }
       ]
   }
   ```

   Flink execution plan from the log:

   ```
   2025-12-29 06:07:17,485 INFO  org.apache.seatunnel.core.starter.flink.execution.FlinkExecution [] - Flink Execution Plan: {
     "nodes" : [ {
       "id" : 1,
       "type" : "Source: Kafka-Source",
       "pact" : "Data Source",
       "contents" : "Source: Kafka-Source",
       "parallelism" : 3
     }, {
       "id" : 3,
       "type" : "Kafka-Sink: Writer",
       "pact" : "Operator",
       "contents" : "Kafka-Sink: Writer",
       "parallelism" : 3,
       "predecessors" : [ {
         "id" : 1,
         "ship_strategy" : "FORWARD",
         "side" : "second"
       } ]
     }, {
       "id" : 5,
       "type" : "Kafka-Sink: Committer",
       "pact" : "Operator",
       "contents" : "Kafka-Sink: Committer",
       "parallelism" : 3,
       "predecessors" : [ {
         "id" : 3,
         "ship_strategy" : "FORWARD",
         "side" : "second"
       } ]
     } ]
   }
   ```
   
   After changing the Flink job YAML (I used Flink upgradeMode: savepoint) and re-applying it in k8s, the Flink job is now stuck in a loop of: created, initialized, failed. It never reaches RUNNING or succeeds!
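   For reference, the re-apply used the flink-kubernetes-operator's savepoint upgrade mode. A sketch of the relevant FlinkDeployment fragment, assuming the standard operator CRD (everything here other than upgradeMode is illustrative of my setup):

   ```yaml
   # Relevant fragment of the FlinkDeployment spec (flink-kubernetes-operator).
   # With upgradeMode: savepoint, a re-apply suspends the job with a savepoint
   # and resubmits it from that savepoint -- the restore path that fails here.
   spec:
     job:
       upgradeMode: savepoint   # alternatives: last-state, stateless
   ```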
   
   
   Here is the full log:
   
   2025-12-29 06:07:31,116 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Reset the 
checkpoint ID of job bd6a33ade95d6a15ffaabc25c2fca8c2 to 5.
   2025-12-29 06:07:31,116 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Restoring job 
bd6a33ade95d6a15ffaabc25c2fca8c2 from Savepoint 4 @ 0 for 
bd6a33ade95d6a15ffaabc25c2fca8c2 located at 
oss://xxxxxxxxxxxxxxxxxxxx/testflink/tom/savepoints/savepoint-110c86-77fa991c785d.
   2025-12-29 06:07:31,193 INFO  
org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - No master 
state to restore
   2025-12-29 06:07:31,195 INFO  
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator
 [] - Resetting coordinator to checkpoint.
   2025-12-29 06:07:31,199 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Closing 
SourceCoordinator for source Source: Kafka-Source.
   2025-12-29 06:07:31,199 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Source 
coordinator for source Source: Kafka-Source closed.
   2025-12-29 06:07:31,203 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Restoring 
SplitEnumerator of source Source: Kafka-Source from checkpoint.
   2025-12-29 06:07:31,218 WARN  
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext 
[] - Get flink job id failed
   java.lang.IllegalStateException: Initialize flink job-id failed
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.getJobIdForV15(FlinkSourceSplitEnumeratorContext.java:152)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.getFlinkJobId(FlinkSourceSplitEnumeratorContext.java:100)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.<init>(FlinkSourceSplitEnumeratorContext.java:57)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSource.restoreEnumerator(FlinkSource.java:116)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSource.restoreEnumerator(FlinkSource.java:48)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinator.resetToCheckpoint(SourceCoordinator.java:448)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.resetAndStart(RecreateOnResetOperatorCoordinator.java:412)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.lambda$resetToCheckpoint$7(RecreateOnResetOperatorCoordinator.java:156)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown 
Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Unknown 
Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.whenComplete(Unknown Source) 
~[?:?]
        at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.resetToCheckpoint(RecreateOnResetOperatorCoordinator.java:143)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.resetToCheckpoint(OperatorCoordinatorHolder.java:284)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreStateToCoordinators(CheckpointCoordinator.java:2044)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedStateInternal(CheckpointCoordinator.java:1758)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1872)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.tryRestoreExecutionGraphFromSavepoint(DefaultExecutionGraphFactory.java:224)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:199)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:371)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:214) 
~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:140)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:156)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:122)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:379)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:356) 
~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown 
Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
   Caused by: java.lang.NullPointerException: Cannot invoke "Object.getClass()" 
because "obj" is null
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.getJobIdForV15(FlinkSourceSplitEnumeratorContext.java:142)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        ... 31 more
   2025-12-29 06:07:31,309 INFO  
org.apache.kafka.clients.admin.AdminClientConfig             [] - 
AdminClientConfig values: 
        bootstrap.servers = 
[tom-cluster-kafka-bootstrap.kafka.svc.cluster.local:9092]
        client.dns.lookup = use_all_dns_ips
        client.id = seatunnel-enumerator-admin-client-1022909018
        connections.max.idle.ms = 300000
        default.api.timeout.ms = 60000
        metadata.max.age.ms = 300000
        metric.reporters = []
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        receive.buffer.bytes = 65536
        reconnect.backoff.max.ms = 1000
        reconnect.backoff.ms = 50
        request.timeout.ms = 30000
        retries = 2147483647
        retry.backoff.ms = 100
        sasl.client.callback.handler.class = null
        sasl.jaas.config = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.login.callback.handler.class = null
        sasl.login.class = null
        sasl.login.connect.timeout.ms = null
        sasl.login.read.timeout.ms = null
        sasl.login.refresh.buffer.seconds = 300
        sasl.login.refresh.min.period.seconds = 60
        sasl.login.refresh.window.factor = 0.8
        sasl.login.refresh.window.jitter = 0.05
        sasl.login.retry.backoff.max.ms = 10000
        sasl.login.retry.backoff.ms = 100
        sasl.mechanism = GSSAPI
        sasl.oauthbearer.clock.skew.seconds = 30
        sasl.oauthbearer.expected.audience = null
        sasl.oauthbearer.expected.issuer = null
        sasl.oauthbearer.jwks.endpoint.refresh.ms = 3600000
        sasl.oauthbearer.jwks.endpoint.retry.backoff.max.ms = 10000
        sasl.oauthbearer.jwks.endpoint.retry.backoff.ms = 100
        sasl.oauthbearer.jwks.endpoint.url = null
        sasl.oauthbearer.scope.claim.name = scope
        sasl.oauthbearer.sub.claim.name = sub
        sasl.oauthbearer.token.endpoint.url = null
        security.protocol = PLAINTEXT
        security.providers = null
        send.buffer.bytes = 131072
        socket.connection.setup.timeout.max.ms = 30000
        socket.connection.setup.timeout.ms = 10000
        ssl.cipher.suites = null
        ssl.enabled.protocols = [TLSv1.2, TLSv1.3]
        ssl.endpoint.identification.algorithm = https
        ssl.engine.factory.class = null
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.certificate.chain = null
        ssl.keystore.key = null
        ssl.keystore.location = null
        ssl.keystore.password = null
        ssl.keystore.type = JKS
        ssl.protocol = TLSv1.3
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.certificates = null
        ssl.truststore.location = null
        ssl.truststore.password = null
        ssl.truststore.type = JKS
   
   2025-12-29 06:07:31,493 INFO  org.apache.kafka.common.utils.AppInfoParser    
              [] - Kafka version: 3.2.0
   2025-12-29 06:07:31,493 INFO  org.apache.kafka.common.utils.AppInfoParser    
              [] - Kafka commitId: 38103ffaa962ef50
   2025-12-29 06:07:31,493 INFO  org.apache.kafka.common.utils.AppInfoParser    
              [] - Kafka startTimeMs: 1766988451491
   2025-12-29 06:07:31,494 INFO  
org.apache.seatunnel.connectors.seatunnel.kafka.source.KafkaSourceSplitEnumerator
 [] - Task is being restored, forcing start mode to GROUP_OFFSETS for all topics
   2025-12-29 06:07:31,496 WARN  
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext 
[] - Get flink job id failed
   java.lang.IllegalStateException: Initialize flink job-id failed
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.getJobIdForV15(FlinkSourceSplitEnumeratorContext.java:152)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.getFlinkJobId(FlinkSourceSplitEnumeratorContext.java:100)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.<init>(FlinkSourceSplitEnumeratorContext.java:57)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceEnumerator.<init>(FlinkSourceEnumerator.java:69)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSource.restoreEnumerator(FlinkSource.java:120)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.seatunnel.translation.flink.source.FlinkSource.restoreEnumerator(FlinkSource.java:48)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        at 
org.apache.flink.runtime.source.coordinator.SourceCoordinator.resetToCheckpoint(SourceCoordinator.java:448)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.resetAndStart(RecreateOnResetOperatorCoordinator.java:412)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.lambda$resetToCheckpoint$7(RecreateOnResetOperatorCoordinator.java:156)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown 
Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.uniWhenCompleteStage(Unknown 
Source) ~[?:?]
        at java.util.concurrent.CompletableFuture.whenComplete(Unknown Source) 
~[?:?]
        at 
org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.resetToCheckpoint(RecreateOnResetOperatorCoordinator.java:143)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder.resetToCheckpoint(OperatorCoordinatorHolder.java:284)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreStateToCoordinators(CheckpointCoordinator.java:2044)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedStateInternal(CheckpointCoordinator.java:1758)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1872)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.tryRestoreExecutionGraphFromSavepoint(DefaultExecutionGraphFactory.java:224)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultExecutionGraphFactory.createAndRestoreExecutionGraph(DefaultExecutionGraphFactory.java:199)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.SchedulerBase.createAndRestoreExecutionGraph(SchedulerBase.java:371)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.SchedulerBase.<init>(SchedulerBase.java:214) 
~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultScheduler.<init>(DefaultScheduler.java:140)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.scheduler.DefaultSchedulerFactory.createInstance(DefaultSchedulerFactory.java:156)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.DefaultSlotPoolServiceSchedulerFactory.createScheduler(DefaultSlotPoolServiceSchedulerFactory.java:122)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.JobMaster.createScheduler(JobMaster.java:379)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.JobMaster.<init>(JobMaster.java:356) 
~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.internalCreateJobMasterService(DefaultJobMasterServiceFactory.java:128)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.runtime.jobmaster.factories.DefaultJobMasterServiceFactory.lambda$createJobMasterService$0(DefaultJobMasterServiceFactory.java:100)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at 
org.apache.flink.util.function.FunctionUtils.lambda$uncheckedSupplier$4(FunctionUtils.java:112)
 ~[flink-dist-1.18.0.jar:1.18.0]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown 
Source) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
   Caused by: java.lang.NullPointerException: Cannot invoke "Object.getClass()" 
because "obj" is null
        at 
org.apache.seatunnel.translation.flink.source.FlinkSourceSplitEnumeratorContext.getJobIdForV15(FlinkSourceSplitEnumeratorContext.java:142)
 
~[blob_p-ce35d9ba37fc821b91a3c1462ad9474638da52bc-02ed4dda7fda8af13c106a81eef8194f:2.3.12]
        ... 32 more
   2025-12-29 06:07:31,573 INFO  org.apache.flink.runtime.jobmaster.JobMaster   
              [] - Using failover strategy 
org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@699c93b4
 for SeaTunnel (bd6a33ade95d6a15ffaabc25c2fca8c2).
   2025-12-29 06:07:31,589 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Starting DefaultLeaderRetrievalService with 
KubernetesLeaderRetrievalDriver{configMapName='seatunnel-flink-streaming-example-3-cluster-config-map'}.
   2025-12-29 06:07:31,589 INFO  
org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer
 [] - Starting to watch for 
kafka/seatunnel-flink-streaming-example-3-cluster-config-map, watching 
id:715188fd-67ff-4c81-8dc5-2c8d5c2c0f99
   2025-12-29 06:07:31,589 INFO  org.apache.flink.runtime.jobmaster.JobMaster   
              [] - Starting execution of job 'SeaTunnel' 
(bd6a33ade95d6a15ffaabc25c2fca8c2) under job master id 
b51439a7fbdc7b39047daed657f64893.
   2025-12-29 06:07:31,593 INFO  
org.apache.flink.runtime.source.coordinator.SourceCoordinator [] - Starting 
split enumerator for source Source: Kafka-Source.
   2025-12-29 06:07:31,595 INFO  org.apache.flink.runtime.jobmaster.JobMaster   
              [] - Starting scheduling with scheduling strategy 
[org.apache.flink.runtime.scheduler.strategy.PipelinedRegionSchedulingStrategy]
   2025-12-29 06:07:31,595 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Job SeaTunnel 
(bd6a33ade95d6a15ffaabc25c2fca8c2) switched from state CREATED to RUNNING.
   2025-12-29 06:07:31,653 INFO  
org.apache.seatunnel.api.event.LoggingEventHandler           [] - log event: 
EnumeratorOpenEvent(createdTime=1766988451653, jobId=null, 
eventType=LIFECYCLE_ENUMERATOR_OPEN)
   2025-12-29 06:07:31,671 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (1/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_0_0) 
switched from CREATED to SCHEDULED.
   2025-12-29 06:07:31,690 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (2/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_1_0) 
switched from CREATED to SCHEDULED.
   2025-12-29 06:07:31,691 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (3/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_2_0) 
switched from CREATED to SCHEDULED.
   2025-12-29 06:07:31,692 INFO  org.apache.flink.runtime.jobmaster.JobMaster   
              [] - Connecting to ResourceManager 
pekko.tcp://[email protected]:6123/user/rpc/resourcemanager_0(b51439a7fbdc7b39047daed657f64893)
   2025-12-29 06:07:31,772 INFO  org.apache.flink.runtime.jobmaster.JobMaster   
              [] - Resolved ResourceManager address, beginning registration
   2025-12-29 06:07:31,774 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Starting DefaultLeaderRetrievalService with 
KubernetesLeaderRetrievalDriver{configMapName='seatunnel-flink-streaming-example-3-cluster-config-map'}.
   2025-12-29 06:07:31,774 INFO  
org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer
 [] - Starting to watch for 
kafka/seatunnel-flink-streaming-example-3-cluster-config-map, watching 
id:8f4ba652-bba7-469f-a0ce-dbb1973afa84
   2025-12-29 06:07:31,774 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Registering job manager 
[email protected]://[email protected]:6123/user/rpc/jobmanager_2
 for job bd6a33ade95d6a15ffaabc25c2fca8c2.
   2025-12-29 06:07:31,786 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Registered job manager 
[email protected]://[email protected]:6123/user/rpc/jobmanager_2
 for job bd6a33ade95d6a15ffaabc25c2fca8c2.
   2025-12-29 06:07:31,793 INFO  org.apache.flink.runtime.jobmaster.JobMaster   
              [] - JobManager successfully registered at ResourceManager, 
leader id: b51439a7fbdc7b39047daed657f64893.
   2025-12-29 06:07:31,794 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Received resource requirements from job bd6a33ade95d6a15ffaabc25c2fca8c2: 
[ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=3}]
   2025-12-29 06:07:31,872 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Matching resource requirements against available resources.
   Missing resources:
         Job bd6a33ade95d6a15ffaabc25c2fca8c2
                ResourceRequirement{resourceProfile=ResourceProfile{UNKNOWN}, 
numberOfRequiredSlots=3}
   Current resources:
        (none)
   2025-12-29 06:07:31,874 INFO  
org.apache.seatunnel.core.starter.flink.execution.FlinkExecution [] - Job 
finished, execution result: 
   
[!!!org.apache.seatunnel.translation.flink.metric.FlinkJobMetricsSummary@1ceda53f=>org.apache.flink.api.common.InvalidProgramException:Job
 was submitted in detached mode. Results of job execution, such as 
accumulators, runtime, etc. are not available. !!!]
   2025-12-29 06:07:31,971 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - need 
request 1 new workers, current worker number 0, declared worker number 1
   2025-12-29 06:07:31,972 INFO  
org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils [] - The derived 
from fraction jvm overhead memory (92.444mb (96935027 bytes)) is less than its 
min value 192.000mb (201326592 bytes), min value will be used instead
   2025-12-29 06:07:31,972 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Requesting new worker with resource spec WorkerResourceSpec {cpuCores=1.1, 
taskHeapSize=25.600mb (26843542 bytes), taskOffHeapSize=0 bytes, 
networkMemSize=64.000mb (67108864 bytes), managedMemSize=230.400mb (241591914 
bytes), numSlots=8}, current pending count: 1.
   2025-12-29 06:07:31,980 INFO  
org.apache.flink.runtime.externalresource.ExternalResourceUtils [] - Enabled 
external resources: []
   2025-12-29 06:07:32,301 INFO  org.apache.flink.configuration.Configuration   
              [] - Config uses fallback configuration key 
'kubernetes.service-account' instead of key 
'kubernetes.taskmanager.service-account'
   2025-12-29 06:07:32,307 INFO  
org.apache.flink.kubernetes.utils.KubernetesUtils            [] - The main 
container image pull policy configured in pod template will be overwritten to 
'Always' because of explicitly configured options.
   2025-12-29 06:07:32,376 INFO  
org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Creating new 
TaskManager pod with name seatunnel-flink-streaming-example-3-taskmanager-1-1 
and resource <1024,1.1>.
   2025-12-29 06:07:33,338 INFO  
org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Pod 
seatunnel-flink-streaming-example-3-taskmanager-1-1 is created.
   2025-12-29 06:07:33,382 INFO  
org.apache.flink.kubernetes.KubernetesResourceManagerDriver  [] - Received new 
TaskManager pod: seatunnel-flink-streaming-example-3-taskmanager-1-1
   2025-12-29 06:07:33,383 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Requested worker seatunnel-flink-streaming-example-3-taskmanager-1-1 with 
resource spec WorkerResourceSpec {cpuCores=1.1, taskHeapSize=25.600mb (26843542 
bytes), taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), 
managedMemSize=230.400mb (241591914 bytes), numSlots=8}.
   2025-12-29 06:07:49,176 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:08:04,588 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Registering TaskManager with ResourceID 
seatunnel-flink-streaming-example-3-taskmanager-1-1 
(pekko.tcp://[email protected]:6122/user/rpc/taskmanager_0) at 
ResourceManager
   2025-12-29 06:08:04,682 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.FineGrainedSlotManager [] 
- Registering task executor seatunnel-flink-streaming-example-3-taskmanager-1-1 
under c0a7bcab446f42bcb2d1b76cc5d46e7f at the slot manager.
   2025-12-29 06:08:04,684 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer [] 
- Starting allocation of slot fc9290907d17d74942cf33818525ac0b from 
seatunnel-flink-streaming-example-3-taskmanager-1-1 for job 
bd6a33ade95d6a15ffaabc25c2fca8c2 with resource profile 
ResourceProfile{cpuCores=0.1375, taskHeapMemory=3.200mb (3355442 bytes), 
taskOffHeapMemory=0 bytes, managedMemory=28.800mb (30198989 bytes), 
networkMemory=8.000mb (8388608 bytes)}.
   2025-12-29 06:08:04,689 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer [] 
- Starting allocation of slot f461814e1852bddfcbc59b52742796d7 from 
seatunnel-flink-streaming-example-3-taskmanager-1-1 for job 
bd6a33ade95d6a15ffaabc25c2fca8c2 with resource profile 
ResourceProfile{cpuCores=0.1375, taskHeapMemory=3.200mb (3355442 bytes), 
taskOffHeapMemory=0 bytes, managedMemory=28.800mb (30198989 bytes), 
networkMemory=8.000mb (8388608 bytes)}.
   2025-12-29 06:08:04,690 INFO  
org.apache.flink.runtime.resourcemanager.slotmanager.DefaultSlotStatusSyncer [] 
- Starting allocation of slot 4e34c9a12c7cadb6b829d3ba71dd0c2e from 
seatunnel-flink-streaming-example-3-taskmanager-1-1 for job 
bd6a33ade95d6a15ffaabc25c2fca8c2 with resource profile 
ResourceProfile{cpuCores=0.1375, taskHeapMemory=3.200mb (3355442 bytes), 
taskOffHeapMemory=0 bytes, managedMemory=28.800mb (30198989 bytes), 
networkMemory=8.000mb (8388608 bytes)}.
   2025-12-29 06:08:04,691 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Worker seatunnel-flink-streaming-example-3-taskmanager-1-1 is registered.
   2025-12-29 06:08:04,691 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Worker seatunnel-flink-streaming-example-3-taskmanager-1-1 with resource spec 
WorkerResourceSpec {cpuCores=1.1, taskHeapSize=25.600mb (26843542 bytes), 
taskOffHeapSize=0 bytes, networkMemSize=64.000mb (67108864 bytes), 
managedMemSize=230.400mb (241591914 bytes), numSlots=8} was requested in 
current attempt. Current pending count after registering: 0.
   2025-12-29 06:08:04,986 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (1/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_0_0) 
switched from SCHEDULED to DEPLOYING.
   2025-12-29 06:08:04,987 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying 
Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (1/3) 
(attempt #0) with attempt id 
d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_0_0 and 
vertex id cbc357ccb763df2852fee8c4fc7d55f2_0 to 
seatunnel-flink-streaming-example-3-taskmanager-1-1 @ 10.244.146.216 
(dataPort=41061) with allocation id fc9290907d17d74942cf33818525ac0b
   2025-12-29 06:08:05,008 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (2/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_1_0) 
switched from SCHEDULED to DEPLOYING.
   2025-12-29 06:08:05,008 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying 
Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (2/3) 
(attempt #0) with attempt id 
d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_1_0 and 
vertex id cbc357ccb763df2852fee8c4fc7d55f2_1 to 
seatunnel-flink-streaming-example-3-taskmanager-1-1 @ 10.244.146.216 
(dataPort=41061) with allocation id f461814e1852bddfcbc59b52742796d7
   2025-12-29 06:08:05,008 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (3/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_2_0) 
switched from SCHEDULED to DEPLOYING.
   2025-12-29 06:08:05,009 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Deploying 
Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (3/3) 
(attempt #0) with attempt id 
d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_2_0 and 
vertex id cbc357ccb763df2852fee8c4fc7d55f2_2 to 
seatunnel-flink-streaming-example-3-taskmanager-1-1 @ 10.244.146.216 
(dataPort=41061) with allocation id 4e34c9a12c7cadb6b829d3ba71dd0c2e
   2025-12-29 06:08:20,478 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (3/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_2_0) 
switched from DEPLOYING to INITIALIZING.
   2025-12-29 06:08:20,483 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (1/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_0_0) 
switched from DEPLOYING to INITIALIZING.
   2025-12-29 06:08:20,484 INFO  
org.apache.flink.runtime.executiongraph.ExecutionGraph       [] - Source: 
Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: Committer (2/3) 
(d5762617aee8a970df5d8d9bedb25c17_cbc357ccb763df2852fee8c4fc7d55f2_1_0) 
switched from DEPLOYING to INITIALIZING.
   2025-12-29 06:08:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:09:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:10:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:11:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:12:33,694 INFO  org.apache.kafka.clients.NetworkClient         
              [] - [AdminClient 
clientId=seatunnel-enumerator-admin-client-1022909018] Node -1 disconnected.
   2025-12-29 06:12:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:13:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:14:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:15:49,170 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   2025-12-29 06:16:49,169 INFO  
org.apache.flink.runtime.checkpoint.CheckpointFailureManager [] - Failed to 
trigger checkpoint for job bd6a33ade95d6a15ffaabc25c2fca8c2 since Checkpoint 
triggering task Source: Kafka-Source -> Kafka-Sink: Writer -> Kafka-Sink: 
Committer (1/3) of job bd6a33ade95d6a15ffaabc25c2fca8c2 is not being executed 
at the moment. Aborting checkpoint. Failure reason: Not all required tasks are 
currently running..
   `
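For anyone skimming the log above: all three subtasks reach INITIALIZING but never RUNNING, so the CheckpointCoordinator aborts every trigger with "Not all required tasks are currently running". A minimal sketch (a hypothetical helper, not part of SeaTunnel or Flink) that counts these aborts per job id in a jobmanager log, to see how long the job has been stuck:

```python
import re

# A Flink CheckpointFailureManager record of interest looks like:
#   ... Failed to trigger checkpoint for job <32-hex-char id> since ...
#   ... Failure reason: Not all required tasks are currently running.
ABORT_RE = re.compile(r"Failed to trigger checkpoint for job ([0-9a-f]{32})")

def count_checkpoint_aborts(log_text: str) -> dict:
    """Return {job_id: number_of_aborted_checkpoint_triggers} found in the log."""
    counts: dict = {}
    for job_id in ABORT_RE.findall(log_text):
        counts[job_id] = counts.get(job_id, 0) + 1
    return counts
```

Running this over the excerpt above reports nine aborted triggers for job bd6a33ade95d6a15ffaabc25c2fca8c2, one per minute, which matches the configured `checkpoint.interval` of 60000 ms.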

