Adamyuanyuan opened a new issue, #10138:
URL: https://github.com/apache/seatunnel/issues/10138

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22bug%22)
 and found no similar issues.
   
   
   ### What happened
   
   The batch synchronization task from MySQL to XXX automatically retries after 
the first TM disconnection, but the second run remains in the RUNNING state and 
never finishes naturally; it can only be stopped manually.
    
   JdbcSource bounded job never finishes after TaskManager failover 
(NoMoreSplitsEvent missing for recovered readers)
   
   
   ### SeaTunnel Version
   
   2.3.12
   
   ### SeaTunnel Config
   
   ```conf
   {
       "env" : {
           "execution.parallelism" : 1,
           "job.mode" : "BATCH"
       },
       "source" : [
           {
               "url" : 
"jdbc:mysql://10.xxxx:6301/xxxx?useUnicode=true&characterEncoding=UTF-8&useSSL=false",
               "driver" : "com.mysql.cj.jdbc.Driver",
               "user" : "readonly",
               "password" : "******",
               "query" : "select xxxx FROM bas_battery WHERE 1=1",
               "result_table_name" : "result_table",
               "plugin_name" : "Jdbc"
           }
       ],
       "sink" : [
           {
               "table_name" : "xxxx.xxxx",
               "metastore_uri" : "thrift://xxxx:9083,thrift:/xxxx:9083",
               "hdfs_site_path" : "xxxx/hdfs-site.xml",
               "hive_site_path" : "xxxx/hive-site.xml",
               "krb5_path" : "xxxx/krb5.conf",
               "source_table_name" : "source_table",
               "compress_codec" : "SNAPPY",
               "plugin_name" : "Hive"
           }
       ],
       "transform" : [
           {
               "source_table_name" : "result_table",
               "result_table_name" : "source_table",
               "query" : "select * from result_table",
               "plugin_name" : "Sql"
           }
       ]
   }
   ```
   
   ### Running Command
   
   ```shell
   use flink API
   ```
   
   ### Error Exception
   
   ```log
   1. The job started normally on the TaskManager container 
`container_e13_..._000002` for the first time:
      - `JdbcSourceSplitEnumerator` split the table into 1 split:
        - `Splitting table zzkj.bas_battery.`
        - `Split table zzkj.bas_battery into 1 splits.`
      - Then sent `NoMoreSplitsEvent`:
        - `No more splits to assign. Sending NoMoreSplitsEvent to reader [0].`
   2. After running for about 7 minutes, the heartbeat of this TaskManager 
timed out:
      - `Heartbeat of TaskManager ... timed out.`
      - The corresponding operator changed from `RUNNING` to `FAILED`, and the 
job changed from `RUNNING` to `RESTARTING`.
   3. Flink automatically recovered the job on the new container 
`container_e13_..._000003`:
      - `Recovering subtask 0 to checkpoint -1 for source Source: Jdbc-Source 
to checkpoint.`
      - `Adding splits back from subtask: 0, splits count: 1`
      - `Add back splits 1 to JdbcSourceSplitEnumerator.`
      - The new reader registered and obtained the split:
        - `Register reader 0 to JdbcSourceSplitEnumerator.`
        - `Assigning 1 splits to subtask: 0`
   4. From the second start until it was manually canceled, the following 
**never appeared** in the logs:
      - `No more splits to assign. Sending NoMoreSplitsEvent ...`
      - There were no exception stacks for any operators either.
   5. It was finally ended by an external cancel:
      - Job status: `RUNNING` → `CANCELLING` → `CANCELED`.
      - Final YARN status: `KILLED`, with the diagnosis being `null`.
      - SeaTunnel reported an error: `Reason:Flink job executed failed` 
(because the Flink Job was CANCELED).
   ```
   
   ### Zeta or Flink or Spark Version
   
   _No response_
   
   ### Java or Scala Version
   
   _No response_
   
   ### Screenshots
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to