[I] [Feature][Hive/COS] Rename operation takes too long during job execution; is there room for optimization? [seatunnel]

via GitHub Sat, 17 Jan 2026 03:31:37 -0800


qifanlili opened a new issue, #9231:
URL: https://github.com/apache/seatunnel/issues/9231


   ### Search before asking
   
   - [x] I had searched in the [feature](https://github.com/apache/seatunnel/issues?q=is%3Aissue+label%3A%22Feature%22) and found no similar feature requirement.
   
   
   ### Description
   
   
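When a job runs on the Spark engine, once all executors have finished, the Driver performs the rename operation in a single thread, moving the data files from /tmp/seatunnel to the final directory.
This rename runs serially in one thread, so it takes a very long time when there are many files. This is especially noticeable with object storage such as COS, which appears to follow the same logic.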
   
   <img width="888" alt="Image" src="https://github.com/user-attachments/assets/2acd82dc-67d8-482b-a55d-66570123207c" />
   
   <img width="897" alt="Image" src="https://github.com/user-attachments/assets/82ac4a7a-d349-410a-93f7-7220ec8523e7" />
   
   <img width="956" alt="Image" src="https://github.com/user-attachments/assets/1266c838-a6a1-4a9e-b8d9-c56b6de57c36" />
   
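   I have a few questions and suggestions:
   1. Why was a single-threaded, serial approach chosen in the design? Was it meant to avoid a particular risk?
   2. If this is to be optimized, could it follow Alibaba's jindo oss commit and be implemented via Multipart Upload? Or is there a more reasonable approach you would recommend?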
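
To make the suggestion concrete, here is a minimal sketch of what a parallel rename could look like. It is not SeaTunnel's code: it uses Python's standard library and the local filesystem as a stand-in for HDFS or COS, and the names `parallel_rename`, `demo`, and the `part-*.parquet` file names are hypothetical. The idea is only that independent per-file renames can be submitted to a thread pool, while failures are collected so the caller can still decide whether to retry or abort the commit.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def parallel_rename(moves, max_workers=8):
    """Rename each (src, dst) pair on a thread pool.

    Returns a list of (src, dst, exception) for every move that failed,
    so the caller can decide whether to retry or abort the commit.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its (src, dst) pair for error reporting.
        futures = {pool.submit(os.replace, src, dst): (src, dst) for src, dst in moves}
        for future in as_completed(futures):
            src, dst = futures[future]
            try:
                future.result()
            except OSError as exc:
                failures.append((src, dst, exc))
    return failures


def demo(file_count=20):
    """Create files in a staging directory and move them to a final directory."""
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")  # stand-in for /tmp/seatunnel
        final = os.path.join(root, "final")      # stand-in for the target directory
        os.makedirs(staging)
        os.makedirs(final)
        moves = []
        for i in range(file_count):
            name = f"part-{i:05d}.parquet"  # hypothetical file name
            src = os.path.join(staging, name)
            with open(src, "w") as f:
                f.write("data")
            moves.append((src, os.path.join(final, name)))
        failures = parallel_rename(moves)
        return len(os.listdir(final)), len(os.listdir(staging)), failures
```

On a local disk the gain is small, since each rename is cheap. The benefit would be expected mainly where each rename is a remote call with noticeable latency, which is the case described above for object storage. Whether parallel renames are safe for the commit protocol is exactly what question 1 asks, so the sketch reports failures instead of assuming all-or-nothing behavior.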
   
   ### Usage Scenario
   
   _No response_
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   

