YuweiXiao opened a new pull request #4441: URL: https://github.com/apache/hudi/pull/4441
## What is the purpose of the pull request

Restructure the bulk insert partitioner interface so that it also handles the `fileIdPfx` and the write handle factory. With this change, one can implement a new bulk_insert partitioner that routes records to pre-defined fileIds using a customized write factory (e.g., different write factories for different partitions).

## Brief change log

- Modify the interface of `BulkInsertPartitioner`.
- Modify the bulk_insert write path (e.g., `AbstractBulkInsertHelper` and its subclasses) to use the new partitioner interface.
- Leave the Java bulk_insert write path mostly untouched because of its special behavior: it always writes to a single file group (i.e., parallelism=1) and has a customized fileId generator, `FileIdPrefixProvider`.

## Verify this pull request

Added a fileId generation check to existing tests. The other parts are already covered by existing tests, such as `TestBulkInsertInternalPartitioner`.

## Committer checklist

- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
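
To make the intent of the interface change concrete, here is a minimal, self-contained sketch of a partitioner that also decides the fileId prefix and the write factory per output partition. It is not the code in this PR: the `WriteHandleFactory` type below is a simplified stand-in, and the method names `getFileIdPfx` and `getWriteHandleFactory`, the class `FixedFileIdPartitioner`, and the use of `java.util.Optional` are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class PartitionerSketch {

    /** Hypothetical stand-in for a write handle factory; not Hudi's actual class. */
    public interface WriteHandleFactory {
        String name();
    }

    /** Assumed shape of a partitioner that also owns fileId prefixes and write factories. */
    public interface BulkInsertPartitioner<I> {
        I repartitionRecords(I records, int outputPartitions);

        boolean arePartitionRecordsSorted();

        /** FileId prefix for the given output partition (hypothetical method name). */
        String getFileIdPfx(int partitionId);

        /** Optional per-partition write factory; empty means "use the default one". */
        default Optional<WriteHandleFactory> getWriteHandleFactory(int partitionId) {
            return Optional.empty();
        }
    }

    /** Routes each output partition to a pre-defined fileId prefix. */
    public static class FixedFileIdPartitioner implements BulkInsertPartitioner<List<String>> {
        private final List<String> fileIdPrefixes;

        public FixedFileIdPartitioner(List<String> fileIdPrefixes) {
            this.fileIdPrefixes = new ArrayList<>(fileIdPrefixes);
        }

        @Override
        public List<String> repartitionRecords(List<String> records, int outputPartitions) {
            return records; // no reshuffling in this sketch
        }

        @Override
        public boolean arePartitionRecordsSorted() {
            return false;
        }

        @Override
        public String getFileIdPfx(int partitionId) {
            return fileIdPrefixes.get(partitionId);
        }

        @Override
        public Optional<WriteHandleFactory> getWriteHandleFactory(int partitionId) {
            // Only partition 0 gets a customized factory; the rest use the default.
            if (partitionId == 0) {
                WriteHandleFactory custom = () -> "custom-factory";
                return Optional.of(custom);
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        FixedFileIdPartitioner partitioner =
            new FixedFileIdPartitioner(Arrays.asList("file-a", "file-b"));
        System.out.println(partitioner.getFileIdPfx(1));
        System.out.println(partitioner.getWriteHandleFactory(0)
            .map(WriteHandleFactory::name)
            .orElse("default"));
    }
}
```

In this sketch the write path would ask the partitioner, per output partition, which fileId prefix and which factory to use, instead of generating a prefix itself. The default method keeps existing partitioners source-compatible under that assumption.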