YuweiXiao opened a new pull request #4441:
URL: https://github.com/apache/hudi/pull/4441


   ## What is the purpose of the pull request
   
   Restructure the bulk insert partitioner interface so that it also handles 
the fileId prefix (`fileIdPfx`) and the write handle factory.
   
   With this change, one can implement a new bulk_insert partitioner that 
routes records to pre-defined fileIds using a customized write handle 
factory (e.g., different write handle factories for different partitions).
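
   As a rough illustration of the idea above, the following is a minimal, 
self-contained sketch and not the code in this PR. The names 
`BulkInsertPartitionerSketch`, `WriteHandleFactorySketch`, and 
`PredefinedFileIdPartitioner` are hypothetical, and records are reduced to 
plain string keys. It assumes the partitioner exposes a fileId prefix and a 
write handle factory per output partition.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a write handle factory; not Hudi's actual class.
interface WriteHandleFactorySketch {
    String name();
}

// Hypothetical shape of a partitioner that also owns the fileId prefix
// and the write handle factory of each output partition.
interface BulkInsertPartitionerSketch<I> {
    I repartitionRecords(I records, int outputPartitions);

    String getFileIdPfx(int partitionId);

    WriteHandleFactorySketch getWriteHandleFactory(int partitionId);
}

// Routes record keys to pre-defined fileId prefixes, one per output partition.
final class PredefinedFileIdPartitioner
        implements BulkInsertPartitionerSketch<List<String>> {

    private final List<String> fileIdPrefixes;

    PredefinedFileIdPartitioner(List<String> fileIdPrefixes) {
        this.fileIdPrefixes = new ArrayList<>(fileIdPrefixes);
    }

    // Assumption: a simple hash of the key picks the target partition.
    static int bucketOf(String recordKey, int numBuckets) {
        return Math.floorMod(recordKey.hashCode(), numBuckets);
    }

    @Override
    public List<String> repartitionRecords(List<String> recordKeys, int outputPartitions) {
        // Group keys by target partition; List.sort is stable, so the
        // original order is preserved within each partition.
        List<String> sorted = new ArrayList<>(recordKeys);
        sorted.sort(Comparator.comparingInt((String k) -> bucketOf(k, outputPartitions)));
        return sorted;
    }

    @Override
    public String getFileIdPfx(int partitionId) {
        // Each output partition writes to its pre-defined fileId prefix.
        return fileIdPrefixes.get(partitionId);
    }

    @Override
    public WriteHandleFactorySketch getWriteHandleFactory(int partitionId) {
        // Illustrative only: a different factory for partition 0 than for the rest.
        if (partitionId == 0) {
            return () -> "create-handle";
        }
        return () -> "append-handle";
    }
}
```

   In this sketch the write path would ask the partitioner, rather than a 
separate generator, which fileId prefix and which factory to use for a given 
output partition.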
   
   ## Brief change log
   
   - Modify the interface of `BulkInsertPartitioner`
   - Modify the bulk_insert write path (e.g., `AbstractBulkInsertHelper` and its 
subclasses) to make use of the new partitioner interface
   - The Java bulk_insert write path is mostly left untouched because of its 
special characteristics: it always writes to a single file group (i.e., 
parallelism=1) and has a customized fileId generator, `FileIdPrefixProvider`.
   
   ## Verify this pull request
   
   Added a fileId generation check to existing tests. The other parts are 
already covered by existing tests, such as `TestBulkInsertInternalPartitioner`.
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit
    
    - [x] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   



