devinjdangelo opened a new pull request, #4859:
URL: https://github.com/apache/arrow-rs/pull/4859

   # Which issue does this PR close?
   
   Related to: #1718 
   Enables: <forthcoming>
   
   # Rationale for this change
    
   #4850 enabled external access to `ArrowRowGroupWriter` so downstream users 
could orchestrate serialization of row groups in parallel on threads/tokio 
tasks as desired. This PR goes one level deeper to make `ArrowColumnWriter` and 
associated structs/functions public, so that downstream users can serialize 
columns in parallel. 
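   For illustration only, here is a minimal, std-only sketch of column-level parallelism once each column has its own writer. `ColumnWriter` and `write_columns_in_parallel` are hypothetical stand-ins, not the parquet crate's `ArrowColumnWriter` API. The "encoding" is a toy that appends little-endian bytes. Scoped threads (`std::thread::scope`) let each thread borrow one writer mutably in place.

```rust
use std::thread;

/// Hypothetical per-column writer (NOT the parquet crate's ArrowColumnWriter).
#[derive(Default)]
struct ColumnWriter {
    /// Bytes buffered for this column chunk.
    buffer: Vec<u8>,
}

impl ColumnWriter {
    /// Toy "encoding": append each value as little-endian bytes.
    fn write(&mut self, values: &[i32]) {
        for v in values {
            self.buffer.extend_from_slice(&v.to_le_bytes());
        }
    }
}

/// Encode each column on its own thread, borrowing the writers in place.
/// All threads are joined automatically when the scope ends.
fn write_columns_in_parallel(writers: &mut [ColumnWriter], columns: &[Vec<i32>]) {
    assert_eq!(writers.len(), columns.len(), "one writer per column");
    thread::scope(|s| {
        for (writer, column) in writers.iter_mut().zip(columns) {
            s.spawn(move || writer.write(column));
        }
    });
}

fn main() {
    let columns = vec![vec![1, 2, 3], vec![10, 20]];
    let mut writers: Vec<ColumnWriter> =
        (0..columns.len()).map(|_| ColumnWriter::default()).collect();
    write_columns_in_parallel(&mut writers, &columns);
    for (i, w) in writers.iter().enumerate() {
        println!("column {i}: {} bytes", w.buffer.len());
    }
}
```

   Because these threads only borrow the writers, the pattern does not carry over directly to `std::thread::spawn` or `tokio::spawn`, both of which require `'static` captures.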
   
   This PR also adds some utility methods to break apart and reconstruct 
`ArrowRowGroupWriter`. The idea is to do the following:
   
   1. Initialize `ArrowRowGroupWriter`
   2. Break into component `ArrowColumnWriter`s and distribute to threads/tasks
   3. Serialize columns in parallel
   4. Join column writers back to main thread, reconstruct 
`ArrowRowGroupWriter` and finalize the row group
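   The four steps above can be sketched with owned writers and `std::thread::spawn`. This is a hypothetical, std-only model of the pattern, not the API added by this PR. `RowGroupWriter`, `ColumnWriter`, `into_column_writers`, `from_column_writers` and `close` are illustrative names, and "finalizing" here just concatenates the column buffers after checking that every column saw the same number of rows.

```rust
use std::thread;

/// Hypothetical per-column writer (illustrative only).
#[derive(Default)]
struct ColumnWriter {
    buffer: Vec<u8>,
    rows: usize,
}

impl ColumnWriter {
    /// Toy "encoding": append values as little-endian bytes and count rows.
    fn write(&mut self, values: &[i32]) {
        for v in values {
            self.buffer.extend_from_slice(&v.to_le_bytes());
        }
        self.rows += values.len();
    }
}

/// Hypothetical row group writer owning one writer per column.
struct RowGroupWriter {
    writers: Vec<ColumnWriter>,
}

impl RowGroupWriter {
    /// Step 1: initialize one writer per column.
    fn new(num_columns: usize) -> Self {
        Self { writers: (0..num_columns).map(|_| ColumnWriter::default()).collect() }
    }

    /// Step 2: break apart into owned column writers.
    fn into_column_writers(self) -> Vec<ColumnWriter> {
        self.writers
    }

    /// Step 4a: reconstruct; writers must be supplied in the original column order.
    fn from_column_writers(writers: Vec<ColumnWriter>) -> Self {
        Self { writers }
    }

    /// Step 4b: finalize, rejecting columns with mismatched row counts.
    fn close(self) -> Result<Vec<u8>, String> {
        let rows = self.writers.first().map_or(0, |w| w.rows);
        if self.writers.iter().any(|w| w.rows != rows) {
            return Err("columns have differing row counts".to_string());
        }
        Ok(self.writers.into_iter().flat_map(|w| w.buffer).collect())
    }
}

fn write_row_group_in_parallel(columns: Vec<Vec<i32>>) -> Result<Vec<u8>, String> {
    let row_group = RowGroupWriter::new(columns.len());
    // Steps 2 and 3: move each owned writer (plus its data) into its own thread.
    let handles: Vec<thread::JoinHandle<ColumnWriter>> = row_group
        .into_column_writers()
        .into_iter()
        .zip(columns)
        .map(|(mut writer, column)| {
            thread::spawn(move || {
                writer.write(&column);
                writer // hand the writer back to the main thread
            })
        })
        .collect();
    // Step 4: join in spawn order so the column order is preserved.
    let mut writers = Vec::with_capacity(handles.len());
    for handle in handles {
        writers.push(handle.join().map_err(|_| "column thread panicked".to_string())?);
    }
    RowGroupWriter::from_column_writers(writers).close()
}

fn main() {
    let bytes = write_row_group_in_parallel(vec![vec![1, 2, 3], vec![4, 5, 6]])
        .expect("row counts match");
    println!("row group: {} bytes", bytes.len());
}
```

   Moving owned writers into each thread (rather than borrowing them) is what makes the same shape usable with `'static` spawners such as `tokio::spawn`. Joining the handles in spawn order is the simplest way to keep the reassembled column order aligned with the schema.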
   
   The above strategy is implemented in <forthcoming>
   
   # What changes are included in this PR?
   `ArrowColumnWriter` and associated structs/functions are marked `pub`. 
Additional utility methods are implemented for `ArrowColumnWriter`.
   
   
   # Are there any user-facing changes?
   
   Additional structs and functions are public.
   


-- 
This is an automated message from the Apache Git Service.