[PR] feat: use aligned slice access during bulk append in SparkUnsafeArray [datafusion-comet]

via GitHub Wed, 17 Jun 2026 17:33:10 -0700


sandugood opened a new pull request, #4672:
URL: https://github.com/apache/datafusion-comet/pull/4672


   ## Which issue does this PR close?
   
   Closes #4626 
   
   ## Rationale for this change
   
   Currently, there is a bottleneck in performance during bulk append in the 
`SparkUnsafeArray` implementation (in `macro` and respective `Builder` types 
for `bool`, `date32` and `timestamp`). If the array is `NULLABLE` there is a 
hotspot:
   - we check each corresponding element at the index for nullability using 
`Self::is_null_in_bitset()` which is suboptimal (see the benchmarking results 
below)
   
   ## What changes are included in this PR?
   
   With this PR we change the flow of execution for nullable arrays using 
native arrow methods:
   - appending nulls to the current builder
   - getting `.slices_mut() `from the passed builder to get the reference to 
its content (data and null buffer)
   - checking if we land on byte boundary
   - switching between Spark and Arrow null mask bits. 
   
   All of these changes are also documented in the code as docstrings. This 
part of native code wasn't well documented, so I thought it would be great to 
add a bunch of comments for future contributors. 
   
   ## How are these changes tested?
   
   All of the basic tests pass.
   Here is additional result from benchmarking in array_element_append.rs 
(benches for `NULLABLE` case)
   
   | array_type | main (current) | incoming PR |
   | ---| --- | --- |
   | i32 | 30.030μs | 1.877μs |
   | i64 | 27.572μs | 3.074μs |
   | f64 | 30.059μs | 3.082μs |
   | date32 | 28.693μs | 1.905μs |
   | timestamp | 46.832μs | 3.153μs |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] feat: use aligned slice access during bulk append in SparkUnsafeArray [datafusion-comet]

Reply via email to