friendlymatthew opened a new pull request, #7878:
URL: https://github.com/apache/arrow-rs/pull/7878

   This PR is based on 
https://github.com/apache/arrow-rs/pull/7871. Please review starting from the 
second commit.
   
   # Rationale for this change
   
   This PR contains algorithmic modifications to the validation logic and the 
associated benchmarks, specifically targeting complex object and list 
validation.
   
   Previously, the approach involved iterating over each element and repeatedly 
fetching the same slice of the backing buffer, then slicing _into_ that buffer 
again for each individual element. This led to redundant buffer access. 
   
   The new validation approach runs in multiple passes that take advantage of 
the variant's memory layout. For example, dictionary field names are stored 
contiguously; instead of validating that each field name slice is valid UTF-8 
separately, we now validate the entire field name buffer in a single pass.
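
   As a rough illustration of the idea (not the code in this PR), the sketch 
below contrasts per-field UTF-8 validation with a single pass over a contiguous 
buffer. The function names, the plain `&[usize]` offset array, and the `bool` 
return type are all assumptions made for the example; the real variant 
metadata layout and error handling differ.

```rust
use std::str;

/// Hypothetical "before": run the UTF-8 validator once per field name.
/// `offsets[i]..offsets[i + 1]` is assumed to delimit field name `i` in `buf`.
fn validate_per_field(buf: &[u8], offsets: &[usize]) -> bool {
    offsets.windows(2).all(|w| {
        // `get` returns None for out-of-bounds or reversed ranges.
        buf.get(w[0]..w[1])
            .map_or(false, |name| str::from_utf8(name).is_ok())
    })
}

/// Hypothetical "after": validate the whole field-name region once, then
/// only check that every offset lands on a character boundary.
fn validate_single_pass(buf: &[u8], offsets: &[usize]) -> bool {
    if offsets.len() < 2 {
        return true; // no field names to check
    }
    let first = offsets[0];
    let last = offsets[offsets.len() - 1];

    // One bounds check and one UTF-8 pass over the contiguous region.
    let Some(region) = buf.get(first..last) else {
        return false;
    };
    let Ok(text) = str::from_utf8(region) else {
        return false;
    };

    // A valid buffer can still be cut mid-character, so each offset pair
    // must form a valid `str` range (ordered, in bounds, on boundaries).
    offsets.windows(2).all(|w| {
        match (w[0].checked_sub(first), w[1].checked_sub(first)) {
            (Some(a), Some(b)) => text.get(a..b).is_some(),
            _ => false,
        }
    })
}
```

   The boundary check in the second function matters: a buffer that is valid 
UTF-8 as a whole does not guarantee that each slice is valid if an offset 
splits a multi-byte character.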
   
   The benchmark cases were adapted from 
`test_json_to_variant_object_very_large`, 
`test_json_to_variant_object_complex`, and 
`test_json_to_variant_array_nested_large` test cases. 
   
   Compared to #7871, we observe a significant improvement in performance:
   
   <img width="576" alt="Screenshot 2025-07-07 at 10 25 07 AM" 
src="https://github.com/user-attachments/assets/b8644466-8259-4081-892b-c18f9f64b9f3"
 />
   
   
   
   

