friendlymatthew opened a new pull request, #7878: URL: https://github.com/apache/arrow-rs/pull/7878
This PR builds on https://github.com/apache/arrow-rs/pull/7871. Please review from the second commit.

# Rationale for this change

This PR contains algorithmic modifications to the validation logic and the associated benchmarks, specifically targeting complex object and list validation.

Previously, validation iterated over each element, repeatedly fetched the same slice of the backing buffer, and then sliced _into_ that buffer again for each individual element. This led to redundant buffer access.

The new validation approach runs in multiple passes that take advantage of the variant's memory layout. For example, dictionary field names are stored contiguously; instead of validating separately that each field name slice is valid UTF-8, we now validate the entire field name buffer in a single pass.

The benchmark cases were adapted from the `test_json_to_variant_object_very_large`, `test_json_to_variant_object_complex`, and `test_json_to_variant_array_nested_large` test cases.

Compared to #7871, we observe a significant improvement in performance:

<img width="576" alt="Screenshot 2025-07-07 at 10 25 07 AM" src="https://github.com/user-attachments/assets/b8644466-8259-4081-892b-c18f9f64b9f3" />
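
To illustrate the single-pass idea for field names, here is a minimal sketch. It is not the code from this PR. The function names `validate_names_per_element` and `validate_names_single_pass` are hypothetical, and it assumes the field names sit back to back in one byte buffer, with an offsets array whose first entry is 0 and whose last entry is the buffer length.

```rust
/// Per-element approach: slice the buffer once per field name and run a
/// UTF-8 check on every slice separately.
/// (Hypothetical helper, not part of the arrow-rs API.)
fn validate_names_per_element(bytes: &[u8], offsets: &[usize]) -> bool {
    offsets.windows(2).all(|w| {
        // `get` returns None for reversed or out-of-bounds ranges.
        bytes
            .get(w[0]..w[1])
            .map_or(false, |name| std::str::from_utf8(name).is_ok())
    })
}

/// Single-pass approach: run one UTF-8 check over the whole contiguous
/// buffer, then confirm every offset falls on a character boundary.
/// Assumes the offsets cover the entire buffer.
/// (Hypothetical helper, not part of the arrow-rs API.)
fn validate_names_single_pass(bytes: &[u8], offsets: &[usize]) -> bool {
    let Ok(names) = std::str::from_utf8(bytes) else {
        return false;
    };
    // Offsets must be non-decreasing, and none may split a multi-byte
    // character. `is_char_boundary` also rejects offsets past the end.
    offsets.windows(2).all(|w| w[0] <= w[1])
        && offsets.iter().all(|&o| names.is_char_boundary(o))
}
```

Under the stated assumption, both functions accept and reject the same inputs: a buffer is valid UTF-8 with every offset on a character boundary exactly when each individual slice is valid UTF-8. The single-pass version replaces many small checks with one scan of the buffer plus a cheap check per offset.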