[PR] [VARIANT] Path-based Field Extraction for VariantArray [arrow-rs]

via GitHub Wed, 16 Jul 2025 14:51:47 -0700


carpecodeum opened a new pull request, #7946:
URL: https://github.com/apache/arrow-rs/pull/7946


   # Which issue does this PR close?
   
   This PR implements efficient path-based field extraction and manipulation 
capabilities for `VariantArray`, enabling direct access to nested fields 
without expensive unshredding operations.
   Follow-up on #7919 
   
   - Closes #7941 
   
   # Rationale for this change
   
   This work builds directly on the path navigation concepts introduced in 
https://github.com/apache/arrow-rs/pull/7919 and shares its fundamental 
`VariantPathElement` design with `Field` and `Index` variants. Where #7919 
provides a compute kernel approach with a `variant_get` function, this PR 
provides instance-based methods directly on `VariantArray`, with a builder API 
that uses owned strings rather than #7919's vector-based approach.
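   
   To illustrate the idea of a path made of `Field` and `Index` elements and 
built with owned strings, here is a minimal, self-contained sketch. It is not 
the code from this PR or from #7919: `Value`, `PathElement`, `PathBuilder`, 
and `get_path` are hypothetical stand-ins that operate on an in-memory tree 
rather than on a real `VariantArray`.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded variant value (not an arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Str(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// One step in a path: an object field name or a list index.
/// Modeled on the Field/Index split described above; the name is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum PathElement {
    Field(String),
    Index(usize),
}

/// Builder that accumulates owned path elements.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PathBuilder {
    elements: Vec<PathElement>,
}

impl PathBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Append a field step; the name is copied into an owned `String`.
    pub fn field(mut self, name: impl Into<String>) -> Self {
        self.elements.push(PathElement::Field(name.into()));
        self
    }

    /// Append a list-index step.
    pub fn index(mut self, i: usize) -> Self {
        self.elements.push(PathElement::Index(i));
        self
    }

    pub fn build(self) -> Vec<PathElement> {
        self.elements
    }
}

/// Walk `path` from `root`; returns `None` if a step is missing or the
/// step kind does not match the value kind (e.g. a field step on a list).
pub fn get_path<'a>(root: &'a Value, path: &[PathElement]) -> Option<&'a Value> {
    let mut current = root;
    for element in path {
        current = match (current, element) {
            (Value::Object(fields), PathElement::Field(name)) => fields.get(name)?,
            (Value::List(items), PathElement::Index(i)) => items.get(*i)?,
            _ => return None,
        };
    }
    Some(current)
}
```

   With this sketch, 
`PathBuilder::new().field("user").field("tags").index(1).build()` yields a 
path that `get_path` resolves one step at a time, returning `None` as soon as 
a step does not apply.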
   
   This is still a draft: the changes for #7919 were merged today and I still 
have to incorporate them. I am looking forward to reviews and suggestions.
   
   This PR is complementary to https://github.com/apache/arrow-rs/pull/7921, 
which implements schema-driven shredding during array construction. This PR 
provides runtime path-based access to both shredded and unshredded data, 
creating a complete solution for both efficient construction and efficient 
access of variant data.
   
   # What changes are included in this PR?
   
   - Field removal methods such as `remove_field` and `remove_fields` remove 
specific fields from variant data. This is crucial for shredding operations 
where temporary or debug fields need to be stripped.
   - `field_operations.rs` provides direct binary manipulation through 
functions such as `get_path_bytes`, `extract_field_bytes`, and 
`remove_field_bytes`, which operate on the raw binary format without 
constructing intermediate objects.
   - `variant_parser.rs` supports all variant types, with specialized parsers 
for 17 different primitive types, and provides the foundation for efficient 
binary navigation.
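   
   As a rough illustration of operating on raw bytes without building 
intermediate objects, the sketch below extracts and removes a field by 
scanning offsets in a buffer. It uses a made-up toy encoding (repeated 
`[key_len: u8][key][val_len: u8][val]` entries), not the real Variant binary 
format, and `encode_entry`, `toy_extract_field`, and `toy_remove_field` are 
hypothetical names that do not reflect the signatures in `field_operations.rs`.

```rust
/// Append one entry in the toy layout: [key_len: u8][key][val_len: u8][val].
/// Assumes both lengths fit in a u8; this is NOT the Variant binary format.
pub fn encode_entry(out: &mut Vec<u8>, key: &str, value: &[u8]) {
    assert!(key.len() <= u8::MAX as usize && value.len() <= u8::MAX as usize);
    out.push(key.len() as u8);
    out.extend_from_slice(key.as_bytes());
    out.push(value.len() as u8);
    out.extend_from_slice(value);
}

/// Scan the buffer for `key` and return (entry_start, value_start, value_end).
/// Returns `None` if the key is absent or the buffer is truncated.
fn find_entry(buf: &[u8], key: &str) -> Option<(usize, usize, usize)> {
    let mut pos = 0;
    while pos < buf.len() {
        let entry_start = pos;
        let key_len = *buf.get(pos)? as usize;
        let key_start = pos + 1;
        let key_end = key_start + key_len;
        let val_len = *buf.get(key_end)? as usize;
        let val_start = key_end + 1;
        let val_end = val_start + val_len;
        if val_end > buf.len() {
            return None; // truncated value
        }
        if &buf[key_start..key_end] == key.as_bytes() {
            return Some((entry_start, val_start, val_end));
        }
        pos = val_end; // skip to the next entry without decoding the value
    }
    None
}

/// Borrow the value bytes for `key` directly from the input buffer.
pub fn toy_extract_field<'a>(buf: &'a [u8], key: &str) -> Option<&'a [u8]> {
    let (_, start, end) = find_entry(buf, key)?;
    Some(&buf[start..end])
}

/// Return a copy of the buffer with the entry for `key` spliced out.
pub fn toy_remove_field(buf: &[u8], key: &str) -> Option<Vec<u8>> {
    let (entry_start, _, val_end) = find_entry(buf, key)?;
    let mut out = Vec::with_capacity(buf.len() - (val_end - entry_start));
    out.extend_from_slice(&buf[..entry_start]);
    out.extend_from_slice(&buf[val_end..]);
    Some(out)
}
```

   Extraction returns a borrowed slice of the input, and removal copies only 
the bytes before and after the removed entry. The real format also has 
metadata and offset tables to keep consistent, which this sketch ignores.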
   
   The performance-critical byte operations could serve as the underlying 
implementation for PR #7919's compute kernel, potentially providing better 
performance for batch operations by avoiding object construction overhead. The 
field removal capabilities could extend PR #7919's functionality beyond 
extraction to comprehensive field manipulation. The instance-based approach 
provides different ergonomics that complement PR #7919's compute kernel 
approach.
   
   This PR focuses on runtime access and manipulation rather than 
construction-time optimization, leaving build-time schema-driven shredding to 
PR #7921. Future work includes integrating with PR #7919's compute kernel, 
potentially using this PR's byte-level operations as its underlying 
implementation.
   
   # Are these changes tested?
   
   Yes, tests have been added.
   
   # Are there any user-facing changes?
   
   Not yet
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
