carpecodeum opened a new pull request, #7946: URL: https://github.com/apache/arrow-rs/pull/7946
# Which issue does this PR close?

This PR implements efficient path-based field extraction and manipulation capabilities for `VariantArray`, enabling direct access to nested fields without expensive unshredding operations.

Follow-up on #7919

- Closes #7941

# Rationale for this change

This work builds directly on the path navigation concepts introduced in https://github.com/apache/arrow-rs/pull/7919, sharing the fundamental `VariantPathElement` design with `Field` and `Index` variants. While https://github.com/apache/arrow-rs/pull/7919 provided a compute kernel approach with a `variant_get` function, this PR provides instance-based methods directly on `VariantArray`, with a builder API that uses owned strings rather than the vector-based approach of https://github.com/apache/arrow-rs/pull/7919.

This is still a draft: the changes for #7919 were merged today, and I still have to incorporate them. I am looking forward to reviews and suggestions.

This PR is complementary to https://github.com/apache/arrow-rs/pull/7921, which implements schema-driven shredding during array construction. This PR provides runtime path-based access to both shredded and unshredded data. Together, the two cover efficient construction and efficient access of variant data.

# What changes are included in this PR?

Field removal methods such as `remove_field` and `remove_fields` remove specific fields from variant data. This matters for shredding operations where temporary or debug fields need to be stripped.

`field_operations.rs` provides direct binary manipulation through functions such as `get_path_bytes`, `extract_field_bytes`, and `remove_field_bytes`, which operate on the raw binary format without constructing intermediate objects.

`variant_parser.rs` supports all variant types, with specialized parsers for 17 different primitive types, and provides the foundation for efficient binary navigation.
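To illustrate the idea of path-based access and field removal, here is a minimal sketch over a toy in-memory value. It is not this PR's implementation: the PR works on the raw variant binary format, whereas this sketch walks already-decoded values. The names `Value`, `PathElement`, `Path`, `get_path`, and the free function `remove_field` are hypothetical and only mirror the `Field`/`Index` path design described above.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a decoded variant value (hypothetical; not the arrow-rs type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
    List(Vec<Value>),
    Object(BTreeMap<String, Value>),
}

/// One step of a path: a named object field or a list index.
#[derive(Debug, Clone, PartialEq)]
enum PathElement {
    Field(String),
    Index(usize),
}

/// Builder-style path that stores owned strings.
#[derive(Debug, Clone, Default, PartialEq)]
struct Path(Vec<PathElement>);

impl Path {
    fn new() -> Self {
        Path(Vec::new())
    }

    /// Append a field step, taking ownership of a copy of `name`.
    fn field(mut self, name: &str) -> Self {
        self.0.push(PathElement::Field(name.to_string()));
        self
    }

    /// Append a list-index step.
    fn index(mut self, i: usize) -> Self {
        self.0.push(PathElement::Index(i));
        self
    }
}

/// Follow `path` from `root`; returns `None` if a step is missing
/// or does not match the kind of value found at that point.
fn get_path<'a>(root: &'a Value, path: &Path) -> Option<&'a Value> {
    let mut cur = root;
    for el in &path.0 {
        cur = match (cur, el) {
            (Value::Object(fields), PathElement::Field(name)) => fields.get(name)?,
            (Value::List(items), PathElement::Index(i)) => items.get(*i)?,
            _ => return None,
        };
    }
    Some(cur)
}

/// Remove a top-level field from an object, returning the removed value.
/// Non-object values are left untouched and yield `None`.
fn remove_field(root: &mut Value, name: &str) -> Option<Value> {
    match root {
        Value::Object(fields) => fields.remove(name),
        _ => None,
    }
}

/// Build `{"debug": 42, "user": {"name": "ada", "tags": [1, 2]}}`.
fn sample() -> Value {
    let mut user = BTreeMap::new();
    user.insert("name".to_string(), Value::Str("ada".to_string()));
    user.insert(
        "tags".to_string(),
        Value::List(vec![Value::Int(1), Value::Int(2)]),
    );
    let mut root = BTreeMap::new();
    root.insert("user".to_string(), Value::Object(user));
    root.insert("debug".to_string(), Value::Int(42));
    Value::Object(root)
}
```

With this sketch, `Path::new().field("user").field("tags").index(1)` resolves to the second tag in `sample()`, and `remove_field(&mut value, "debug")` strips the top-level `debug` field. A byte-level version would perform the same navigation by reading offsets in the encoded buffer instead of matching on decoded values.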
The performance-critical byte operations could serve as the underlying implementation for PR #7919's compute kernel, potentially improving performance for batch operations by avoiding object construction overhead. The field removal capabilities could extend PR #7919's functionality beyond extraction to more general field manipulation. The instance-based approach offers different ergonomics that complement PR #7919's compute kernel approach.

This PR focuses on runtime access and manipulation rather than construction-time optimization, leaving build-time schema-driven shredding to PR #7921. Future work includes integrating with PR #7919's compute kernel, potentially using this PR's byte-level operations as the underlying implementation.

# Are these changes tested?

Yes, tests are added.

# Are there any user-facing changes?

Not yet.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org