Tushar7012 opened a new pull request, #19975: URL: https://github.com/apache/datafusion/pull/19975
- Use values.views() instead of values.iter() for direct u128 access - Use is_valid(i) for efficient null checking via validity bitmap - Avoid dereferencing overhead for inline strings - No additional memory overhead in Entry struct Closes #19961 ## Which issue does this PR close? Closes #19961 ## Rationale for this change The [ArrowBytesViewMap](cci:2://file:///d:/Agentic_AI/Gssoc_Apache/datafusion/datafusion/physical-expr-common/src/binary_view_map.rs:115:0-136:1) was using `values.iter()` which creates unnecessary `Option` wrappers and extra overhead when iterating over byte view arrays. For ClickBench query 5, >50% CPU was spent during the [intern](cci:1://file:///d:/Agentic_AI/Gssoc_Apache/datafusion/datafusion/physical-plan/src/aggregates/group_values/single_group_by/bytes_view.rs:45:4-74:5) operation. This PR optimizes the hot path in [insert_if_new_inner](cci:1://file:///d:/Agentic_AI/Gssoc_Apache/datafusion/datafusion/physical-expr-common/src/binary_view_map.rs:222:4-317:5) by using direct view access methods that avoid the iteration overhead. ## What changes are included in this PR? - **Replace `values.iter()` with `values.views()`**: Access the raw `&[u128]` view buffer directly instead of creating Option wrappers for each value - **Use `is_valid(i)` for null checking**: Check validity via the bitmap instead of pattern matching on `Option` - **Direct index-based access**: Use `values.value(i)` only when the value is needed, avoiding unnecessary dereferencing - **No additional memory overhead**: Per maintainer feedback, we don't store the u128 view in the Entry struct ## Are these changes tested? Yes, all existing tests pass: - `binary_view_map::tests::string_view_set_empty` - `binary_view_map::tests::string_view_set_one_null` - `binary_view_map::tests::string_view_set_many_null` - `binary_view_map::tests::test_string_view_set_basic` - `binary_view_map::tests::test_string_set_non_utf8` - `binary_view_map::tests::test_binary_set` - `binary_view_map::tests::test_map` - `binary_view_map::tests::test_string_set_memory_usage` ## Are there any user-facing changes? No, this is an internal performance optimization with no changes to public APIs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
