Tushar7012 opened a new pull request, #19975:
URL: https://github.com/apache/datafusion/pull/19975

   - Use values.views() instead of values.iter() for direct u128 access
   - Use is_valid(i) for efficient null checking via validity bitmap
   - Avoid dereferencing overhead for inline strings
   - No additional memory overhead in Entry struct
   
   Closes #19961
   
   ## Which issue does this PR close?
   
   Closes #19961
   
   ## Rationale for this change
   
   The 
[ArrowBytesViewMap](cci:2://file:///d:/Agentic_AI/Gssoc_Apache/datafusion/datafusion/physical-expr-common/src/binary_view_map.rs:115:0-136:1)
 was using `values.iter()` which creates unnecessary `Option` wrappers and 
extra overhead when iterating over byte view arrays. For ClickBench query 5, 
>50% CPU was spent during the 
[intern](cci:1://file:///d:/Agentic_AI/Gssoc_Apache/datafusion/datafusion/physical-plan/src/aggregates/group_values/single_group_by/bytes_view.rs:45:4-74:5)
 operation.
   
   This PR optimizes the hot path in 
[insert_if_new_inner](cci:1://file:///d:/Agentic_AI/Gssoc_Apache/datafusion/datafusion/physical-expr-common/src/binary_view_map.rs:222:4-317:5)
 by using direct view access methods that avoid the iteration overhead.
   
   ## What changes are included in this PR?
   
   - **Replace `values.iter()` with `values.views()`**: Access the raw 
`&[u128]` view buffer directly instead of creating Option wrappers for each 
value
   - **Use `is_valid(i)` for null checking**: Check validity via the bitmap 
instead of pattern matching on `Option`
   - **Direct index-based access**: Use `values.value(i)` only when the value 
is needed, avoiding unnecessary dereferencing
   - **No additional memory overhead**: Per maintainer feedback, we don't store 
the u128 view in the Entry struct
   
   ## Are these changes tested?
   
   Yes, all existing tests pass:
   - `binary_view_map::tests::string_view_set_empty`
   - `binary_view_map::tests::string_view_set_one_null`
   - `binary_view_map::tests::string_view_set_many_null`
   - `binary_view_map::tests::test_string_view_set_basic`
   - `binary_view_map::tests::test_string_set_non_utf8`
   - `binary_view_map::tests::test_binary_set`
   - `binary_view_map::tests::test_map`
   - `binary_view_map::tests::test_string_set_memory_usage`
   
   ## Are there any user-facing changes?
   
   No, this is an internal performance optimization with no changes to public 
APIs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to