theirix opened a new pull request, #20069:
URL: https://github.com/apache/datafusion/pull/20069

   ## Which issue does this PR close?
   
   - Closes #20068.
   
   ## Rationale for this change
   
   Following issue #19749 and the optimisation of `left` in #19980, it is worth applying the same optimisation to `right`.
   
   ## What changes are included in this PR?
    
   - Make the function more efficient by allocating less memory and slicing the underlying bytes directly at char boundaries
   
   - Provide a specialisation for StringView that avoids copying the data buffers (zero-copy)
   
   - Use `arrow_array::buffer::make_view` for low-level view manipulation (we still need to know the magic constant 12 from the view buffer layout)
   
   - Benchmark: running time drops by roughly 64% to 89%, depending on array type and sign of `n`
   
   ```
   right size=1024/string_array positive n/1024
                           time:   [24.286 µs 24.658 µs 25.087 µs]
                           change: [−86.881% −86.662% −86.424%] (p = 0.00 < 0.05)
                           Performance has improved.
   right size=1024/string_array negative n/1024
                           time:   [29.996 µs 30.737 µs 31.511 µs]
                           change: [−89.442% −89.229% −89.003%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   right size=4096/string_array positive n/4096
                           time:   [105.58 µs 109.39 µs 113.51 µs]
                           change: [−86.119% −85.788% −85.497%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     6 (6.00%) high mild
     3 (3.00%) high severe
   right size=4096/string_array negative n/4096
                           time:   [136.48 µs 138.34 µs 140.36 µs]
                           change: [−88.007% −87.848% −87.692%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   right size=1024/string_view_array positive n/1024
                           time:   [25.054 µs 25.500 µs 26.033 µs]
                           change: [−82.569% −82.285% −81.891%] (p = 0.00 < 0.05)
                           Performance has improved.
   right size=1024/string_view_array negative n/1024
                           time:   [41.281 µs 42.730 µs 44.432 µs]
                           change: [−73.832% −73.288% −72.716%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   right size=4096/string_view_array positive n/4096
                           time:   [129.38 µs 133.69 µs 137.61 µs]
                           change: [−79.497% −78.998% −78.581%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   right size=4096/string_view_array negative n/4096
                           time:   [218.16 µs 229.41 µs 243.30 µs]
                           change: [−65.405% −63.622% −61.515%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     3 (3.00%) high mild
     7 (7.00%) high severe
   ```
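
To illustrate the "going directly to bytes, based on char boundaries" idea from the list above, here is a minimal sketch using only the Rust standard library. It is not the code from this PR: `right_slice` is a hypothetical helper, and the sketch assumes the usual `right(string, n)` semantics (the last `n` characters for positive `n`, everything except the first `|n|` characters for negative `n`). It finds the byte offset of the relevant char boundary and returns a borrowed slice, so no new `String` is allocated.

```rust
/// Hypothetical helper (not the PR's implementation): returns the part of `s`
/// selected by `right(s, n)` as a borrowed slice, without allocating.
fn right_slice(s: &str, n: i64) -> &str {
    // Number of characters to count, as an unsigned value.
    let count = n.unsigned_abs() as usize;

    let start = if n == 0 {
        // Zero characters requested: empty slice at the end of the string.
        s.len()
    } else if n > 0 {
        // Walk backwards over char boundaries to the n-th character from the
        // end. If the string has fewer than n characters, keep all of it.
        s.char_indices()
            .rev()
            .nth(count - 1)
            .map_or(0, |(byte_idx, _)| byte_idx)
    } else {
        // Negative n: skip the first |n| characters. If the string has at
        // most |n| characters, nothing is left.
        s.char_indices()
            .nth(count)
            .map_or(s.len(), |(byte_idx, _)| byte_idx)
    };

    // `start` always lies on a char boundary, so this slice cannot panic.
    &s[start..]
}
```

Because the result is just a byte range within the input, the same offset could in principle be used to build a view that points into an existing buffer rather than copying the bytes, which is the idea behind the StringView specialisation mentioned above.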
   
   ## Are these changes tested?
   
   - Existing unit tests for `right`
   
   - Added more unit tests
   
   - Added a benchmark for `right`, similar to the existing one for `left`
   
   - Existing SLTs pass
   
   ## Are there any user-facing changes?
   
   No
   
   

