theirix opened a new pull request, #20114:
URL: https://github.com/apache/datafusion/pull/20114

   ## Which issue does this PR close?
   
   - Closes #20103
   
   ## Rationale for this change
   
   A refactoring PR for performance improvement PRs for left #19749 and right 
#20068.
   
   ## What changes are included in this PR?
   
   1. Removed a lot of code duplication by extracting a common stringarray / 
stringview implementation. Now left and right UDFs entry points are leaner. 
Differences are only in slicing - from the left or from the right - which is 
implemented in a generic trait parameter, following the design of trim.
   
   2. Switched `left` to use `make_view` to avoid buffer tinkering in 
datafusion code.
   
   4. Combine left and right benches together
   
   ## Are these changes tested?
   
   - Existing unit tests
   - Existing SLTs passed
   - Benches show the same performance improvement of 60-85%
   
   Bench results against pre-optimisation commit 
458b49109af58e678520edebbad9fb3edfd26992:
   <details>
   left size=1024/string_array positive n/1024
                           time:   [34.150 µs 34.694 µs 35.251 µs]
                           change: [−71.694% −70.722% −69.818%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   left size=1024/string_array negative n/1024
                           time:   [30.860 µs 31.396 µs 31.998 µs]
                           change: [−85.846% −85.294% −84.759%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   left size=4096/string_array positive n/4096
                           time:   [112.19 µs 114.28 µs 116.98 µs]
                           change: [−71.673% −70.934% −70.107%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   left size=4096/string_array negative n/4096
                           time:   [126.71 µs 129.06 µs 131.26 µs]
                           change: [−84.204% −83.809% −83.455%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) low mild
     2 (2.00%) high mild
   
   left size=1024/string_view_array positive n/1024
                           time:   [30.249 µs 30.887 µs 31.461 µs]
                           change: [−75.288% −74.499% −73.743%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) low mild
     1 (1.00%) high mild
   left size=1024/string_view_array negative n/1024
                           time:   [48.404 µs 49.007 µs 49.608 µs]
                           change: [−66.827% −65.727% −64.652%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   left size=4096/string_view_array positive n/4096
                           time:   [145.25 µs 148.47 µs 151.85 µs]
                           change: [−68.913% −67.836% −66.770%] (p = 0.00 < 
0.05)
                           Performance has improved.
   left size=4096/string_view_array negative n/4096
                           time:   [203.11 µs 206.31 µs 209.98 µs]
                           change: [−57.411% −56.773% −56.142%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     1 (1.00%) low mild
     13 (13.00%) high mild
     1 (1.00%) high severe
   
   right size=1024/string_array positive n/1024
                           time:   [30.820 µs 31.674 µs 32.627 µs]
                           change: [−84.230% −83.842% −83.402%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   right size=1024/string_array negative n/1024
                           time:   [32.434 µs 33.170 µs 33.846 µs]
                           change: [−88.796% −88.460% −88.164%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   right size=4096/string_array positive n/4096
                           time:   [124.71 µs 126.54 µs 128.27 µs]
                           change: [−83.321% −82.902% −82.537%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   right size=4096/string_array negative n/4096
                           time:   [125.05 µs 127.67 µs 130.35 µs]
                           change: [−89.376% −89.193% −89.004%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   right size=1024/string_view_array positive n/1024
                           time:   [29.110 µs 29.608 µs 30.141 µs]
                           change: [−79.807% −79.330% −78.683%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   right size=1024/string_view_array negative n/1024
                           time:   [44.883 µs 45.656 µs 46.511 µs]
                           change: [−71.157% −70.546% −69.874%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   
   right size=4096/string_view_array positive n/4096
                           time:   [139.57 µs 142.18 µs 144.96 µs]
                           change: [−75.610% −75.088% −74.549%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high severe
   right size=4096/string_view_array negative n/4096
                           time:   [221.47 µs 224.47 µs 227.72 µs]
                           change: [−64.625% −64.047% −63.504%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   </details>
   
   
   ## Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be 
updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api 
change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to