cetra3 opened a new pull request, #9407:
URL: https://github.com/apache/arrow-rs/pull/9407

   # Which issue does this PR close?
   
   N/A
   
   # Rationale for this change
   
   There are a few small wins we can get adjusting how sort works by avoiding 
some copies in memory & eliminates some unsafe code around the partition scan.  
I tried a few attempts at bringing down the numbers of the partition scan even 
more by chunking the bitmaps, but this "simpler" implementation looks like it 
compiles/branches pretty well
   
   I also optimised the `sort_native_type` to avoid some double copies and 
bounds checks, which gets us a nice +17% speedup in some of the primitive 
benchmarks including null values (i.e, `sort i32 nulls 2^12`).
   
   # What changes are included in this PR?
   
   Adjustments to two inner functions:
   
   * `sort_native_type`
   * & `partition_validity_scan`
   
   # Are these changes tested?
   
   Yes via:
   
   ```
   cargo test -p arrow-ord
   cargo +nightly miri test -p arrow-ord
   ```
   
   & benched with `main` via:
   
   ```
   cargo bench --features test_utils -p arrow --bench sort_kernel -- --baseline 
original
   ```
   
   # Are there any user-facing changes?
   
   Nope
   
   # Benchmark Results against `main`
   
   Here's a comparison against `main` on my PC. 
   
   It would be interesting to run this via the standard benchmark and see if we 
have the same improvements there:
   
   ```
   sort i32 2^10           time:   [3.6062 µs 3.6126 µs 3.6198 µs]
                           change: [−0.4850% −0.2379% +0.0180%] (p = 0.08 > 
0.05)
                           No change in performance detected.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   sort i32 to indices 2^10
                           time:   [5.1211 µs 5.1294 µs 5.1390 µs]
                           change: [−3.1047% −2.9580% −2.7578%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   sort i32 2^12           time:   [16.895 µs 16.907 µs 16.919 µs]
                           change: [−0.0119% +0.0442% +0.1068%] (p = 0.17 > 
0.05)
                           No change in performance detected.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   sort i32 to indices 2^12
                           time:   [23.050 µs 23.071 µs 23.090 µs]
                           change: [−5.1415% −5.0592% −4.9845%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   sort i32 nulls 2^10     time:   [2.0931 µs 2.0955 µs 2.0983 µs]
                           change: [−17.734% −17.538% −17.307%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     5 (5.00%) high mild
     7 (7.00%) high severe
   
   sort i32 nulls to indices 2^10
                           time:   [3.0490 µs 3.0500 µs 3.0510 µs]
                           change: [−2.0059% −1.9109% −1.8199%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   sort i32 nulls 2^12     time:   [8.9355 µs 8.9408 µs 8.9468 µs]
                           change: [−18.961% −18.893% −18.827%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   sort i32 nulls to indices 2^12
                           time:   [12.599 µs 12.603 µs 12.608 µs]
                           change: [−5.2886% −5.2491% −5.2092%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     2 (2.00%) high mild
     1 (1.00%) high severe
   
   sort f32 2^12           time:   [25.314 µs 25.329 µs 25.345 µs]
                           change: [+0.0265% +0.1125% +0.1923%] (p = 0.01 < 
0.05)
                           Change within noise threshold.
   Found 7 outliers among 100 measurements (7.00%)
     7 (7.00%) high mild
   
   sort f32 to indices 2^12
                           time:   [30.457 µs 30.490 µs 30.523 µs]
                           change: [−1.2373% −1.1522% −1.0705%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   sort f32 nulls 2^12     time:   [12.506 µs 12.516 µs 12.527 µs]
                           change: [−1.3180% −1.2207% −1.1189%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   sort f32 nulls to indices 2^12
                           time:   [16.185 µs 16.221 µs 16.263 µs]
                           change: [−1.9818% −1.8358% −1.6704%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     2 (2.00%) high mild
     3 (3.00%) high severe
   
   sort string[0-10] to indices 2^12
                           time:   [33.546 µs 33.661 µs 33.817 µs]
                           change: [−0.3347% −0.1203% +0.1008%] (p = 0.32 > 
0.05)
                           No change in performance detected.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high severe
   
   sort string[0-10] nulls to indices 2^12
                           time:   [17.370 µs 17.383 µs 17.396 µs]
                           change: [−3.9307% −3.8159% −3.7119%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) low mild
     1 (1.00%) high mild
   
   sort string[0-100] to indices 2^12
                           time:   [31.321 µs 31.394 µs 31.469 µs]
                           change: [−1.7726% −1.5633% −1.3371%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   sort string[0-100] nulls to indices 2^12
                           time:   [16.468 µs 16.488 µs 16.511 µs]
                           change: [−1.4143% −0.9178% −0.2647%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 7 outliers among 100 measurements (7.00%)
     5 (5.00%) high mild
     2 (2.00%) high severe
   
   sort string[0-400] to indices 2^12
                           time:   [31.458 µs 31.500 µs 31.547 µs]
                           change: [−2.7636% −2.5729% −2.3756%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   sort string[0-400] nulls to indices 2^12
                           time:   [16.734 µs 16.800 µs 16.866 µs]
                           change: [−3.4989% −3.2443% −2.9793%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   
   sort string[10] to indices 2^12
                           time:   [30.367 µs 30.404 µs 30.446 µs]
                           change: [−3.1487% −2.9660% −2.7843%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   sort string[10] nulls to indices 2^12
                           time:   [16.809 µs 16.829 µs 16.852 µs]
                           change: [−3.2575% −3.1146% −2.9765%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   
   sort string[100] to indices 2^12
                           time:   [31.771 µs 31.864 µs 31.978 µs]
                           change: [−0.2878% +0.1187% +0.5950%] (p = 0.58 > 
0.05)
                           No change in performance detected.
   Found 13 outliers among 100 measurements (13.00%)
     10 (10.00%) high mild
     3 (3.00%) high severe
   
   sort string[100] nulls to indices 2^12
                           time:   [16.676 µs 16.716 µs 16.762 µs]
                           change: [−2.9266% −2.7685% −2.6018%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   sort string[1000] to indices 2^12
                           time:   [30.735 µs 30.766 µs 30.796 µs]
                           change: [−1.6116% −1.4391% −1.2690%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   sort string[1000] nulls to indices 2^12
                           time:   [17.003 µs 17.017 µs 17.032 µs]
                           change: [−0.0323% +0.2819% +0.5431%] (p = 0.09 > 
0.05)
                           No change in performance detected.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) low mild
     5 (5.00%) high mild
     7 (7.00%) high severe
   
   sort string_view[10] to indices 2^12
                           time:   [39.257 µs 39.371 µs 39.502 µs]
                           change: [+0.0916% +0.3817% +0.7433%] (p = 0.01 < 
0.05)
                           Change within noise threshold.
   Found 14 outliers among 100 measurements (14.00%)
     10 (10.00%) high mild
     4 (4.00%) high severe
   
   sort string_view[10] nulls to indices 2^12
                           time:   [20.450 µs 20.462 µs 20.474 µs]
                           change: [−2.4581% −2.3353% −2.2270%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) low mild
   
   sort string_view[0-400] to indices 2^12
                           time:   [45.049 µs 45.080 µs 45.115 µs]
                           change: [−1.7400% −1.5494% −1.3678%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     1 (1.00%) high mild
     3 (3.00%) high severe
   
   sort string_view[0-400] nulls to indices 2^12
                           time:   [22.724 µs 22.743 µs 22.760 µs]
                           change: [−1.2593% −1.1669% −1.0752%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   sort string_view_inlined[0-12] to indices 2^12
                           time:   [36.892 µs 36.904 µs 36.918 µs]
                           change: [−1.4699% −1.2349% −1.0770%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   sort string_view_inlined[0-12] nulls to indices 2^12
                           time:   [19.937 µs 19.963 µs 19.991 µs]
                           change: [−0.7254% −0.5934% −0.4552%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   sort string[10] dict to indices 2^12
                           time:   [142.71 µs 142.98 µs 143.29 µs]
                           change: [−5.5558% −2.9225% −0.7838%] (p = 0.01 < 
0.05)
                           Change within noise threshold.
   Found 7 outliers among 100 measurements (7.00%)
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   sort string[10] dict nulls to indices 2^12
                           time:   [69.301 µs 69.416 µs 69.563 µs]
                           change: [−13.972% −9.2852% −4.7013%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   sort primitive run 2^12 time:   [3.3347 µs 3.3552 µs 3.3795 µs]
                           change: [+0.7883% +1.5862% +2.4305%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 20 outliers among 100 measurements (20.00%)
     1 (1.00%) high mild
     19 (19.00%) high severe
   
   sort primitive run to indices 2^12
                           time:   [3.8023 µs 3.8040 µs 3.8057 µs]
                           change: [+0.3625% +0.4246% +0.4842%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   lexsort (f32, f32) 2^10 time:   [21.543 µs 21.559 µs 21.578 µs]
                           change: [−1.1606% −0.9424% −0.7230%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   lexsort (f32, f32) 2^12 time:   [101.30 µs 101.35 µs 101.42 µs]
                           change: [−1.2541% −1.0419% −0.8720%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 8 outliers among 100 measurements (8.00%)
     5 (5.00%) high mild
     3 (3.00%) high severe
   
   lexsort (f32, f32) nulls 2^10
                           time:   [20.862 µs 20.875 µs 20.891 µs]
                           change: [+0.2650% +0.4623% +0.7238%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) low mild
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   lexsort (f32, f32) nulls 2^12
                           time:   [90.433 µs 90.501 µs 90.572 µs]
                           change: [−1.1384% −0.8178% −0.4471%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 9 outliers among 100 measurements (9.00%)
     2 (2.00%) high mild
     7 (7.00%) high severe
   
   lexsort (bool, bool) 2^12
                           time:   [35.253 µs 35.268 µs 35.282 µs]
                           change: [−2.4786% −2.4183% −2.3630%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low mild
     6 (6.00%) high mild
     1 (1.00%) high severe
   
   lexsort (bool, bool) nulls 2^12
                           time:   [47.627 µs 47.669 µs 47.708 µs]
                           change: [−4.2046% −4.0162% −3.8511%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     4 (4.00%) high mild
   
   lexsort (f32, f32) 2^12 limit 10
                           time:   [18.287 µs 18.304 µs 18.324 µs]
                           change: [−3.0634% −2.5423% −2.0626%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     8 (8.00%) high mild
   
   lexsort (f32, f32) 2^12 limit 100
                           time:   [19.535 µs 19.560 µs 19.585 µs]
                           change: [−0.9957% −0.8426% −0.6985%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   lexsort (f32, f32) 2^12 limit 1000
                           time:   [37.494 µs 37.511 µs 37.526 µs]
                           change: [−1.3760% −1.2316% −1.0851%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   lexsort (f32, f32) 2^12 limit 2^12
                           time:   [101.54 µs 101.63 µs 101.72 µs]
                           change: [−0.1362% −0.0357% +0.0559%] (p = 0.47 > 
0.05)
                           No change in performance detected.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   lexsort (f32, f32) nulls 2^12 limit 10
                           time:   [30.107 µs 30.129 µs 30.154 µs]
                           change: [+0.3201% +0.4125% +0.5164%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 6 outliers among 100 measurements (6.00%)
     4 (4.00%) high mild
     2 (2.00%) high severe
   
   lexsort (f32, f32) nulls 2^12 limit 100
                           time:   [30.469 µs 30.486 µs 30.504 µs]
                           change: [+0.3045% +0.3768% +0.4462%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) low mild
     1 (1.00%) high severe
   
   lexsort (f32, f32) nulls 2^12 limit 1000
                           time:   [34.089 µs 34.103 µs 34.117 µs]
                           change: [−1.0657% −0.9015% −0.7530%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 3 outliers among 100 measurements (3.00%)
     3 (3.00%) high mild
   
   lexsort (f32, f32) nulls 2^12 limit 2^12
                           time:   [90.160 µs 90.201 µs 90.247 µs]
                           change: [−2.6910% −2.5248% −2.3647%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   rank f32 2^12           time:   [30.088 µs 30.103 µs 30.119 µs]
                           change: [−1.2426% −1.1095% −0.9916%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) low mild
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   rank f32 nulls 2^12     time:   [15.402 µs 15.415 µs 15.430 µs]
                           change: [−1.9693% −1.8289% −1.6885%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   rank string[10] 2^12    time:   [117.06 µs 117.10 µs 117.13 µs]
                           change: [−3.0992% −2.9805% −2.8630%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   rank string[10] nulls 2^12
                           time:   [55.871 µs 55.906 µs 55.945 µs]
                           change: [−0.5642% −0.4729% −0.3840%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to