jhorstmann commented on PR #5100:
URL: https://github.com/apache/arrow-rs/pull/5100#issuecomment-1817961415

   Benchmarks on 1.73.0, against master (commit 61da64a) with `simd` feature.
   
   ```
   RUSTFLAGS="-Ctarget-cpu=native -Copt-level=3 -Ctarget-feature=-prefer-256-bit" cargo +1.73 bench --bench aggregate_kernels
   ```
   
   All numeric kernels are faster than the previous scalar code, most of them 
significantly so. The string kernels change by roughly 2% or less in either direction.
   
   The numbers are lower than the results using `nightly` above because 
detection of the `avx512` feature flags is still unstable, which makes the code 
use fewer lanes than would be supported by the hardware.
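
As a rough illustration of why unstable feature detection can cost lanes, here is a minimal sketch (not the code from this PR) that picks a lane count at compile time with `cfg!(target_feature = ...)`. The names `LANES_F32` and `sum_f32` are hypothetical. The assumption is that on a stable toolchain the `avx512f` check evaluates to false even with `-Ctarget-cpu=native`, so the 8-lane branch is taken on hardware that could handle 16.

```rust
/// Hypothetical lane count for f32, resolved at compile time.
/// `cfg!` only sees the target features the compiler exposes, so an
/// unstable feature may read as "off" even if the CPU supports it
/// (assumption: this is what happens for avx512f on stable 1.73).
pub const LANES_F32: usize = if cfg!(target_feature = "avx512f") {
    16 // 512-bit vectors: 16 x f32
} else if cfg!(target_feature = "avx") {
    8 // 256-bit vectors: 8 x f32
} else {
    4 // 128-bit vectors: 4 x f32
};

/// Sums `values` using `LANES_F32` independent accumulators, a shape the
/// compiler can usually auto-vectorize. Elements that do not fill a whole
/// chunk are added with a scalar loop at the end.
pub fn sum_f32(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES_F32];
    let mut chunks = values.chunks_exact(LANES_F32);
    for chunk in &mut chunks {
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }
    let tail: f32 = chunks.remainder().iter().sum();
    acc.iter().sum::<f32>() + tail
}
```

Calling `sum_f32(&[1.0, 2.0, 3.0])` returns `6.0` whichever branch is selected; the lane count only changes how many accumulators run in parallel (and, for general float inputs, the summation order).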
   
   ```
   float32/sum nonnull     time:   [3.4112 µs 3.4134 µs 3.4162 µs]
                           thrpt:  [71.465 GiB/s 71.523 GiB/s 71.570 GiB/s]
                    change:
                           time:   [-93.879% -93.853% -93.829%] (p = 0.00 < 0.05)
                           thrpt:  [+1520.6% +1526.9% +1533.6%]
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     4 (4.00%) high mild
     9 (9.00%) high severe
   float32/min nonnull     time:   [6.0613 µs 6.0636 µs 6.0662 µs]
                           thrpt:  [40.246 GiB/s 40.263 GiB/s 40.279 GiB/s]
                    change:
                           time:   [-91.625% -91.594% -91.566%] (p = 0.00 < 0.05)
                           thrpt:  [+1085.7% +1089.6% +1094.0%]
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     7 (7.00%) high mild
     2 (2.00%) high severe
   float32/max nonnull     time:   [6.0646 µs 6.0683 µs 6.0730 µs]
                           thrpt:  [40.201 GiB/s 40.232 GiB/s 40.257 GiB/s]
                    change:
                           time:   [-91.519% -91.506% -91.493%] (p = 0.00 < 0.05)
                           thrpt:  [+1075.5% +1077.3% +1079.1%]
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     7 (7.00%) high mild
     7 (7.00%) high severe
   float32/sum nullable    time:   [11.487 µs 11.492 µs 11.499 µs]
                           thrpt:  [21.232 GiB/s 21.244 GiB/s 21.253 GiB/s]
                    change:
                           time:   [-91.966% -91.921% -91.879%] (p = 0.00 < 0.05)
                           thrpt:  [+1131.4% +1137.8% +1144.7%]
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     2 (2.00%) low mild
     3 (3.00%) high mild
     7 (7.00%) high severe
   float32/min nullable    time:   [17.309 µs 17.318 µs 17.330 µs]
                           thrpt:  [14.088 GiB/s 14.098 GiB/s 14.105 GiB/s]
                    change:
                           time:   [-79.033% -78.986% -78.942%] (p = 0.00 < 0.05)
                           thrpt:  [+374.87% +375.86% +376.94%]
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     6 (6.00%) high mild
     6 (6.00%) high severe
   float32/max nullable    time:   [17.313 µs 17.328 µs 17.350 µs]
                           thrpt:  [14.071 GiB/s 14.089 GiB/s 14.102 GiB/s]
                    change:
                           time:   [-79.230% -79.195% -79.161%] (p = 0.00 < 0.05)
                           thrpt:  [+379.88% +380.66% +381.47%]
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     6 (6.00%) high mild
     8 (8.00%) high severe
   
   float64/sum nonnull     time:   [6.8393 µs 6.8460 µs 6.8552 µs]
                           thrpt:  [71.228 GiB/s 71.323 GiB/s 71.394 GiB/s]
                    change:
                           time:   [-87.308% -87.274% -87.243%] (p = 0.00 < 0.05)
                           thrpt:  [+683.86% +685.76% +687.90%]
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     5 (5.00%) high mild
     7 (7.00%) high severe
   float64/min nonnull     time:   [12.117 µs 12.130 µs 12.148 µs]
                           thrpt:  [40.195 GiB/s 40.253 GiB/s 40.296 GiB/s]
                    change:
                           time:   [-82.709% -82.656% -82.605%] (p = 0.00 < 0.05)
                           thrpt:  [+474.88% +476.56% +478.33%]
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     7 (7.00%) high mild
     6 (6.00%) high severe
   float64/max nonnull     time:   [12.117 µs 12.127 µs 12.138 µs]
                           thrpt:  [40.227 GiB/s 40.265 GiB/s 40.298 GiB/s]
                    change:
                           time:   [-82.566% -82.541% -82.517%] (p = 0.00 < 0.05)
                           thrpt:  [+471.97% +472.78% +473.60%]
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     5 (5.00%) high mild
     5 (5.00%) high severe
   float64/sum nullable    time:   [23.024 µs 23.032 µs 23.042 µs]
                           thrpt:  [21.191 GiB/s 21.200 GiB/s 21.208 GiB/s]
                    change:
                           time:   [-83.329% -83.268% -83.209%] (p = 0.00 < 0.05)
                           thrpt:  [+495.57% +497.66% +499.84%]
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     1 (1.00%) low severe
     7 (7.00%) low mild
     1 (1.00%) high mild
     3 (3.00%) high severe
   float64/min nullable    time:   [34.426 µs 34.431 µs 34.437 µs]
                           thrpt:  [14.179 GiB/s 14.181 GiB/s 14.183 GiB/s]
                    change:
                           time:   [-57.965% -57.881% -57.805%] (p = 0.00 < 0.05)
                           thrpt:  [+136.99% +137.42% +137.90%]
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) low mild
     2 (2.00%) high mild
     4 (4.00%) high severe
   float64/max nullable    time:   [34.444 µs 34.471 µs 34.505 µs]
                           thrpt:  [14.151 GiB/s 14.165 GiB/s 14.176 GiB/s]
                    change:
                           time:   [-58.304% -58.247% -58.186%] (p = 0.00 < 0.05)
                           thrpt:  [+139.15% +139.50% +139.83%]
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     5 (5.00%) high mild
     8 (8.00%) high severe
   
   int8/sum nonnull        time:   [291.60 ns 291.71 ns 291.84 ns]
                           thrpt:  [209.14 GiB/s 209.23 GiB/s 209.31 GiB/s]
                    change:
                           time:   [-4.0706% -3.9336% -3.7993%] (p = 0.00 < 0.05)
                           thrpt:  [+3.9493% +4.0946% +4.2433%]
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   int8/min nonnull        time:   [288.61 ns 288.71 ns 288.82 ns]
                           thrpt:  [211.33 GiB/s 211.41 GiB/s 211.48 GiB/s]
                    change:
                           time:   [-57.479% -57.335% -57.202%] (p = 0.00 < 0.05)
                           thrpt:  [+133.66% +134.38% +135.18%]
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     6 (6.00%) high mild
   int8/max nonnull        time:   [289.92 ns 290.20 ns 290.54 ns]
                           thrpt:  [210.07 GiB/s 210.32 GiB/s 210.53 GiB/s]
                    change:
                           time:   [-57.142% -57.024% -56.907%] (p = 0.00 < 0.05)
                           thrpt:  [+132.06% +132.69% +133.33%]
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) high mild
     6 (6.00%) high severe
   int8/sum nullable       time:   [3.4562 µs 3.4576 µs 3.4597 µs]
                           thrpt:  [17.642 GiB/s 17.652 GiB/s 17.660 GiB/s]
                    change:
                           time:   [-97.490% -97.484% -97.479%] (p = 0.00 < 0.05)
                           thrpt:  [+3866.5% +3874.8% +3883.8%]
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) high mild
     6 (6.00%) high severe
   int8/min nullable       time:   [3.8795 µs 3.8810 µs 3.8829 µs]
                           thrpt:  [15.719 GiB/s 15.727 GiB/s 15.733 GiB/s]
                    change:
                           time:   [-92.463% -92.420% -92.378%] (p = 0.00 < 0.05)
                           thrpt:  [+1212.0% +1219.2% +1226.7%]
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     5 (5.00%) high mild
     6 (6.00%) high severe
   int8/max nullable       time:   [3.8790 µs 3.8812 µs 3.8846 µs]
                           thrpt:  [15.712 GiB/s 15.726 GiB/s 15.735 GiB/s]
                    change:
                           time:   [-92.711% -92.686% -92.662%] (p = 0.00 < 0.05)
                           thrpt:  [+1262.9% +1267.3% +1271.9%]
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     3 (3.00%) high mild
     6 (6.00%) high severe
   
   int16/sum nonnull       time:   [583.83 ns 584.05 ns 584.30 ns]
                           thrpt:  [208.92 GiB/s 209.01 GiB/s 209.09 GiB/s]
                    change:
                           time:   [-2.6158% -2.5160% -2.4259%] (p = 0.00 < 0.05)
                           thrpt:  [+2.4862% +2.5809% +2.6861%]
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     1 (1.00%) low severe
     3 (3.00%) high mild
     6 (6.00%) high severe
   int16/min nonnull       time:   [578.65 ns 578.82 ns 579.01 ns]
                           thrpt:  [210.83 GiB/s 210.89 GiB/s 210.96 GiB/s]
                    change:
                           time:   [-55.421% -55.393% -55.367%] (p = 0.00 < 0.05)
                           thrpt:  [+124.05% +124.18% +124.32%]
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   int16/max nonnull       time:   [580.11 ns 580.76 ns 581.55 ns]
                           thrpt:  [209.91 GiB/s 210.19 GiB/s 210.43 GiB/s]
                    change:
                           time:   [-55.286% -55.246% -55.200%] (p = 0.00 < 0.05)
                           thrpt:  [+123.21% +123.44% +123.65%]
                           Performance has improved.
   Found 14 outliers among 100 measurements (14.00%)
     3 (3.00%) high mild
     11 (11.00%) high severe
   int16/sum nullable      time:   [3.4950 µs 3.4976 µs 3.5007 µs]
                           thrpt:  [34.870 GiB/s 34.901 GiB/s 34.927 GiB/s]
                    change:
                           time:   [-97.406% -97.394% -97.383%] (p = 0.00 < 0.05)
                           thrpt:  [+3721.6% +3737.4% +3754.5%]
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) low mild
     8 (8.00%) high mild
     2 (2.00%) high severe
   int16/min nullable      time:   [5.2134 µs 5.2147 µs 5.2161 µs]
                           thrpt:  [23.403 GiB/s 23.409 GiB/s 23.415 GiB/s]
                    change:
                           time:   [-89.477% -89.410% -89.347%] (p = 0.00 < 0.05)
                           thrpt:  [+838.73% +844.29% +850.30%]
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     6 (6.00%) high mild
     2 (2.00%) high severe
   int16/max nullable      time:   [5.1586 µs 5.1597 µs 5.1609 µs]
                           thrpt:  [23.653 GiB/s 23.658 GiB/s 23.663 GiB/s]
                    change:
                           time:   [-89.341% -89.279% -89.226%] (p = 0.00 < 0.05)
                           thrpt:  [+828.20% +832.76% +838.20%]
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   int32/sum nonnull       time:   [1.1674 µs 1.1681 µs 1.1689 µs]
                           thrpt:  [208.86 GiB/s 209.00 GiB/s 209.13 GiB/s]
                    change:
                           time:   [-2.3865% -2.2896% -2.2025%] (p = 0.00 < 0.05)
                           thrpt:  [+2.2521% +2.3433% +2.4448%]
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     4 (4.00%) high mild
     1 (1.00%) high severe
   int32/min nonnull       time:   [1.1593 µs 1.1600 µs 1.1607 µs]
                           thrpt:  [210.33 GiB/s 210.46 GiB/s 210.59 GiB/s]
                    change:
                           time:   [-55.402% -55.344% -55.297%] (p = 0.00 < 0.05)
                           thrpt:  [+123.70% +123.93% +124.23%]
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low mild
     2 (2.00%) high mild
     2 (2.00%) high severe
   int32/max nonnull       time:   [1.1613 µs 1.1619 µs 1.1625 µs]
                           thrpt:  [210.02 GiB/s 210.13 GiB/s 210.24 GiB/s]
                    change:
                           time:   [-55.217% -55.186% -55.155%] (p = 0.00 < 0.05)
                           thrpt:  [+122.99% +123.14% +123.30%]
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     3 (3.00%) high mild
     2 (2.00%) high severe
   int32/sum nullable      time:   [3.6026 µs 3.6048 µs 3.6077 µs]
                           thrpt:  [67.673 GiB/s 67.727 GiB/s 67.768 GiB/s]
                    change:
                           time:   [-97.340% -97.335% -97.329%] (p = 0.00 < 0.05)
                           thrpt:  [+3644.1% +3652.3% +3659.5%]
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     2 (2.00%) low severe
     1 (1.00%) low mild
     4 (4.00%) high mild
     6 (6.00%) high severe
   int32/min nullable      time:   [12.045 µs 12.049 µs 12.054 µs]
                           thrpt:  [20.255 GiB/s 20.263 GiB/s 20.269 GiB/s]
                    change:
                           time:   [-75.866% -75.755% -75.641%] (p = 0.00 < 0.05)
                           thrpt:  [+310.52% +312.45% +314.36%]
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     6 (6.00%) high mild
     5 (5.00%) high severe
   int32/max nullable      time:   [12.051 µs 12.055 µs 12.060 µs]
                           thrpt:  [20.244 GiB/s 20.252 GiB/s 20.259 GiB/s]
                    change:
                           time:   [-75.752% -75.639% -75.524%] (p = 0.00 < 0.05)
                           thrpt:  [+308.56% +310.50% +312.41%]
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     4 (4.00%) high severe
   
   int64/sum nonnull       time:   [2.4296 µs 2.4311 µs 2.4327 µs]
                           thrpt:  [200.72 GiB/s 200.85 GiB/s 200.97 GiB/s]
                    change:
                           time:   [-1.0469% -0.7523% -0.3929%] (p = 0.00 < 0.05)
                           thrpt:  [+0.3944% +0.7580% +1.0580%]
                           Change within noise threshold.
   Found 12 outliers among 100 measurements (12.00%)
     3 (3.00%) high mild
     9 (9.00%) high severe
   int64/min nonnull       time:   [2.4356 µs 2.4377 µs 2.4403 µs]
                           thrpt:  [200.09 GiB/s 200.30 GiB/s 200.48 GiB/s]
                    change:
                           time:   [-54.033% -53.990% -53.940%] (p = 0.00 < 0.05)
                           thrpt:  [+117.11% +117.34% +117.55%]
                           Performance has improved.
   Found 7 outliers among 100 measurements (7.00%)
     1 (1.00%) low mild
     3 (3.00%) high mild
     3 (3.00%) high severe
   int64/max nonnull       time:   [2.4404 µs 2.4419 µs 2.4436 µs]
                           thrpt:  [199.82 GiB/s 199.96 GiB/s 200.09 GiB/s]
                    change:
                           time:   [-53.982% -53.922% -53.872%] (p = 0.00 < 0.05)
                           thrpt:  [+116.79% +117.02% +117.31%]
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     2 (2.00%) low mild
     6 (6.00%) high mild
     3 (3.00%) high severe
   int64/sum nullable      time:   [7.1891 µs 7.1921 µs 7.1958 µs]
                           thrpt:  [67.856 GiB/s 67.892 GiB/s 67.920 GiB/s]
                    change:
                           time:   [-94.725% -94.718% -94.710%] (p = 0.00 < 0.05)
                           thrpt:  [+1790.4% +1793.1% +1795.7%]
                           Performance has improved.
   Found 9 outliers among 100 measurements (9.00%)
     4 (4.00%) high mild
     5 (5.00%) high severe
   int64/min nullable      time:   [24.719 µs 24.728 µs 24.738 µs]
                           thrpt:  [19.738 GiB/s 19.746 GiB/s 19.753 GiB/s]
                    change:
                           time:   [-51.258% -51.041% -50.833%] (p = 0.00 < 0.05)
                           thrpt:  [+103.39% +104.25% +105.16%]
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     6 (6.00%) high mild
     6 (6.00%) high severe
   int64/max nullable      time:   [24.704 µs 24.712 µs 24.722 µs]
                           thrpt:  [19.751 GiB/s 19.759 GiB/s 19.765 GiB/s]
                    change:
                           time:   [-54.224% -54.086% -53.942%] (p = 0.00 < 0.05)
                           thrpt:  [+117.12% +117.80% +118.45%]
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     1 (1.00%) low mild
     7 (7.00%) high mild
     3 (3.00%) high severe
   
   string/min nonnull      time:   [141.44 µs 141.58 µs 141.77 µs]
                           thrpt:  [462.26 Melem/s 462.90 Melem/s 463.36 Melem/s]
                    change:
                           time:   [-0.7023% -0.5395% -0.4004%] (p = 0.00 < 0.05)
                           thrpt:  [+0.4020% +0.5424% +0.7073%]
                           Change within noise threshold.
   Found 10 outliers among 100 measurements (10.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     4 (4.00%) high severe
   string/max nonnull      time:   [141.32 µs 141.46 µs 141.66 µs]
                           thrpt:  [462.64 Melem/s 463.28 Melem/s 463.75 Melem/s]
                    change:
                           time:   [-0.5020% -0.2694% -0.0027%] (p = 0.03 < 0.05)
                           thrpt:  [+0.0027% +0.2701% +0.5045%]
                           Change within noise threshold.
   Found 18 outliers among 100 measurements (18.00%)
     6 (6.00%) high mild
     12 (12.00%) high severe
   string/min nullable     time:   [267.85 µs 268.07 µs 268.30 µs]
                           thrpt:  [244.26 Melem/s 244.47 Melem/s 244.68 Melem/s]
                    change:
                           time:   [+1.4168% +1.6011% +1.7800%] (p = 0.00 < 0.05)
                           thrpt:  [-1.7488% -1.5758% -1.3970%]
                           Performance has regressed.
   Found 5 outliers among 100 measurements (5.00%)
     1 (1.00%) low severe
     1 (1.00%) low mild
     3 (3.00%) high mild
   string/max nullable     time:   [281.09 µs 281.47 µs 281.84 µs]
                           thrpt:  [232.53 Melem/s 232.84 Melem/s 233.15 Melem/s]
                    change:
                           time:   [+1.6971% +1.8843% +2.0923%] (p = 0.00 < 0.05)
                           thrpt:  [-2.0494% -1.8495% -1.6688%]
                           Performance has regressed.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   ```
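
For reading the Criterion output above: each `thrpt` change follows from the corresponding `time` change, and a time together with a throughput implies the input size. The sketch below spells out that arithmetic. The function names are hypothetical, and the input size of about 64Ki elements is inferred from the reported numbers, not stated anywhere in the benchmark output.

```rust
/// Converts a relative change in time (e.g. -0.93853 for -93.853%)
/// into the equivalent relative change in throughput.
/// Throughput is work / time, so it scales with 1 / (1 + time_change).
pub fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

/// Bytes processed per iteration implied by a time in microseconds
/// and a throughput in GiB/s (1 GiB = 2^30 bytes).
pub fn implied_bytes(time_us: f64, gib_per_s: f64) -> f64 {
    gib_per_s * (1u64 << 30) as f64 * time_us * 1e-6
}
```

For `float32/sum nonnull`, `throughput_change(-0.93853)` is about 15.27, i.e. roughly +1527%, in line with the reported +1526.9%. `implied_bytes(3.4134, 71.523)` comes out near 262144 bytes, which would correspond to 65536 `f32` values.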

