jhorstmann commented on PR #5100:
URL: https://github.com/apache/arrow-rs/pull/5100#issuecomment-1817961415
Benchmarks on Rust 1.73.0 against master (commit 61da64a), with the `simd` feature:
```
RUSTFLAGS="-Ctarget-cpu=native -Copt-level=3 -Ctarget-feature=-prefer-256-bit" cargo +1.73 bench --bench aggregate_kernels
```
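All numeric kernels are at least as fast as the previous scalar code, most of them significantly so. The integer `sum nonnull` kernels improve by only about 2-4% (`int64` is within noise), and the string kernels stay within 2% of master.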
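To illustrate where gains of this size can come from, here is a minimal, std-only sketch of the general multi-lane technique. It is not the PR's implementation, and the function names are hypothetical. A plain fold keeps one running accumulator, and because floating-point addition is not associative the compiler has to keep that loop sequential. Keeping `LANES` independent partial sums reorders the additions so they can map onto SIMD registers. Integer addition is associative, so integer sums are typically auto-vectorized already, which would be consistent with the small gains for the `int*/sum nonnull` benchmarks. The nullable variant assumes an Arrow-style validity bitmap (one bit per value, least significant bit first, a set bit means valid) and adds `0.0` for null slots instead of branching.

```rust
/// Baseline: a single running accumulator. For floats the compiler must keep
/// the exact left-to-right order of additions, so this loop stays scalar.
fn sum_scalar(values: &[f32]) -> f32 {
    values.iter().fold(0.0, |acc, &v| acc + v)
}

/// Lane-based sum: `LANES` independent partial sums, combined at the end.
/// Reordering the additions is what enables SIMD; the result may differ from
/// `sum_scalar` in the last bits due to different rounding.
fn sum_lanes<const LANES: usize>(values: &[f32]) -> f32 {
    let mut acc = [0.0f32; LANES];
    let chunks = values.chunks_exact(LANES);
    let remainder = chunks.remainder();
    for chunk in chunks {
        for (a, &v) in acc.iter_mut().zip(chunk) {
            *a += v;
        }
    }
    // Fold the partial sums, then add the tail that did not fill a chunk.
    acc.iter().sum::<f32>() + remainder.iter().sum::<f32>()
}

/// Null-aware variant (hypothetical). Bit `i` of `validity`, LSB first,
/// is set when `values[i]` is valid; null slots contribute 0.0.
fn sum_lanes_nullable<const LANES: usize>(values: &[f32], validity: &[u8]) -> f32 {
    let is_valid = |i: usize| ((validity[i / 8] >> (i % 8)) & 1) == 1;
    let mut acc = [0.0f32; LANES];
    let full = values.len() / LANES * LANES;
    for start in (0..full).step_by(LANES) {
        for j in 0..LANES {
            let i = start + j;
            // Branch-free select instead of skipping nulls.
            acc[j] += if is_valid(i) { values[i] } else { 0.0 };
        }
    }
    let tail: f32 = (full..values.len())
        .filter(|&i| is_valid(i))
        .map(|i| values[i])
        .sum();
    acc.iter().sum::<f32>() + tail
}

fn main() {
    let values: Vec<f32> = (0..1000).map(|i| i as f32).collect();
    // 0b0101_0101 marks every even index as valid.
    let validity = vec![0b0101_0101u8; (values.len() + 7) / 8];
    println!("scalar:   {}", sum_scalar(&values));
    println!("lanes:    {}", sum_lanes::<16>(&values));
    println!("nullable: {}", sum_lanes_nullable::<16>(&values, &validity));
}
```

With small integer-valued inputs like these all partial sums are exact in `f32`, so the three functions agree; with arbitrary floats the lane-based results can differ slightly from the sequential fold.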
These numbers are lower than the `nightly` results above because detection of the `avx512` target features is still unstable, so on stable Rust the code uses fewer lanes than the hardware supports.
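A hypothetical helper makes the effect concrete. If the lane count is derived at compile time from `cfg!(target_feature = ...)`, a 512-bit register fits 16 `f32` values, a 256-bit AVX2 register fits 8, and a 128-bit register fits 4. When the `avx512` features are not visible to `cfg!`, as described above for Rust 1.73, the first branch is never taken and the code falls back to the narrower width even with `-Ctarget-cpu=native`. This is a sketch under those assumptions, not how the PR chooses its lane count.

```rust
/// Width in bytes of the widest SIMD register that the compile-time target
/// features let us assume (hypothetical helper, for illustration only).
const fn simd_register_bytes() -> usize {
    if cfg!(target_feature = "avx512f") {
        64 // 512-bit ZMM: only taken if `avx512f` is visible to `cfg!`
    } else if cfg!(target_feature = "avx2") {
        32 // 256-bit YMM
    } else {
        16 // 128-bit baseline (SSE2 on x86_64, NEON on aarch64)
    }
}

/// How many values of type `T` fit into one such register.
const fn lanes_for<T>() -> usize {
    simd_register_bytes() / std::mem::size_of::<T>()
}

fn main() {
    println!("register width: {} bits", simd_register_bytes() * 8);
    println!("f32 lanes: {}", lanes_for::<f32>());
    println!("f64 lanes: {}", lanes_for::<f64>());
    println!("i8 lanes:  {}", lanes_for::<i8>());
}
```

Because the lane count is a `const`, it can be passed directly as a const generic parameter such as `LANES`, so a missing target feature silently halves (or quarters) the vector width rather than failing to compile.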
```
float32/sum nonnull time: [3.4112 µs 3.4134 µs 3.4162 µs]
thrpt: [71.465 GiB/s 71.523 GiB/s 71.570 GiB/s]
change:
time: [-93.879% -93.853% -93.829%] (p = 0.00 < 0.05)
thrpt: [+1520.6% +1526.9% +1533.6%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe
float32/min nonnull time: [6.0613 µs 6.0636 µs 6.0662 µs]
thrpt: [40.246 GiB/s 40.263 GiB/s 40.279 GiB/s]
change:
time: [-91.625% -91.594% -91.566%] (p = 0.00 < 0.05)
thrpt: [+1085.7% +1089.6% +1094.0%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
float32/max nonnull time: [6.0646 µs 6.0683 µs 6.0730 µs]
thrpt: [40.201 GiB/s 40.232 GiB/s 40.257 GiB/s]
change:
time: [-91.519% -91.506% -91.493%] (p = 0.00 < 0.05)
thrpt: [+1075.5% +1077.3% +1079.1%]
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
7 (7.00%) high mild
7 (7.00%) high severe
float32/sum nullable time: [11.487 µs 11.492 µs 11.499 µs]
thrpt: [21.232 GiB/s 21.244 GiB/s 21.253 GiB/s]
change:
time: [-91.966% -91.921% -91.879%] (p = 0.00 < 0.05)
thrpt: [+1131.4% +1137.8% +1144.7%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
2 (2.00%) low mild
3 (3.00%) high mild
7 (7.00%) high severe
float32/min nullable time: [17.309 µs 17.318 µs 17.330 µs]
thrpt: [14.088 GiB/s 14.098 GiB/s 14.105 GiB/s]
change:
time: [-79.033% -78.986% -78.942%] (p = 0.00 < 0.05)
thrpt: [+374.87% +375.86% +376.94%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
float32/max nullable time: [17.313 µs 17.328 µs 17.350 µs]
thrpt: [14.071 GiB/s 14.089 GiB/s 14.102 GiB/s]
change:
time: [-79.230% -79.195% -79.161%] (p = 0.00 < 0.05)
thrpt: [+379.88% +380.66% +381.47%]
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
6 (6.00%) high mild
8 (8.00%) high severe
float64/sum nonnull time: [6.8393 µs 6.8460 µs 6.8552 µs]
thrpt: [71.228 GiB/s 71.323 GiB/s 71.394 GiB/s]
change:
time: [-87.308% -87.274% -87.243%] (p = 0.00 < 0.05)
thrpt: [+683.86% +685.76% +687.90%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) high mild
7 (7.00%) high severe
float64/min nonnull time: [12.117 µs 12.130 µs 12.148 µs]
thrpt: [40.195 GiB/s 40.253 GiB/s 40.296 GiB/s]
change:
time: [-82.709% -82.656% -82.605%] (p = 0.00 < 0.05)
thrpt: [+474.88% +476.56% +478.33%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
7 (7.00%) high mild
6 (6.00%) high severe
float64/max nonnull time: [12.117 µs 12.127 µs 12.138 µs]
thrpt: [40.227 GiB/s 40.265 GiB/s 40.298 GiB/s]
change:
time: [-82.566% -82.541% -82.517%] (p = 0.00 < 0.05)
thrpt: [+471.97% +472.78% +473.60%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
5 (5.00%) high mild
5 (5.00%) high severe
float64/sum nullable time: [23.024 µs 23.032 µs 23.042 µs]
thrpt: [21.191 GiB/s 21.200 GiB/s 21.208 GiB/s]
change:
time: [-83.329% -83.268% -83.209%] (p = 0.00 < 0.05)
thrpt: [+495.57% +497.66% +499.84%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
7 (7.00%) low mild
1 (1.00%) high mild
3 (3.00%) high severe
float64/min nullable time: [34.426 µs 34.431 µs 34.437 µs]
thrpt: [14.179 GiB/s 14.181 GiB/s 14.183 GiB/s]
change:
time: [-57.965% -57.881% -57.805%] (p = 0.00 < 0.05)
thrpt: [+136.99% +137.42% +137.90%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
float64/max nullable time: [34.444 µs 34.471 µs 34.505 µs]
thrpt: [14.151 GiB/s 14.165 GiB/s 14.176 GiB/s]
change:
time: [-58.304% -58.247% -58.186%] (p = 0.00 < 0.05)
thrpt: [+139.15% +139.50% +139.83%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
int8/sum nonnull time: [291.60 ns 291.71 ns 291.84 ns]
thrpt: [209.14 GiB/s 209.23 GiB/s 209.31 GiB/s]
change:
time: [-4.0706% -3.9336% -3.7993%] (p = 0.00 < 0.05)
thrpt: [+3.9493% +4.0946% +4.2433%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
int8/min nonnull time: [288.61 ns 288.71 ns 288.82 ns]
thrpt: [211.33 GiB/s 211.41 GiB/s 211.48 GiB/s]
change:
time: [-57.479% -57.335% -57.202%] (p = 0.00 < 0.05)
thrpt: [+133.66% +134.38% +135.18%]
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild
int8/max nonnull time: [289.92 ns 290.20 ns 290.54 ns]
thrpt: [210.07 GiB/s 210.32 GiB/s 210.53 GiB/s]
change:
time: [-57.142% -57.024% -56.907%] (p = 0.00 < 0.05)
thrpt: [+132.06% +132.69% +133.33%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
int8/sum nullable time: [3.4562 µs 3.4576 µs 3.4597 µs]
thrpt: [17.642 GiB/s 17.652 GiB/s 17.660 GiB/s]
change:
time: [-97.490% -97.484% -97.479%] (p = 0.00 < 0.05)
thrpt: [+3866.5% +3874.8% +3883.8%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
int8/min nullable time: [3.8795 µs 3.8810 µs 3.8829 µs]
thrpt: [15.719 GiB/s 15.727 GiB/s 15.733 GiB/s]
change:
time: [-92.463% -92.420% -92.378%] (p = 0.00 < 0.05)
thrpt: [+1212.0% +1219.2% +1226.7%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
int8/max nullable time: [3.8790 µs 3.8812 µs 3.8846 µs]
thrpt: [15.712 GiB/s 15.726 GiB/s 15.735 GiB/s]
change:
time: [-92.711% -92.686% -92.662%] (p = 0.00 < 0.05)
thrpt: [+1262.9% +1267.3% +1271.9%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) high mild
6 (6.00%) high severe
int16/sum nonnull time: [583.83 ns 584.05 ns 584.30 ns]
thrpt: [208.92 GiB/s 209.01 GiB/s 209.09 GiB/s]
change:
time: [-2.6158% -2.5160% -2.4259%] (p = 0.00 < 0.05)
thrpt: [+2.4862% +2.5809% +2.6861%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
3 (3.00%) high mild
6 (6.00%) high severe
int16/min nonnull time: [578.65 ns 578.82 ns 579.01 ns]
thrpt: [210.83 GiB/s 210.89 GiB/s 210.96 GiB/s]
change:
time: [-55.421% -55.393% -55.367%] (p = 0.00 < 0.05)
thrpt: [+124.05% +124.18% +124.32%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
int16/max nonnull time: [580.11 ns 580.76 ns 581.55 ns]
thrpt: [209.91 GiB/s 210.19 GiB/s 210.43 GiB/s]
change:
time: [-55.286% -55.246% -55.200%] (p = 0.00 < 0.05)
thrpt: [+123.21% +123.44% +123.65%]
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
3 (3.00%) high mild
11 (11.00%) high severe
int16/sum nullable time: [3.4950 µs 3.4976 µs 3.5007 µs]
thrpt: [34.870 GiB/s 34.901 GiB/s 34.927 GiB/s]
change:
time: [-97.406% -97.394% -97.383%] (p = 0.00 < 0.05)
thrpt: [+3721.6% +3737.4% +3754.5%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
8 (8.00%) high mild
2 (2.00%) high severe
int16/min nullable time: [5.2134 µs 5.2147 µs 5.2161 µs]
thrpt: [23.403 GiB/s 23.409 GiB/s 23.415 GiB/s]
change:
time: [-89.477% -89.410% -89.347%] (p = 0.00 < 0.05)
thrpt: [+838.73% +844.29% +850.30%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
6 (6.00%) high mild
2 (2.00%) high severe
int16/max nullable time: [5.1586 µs 5.1597 µs 5.1609 µs]
thrpt: [23.653 GiB/s 23.658 GiB/s 23.663 GiB/s]
change:
time: [-89.341% -89.279% -89.226%] (p = 0.00 < 0.05)
thrpt: [+828.20% +832.76% +838.20%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
int32/sum nonnull time: [1.1674 µs 1.1681 µs 1.1689 µs]
thrpt: [208.86 GiB/s 209.00 GiB/s 209.13 GiB/s]
change:
time: [-2.3865% -2.2896% -2.2025%] (p = 0.00 < 0.05)
thrpt: [+2.2521% +2.3433% +2.4448%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
int32/min nonnull time: [1.1593 µs 1.1600 µs 1.1607 µs]
thrpt: [210.33 GiB/s 210.46 GiB/s 210.59 GiB/s]
change:
time: [-55.402% -55.344% -55.297%] (p = 0.00 < 0.05)
thrpt: [+123.70% +123.93% +124.23%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
int32/max nonnull time: [1.1613 µs 1.1619 µs 1.1625 µs]
thrpt: [210.02 GiB/s 210.13 GiB/s 210.24 GiB/s]
change:
time: [-55.217% -55.186% -55.155%] (p = 0.00 < 0.05)
thrpt: [+122.99% +123.14% +123.30%]
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
int32/sum nullable time: [3.6026 µs 3.6048 µs 3.6077 µs]
thrpt: [67.673 GiB/s 67.727 GiB/s 67.768 GiB/s]
change:
time: [-97.340% -97.335% -97.329%] (p = 0.00 < 0.05)
thrpt: [+3644.1% +3652.3% +3659.5%]
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
2 (2.00%) low severe
1 (1.00%) low mild
4 (4.00%) high mild
6 (6.00%) high severe
int32/min nullable time: [12.045 µs 12.049 µs 12.054 µs]
thrpt: [20.255 GiB/s 20.263 GiB/s 20.269 GiB/s]
change:
time: [-75.866% -75.755% -75.641%] (p = 0.00 < 0.05)
thrpt: [+310.52% +312.45% +314.36%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe
int32/max nullable time: [12.051 µs 12.055 µs 12.060 µs]
thrpt: [20.244 GiB/s 20.252 GiB/s 20.259 GiB/s]
change:
time: [-75.752% -75.639% -75.524%] (p = 0.00 < 0.05)
thrpt: [+308.56% +310.50% +312.41%]
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
int64/sum nonnull time: [2.4296 µs 2.4311 µs 2.4327 µs]
thrpt: [200.72 GiB/s 200.85 GiB/s 200.97 GiB/s]
change:
time: [-1.0469% -0.7523% -0.3929%] (p = 0.00 < 0.05)
thrpt: [+0.3944% +0.7580% +1.0580%]
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe
int64/min nonnull time: [2.4356 µs 2.4377 µs 2.4403 µs]
thrpt: [200.09 GiB/s 200.30 GiB/s 200.48 GiB/s]
change:
time: [-54.033% -53.990% -53.940%] (p = 0.00 < 0.05)
thrpt: [+117.11% +117.34% +117.55%]
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
int64/max nonnull time: [2.4404 µs 2.4419 µs 2.4436 µs]
thrpt: [199.82 GiB/s 199.96 GiB/s 200.09 GiB/s]
change:
time: [-53.982% -53.922% -53.872%] (p = 0.00 < 0.05)
thrpt: [+116.79% +117.02% +117.31%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low mild
6 (6.00%) high mild
3 (3.00%) high severe
int64/sum nullable time: [7.1891 µs 7.1921 µs 7.1958 µs]
thrpt: [67.856 GiB/s 67.892 GiB/s 67.920 GiB/s]
change:
time: [-94.725% -94.718% -94.710%] (p = 0.00 < 0.05)
thrpt: [+1790.4% +1793.1% +1795.7%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
4 (4.00%) high mild
5 (5.00%) high severe
int64/min nullable time: [24.719 µs 24.728 µs 24.738 µs]
thrpt: [19.738 GiB/s 19.746 GiB/s 19.753 GiB/s]
change:
time: [-51.258% -51.041% -50.833%] (p = 0.00 < 0.05)
thrpt: [+103.39% +104.25% +105.16%]
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe
int64/max nullable time: [24.704 µs 24.712 µs 24.722 µs]
thrpt: [19.751 GiB/s 19.759 GiB/s 19.765 GiB/s]
change:
time: [-54.224% -54.086% -53.942%] (p = 0.00 < 0.05)
thrpt: [+117.12% +117.80% +118.45%]
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
string/min nonnull time: [141.44 µs 141.58 µs 141.77 µs]
thrpt: [462.26 Melem/s 462.90 Melem/s 463.36 Melem/s]
change:
time: [-0.7023% -0.5395% -0.4004%] (p = 0.00 < 0.05)
thrpt: [+0.4020% +0.5424% +0.7073%]
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) low mild
4 (4.00%) high mild
4 (4.00%) high severe
string/max nonnull time: [141.32 µs 141.46 µs 141.66 µs]
thrpt: [462.64 Melem/s 463.28 Melem/s 463.75 Melem/s]
change:
time: [-0.5020% -0.2694% -0.0027%] (p = 0.03 < 0.05)
thrpt: [+0.0027% +0.2701% +0.5045%]
Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
6 (6.00%) high mild
12 (12.00%) high severe
string/min nullable time: [267.85 µs 268.07 µs 268.30 µs]
thrpt: [244.26 Melem/s 244.47 Melem/s 244.68 Melem/s]
change:
time: [+1.4168% +1.6011% +1.7800%] (p = 0.00 < 0.05)
thrpt: [-1.7488% -1.5758% -1.3970%]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
string/max nullable time: [281.09 µs 281.47 µs 281.84 µs]
thrpt: [232.53 Melem/s 232.84 Melem/s 233.15 Melem/s]
change:
time: [+1.6971% +1.8843% +2.0923%] (p = 0.00 < 0.05)
thrpt: [-2.0494% -1.8495% -1.6688%]
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
```
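When reading the output above: Criterion prints each `time` and `thrpt` as [lower bound, estimate, upper bound] of a confidence interval, and `change` relative to the previous run (here, master). A time change of Δ implies a speedup of 1/(1 + Δ), which is where the large `thrpt` percentages come from. The sketch below checks this against the `float32/sum nonnull` row. The input size of 65,536 values is not stated in the comment; it is an assumption inferred from time × throughput, and it also matches the `Melem/s` figures of the string benchmarks.

```rust
/// Speedup implied by a relative change in time, e.g. -0.93853 for -93.853%.
fn speedup_from_time_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change)
}

/// Throughput in GiB/s (2^30 bytes) for `bytes` processed in `seconds`.
fn gib_per_sec(bytes: f64, seconds: f64) -> f64 {
    bytes / seconds / (1u64 << 30) as f64
}

fn main() {
    // float32/sum nonnull: estimated time change of -93.853%.
    let speedup = speedup_from_time_change(-0.93853);
    println!("speedup: {:.2}x ({:+.1}% throughput)", speedup, (speedup - 1.0) * 100.0);

    // Assumed input: 65_536 f32 values (4 bytes each), estimated time 3.4134 µs.
    let bytes = 65_536.0 * 4.0;
    println!("throughput: {:.1} GiB/s", gib_per_sec(bytes, 3.4134e-6));
}
```

With Δ = -0.93853 this gives a speedup of about 16.27x, i.e. roughly +1527% throughput, and 262,144 bytes in 3.4134 µs is about 71.5 GiB/s, both matching the reported estimates up to rounding.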