adamreeve commented on PR #6953:
URL: https://github.com/apache/arrow-rs/pull/6953#issuecomment-2576679001
Benchmark results from the new benchmarks before changing the interning
behaviour:
```
write_batch primitive/4096 values float with NaNs
time: [5.6968 ms 5.7060 ms 5.7141 ms]
thrpt: [9.6186 MiB/s 9.6324 MiB/s 9.6479 MiB/s]
Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) low severe
4 (4.00%) low mild
1 (1.00%) high mild
write_batch primitive/4096 values float with no NaNs
time: [383.44 µs 383.65 µs 383.85 µs]
thrpt: [143.18 MiB/s 143.26 MiB/s 143.34 MiB/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
```
This shows that writing with 50% NaN values is much slower than with no NaNs: about 5.7 ms versus 0.38 ms, roughly 15 times slower.
After the change, performance with NaNs is very similar to without NaNs:
```
write_batch primitive/4096 values float with NaNs
time: [406.40 µs 406.63 µs 406.88 µs]
thrpt: [135.08 MiB/s 135.16 MiB/s 135.24 MiB/s]
change:
time: [-92.875% -92.861% -92.845%] (p = 0.00 < 0.05)
thrpt: [+1297.6% +1300.7% +1303.5%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
write_batch primitive/4096 values float with no NaNs
time: [382.52 µs 384.16 µs 385.50 µs]
thrpt: [142.58 MiB/s 143.07 MiB/s 143.68 MiB/s]
change:
time: [+0.1803% +0.3520% +0.5192%] (p = 0.00 < 0.05)
thrpt: [-0.5165% -0.3507% -0.1799%]
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) low severe
```
(I removed the `4096 values float with no NaNs` benchmark from this PR after
running these benchmarks, as I don't think there's much value in keeping it.)
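For readers wondering why NaNs can hurt interning at all, here is a minimal sketch. It is not the arrow-rs interner: `intern`, `eq_by_value`, `eq_by_bits` and `dictionary_len` are hypothetical names, and the sketch assumes the slowdown comes from NaN never comparing equal to itself under `==`, so every NaN ends up as a fresh dictionary entry. Comparing bit patterns instead lets all identical NaNs share one entry.

```rust
/// Toy interner (hypothetical, not the arrow-rs implementation): returns the
/// index of `value` in `dict`, appending it when no existing entry matches.
/// `same` decides whether two values count as the same dictionary entry.
fn intern(dict: &mut Vec<f32>, value: f32, same: fn(f32, f32) -> bool) -> usize {
    if let Some(idx) = dict.iter().position(|&existing| same(existing, value)) {
        return idx;
    }
    dict.push(value);
    dict.len() - 1
}

/// IEEE 754 equality: NaN is never equal to anything, including itself.
fn eq_by_value(a: f32, b: f32) -> bool {
    a == b
}

/// Bitwise equality: two NaNs with the same bit pattern compare equal.
fn eq_by_bits(a: f32, b: f32) -> bool {
    a.to_bits() == b.to_bits()
}

/// Interns every value and reports how many dictionary entries were created.
fn dictionary_len(values: &[f32], same: fn(f32, f32) -> bool) -> usize {
    let mut dict = Vec::new();
    for &v in values {
        intern(&mut dict, v, same);
    }
    dict.len()
}

fn main() {
    // 4096 values, half of them NaN, mirroring the shape of the benchmark input.
    let values: Vec<f32> = (0..4096)
        .map(|i| if i % 2 == 0 { f32::NAN } else { 1.0 })
        .collect();

    // With `==`, each NaN becomes its own entry; with bit comparison they collapse.
    println!("entries with ==      : {}", dictionary_len(&values, eq_by_value));
    println!("entries with to_bits : {}", dictionary_len(&values, eq_by_bits));
}
```

In this toy version the `==` variant grows the dictionary by one entry per NaN, while the bitwise variant keeps it at two entries. Whether the real interner degrades for exactly this reason is an assumption here, not something the benchmark output alone confirms.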