ClSlaid commented on PR #9755:
URL: https://github.com/apache/arrow-rs/pull/9755#issuecomment-4609603062
Benchmark update for the latest version of this PR.
I reran the full `filter:` benchmark set after reverting the dense-filter
experiments, so the `current` numbers below correspond to the current
per-column implementation.
Command:
```bash
CARGO_TARGET_DIR=/Users/cl/Projects/CLionProjects/arrow-rs/target \
cargo bench -p arrow --features test_utils --bench coalesce_kernels -- \
'filter:' \
--sample-size 10 --warm-up-time 1 --measurement-time 2 \
--baseline pre_per_column_filter_all
```
Definitions:
- **baseline**: `apache/main` (`apache_main_filter_all`)
- **previous**: previous PR implementation before the per-column refactor
(`pre_per_column_filter_all`)
- **current**: current per-column implementation
- Relative time is normalized to baseline, so lower is better.
- `current vs previous` is the relative change in the geomean of per-case mean times, so negative is faster.
- The improved/regressed/no-change counts use Criterion's confidence
interval for current vs previous.
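
As a rough illustration of how the relative-time and `current vs previous` columns relate, here is a minimal Python sketch. The per-case times are hypothetical placeholders, not values from this run, and the helper names `geomean` and `relative_change` are made up for the example; only the final check uses numbers from the `all` row of the table.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def relative_change(current, previous):
    """Signed fractional change; negative means `current` is faster."""
    return current / previous - 1.0


# Hypothetical per-case mean times (microseconds), NOT taken from the benchmark.
baseline_us = [120.0, 45.0, 300.0]
previous_us = [100.0, 44.0, 290.0]
current_us = [80.0, 43.0, 295.0]

# Relative time, normalized to baseline (lower is better).
prev_rel = geomean(previous_us) / geomean(baseline_us)
curr_rel = geomean(current_us) / geomean(baseline_us)

# The change computed from raw geomeans equals the change computed
# from the normalized columns, because the baseline geomean cancels.
change = relative_change(geomean(current_us), geomean(previous_us))

# Sanity check against the "all" row: 0.905x previous, 0.811x current.
all_row_change = relative_change(0.811, 0.905)
print(f"{all_row_change:+.1%}")  # prints -10.4%
```

Because the baseline geomean cancels, the `Current vs previous` column can be recomputed from the `Previous` and `Current` columns alone, up to rounding.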
| Group | Cases | Baseline | Previous | Current | Current vs previous | Current vs previous significance |
|---|---:|---:|---:|---:|---:|---|
| all | 104 | 1.000x | 0.905x | 0.811x | -10.4% | 59 improved / 13 regressed / 32 no-change |
| selectivity 0.001 | 26 | 1.000x | 0.769x | 0.584x | -24.1% | 23 improved / 0 regressed / 3 no-change |
| selectivity 0.01 | 26 | 1.000x | 0.885x | 0.755x | -14.7% | 22 improved / 0 regressed / 4 no-change |
| selectivity 0.1 | 26 | 1.000x | 0.987x | 0.969x | -1.8% | 10 improved / 4 regressed / 12 no-change |
| selectivity 0.8 | 26 | 1.000x | 0.998x | 1.011x | +1.4% | 4 improved / 9 regressed / 13 no-change |
Breakdown by null density and selectivity:
| Nulls / selectivity | Cases | Baseline | Previous | Current | Current vs previous | Improved / regressed / no-change |
|---|---:|---:|---:|---:|---:|---|
| nulls 0, sel 0.001 | 13 | 1.000x | 0.730x | 0.536x | -26.6% | 13 / 0 / 0 |
| nulls 0, sel 0.01 | 13 | 1.000x | 0.852x | 0.710x | -16.7% | 11 / 0 / 2 |
| nulls 0, sel 0.1 | 13 | 1.000x | 0.979x | 0.951x | -2.9% | 6 / 1 / 6 |
| nulls 0, sel 0.8 | 13 | 1.000x | 0.990x | 0.998x | +0.8% | 3 / 4 / 6 |
| nulls 0.1, sel 0.001 | 13 | 1.000x | 0.810x | 0.635x | -21.6% | 10 / 0 / 3 |
| nulls 0.1, sel 0.01 | 13 | 1.000x | 0.920x | 0.802x | -12.8% | 11 / 0 / 2 |
| nulls 0.1, sel 0.1 | 13 | 1.000x | 0.995x | 0.989x | -0.6% | 4 / 3 / 6 |
| nulls 0.1, sel 0.8 | 13 | 1.000x | 1.006x | 1.025x | +1.9% | 1 / 5 / 7 |
Largest current-vs-previous improvements:
| Case | Current vs previous |
|---|---:|
| primitive, nulls 0, selectivity 0.001 | -67.1% |
| primitive, nulls 0.1, selectivity 0.001 | -51.2% |
| primitive, nulls 0, selectivity 0.01 | -47.2% |
| mixed_utf8view max_len=20, nulls 0, selectivity 0.001 | -41.0% |
| mixed_binaryview max_len=20, nulls 0, selectivity 0.001 | -40.2% |
Largest current-vs-previous regressions:
| Case | Current vs previous | Current vs baseline |
|---|---:|---:|
| single_utf8view, nulls 0, selectivity 0.8 | +21.9% | +0.8% |
| single_utf8view, nulls 0.1, selectivity 0.8 | +17.1% | +3.7% |
| primitive, nulls 0, selectivity 0.8 | +9.0% | -0.3% |
| mixed_binaryview max_len=8, nulls 0, selectivity 0.8 | +8.9% | -0.1% |
| mixed_binaryview max_len=128, nulls 0.1, selectivity 0.8 | +7.0% | +8.4% |
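
The two columns in the regression table also imply where the previous implementation stood relative to baseline for each case. The sketch below derives that from the rounded percentages above, so the results are approximate; the function name `implied_previous_vs_baseline` is hypothetical and used only for illustration.

```python
def implied_previous_vs_baseline(cur_vs_prev, cur_vs_base):
    """Derive previous-vs-baseline from the two reported changes.

    current/baseline = (current/previous) * (previous/baseline), so
    previous/baseline = (1 + cur_vs_base) / (1 + cur_vs_prev).
    """
    return (1.0 + cur_vs_base) / (1.0 + cur_vs_prev) - 1.0


# (current vs previous, current vs baseline), copied from the table above.
regressions = {
    "single_utf8view, nulls 0, selectivity 0.8": (0.219, 0.008),
    "single_utf8view, nulls 0.1, selectivity 0.8": (0.171, 0.037),
    "primitive, nulls 0, selectivity 0.8": (0.090, -0.003),
    "mixed_binaryview max_len=8, nulls 0, selectivity 0.8": (0.089, -0.001),
    "mixed_binaryview max_len=128, nulls 0.1, selectivity 0.8": (0.070, 0.084),
}

implied = {
    case: implied_previous_vs_baseline(cvp, cvb)
    for case, (cvp, cvb) in regressions.items()
}

for case, change in implied.items():
    print(f"{case}: previous vs baseline {change:+.1%}")
```

Read this way, four of the five listed regressions are cases where the previous implementation was faster than baseline and the current one lands within a few percent of baseline.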
Summary: the current per-column implementation is materially faster for
sparse filters and is overall faster than both `apache/main` and the previous
PR implementation on this benchmark set. The remaining regressions are
concentrated in dense/high-selectivity cases, especially some `Utf8View` /
`BinaryView` cases. I left dense-filter-specific tuning out of this PR and plan
to treat that separately.