hhhizzz commented on PR #10141:
URL: https://github.com/apache/arrow-rs/pull/10141#issuecomment-4732734563
I did a quick TPC-DS follow-up on this PR to check whether the bitmap-backed
`RowSelection` change shows any broad regression in DataFusion. This is
**not**
a formal 10-round benchmark result; it is a 3-round/debug signal plus
targeted
rechecks.
Setup:
- TPC-DS SF10, all 99 queries
- 3 full rounds, 1 iteration per query
- CPUSET `0-23`
- Parquet pushdown filters / reorder filters / pruning all enabled
- Arrow baseline: PR base `be664614ab`
- Arrow current: this PR head `7f40e67a`
- DataFusion base: `ddc157d10e8`, with only Cargo patch wiring changed
between the two sides
Full 3-round result was effectively neutral:
```text
main total: 175786.07 ms
current total: 175010.49 ms
geomean current/main: 0.993443
runner-reported current speedup: 0.660%
failures: 0
```
The round-level direction was not stable:
```text
round 1: +2.18% current speedup
round 2: +6.44% current speedup
round 3: -6.22% current speedup
```
Some queries initially looked slower in the full run, so I rechecked the
largest
apparent regressions in isolation with 5 rounds each:
| query | isolated geomean current/main | bucket | range |
|---:|---:|---|---:|
| q39 | 1.014365 | neutral | 0.991326-1.057961 |
| q23 | 1.045529 | neutral | 0.980200-1.171810 |
| q28 | 0.991207 | neutral | 0.953509-1.051072 |
| q47 | 1.011234 | neutral | 0.989303-1.030131 |
| q87 | 0.989210 | neutral | 0.957960-1.017851 |
I also ran `dfbench tpcds --debug` for those queries and compared
`DataSourceExec` metrics. I did not see evidence of row-pushdown getting
worse
on the PR side. For example, summed task metrics showed:
| query/table | processing main -> current | scanning main -> current |
row_pushdown main -> current |
|---|---:|---:|---:|
| q39 `inventory` | 752.4 -> 764.4 ms | 7770.0 -> 7660.0 ms | 74.9 -> 74.5
ms |
| q23 `store_sales` | 23310.0 -> 22520.0 ms | 87870.0 -> 84350.0 ms |
13157.2 -> 12881.9 ms |
| q28 `store_sales` | 15470.0 -> 14710.0 ms | 81720.0 -> 78300.0 ms | 247.2
-> 243.8 ms |
Those metrics are summed across tasks, not wall-clock time, but they are
useful
for attribution. The main takeaway is that the large per-query slowdowns from
the 3-round full run did not reproduce in isolated runs and do not appear to
come from row-pushdown evaluation.
My interpretation is that standard TPC-DS is mostly exercising the existing
selector-backed `RowSelection::from_filters` path, not the new "caller
already
has a `BooleanBuffer`" path. So near-neutral TPC-DS is expected and should
not
block this PR by itself.
A few suggestions:
1. Keep the PR focused on first-class bitmap-backed `RowSelection`; this is
the
right shape for external indexes / FTS / runtime filters that already
produce
a bitmap.
2. Add or keep direct benchmarks for the existing selector-backed operations
(`from_filters`, `and_then`, `intersection`, `split_off`, `limit`,
`offset`)
because the PR changes the common representation and iterator surface even
when callers do not use the new bitmap constructor.
3. Add an end-to-end benchmark where the access plan starts as a
`BooleanBuffer`,
with both fragmented and clustered masks. Standard TPC-DS is not a strong
workload for that new path.
4. For `Auto` with mask-backed input, consider documenting the current
"preserve
mask" behavior or adding a shape gate. The raw representation break-even
is
roughly `selector_count > row_count / 128`; fragmented masks strongly
favor
bitmap, but clustered masks can still be better as selectors.
Overall, I see the TPC-DS signal as neutral, while the synthetic
fragmented-selection benchmark still strongly supports the API direction.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]