jverhoeks commented on issue #2376: URL: https://github.com/apache/iceberg-rust/issues/2376#issuecomment-4330042368
Update — tried two follow-up "fixes" on top of the per-task DynamicPredicate impl, both backfired and got reverted. Sharing the lessons so anyone working on the actual API in iceberg-rust doesn't have to relearn: **Attempt 1 — cap on `Predicate::Set` size in the converter** A `HashJoinExec` build side can produce a runtime `InListExpr` with hundreds of thousands of values (e.g. TPC-H q04 at SF10: ~580K orderkeys probed back into lineitem). Setting a cap (4096) so oversized lists fall through to a per-batch evaluator looked safe but was net-negative: queries with multiple runtime filters keep paying construction cost on the unaffected filters while losing the row-group pruning that the big filter was actually delivering. Worst-of-both-worlds. **Attempt 2 — `Arc::ptr_eq` cache keyed on `DynamicFilterPhysicalExpr::current()`** The intention was to skip retranslation when the filter hadn't changed between tasks. In DataFusion 53 `current()` calls `remap_children`, which in the join-pushdown path returns a freshly-built `Arc<dyn PhysicalExpr>` each call (because column indices are remapped to the probe-side schema). So `Arc::ptr_eq` essentially never matched, the cache never hit, and I just added Mutex contention across the scan's concurrent FileScanTask processors. Net SF10 wallclock: ~+5% vs no cache. **Practical takeaway for the API design** Per-task `current()` calls hit a cost floor of the underlying `Predicate` construction + `bind()` + row-group eval. For multi-join queries with many filters above one big scan, that floor is non-trivial. Two ways to lower it without breaking caching: 1. Expose `DynamicFilterPhysicalExpr` `generation: u64` (or any cheap monotonic) so consumers can skip retranslation on the no-change path. 2. Or have the `DynamicPredicate` trait return an opaque "version" alongside the predicate, letting the reader make its own cache key. Either is a small extension to what's proposed in this issue, and both are forward-compatible. Happy to fold this into the PR draft when we get to it. (Bench evidence for the original Path B-2 working as designed and these failed follow-ups is in our downstream branch; can share if useful.) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
