zhuqi-lucas commented on PR #21976: URL: https://github.com/apache/datafusion/pull/21976#issuecomment-4438274721
Quick update — pushed all three steps. Branch is now `ensure-requirements` (commits `d5bd105e8` and `47cd38363` on top of the original). **Steps 1+2 (`d5bd105e8`)** — removed the disconnected `EnforceDistribution` / `EnforceSorting` rule structs, and retargeted the existing tests in `core/tests/physical_optimizer/`: - `enforce_distribution.rs` — `Run::Distribution` / `Run::Sorting` branches both call `EnsureRequirements::new()`. Legacy run sequences kept verbatim so existing assertions retain their shape; the merged rule is idempotent so the previously-different orderings now converge. - `enforce_sorting.rs` — the historical `[Dist, Sort]` vs `[Sort, Dist, Sort]` idempotency comparison is rewritten as "running EnsureRequirements N times == running it once". - 78 snapshots refreshed; consistent pattern is `SortExec + CoalescePartitionsExec` (blocking) → `SortPreservingMergeExec` (streaming), since EnsureRequirements now applies `parallelize_sorts` + `replace_with_order_preserving_variants` on inputs the single-rule path didn't reach. Improvements rather than regressions. **Step 3 (`47cd38363`)** — went ahead and made the call: dropped `ensure_requirements/new_tests.rs` entirely. For the "which new-behavior cases to keep" question I raised earlier: turns out every test in both `new_tests.rs` and `ensure_requirements/mod.rs` inline tests uses either `optimize_and_sanity_check` (production-502 regression) or `assert_idempotent` (merged-rule core property). None are pure smoke tests. The real redundancy was organizational — two files with duplicated helpers and substantial plan-shape overlap (hash join, aggregate, window, projection, fetch, filter all covered in both). So I deleted `new_tests.rs` and kept the `mod.rs` inline tests as-is. All the named bug regressions (#14150, PR53, PR54) live there, plus a comprehensive sweep of plan shapes with idempotency + SanityCheckPlan-validity assertions. If you'd prefer a more aggressive cull (keep only `#14150` / PR53 / PR54 named regressions, drop the rest), happy to do that as a follow-up commit — just say the word. Test results after the cleanup: - `core_integration physical_optimizer::` 454 passed - `datafusion-physical-optimizer` (lib) 59 passed (down from 79 with `new_tests.rs` dropped) - `clippy --all-targets -- -D warnings` clean - `cargo fmt --check` clean CI is running on the push now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
