andygrove opened a new pull request, #3981: URL: https://github.com/apache/datafusion-comet/pull/3981
## Which issue does this PR close? Related to #3949. ## Rationale for this change \`stageContainsDPPScan\` in \`CometShuffleExchangeExec\` (introduced by #3879) uses a plain \`s.child.exists(...)\` walk to decide whether the shuffle subtree contains a DPP scan. Under AQE, once a child stage materializes, its subtree is replaced by a \`ShuffleQueryStageExec\` — a \`LeafExecNode\` whose \`children\` is \`Seq.empty\`. \`.exists\` cannot descend through it, so the DPP scan becomes invisible on the stage-prep pass. The consequence: the same shuffle that correctly fell back to Spark at initial planning gets converted to Comet the second time the rule runs, because \`stageContainsDPPScan\` returns \`false\` once the inner stage has materialized. That flip produces plan-shape inconsistencies across the two passes — the suspected mechanism behind the \`ColumnarToRowExec\` canonicalization assertion in #3949. ## What changes are included in this PR? - Walk the tree explicitly instead of relying on \`exists\`, and descend into \`QueryStageExec.plan\` so both passes see the same subtree. ## How are these changes tested? - New \`CometDppFallbackConsistencySuite\` builds a DPP-shaped query, takes the real \`ShuffleExchangeExec\` out of the plan, wraps it in a real \`ShuffleQueryStageExec\` (the exact wrapper AQE produces when a stage materializes), and asserts that \`columnarShuffleSupported\` still falls back to Spark. Without this fix the outer wrapping flips the decision to Comet. - Existing \`DPP fallback\` and \`DPP fallback avoids inefficient Comet shuffle (#3874)\` tests in \`CometExecSuite\` still pass. I have not been able to reproduce the full #3949 crash even with the flip demonstrated, so this PR is positioned as a correctness fix for the decision-stability invariant. It may or may not close #3949 on its own. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
