andygrove opened a new pull request, #3982:
URL: https://github.com/apache/datafusion-comet/pull/3982
## Which issue does this PR close?
Related to #3949.
## Rationale for this change
Comet's shuffle-support predicates (\`nativeShuffleSupported\`,
\`columnarShuffleSupported\` in \`CometShuffleExchangeExec\`) are called at
both initial planning and AQE stage-prep. Between those passes, AQE wraps
already-materialized child stages in \`ShuffleQueryStageExec\` — a
\`LeafExecNode\` whose \`children\` is \`Seq.empty\`. A naive re-evaluation of
the support check can therefore flip the decision.
Concrete example (and the suspected trigger for #3949):
\`stageContainsDPPScan\` walks \`s.child.exists(...)\` looking for a
\`FileSourceScanExec\` with a \`PlanExpression\` partition filter. When the DPP
subtree sits under a materialized \`ShuffleQueryStageExec\`, \`.exists\` stops
at the wrapper and the DPP scan becomes invisible. The same shuffle that
correctly fell back to Spark at initial planning is then converted to Comet at
stage prep — producing plan-shape inconsistencies across the two passes.
This PR takes the same approach \`CometNativeScan.isSupported\` already uses
for scans: if the node already carries any Comet fallback tag from a prior rule
pass, preserve that decision rather than re-deriving it.
## What changes are included in this PR?
- \`nativeShuffleSupported\` and \`columnarShuffleSupported\` short-circuit
to \`false\` when \`hasExplainInfo(s)\` is true.
- New \`CometShuffleFallbackStickinessSuite\`:
- direct: tags a synthetic shuffle and asserts both predicates return
\`false\`.
- end-to-end: plans a DPP query, runs the support check once to establish
the fallback tag, then swaps the shuffle's child for an opaque \`LeafExecNode\`
that hides the DPP subtree (mimicking a materialized stage), and asserts the
second call still falls back.
Existing \`DPP fallback\` and \`DPP fallback avoids inefficient Comet
shuffle (#3874)\` tests in \`CometExecSuite\` continue to pass.
## How are these changes tested?
- New stickiness suite (two tests) passes.
- Existing DPP tests pass.
- I have **not** been able to reproduce the full #3949 \`ColumnarToRowExec\`
canonicalization assertion with small synthetic DPP queries, so this PR is
framed as a correctness fix for the decision-stability invariant — it may or
may not close #3949 on its own.
Draft until we either reproduce the full crash end-to-end, or confirm the
fix resolves the awslabs benchmark failure.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]