andygrove opened a new pull request, #3696:
URL: https://github.com/apache/datafusion-comet/pull/3696
## Which issue does this PR close?
Closes #3315.
## Rationale for this change
Three Spark SQL tests were ignored in `native_datafusion` scan mode because of
plan structure differences. For the two streaming tests, the root cause was that
`CometNativeScanExec` did not expose a `numOutputRows` metric, which Spark's
streaming `ProgressReporter` uses to count input rows.
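The failure mode can be illustrated with a simplified, hypothetical Python model. It is not Spark's actual `ProgressReporter` logic. In this model, input rows are counted by summing a `numOutputRows` metric over source scan nodes. `Metric`, `ScanNode`, and `count_input_rows` are illustrative names assumed for this sketch. A scan that exposes only `output_rows` contributes nothing to the count.

```python
# Hypothetical, simplified model of how a streaming progress reporter could
# derive input-row counts from scan-node metrics. Not Spark's actual code.
from dataclasses import dataclass, field


@dataclass
class Metric:
    """Stand-in for a mutable accumulator such as Spark's SQLMetric."""
    value: int = 0

    def add(self, n: int) -> None:
        self.value += n


@dataclass
class ScanNode:
    """Stand-in for a physical scan operator exposing named metrics."""
    name: str
    metrics: dict = field(default_factory=dict)


def count_input_rows(scans):
    """Sum the 'numOutputRows' metric across source scans.

    A scan that does not expose 'numOutputRows' is silently skipped,
    so its rows never reach the reported total.
    """
    total = 0
    for scan in scans:
        metric = scan.metrics.get("numOutputRows")
        if metric is not None:
            total += metric.value
    return total


# A native scan that only exposes the native metric name.
native = ScanNode("CometNativeScanExec", {"output_rows": Metric()})
native.metrics["output_rows"].add(10)
print(count_input_rows([native]))  # 0: the 10 rows are invisible to the reporter
```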
## What changes are included in this PR?
- **`CometNativeScanExec`**: Add `numOutputRows` as an alias for the
`output_rows` native metric. Both keys reference the same `SQLMetric` instance,
so when native code updates `output_rows`, Spark's streaming framework sees the
correct value via `numOutputRows`.
- **`dev/diffs/3.5.8.diff`**: Remove `IgnoreCometNativeDataFusion` tags from
  three tests:
  - `FileDataSourceV2FallBackSuite`: "Fallback Parquet V2 to V1" (assertion
    already handles `CometNativeScanExec`)
  - `StreamingQuerySuite`: "SPARK-41198: input row calculation with CTE"
  - `StreamingQuerySuite`: "SPARK-41199: input row calculation with mixed-up
    of DSv1 and DSv2 streaming sources"
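The aliasing approach described for `CometNativeScanExec` can be sketched in Python. In this hypothetical model, one metric object is registered under both keys, so an update made through `output_rows` is immediately visible through `numOutputRows`. The `Metric` class and `build_scan_metrics` helper are illustrative stand-ins, not Comet or Spark APIs.

```python
# Hypothetical sketch: registering one metric object under two keys so that
# updates through the native name are visible through the Spark name.
class Metric:
    """Stand-in for a mutable accumulator such as Spark's SQLMetric."""

    def __init__(self) -> None:
        self.value = 0

    def add(self, n: int) -> None:
        self.value += n


def build_scan_metrics() -> dict:
    output_rows = Metric()
    return {
        "output_rows": output_rows,    # key updated by native code
        "numOutputRows": output_rows,  # alias read by streaming progress reporting
    }


metrics = build_scan_metrics()
metrics["output_rows"].add(42)         # simulate a native-side update
print(metrics["numOutputRows"].value)  # 42
```

Registering the same object, rather than copying its value, is what matters here. A copy taken when the metrics map is built would remain at 0 after the native side updates `output_rows`.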
## How are these changes tested?
All three tests were verified locally with
`COMET_PARQUET_SCAN_IMPL=native_datafusion` against Spark 3.5.8, with the
updated diff applied.