andygrove opened a new pull request, #3696:
URL: https://github.com/apache/datafusion-comet/pull/3696

   ## Which issue does this PR close?
   
   Closes #3315.
   
   ## Rationale for this change
   
   Three Spark SQL tests were ignored for `native_datafusion` scan mode due to 
plan structure differences. The root cause for the streaming tests was that 
`CometNativeScanExec` did not expose a `numOutputRows` metric, which Spark's 
streaming `ProgressReporter` uses to count input rows.
   
   ## What changes are included in this PR?
   
   - **`CometNativeScanExec`**: Add `numOutputRows` as an alias for the 
`output_rows` native metric. Both keys reference the same `SQLMetric` instance, 
so when native code updates `output_rows`, Spark's streaming framework sees the 
correct value via `numOutputRows`.
   - **`dev/diffs/3.5.8.diff`**: Remove `IgnoreCometNativeDataFusion` tags from 
three tests:
     - `FileDataSourceV2FallBackSuite`: "Fallback Parquet V2 to V1" (assertion 
already handles `CometNativeScanExec`)
     - `StreamingQuerySuite`: "SPARK-41198: input row calculation with CTE"
     - `StreamingQuerySuite`: "SPARK-41199: input row calculation with mixed-up 
of DSv1 and DSv2 streaming sources"
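   
   The aliasing described in the first bullet can be sketched without Spark as follows. This is a minimal illustration, not the code in this PR: `Metric` is a hypothetical stand-in for Spark's `SQLMetric`, and `MetricAliasSketch` and `buildMetrics` are names invented for the example. It only shows that when two map keys reference one instance, an update made through either key is visible through the other.

```scala
// Hypothetical, simplified stand-in for Spark's SQLMetric: a mutable counter.
final class Metric(val name: String) {
  private var total: Long = 0L
  def add(v: Long): Unit = total += v
  def value: Long = total
}

object MetricAliasSketch {
  // Build a metrics map in which two keys reference the SAME Metric instance.
  def buildMetrics(): Map[String, Metric] = {
    val outputRows = new Metric("number of output rows")
    Map(
      "output_rows" -> outputRows,   // key updated by the native side (per the PR text)
      "numOutputRows" -> outputRows  // alias read by the streaming progress reporting
    )
  }

  def main(args: Array[String]): Unit = {
    val metrics = buildMetrics()
    // Simulate the native side reporting rows under its own key.
    metrics("output_rows").add(42L)
    // The alias observes the same value because it is the same object.
    println(metrics("numOutputRows").value)
  }
}
```

   Sharing one instance, rather than copying the value into a second metric, means there is no synchronization step that could be missed or run at the wrong time.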
   
   ## How are these changes tested?
   
   All three tests were verified locally with 
`COMET_PARQUET_SCAN_IMPL=native_datafusion` against Spark 3.5.8 with the 
updated diff applied.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

