ernestprovo23 opened a new pull request, #20767: URL: https://github.com/apache/datafusion/pull/20767
## Which issue does this PR close?

Closes part of https://github.com/apache/datafusion/issues/18195, specifically the `elapsed_compute` baseline metric sub-item for Parquet scans.

## Rationale

`EXPLAIN ANALYZE` on Parquet scans reports `elapsed_compute` values like `14ns` for full table scans, which is misleading. The metric was never populated because no timer wrapped the per-batch compute work in the Parquet scan path.

## What changes are included in this PR?

Follows the same pattern established in PR #18901 (CSV fix):

1. Added `BaselineMetrics` instantiation in `ParquetOpener::open()` using the existing `metrics` and `partition_index` fields.
2. Wrapped the per-batch stream `.map()` closure with an `elapsed_compute` timer that measures projection, schema replacement, and metrics copy work.

Single file changed: `datafusion/datasource-parquet/src/opener.rs` (+7, -3 lines).

## Are these changes tested?

- All 81 existing tests in `datafusion-datasource-parquet` pass.
- Metric correctness was verified by observing realistic `elapsed_compute` values in `EXPLAIN ANALYZE` output (real scans no longer show nanosecond-level values).
- Per maintainer guidance from @2010YOUY01: "Testing if we have the time measured correct is tricky, I don't think there is a good way to do it. But for a large parquet file scan, several nanoseconds is definitely not reasonable."

## Are there any user-facing changes?

`EXPLAIN ANALYZE` output for Parquet scans will now show accurate `elapsed_compute` values reflecting actual CPU time spent on per-batch processing.
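The timer pattern from the list of changes (a scoped timer guard created inside the per-batch `.map()` closure, adding its elapsed time to `elapsed_compute` when dropped) can be illustrated with a minimal, std-only Rust sketch. This is not the PR's diff and does not use DataFusion's real `BaselineMetrics`/`Time` types: `Time`, `TimerGuard`, `Batch`, `project`, and `scan` are hypothetical stand-ins, and "projection" is reduced to selecting columns from a `Vec` of columns.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::Instant;

/// Hypothetical stand-in for a shared elapsed-time metric (nanoseconds).
#[derive(Clone, Default)]
struct Time {
    nanos: Arc<AtomicU64>,
}

impl Time {
    /// Start a scoped timer; its elapsed wall-clock time is added on drop.
    fn timer(&self) -> TimerGuard<'_> {
        TimerGuard { time: self, start: Instant::now() }
    }

    fn value(&self) -> u64 {
        self.nanos.load(Ordering::Relaxed)
    }
}

struct TimerGuard<'a> {
    time: &'a Time,
    start: Instant,
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed().as_nanos() as u64;
        self.time.nanos.fetch_add(elapsed, Ordering::Relaxed);
    }
}

/// Hypothetical batch type: a list of columns.
type Batch = Vec<Vec<i64>>;

/// Hypothetical per-batch work: keep only the requested columns.
fn project(batch: Batch, indices: &[usize]) -> Batch {
    indices.iter().map(|&i| batch[i].clone()).collect()
}

/// Apply per-batch work, timing only the work done inside the closure.
fn scan(batches: Vec<Batch>, indices: &[usize], elapsed_compute: &Time) -> Vec<Batch> {
    batches
        .into_iter()
        .map(|batch| {
            // Bound to `_timer` (not `_`) so the guard lives until the
            // closure returns and covers the projection work.
            let _timer = elapsed_compute.timer();
            project(batch, indices)
        })
        .collect()
}

fn main() {
    let elapsed_compute = Time::default();
    let batches: Vec<Batch> = (0..1_000)
        .map(|b| vec![vec![b; 1_000], vec![b * 2; 1_000], vec![b * 3; 1_000]])
        .collect();
    let out = scan(batches, &[0, 2], &elapsed_compute);
    println!("batches: {}, columns per batch: {}", out.len(), out[0].len());
    println!("elapsed_compute: {} ns", elapsed_compute.value());
}
```

The binding name matters in this sketch: `let _ = elapsed_compute.timer();` would drop the guard immediately and record close to zero time for each batch, whereas `_timer` keeps it alive for the whole closure body.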
