ShreyeshArangath opened a new pull request, #1381:
URL: https://github.com/apache/datafusion-python/pull/1381
# Which issue does this PR close?
Closes #1379
# Rationale for this change
Today, DataFusion Python only exposes execution metrics through formatted
console output via `explain(analyze=True)`.
This makes it difficult to programmatically inspect execution behavior.
There is currently no structured API for accessing the per-operator runtime
metrics collected during execution, such as `output_rows`, `elapsed_compute`,
and `spill_count`.
This PR introduces a structured Python interface for execution metrics,
mirroring the Rust API in `datafusion::physical_plan::metrics`.
# What changes are included in this PR?
- Added plan caching to `PyDataFrame` so the physical plan used during
execution is retained and available for metrics access.
- Kept the existing `metrics()` method and added a `collect_metrics()` helper
that walks the execution plan tree and aggregates metrics from all operators.
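
The tree walk behind `collect_metrics()` can be sketched in plain Python. This is only an illustration of the idea, not the code in this PR: `PlanNode`, its fields, and the free function `collect_metrics` below are hypothetical stand-ins for the real plan and metrics types, and the operator names and row counts are made up.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical stand-in for one operator in a physical plan."""

    name: str
    metrics: dict[str, int] = field(default_factory=dict)
    children: list["PlanNode"] = field(default_factory=list)


def collect_metrics(root: PlanNode) -> list[tuple[str, dict[str, int]]]:
    """Return one (operator_name, metrics) pair per operator.

    Operators are visited depth-first, parent before children.
    """
    collected = []
    stack = [root]
    while stack:
        node = stack.pop()
        collected.append((node.name, node.metrics))
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return collected


# Illustrative three-operator plan: projection -> filter -> scan.
scan = PlanNode("DataSourceExec", {"output_rows": 100})
filt = PlanNode("FilterExec", {"output_rows": 42}, [scan])
plan = PlanNode("ProjectionExec", {"output_rows": 42}, [filt])
result = collect_metrics(plan)
```

An explicit stack is used here instead of recursion so that deep plans cannot hit Python's recursion limit; the actual helper may be structured differently.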
# Are there any user-facing changes?
Users can now programmatically access execution metrics:
```python
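from datafusion import SessionContext

ctx = SessionContext()
# Assumes a table named "t" with a column "x" is already registered on ctx.
df = ctx.sql("SELECT * FROM t WHERE x > 1")
df.collect()  # run the query so that metrics are populated
plan = df.execution_plan()
metrics = plan.collect_metrics()
for operator_name, metrics_set in metrics:
    print(f"{operator_name}: {metrics_set.output_rows} rows")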
```
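
Once the `(operator_name, metrics_set)` pairs are available, they can be post-processed like any other Python data. The sketch below totals `output_rows` per operator name. It runs against `SimpleNamespace` stand-ins rather than real metrics objects, and it assumes (this is not confirmed by the PR) that `output_rows` may be `None` for operators that report no such metric. The `summarize` helper is hypothetical.

```python
from types import SimpleNamespace


def summarize(pairs):
    """Total output_rows per operator name, skipping operators that report none."""
    summary = {}
    for operator_name, metrics_set in pairs:
        # Assumption: a missing metric shows up as None (or is absent entirely).
        rows = getattr(metrics_set, "output_rows", None)
        if rows is not None:
            summary[operator_name] = summary.get(operator_name, 0) + rows
    return summary


# Stand-in data shaped like the pairs in the example above.
pairs = [
    ("FilterExec", SimpleNamespace(output_rows=42)),
    ("FilterExec", SimpleNamespace(output_rows=8)),
    ("CoalesceBatchesExec", SimpleNamespace(output_rows=None)),
]
summary = summarize(pairs)
```

Operators that share a name are summed together here; keying on position in the list instead would keep them separate.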
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]