ShreyeshArangath opened a new issue, #1379:
URL: https://github.com/apache/datafusion-python/issues/1379

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
    DataFusion Python currently provides execution metrics only through the 
`explain(analyze=True)` output, which displays metrics as formatted console 
text. There is no structured Python API to programmatically access per-operator 
metrics such as `output_rows`, `elapsed_compute`, `spill_count`, etc.
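
To illustrate the problem, here is a minimal sketch of the text scraping a caller has to do today. The sample lines are an assumption modeled on typical `explain(analyze=True)` output, and `parse_metrics_text` is a hypothetical helper, not part of DataFusion. The exact text layout is not a stable interface, so this approach is brittle.

```python
import re

# Assumed sample, modeled on typical explain(analyze=True) text.
# The real layout may differ between DataFusion versions.
SAMPLE = """\
FilterExec: value@0 > 100, metrics=[output_rows=42, elapsed_compute=1500ns]
  DataSourceExec: partitions=1, metrics=[output_rows=1000, elapsed_compute=800ns]
"""

# Captures everything between "metrics=[" and the closing "]".
METRICS_RE = re.compile(r"metrics=\[([^\]]*)\]")


def parse_metrics_text(text):
    """Hypothetical helper: scrape (operator, {metric: raw string}) pairs."""
    results = []
    for line in text.splitlines():
        match = METRICS_RE.search(line)
        if match is None:
            continue  # line carries no metrics section
        # The operator name is whatever precedes the first colon.
        operator = line.strip().split(":", 1)[0]
        pairs = (
            item.split("=", 1)
            for item in match.group(1).split(",")
            if "=" in item
        )
        # Values stay as raw strings; units such as "ns" are not interpreted.
        metrics = {key.strip(): value.strip() for key, value in pairs}
        results.append((operator, metrics))
    return results


parsed = parse_metrics_text(SAMPLE)
for operator, metrics in parsed:
    print(operator, metrics)
```

Every value comes back as an untyped string with its unit attached, which is the gap a structured API would close.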
   
   **Describe the solution you'd like**
   Expose a structured Python API to access execution metrics after running a 
query:
   ```py
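     # Proposed API sketch: `collect_metrics`, `execute_collect`, and
     # `metrics()` are the additions requested by this issue.
     from datafusion import SessionContext, collect_metrics
   
     ctx = SessionContext()
     # `my_table` stands in for a table already registered with the context
     df = ctx.sql("SELECT * FROM my_table WHERE value > 100")
     plan = df.execution_plan()
     plan.execute_collect(ctx)  # run the plan so its metrics are populated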
   
     # Access metrics on the plan
     metrics = plan.metrics()
     print(f"Rows: {metrics.output_rows}")
     print(f"CPU time: {metrics.elapsed_compute} ns")
   
     for operator_name, operator_metrics in collect_metrics(plan):
         print(f"{operator_name}: {operator_metrics.output_rows} rows")
   ```
   
   **Describe alternatives you've considered**
   N/A
   
   **Additional context**
   This mirrors the existing Rust API in `datafusion::physical_plan::metrics` and 
makes it accessible from Python. The metrics would only be populated after 
execution, matching DataFusion's semantics where metrics are collected during 
query execution.
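
As a rough illustration of the intended semantics, the following pure-Python sketch models a plan tree whose metrics are empty until execution and a pre-order walk that yields per-operator metrics. `PlanNode` and `OperatorMetrics` are hypothetical stand-ins, not DataFusion types, and this `collect_metrics` is only one possible shape for the proposed function.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class OperatorMetrics:
    """Hypothetical container for a few of the metrics named above."""
    output_rows: Optional[int] = None
    elapsed_compute: Optional[int] = None  # assumed to be nanoseconds
    spill_count: Optional[int] = None


@dataclass
class PlanNode:
    """Hypothetical stand-in for one operator of a physical plan."""
    name: str
    metrics: Optional[OperatorMetrics] = None  # None until the plan has run
    children: List["PlanNode"] = field(default_factory=list)


def collect_metrics(plan: PlanNode) -> Iterator[Tuple[str, OperatorMetrics]]:
    """Walk the plan in pre-order, skipping operators without metrics."""
    if plan.metrics is not None:
        yield plan.name, plan.metrics
    for child in plan.children:
        yield from collect_metrics(child)


# A two-operator plan as it might look after execution.
scan = PlanNode("DataSourceExec", OperatorMetrics(output_rows=1000, elapsed_compute=800))
executed = PlanNode("FilterExec", OperatorMetrics(output_rows=42, elapsed_compute=1500), [scan])

rows = [(name, m.output_rows) for name, m in collect_metrics(executed)]
print(rows)

# Before execution no metrics exist, so the walk yields nothing.
not_executed = list(collect_metrics(PlanNode("FilterExec")))
print(not_executed)
```

Yielding nothing for an unexecuted plan, rather than zeros, keeps "not run yet" distinguishable from "ran and produced no rows".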
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

