LucaCanali opened a new pull request #31367:
URL: https://github.com/apache/spark/pull/31367


   ### What changes were proposed in this pull request?
   This proposes to add SQLMetrics instrumentation for Python UDF execution.
   
   ### Why are the changes needed?
   This is aimed at improving monitoring and performance troubleshooting of 
Python code called by Spark via UDFs, Pandas UDFs, or MapPartitions. 
   
   ### Does this PR introduce _any_ user-facing change?
   The new SQL metrics are exposed to end users via the Web UI. They are 
visible in the SQL tab for the execution steps related to Python UDF 
execution, namely BatchEvalPython, ArrowEvalPython, AggregateInPandas, 
FlatMapGroupsInPandas, FlatMapCoGroupsInPandas, MapInPandas, WindowInPandas.
   See also the screenshot with the metrics introduced:
   
![](https://issues.apache.org/jira/secure/attachment/13019522/PythonSQLMetrics_Jira_Picture.png)
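   As a rough illustration (not part of this patch), a query along the 
following lines should produce a BatchEvalPython step in the physical plan, 
which is one of the steps where the metrics would show up in the SQL tab. 
The names `plus_one` and `run_example` are hypothetical, and the local 
session settings are assumptions chosen only to keep the sketch self-contained.

```python
def plus_one(x):
    """Plain Python function to be wrapped as a (non-Arrow) Python UDF."""
    return None if x is None else x + 1


def run_example():
    """Run a small query that evaluates a Python UDF (requires pyspark)."""
    # Imports are kept local so the helper above is usable without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import LongType

    # Assumption: a local single-threaded session is enough for a demo.
    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("python-udf-metrics-demo")
        .getOrCreate()
    )
    try:
        plus_one_udf = udf(plus_one, LongType())
        df = spark.range(1000).withColumn("y", plus_one_udf(col("id")))
        # The printed physical plan should contain a BatchEvalPython node.
        df.explain()
        # Trigger execution so the query appears in the SQL tab of the Web UI.
        return df.agg({"y": "sum"}).collect()[0][0]
    finally:
        spark.stop()
```

   Calling `run_example()` while the Web UI is reachable and opening the 
query in the SQL tab is one way to inspect the BatchEvalPython step. Using 
`pandas_udf` instead of `udf` would be expected to exercise ArrowEvalPython.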
   
   ### How was this patch tested?
   Tested manually; a Python test has also been added.

