LucaCanali opened a new pull request #26953: [SPARK-30306][CORE][PYTHON] 
Instrument Python UDF execution time and metrics using Spark Metrics system
URL: https://github.com/apache/spark/pull/26953
 
 
   ### What changes were proposed in this pull request?
   This proposes to extend Spark's instrumentation with metrics aimed at drilling down on the performance of Python code called by Spark, whether via UDFs, Pandas UDFs, or `mapPartitions`. Relevant performance counters, notably execution time, are exposed using the Spark Metrics System (based on the Dropwizard library).
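   For illustration, a metrics source of this kind can be built on Spark's internal `Source` trait together with a Dropwizard `MetricRegistry`. The sketch below is a minimal, hypothetical example: the metric names and the exact set of counters are assumptions for illustration, not necessarily those implemented in this PR.

```scala
import com.codahale.metrics.MetricRegistry
import org.apache.spark.metrics.source.Source

// Minimal sketch of a Dropwizard-backed metrics source, modeled on
// Spark's internal org.apache.spark.metrics.source.Source trait.
// All metric names below are illustrative placeholders.
class PythonMetricsSource extends Source {
  override val sourceName: String = "PythonMetrics"
  override val metricRegistry: MetricRegistry = new MetricRegistry

  // Hypothetical counter: cumulative time spent executing Python code, in ms.
  val pythonTotalExecutionTime =
    metricRegistry.counter(MetricRegistry.name("PythonTotalExecutionTime"))

  // Hypothetical counters: data shipped between the JVM and Python workers.
  val pythonDataSent =
    metricRegistry.counter(MetricRegistry.name("PythonDataSent"))
  val pythonDataReceived =
    metricRegistry.counter(MetricRegistry.name("PythonDataReceived"))
}
```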
   
   ### Why are the changes needed?
   This makes it easy to consume the metrics produced by the executors, for example in a performance dashboard (building on previous work discussed in https://db-blog.web.cern.ch/blog/luca-canali/2019-02-performance-dashboard-apache-spark).
   See also the screenshot below, which compares the current behavior (no Python UDF time instrumentation) with the proposed new functionality:
   ![](https://issues.apache.org/jira/secure/attachment/12989201/PandasUDF_Time_Instrumentation_Annotated.png)
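   As a sketch of how these metrics could be consumed, executor metrics can be routed to an external time-series store through one of the Dropwizard sinks configured in `conf/metrics.properties`. The Graphite sink below is one option among several; the host and port values are placeholders, not part of this PR:

```properties
# Route metrics from all instances (driver and executors, including the
# proposed PythonMetrics source) to a Graphite-compatible endpoint.
# Host and port are placeholders.
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
```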
   
   ### Does this PR introduce any user-facing change?
   This PR adds the PythonMetrics source to the Spark Metrics system. The list of implemented metrics has been added to the Monitoring documentation.
   
   ### How was this patch tested?
   Added relevant tests.
   Also manually tested end-to-end on a YARN cluster, using an existing Spark dashboard extended with the metrics proposed here.
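   A quick local way to eyeball the new metrics, assuming the source is registered on the executors, could be the built-in ConsoleSink, which periodically prints all registered metrics to stdout:

```properties
# Hypothetical quick check: dump all metrics to stdout every 10 seconds
# via Spark's built-in ConsoleSink (conf/metrics.properties).
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
```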
