[ 
https://issues.apache.org/jira/browse/SPARK-34265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-34265:
--------------------------------
    Description: 
This proposes to add SQLMetrics instrumentation for Python UDFs. 
It is aimed at improving monitoring and performance troubleshooting of Python 
UDFs and Pandas UDFs, including the use of MapPartitions and MapInArrow.
The introduced metrics are exposed to end users via the metrics system and 
are visible in the WebUI, in the SQL/DataFrame tab, for execution 
steps related to Python UDF execution. See also the attached screenshots.

This instrumentation is lightweight and can be used in production and for 
monitoring. It is complementary to the Python/Pandas UDF Profiler introduced in 
Spark 3.3 
[https://spark.apache.org/docs/latest/api/python/development/debugging.html#python-pandas-udf]

  was:
This proposes to add SQLMetrics instrumentation for Python UDF. This is aimed 
at improving monitoring and performance troubleshooting of Python code called 
by Spark, via UDF, Pandas UDF or with MapPartitions.
The introduced metrics are exposed to the end users via the WebUI interface, in 
the SQL tab for execution steps related to Python UDF execution. 
The scope of this has been limited to Pandas UDF and related operations, namely: 
ArrowEvalPython, AggregateInPandas, FlatMapGroupsInPandas, MapInPandas, 
FlatMapCoGroupsInPandas, PythonMapInArrow, WindowInPandas.
See also the attached screenshot.


> Instrument Python UDF execution using SQL Metrics
> -------------------------------------------------
>
>                 Key: SPARK-34265
>                 URL: https://issues.apache.org/jira/browse/SPARK-34265
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.2.0
>            Reporter: Luca Canali
>            Priority: Minor
>         Attachments: PandasUDF_ArrowEvalPython_Metrics.png, 
> PythonSQLMetrics_Jira_Picture.png, proposed_Python_SQLmetrics_v20210128.png
>
>
> This proposes to add SQLMetrics instrumentation for Python UDFs. 
> It is aimed at improving monitoring and performance troubleshooting of 
> Python UDFs and Pandas UDFs, including the use of MapPartitions and 
> MapInArrow.
> The introduced metrics are exposed to end users via the metrics system 
> and are visible in the WebUI, in the SQL/DataFrame tab, for 
> execution steps related to Python UDF execution. See also the attached 
> screenshots.
> This instrumentation is lightweight and can be used in production and for 
> monitoring. It is complementary to the Python/Pandas UDF Profiler introduced 
> in Spark 3.3 
> [https://spark.apache.org/docs/latest/api/python/development/debugging.html#python-pandas-udf]
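
As a rough illustration of the kind of workload the description refers to, here is a
minimal PySpark sketch that runs one plain Python UDF and one Pandas UDF, so that
Python evaluation steps show up in the query plan. The names plus_one, pd_plus_one
and run_demo are hypothetical and not taken from the issue. Whether any additional
metrics appear for these steps depends on running a Spark build that includes this
change. The issue text does not name the individual metrics, so none are listed here.

```python
# Plain Python function used as the UDF body. It has no Spark dependency,
# so it can be checked on its own.
def plus_one(x):
    return None if x is None else x + 1


def run_demo(spark):
    """Run a small query with a Python UDF and a Pandas UDF.

    Assumes pyspark, pandas and pyarrow are installed; `spark` is an
    existing SparkSession. Imports are local so that defining this
    function does not require Spark.
    """
    import pandas as pd
    from pyspark.sql.functions import col, pandas_udf, udf

    # Regular (row-at-a-time) Python UDF.
    py_plus_one = udf(plus_one, "long")

    # Pandas (vectorized) UDF operating on whole pandas Series.
    @pandas_udf("long")
    def pd_plus_one(s: pd.Series) -> pd.Series:
        return s + 1

    df = spark.range(1000).select(
        py_plus_one(col("id")).alias("a"),
        pd_plus_one(col("id")).alias("b"),
    )
    df.explain()         # print the physical plan, including Python evaluation steps
    return df.collect()  # trigger execution so the query is recorded in the UI
```

To try it, create a session (for example with
SparkSession.builder.master("local[1]").getOrCreate()), call run_demo(spark), and
open the SQL/DataFrame tab of the Web UI (by default on port 4040 of the driver)
while the session is still alive. Call spark.stop() once done.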



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
