[ 
https://issues.apache.org/jira/browse/SPARK-20368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16277710#comment-16277710
 ] 

Taylor Edmiston commented on SPARK-20368:
-----------------------------------------

I also posted this on the PR linked in the comment above, but I'd like to 
inquire about the status of this PR.

Is it something that could be merged?

Exception aggregation with Sentry in Python is such a common feature, and it's 
something I really need as well.  I'd be happy to jump in and help push this 
over the finish line if possible.

> Support Sentry on PySpark workers
> ---------------------------------
>
>                 Key: SPARK-20368
>                 URL: https://issues.apache.org/jira/browse/SPARK-20368
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 2.1.0
>            Reporter: Alexander Shorin
>
> [Sentry|https://sentry.io] is a system, well known among Python developers, 
> that captures, classifies, tracks and explains tracebacks, helping people 
> understand what went wrong, how to reproduce the issue and how to fix it.
> Any Spark application in Python is actually divided into two parts:
> 1. The one that runs on the driver side. The user can control that part in 
> any way they want, and reporting to Sentry is very easy to do here.
> 2. The one that runs on executors: Python UDFs and the other transformation 
> functions. Unfortunately, we cannot provide that kind of reporting here, and 
> that is the part this feature is about.
> In order to simplify the development experience, it would be nice to have 
> optional Sentry support at the PySpark worker level.
> What could this feature look like?
> 1. PySpark will have a new extra named {{sentry}} which installs the Sentry 
> client and any other required packages. This is an optional install-time 
> dependency.
> 2. The PySpark worker will be able to detect the presence of Sentry support 
> and send error reports there. 
> 3. All Sentry configuration will be done via Sentry's standard environment 
> variables.
> What will this feature give to users?
> 1. Better exceptions in Sentry. From the driver-side application, all of them 
> currently get recorded as `Py4JJavaError`, with the real executor exception 
> written in the traceback body.
> 2. A much easier understanding of the context when something went wrong, and 
> why.
> 3. Simpler debugging of Python UDFs and reproduction of issues.
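
For illustration, the detect-and-report behaviour described in the quoted
proposal might look roughly like the sketch below. This is not code from the
PR and not PySpark's worker code. It assumes the `sentry_sdk` package as the
Sentry client and the `SENTRY_DSN` environment variable for configuration;
`sentry_enabled` and `report_errors` are hypothetical names, and the `capture`
parameter exists only so the sketch can be exercised without a Sentry server.

```python
import functools
import os

try:
    import sentry_sdk  # optional dependency (assumed Sentry client)
except ImportError:
    sentry_sdk = None


def sentry_enabled():
    """True only if the client is importable and a DSN is configured."""
    return sentry_sdk is not None and bool(os.environ.get("SENTRY_DSN"))


def report_errors(func, capture=None):
    """Wrap func so that any exception is reported and then re-raised.

    Hypothetical helper: `capture` defaults to sentry_sdk.capture_exception
    when Sentry is available, and func is returned unchanged otherwise.
    """
    if capture is None:
        if not sentry_enabled():
            return func  # Sentry absent or unconfigured: leave func untouched
        sentry_sdk.init()  # with no arguments, the DSN is read from SENTRY_DSN
        capture = sentry_sdk.capture_exception

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            capture(exc)  # report the original Python exception
            raise         # then let it propagate as before
    return wrapper
```

A real integration would need the detection and `init()` call to run inside
the Python worker process on each executor rather than on the driver, which is
the part the issue asks PySpark itself to provide.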



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
