Samuel Souza created SPARK-33088:
------------------------------------

             Summary: Enhance ExecutorPlugin API to include methods for task start and end events
                 Key: SPARK-33088
                 URL: https://issues.apache.org/jira/browse/SPARK-33088
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 3.1.0
            Reporter: Samuel Souza


In the [SPIP|https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/edit#] for https://issues.apache.org/jira/browse/SPARK-24918, it was suggested that the ExecutorPlugin interface could gain methods for task start and end events:
{quote}The basic interface can just be a marker trait, as that allows a plugin 
to monitor general characteristics of the JVM (eg. monitor memory or take 
thread dumps).   Optionally, we could include methods for task start and end 
events.   This would allow more control on monitoring -- eg., you could start 
polling thread dumps only if there was a task from a particular stage that had 
been taking too long. But anything task related is a bit trickier to decide the 
right api. Should the task end event also get the failure reason? Should those 
events get called in the same thread as the task runner, or in another thread?
{quote}
The ask is to add exactly that. I've put up a draft PR in our fork of Spark [here|https://github.com/palantir/spark/pull/713], and I'm happy to push it upstream. I'm also happy to receive comments on the right interface to expose; I'm not opinionated on that front and have tried to expose the simplest interface for now.
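
For concreteness, here is a minimal sketch of the shape those hooks could take. The method names, the use of default methods, and whether the failure hook should receive a TaskFailedReason are all open to discussion:
{code:java}
import java.util.Map;
import org.apache.spark.TaskFailedReason;
import org.apache.spark.api.plugin.PluginContext;

// Sketch only: one possible shape for the new hooks, expressed as
// default methods so that existing plugins keep compiling unchanged.
public interface ExecutorPlugin {
  default void init(PluginContext ctx, Map<String, String> extraConf) {}
  default void shutdown() {}

  // Called on the task runner thread just before the task body runs.
  default void onTaskStart() {}

  // Called on the same thread after the task completes successfully.
  default void onTaskSucceeded() {}

  // Called on the same thread when the task fails; whether the failure
  // reason should be part of the signature is one of the open questions.
  default void onTaskFailed(TaskFailedReason failureReason) {}
}
{code}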

The main motivation is to propagate tracing information from the driver to the executors (https://issues.apache.org/jira/browse/SPARK-21962 has some context). https://issues.apache.org/jira/browse/HADOOP-15566 discusses how to add tracing across the Apache ecosystem, but my problem is slightly different: I want to use this interface to propagate tracing information to my framework of choice. If the Hadoop issue gets solved, we'll have a common way to communicate tracing information inside the Apache ecosystem, but it's highly unlikely that all Spark users will adopt that one framework. Spark should therefore still provide plugin hooks through which tracing information can be propagated appropriately.

To give more color: in our case the tracing information is [stored in a thread local|https://github.com/palantir/tracing-java/blob/develop/tracing/src/main/java/com/palantir/tracing/Tracer.java#L61], so it must be set on the same thread that executes the task. [*]
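
To illustrate why the thread matters, a tracing plugin built on the hooks sketched above would look roughly as follows. TraceHolder is a hypothetical stand-in for a tracing framework's thread-local state (com.palantir.tracing.Tracer in our case), and the "trace.id" property key is illustrative, not a Spark convention:
{code:java}
import org.apache.spark.TaskContext;
import org.apache.spark.TaskFailedReason;

// Rough sketch, assuming the hooks run on the task runner thread as in
// the draft PR. TraceHolder stands in for a framework's thread-local
// trace state; it is not a real Spark or tracing-java class.
public class TracingExecutorPlugin implements ExecutorPlugin {
  @Override
  public void onTaskStart() {
    // TaskContext.get() is itself thread-local, which is why these hooks
    // must run on the same thread as the task body.
    String traceId = TaskContext.get().getLocalProperty("trace.id");
    if (traceId != null) {
      TraceHolder.set(traceId);
    }
  }

  @Override
  public void onTaskSucceeded() {
    TraceHolder.clear(); // don't leak trace state across pooled threads
  }

  @Override
  public void onTaskFailed(TaskFailedReason failureReason) {
    TraceHolder.clear();
  }

  // Minimal stand-in for a framework's thread-local trace state.
  static final class TraceHolder {
    private static final ThreadLocal<String> TRACE = new ThreadLocal<>();
    static void set(String id) { TRACE.set(id); }
    static void clear() { TRACE.remove(); }
  }
}
{code}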

While our framework is specific to us, I imagine such an interface would be useful in general. Happy to hear your thoughts.

[*] Something I did not mention is how the tracing information gets from the driver to the executors. For that I intend to use (1) the driver's localProperties, which (2) are eventually propagated to each executor's TaskContext, which (3) the methods above can access.
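
For illustration, the driver side of that flow could look like the sketch below; the "trace.id" key and the runTraced helper are hypothetical:
{code:java}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Driver side of the flow above: local properties set on the context are
// shipped with each task scheduled from this thread and surface on the
// executors via TaskContext.getLocalProperty.
public final class DriverSide {
  static void runTraced(JavaSparkContext jsc, JavaRDD<String> rdd, String traceId) {
    jsc.setLocalProperty("trace.id", traceId);
    try {
      rdd.count(); // every task of this job sees the property in onTaskStart
    } finally {
      jsc.setLocalProperty("trace.id", null); // setting null removes the property
    }
  }
}
{code}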


