Samuel Souza created SPARK-33088:
------------------------------------

             Summary: Enhance ExecutorPlugin API to include methods for task start and end events
                 Key: SPARK-33088
                 URL: https://issues.apache.org/jira/browse/SPARK-33088
             Project: Spark
          Issue Type: New Feature
          Components: Spark Core
    Affects Versions: 3.1.0
            Reporter: Samuel Souza


In the [SPIP|https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/edit#] for https://issues.apache.org/jira/browse/SPARK-24918, it was suggested that the ExecutorPlugin interface could gain methods for task start and end events:
{quote}The basic interface can just be a marker trait, as that allows a plugin 
to monitor general characteristics of the JVM (eg. monitor memory or take 
thread dumps).   Optionally, we could include methods for task start and end 
events.   This would allow more control on monitoring -- eg., you could start 
polling thread dumps only if there was a task from a particular stage that had 
been taking too long. But anything task related is a bit trickier to decide the 
right api. Should the task end event also get the failure reason? Should those 
events get called in the same thread as the task runner, or in another thread?
{quote}
The ask is to add exactly that. I've put up a draft PR in our fork of Spark [here|https://github.com/palantir/spark/pull/713], and I'm happy to push it upstream. I'm also happy to receive comments on the right interface to expose; I'm not opinionated on that front and have tried to expose the simplest interface for now.
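
For concreteness, here is a minimal sketch of the shape those hooks could take. The method names, the use of default methods, and whether the failure hook should receive a TaskFailedReason are all open to discussion:
{code:java}
import java.util.Map;
import org.apache.spark.TaskFailedReason;
import org.apache.spark.api.plugin.PluginContext;

// Sketch only: one possible shape for the new hooks, expressed as
// default methods so that existing plugins keep compiling unchanged.
public interface ExecutorPlugin {
  default void init(PluginContext ctx, Map<String, String> extraConf) {}
  default void shutdown() {}

  // Called on the task runner thread just before the task body runs.
  default void onTaskStart() {}

  // Called on the same thread after the task completes successfully.
  default void onTaskSucceeded() {}

  // Called on the same thread when the task fails; whether the failure
  // reason should be part of the signature is one of the open questions.
  default void onTaskFailed(TaskFailedReason failureReason) {}
}
{code}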

The main motivation is to propagate tracing information from the driver to the executors (https://issues.apache.org/jira/browse/SPARK-21962 has some context). https://issues.apache.org/jira/browse/HADOOP-15566 discusses how to add tracing across the Apache ecosystem, but my problem is slightly different: I want to use this interface to propagate tracing information to my framework of choice. If the Hadoop issue gets solved, we'll have a common way to communicate tracing information inside the Apache ecosystem, but it's highly unlikely that all Spark users will adopt that one framework. Spark should therefore still provide plugin hooks through which tracing information can be propagated appropriately.

To give more color: in our case the tracing information is [stored in a thread local|https://github.com/palantir/tracing-java/blob/develop/tracing/src/main/java/com/palantir/tracing/Tracer.java#L61], so it must be set on the same thread that executes the task. [*]
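
To illustrate why the thread matters, a tracing plugin built on the hooks sketched above would look roughly as follows. TraceHolder is a hypothetical stand-in for a tracing framework's thread-local state (com.palantir.tracing.Tracer in our case), and the "trace.id" property key is illustrative, not a Spark convention:
{code:java}
import org.apache.spark.TaskContext;
import org.apache.spark.TaskFailedReason;

// Rough sketch, assuming the hooks run on the task runner thread as in
// the draft PR. TraceHolder stands in for a framework's thread-local
// trace state; it is not a real Spark or tracing-java class.
public class TracingExecutorPlugin implements ExecutorPlugin {
  @Override
  public void onTaskStart() {
    // TaskContext.get() is itself thread-local, which is why these hooks
    // must run on the same thread as the task body.
    String traceId = TaskContext.get().getLocalProperty("trace.id");
    if (traceId != null) {
      TraceHolder.set(traceId);
    }
  }

  @Override
  public void onTaskSucceeded() {
    TraceHolder.clear(); // don't leak trace state across pooled threads
  }

  @Override
  public void onTaskFailed(TaskFailedReason failureReason) {
    TraceHolder.clear();
  }

  // Minimal stand-in for a framework's thread-local trace state.
  static final class TraceHolder {
    private static final ThreadLocal<String> TRACE = new ThreadLocal<>();
    static void set(String id) { TRACE.set(id); }
    static void clear() { TRACE.remove(); }
  }
}
{code}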

While our framework is specific to us, I imagine such an interface would be useful in general. Happy to hear your thoughts.

[*] Something I did not mention is how the tracing information gets from the driver to the executors. For that I intend to use (1) the driver's localProperties, which (2) are eventually propagated to each executor's TaskContext, which (3) the methods above can access.
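
For illustration, the driver side of that flow could look like the sketch below; the "trace.id" key and the runTraced helper are hypothetical:
{code:java}
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Driver side of the flow above: local properties set on the context are
// shipped with each task scheduled from this thread and surface on the
// executors via TaskContext.getLocalProperty.
public final class DriverSide {
  static void runTraced(JavaSparkContext jsc, JavaRDD<String> rdd, String traceId) {
    jsc.setLocalProperty("trace.id", traceId);
    try {
      rdd.count(); // every task of this job sees the property in onTaskStart
    } finally {
      jsc.setLocalProperty("trace.id", null); // setting null removes the property
    }
  }
}
{code}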


