[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578858#comment-16578858 ]
Imran Rashid edited comment on SPARK-24918 at 8/13/18 8:15 PM:
---------------------------------------------------------------

[~lucacanali] OK, I see the case for what you're proposing -- it's hard to set up that communication between the driver & executors without *some* initial setup message. Still ... I'm a bit reluctant to include that now, until we see someone actually build something that uses it. I realize you might be hesitant to do that until you know it can be built on a stable API, but I don't think we can get around that.

> Executor Plugin API
> -------------------
>
>                 Key: SPARK-24918
>                 URL: https://issues.apache.org/jira/browse/SPARK-24918
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
>              Labels: SPIP, memory-analysis
>
> It would be nice if we could specify an arbitrary class to run within each executor for debugging and instrumentation. It's hard to do this currently because:
> a) you have no idea when executors will come and go with DynamicAllocation, so you don't have a chance to run custom code before the first task
> b) even with static allocation, you'd have to change the code of your Spark app itself to run a special task to "install" the plugin, which is often tough in production cases where those maintaining regularly running applications might not even know how to make changes to the application.
> For example, https://github.com/squito/spark-memory could be used in a debugging context to understand memory use, just by re-running an application with extra command line arguments (as opposed to rebuilding Spark).
> I think one tricky part here is just deciding the API, and how it's versioned. Does the plugin just get created when the executor starts, and that's it? Or does it get more specific events, like task start, task end, etc.? Would we ever add more events? It should definitely be a {{DeveloperApi}}, so breaking compatibility would be allowed ... but it should still be avoided. We could create a base class that has no-op implementations, or explicitly version everything.
> Note that this is not needed in the driver, as we already have SparkListeners (even if you don't care about the SparkListenerEvents and just want to inspect objects in the JVM, that's still good enough).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
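To make the design options above concrete, here is a minimal, self-contained sketch of the "base class with no-op implementations" idea. All names here ({{ExecutorPlugin}}, {{loadPlugins}}, the config semantics) are hypothetical illustrations, not Spark's actual API: default methods give every lifecycle hook a no-op, so adding new events later stays source-compatible, and the executor instantiates user classes reflectively from a configured class-name list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin interface. Default no-op methods mean plugin
// authors only override the hooks they care about, and adding new
// hooks later does not break existing plugins (the versioning concern
// raised in the issue description).
interface ExecutorPlugin {
    default void init() {}       // called once when the executor starts
    default void shutdown() {}   // called once when the executor exits
}

// Example plugin a user might ship on the executor classpath, e.g. a
// crude memory monitor in the spirit of squito/spark-memory.
class MemoryMonitorPlugin implements ExecutorPlugin {
    @Override
    public void init() {
        long heapBytes = Runtime.getRuntime().totalMemory();
        System.out.println("MemoryMonitorPlugin started, heap=" + heapBytes);
    }
}

public class ExecutorPluginDemo {
    // Mimics an executor reflectively instantiating classes named in a
    // config value (e.g. a comma-separated list of class names passed
    // via command-line arguments), calling init() on each at startup.
    static List<ExecutorPlugin> loadPlugins(List<String> classNames) throws Exception {
        List<ExecutorPlugin> plugins = new ArrayList<>();
        for (String name : classNames) {
            ExecutorPlugin p = (ExecutorPlugin)
                Class.forName(name).getDeclaredConstructor().newInstance();
            p.init();
            plugins.add(p);
        }
        return plugins;
    }

    public static void main(String[] args) throws Exception {
        List<ExecutorPlugin> loaded = loadPlugins(List.of("MemoryMonitorPlugin"));
        System.out.println("loaded=" + loaded.size());
        for (ExecutorPlugin p : loaded) {
            p.shutdown();
        }
    }
}
```

This matches the motivation in the description: the plugin is installed by configuration alone, with no change to the application code and no special "install" task, so it works the same under dynamic and static allocation.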