[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961672#comment-16961672 ] Brandon commented on SPARK-24918:

[~nsheth] placing the plugin class inside a jar and passing it as `--jars` to spark-submit should be sufficient, right? It seems this is not enough to make the class visible to the executor. I have had to explicitly add this jar to `spark.executor.extraClassPath` for plugins to load correctly. (An illustrative submit invocation is sketched after the quoted issue description below.)

> Executor Plugin API
> -------------------
>
>                 Key: SPARK-24918
>                 URL: https://issues.apache.org/jira/browse/SPARK-24918
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Assignee: Nihar Sheth
>            Priority: Major
>              Labels: SPIP, memory-analysis
>             Fix For: 2.4.0
>
> It would be nice if we could specify an arbitrary class to run within each
> executor for debugging and instrumentation. It's hard to do this currently
> because:
> a) you have no idea when executors will come and go with DynamicAllocation,
> so you don't have a chance to run custom code before the first task
> b) even with static allocation, you'd have to change the code of your Spark
> app itself to run a special task to "install" the plugin, which is often
> tough in production cases when those maintaining regularly running
> applications might not even know how to make changes to the application.
> For example, https://github.com/squito/spark-memory could be used in a
> debugging context to understand memory use, just by re-running an application
> with extra command-line arguments (as opposed to rebuilding Spark).
> I think one tricky part here is just deciding the API, and how it's versioned.
> Does it just get created when the executor starts, and that's it? Or does it
> get more specific events, like task start, task end, etc.? Would we ever add
> more events? It should definitely be a {{DeveloperApi}}, so breaking
> compatibility would be allowed ... but still should be avoided. We could
> create a base class that has no-op implementations, or explicitly version
> everything.
> Note that this is not needed in the driver as we already have SparkListeners
> (even if you don't care about the SparkListenerEvents and just want to
> inspect objects in the JVM, it's still good enough).
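For illustration only, a sketch of the kind of submit invocation being described (class names and paths are placeholders; whether a bare jar name on {{spark.executor.extraClassPath}} resolves depends on the cluster manager placing {{--jars}} files in the executor's working directory, as YARN does):

{code}
spark-submit \
  --conf spark.executor.plugins=com.example.MyExecutorPlugin \
  --conf spark.executor.extraClassPath=my-plugin.jar \
  --jars /local/path/to/my-plugin.jar \
  --class com.example.MyApp \
  my-app.jar
{code}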
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595182#comment-16595182 ] Imran Rashid commented on SPARK-24918:

Yes, that's right, it's to avoid putting a reference to the static initializer in every single task. That's a nuisance when your task definitions are ...

For my intended use, the task itself doesn't depend on the plugin at all. The plugin gives you added debug info, that's all. That's why it's OK for the original code to not know anything about the plugin.

There were other requests on the earlier JIRA to do lifecycle management, e.g. eager initialization of resources. I agree that's not as clear a case (if you need the resource, your task will have to reference it, so you have a spot for your static initializer). In any case, you could use this same mechanism for that as well, if you wanted.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595020#comment-16595020 ] Sean Owen commented on SPARK-24918:

Wait, why doesn't a static init "run everywhere"? Is "everywhere" the same as "on every executor that will run a task"? And why would an executor be able to init without touching a static initializer that the user code touches?

Yes, you just call it somewhere in any task that needs the init. There's negligible overhead in checking whether it's initialized and initializing if not. Therein is the issue, I suppose: you have to put some reference to a class or method in every task. The upshot is that it requires no additional mechanism or reasoning about what thread runs what.
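The static-initializer pattern under discussion is roughly the following (a minimal sketch; {{MyResource}} is the placeholder name used elsewhere in this thread):

{code}
// Executor-local, idempotent initialization: the object is initialized at most
// once per JVM (i.e. per executor), the first time any task references it.
object MyResource {
  @volatile private var initialized = false

  def initIfNeeded(): Unit = synchronized {
    if (!initialized) {
      // e.g. start a monitoring thread, register instrumentation, etc.
      initialized = true
    }
  }
}

// The catch: every task closure that should trigger it has to call it explicitly,
// e.g. MyResource.initIfNeeded() at the top of a mapPartitions function.
{code}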
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589483#comment-16589483 ] Apache Spark commented on SPARK-24918:

User 'NiharS' has created a pull request for this issue: https://github.com/apache/spark/pull/22192
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589249#comment-16589249 ] Marcelo Vanzin commented on SPARK-24918:

Unless he gives you push access to his repo, that's really the only option you have.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16589248#comment-16589248 ] Nihar Sheth commented on SPARK-24918:

[~irashid] has asked me to add testing to his PR. I'm not sure what the standard procedure is. Can I just open another PR with his changes and the tests?
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582009#comment-16582009 ] Felix Cheung commented on SPARK-24918:

I'd tend to agree with opt-in; I've seen too many mistakes made with copy/pasted scripts... An explicit config would make it easier to track down whether this was loaded by "accident".
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579851#comment-16579851 ] Thomas Graves commented on SPARK-24918:

Personally I like the explicit config option better (spark.executor.plugins). Opt-out, in my opinion, is easier for users to mess up: someone includes a jar from some other group and doesn't realize it has this ServiceLoader registration.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579042#comment-16579042 ] Marcelo Vanzin commented on SPARK-24918:

I like the idea in general. On the implementation side, instead of {{spark.executor.plugins}}, how about using {{java.util.ServiceLoader}}? That's one less configuration needed to enable these plugins. The downside is that if the jar is visible to Spark, it will be invoked (so it becomes "opt-out" instead of "opt-in"; you might want to add an option to disable specific plugins).

I thought about suggesting a new API in SparkContext to programmatically add plugins, but that might be too messy. Better to start simple.
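For reference, the {{ServiceLoader}} mechanism works roughly as follows (a sketch; it assumes an {{ExecutorPlugin}} trait with an {{init()}} method along the lines of the PR under review, which is not necessarily the final API):

{code}
import java.util.ServiceLoader
import scala.collection.JavaConverters._

// Hypothetical plugin contract, standing in for whatever the final API defines.
trait ExecutorPlugin {
  def init(): Unit
}

// A plugin jar registers its implementations by shipping a resource file named
//   META-INF/services/<fully-qualified name of the plugin interface>
// containing one fully-qualified implementation class name per line (each
// implementation needs a public no-arg constructor). Any such jar on the
// executor classpath is then discovered without extra configuration:
object PluginDiscovery {
  def loadAll(): Seq[ExecutorPlugin] = {
    val plugins = ServiceLoader.load(classOf[ExecutorPlugin]).asScala.toSeq
    plugins.foreach(_.init())
    plugins
  }
}
{code}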
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578894#comment-16578894 ] Imran Rashid commented on SPARK-24918:

With dynamic allocation you don't have a good place to run
{code}
df.mapPartitions { it =>
  MyResource.initIfNeeded()
  it.map(...)
}
{code}
Executors can come and go, so you can't ensure that runs everywhere. Even if you make "too many" tasks, it could be that your job starts out with a very small number of tasks for a while before ramping up. So after you run your initialization with the added initResource code, many executors get torn down during the first part of the real job as they sit idle; then when the job ramps up, you get new executors, which never had your initialization run. You'd have to put {{MyResource.initIfNeeded()}} inside *every* task. (Note that for the debug use case, the initializer is totally unnecessary for the task to complete; if the task actually depended on it, then of course you should have that logic in each task.)

I think there is a large class of users who could add "--conf spark.executor.plugins=com.mycompany.WhizzBangDebugPlugin --jars whizzbangdebug.jar" to the command-line arguments but couldn't add that code sample (even with static allocation). They're not the ones *writing* the plugins; they just need to be able to enable them.

{quote}What do you do if init fails? Retry or fail?{quote}
Good question; Tom asked the same thing on the PR. I suggested the executor just fails to start. If a plugin wanted to be "safe", it could catch exceptions in its own initialization.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578872#comment-16578872 ] Sean Owen commented on SPARK-24918:

This is just for per-executor initialization, right? What's the issue with dynamic allocation? Executors still start there, JVMs still initialize; how is it particularly hard? What do you do if init fails? Retry or fail? Would SQL-only users meaningfully be able to use this if they don't know about code anyway? Is turning on debug code not something for config options?

I guess I don't get why this still can't be solved by a static initializer. I'm not dead-set against this; I just think it will add some complexity and I'm not sure it gains a lot.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578858#comment-16578858 ] Imran Rashid commented on SPARK-24918:

[~lucacanali] OK, I see the case for what you're proposing: it's hard to set up that communication between the driver & executors without *some* initial setup message. Still ... I'm a bit reluctant to include that now, until we see someone actually build something that uses it. I realize you might be hesitant to do that until you know it can be built on a stable API, but I don't think we can get around that.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578853#comment-16578853 ] Imran Rashid commented on SPARK-24918:

Ah, right, thanks [~vanzin], I knew I had seen this before. [~srowen], you argued the most against SPARK-650; have I made the case here? I did indeed at first do exactly what you suggested, using a static initializer, but realized it was not great for a few very important reasons:
* dynamic allocation
* turning on a "debug" mode without any code changes (you'd be surprised how big a hurdle this is for something in production)
* "SQL-only" apps, where the end user barely knows anything about calling a mapPartitions function
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16578767#comment-16578767 ] Marcelo Vanzin commented on SPARK-24918:

For reference: this looks kinda similar to SPARK-650.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572850#comment-16572850 ] Luca Canali commented on SPARK-24918:

[~irashid] I agree that the proposal should be something of low complexity for v1. I don't have a clear design in mind yet; one raw idea could be as follows: give plugins the option, if needed, to open a TCP socket that the plugin could use to listen for incoming connections and, in general, as a control and data-sharing mechanism. The list of allocated endpoints (server + port), plus possibly a security token, would be made available on the driver via an API, and user programs could then pick this up for their custom "plugin control scripts".
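A very rough sketch of that idea, using nothing beyond plain JDK sockets (the class and command names are made up; reporting the chosen port and any token back to the driver is exactly the part the proposal above leaves open):

{code}
import java.io.{BufferedReader, InputStreamReader}
import java.net.ServerSocket

// Hypothetical plugin-side control channel: listen on an ephemeral port and
// dispatch one-line commands (e.g. "start-sampling" / "stop-sampling").
class ControlChannel(handle: String => Unit) {
  private val server = new ServerSocket(0) // 0 = pick any free port

  def port: Int = server.getLocalPort // would need to be reported to the driver

  private val listener = new Thread(new Runnable {
    override def run(): Unit = {
      try {
        while (true) {
          val socket = server.accept()
          val reader = new BufferedReader(new InputStreamReader(socket.getInputStream))
          val command = reader.readLine()
          if (command != null) handle(command)
          socket.close()
        }
      } catch {
        case _: java.io.IOException => // server socket closed; stop listening
      }
    }
  })
  listener.setDaemon(true)
  listener.start()

  def close(): Unit = server.close()
}
{code}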
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571793#comment-16571793 ] Imran Rashid commented on SPARK-24918:

[~lucacanali] you could certainly sample stack traces, but the current proposal doesn't cover communication with the driver at all. IMO that is too much complexity for v1. Did you have a design in mind for that? You could use the executor plugin to build your own communication between the driver and executors, but depending on what you want, that might be tricky.

Do you think you could set up the configuration you need statically, when the application starts? E.g., I had run a test to take stack traces any time a task was running over some configurable time; then I just needed task start & end events in my plugin.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571748#comment-16571748 ] Luca Canali commented on SPARK-24918:

I have a use case where I would like to sample stack traces of the Spark executors across the cluster and later aggregate the data into a flame graph. I may want to do data collection only for a short duration (due to the overhead) and possibly be able to start and stop data collection at will from the driver. Similar use cases would be to deploy "probes" using dynamic-tracing tools to measure specific details of the workload. I think the executor plugin would be useful for this. In addition, it would be nice to have a mechanism to send and receive commands/data between the Spark driver and the plugin process. Would this proposal make sense in the context of this SPIP, or would it add too much complexity to the original proposal?
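As a sketch of the sampling half of this use case (the interval, output path, and folded-stack output format are illustrative choices; a real collector would also aggregate counts before feeding flame-graph tooling):

{code}
import java.io.FileWriter
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.JavaConverters._

// Periodically dump every thread's stack in "folded" form (root frame first,
// leaf frame last, one line per thread per sample) for flame-graph tools.
object StackSampler {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def start(intervalMs: Long, outputPath: String): Unit = {
    val out = new FileWriter(outputPath, true)
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = {
        Thread.getAllStackTraces.asScala.foreach { case (_, frames) =>
          val folded = frames.reverse
            .map(f => s"${f.getClassName}.${f.getMethodName}")
            .mkString(";")
          out.write(folded + "\n")
        }
        out.flush()
      }
    }, 0L, intervalMs, TimeUnit.MILLISECONDS)
  }

  def stop(): Unit = scheduler.shutdownNow()
}
{code}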
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568471#comment-16568471 ] Imran Rashid commented on SPARK-24918:

Attached an [SPIP proposal|https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/edit?usp=sharing].
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16562521#comment-16562521 ] Apache Spark commented on SPARK-24918:

User 'squito' has created a pull request for this issue: https://github.com/apache/spark/pull/21923
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558829#comment-16558829 ] Imran Rashid commented on SPARK-24918:

The only thing I *really* needed was just to be able to instantiate some arbitrary class when the executor starts up. My instrumentation code could do the rest via reflection from there. But I might want more eventually; e.g., with task start & end events, I could imagine setting something up to periodically take stack traces only if there is a task running in stage X, or running for longer than Y seconds, etc.
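The minimal mechanism being described is essentially reflective instantiation of a configured class name at executor startup; a sketch (the class name and config key are placeholders, not the final API):

{code}
// e.g. read from a config such as spark.executor.plugins (illustrative name)
val pluginClassName = "com.mycompany.WhizzBangDebugPlugin"

// Instantiate via the executor's classloader; everything else (threads, JMX,
// reflection into Spark internals) can live inside the plugin class itself.
val plugin: AnyRef = Class.forName(pluginClassName)
  .getConstructor()
  .newInstance()
  .asInstanceOf[AnyRef]
{code}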
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558331#comment-16558331 ] Thomas Graves commented on SPARK-24918:

I think this is a good idea. I thought I had seen a JIRA around this before but couldn't find it; it might have been a task-run pre-hook. It's also a good question what we tie into it. I haven't looked at the details of your spark-memory debugging module; what does that class need? I could see people doing all sorts of things, from checking node health to preloading something, etc., so we should definitely think about the possibilities here and what we may or may not want to allow.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556964#comment-16556964 ] Imran Rashid commented on SPARK-24918:

I have some changes with an initial draft of this, at least, which I'll post soon.
[jira] [Commented] (SPARK-24918) Executor Plugin API
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555939#comment-16555939 ] Imran Rashid commented on SPARK-24918:

[~jerryshao] [~tgraves] you might be interested in this; I feel like this has come up in past discussions (though I couldn't find any JIRAs about it).