[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117090#comment-16117090 ]
Louis Bergelson commented on SPARK-650:
---------------------------------------

[~srowen] Thanks for the reply and the example. Unfortunately, I still believe that the singleton approach doesn't work well for our use case. We don't have a single resource which needs initialization and can always be wrapped in a singleton. We have a sprawl of legacy dependencies that need to be initialized in certain ways before use, and then can be called into from literally hundreds of entry points. One of the things that needs initializing is the set of FileSystemProviders that [~rdub] mentioned above. This has to be done before potentially any file access in our dependencies.

It's impractical to wrap all of our library code into singleton objects, and it's difficult to always call initResources() before every library call. It requires a lot of discipline on the part of the developers. Since we develop a framework for biologists to use to write tools, anything that has to be enforced by convention isn't ideal and is likely to cause problems. People will forget to start their work by calling initResources() or, worse, they'll remember to call initResources(), but only at the start of the first stage. Then they'll run into issues when executors die and are replaced during a later stage and the initialization doesn't run on the new executor.

For something that could be cleanly wrapped in a singleton I agree that the semantics are obvious, but for the case where you're calling init() before running your code, the semantics are confusing and error-prone.

I'm sure there are complications from introducing a setup hook, but the one you mention seems simple enough to me. If a setup fails, that executor is killed and can't schedule tasks. There would probably have to be a mechanism for timing out after a certain number of failed executor starts, but I suspect that that exists already in some fashion for other sorts of failures.
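To make the convention discussed above concrete, here is a minimal sketch in plain Java of an idempotent initResources() guard. Everything except the name initResources() is an assumption made for illustration: the class LegacyInit, the counter, and the readLegacy entry point are hypothetical stand-ins for the legacy dependencies, and nothing here is Spark API.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-JVM initializer; all names here are illustrative. */
final class LegacyInit {
    // Counts how many times the real setup work actually ran in this JVM.
    private static final AtomicInteger RUNS = new AtomicInteger();
    private static volatile boolean initialized = false;

    private LegacyInit() {}

    /** Idempotent and thread-safe: repeated calls do the setup work only once. */
    static void initResources() {
        if (initialized) {
            return;
        }
        synchronized (LegacyInit.class) {
            if (!initialized) {
                // Placeholder for the real setup, e.g. registering file system providers.
                RUNS.incrementAndGet();
                initialized = true;
            }
        }
    }

    static int timesInitialized() {
        return RUNS.get();
    }

    /** Stand-in for one of the many library entry points that depend on setup. */
    static String readLegacy(String path) {
        if (!initialized) {
            throw new IllegalStateException("initResources() was not called in this JVM");
        }
        return "contents of " + path;
    }
}
```

Under this sketch, every task in every stage would have to begin with LegacyInit.initResources(). A replacement executor is a fresh JVM in which the flag is false again, so a job that calls it only in its first stage hits the IllegalStateException path on the new executor. That is the discipline problem a setup hook would remove.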
> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries

--
This message was sent by Atlassian JIRA (v6.4.14#64029)