[ 
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117090#comment-16117090
 ] 

Louis Bergelson commented on SPARK-650:
---------------------------------------

[~srowen]  Thanks for the reply and the example.  Unfortunately, I still 
believe that the singleton approach doesn't work well for our use case.  

We don't have a single resource which needs initialization and can always be 
wrapped in a singleton.  We have a sprawl of legacy dependencies that need to 
be initialized in certain ways before use, and then can be called into from 
literally hundreds of entry points.  One of the things that needs initializing 
is the set of FileSystemProviders that [~rdub] mentioned above.  This has to be 
done before potentially any file access in our dependencies.  It's implausible 
to wrap all of our library code into singleton objects and it's difficult to 
always call initResources() before every library call.  It requires a lot of 
discipline on the part of the developers.  Since we develop a framework for 
biologists to use to write tools, anything that has to be enforced by 
convention isn't ideal and is likely to cause problems.  People will forget to 
start their work by calling initResources() or, worse, they'll remember to call 
initResources(), but only at the start of the first stage.  Then they'll run 
into issues when executors die and are replaced during a later stage and the 
initialization doesn't run on the new executor.
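
The failure mode described above can be illustrated with a small simulation. 
This is only a sketch under stated assumptions: it does not use Spark, and the 
Executor class, init_resources(), library_call() and the two stage functions 
are hypothetical stand-ins for an executor process, the initResources() 
convention, and a legacy library entry point.

```python
class Executor:
    """Stands in for one executor process; `initialized` models per-process state."""

    def __init__(self, name):
        self.name = name
        self.initialized = False


def init_resources(executor):
    # Hypothetical one-time setup, e.g. registering FileSystemProviders.
    executor.initialized = True


def library_call(executor):
    # One of the many legacy entry points; it fails if setup never ran here.
    if not executor.initialized:
        raise RuntimeError(f"{executor.name}: library used before initialization")
    return "ok"


def stage_one_task(executor):
    init_resources(executor)  # the developer remembered the convention here
    return library_call(executor)


def stage_two_task(executor):
    # The developer assumed stage one had already initialized this process.
    return library_call(executor)


original = Executor("executor-1")
stage_one_task(original)
stage_two_task(original)  # succeeds: same process that ran stage one

# The original executor dies between stages and a fresh one takes over.
replacement = Executor("executor-2")
try:
    outcome = stage_two_task(replacement)
except RuntimeError as err:
    outcome = str(err)
```

In this sketch the second stage succeeds on the original executor and fails on 
the replacement, because nothing forced initialization to run there.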

For something that could be cleanly wrapped in a singleton I agree that the 
semantics are obvious, but for the case where you're calling init() before 
running your code, the semantics are confusing and error-prone.  

I'm sure there are complications from introducing a setup hook, but the one you 
mention seems simple enough to me.  If a setup fails, that executor is killed 
and can't schedule tasks.  There would probably have to be a mechanism for 
timing out after a certain number of failed executor starts, but I suspect that 
that exists already in some fashion for other sorts of failures.
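
A minimal sketch of those semantics, written as a plain simulation rather than 
against any real Spark API. The names start_executors(), setup_hook, 
max_failed_starts and SetupFailed are assumptions made for illustration, and 
whether Spark already has an equivalent limit is not confirmed here.

```python
class SetupFailed(Exception):
    """Raised when too many executor starts fail (hypothetical)."""


def start_executors(wanted, setup_hook, max_failed_starts):
    """Return ids of executors whose setup hook succeeded.

    An executor whose hook raises is discarded and never receives tasks.
    Once `max_failed_starts` launches have failed, the whole job gives up.
    """
    ready = []
    failures = 0
    next_id = 0
    while len(ready) < wanted:
        executor_id = next_id
        next_id += 1
        try:
            setup_hook(executor_id)
        except Exception:
            failures += 1
            if failures >= max_failed_starts:
                raise SetupFailed(f"{failures} executor starts failed")
            continue  # drop this executor and launch a replacement
        ready.append(executor_id)
    return ready


def flaky_hook(executor_id):
    # Fails on exactly one executor to model a transient setup problem.
    if executor_id == 1:
        raise OSError("provider registration failed")


def broken_hook(executor_id):
    # Fails everywhere to model a setup that can never succeed.
    raise OSError("provider registration failed")


usable = start_executors(3, flaky_hook, max_failed_starts=3)

try:
    start_executors(3, broken_hook, max_failed_starts=3)
    gave_up = None
except SetupFailed as err:
    gave_up = str(err)
```

With the flaky hook, the failed executor is simply replaced. With the broken 
hook, the job stops after the configured number of failed starts instead of 
retrying forever.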


> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
