[ 
https://issues.apache.org/jira/browse/SPARK-27048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-27048:
------------------------------
    Target Version/s:   (was: 2.4.0)

> A way to execute functions on Executor Startup and Executor Exit in Standalone
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-27048
>                 URL: https://issues.apache.org/jira/browse/SPARK-27048
>             Project: Spark
>          Issue Type: Wish
>          Components: Deploy, Spark Submit
>    Affects Versions: 2.3.1, 2.3.3
>            Reporter: Ross Brigoli
>            Priority: Major
>              Labels: usability
>
> *Background*
> We have a Spark Standalone ETL workload that is heavily dependent on the Apache 
> Ignite KV store for lookup/reference data. There are hundreds (400+) of 
> lookup datasets, some containing up to 300K records. We formerly used broadcast 
> variables but later found that they were not fast enough.
> So we decided to implement a caching mechanism: we retrieve the reference data 
> from a JDBC source and keep it in memory in Apache Ignite as a replicated cache. 
> Each Spark worker node also runs an Ignite node (JVM), and the Spark executors 
> retrieve the data from Ignite through the "shared memory port". 
> This is very fast but causes instability in the Ignite cluster: when a 
> Spark executor JVM terminates, its Ignite client node is terminated abnormally. 
> The Ignite cluster then waits for the client node (the Spark executor) to 
> reconnect, leaving the cluster non-responsive for a while.
> *Wish*
> We need a way to close the Ignite client node gracefully just before the 
> executor process ends. A feature that makes it possible to register event 
> handlers for executor startup and for {{executor.exitExecutor()}} 
> would be really useful.
> It could be a spark-submit argument or an entry in spark-defaults.conf 
> that looks something like:
> {{spark.executor.startUpClass=com.company.ExecutorInitializer}}
> {{spark.executor.shutdownClass=com.company.ExecutorCleaner}}
> Each class would have to implement an interface provided by Spark. The class 
> could then be loaded dynamically in the CoarseGrainedExecutorBackend and invoked 
> from its onStart() and exitExecutor() methods, respectively.
> This would also be useful for opening and closing JDBC connections once per 
> executor instead of once per partition.
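The proposed interface might look like the following minimal sketch. The trait and class names are assumptions illustrating this proposal, not an existing Spark API; in our case {{onShutdown()}} would close the Ignite client node, but here it only records the calls so the sketch stays self-contained:

```scala
// Hypothetical lifecycle interface that Spark would provide; names are
// assumptions based on this ticket, not an existing Spark API.
trait ExecutorLifecycleHook {
  def onStart(): Unit      // called when the executor backend starts
  def onShutdown(): Unit   // called just before the executor JVM exits
}

// Example handler: a real implementation would disconnect the Ignite
// client node gracefully in onShutdown(); this one just records the calls.
class ExecutorCleaner extends ExecutorLifecycleHook {
  var started = false
  var stopped = false
  override def onStart(): Unit = { started = true }
  override def onShutdown(): Unit = { stopped = true }
}

object Demo extends App {
  // CoarseGrainedExecutorBackend would instantiate the class named by
  // spark.executor.shutdownClass reflectively; instantiated directly here.
  val hook: ExecutorLifecycleHook = new ExecutorCleaner
  hook.onStart()
  // ... executor runs its tasks ...
  hook.onShutdown()
}
```

The same pattern (a no-argument constructor plus a marker trait) is how Spark already loads user classes for settings such as {{spark.extraListeners}}, so a reflective lookup of the configured class name would fit existing conventions.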
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
