[ https://issues.apache.org/jira/browse/SPARK-27048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-27048:
------------------------------
    Target Version/s:   (was: 2.4.0)

> A way to execute functions on Executor Startup and Executor Exit in Standalone
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-27048
>                 URL: https://issues.apache.org/jira/browse/SPARK-27048
>             Project: Spark
>          Issue Type: Wish
>          Components: Deploy, Spark Submit
>    Affects Versions: 2.3.1, 2.3.3
>            Reporter: Ross Brigoli
>            Priority: Major
>              Labels: usability
>
> *Background*
> We have a Spark Standalone ETL workload that is heavily dependent on the
> Apache Ignite KV store for lookup/reference data. There are hundreds (400+)
> of lookup data sets, some with up to 300K records. We formerly used broadcast
> variables but later found that they were not fast enough.
> So we decided to implement a caching mechanism by retrieving reference data
> from a JDBC source and putting it in memory through Apache Ignite as a
> replicated cache. Each Spark worker node also runs an Ignite node (JVM). The
> Spark executors then retrieve the data from Ignite through the "shared memory
> port". This is very fast but causes instability in the Ignite cluster: when
> the Spark executor JVM terminates, the Ignite Data Grid is terminated
> abnormally. This makes the Ignite cluster wait for the client node (which is
> the Spark executor) to reconnect, leaving the Ignite cluster non-responsive
> for a while.
> *Wish*
> We need the ability to close the Ignite client node gracefully just before
> the executor process ends. A feature that makes it possible to pass an event
> handler for "executor.onStart" and "executor.exitExecutor()" would be really
> useful.
> It could be a spark-submit argument or an entry in spark-defaults.conf that
> looks something like:
> {{spark.executor.startUpClass=com.company.ExecutorInitializer}}
> {{spark.executor.shutdownClass=com.company.ExecutorCleaner}}
> The classes would have to implement an interface provided by Spark. They
> could then be loaded dynamically in the CoarseGrainedExecutorBackend and
> called in the onStart() and exitExecutor() methods respectively.
> This would also be useful for opening and closing JDBC connections per
> executor instead of per partition.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
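As a rough illustration of the proposal above, the hooks could look like the sketch below. All names here (ExecutorLifecycleHandler, IgniteClientCleaner, the load helper) are hypothetical, not an existing Spark API; the reflective loading mirrors how CoarseGrainedExecutorBackend could resolve the proposed spark.executor.startUpClass / spark.executor.shutdownClass settings.

```java
// Hypothetical interface Spark would provide; implementations are named in
// spark.executor.startUpClass / spark.executor.shutdownClass (proposed, not real).
interface ExecutorLifecycleHandler {
    void onExecutorStart();     // invoked once when the executor JVM starts
    void onExecutorShutdown();  // invoked just before exitExecutor()
}

// Example implementation that would leave the Ignite cluster gracefully.
// (A real version would call Ignition.start(...) / Ignition.stop(true);
// prints stand in here to keep the sketch self-contained.)
class IgniteClientCleaner implements ExecutorLifecycleHandler {
    public void onExecutorStart() {
        System.out.println("Ignite client node started");
    }
    public void onExecutorShutdown() {
        System.out.println("Ignite client node stopped");
    }
}

public class ExecutorHooksDemo {
    // How the backend could load the configured class reflectively.
    public static ExecutorLifecycleHandler load(String className) throws Exception {
        return (ExecutorLifecycleHandler) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        ExecutorLifecycleHandler handler = load("IgniteClientCleaner");
        handler.onExecutorStart();
        handler.onExecutorShutdown();
    }
}
```

For what it's worth, later Spark releases added a similar mechanism: the experimental org.apache.spark.ExecutorPlugin in 2.4 and the org.apache.spark.api.plugin API in 3.0 provide per-executor init/shutdown hooks that cover much of this use case.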