Prashant Sharma created SPARK-30985:
---------------------------------------

             Summary: Propagate SPARK_CONF_DIR files to driver and exec pods.
                 Key: SPARK-30985
                 URL: https://issues.apache.org/jira/browse/SPARK-30985
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 3.0.0
            Reporter: Prashant Sharma
            Assignee: Prashant Sharma


SPARK_CONF_DIR hosts configuration files such as:
1) spark-defaults.conf - all the Spark properties.
2) log4j.properties - logger configuration.
3) spark-env.sh - environment variables to be set up on the driver and executors.
4) core-site.xml - Hadoop-related configuration.
5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
6) metrics.properties - Spark metrics configuration.
7) Any other user-specific, library- or framework-specific configuration files.

Traditionally, SPARK_CONF_DIR has been the home of all user-specific 
configuration files, and the default behaviour in YARN or standalone mode is 
that users copy these configuration files to the worker nodes themselves as 
required. In other words, they are not auto-copied.

But in the case of Spark on Kubernetes, we use Spark images, and generally 
these images are approved or undergo some kind of standardisation. These files 
cannot simply be copied by the user into the SPARK_CONF_DIR of the running 
driver and executor pods.

So, at the moment we have special casing for providing each configuration, and 
for any other user-specific configuration file the process is more complex - 
e.g. one has to build a custom Spark image with the configuration files 
pre-installed.
Examples of special casing are:
1. Hadoop configuration via spark.kubernetes.hadoop.configMapName
2. spark-env.sh via spark.kubernetes.driverEnv.[EnvironmentVariableName]
3. log4j.properties as in https://github.com/apache/spark/pull/26193
... And where such special casing does not exist, users are simply out of 
luck.
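
To illustrate, the first two special cases above are driven through 
spark-submit properties. A minimal sketch, assuming a pre-created ConfigMap 
named "hadoop-conf" holding core-site.xml (the master URL, ConfigMap name, and 
env variable value here are illustrative, not from this issue):

```shell
# Per-file special casing today: Hadoop config comes from a named ConfigMap,
# and driver env vars are set one by one via spark.kubernetes.driverEnv.*.
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.hadoop.configMapName=hadoop-conf \
  --conf spark.kubernetes.driverEnv.SPARK_LOG_DIR=/tmp/spark-logs \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```

Note that each file type needs its own property; there is no single knob that 
ships the whole SPARK_CONF_DIR.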

So this feature will let the user-specific configuration files be mounted into 
the driver and executor pods' SPARK_CONF_DIR.
At the moment it is not clear whether there is a need to let users specify 
which config files to propagate - to the driver and/or executors. But if a 
case arises where that feature would be helpful, we can increase the scope of 
this work or create another JIRA issue to track it.
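
One possible implementation sketch (an assumption, not the design settled in 
this issue): package the contents of SPARK_CONF_DIR into a single ConfigMap 
and mount it at the pods' SPARK_CONF_DIR. The manifest-building below uses an 
illustrative ConfigMap name and a throwaway conf directory:

```shell
# Hypothetical sketch: turn every file in a conf dir into one ConfigMap
# manifest, so the driver/executor pods could mount it as SPARK_CONF_DIR.
CONF_DIR=${SPARK_CONF_DIR:-/tmp/demo-spark-conf}
mkdir -p "$CONF_DIR"
printf 'spark.executor.memory 2g\n' > "$CONF_DIR/spark-defaults.conf"

{
  echo 'apiVersion: v1'
  echo 'kind: ConfigMap'
  echo 'metadata:'
  echo '  name: spark-conf-dir'        # illustrative name
  echo 'data:'
  for f in "$CONF_DIR"/*; do
    echo "  $(basename "$f"): |"       # each file becomes one data key
    sed 's/^/    /' "$f"               # indent file body as YAML block scalar
  done
} > /tmp/spark-conf-configmap.yaml
```

In practice the same result could come from `kubectl create configmap 
--from-file=$CONF_DIR --dry-run=client -o yaml`; the point is only that one 
ConfigMap can carry the whole directory, removing the per-file special casing.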



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
