[ 
https://issues.apache.org/jira/browse/SPARK-30985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Sharma updated SPARK-30985:
------------------------------------
    Description: 
SPARK_CONF_DIR hosts configuration files such as:
 1) spark-defaults.conf - containing all the Spark properties.
 2) log4j.properties - logger configuration.
 3) spark-env.sh - environment variables to be set up at the driver and executor.
 4) core-site.xml - Hadoop-related configuration.
 5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
 6) metrics.properties - Spark metrics.
 7) Any user-specific library- or framework-specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user-specific 
configuration files.

This feature will let those user-specific configuration files be mounted on 
the driver and executor pods' SPARK_CONF_DIR.
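One natural way to realize this (a sketch only, not the actual implementation; the function name and ConfigMap name are made up for illustration) is to collect every file under SPARK_CONF_DIR into a Kubernetes ConfigMap, which the driver and executor pod specs can then mount at SPARK_CONF_DIR:

```python
import os


def build_conf_configmap(conf_dir, name="spark-conf"):
    """Collect every regular file under conf_dir into a Kubernetes
    ConfigMap manifest (returned as a plain dict), keyed by file name.

    Mounting this ConfigMap as a volume at SPARK_CONF_DIR inside the
    driver and executor pods would make all of the user's configuration
    files visible there, without baking them into the image.
    """
    data = {}
    for fname in sorted(os.listdir(conf_dir)):
        path = os.path.join(conf_dir, fname)
        if os.path.isfile(path):
            with open(path, "r", encoding="utf-8") as f:
                data[fname] = f.read()
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }
```

Note that ConfigMaps carry size limits (roughly 1 MiB, inherited from etcd), so very large files in SPARK_CONF_DIR would need separate handling.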



Please review the attached design doc for more details.

  was:
SPARK_CONF_DIR hosts configuration files such as:
1) spark-defaults.conf - containing all the Spark properties.
2) log4j.properties - logger configuration.
3) spark-env.sh - environment variables to be set up at the driver and executor.
4) core-site.xml - Hadoop-related configuration.
5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
6) metrics.properties - Spark metrics.
7) Any user-specific library- or framework-specific configuration file.

Traditionally, SPARK_CONF_DIR has been the home to all user-specific 
configuration files, and the default behaviour in YARN or standalone mode is 
that users copy these configuration files to the worker nodes themselves, as 
required. In other words, they are not auto-copied.

But in the case of Spark on Kubernetes, we use Spark images, and generally 
these images are approved or undergo some kind of standardisation. The user 
cannot simply copy these files into the SPARK_CONF_DIR of the running driver 
and executor pods.

So, at the moment, each configuration is delivered through its own special 
case, and for any other user-specific configuration file the process is more 
complex: for example, one can build a custom Spark image with the 
configuration files pre-installed.
Examples of special casing are:
1. Hadoop configuration via spark.kubernetes.hadoop.configMapName
2. spark-env.sh via spark.kubernetes.driverEnv.[EnvironmentVariableName]
3. log4j.properties as in https://github.com/apache/spark/pull/26193
... And for files where no such special casing exists, users are simply out 
of luck.

So this feature will let user-specific configuration files be mounted on the 
driver and executor pods' SPARK_CONF_DIR.
At the moment it is not clear whether there is a need to let the user specify 
which config files to propagate to the driver and/or executor. But if a case 
arises where that would be helpful, we can increase the scope of this work or 
create another JIRA issue to track it.


> Propagate SPARK_CONF_DIR files to driver and exec pods.
> -------------------------------------------------------
>
>                 Key: SPARK-30985
>                 URL: https://issues.apache.org/jira/browse/SPARK-30985
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Prashant Sharma
>            Priority: Major
>
> SPARK_CONF_DIR hosts configuration files such as:
>  1) spark-defaults.conf - containing all the Spark properties.
>  2) log4j.properties - logger configuration.
>  3) spark-env.sh - environment variables to be set up at the driver and executor.
>  4) core-site.xml - Hadoop-related configuration.
>  5) fairscheduler.xml - Spark's fair scheduling policy at the job level.
>  6) metrics.properties - Spark metrics.
>  7) Any user-specific library- or framework-specific configuration file.
> Traditionally, SPARK_CONF_DIR has been the home to all user-specific 
> configuration files.
> This feature will let those user-specific configuration files be mounted on 
> the driver and executor pods' SPARK_CONF_DIR.
> Please review the attached design doc for more details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
