Hi dev,

I'd like to start a discussion about ConfigMap creation for Spark on K8s and
get more people involved. It was originally discussed on GitHub:
https://github.com/apache/spark/pull/31428#issuecomment-1552531182 . Any
comments/suggestions are welcome.

To summarize, a ConfigMap is a Kubernetes API object, and it comes with some
limitations imposed by K8s:

   1. since a ConfigMap is an API object, it may be subject to quota limits
   and adds load to the K8s API server and etcd
   2. for access control/security reasons, it may not be possible to create
   new objects from the driver pod (such as the cases in this PR)
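As a concrete illustration of point 2, a cluster admin may grant the driver's service account only read access to ConfigMaps, in which case the driver cannot create new ones at all. A hypothetical RBAC Role (all names here are illustrative, not taken from the Spark docs):

```yaml
# Hypothetical Role for the driver's service account: it can read/mount
# ConfigMaps but has no "create" verb, so any feature step that tries to
# create a ConfigMap from the driver pod is rejected by the API server.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: spark
  name: spark-driver-restricted
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]   # note: no "create"
```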

Therefore, some rules might be made for ConfigMaps when developing feature
steps for Spark on K8s:

   1. ConfigMaps should be created as rarely as possible, unless it's
   unavoidable, such as a new HADOOP_CONF map
   2. ConfigMaps should only be created on the client (spark-submit)
   side, not from the driver pod.

Does that make sense?


And for the current master code, I'd like to make a proposal:
1. Only create the SPARK_CONF ConfigMap on the client side and record the
ConfigMap's name; the driver pod should reuse this ConfigMap to mount
SPARK_CONF on the executor side, rather than creating a new ConfigMap. The
driver and executors then share the same SPARK_CONF mount. This also makes
spark.kubernetes.executor.disableConfigMap unnecessary.

Related PRs: https://github.com/apache/spark/pull/27735,
https://github.com/apache/spark/pull/31428
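The reuse in proposal 1 can be sketched as follows. This is a hypothetical, simplified model, not the actual Spark code: `resolveExecutorConfigMapName` and the fallback naming scheme are assumptions for illustration only.

```java
import java.util.Optional;

// Hypothetical sketch: the client (spark-submit) records the name of the
// SPARK_CONF ConfigMap it created, and the driver reuses that name when
// mounting SPARK_CONF into executors, instead of creating a second ConfigMap.
class SparkConfConfigMap {
    // recordedName: ConfigMap name recorded by the client, e.g. via a conf
    // key (hypothetical); appId: the Spark application id.
    static String resolveExecutorConfigMapName(Optional<String> recordedName,
                                               String appId) {
        // Reuse the client-created ConfigMap when its name was recorded;
        // the fallback mirrors how Spark derives resource names from the
        // app id (illustrative only).
        return recordedName.orElse(appId + "-conf-map");
    }
}
```

With this, both driver and executor pods mount the same ConfigMap, so there is nothing left for `spark.kubernetes.executor.disableConfigMap` to disable.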

2. Extend KubernetesFeatureConfigStep with a method such as
getAdditionalDataFileForSparkConfConfigMap (an example name), to allow other
feature steps to dynamically add data files to the SparkConf ConfigMap when
building the driver pod. By leveraging this method, PodTemplateConfigMapStep
could eliminate the need to create a new ConfigMap. New feature steps could
also benefit from this method.
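A minimal sketch of the hook in proposal 2, written in plain Java rather than the real Scala trait; the interface shape and the `pod-template.yaml` key are assumptions, not the actual Spark API:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical, simplified version of KubernetesFeatureConfigStep: the new
// default method lets a step contribute (fileName -> content) entries to the
// shared SPARK_CONF ConfigMap built on the client side.
interface KubernetesFeatureConfigStep {
    default Map<String, String> getAdditionalDataFileForSparkConfConfigMap() {
        // Most steps contribute nothing, so the default is an empty map.
        return Collections.emptyMap();
    }
}

// Example: a pod-template step contributes its template file to the shared
// ConfigMap instead of creating a dedicated one.
class PodTemplateConfigMapStep implements KubernetesFeatureConfigStep {
    private final String templateYaml;

    PodTemplateConfigMapStep(String templateYaml) {
        this.templateYaml = templateYaml;
    }

    @Override
    public Map<String, String> getAdditionalDataFileForSparkConfConfigMap() {
        return Collections.singletonMap("pod-template.yaml", templateYaml);
    }
}
```

The client would merge every step's contributions into the single SPARK_CONF ConfigMap before submission, so no step needs create permission from inside the driver pod.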

Also cc @Dongjoon Hyun <dongjoon.h...@gmail.com> and @Prashant Sharma
<scrapco...@gmail.com>
