Hi Spark devs,

Spark contains a number of array-like configs (comma-separated lists). Some
examples include `spark.sql.extensions`,
`spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
`spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are
a dozen or so more). As owners of the Spark platform in our organization,
we would like to set platform-level defaults, e.g. custom SQL extensions and
listeners, and we use some of the above-mentioned properties to do so. At
the same time, we have power users who write their own listeners and set the
same Spark confs, unintentionally overriding our platform defaults.
This leads to a loss of functionality within our platform.

Spark has previously introduced "default" confs for a few of these
array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins` and
`spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
These properties are meant to be set only by cluster admins, which allows
separation between platform defaults and user configs. However, as discussed
in https://github.com/apache/spark/pull/34856, these configs are still
client-side and can still be overridden. The approach also does not scale,
as we cannot introduce a new "default" config for every array-like
config.

I wanted to know if others have experienced this issue and what systems
have been implemented to tackle it. Are there any existing solutions for
this, either client-side or server-side (e.g. at a job submission server)?
Even though we cannot easily enforce this on the client side, the
simplicity of a client-side solution may make it more appealing.
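
To make the client-side idea concrete, here is a minimal sketch of the kind
of merge I have in mind. It is not an existing Spark API: `merge_confs`,
`ARRAY_LIKE_CONFS` and the `com.example.*` class names are hypothetical, and
the set of keys treated as comma-separated lists is only an assumption for
illustration.

```python
# Hypothetical sketch, not part of Spark: merge platform defaults with
# user-supplied confs before submission, appending to array-like confs
# instead of letting the user value replace the platform value.

# Assumed set of comma-separated confs; a real list would be longer.
ARRAY_LIKE_CONFS = {
    "spark.sql.extensions",
    "spark.sql.queryExecutionListeners",
    "spark.extraListeners",
    "spark.jars.repositories",
}


def _split(value):
    """Split a comma-separated conf value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def _merge_list(default, user):
    """Concatenate two comma-separated values, defaults first, no duplicates."""
    merged = []
    for item in _split(default) + _split(user):
        if item not in merged:
            merged.append(item)
    return ",".join(merged)


def merge_confs(platform_defaults, user_confs):
    """Return confs where array-like keys are merged and others are overridden."""
    merged = dict(platform_defaults)
    for key, value in user_confs.items():
        if key in ARRAY_LIKE_CONFS and key in platform_defaults:
            merged[key] = _merge_list(platform_defaults[key], value)
        else:
            # Scalar confs keep the usual behaviour: the user value wins.
            merged[key] = value
    return merged


platform = {"spark.extraListeners": "com.example.PlatformListener"}
user = {
    "spark.extraListeners": "com.example.UserListener",
    "spark.executor.memory": "4g",
}
merged = merge_confs(platform, user)
```

With the inputs above, `merged["spark.extraListeners"]` keeps the platform
listener and appends the user's one, while `spark.executor.memory` is simply
taken from the user. A wrapper around job submission could apply such a merge,
but a user who bypasses the wrapper would still override the defaults.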

Thanks,
Shardul
