I find there's substantial value in being able to set defaults, and I think
the community finds value in it as well, given the handful of "default"-like
configs that already exist today, as mentioned in Shardul's email. The mix of
conventions in use today (a ".defaultList" suffix in one place, "extra"
swapped for "default" in another, ...) is confusing and inconsistent, and it
requires a one-off addition for each config.

My proposal here would be:

   - Define a clear convention, e.g. a suffix of ".default" that enables a
   default to be set and merged
   - Document this convention in configuration.md so that we can avoid
   separately documenting each default-config, and instead just add a note in
   the docs for the normal config.
   - Adjust the withPrepended method
   <https://github.com/apache/spark/blob/c7c51bcab5cb067d36bccf789e0e4ad7f37ffb7c/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala#L219>,
   added in #24804 <https://github.com/apache/spark/pull/24804>, so that it
   leverages this convention instead of each call site redefining the
   additional config name (see the sketch below this list)
   - Do a comprehensive review of applicable configs and enable them all to
   use the newly updated withPrepended method
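
To make the withPrepended adjustment concrete, here is a minimal,
self-contained sketch of the idea. This is not the real
org.apache.spark.internal.config.ConfigBuilder; the class, the adjusted
withPrepended signature, and the resolve helper below are illustrative
assumptions only:

    // Sketch only: a toy stand-in for Spark's ConfigBuilder.
    class ConfigBuilder(val key: String) {
      private var prependedKey: Option[String] = None
      private var prependSeparator: String = ","

      // Hypothetical adjusted withPrepended: the companion config name is
      // derived from the convention ("<key>.default") instead of being
      // passed in by every call site.
      def withPrepended(separator: String = ","): ConfigBuilder = {
        prependedKey = Some(s"$key.default") // e.g. spark.sql.extensions.default
        prependSeparator = separator
        this
      }

      // Resolve the effective value: the admin-set default is prepended to
      // the user-set value, so neither silently replaces the other.
      def resolve(conf: Map[String, String]): Option[String] = {
        val default = prependedKey.flatMap(conf.get)
        val user = conf.get(key)
        (default, user) match {
          case (Some(d), Some(u)) => Some(d + prependSeparator + u)
          case _                  => default.orElse(user)
        }
      }
    }

    object DefaultConventionExample {
      def main(args: Array[String]): Unit = {
        val extensions = new ConfigBuilder("spark.sql.extensions").withPrepended()
        val conf = Map(
          // set by platform admins
          "spark.sql.extensions.default" -> "com.example.PlatformExtension",
          // set by the user
          "spark.sql.extensions" -> "com.example.UserExtension"
        )
        // Prints: Some(com.example.PlatformExtension,com.example.UserExtension)
        println(extensions.resolve(conf))
      }
    }

With a single convention like this, the ".default" suffix can be documented
once in configuration.md, and any config that opts in via withPrepended gets
the admin/user merge behavior without a separately named and separately
documented companion config.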

Wenchen, you expressed some concerns about adding more default configs in
#34856 <https://github.com/apache/spark/pull/34856>. Would this proposal
address those concerns?

Thanks,
Erik

On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <shardulsmaha...@gmail.com>
wrote:

> Hi Spark devs,
>
> Spark contains a bunch of array-like configs (comma-separated lists). Some
> examples include `spark.sql.extensions`,
> `spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
> `spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are
> a dozen or so more). As owners of the Spark platform in our organization,
> we would like to set platform-level defaults, e.g. custom SQL extensions and
> listeners, and we use some of the above-mentioned properties to do so. At
> the same time, we have power users writing their own listeners, setting the
> same Spark confs and thus unintentionally overriding our platform defaults.
> This leads to a loss of functionality within our platform.
>
> Previously, Spark has introduced "default" confs for a few of these
> array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins`,
> `spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
> These properties are meant to only be set by cluster admins thus allowing
> separation between platform default and user configs. However, as discussed
> in https://github.com/apache/spark/pull/34856, these configs are still
> client-side and can still be overridden, and the approach does not scale, as
> we cannot introduce one new "default" config for every array-like config.
>
> I wanted to know if others have experienced this issue and what systems
> have been implemented to tackle it. Are there any existing solutions for
> this, either client-side or server-side (e.g. at a job submission server)?
> Even though we cannot easily enforce this at the client-side, the
> simplicity of a solution may make it more appealing.
>
> Thanks,
> Shardul
>
