Hi Wenchen,

It would be great if you could chime in with your thoughts, given the feedback
you originally had on the PR. It would also be great to hear from others on
this, particularly folks managing Spark deployments: how is this mitigated or
avoided in your case, and what other pain points have you hit with configs in
this context?
Regards,
Mridul

On Wed, Jul 27, 2022 at 12:28 PM Erik Krogen <xkro...@apache.org> wrote:

> I find there's substantial value in being able to set defaults, and I
> think we can see that the community finds value in it as well, given the
> handful of "default"-like configs that exist today, as mentioned in
> Shardul's email. The mismatch of conventions used today (suffix with
> ".defaultList", change "extra" to "default", ...) is confusing and
> inconsistent, and it requires one-off additions for each config.
>
> My proposal here would be:
>
> - Define a clear convention, e.g. a suffix of ".default", that enables
>   a default to be set and merged
> - Document this convention in configuration.md so that we can avoid
>   separately documenting each default-config, and instead just add a
>   note in the docs for the normal config
> - Adjust the withPrepended method
>   <https://github.com/apache/spark/blob/c7c51bcab5cb067d36bccf789e0e4ad7f37ffb7c/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala#L219>
>   added in #24804 <https://github.com/apache/spark/pull/24804> to
>   leverage this convention instead of each usage instance re-defining
>   the additional config name (see the sketch at the end of this thread)
> - Do a comprehensive review of applicable configs and enable them all
>   to use the newly updated withPrepended method
>
> Wenchen, you expressed some concerns with adding more default configs in
> #34856 <https://github.com/apache/spark/pull/34856>; would this proposal
> address those concerns?
>
> Thanks,
> Erik
>
> On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <
> shardulsmaha...@gmail.com> wrote:
>
>> Hi Spark devs,
>>
>> Spark contains a number of array-like configs (comma-separated lists).
>> Some examples include `spark.sql.extensions`,
>> `spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
>> `spark.extraListeners`, and `spark.driver.extraClassPath` (there are a
>> dozen or so more). As owners of the Spark platform in our organization,
>> we would like to set platform-level defaults, e.g. custom SQL
>> extensions and listeners, and we use some of the above-mentioned
>> properties to do so. At the same time, we have power users writing
>> their own listeners, setting the same Spark confs, and thus
>> unintentionally overriding our platform defaults. This leads to a loss
>> of functionality within our platform. (A short illustration of the
>> override behavior follows after this thread.)
>>
>> Previously, Spark has introduced "default" confs for a few of these
>> array-like configs, e.g. `spark.plugins.defaultList` for
>> `spark.plugins` and `spark.driver.defaultJavaOptions` for
>> `spark.driver.extraJavaOptions`. These properties are meant to be set
>> only by cluster admins, allowing separation between platform defaults
>> and user configs. However, as discussed in
>> https://github.com/apache/spark/pull/34856, these configs are still
>> client-side and can still be overridden, and the approach does not
>> scale, since we cannot introduce one new "default" config for every
>> array-like config.
>>
>> I wanted to know if others have experienced this issue and what
>> systems have been implemented to tackle it. Are there any existing
>> solutions, either client-side or server-side (e.g. at a job submission
>> server)? Even though we cannot easily enforce this at the client side,
>> the simplicity of such a solution may make it more appealing.
>>
>> Thanks,
>> Shardul
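
To make the pain point in Shardul's first paragraph concrete, here is a
minimal sketch (e.g. pasted into spark-shell) of the override behavior. The
extension class names are hypothetical; the point is only that SparkConf keys
are last-write-wins, so a user who sets the same array-like key replaces the
platform default wholesale:

    import org.apache.spark.SparkConf

    // Platform-level default, e.g. set by the cluster admin in
    // spark-defaults.conf (com.example.PlatformExtensions is hypothetical):
    val conf = new SparkConf(loadDefaults = false)
      .set("spark.sql.extensions", "com.example.PlatformExtensions")

    // A power user sets the same key for their own extension. SparkConf
    // keys are last-write-wins, so the platform extension silently
    // disappears rather than being merged:
    conf.set("spark.sql.extensions", "com.example.UserExtensions")

    assert(conf.get("spark.sql.extensions") == "com.example.UserExtensions")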
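
And to make Erik's withPrepended bullet concrete: today each call site names
its companion config explicitly (spark.plugins, for example, is built with
withPrepended("spark.plugins.defaultList", separator = ",")), while under the
proposed ".default" convention the builder could derive the companion key
from the entry's own key. Below is a minimal, self-contained sketch of that
idea using a toy config model, not Spark's actual ConfigBuilder internals;
withDefaultPrepended and the class names are hypothetical:

    // Self-contained sketch of the proposed convention, NOT Spark's actual
    // ConfigBuilder: the companion "default" key is derived from the
    // entry's own key instead of being re-declared at every call site.
    final case class ConfigSketch(key: String,
                                  prependedKey: Option[String] = None) {

      // Hypothetical helper: "spark.plugins" -> "spark.plugins.default".
      def withDefaultPrepended: ConfigSketch =
        copy(prependedKey = Some(s"$key.default"))

      // Merge the admin-set default (if any) in front of the user value.
      def resolve(settings: Map[String, String]): Seq[String] =
        (prependedKey.flatMap(settings.get).toSeq ++ settings.get(key).toSeq)
          .flatMap(_.split(',')).filter(_.nonEmpty)
    }

    object ConfigSketchDemo extends App {
      val plugins = ConfigSketch("spark.plugins").withDefaultPrepended

      val settings = Map(
        "spark.plugins.default" -> "com.example.PlatformPlugin", // hypothetical
        "spark.plugins"         -> "com.example.UserPlugin"      // hypothetical
      )

      // Prints: List(com.example.PlatformPlugin, com.example.UserPlugin)
      println(plugins.resolve(settings))
    }

Note that this standardizes naming and merging only; as the discussion in
#34856 points out, the companion ".default" key is still a client-side
setting and can itself be overridden, so the convention by itself does not
enforce platform defaults.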