Hi,

  Wenchen, it would be great if you could chime in with your thoughts, given
the feedback you originally had on the PR.
It would also be great to hear from others on this, particularly folks
managing Spark deployments: how is this mitigated or avoided in your
case, and are there any other pain points with configs in this context?


Regards,
Mridul

On Wed, Jul 27, 2022 at 12:28 PM Erik Krogen <xkro...@apache.org> wrote:

> I find there's substantial value in being able to set defaults, and I
> think we can see that the community finds value in it as well, given the
> handful of "default"-like configs that exist today, as mentioned in
> Shardul's email. The mix of conventions used today (suffix with
> ".defaultList", change "extra" to "default", ...) is confusing and
> inconsistent, and requires a one-off addition for each config.
>
> My proposal here would be:
>
>    - Define a clear convention, e.g. a suffix of ".default" that enables
>    a default to be set and merged (a rough sketch of the merge semantics
>    follows this list)
>    - Document this convention in configuration.md so that we can avoid
>    separately documenting each default-config, and instead just add a note in
>    the docs for the normal config.
>    - Adjust the withPrepended method
>    <https://github.com/apache/spark/blob/c7c51bcab5cb067d36bccf789e0e4ad7f37ffb7c/core/src/main/scala/org/apache/spark/internal/config/ConfigBuilder.scala#L219>
>    added in #24804 <https://github.com/apache/spark/pull/24804> to
>    leverage this convention, so that each usage no longer has to re-define
>    the additional config name
>    - Do a comprehensive review of applicable configs and enable them all
>    to use the newly updated withPrepended method
>
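> To make the convention concrete, here is a self-contained sketch of the
> intended merge semantics. This is illustrative only: it is not the actual
> ConfigBuilder/withPrepended code, and the object, method, and extension
> class names below are made up.
>
>   // Sketch of the proposed ".default" convention: the admin-set
>   // "<key>.default" value is prepended to whatever the user set for
>   // "<key>", instead of the user's value replacing the platform default.
>   object DefaultConfSketch {
>     def effectiveValue(conf: Map[String, String], key: String,
>                        sep: String = ","): Option[String] = {
>       val adminDefault = conf.get(key + ".default") // set by cluster admins
>       val userValue    = conf.get(key)              // set by the application
>       (adminDefault.toSeq ++ userValue.toSeq).reduceOption(_ + sep + _)
>     }
>
>     def main(args: Array[String]): Unit = {
>       val conf = Map(
>         "spark.sql.extensions.default" -> "com.example.PlatformExtensions",
>         "spark.sql.extensions"         -> "com.example.UserExtensions")
>       // prints: com.example.PlatformExtensions,com.example.UserExtensions
>       println(effectiveValue(conf, "spark.sql.extensions").getOrElse(""))
>     }
>   }
>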
> Wenchen, you expressed some concerns with adding more default configs in
> #34856 <https://github.com/apache/spark/pull/34856>. Would this proposal
> address those concerns?
>
> Thanks,
> Erik
>
> On Wed, Jul 13, 2022 at 11:54 PM Shardul Mahadik <
> shardulsmaha...@gmail.com> wrote:
>
>> Hi Spark devs,
>>
>> Spark contains a bunch of array-like configs (comma-separated lists).
>> Some examples include `spark.sql.extensions`,
>> `spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
>> `spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are
>> a dozen or so more). As owners of the Spark platform in our organization,
>> we would like to set platform-level defaults, e.g. custom SQL extensions
>> and listeners, and we use some of the above-mentioned properties to do so.
>> At the same time, we have power users writing their own listeners, setting
>> the same Spark confs and thus unintentionally overriding our platform
>> defaults. This leads to a loss of functionality within our platform.
>>
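>> To make the problem concrete, here is a minimal illustration using
>> SparkConf (the extension class names are made up):
>>
>>   import org.apache.spark.SparkConf
>>
>>   // The platform sets its default, then the user sets the same key; the
>>   // later value silently replaces the earlier one instead of merging.
>>   val conf = new SparkConf()
>>     .set("spark.sql.extensions", "com.example.PlatformExtensions") // platform default
>>     .set("spark.sql.extensions", "com.example.UserExtensions")     // user's own setting
>>   // conf.get("spark.sql.extensions") now returns only
>>   // "com.example.UserExtensions"; the platform extension is lost.
>>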
>> Previously, Spark has introduced "default" confs for a few of these
>> array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins`
>> and `spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
>> These properties are meant to be set only by cluster admins, thus allowing
>> a separation between platform defaults and user configs. However, as
>> discussed in https://github.com/apache/spark/pull/34856, these configs are
>> still client-side and can still be overridden, and the approach does not
>> scale, since we cannot introduce a new "default" config for every
>> array-like config.
>>
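>> As an illustration of how the existing per-config convention behaves (the
>> plugin class names below are made up):
>>
>>   import org.apache.spark.SparkConf
>>
>>   val conf = new SparkConf()
>>     .set("spark.plugins.defaultList", "com.example.PlatformPlugin") // admin default
>>     .set("spark.plugins", "com.example.UserPlugin")                 // user setting
>>   // Because the two keys are distinct, the user's value no longer clobbers
>>   // the platform value; as I understand it, Spark prepends the defaultList
>>   // entries when reading spark.plugins. But nothing prevents a user from
>>   // simply overriding spark.plugins.defaultList itself.
>>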
>> I wanted to know if others have experienced this issue and what systems
>> have been implemented to tackle it. Are there any existing solutions,
>> either client-side or server-side (e.g. at a job submission server)?
>> Even though we cannot easily enforce this on the client side, the
>> simplicity of a solution may make it more appealing.
>>
>> Thanks,
>> Shardul
>>
>
