How to set platform-level defaults for array-like configs?

2022-07-14 Thread Shardul Mahadik
Hi Spark devs,

Spark contains a bunch of array-like configs (delimited lists, mostly
comma-separated). Some examples include `spark.sql.extensions`,
`spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
`spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are
a dozen or so more). As owners of the Spark platform in our organization,
we would like to set platform-level defaults, e.g. custom SQL extensions and
listeners, and we use some of the above-mentioned properties to do so. At
the same time, we have power users who write their own listeners and set the
same Spark confs, thus unintentionally overriding our platform defaults.
This leads to a loss of functionality within our platform.

Spark has previously introduced "default" confs for a few of these
array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins` and
`spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
These properties are meant to be set only by cluster admins, thus
separating platform defaults from user configs. However, as discussed
in https://github.com/apache/spark/pull/34856, these configs are still
client-side and can be overridden. Nor is this approach scalable: we cannot
introduce a new "default" config for every array-like config.
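For illustration, here is a minimal sketch of the prepend semantics these "default" keys have. The Spark docs describe `spark.driver.defaultJavaOptions` as being prepended to `spark.driver.extraJavaOptions`, and the sketch assumes `spark.plugins.defaultList` combines with `spark.plugins` the same way. It is plain Python over a dict, not Spark's own API; the helper `effective_value`, the `DEFAULT_KEYS` table and the `com.example` class names are hypothetical. It also makes both problems above concrete: the table needs one entry per array-like config, and the default key is just another client-side conf that a user can override.

```python
# Hypothetical model of how an admin-only "default" key combines with its
# user-facing counterpart. Plain Python over a dict, NOT Spark's API.
# Each array-like key needs its own (default key, separator) entry,
# which is the scalability problem described above.
DEFAULT_KEYS = {
    "spark.plugins": ("spark.plugins.defaultList", ","),
    "spark.driver.extraJavaOptions": ("spark.driver.defaultJavaOptions", " "),
}


def effective_value(conf: dict, key: str) -> str:
    """Return the admin default prepended to the user value for `key`."""
    default_key, sep = DEFAULT_KEYS[key]
    parts = [conf.get(default_key, ""), conf.get(key, "")]
    # Skip unset or empty pieces so no stray separator is emitted.
    return sep.join(p for p in parts if p)


# Admin sets the default cluster-wide; the user adds their own plugin.
# (com.example.* class names are hypothetical.)
conf = {
    "spark.plugins.defaultList": "com.example.PlatformPlugin",
    "spark.plugins": "com.example.UserPlugin",
}
print(effective_value(conf, "spark.plugins"))
# -> com.example.PlatformPlugin,com.example.UserPlugin

# Both keys are ordinary client-side confs, so a user can still override
# the default key itself and silently drop the platform plugin.
conf["spark.plugins.defaultList"] = ""
print(effective_value(conf, "spark.plugins"))
# -> com.example.UserPlugin
```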

I wanted to know whether others have experienced this issue and what systems
have been implemented to tackle it. Are there any existing solutions,
either client-side or server-side (e.g. at a job submission server)?
Even though we cannot easily enforce this on the client side, the
simplicity of a client-side solution may make it more appealing.

Thanks,
Shardul


Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-30 Thread Shardul Mahadik
I ran into https://issues.apache.org/jira/browse/SPARK-36905 when testing
some views in our organization. This used to work in 3.1.1. Should this be an
RC blocker?

On 2021/09/30 11:35:28, Jacek Laskowski  wrote: 
> Hi,
> 
> I don't want to hijack the voting thread, but given that I hit
> https://issues.apache.org/jira/browse/SPARK-36904 in RC6, I wonder if it's
> a -1.
> 
> Best regards,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books 
> Follow me on https://twitter.com/jaceklaskowski
> 
> 
> 
> 
> On Wed, Sep 29, 2021 at 10:28 PM Mridul Muralidharan 
> wrote:
> 
> >
> > Yi Wu helped identify an issue
> >  which causes
> > correctness issues (duplication) and hangs; waiting for validation to
> > complete before submitting a patch.
> >
> > Regards,
> > Mridul
> >
> > On Wed, Sep 29, 2021 at 11:34 AM Holden Karau 
> > wrote:
> >
> >> PySpark smoke tests pass, I'm going to do a last pass through the JIRAs
> >> before my vote though.
> >>
> >> On Wed, Sep 29, 2021 at 8:54 AM Sean Owen  wrote:
> >>
> >>> +1 looks good to me as before, now that a few recent issues are resolved.
> >>>
> >>>
> >>> On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang 
> >>> wrote:
> >>>
>  Please vote on releasing the following candidate as
>  Apache Spark version 3.2.0.
> 
>  The vote is open until 11:59pm Pacific time September 30 and passes if
>  a majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> 
>  [ ] +1 Release this package as Apache Spark 3.2.0
>  [ ] -1 Do not release this package because ...
> 
>  To learn more about Apache Spark, please see http://spark.apache.org/
> 
>  The tag to be voted on is v3.2.0-rc6 (commit
>  dde73e2e1c7e55c8e740cb159872e081ddfa7ed6):
>  https://github.com/apache/spark/tree/v3.2.0-rc6
> 
>  The release files, including signatures, digests, etc. can be found at:
>  https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-bin/
> 
>  Signatures used for Spark RCs can be found in this file:
>  https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
>  The staging repository for this release can be found at:
>  https://repository.apache.org/content/repositories/orgapachespark-1393
> 
>  The documentation corresponding to this release can be found at:
>  https://dist.apache.org/repos/dist/dev/spark/v3.2.0-rc6-docs/
> 
>  The list of bug fixes going into 3.2.0 can be found at the following
>  URL:
>  https://issues.apache.org/jira/projects/SPARK/versions/12349407
> 
>  This release is using the release script of the tag v3.2.0-rc6.
> 
> 
>  FAQ
> 
>  =
>  How can I help test this release?
>  =
>  If you are a Spark user, you can help us test this release by taking
>  an existing Spark workload and running it on this release candidate, then
>  reporting any regressions.
> 
>  If you're working in PySpark, you can set up a virtual env, install
>  the current RC and see if anything important breaks. In Java/Scala,
>  you can add the staging repository to your project's resolvers and test
>  with the RC (make sure to clean up the artifact cache before/after so
>  you don't end up building with an out-of-date RC going forward).
> 
>  ===
>  What should happen to JIRA tickets still targeting 3.2.0?
>  ===
>  The current list of open tickets targeted at 3.2.0 can be found at:
>  https://issues.apache.org/jira/projects/SPARK and search for "Target
>  Version/s" = 3.2.0
> 
>  Committers should look at those and triage. Extremely important bug
>  fixes, documentation, and API tweaks that impact compatibility should
>  be worked on immediately. Everything else please retarget to an
>  appropriate release.
> 
>  ==
>  But my bug isn't fixed?
>  ==
>  In order to make timely releases, we will typically not hold the
>  release unless the bug in question is a regression from the previous
>  release. That being said, if there is something which is a regression
>  that has not been correctly targeted please ping me or a committer to
>  help target the issue.
> 
> 
> >>
> >> --
> >> Twitter: https://twitter.com/holdenkarau
> >> Books (Learning Spark, High Performance Spark, etc.):
> >> https://amzn.to/2MaRAG9  
> >> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> >>
> >
> 
