I believe the current Spark config system is unfortunate in the way it has
grown - you have no way of telling which sub-systems use which
configuration options without a direct and detailed reading of the code.

Isolating config items for data sources into a separate namespace (rather
than using a whitelist) is a nice idea - unfortunately, in this case we are
dealing with configuration items that have been exposed to end users in
their current form for a significant amount of time, and Kerberos cross-cuts
not only data sources but also things like YARN.

So given that fact, the best ways forward I can think of are:
1. Whitelist specific sub-sections of the configuration space,
2. Just pass in a Map[String,String] of all config values, or
3. Implement a specific interface for data sources to indicate/implement
Kerberos support.

Option (1) is pretty arbitrary, and more than likely the whitelist will
change from version to version as additional items get added to it.  Data
sources will develop dependencies on certain configuration values being
present in the whitelist.

Option (2) would work, but it continues the practice of having a vaguely
specified grab-bag of config items as a dependency for practically all Spark
code.
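
To make that concrete, here is a minimal sketch of what option (2) tends
to look like - the class name is made up, and the config keys are only
examples:

    // Hypothetical data source taking the whole config map (option 2).
    class SomeDataSource(conf: Map[String, String]) {
      // Nothing in the signature says which keys are actually read,
      // so the dependency on specific config items is invisible to
      // callers and can silently change between versions.
      private val principal: Option[String] = conf.get("spark.yarn.principal")
      private val keytab: Option[String]    = conf.get("spark.yarn.keytab")
    }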

I am beginning to warm to option (3): it would be a clean way of
declaring that a data source supports Kerberos, and also a cleanly specified
way of injecting the relevant Kerberos configuration information into the
data source - and we would not need to change any user-facing configuration
items either.
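
As a rough sketch of what I mean - the trait name and method signature
below are purely illustrative, not an existing Spark API:

    // A data source opts in to Kerberos support by mixing in a trait,
    // and Spark injects only the Kerberos-related settings through a
    // well-defined method instead of a grab-bag config map.
    trait KerberosSupport {
      /** Called by Spark with the Kerberos settings (e.g. principal
       *  and keytab) before the source is used. */
      def configureKerberos(principal: String, keytab: String): Unit
    }

    class SomeJdbcSource extends KerberosSupport {
      private var credentials: Option[(String, String)] = None

      override def configureKerberos(principal: String, keytab: String): Unit =
        credentials = Some((principal, keytab))
    }

Supporting Kerberos then becomes visible in the type itself, and the
exact information a source receives is pinned down by the interface
rather than by whatever happens to be in the config map.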