I believe the current Spark config system is unfortunate in the way it has grown - you have no way of telling which subsystems use which configuration options without direct and detailed reading of the code.
Isolating config items for data sources into a separate namespace (rather than using a whitelist) is a nice idea. Unfortunately, in this case we are dealing with configuration items that have been exposed to end users in their current form for a significant amount of time, and Kerberos cross-cuts not only data sources but also things like YARN. So given that, the best options for a way forward I can think of are:

1. Whitelist specific sub-sections of the configuration space.
2. Just pass in a Map[String, String] of all config values.
3. Have data sources implement a specific interface to indicate Kerberos support.

Option (1) is pretty arbitrary, and more than likely the whitelist will change from version to version as additional items get added to it. Data sources will develop dependencies on certain configuration values being present in the whitelist.

Option (2) would work, but it continues the practice of having a vaguely specified grab-bag of config items as a dependency for practically all Spark code.

I am beginning to warm to option (3): it would be a clean way of declaring that a data source supports Kerberos, and also a cleanly specified way of injecting the relevant Kerberos configuration into the data source - and we would not need to change any user-facing configuration items either.

--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
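To make option (3) concrete, here is a rough sketch of what such an interface could look like. All names here (SupportsKerberos, setKerberosConfig, ExampleJdbcSource) are hypothetical illustrations, not existing Spark API; it is written as plain Java in the style of the DataSource V2 interfaces:

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical marker interface: a data source implements this to declare
// Kerberos support, and the framework injects only the Kerberos-related
// configuration at load time - instead of the whole config grab-bag.
interface SupportsKerberos {
    void setKerberosConfig(Map<String, String> kerberosConf);
}

// Illustrative data source that consumes the injected Kerberos config.
class ExampleJdbcSource implements SupportsKerberos {
    private Map<String, String> kerberosConf = Map.of();

    @Override
    public void setKerberosConfig(Map<String, String> kerberosConf) {
        this.kerberosConf = kerberosConf;
    }

    String principal() {
        return kerberosConf.getOrDefault("spark.kerberos.principal", "<none>");
    }
}

public class KerberosInjectionSketch {
    public static void main(String[] args) {
        // Stand-in for the full user-facing configuration.
        Map<String, String> allConf = Map.of(
            "spark.kerberos.principal", "alice@EXAMPLE.COM",
            "spark.sql.shuffle.partitions", "200");

        // The framework, not the data source, selects the relevant subset -
        // the user-facing config item names stay exactly as they are today.
        Map<String, String> kerberosOnly = allConf.entrySet().stream()
            .filter(e -> e.getKey().startsWith("spark.kerberos."))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

        ExampleJdbcSource source = new ExampleJdbcSource();
        source.setKerberosConfig(kerberosOnly);
        System.out.println(source.principal()); // prints alice@EXAMPLE.COM
    }
}
```

The point of the sketch is the division of responsibility: the data source declares support by implementing the interface, and the framework decides which configuration values flow across that boundary, so no whitelist leaks into the data-source contract.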