Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/17723

> The only public interface being added in this change is ServiceCredentialProvider. It's an interface that service-specific libraries (e.g. a Solr connector, or a Kudu connector) would extend, and user applications would never touch. It's also an existing interface, which was in the YARN module but, really, has functionality that is independent of YARN.

As I detailed above, the exposed SPI contract is not just the interface, but also how it is to be used by Spark. It includes the environment `obtainCredentials` is invoked within, the use of `Credentials` as the container for tokens, and how `Credentials` are leveraged in the driver/executor. This, in its entirety, defines the public contract - not just the methods in the interface - and it is dependent on how hadoop-security authenticates/authorizes services.

> And, on top of that, I don't see a point in abstracting it further. It would just be abstraction for the sake of abstraction.

It would ensure that Spark is not tied to hadoop security, but can continue to leverage it for managing long-running applications/services. Once interfaces are exposed, and have been exposed for a few releases, it is exceedingly difficult for users if we break/modify them - irrespective of the stability annotations we add.
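To make the coupling concrete: the existing interface in the YARN module looks roughly like the sketch below (reconstructed from memory of the Spark 2.x `org.apache.spark.deploy.yarn.security` package, not verbatim). Every parameter type in it is a hadoop-security type, which is exactly the implicit contract being discussed:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.Credentials
import org.apache.spark.SparkConf

// Sketch of the existing YARN-module SPI; details may differ from the
// actual source, but the hadoop-security coupling is the point here.
trait ServiceCredentialProvider {
  // Name used to match configuration, e.g. "hive", "hbase".
  def serviceName: String

  // Whether tokens are required for this service under the current config.
  def credentialsRequired(hadoopConf: Configuration): Boolean

  // Obtain tokens, add them to `creds`, and return the next renewal
  // time (epoch millis), if any. Both the `Configuration` passed in and
  // the `Credentials` container are Hadoop types.
  def obtainCredentials(
      hadoopConf: Configuration,
      sparkConf: SparkConf,
      creds: Credentials): Option[Long]
}
```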
A straw man pseudocode proposal which elaborates on this is:

```scala
@Unstable // rename
trait BaseServiceTokenProvider[C, T] {
  // Must match a supported ServiceTokenManager.serviceType
  def serviceType: String
  def serviceName: String
  def credentialsRequired(conf: C): Boolean
  def obtainCredentials(conf: C, tokenContainer: T): Option[Long]
}

// Existing ServiceCredentialProvider becomes
@Unstable
trait ServiceCredentialProvider
    extends BaseServiceTokenProvider[Configuration, Credentials] {
  override def serviceType: String =
    ServiceTokenManager.HADOOP_SECURITY_SERVICE_TYPE
}

// Modify ConfigurableCredentialManager to segregate providers by the
// serviceType of each ServiceTokenProvider, which will be queried by
// ServiceTokenManager. If the serviceType is unknown, log an error.
@Unstable
trait ServiceTokenManager {
  val HADOOP_SECURITY_SERVICE_TYPE = "hadoop-security"

  def serviceType: String

  // Return false if validation fails. Invoked in the driver.
  def configure(conf: org.apache.spark.SparkConf): Boolean

  // Invoked in the driver. If required, application of tokens to the
  // driver is to be done as part of acquireTokens.
  def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte]

  // Invoked in the executor.
  def applyTokens(data: Array[Byte]): Unit
}

// The existing implementation will be encapsulated within this class.
private[spark] class YarnServiceTokenManager extends ServiceTokenManager {

  override def serviceType: String = HADOOP_SECURITY_SERVICE_TYPE

  // In the driver.
  override def configure(conf: SparkConf): Boolean = {
    // Save principal, keytab; create freshHadoopConf; configure.
    true
  }

  override def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte] = {
    val ugi = UGI.loginUserFromKeytabAndReturnUGI(principal, keytab)
    val tempCreds = ugi.getCredentials
    ugi.doAs {
      manager.getCredentialProviders(serviceType).flatMap { provider =>
        // Existing code in ConfigurableCredentialManager.obtainCredentials.
      }.foldLeft(Long.MaxValue)(math.min)
    }
    // Apply the acquired credentials to the driver.
    UGI.currentUser.addCredentials(tempCreds)
    // Serialize tempCreds to Array[Byte].
    serialize(tempCreds)
  }

  // In the executor.
  override def applyTokens(data: Array[Byte]): Unit = {
    // Deserialize data into Credentials; apply it to UGI.currentUser.
  }
}

// Spark core itself becomes decoupled from hadoop-security with the above.
// It also allows us to add other models if applicable.

// Changes to the driver to leverage the above would essentially be:
// a) Load ServiceTokenManager's via ServiceLoader.
// b) Create CredentialManager - which loads BaseServiceTokenProvider via ServiceLoader.
//    b.1) Reject any token provider whose service type was not loaded in (a).
// c) Configure the managers from (a):
managers.foreach(_.configure(sparkConf))

// Token acquisition:
val acquiredTokenMap = managers.map { manager =>
  (manager.serviceType, manager.acquireTokens(credentialManager))
}
// Serialize and send acquiredTokenMap to the executors.

// At the executors, application is:
acquiredTokenMap.map { case (managerType, tokenData) =>
  serviceTokenManagers(managerType).applyTokens(tokenData)
}
```

This ensures that future requirements are reasonably captured without any implementation-specific changes required to the interface or behavior.
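From a third-party library's point of view, the proposal changes very little. A sketch of what a hypothetical connector (a Kudu provider is used purely as an illustration; the class and its body are invented for this example) would implement under the proposed SPI, registered via the usual `META-INF/services` mechanism for `ServiceLoader` discovery:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Hypothetical third-party provider under the proposed SPI. It extends
// the hadoop-security-typed ServiceCredentialProvider, so serviceType is
// already fixed to "hadoop-security" and the provider is routed to the
// YarnServiceTokenManager by ConfigurableCredentialManager.
class KuduCredentialProvider extends ServiceCredentialProvider {
  override def serviceName: String = "kudu"

  override def credentialsRequired(conf: Configuration): Boolean =
    UserGroupInformation.isSecurityEnabled

  override def obtainCredentials(
      conf: Configuration,
      tokenContainer: Credentials): Option[Long] = {
    // Fetch a service delegation token, add it to tokenContainer, and
    // return the token's renewal deadline (None if not renewable).
    None
  }
}
```

A provider targeting a different security model would instead implement `BaseServiceTokenProvider[C, T]` with its own config/container types and declare the matching `serviceType`, which is where the decoupling pays off.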