Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/17723
  
    
    > The only public interface being added in this change is 
ServiceCredentialProvider. It's an interface that service-specific libraries 
(e.g. a Solr connector, or a Kudu connector) would extend, and user 
applications would never touch. It's also an existing interface, which was in 
the YARN module but, really, has functionality that is independent of YARN.
    
    
    As I detailed above, the exposed SPI contract is not just the interface, but also how it is used by Spark.
    It includes the environment `obtainCredentials` is invoked within, the use of `Credentials` as the container for tokens, and how `Credentials` are leveraged in the driver and executors.
    This, in its entirety, defines the public contract - not just the methods in the interface - and it depends on how hadoop-security authenticates/authorizes services.
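
    For reference, the existing interface (roughly, as it stands in the yarn module) bakes these hadoop types directly into its signature:

    ```
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.security.Credentials

    trait ServiceCredentialProvider {
      def serviceName: String
      def credentialsRequired(hadoopConf: Configuration): Boolean
      def obtainCredentials(
          hadoopConf: Configuration,
          sparkConf: SparkConf,
          creds: Credentials): Option[Long]
    }
    ```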
    
    > And, on top of that, I don't see a point in abstracting it further. It 
would just be abstraction for the sake of abstraction.
    
    It would ensure that Spark is not tied to hadoop-security, but can continue to leverage it for managing long-running applications/services.
    
    Once interfaces have been exposed for a few releases, it is exceedingly difficult for users if we break or modify them - irrespective of the stability annotations we add.
    
    
    
    A strawman pseudocode proposal which elaborates on this:
    
    ```
    @Unstable
    // rename
    trait BaseServiceTokenProvider[C, T] {
    
      // Must match a supported ServiceTokenManager.serviceType
      def serviceType: String
    
      def serviceName: String
    
      def credentialsRequired(conf: C): Boolean
    
      def obtainCredentials(conf: C, tokenContainer: T): Option[Long]
    }
    
    
    // Existing ServiceCredentialProvider becomes
    @Unstable
    trait ServiceCredentialProvider extends BaseServiceTokenProvider[Configuration, Credentials] {
      override def serviceType: String = ServiceTokenManager.HADOOP_SECURITY_SERVICE_TYPE
    }
    
    
    
    // Modify ConfigurableCredentialManager to segregate providers by the
    // serviceType of BaseServiceTokenProvider, which will be queried by
    // ServiceTokenManager. If the serviceType is unknown, log an error.
    
    // Companion object, so that ServiceTokenManager.HADOOP_SECURITY_SERVICE_TYPE
    // above resolves
    object ServiceTokenManager {
      val HADOOP_SECURITY_SERVICE_TYPE = "hadoop-security"
    }

    @Unstable
    trait ServiceTokenManager {
    
      def serviceType: String
    
      // Return false if validation fails. Invoked in driver.
      def configure(conf: org.apache.spark.SparkConf): Boolean
    
      // Invoked in driver. If required, applying the tokens to the driver is
      // to be done as part of acquireTokens.
      def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte]
    
      // Invoked in executor
      def applyTokens(data: Array[Byte])
    }
    
    // Existing implementation will be encapsulated within this class.
    private[spark] class YarnServiceTokenManager extends ServiceTokenManager {
    
      override def serviceType: String = ServiceTokenManager.HADOOP_SECURITY_SERVICE_TYPE
    
      // In driver
      override def configure(conf: SparkConf): Boolean = {
        // save principal, keytab; create freshHadoopConf
        // configure
        true
      }
    
      override def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte] = {
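        // UGI == UserGroupInformation (hadoop API names abbreviated below)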
        val ugi = UGI.loginUserFromKeytabAndReturnUGI(principal, keytab)
        val tempCreds = ugi.getCredentials
        ugi.doAs {
          manager.getCredentialProviders(serviceType).flatMap { provider =>
            // existing code in ConfigurableCredentialManager.obtainCredentials
          }.foldLeft(Long.MaxValue)(math.min)
        }
    
        // apply it to driver
        UGI.currentUser.addCredentials(tempCreds)
    
        // serialize tempCreds to Array[Byte]
        serialize(tempCreds)
      }
    
      override def applyTokens(data: Array[Byte]): Unit = {
        // deserialize data into Credentials; apply it to UGI.currentUser
      }
    }
    
    
    
    // Spark core itself becomes decoupled from hadoop-security with the above.
    // It also allows us to add other models if applicable.
    // Changes to driver to leverage above would essentially be:
    
    // a) load ServiceTokenManagers via ServiceLoader (see the sketch below).
    
    // b) create CredentialManager - which loads BaseServiceTokenProvider via ServiceLoader.
    // b.1) Reject any token provider which has a service type not loaded in (a)
    // c) configure (a)
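
    // Sketch of (a), assuming the same java.util.ServiceLoader mechanism the
    // existing provider loading uses (Utils is spark's own):
    val managers: Seq[ServiceTokenManager] =
      ServiceLoader.load(classOf[ServiceTokenManager], Utils.getContextOrSparkClassLoader)
        .asScala.toSeq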
    
    managers.foreach(_.configure(sparkConf))
    
    
    // token acquisition
    val acquiredTokenMap = managers.map(manager =>
      (manager.serviceType, manager.acquireTokens(credentialManager))).toMap
    
    // serialize and send acquiredTokenMap to executors.
    
    
    // At the executors, the tokens are applied:
    
    acquiredTokenMap.foreach {
      case (managerType, tokenData) =>
        serviceTokenManagers(managerType).applyTokens(tokenData)
    }
    
    
    ```
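
    As a purely hypothetical illustration of the "other models" point above: a token manager backed by something other than hadoop-security would slot in without touching spark core (all names invented):

    ```
    // Hypothetical: tokens fetched from an external secret service
    private[spark] class VaultServiceTokenManager extends ServiceTokenManager {

      override def serviceType: String = "vault"

      // In driver: read endpoint/auth details from spark conf, fail fast if invalid
      override def configure(conf: SparkConf): Boolean = true

      // In driver: query providers registered under "vault" and serialize
      // whatever token container that model uses - no hadoop Credentials involved
      override def acquireTokens(manager: ConfigurableCredentialManager): Array[Byte] = {
        Array.emptyByteArray
      }

      // In executor: install the deserialized tokens for local use
      override def applyTokens(data: Array[Byte]): Unit = {}
    }
    ```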
    
    This ensures that future requirements are reasonably captured, without implementation-specific changes being required to the interface or its behavior.
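
    For connector authors (the Solr/Kudu case above), nothing changes either: they continue to implement the hadoop-security flavour of the SPI. A sketch, with hypothetical names:

    ```
    class KuduCredentialProvider extends ServiceCredentialProvider {

      override def serviceName: String = "kudu"

      override def credentialsRequired(conf: Configuration): Boolean =
        UserGroupInformation.isSecurityEnabled

      override def obtainCredentials(conf: Configuration, tokenContainer: Credentials): Option[Long] = {
        // fetch a kudu authn token, add it to tokenContainer, and
        // return the next renewal time (None if not renewable)
        None
      }
    }
    // registered via META-INF/services, as with the existing SPI
    ```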

