Marcelo Vanzin created SPARK-14743:
--------------------------------------

             Summary: Improve delegation token handling in secure clusters
                 Key: SPARK-14743
                 URL: https://issues.apache.org/jira/browse/SPARK-14743
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.0.0
            Reporter: Marcelo Vanzin


In a way, I'd consider this a parent bug of SPARK-7252.

Spark's current support for delegation tokens is a little all over the place:
- for HDFS, there's support for re-creating tokens if a principal and keytab 
are provided
- for HBase and Hive, Spark will fetch delegation tokens so that apps can work 
in cluster mode, but will not re-create them, so apps that need those tokens 
stop working once they expire (typically after 7 days)
- for anything else, Spark doesn't do anything. Many other services use 
delegation tokens, which makes supporting them as data sources in Spark more 
complicated. e.g., Kafka will (hopefully) soon support them.

It would be nice if Spark had consistent support for handling delegation tokens 
regardless of who needs them. I'd list these as the requirements:

- Spark should provide a generic interface for fetching delegation tokens. This 
would allow Spark's delegation token support to be extended through a plugin 
architecture (e.g. Java services), so Spark itself doesn't need to support 
every possible service out there.

This would be used to fetch tokens when launching apps in cluster mode, and 
when a principal and a keytab are provided to Spark.

- A way to manually update delegation tokens in Spark. For example, a new 
SparkContext API, or some configuration that tells Spark to monitor a file for 
changes and load tokens from said file.

This would allow external applications to manage tokens outside of Spark and 
update a running Spark application (think, for example, of a job server 
like Oozie, or something like Hive-on-Spark, which manages Spark apps running 
remotely).

- A way to notify running code that new delegation tokens have been loaded.

This may not be strictly necessary; it might be possible for code to detect 
that, e.g., by peeking into the UserGroupInformation structure. But an event 
sent to the listener bus would allow applications to react when new tokens are 
available (e.g., the Hive backend could re-create connections to the metastore 
server using the new tokens).
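A minimal sketch of how the three requirements above could fit together, in plain Java with no Hadoop dependencies. Every name here (TokenProvider, TokenUpdateListener, TokenManager, updateFromFile) is hypothetical and is not an existing Spark API. Tokens are modeled as opaque strings keyed by service name, whereas real code would handle Hadoop Token objects in a Credentials set; the properties-file format for manual updates is likewise an assumption. Providers are discovered with java.util.ServiceLoader, which is one way to realize the "Java services" plugin idea.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical plugin interface: each service (HDFS, HBase, Hive, Kafka, ...)
// would ship an implementation, listed in META-INF/services for ServiceLoader.
interface TokenProvider {
    String serviceName();
    // Returns a freshly obtained token (opaque string in this sketch).
    String obtainToken();
}

// Hypothetical callback, standing in for an event posted to the listener bus.
interface TokenUpdateListener {
    void onTokensUpdated(Map<String, String> tokens);
}

final class TokenManager {
    private final List<TokenProvider> providers = new CopyOnWriteArrayList<>();
    private final List<TokenUpdateListener> listeners = new CopyOnWriteArrayList<>();
    private volatile Map<String, String> current = Collections.emptyMap();

    // Requirement 1 (plugins): discover providers available on the classpath.
    static TokenManager fromServiceLoader() {
        TokenManager m = new TokenManager();
        for (TokenProvider p : ServiceLoader.load(TokenProvider.class)) {
            m.register(p);
        }
        return m;
    }

    void register(TokenProvider p) { providers.add(p); }

    void addListener(TokenUpdateListener l) { listeners.add(l); }

    // Requirement 1 (fetching): ask every provider for a token, e.g. at launch
    // in cluster mode or periodically when a principal and keytab are known.
    void fetchAll() {
        Map<String, String> fresh = new HashMap<>();
        for (TokenProvider p : providers) {
            fresh.put(p.serviceName(), p.obtainToken());
        }
        update(fresh);
    }

    // Requirement 2: an external manager (e.g. a job server) pushes new tokens.
    synchronized void update(Map<String, String> newTokens) {
        Map<String, String> merged = new HashMap<>(current);
        merged.putAll(newTokens);
        current = Collections.unmodifiableMap(merged);
        // Requirement 3: notify running code that new tokens were loaded.
        for (TokenUpdateListener l : listeners) {
            l.onTokensUpdated(current);
        }
    }

    // Requirement 2, file variant: load "service=token" lines from a file.
    void updateFromFile(Path file) throws IOException {
        Properties props = new Properties();
        try (Reader r = Files.newBufferedReader(file)) {
            props.load(r);
        }
        Map<String, String> loaded = new HashMap<>();
        for (String name : props.stringPropertyNames()) {
            loaded.put(name, props.getProperty(name));
        }
        update(loaded);
    }

    Map<String, String> tokens() { return current; }
}
```

To get the "monitor a file for changes" behavior, a background thread could call updateFromFile whenever a java.nio.file.WatchService reports a modification; the listener callback is where something like the Hive backend would re-create its metastore connections.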


Also, cc'ing [~busbey] and [~steve_l] since you've talked about this on the 
mailing list recently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
