Sure. In general, I consider keytabs to be part of the connection information. Connections should be secured by sending the connection information a task needs as part of the information the executor receives. A task should then not need access to the connection table in Airflow. Keytabs could then be sent as part of the connection information (base64-encoded) and set up by the executor (this is key) so that they are readable only by the task it is launching.
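A minimal sketch of the executor-side step described above, assuming the keytab arrives base64-encoded inside the connection payload and that the code runs as the user the task will run under. `materialize_keytab` and `cleanup_keytab` are hypothetical helpers, not Airflow APIs. The sketch relies only on Unix file modes, so it keeps the keytab away from other users but not from other processes running as the same user; per-task users would still be needed for real multi-tenancy.

```python
import base64
import os
import stat
import tempfile
from typing import Optional


def materialize_keytab(keytab_b64: str, directory: Optional[str] = None) -> str:
    """Decode a base64-encoded keytab and write it to a file only its owner can read.

    Hypothetical helper: meant to run as the task's user, right before the
    executor launches the task.
    """
    data = base64.b64decode(keytab_b64, validate=True)
    # mkstemp creates the file atomically (O_EXCL) with mode 0600, so no other
    # user can open it between creation and the chmod below.
    fd, path = tempfile.mkstemp(prefix="task-", suffix=".keytab", dir=directory)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    # Remove the owner's write bit too: the task only ever needs to read it.
    os.chmod(path, stat.S_IRUSR)
    return path


def cleanup_keytab(path: str) -> None:
    """Delete the keytab once the task has finished."""
    os.remove(path)
```

The executor would hand the returned path to the task (for example through an environment variable) and call `cleanup_keytab` when the task exits, so the file only exists for the task's lifetime.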
So basically, in the scheduler we parse the DAG. Either from the manifest (new) or from smart parsing (probably harder; maybe some auto-registration?), we know which connections and keytabs are available DAG-wide or per task. The credentials and connection information are then serialized into a protobuf message and sent to the executor as part of the "queue" action. The worker then deserializes the information and makes it securely available to the task (which is quite hard, by the way). For that last bit, making the info securely available could mean storing it in the Linux keyring (supported by Python keyring). Keytabs will be tough to do properly, because Java does not properly support the keyring, only files, and files are hard to secure (a process could list all files in /tmp and obtain credentials that way). Maybe storing the keytab with a password and keeping the password in the keyring might work. Something to find out.

B.

> On 27 Jul 2018, at 22:04, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>
> I'm curious whether you had any ideas on enabling multi-tenancy with
> respect to Kerberos in Airflow.
>
>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>
>> Cool. The doc will need some refinement, as it isn't entirely accurate. In
>> addition, we need to distinguish between Airflow as a client of kerberized
>> services (which is what the Astronomer doc talks about) and kerberizing
>> Airflow itself, which the API supports.
>>
>> In general, to access kerberized services (Airflow as a client) one needs
>> to start the ticket renewer with a valid keytab. For the hooks, it isn't
>> always necessary to change the hook to support it: Hadoop CLI tools often
>> just pick it up, as their client config is set to do so. Then there is
>> another class of HTTP-like services, accessed through urllib under the
>> hood; these typically use SPNEGO.
>> These often need to be adjusted, as they require some urllib config.
>> Finally, there are protocols that use SASL with Kerberos, like HDFS (not
>> WebHDFS, which uses SPNEGO). These require per-protocol implementations.
>>
>> Off the top of my head, we currently support Kerberos on the client side
>> with:
>>
>> * Spark
>> * HDFS (snakebite on Python 2.7, the CLI, and the upcoming libhdfs
>>   implementation)
>> * Hive (not the metastore, afaik)
>>
>> Three things to remember:
>>
>> * If a job (e.g. a Spark job) will finish later than the maximum ticket
>>   lifetime, you probably need to provide a keytab to that application.
>>   Otherwise you will get failures after the ticket expires.
>> * A keytab (used by the renewer) is a credential (user and password), so
>>   jobs are executed under the keytab in use at that moment.
>> * Securing keytabs in a multi-tenant Airflow is a challenge. The same goes
>>   for securing connections. We need to fix this at some point; the
>>   solution for now seems to be no multi-tenancy.
>>
>> Kerberos seems harder than it is, btw. Still, we are sometimes moving away
>> from it to OAuth2-based authentication. This gets us closer to cloud
>> standards (even though we are on-prem).
>>
>> B.
>>
>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hit...@apache.org> wrote:
>>>
>>> Hi Taylor,
>>>
>>> +1 on upstreaming this. It would be great if you could submit a pull
>>> request to enhance the Apache Airflow docs.
>>>
>>> Thanks,
>>> Hitesh
>>>
>>>
>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmis...@gmail.com> wrote:
>>>>
>>>> While we're on the topic, I'd love any feedback from Bolke or others who've
>>>> used Kerberos with Airflow on this quick guide I put together yesterday.
>>>> It's similar to what's in the Airflow docs, but all on one page and
>>>> slightly expanded.
>>>>
>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
>>>>
>>>> One thing I'd like to add is a minimal example of how to Kerberize a hook.
>>>>
>>>> I'd be happy to upstream this as well if it's useful (maybe a Concepts >
>>>> Additional Functionality > Kerberos page?)
>>>>
>>>> Best,
>>>> Taylor
>>>>
>>>> On Thu, Jul 26, 2018 at 5:18 PM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>>>
>>>>> Hi Ry,
>>>>>
>>>>> You should ask Bolke de Bruin. He's really experienced with Kerberos, and
>>>>> he also did the implementation for Airflow. Besides that, he also worked
>>>>> on implementing Kerberos in Ambari. Just wanted to let you know.
>>>>>
>>>>> Cheers, Fokko
>>>>>
>>>>> On Thu 26 Jul 2018 at 23:03, Ry Walker <r...@astronomer.io> wrote:
>>>>>
>>>>>> Hi everyone -
>>>>>>
>>>>>> Several bigCos that are considering Airflow are asking about its support
>>>>>> for Kerberos.
>>>>>>
>>>>>> We're going to work on a proof-of-concept next week and will likely
>>>>>> record a screencast of it.
>>>>>>
>>>>>> For now, we're looking for any anecdotal information from organizations
>>>>>> that are using Kerberos with Airflow. If anyone would be willing to share
>>>>>> their experiences here, or reply to me personally, it would be greatly
>>>>>> appreciated!
>>>>>>
>>>>>> -Ry
>>>>>>
>>>>>> --
>>>>>> *Ry Walker* | CEO, Astronomer <http://www.astronomer.io/>
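Bolke's scheduler-to-worker proposal at the top of the thread can be sketched as follows. This is a hedged illustration, not Airflow code: JSON stands in for the protobuf message he describes (a real implementation would generate classes from a `.proto` file), the manifest format (`"*"` for DAG-wide connections, task IDs for per-task ones) is an assumption, and `ConnectionInfo`, `QueuePayload` and `build_payload` are hypothetical names. The point it shows is that only the connections a task declares travel with its queue action, so the task never needs the connection table.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ConnectionInfo:
    """Hypothetical per-connection record shipped to the worker."""
    conn_id: str
    host: str
    login: str
    password: str = ""
    keytab_b64: str = ""  # base64-encoded keytab; empty if the connection has none


@dataclass
class QueuePayload:
    """Hypothetical message attached to the executor's 'queue' action."""
    dag_id: str
    task_id: str
    connections: List[ConnectionInfo] = field(default_factory=list)


def build_payload(dag_id: str, task_id: str,
                  manifest: Dict[str, List[str]],
                  conn_table: Dict[str, ConnectionInfo]) -> QueuePayload:
    # "*" lists DAG-wide connections; a task_id key lists per-task ones.
    wanted = manifest.get("*", []) + manifest.get(task_id, [])
    # dict.fromkeys drops duplicates while keeping declaration order.
    conns = [conn_table[c] for c in dict.fromkeys(wanted)]
    return QueuePayload(dag_id, task_id, conns)


def serialize(payload: QueuePayload) -> bytes:
    # JSON stands in for the protobuf encoding proposed in the thread.
    return json.dumps(asdict(payload)).encode("utf-8")


def deserialize(raw: bytes) -> QueuePayload:
    data = json.loads(raw.decode("utf-8"))
    conns = [ConnectionInfo(**c) for c in data.pop("connections")]
    return QueuePayload(connections=conns, **data)
```

On the worker, the deserialized passwords could then go into the chosen secure store; with the Python keyring library that would be `keyring.set_password(service_name, username, password)`. Keytabs would still need a file for Java clients, which is the hard part Bolke points out.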
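Bolke's earlier message in the thread says client-side access starts with a ticket renewer holding a valid keytab; Airflow ships one as the `airflow kerberos` command. The sketch below is not that implementation. It only illustrates the idea with standard MIT Kerberos `kinit` flags (`-k -t` to authenticate from a keytab, `-c` for the credential cache, `-r` for a renewable lifetime); the principal, paths, `7d` lifetime and one-hour interval are placeholder assumptions.

```python
import subprocess
import time
from typing import List


def kinit_command(principal: str, keytab: str, ccache: str,
                  renewable_life: str = "7d", kinit: str = "kinit") -> List[str]:
    """Build an MIT Kerberos kinit call that gets a TGT from a keytab."""
    return [
        kinit,
        "-r", renewable_life,  # ask for a renewable ticket
        "-k", "-t", keytab,    # authenticate with the keytab instead of a password
        "-c", ccache,          # credential cache the clients will read
        principal,
    ]


def renew_forever(principal: str, keytab: str, ccache: str,
                  interval_s: int = 3600) -> None:
    """Re-acquire the ticket periodically (sketch: no logging, retries or backoff)."""
    while True:
        subprocess.run(kinit_command(principal, keytab, ccache), check=True)
        time.sleep(interval_s)
```

Clients such as the Hadoop CLI tools pick up the ticket when pointed at the same cache, typically through the standard `KRB5CCNAME` environment variable, which is why many hooks need no changes.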
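For the HTTP-like services mentioned in the thread (WebHDFS and others using SPNEGO), the urllib adjustment amounts to sending an `Authorization: Negotiate <base64 token>` header as defined in RFC 4559. Producing the GSSAPI token itself needs a library such as python-gssapi (requests-kerberos wraps the whole exchange), so this stdlib-only sketch takes the token as an argument. `negotiate_request` and `spnego_service_name` are hypothetical helpers.

```python
import base64
import urllib.request
from urllib.parse import urlsplit


def spnego_service_name(url: str) -> str:
    """GSSAPI host-based service name for the HTTP service behind url."""
    host = urlsplit(url).hostname
    if not host:
        raise ValueError("URL has no host: " + url)
    return "HTTP@" + host


def negotiate_request(url: str, gss_token: bytes) -> urllib.request.Request:
    """Build a request carrying an RFC 4559 'Negotiate' token.

    gss_token is assumed to come from a GSSAPI library initialised for
    spnego_service_name(url); producing it is out of scope for this sketch.
    """
    req = urllib.request.Request(url)
    token = base64.b64encode(gss_token).decode("ascii")
    req.add_header("Authorization", "Negotiate " + token)
    return req
```

In the usual flow the client first receives a 401 response carrying `WWW-Authenticate: Negotiate` and then retries with this header.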
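The first of the "things to remember" in Bolke's quoted message can be made concrete for Spark on YARN, where `spark-submit` accepts `--principal` and `--keytab` so a long-running application can log in again on its own. A minimal sketch, assuming the caller has an estimated runtime and knows the current ticket's expiry; `spark_submit_args` is a hypothetical helper and `--master yarn` is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def spark_submit_args(app: str, expected_runtime: timedelta, ticket_expiry: datetime,
                      principal: Optional[str] = None, keytab: Optional[str] = None,
                      now: Optional[datetime] = None) -> List[str]:
    """Add --principal/--keytab only when the job is expected to outlive the ticket."""
    now = now or datetime.now()
    args = ["spark-submit", "--master", "yarn"]
    if now + expected_runtime > ticket_expiry:
        if not (principal and keytab):
            raise ValueError("job outlives the ticket; a principal and keytab are required")
        # With these flags Spark on YARN can obtain fresh credentials itself.
        args += ["--principal", principal, "--keytab", keytab]
    return args + [app]
```

Failing early when no keytab is available is a design choice: it surfaces the problem at submit time instead of as an authentication failure hours into the run.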