Small correction: we probably need something like this for proper securing:
https://github.com/tuxberlin/python-keyctl

Sent from my iPad

> On 27 Jul 2018, at 23:24, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> Sure. In general I consider keytabs part of the connection information.
> Connections should be secured by sending the connection information a task
> needs as part of the information the executor gets. A task should then not
> need access to the connection table in Airflow. Keytabs could then be sent
> as part of the connection information (base64 encoded) and set up by the
> executor (this key) to be read-only to the task it is launching.
>
> So basically in the scheduler we parse the DAG. Either from the manifest
> (new) or from smart parsing (probably harder, maybe some auto register?) we
> know what connections and keytabs are available DAG-wide or per task.
>
> The credentials and connection information are then serialized into a
> protobuf message and sent to the executor as part of the “queue” action. The
> worker then deserializes the information and makes it securely available to
> the task (which is quite hard btw).
>
> On that last bit, making the info securely available might mean storing it
> in the Linux KEYRING (supported by python keyring). Keytabs will be tough to
> do properly because Java does not properly support KEYRING, only files, and
> these are hard to make secure (due to the possibility that a process will
> list all files in /tmp and get credentials through that). Maybe storing the
> keytab with a password and having the password in the KEYRING might work.
> Something to find out.
>
> B.
>
> Sent from my iPad
>
>> On 27 Jul 2018, at 22:04, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>
>> I'm curious if you had any ideas on how to enable multi-tenancy
>> with respect to Kerberos in Airflow.
>>
>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>
>>> Cool. The doc will need some refinement as it isn't entirely accurate.
>>> In addition we need to distinguish between Airflow as a client of
>>> kerberized services (this is what the astronomer doc talks about) and
>>> kerberizing Airflow itself, which the API supports.
>>>
>>> In general, to access kerberized services (Airflow as a client) one needs
>>> to start the ticket renewer with a valid keytab. For the hooks it isn't
>>> always required to change the hook to support it. Hadoop cli tools often
>>> just pick it up as their client config is set to do so. Then there is
>>> another class of HTTP-like services which are accessed by urllib under the
>>> hood; these typically use SPNEGO. These often need to be adjusted as it
>>> requires some urllib config. Finally, there are protocols which use SASL
>>> with Kerberos, like HDFS (not webhdfs, that uses SPNEGO). These require
>>> per-protocol implementations.
>>>
>>> Off the top of my head we support Kerberos client side now with:
>>>
>>> * Spark
>>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs
>>>   implementation)
>>> * Hive (not metastore afaik)
>>>
>>> Three things to remember:
>>>
>>> * If a job (e.g. a Spark job) will finish later than the maximum ticket
>>>   lifetime you probably need to provide a keytab to said application.
>>>   Otherwise you will get failures after the expiry.
>>> * A keytab (used by the renewer) is a set of credentials (user and pass),
>>>   so jobs are executed under the keytab in use at that moment.
>>> * Securing keytabs in multi-tenancy Airflow is a challenge. This also goes
>>>   for securing connections. This we need to fix at some point. The
>>>   solution for now seems to be no multi-tenancy.
>>>
>>> Kerberos seems harder than it is btw. Still, we are sometimes moving away
>>> from it to OAuth2-based authentication. This gets us closer to cloud
>>> standards (but we are on prem).
>>>
>>> B.
>>>
>>> Sent from my iPhone
>>>
>>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hit...@apache.org> wrote:
>>>>
>>>> Hi Taylor
>>>>
>>>> +1 on upstreaming this.
>>>> It would be great if you can submit a pull request
>>>> to enhance the Apache Airflow docs.
>>>>
>>>> thanks
>>>> Hitesh
>>>>
>>>>
>>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmis...@gmail.com> wrote:
>>>>>
>>>>> While we're on the topic, I'd love any feedback from Bolke or others
>>>>> who've used Kerberos with Airflow on this quick guide I put together
>>>>> yesterday. It's similar to what's in the Airflow docs but all on one
>>>>> page and slightly expanded.
>>>>>
>>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
>>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
>>>>>
>>>>> One thing I'd like to add is a minimal example of how to Kerberize a
>>>>> hook.
>>>>>
>>>>> I'd be happy to upstream this as well if it's useful (maybe a Concepts >
>>>>> Additional Functionality > Kerberos page?)
>>>>>
>>>>> Best,
>>>>> Taylor
>>>>>
>>>>>
>>>>> *Taylor Edmiston*
>>>>> Blog <https://blog.tedmiston.com/> | CV
>>>>> <https://stackoverflow.com/cv/taylor> | LinkedIn
>>>>> <https://www.linkedin.com/in/tedmiston/> | AngelList
>>>>> <https://angel.co/taylor> | Stack Overflow
>>>>> <https://stackoverflow.com/users/149428/taylor-edmiston>
>>>>>
>>>>>
>>>>> On Thu, Jul 26, 2018 at 5:18 PM, Driesprong, Fokko <fo...@driesprong.frl>
>>>>> wrote:
>>>>>
>>>>>> Hi Ry,
>>>>>>
>>>>>> You should ask Bolke de Bruin. He's really experienced with Kerberos
>>>>>> and he also did the implementation for Airflow. Besides that, he also
>>>>>> worked on implementing Kerberos in Ambari. Just want to let you know.
>>>>>>
>>>>>> Cheers, Fokko
>>>>>>
>>>>>> On Thu, 26 Jul 2018 at 23:03, Ry Walker <r...@astronomer.io> wrote:
>>>>>>
>>>>>>> Hi everyone -
>>>>>>>
>>>>>>> We have several bigCo's who are considering using Airflow asking about
>>>>>>> its support for Kerberos.
>>>>>>>
>>>>>>> We're going to work on a proof-of-concept next week, will likely
>>>>>>> record a screencast on it.
>>>>>>>
>>>>>>> For now, we're looking for any anecdotal information from
>>>>>>> organizations who are using Kerberos with Airflow, if anyone would be
>>>>>>> willing to share their experiences here, or reply to me personally, it
>>>>>>> would be greatly appreciated!
>>>>>>>
>>>>>>> -Ry
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> *Ry Walker* | CEO, Astronomer <http://www.astronomer.io/> |
>>>>>>> 513.417.2163 |
>>>>>>> @rywalker <http://twitter.com/rywalker> | LinkedIn
>>>>>>> <http://www.linkedin.com/in/rywalker>
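
For reference, a rough sketch of the keytab hand-off discussed in the thread above (keytab sent base64 encoded with the connection information, then set up by the executor so it is read-only for the task). This is not Airflow code: the function name materialize_keytab, the directory prefix and the file name are made up for illustration, and it assumes a POSIX system. It only addresses the "another process lists /tmp" concern by using a private directory; it does not solve multi-tenancy when tasks share a Unix user, and it does not touch the kernel keyring.

```python
import base64
import os
import stat
import tempfile


def materialize_keytab(encoded_keytab: str) -> str:
    """Decode a base64 keytab and write it to an owner-only, read-only file.

    Hypothetical helper, not part of Airflow. Returns the path of the file.
    """
    keytab_bytes = base64.b64decode(encoded_keytab)

    # mkdtemp creates the directory with mode 0o700, so other users can see
    # that it exists but cannot list or read its contents.
    private_dir = tempfile.mkdtemp(prefix="task-keytab-")
    keytab_path = os.path.join(private_dir, "task.keytab")

    # O_EXCL refuses to reuse an existing file; mode 0o400 makes the file
    # read-only for the owner from the moment it is created.
    fd = os.open(keytab_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "wb") as handle:
        handle.write(keytab_bytes)
    return keytab_path


if __name__ == "__main__":
    # Placeholder bytes, not a real keytab.
    path = materialize_keytab(base64.b64encode(b"not-a-real-keytab").decode())
    print(path, oct(stat.S_IMODE(os.stat(path).st_mode)))
```

The task would then be pointed at the returned path (for example through an environment variable of the launcher's choosing), and the launcher would remove the directory once the task exits.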