Small correction: we probably need something like this

https://github.com/tuxberlin/python-keyctl

to secure this properly.

Sent from my iPad

> On 27 Jul 2018, at 23:24, Bolke de Bruin <bdbr...@gmail.com> wrote:
> 
> Sure. In general I consider keytabs part of the connection information. 
> Connections should be secured by sending the connection information a task 
> needs as part of the information the executor gets. A task should then not 
> need access to the connection table in Airflow. Keytabs could then be sent 
> (base64 encoded) as part of the connection information and set up by the 
> executor to be readable only by the task it is launching.
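A minimal sketch of the keytab hand-off described above, assuming the executor receives the keytab as a base64 string. The helper names `encode_keytab` and `materialize_keytab` and the file name `task.keytab` are hypothetical and not part of Airflow. The sketch only restricts the file to its owner; making it readable by the task's user alone would additionally require launching the task under a dedicated account.

```python
import base64
import os
import tempfile


def encode_keytab(keytab_bytes: bytes) -> str:
    """Base64-encode raw keytab bytes so they fit in a text payload."""
    return base64.b64encode(keytab_bytes).decode("ascii")


def materialize_keytab(encoded: str) -> str:
    """Decode a keytab and store it in a private, owner-only location.

    Hypothetical executor-side helper; returns the path of the new file.
    """
    raw = base64.b64decode(encoded)
    # mkdtemp creates a directory that only the creating user can access,
    # so other users cannot list or open the file inside it.
    private_dir = tempfile.mkdtemp(prefix="task-creds-")
    path = os.path.join(private_dir, "task.keytab")
    # O_EXCL refuses to reuse an existing file; 0o400 is owner read-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "wb") as handle:
        handle.write(raw)
    return path
```

The private directory is one way to address the concern about other processes listing /tmp; it does not protect against processes running as the same user.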
> 
> So basically, in the scheduler we parse the DAG. Either from the manifest 
> (new) or from smart parsing (probably harder, maybe some auto-register?) we 
> know which connections and keytabs are available DAG-wide or per task. 
> 
> The credentials and connection information are then serialized into a 
> protobuf message and sent to the executor as part of the “queue” action. The 
> worker then deserializes the information and makes it securely available to 
> the task (which is quite hard btw).
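The round trip described above can be sketched as follows. This is an illustration only: JSON stands in for protobuf so that the example runs with the standard library alone, and the `TaskCredentials` message and its fields are hypothetical, not an existing Airflow schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Optional


@dataclass
class TaskCredentials:
    """Hypothetical payload: what a single task is allowed to use."""

    task_id: str
    # Maps a connection id to its connection URI.
    connections: Dict[str, str] = field(default_factory=dict)
    # Base64-encoded keytab, if the task needs one.
    keytab_b64: Optional[str] = None


def serialize(creds: TaskCredentials) -> bytes:
    """Scheduler side: turn the payload into bytes for the queue message."""
    return json.dumps(asdict(creds), sort_keys=True).encode("utf-8")


def deserialize(payload: bytes) -> TaskCredentials:
    """Worker side: rebuild the payload before handing it to the task."""
    return TaskCredentials(**json.loads(payload.decode("utf-8")))
```

With this shape the task never queries the connection table itself; it only sees what the scheduler decided to include in its message.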
> 
> On that last bit: making the info securely available might mean storing it 
> in the Linux KEYRING (supported by python keyring). Keytabs will be tough to 
> do properly because Java does not properly support KEYRING, only files, and 
> files are hard to secure (a process could list all files in /tmp and get 
> credentials that way). Maybe storing the keytab protected by a password and 
> keeping the password in the KEYRING might work. Something to find out.
> 
> B.
> 
> Sent from my iPad
> 
>> On 27 Jul 2018, at 22:04, Dan Davydov <ddavy...@twitter.com.INVALID> 
>> wrote:
>> 
>> I'm curious if you had any ideas for enabling multi-tenancy with respect
>> to Kerberos in Airflow.
>> 
>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>> 
>>> Cool. The doc will need some refinement as it isn't entirely accurate. In
>>> addition we need to separate between Airflow as a client of kerberized
>>> services (this is what is talked about in the astronomer doc) vs
>>> kerberizing airflow itself, which the API supports.
>>> 
>>> In general to access kerberized services (airflow as a client) one needs
>>> to start the ticket renewer with a valid keytab. For the hooks it isn't
>>> always required to change the hook to support it. Hadoop cli tools often
>>> just pick it up as their client config is set to do so. Then another class
>>> is there for HTTP-like services which are accessed by urllib under the
>>> hood, these typically use SPNEGO. These often need to be adjusted as it
>>> requires some urllib config. Finally, there are protocols which use SASL
>>> with kerberos. Like HDFS (not webhdfs, that uses SPNEGO). These require per
>>> protocol implementations.
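As a rough illustration of the "ticket renewer with a valid keytab" step above, the sketch below builds and runs a `kinit` call. The flags `-r`, `-k`, `-t` and `-c` are standard MIT Kerberos `kinit` options; the function names, the default renewable lifetime, and the example paths are assumptions made for this illustration.

```python
import subprocess
from typing import List


def build_kinit_command(
    principal: str,
    keytab_path: str,
    ccache_path: str,
    renewable_life: str = "7d",
) -> List[str]:
    """Build a kinit call that obtains a renewable ticket from a keytab."""
    return [
        "kinit",
        "-r", renewable_life,  # request a renewable ticket
        "-k",                  # authenticate with a keytab, not a password
        "-t", keytab_path,     # which keytab file to use
        "-c", ccache_path,     # credential cache the hooks will read
        principal,
    ]


def obtain_ticket(principal: str, keytab_path: str, ccache_path: str) -> None:
    """Run kinit; raises CalledProcessError if the KDC rejects the request."""
    subprocess.run(
        build_kinit_command(principal, keytab_path, ccache_path),
        check=True,
    )
```

A renewer would call something like `obtain_ticket` on a timer, well before the ticket lifetime runs out, so that CLI tools and hooks keep finding a valid ticket in the cache.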
>>> 
>>> Off the top of my head, we support Kerberos client side now with:
>>> 
>>> * Spark
>>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs
>>> implementation)
>>> * Hive (not metastore afaik)
>>> 
>>> Three things to remember:
>>> 
>>> * If a job (e.g. a Spark job) will finish later than the maximum ticket
>>> lifetime, you probably need to provide a keytab to said application.
>>> Otherwise you will get failures after the expiry.
>>> * A keytab (used by the renewer) is a set of credentials (user and pass),
>>> so jobs are executed under the keytab in use at that moment.
>>> * Securing keytabs in a multi-tenant Airflow is a challenge. This also goes
>>> for securing connections. We need to fix this at some point. The solution
>>> for now seems to be no multi-tenancy.
>>> 
>>> Kerberos seems harder than it is btw. Still, we are in some cases moving
>>> away from it to OAuth2-based authentication. This gets us closer to cloud
>>> standards (but we are on prem).
>>> 
>>> B.
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hit...@apache.org> wrote:
>>>> 
>>>> Hi Taylor
>>>> 
>>>> +1 on upstreaming this. It would be great if you can submit a pull
>>>> request to enhance the Apache Airflow docs.
>>>> 
>>>> thanks
>>>> Hitesh
>>>> 
>>>> 
>>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmis...@gmail.com> wrote:
>>>>> 
>>>>> While we're on the topic, I'd love any feedback from Bolke or others
>>>>> who've used Kerberos with Airflow on this quick guide I put together
>>>>> yesterday. It's similar to what's in the Airflow docs but instead all
>>>>> on one page and slightly expanded.
>>>>> 
>>>>> 
>>>>> 
>>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
>>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
>>>>> 
>>>>> One thing I'd like to add is a minimal example of how to Kerberize a
>>>>> hook.
>>>>> 
>>>>> I'd be happy to upstream this as well if it's useful (maybe a Concepts >
>>>>> Additional Functionality > Kerberos page?)
>>>>> 
>>>>> Best,
>>>>> Taylor
>>>>> 
>>>>> 
>>>>> *Taylor Edmiston*
>>>>> Blog <https://blog.tedmiston.com/> | CV
>>>>> <https://stackoverflow.com/cv/taylor> | LinkedIn
>>>>> <https://www.linkedin.com/in/tedmiston/> | AngelList
>>>>> <https://angel.co/taylor> | Stack Overflow
>>>>> <https://stackoverflow.com/users/149428/taylor-edmiston>
>>>>> 
>>>>> 
>>>>> On Thu, Jul 26, 2018 at 5:18 PM, Driesprong, Fokko <fo...@driesprong.frl>
>>>>> wrote:
>>>>> 
>>>>>> Hi Ry,
>>>>>> 
>>>>>> You should ask Bolke de Bruin. He's really experienced with Kerberos
>>>>>> and he also did the implementation for Airflow. Besides that, he also
>>>>>> worked on implementing Kerberos in Ambari. Just want to let you know.
>>>>>> 
>>>>>> Cheers, Fokko
>>>>>> 
>>>>>> On Thu 26 Jul 2018 at 23:03, Ry Walker <r...@astronomer.io> wrote:
>>>>>> 
>>>>>>> Hi everyone -
>>>>>>> 
>>>>>>> We have several bigCo's who are considering using Airflow and are
>>>>>>> asking about its support for Kerberos.
>>>>>>> 
>>>>>>> We're going to work on a proof-of-concept next week, and will likely
>>>>>>> record a screencast on it.
>>>>>>> 
>>>>>>> For now, we're looking for any anecdotal information from
>>>>>>> organizations who are using Kerberos with Airflow. If anyone would be
>>>>>>> willing to share their experiences here, or reply to me personally,
>>>>>>> it would be greatly appreciated!
>>>>>>> 
>>>>>>> -Ry
>>>>>>> 
>>>>>>> --
>>>>>>> 
>>>>>>> *Ry Walker* | CEO, Astronomer <http://www.astronomer.io/> |
>>>>>> 513.417.2163 |
>>>>>>> @rywalker <http://twitter.com/rywalker> | LinkedIn
>>>>>>> <http://www.linkedin.com/in/rywalker>
>>>>> 
>>> 
