Also: using the Kubernetes executor combined with some of the things we
discussed greatly enhances the security of Airflow, as the environment
isn't really shared anymore.
B.

> On 2 Aug 2018, at 19:51, Bolke de Bruin <bdbr...@gmail.com> wrote:
>
> Hi Dan,
>
> I discussed this a little bit with one of the security architects here.
> We think that you can have a fair trade-off between security and
> usability by having a kind of manifest with the DAG you are submitting.
> This manifest can then specify what the generated tasks/dags are allowed
> to do and what metadata to provide to them. We could also let the
> scheduler generate hashes per generated DAG / task and verify those
> against an established version (1st run?). This limits the attack vector.
>
> A DagSerializer would be great, but I think it solves a different issue,
> and the above is somewhat simpler to implement?
>
> Bolke
>
>> On 29 Jul 2018, at 23:47, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>
>> *Let's say we trust the owner field of the DAGs. I think we could do
>> the following.*
>> *Obviously, the trusting the user part is key here. It is one of the
>> reasons I was suggesting using "airflow submit" to update / add dags
>> in Airflow*
>>
>> *This is the hard part about my question.*
>> I think in a true multi-tenant environment we wouldn't be able to trust
>> the user; otherwise we wouldn't necessarily even need a mapping of
>> Airflow DAG users to secrets, because if we trust users to set the
>> correct Airflow user for DAGs, we are basically trusting them with all
>> of the creds the Airflow scheduler can access for all users anyway.
>>
>> I actually had the same thought as your "airflow submit" a while ago,
>> which I discussed with Alex: basically creating an API for adding DAGs
>> instead of having the scheduler parse them. FWIW I think it's superior
>> to the git time machine approach because it's a more generic form of
>> "serialization" and is more correct as well, because the same DAG file
>> parsed at a given git SHA can produce different DAGs. Let me know what
>> you think, and maybe I can start a more formal design doc if you are
>> on board:
>>
>> A user or service with an auth token sends an "airflow submit" request
>> to a new kind of Dag Serialization service, along with the serialized
>> DAG objects generated by parsing on the client. It's important that
>> these serialized objects are declarative and not e.g. pickles, so that
>> the scheduler/workers can consume them and reproducibility of the DAGs
>> is guaranteed. The service will then store each generated DAG along
>> with its access based on the provided token (e.g. using Ranger), and
>> the scheduler/workers will use the stored DAGs for scheduling/execution.
>> Operators would be deployed along with the Airflow code, separately
>> from the serialized DAGs.
>>
>> A serialized DAG would look something like this (basically Luigi-style :)):
>>
>> MyTask - BashOperator: {
>>   cmd: "sleep 1"
>>   user: "Foo"
>>   access: "token1", "token2"
>> }
>>
>> MyDAG: {
>>   MyTask1
>>   SomeOtherTask1
>>   MyTask2
>>   SomeOtherTask1
>> }
>>
>> Dynamic DAGs in this case would just consist of a service calling
>> "airflow submit" that does its own form of authentication to get
>> access to some kind of tokens (or basically just forwarding the
>> secrets the users of the dynamic DAG submit).
>>
>> For the default Airflow implementation you can maybe just have the Dag
>> Serialization server bundled with the scheduler, with auth turned off,
>> and periodically update the Dag Serialization store, which would
>> emulate the current behavior closely.
>>
>> Pros:
>> 1. Consistency across running task instances in a dagrun/scheduler,
>> reproducibility and auditability of DAGs
>> 2. Users can control when to deploy their DAGs
>> 3. Scheduler runs much faster since it doesn't have to run python
>> files and e.g. make network calls
>> 4. Scaling the scheduler becomes easier, because a different service
>> can be responsible for parsing DAGs, and it can be trivially scaled
>> horizontally (clients are doing the parsing)
>> 5. Potentially makes creating ad-hoc DAGs/backfilling/iterating on
>> DAGs easier? e.g. can use the scheduler itself to schedule backfills
>> with a slightly modified serialized version of a DAG.
>>
>> Cons:
>> 1. Have to deprecate a lot of popular features, e.g. allowing custom
>> callbacks in operators (e.g. on_failure), and Jinja templates
>> 2. Version compatibility problems, e.g. the user/service client might
>> be serializing arguments for hooks/operators that have been deprecated
>> in newer versions of the hooks, or the serialized DAG schema changes
>> and old DAGs aren't automatically updated. Might want to have some
>> kind of versioning system for serialized DAGs to at least ensure that
>> stored DAGs are valid when the scheduler/worker/etc. are upgraded,
>> maybe something similar to thrift/protobuf versioning.
>> 3. Additional complexity: an additional service, logic on
>> workers/scheduler to fetch/cache serialized DAGs efficiently,
>> expiring/archiving old DAG definitions, etc.
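For illustration, the declarative form Dan sketches above could be
fleshed out as plain JSON plus a data-only loader on the
scheduler/worker side. A minimal sketch; the JSON shape and the
operator registry are assumptions of this sketch, not a settled schema:

    import json

    # Stand-in for operators deployed with the Airflow code itself; a real
    # registry would map names to the actual operator classes.
    class BashOperator:
        def __init__(self, task_id, bash_command):
            self.task_id, self.bash_command = task_id, bash_command

    OPERATOR_REGISTRY = {"BashOperator": BashOperator}

    # Illustrative payload an "airflow submit" client might send.
    payload = json.dumps({
        "dag_id": "MyDAG",
        "tasks": [{"task_id": "MyTask", "operator": "BashOperator",
                   "args": {"bash_command": "sleep 1"},
                   "owner": "Foo", "access": ["token1", "token2"]}],
        "edges": [],  # [upstream_task_id, downstream_task_id] pairs
    })

    def load_dag(blob):
        # Rebuild tasks from data only: no user code runs here, and unknown
        # operator names fail fast instead of being executed.
        spec = json.loads(blob)
        return spec["dag_id"], {
            t["task_id"]: OPERATOR_REGISTRY[t["operator"]](
                task_id=t["task_id"], **t["args"])
            for t in spec["tasks"]}

    print(load_dag(payload))

Because the loader only instantiates registered classes from data, the
stored payload effectively is the DAG, which is what makes the
reproducibility and auditability in the pros list fall out naturally.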
>>> On Sun, Jul 29, 2018 at 3:20 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>
>>> Ah, gotcha. That's another issue actually (but related).
>>>
>>> Let's say we trust the owner field of the DAGs. I think we could do
>>> the following. We then have a table (and interface) to tell Airflow
>>> what users have access to what connections. The scheduler can then
>>> check whether the task in the DAG can access the conn_id it is asking
>>> for. Auto-generated DAGs still have an owner (or should) and
>>> therefore should be fine. Some integrity checking could/should be
>>> added, as we want to be sure that the task we schedule is the task we
>>> launch. So a signature calculated at the scheduler (or part of the
>>> DAG), sent as part of the metadata and checked by the executor, is
>>> probably smart.
>>>
>>> You can also make this more fancy by integrating with something like
>>> Apache Ranger, which allows for policy checking.
>>>
>>> Obviously, the trusting the user part is key here. It is one of the
>>> reasons I was suggesting using "airflow submit" to update / add DAGs
>>> in Airflow. We could enforce authentication on the DAG. It was kind
>>> of ruled out in favor of git time machines, although these never
>>> happened afaik ;-).
>>>
>>> BTW: I have updated my implementation with protobuf. Metadata is now
>>> available at executor and task.
>>>
>>>> On 29 Jul 2018, at 15:47, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>>
>>>> The concern is how to secure secrets on the scheduler such that only
>>>> certain DAGs can access them, and in the case of files that create
>>>> DAGs dynamically, only some set of DAGs should be able to access
>>>> these secrets.
>>>>
>>>> e.g. if there is a secret/keytab that can be read by DAG A generated
>>>> by file X, and file X generates DAG B as well, there needs to be a
>>>> scheme to stop the parsing of DAG B on the scheduler from being able
>>>> to read the secret in DAG A.
>>>>
>>>> Does that make sense?
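The integrity check Bolke describes above ("the task we schedule is the
task we launch") could be as small as an HMAC over the task's identity
and the conn_ids it requests, computed at the scheduler and verified by
the executor. A sketch; the signed fields are illustrative and key
distribution is not shown:

    import hashlib
    import hmac
    import json

    SIGNING_KEY = b"shared-scheduler-executor-key"  # assumption: shared secret

    def sign_task(dag_id, task_id, conn_ids):
        # Canonical serialization so both sides hash identical bytes.
        blob = json.dumps([dag_id, task_id, sorted(conn_ids)]).encode()
        return hmac.new(SIGNING_KEY, blob, hashlib.sha256).hexdigest()

    def verify_task(dag_id, task_id, conn_ids, signature):
        # Executor side: constant-time comparison before launching the task.
        return hmac.compare_digest(sign_task(dag_id, task_id, conn_ids),
                                   signature)

    sig = sign_task("my_dag", "my_task", ["hive_default"])
    assert verify_task("my_dag", "my_task", ["hive_default"], sig)
    assert not verify_task("my_dag", "my_task", ["mysql_root"], sig)  # tampered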
>>>>
>>>>> On Sun, Jul 29, 2018 at 6:14 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>
>>>>> I'm not sure what you mean. The example I created allows for
>>>>> dynamic DAGs, as the scheduler obviously knows about the tasks when
>>>>> they are ready to be scheduled. This isn't any different for a
>>>>> static DAG or a dynamic one.
>>>>>
>>>>> For Kerberos it isn't that special. Basically, a keytab is the
>>>>> revocable user's credentials in a special format. The keytab itself
>>>>> can be protected by a password. So I can imagine that a connection
>>>>> is defined that sets a keytab location and a password to access the
>>>>> keytab. The scheduler understands this (or maybe the Connection
>>>>> model) and serializes and sends it to the worker as part of the
>>>>> metadata. The worker then reconstructs the keytab and issues a
>>>>> kinit or supplies it to the other service requiring it (e.g. Spark).
>>>>>
>>>>> * Obviously the worker and scheduler need to communicate over SSL.
>>>>> * There is a challenge at the worker level. Credentials are secured
>>>>> against other users, but are readable by the owning user. So
>>>>> imagine 2 DAGs from two different users with different connections,
>>>>> without sudo configured. If they end up at the same worker and DAG
>>>>> 2 is malicious, it could read files and memory created by DAG 1.
>>>>> This is the reason why using environment variables is NOT safe
>>>>> (DAG 2 could read /proc/<pid>/environ). To mitigate this we
>>>>> probably need to PIPE the data to the task's STDIN. It won't solve
>>>>> the issue, but it will make it harder, as now the data will only be
>>>>> in memory.
>>>>> * The reconstructed keytab (or the initialized version) can be
>>>>> stored in, most likely, the process-keyring
>>>>> (http://man7.org/linux/man-pages/man7/process-keyring.7.html). As
>>>>> mentioned earlier, this poses a challenge for Java applications
>>>>> that cannot read from this location (keytab and ccache). Writing it
>>>>> out to the filesystem then becomes a possibility. This is
>>>>> essentially how Spark solves it
>>>>> (https://spark.apache.org/docs/latest/security.html#yarn-mode).
>>>>>
>>>>> Why not work on this together? We need it as well. Airflow as it is
>>>>> now we consider the biggest security threat, and it is really hard
>>>>> to secure. The above would definitely be a serious improvement.
>>>>> Another step would be to stop tasks from accessing the Airflow DB
>>>>> altogether.
>>>>>
>>>>> Cheers
>>>>> Bolke
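The STDIN mitigation above might look like this on the worker: secrets
travel over a pipe into the child process rather than through its
environment, so they never appear in /proc/<pid>/environ (they do still
live in the child's memory). A self-contained sketch:

    import json
    import subprocess
    import sys

    if len(sys.argv) > 1 and sys.argv[1] == "task":
        # Task side: read the credentials once from stdin, never from
        # os.environ.
        creds = json.loads(sys.stdin.read())
        print("got connection:", creds["conn_id"])
    else:
        # Worker side: pipe the serialized credentials into the task process.
        blob = json.dumps({"conn_id": "hive_default", "password": "s3cret"})
        subprocess.run([sys.executable, __file__, "task"],
                       input=blob.encode(), check=True)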
>>>>>
>>>>>> On 29 Jul 2018, at 05:36, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>>>>
>>>>>> This makes sense, and thanks for putting this together. I might
>>>>>> pick this up myself depending on whether we can get the rest of
>>>>>> the multi-tenancy story nailed down, but I still think the tricky
>>>>>> part is figuring out how to allow dynamic DAGs (e.g. DAGs created
>>>>>> from rows in a MySQL table) to work with Kerberos; curious what
>>>>>> your thoughts are there. How would secrets be passed securely in a
>>>>>> multi-tenant scheduler, starting from parsing the DAGs up to the
>>>>>> executor sending them off?
>>>>>>
>>>>>>> On Sat, Jul 28, 2018 at 5:07 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>
>>>>>>> Here:
>>>>>>>
>>>>>>> https://github.com/bolkedebruin/airflow/tree/secure_connections
>>>>>>>
>>>>>>> is a working rudimentary implementation that allows securing the
>>>>>>> connections (only LocalExecutor at the moment):
>>>>>>>
>>>>>>> * It enforces the use of "conn_id" instead of the mix that we
>>>>>>> have now
>>>>>>> * A task, if using "conn_id", has 'auto-registered' (which is a
>>>>>>> noop) its connections
>>>>>>> * The scheduler reads the connection information and serializes
>>>>>>> it to json (which should be a different format, protobuf
>>>>>>> preferably)
>>>>>>> * The scheduler then sends this info to the executor
>>>>>>> * The executor puts this in the environment of the task (the
>>>>>>> environment is most likely not secure enough for us)
>>>>>>> * The BaseHook is adjusted to read out this environment variable
>>>>>>> and not connect to the database
>>>>>>>
>>>>>>> The example_http_operator works; I haven't tested any others. To
>>>>>>> make it work I just adjusted the hook and operator to use
>>>>>>> "conn_id" instead of the non-standard http_conn_id.
>>>>>>>
>>>>>>> Makes sense?
>>>>>>>
>>>>>>> B.
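In rough outline, the flow in that branch would resemble the sketch
below; the environment variable name and the JSON shape here are
guesses, and, as noted in the list above, the environment is probably
not a secure enough transport in the end:

    import json
    import os

    # Executor side: hand the scheduler-serialized connections to the task.
    conns = {"http_default": {"host": "example.com", "schema": "https",
                              "login": "user", "password": "s3cret"}}
    os.environ["AIRFLOW_SECURED_CONNECTIONS"] = json.dumps(conns)

    # Hook side: a get_connection that reads the injected data instead of
    # touching the metadata database.
    def get_connection(conn_id):
        blob = os.environ.get("AIRFLOW_SECURED_CONNECTIONS")
        if blob is None:
            raise RuntimeError("no connections were supplied to this task")
        return json.loads(blob)[conn_id]

    print(get_connection("http_default")["host"])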
>>>>>>>
>>>>>>>> On 28 Jul 2018, at 17:50, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Well, I don't think a hook (or task) should obtain it by itself.
>>>>>>>> It should be supplied. At the moment you start executing the
>>>>>>>> task you cannot trust it anymore (i.e. it is unmanaged /
>>>>>>>> non-Airflow code).
>>>>>>>>
>>>>>>>> So we could change the BaseHook to understand supplied
>>>>>>>> credentials and populate a hash with "conn_ids". Hooks normally
>>>>>>>> call BaseHook.get_connection anyway, so it shouldn't be too hard
>>>>>>>> and should in principle not require changes to the hooks
>>>>>>>> themselves if they are well behaved.
>>>>>>>>
>>>>>>>> B.
>>>>>>>>
>>>>>>>>> On 28 Jul 2018, at 17:41, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>>>>>>>
>>>>>>>>> *So basically in the scheduler we parse the dag. Either from
>>>>>>>>> the manifest (new) or from smart parsing (probably harder,
>>>>>>>>> maybe some auto register?) we know what connections and keytabs
>>>>>>>>> are available dag wide or per task.*
>>>>>>>>> This is the hard part that I was curious about: for dynamically
>>>>>>>>> created DAGs, e.g. those generated by reading tasks in a MySQL
>>>>>>>>> database or a json file, there isn't a great way to do this.
>>>>>>>>>
>>>>>>>>> I 100% agree with deprecating the connections table (at least
>>>>>>>>> for the secure option). The main work there is rewriting all
>>>>>>>>> hooks to take credentials from arbitrary data sources by
>>>>>>>>> allowing a customized CredentialsReader class. Although hooks
>>>>>>>>> are technically private, I think a lot of companies depend on
>>>>>>>>> them, so the PMC should probably discuss whether this is an
>>>>>>>>> Airflow 2.0 change or not.
>>>>>>>>>
>>>>>>>>>> On Fri, Jul 27, 2018 at 5:24 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Sure. In general I consider keytabs part of the connection
>>>>>>>>>> information. Connections should be secured by sending the
>>>>>>>>>> connection information a task needs as part of the information
>>>>>>>>>> the executor gets. A task should then not need access to the
>>>>>>>>>> connection table in Airflow. Keytabs could then be sent as
>>>>>>>>>> part of the connection information (base64 encoded) and set up
>>>>>>>>>> by the executor (this key) to be readable only by the task it
>>>>>>>>>> is launching.
>>>>>>>>>>
>>>>>>>>>> So basically in the scheduler we parse the dag. Either from
>>>>>>>>>> the manifest (new) or from smart parsing (probably harder,
>>>>>>>>>> maybe some auto register?) we know what connections and
>>>>>>>>>> keytabs are available dag wide or per task.
>>>>>>>>>>
>>>>>>>>>> The credentials and connection information are then serialized
>>>>>>>>>> into a protobuf message and sent to the executor as part of
>>>>>>>>>> the "queue" action. The worker then deserializes the
>>>>>>>>>> information and makes it securely available to the task (which
>>>>>>>>>> is quite hard btw).
>>>>>>>>>>
>>>>>>>>>> On that last bit: making the info securely available might
>>>>>>>>>> mean storing it in the Linux KEYRING (supported by python
>>>>>>>>>> keyring). Keytabs will be tough to do properly, due to Java
>>>>>>>>>> supporting only files rather than the KEYRING, and files are
>>>>>>>>>> hard to make secure (due to the possibility that a process
>>>>>>>>>> will list all files in /tmp and get credentials through that).
>>>>>>>>>> Maybe storing the keytab with a password and having the
>>>>>>>>>> password in the KEYRING might work. Something to find out.
>>>>>>>>>>
>>>>>>>>>> B.
>>>>>>>>>>
>>>>>>>>>> Sent from my iPad
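The keytab-plus-password idea at the end of that message could look
roughly like this with the python keyring package (which backend
actually holds the secret varies by platform); the service and key
names are made up for the sketch:

    import keyring

    # Worker side: keep the password protecting the (encrypted) keytab in
    # the keyring rather than next to the keytab on disk.
    keyring.set_password("airflow-keytabs", "my_dag.my_task", "keytab-p4ss")

    # Task side: fetch the password, decrypt the keytab with it, and only
    # then hand the keytab to kinit or to the downstream service.
    password = keyring.get_password("airflow-keytabs", "my_dag.my_task")
    assert password == "keytab-p4ss"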
>>>>>>>>>>
>>>>>>>>>>> On 27 Jul 2018, at 22:04, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I'm curious if you had any ideas on how to enable
>>>>>>>>>>> multi-tenancy with respect to Kerberos in Airflow.
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Cool. The doc will need some refinement, as it isn't
>>>>>>>>>>>> entirely accurate. In addition, we need to separate between
>>>>>>>>>>>> Airflow as a client of kerberized services (this is what the
>>>>>>>>>>>> Astronomer doc talks about) vs kerberizing Airflow itself,
>>>>>>>>>>>> which the API supports.
>>>>>>>>>>>>
>>>>>>>>>>>> In general, to access kerberized services (Airflow as a
>>>>>>>>>>>> client) one needs to start the ticket renewer with a valid
>>>>>>>>>>>> keytab. For the hooks it isn't always required to change the
>>>>>>>>>>>> hook to support it. Hadoop CLI tools often just pick it up,
>>>>>>>>>>>> as their client config is set to do so. Then there is
>>>>>>>>>>>> another class of HTTP-like services which are accessed by
>>>>>>>>>>>> urllib under the hood; these typically use SPNEGO and often
>>>>>>>>>>>> need to be adjusted, as some urllib config is required.
>>>>>>>>>>>> Finally, there are protocols which use SASL with Kerberos,
>>>>>>>>>>>> like HDFS (not webhdfs, which uses SPNEGO). These require
>>>>>>>>>>>> per-protocol implementations.
>>>>>>>>>>>>
>>>>>>>>>>>> From the top of my head, we support Kerberos client side
>>>>>>>>>>>> now with:
>>>>>>>>>>>>
>>>>>>>>>>>> * Spark
>>>>>>>>>>>> * HDFS (snakebite on python 2.7, the cli, and the upcoming
>>>>>>>>>>>> libhdfs implementation)
>>>>>>>>>>>> * Hive (not the metastore afaik)
>>>>>>>>>>>>
>>>>>>>>>>>> Two things to remember:
>>>>>>>>>>>>
>>>>>>>>>>>> * If a job (e.g. a Spark job) will finish later than the
>>>>>>>>>>>> maximum ticket lifetime, you probably need to provide a
>>>>>>>>>>>> keytab to said application. Otherwise you will get failures
>>>>>>>>>>>> after the expiry.
>>>>>>>>>>>> * A keytab (used by the renewer) is credentials (user and
>>>>>>>>>>>> pass), so jobs are executed under the keytab in use at that
>>>>>>>>>>>> moment.
>>>>>>>>>>>> * Securing keytabs in a multi-tenant Airflow is a challenge.
>>>>>>>>>>>> This also goes for securing connections. This we need to fix
>>>>>>>>>>>> at some point. The solution for now seems to be no
>>>>>>>>>>>> multi-tenancy.
>>>>>>>>>>>>
>>>>>>>>>>>> Kerberos seems harder than it is, btw. Still, we are
>>>>>>>>>>>> sometimes moving away from it to OAUTH2-based
>>>>>>>>>>>> authentication. This gets us closer to cloud standards (but
>>>>>>>>>>>> we are on-prem).
>>>>>>>>>>>>
>>>>>>>>>>>> B.
>>>>>>>>>>>>
>>>>>>>>>>>> Sent from my iPhone
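For the client side described here, the ticket renewer boils down to
periodically re-running kinit against a keytab, which is what the
airflow kerberos daemon automates. A bare-bones equivalent; the keytab
path, principal, and interval are placeholders:

    import subprocess
    import time

    KEYTAB = "/etc/airflow/airflow.keytab"   # placeholder path
    PRINCIPAL = "airflow@EXAMPLE.COM"        # placeholder principal

    # Re-acquire a ticket well inside the maximum ticket lifetime so that
    # long-running hooks always find a valid credential cache.
    while True:
        subprocess.run(["kinit", "-kt", KEYTAB, PRINCIPAL], check=True)
        time.sleep(3600)  # tune to the realm's ticket lifetime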
>>>>>>>>>>>>
>>>>>>>>>>>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hit...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Taylor
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1 on upstreaming this. It would be great if you can submit
>>>>>>>>>>>>> a pull request to enhance the Apache Airflow docs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks
>>>>>>>>>>>>> Hitesh
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmis...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> While we're on the topic, I'd love any feedback from Bolke
>>>>>>>>>>>>>> or others who've used Kerberos with Airflow on this quick
>>>>>>>>>>>>>> guide I put together yesterday. It's similar to what's in
>>>>>>>>>>>>>> the Airflow docs but instead all on one page and slightly
>>>>>>>>>>>>>> expanded.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
>>>>>>>>>>>>>> (or the web version: https://www.astronomer.io/guides/kerberos/)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One thing I'd like to add is a minimal example of how to
>>>>>>>>>>>>>> Kerberize a hook.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd be happy to upstream this as well if it's useful
>>>>>>>>>>>>>> (maybe a Concepts > Additional Functionality > Kerberos
>>>>>>>>>>>>>> page?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Taylor
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *Taylor Edmiston*
>>>>>>>>>>>>>> Blog <https://blog.tedmiston.com/> | CV
>>>>>>>>>>>>>> <https://stackoverflow.com/cv/taylor> | LinkedIn
>>>>>>>>>>>>>> <https://www.linkedin.com/in/tedmiston/> | AngelList
>>>>>>>>>>>>>> <https://angel.co/taylor> | Stack Overflow
>>>>>>>>>>>>>> <https://stackoverflow.com/users/149428/taylor-edmiston>
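For HTTP-style hooks of the kind Taylor mentions, "Kerberizing" often
amounts to swapping in a SPNEGO-capable auth handler. A simplified
sketch using the requests-kerberos package, assuming a valid ticket
cache from the renewer; the hook shape below is illustrative:

    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    def run(base_url, endpoint):
        # With a valid ticket cache, the SPNEGO negotiation is handled
        # entirely by the auth handler; the rest of the hook is unchanged.
        auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
        response = requests.get("%s/%s" % (base_url, endpoint), auth=auth)
        response.raise_for_status()
        return response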
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jul 26, 2018 at 5:18 PM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Ry,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You should ask Bolke de Bruin. He's really experienced
>>>>>>>>>>>>>>> with Kerberos, and he also did the implementation for
>>>>>>>>>>>>>>> Airflow. Besides that, he also worked on implementing
>>>>>>>>>>>>>>> Kerberos in Ambari. Just want to let you know.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, 26 Jul 2018 at 23:03, Ry Walker <r...@astronomer.io> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi everyone -
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We have several bigCos who are considering using Airflow
>>>>>>>>>>>>>>>> asking about its support for Kerberos.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> We're going to work on a proof-of-concept next week, and
>>>>>>>>>>>>>>>> will likely record a screencast on it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For now, we're looking for any anecdotal information
>>>>>>>>>>>>>>>> from organizations who are using Kerberos with Airflow.
>>>>>>>>>>>>>>>> If anyone would be willing to share their experiences
>>>>>>>>>>>>>>>> here, or reply to me personally, it would be greatly
>>>>>>>>>>>>>>>> appreciated!
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Ry
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Ry Walker* | CEO, Astronomer <http://www.astronomer.io/>
>>>>>>>>>>>>>>>> | 513.417.2163 | @rywalker <http://twitter.com/rywalker>
>>>>>>>>>>>>>>>> | LinkedIn <http://www.linkedin.com/in/rywalker>