I look forward to reading the draft and working on it with you! Not 100% sure I can make it to SF for the hackathon (I'm in New York now), but I can participate remotely.

On Sat, Aug 4, 2018 at 9:30 AM Bolke de Bruin <bdbr...@gmail.com> wrote:

> Hi Dan,
>
> Don’t misunderstand me. I think what I proposed is complementary to the
> dag submit function. The only thing you mentioned that I don’t think is
> needed is to fully serialize up front and therefore exclude callbacks etc.
> (although there are other serialization libraries like marshmallow that
> might be able to do it).
>
> You are right to mention that the hashes should be calculated at submit
> time and an authorized user should be able to recalculate a hash. Another
> option could be something like https://pypi.org/project/signedimp/ which
> we could use to verify dependencies.
>
> I’ll start writing something up. We can then shoot holes in it (I think
> you have a point on the crypto) and maybe do some hacking on it. This could
> be part of the hackathon in Sept in SF, I’m sure some other people would
> have an interest in it as well.
>
> B.
>
> Sent from my iPad
>
> > On 3 Aug 2018, at 23:14, Dan Davydov <ddavy...@twitter.com.INVALID>
> > wrote:
> >
> > I designed a system similar to what you are describing which is in use at
> > Airbnb (only DAGs on a whitelist would be allowed to be merged to the git
> > repo if they used certain types of impersonation). It worked for simple
> > use cases, but the problem was that doing access control becomes very
> > difficult, e.g. solving the problem of which DAGs map to which manifest
> > files, and which manifest files can access which secrets.
> >
> > There is also a security risk where someone changes e.g. a python file
> > dependency of your task. Or, let's say you figure out a way to block those
> > kinds of changes based on your hashing: what if there is a legitimate
> > change in a dependency and you want to recalculate the hash? Then I think
> > you go back to a solution like your proposed "airflow submit" command to
> > accomplish this.
> >
> > Additional concerns:
> > - I'm not sure if I'm a fan of the first time a scheduler parses a DAG
> > being what creates the hashes either; it feels to me like
> > encryption/hashing should be done before DAGs are even parsed by the
> > scheduler (at commit time or submit time of the DAGs)
> > - The type of the encrypted key seems kind of hacky to me, i.e. some kind
> > of custom hash based on DAG structure instead of a simple token passed in
> > by users, which has a clear separation of concerns WRT security
> > - Added complexity both to Airflow code, and to users as they need to
> > define or customize hashing functions for DAGs to improve security
> >
> > If we can get a reasonably secure solution then it might be a reasonable
> > trade-off considering the alternative is a major overhaul/restrictions to
> > DAGs.
> >
> > Maybe I'm missing some details that would alleviate my concerns here, and
> > a bit of a more in-depth document might help?
> >
> > *Also: using the Kubernetes executor combined with some of the things we
> > discussed greatly enhances the security of Airflow as the environment
> > isn’t really shared anymore.*
> > Assuming a multi-tenant scheduler, I feel the same set of hard problems
> > exist with Kubernetes, as the executor mainly just simplifies the
> > post-executor parts of task scheduling/execution which I think you
> > already outlined a good solution for early on in this thread (passing keys
> > from the executor to workers).
> >
> > Happy to set up some time to talk real-time about this by the way. Once
> > we iron out the details I want to implement whatever the best solution we
> > come up with is.
> >
> >> On Thu, Aug 2, 2018 at 4:13 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>
> >> You mentioned you would like to make sure that the DAG (and its tasks)
> >> runs in a confined set of settings, i.e. a given set of connections at
> >> submission time, not at run time.
> >> So here we can make use of the fact that both the scheduler and the
> >> worker parse the DAG.
> >>
> >> Firstly, when the scheduler evaluates a DAG it can add an integrity check
> >> (hash) for each task. The executor can encrypt the metadata with this
> >> hash, ensuring that the structure of the DAG remained the same. It means
> >> that the task is only able to decrypt the metadata when it is able to
> >> calculate the same hash.
> >>
> >> Similarly, if the scheduler parses a DAG for the first time it can
> >> register the hashes for the tasks. It can then verify these hashes at
> >> runtime to ensure the structure of the tasks has stayed the same. In the
> >> manifest (which could even be in the DAG or part of the DAG definition)
> >> we could specify which fields would be used for hash calculation. We
> >> could even specify static hashes. This would give flexibility as to what
> >> freedom the users have in the auto-generated DAGs.
> >>
> >> Something like that?
> >>
> >> B.
> >>
> >>> On 2 Aug 2018, at 20:12, Dan Davydov <ddavy...@twitter.com.INVALID>
> >>> wrote:
> >>>
> >>> I'm very intrigued, and am curious how this would work in a bit more
> >>> detail, especially for dynamically created DAGs (how would static
> >>> manifests map to DAGs that are generated from rows in a MySQL table, for
> >>> example)? You could of course have something like regexes in your
> >>> manifest file like some_dag_framework_dag_*, but then how would you make
> >>> sure that other users did not create DAGs that matched this regex?
> >>>
> >>> On Thu, Aug 2, 2018 at 1:51 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>
> >>>> Hi Dan,
> >>>>
> >>>> I discussed this a little bit with one of the security architects
> >>>> here. We think that you can have a fair trade-off between security and
> >>>> usability by having a kind of manifest with the dag you are submitting.
> >>>> This manifest can then specify what the generated tasks/dags are
> >>>> allowed to do and what metadata to provide to them. We could also let
> >>>> the scheduler generate hashes per generated DAG / task and verify those
> >>>> with an established version (1st run?). This limits the attack vector.
> >>>>
> >>>> A DagSerializer would be great, but I think it solves a different issue
> >>>> and the above is somewhat simpler to implement?
> >>>>
> >>>> Bolke
> >>>>
> >>>>> On 29 Jul 2018, at 23:47, Dan Davydov <ddavy...@twitter.com.INVALID>
> >>>>> wrote:
> >>>>>
> >>>>> *Let’s say we trust the owner field of the DAGs I think we could do the
> >>>>> following.*
> >>>>> *Obviously, the trusting the user part is key here. It is one of the
> >>>>> reasons I was suggesting using “airflow submit” to update / add dags in
> >>>>> Airflow*
> >>>>>
> >>>>> *This is the hard part about my question.*
> >>>>> I think in a true multi-tenant environment we wouldn't be able to trust
> >>>>> the user, otherwise we wouldn't necessarily even need a mapping of
> >>>>> Airflow DAG users to secrets, because if we trust users to set the
> >>>>> correct Airflow user for DAGs, we are basically trusting them with all
> >>>>> of the creds the Airflow scheduler can access for all users anyways.
> >>>>>
> >>>>> I actually had the same thought as your "airflow submit" a while ago,
> >>>>> which I discussed with Alex, basically creating an API for adding DAGs
> >>>>> instead of having the Scheduler parse them. FWIW I think it's superior
> >>>>> to the git time machine approach because it's a more generic form of
> >>>>> "serialization" and is more correct as well because the same DAG file
> >>>>> parsed on a given git SHA can produce different DAGs.
> >>>>> Let me know what you think, and maybe I can start a more formal design
> >>>>> doc if you are onboard:
> >>>>>
> >>>>> A user or service with an auth token sends an "airflow submit" request
> >>>>> to a new kind of Dag Serialization service, along with the serialized
> >>>>> DAG objects generated by parsing on the client. It's important that
> >>>>> these serialized objects are declarative and not e.g. pickles so that
> >>>>> the scheduler/workers can consume them and reproducibility of the DAGs
> >>>>> is guaranteed. The service will then store each generated DAG along
> >>>>> with its access based on the provided token, e.g. using Ranger, and the
> >>>>> scheduler/workers will use the stored DAGs for scheduling/execution.
> >>>>> Operators would be deployed along with the Airflow code separately from
> >>>>> the serialized DAGs.
> >>>>>
> >>>>> A serialized DAG would look something like this (basically Luigi-style
> >>>>> :)):
> >>>>> MyTask - BashOperator: {
> >>>>>   cmd: "sleep 1"
> >>>>>   user: "Foo"
> >>>>>   access: "token1", "token2"
> >>>>> }
> >>>>>
> >>>>> MyDAG: {
> >>>>>   MyTask1 >> SomeOtherTask1
> >>>>>   MyTask2 >> SomeOtherTask1
> >>>>> }
> >>>>>
> >>>>> Dynamic DAGs in this case would just consist of a service calling
> >>>>> "Airflow Submit" that does its own form of authentication to get access
> >>>>> to some kind of tokens (or basically just forwarding the secrets the
> >>>>> users of the dynamic DAG submit).
> >>>>>
> >>>>> For the default Airflow implementation you can maybe just have the Dag
> >>>>> Serialization server bundled with the Scheduler, with auth turned off,
> >>>>> and have it periodically update the Dag Serialization store, which
> >>>>> would emulate the current behavior closely.
> >>>>>
> >>>>> Pros:
> >>>>> 1. Consistency across running task instances in a dagrun/scheduler,
> >>>>> reproducibility and auditability of DAGs
> >>>>> 2. Users can control when to deploy their DAGs
> >>>>> 3. Scheduler runs much faster since it doesn't have to run python files
> >>>>> and e.g. make network calls
> >>>>> 4. Scaling the scheduler becomes easier because you can have a
> >>>>> different service responsible for parsing DAGs which can be trivially
> >>>>> scaled horizontally (clients are doing the parsing)
> >>>>> 5. Potentially makes creating ad-hoc DAGs/backfilling/iterating on DAGs
> >>>>> easier? e.g. can use the Scheduler itself to schedule backfills with a
> >>>>> slightly modified serialized version of a DAG.
> >>>>>
> >>>>> Cons:
> >>>>> 1. Have to deprecate a lot of popular features, e.g. allowing custom
> >>>>> callbacks in operators (e.g. on_failure), and jinja_templates
> >>>>> 2. Version compatibility problems, e.g. user/service client might be
> >>>>> serializing arguments for hooks/operators that have been deprecated in
> >>>>> newer versions of the hooks, or the serialized DAG schema changes and
> >>>>> old DAGs aren't automatically updated. Might want to have some kind of
> >>>>> versioning system for serialized DAGs to at least ensure that stored
> >>>>> DAGs are valid when the Scheduler/Worker/etc are upgraded, maybe
> >>>>> something similar to thrift/protobuf versioning.
> >>>>> 3. Additional complexity - additional service, logic on
> >>>>> workers/scheduler to fetch/cache serialized DAGs efficiently,
> >>>>> expiring/archiving old DAG definitions, etc
> >>>>>
> >>>>> On Sun, Jul 29, 2018 at 3:20 PM Bolke de Bruin <bdbr...@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Ah gotcha. That’s another issue actually (but related).
> >>>>>>
> >>>>>> Let’s say we trust the owner field of the DAGs I think we could do the
> >>>>>> following. We then have a table (and interface) to tell Airflow what
> >>>>>> users have access to what connections.
> >>>>>> The scheduler can then check if the task in the dag can access the
> >>>>>> conn_id it is asking for. Auto generated dags still have an owner (or
> >>>>>> should) and therefore should be fine. Some integrity checking
> >>>>>> could/should be added as we want to be sure that the task we schedule
> >>>>>> is the task we launch. So a signature calculated at the scheduler (or
> >>>>>> part of the DAG), sent as part of the metadata and checked by the
> >>>>>> executor is probably smart.
> >>>>>>
> >>>>>> You can also make this more fancy by integrating with something like
> >>>>>> Apache Ranger that allows for policy checking.
> >>>>>>
> >>>>>> Obviously, the trusting the user part is key here. It is one of the
> >>>>>> reasons I was suggesting using “airflow submit” to update / add dags
> >>>>>> in Airflow. We could enforce authentication on the DAG. It was kind of
> >>>>>> ruled out in favor of git time machines although these never happened
> >>>>>> afaik ;-).
> >>>>>>
> >>>>>> BTW: I have updated my implementation with protobuf. Metadata is now
> >>>>>> available at executor and task.
> >>>>>>
> >>>>>>> On 29 Jul 2018, at 15:47, Dan Davydov <ddavy...@twitter.com.INVALID>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>> The concern is how to secure secrets on the scheduler such that only
> >>>>>>> certain DAGs can access them, and in the case of files that create
> >>>>>>> DAGs dynamically, only some set of DAGs should be able to access
> >>>>>>> these secrets.
> >>>>>>>
> >>>>>>> e.g. if there is a secret/keytab that can be read by DAG A generated
> >>>>>>> by file X, and file X generates DAG B as well, there needs to be a
> >>>>>>> scheme to stop the parsing of DAG B on the scheduler from being able
> >>>>>>> to read the secret in DAG A.
> >>>>>>>
> >>>>>>> Does that make sense?
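
As a rough illustration of the access table and the task signature described in the Jul 29 message above, here is a minimal Python sketch. The names `ALLOWED_CONNECTIONS`, `can_access`, `sign_task` and `verify_task` are hypothetical and not part of Airflow. The sketch assumes the scheduler and executor share a secret key, and it uses a plain HMAC-SHA256 as a stand-in for whatever signature scheme would eventually be chosen.

```python
import hashlib
import hmac
import json

# Hypothetical owner -> allowed conn_id table. In the thread this would be a
# database table with an interface on top; a dict is enough for a sketch.
ALLOWED_CONNECTIONS = {
    "alice": {"hive_default", "http_default"},
    "bob": {"mysql_default"},
}


def can_access(owner: str, conn_id: str) -> bool:
    """Scheduler-side check: may a DAG owned by `owner` use `conn_id`?"""
    return conn_id in ALLOWED_CONNECTIONS.get(owner, set())


def sign_task(secret: bytes, task: dict) -> str:
    """Compute an HMAC-SHA256 over a canonical JSON form of the task."""
    canonical = json.dumps(task, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_task(secret: bytes, task: dict, signature: str) -> bool:
    """Executor-side check that the task is the one the scheduler signed."""
    return hmac.compare_digest(sign_task(secret, task), signature)
```

The scheduler would call `can_access` before queueing and attach the result of `sign_task` to the task metadata; the executor would refuse to launch a task for which `verify_task` fails. This does not address the harder question raised above of whether the owner field can be trusted in the first place.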
> >>>>>>>
> >>>>>>> On Sun, Jul 29, 2018 at 6:14 AM Bolke de Bruin <bdbr...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I’m not sure what you mean. The example I created allows for dynamic
> >>>>>>>> DAGs, as the scheduler obviously knows about the tasks when they are
> >>>>>>>> ready to be scheduled.
> >>>>>>>> This isn’t any different for a static DAG or a dynamic one.
> >>>>>>>>
> >>>>>>>> For Kerberos it isn't that special. Basically a keytab is the
> >>>>>>>> revocable user's credentials in a special format. The keytab itself
> >>>>>>>> can be protected by a password. So I can imagine that a connection is
> >>>>>>>> defined that sets a keytab location and password to access the
> >>>>>>>> keytab. The scheduler understands this (or maybe the Connection
> >>>>>>>> model) and serializes and sends it to the worker as part of the
> >>>>>>>> metadata. The worker then reconstructs the keytab and issues a kinit
> >>>>>>>> or supplies it to the other service requiring it (e.g. Spark).
> >>>>>>>>
> >>>>>>>> * Obviously the worker and scheduler need to communicate over SSL.
> >>>>>>>> * There is a challenge at the worker level. Credentials are secured
> >>>>>>>> against other users, but are readable by the owning user. So imagine
> >>>>>>>> 2 DAGs from two different users with different connections without
> >>>>>>>> sudo configured. If they end up at the same worker and DAG 2 is
> >>>>>>>> malicious, it could read files and memory created by DAG 1. This is
> >>>>>>>> the reason why using environment variables is NOT safe (DAG 2 could
> >>>>>>>> read /proc/<pid>/environ). To mitigate this we probably need to PIPE
> >>>>>>>> the data to the task’s STDIN.
> >>>>>>>> It won’t solve the issue but will make it harder as now it will only
> >>>>>>>> be in memory.
> >>>>>>>> * The reconstructed keytab (or the initialized version) can be stored
> >>>>>>>> in, most likely, the process-keyring
> >>>>>>>> (http://man7.org/linux/man-pages/man7/process-keyring.7.html). As
> >>>>>>>> mentioned earlier this poses a challenge for Java applications that
> >>>>>>>> cannot read from this location (keytab and ccache). Writing it out to
> >>>>>>>> the filesystem then becomes a possibility.
> >>>>>>>> This is essentially the same as how Spark solves it
> >>>>>>>> (https://spark.apache.org/docs/latest/security.html#yarn-mode).
> >>>>>>>>
> >>>>>>>> Why not work on this together? We need it as well. Airflow as it is
> >>>>>>>> now we consider the biggest security threat and it is really hard to
> >>>>>>>> secure it. The above would definitely be a serious improvement.
> >>>>>>>> Another step would be to stop Tasks from accessing the Airflow DB
> >>>>>>>> altogether.
> >>>>>>>>
> >>>>>>>> Cheers
> >>>>>>>> Bolke
> >>>>>>>>
> >>>>>>>>> On 29 Jul 2018, at 05:36, Dan Davydov <ddavy...@twitter.com.INVALID>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> This makes sense, and thanks for putting this together. I might pick
> >>>>>>>>> this up myself depending on if we can get the rest of the
> >>>>>>>>> multi-tenancy story nailed down, but I still think the tricky part
> >>>>>>>>> is figuring out how to allow dynamic DAGs (e.g. DAGs created from
> >>>>>>>>> rows in a MySQL table) to work with Kerberos, curious what your
> >>>>>>>>> thoughts are there. How would secrets be passed securely in a
> >>>>>>>>> multi-tenant Scheduler starting from parsing the DAGs up to the
> >>>>>>>>> executor sending them off?
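
For the last hop of that path (executor to task), the Jul 29 reply above suggests piping the data to the task's STDIN rather than using environment variables. A minimal sketch of that idea in Python follows. `CHILD` and `run_task_with_secrets` are hypothetical names; the child program only stands in for a task runner, and JSON is used for brevity where the thread prefers protobuf.

```python
import json
import subprocess
import sys

# Stand-in for a task runner: it reads its connection metadata from STDIN
# instead of os.environ, so nothing shows up in /proc/<pid>/environ.
CHILD = (
    "import json, sys\n"
    "conns = json.load(sys.stdin)\n"
    "print(sorted(conns))\n"
)


def run_task_with_secrets(connections: dict) -> str:
    """Launch the child and pipe the serialized connections to its STDIN."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=json.dumps(connections),  # written to the child's STDIN pipe
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

As the reply notes, this only makes the attack harder: the secrets still live in the memory of a process that another task running as the same OS user may be able to inspect.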
> >>>>>>>>>
> >>>>>>>>> On Sat, Jul 28, 2018 at 5:07 PM Bolke de Bruin <bdbr...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Here:
> >>>>>>>>>>
> >>>>>>>>>> https://github.com/bolkedebruin/airflow/tree/secure_connections
> >>>>>>>>>>
> >>>>>>>>>> is a working rudimentary implementation that allows securing the
> >>>>>>>>>> connections (only LocalExecutor at the moment)
> >>>>>>>>>>
> >>>>>>>>>> * It enforces the use of “conn_id” instead of the mix that we have
> >>>>>>>>>> now
> >>>>>>>>>> * A task, if using “conn_id”, has ‘auto-registered’ (which is a
> >>>>>>>>>> noop) its connections
> >>>>>>>>>> * The scheduler reads the connection information and serializes it
> >>>>>>>>>> to json (which should be a different format, protobuf preferably)
> >>>>>>>>>> * The scheduler then sends this info to the executor
> >>>>>>>>>> * The executor puts this in the environment of the task
> >>>>>>>>>> (environment most likely not secure enough for us)
> >>>>>>>>>> * The BaseHook reads out this environment variable and does not
> >>>>>>>>>> need to touch the database
> >>>>>>>>>>
> >>>>>>>>>> The example_http_operator works, I haven't tested any other. To
> >>>>>>>>>> make it work I just adjusted the hook and operator to use “conn_id”
> >>>>>>>>>> instead of the non-standard http_conn_id.
> >>>>>>>>>>
> >>>>>>>>>> Makes sense?
> >>>>>>>>>>
> >>>>>>>>>> B.
> >>>>>>>>>>
> >>>>>>>>>> * The BaseHook is adjusted to not connect to the database
> >>>>>>>>>>
> >>>>>>>>>>> On 28 Jul 2018, at 17:50, Bolke de Bruin <bdbr...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Well, I don’t think a hook (or task) should obtain it by itself.
> >>>>>>>>>>> It should be supplied.
> >>>>>>>>>>> At the moment you start executing the task you cannot trust it
> >>>>>>>>>>> anymore (i.e. it is unmanaged / non-Airflow code).
> >>>>>>>>>>>
> >>>>>>>>>>> So we could change the BaseHook to understand supplied
> >>>>>>>>>>> credentials and populate a hash with “conn_ids”. Hooks normally
> >>>>>>>>>>> call BaseHook.get_connection anyway, so it shouldn't be too hard
> >>>>>>>>>>> and should in principle not require changes to the hooks
> >>>>>>>>>>> themselves if they are well behaved.
> >>>>>>>>>>>
> >>>>>>>>>>> B.
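
The idea in the message above, a lookup that answers from supplied credentials keyed by “conn_id” and never touches the database, could look roughly like the sketch below. `SuppliedConnectionStore` is a hypothetical stand-in and not the code on the linked branch; it assumes the executor hands the task a JSON document mapping each conn_id to its connection fields.

```python
import json


class SuppliedConnectionStore:
    """Hypothetical stand-in for a BaseHook lookup backed by supplied data."""

    def __init__(self, payload: str):
        # payload: JSON text handed over by the executor, e.g.
        # {"http_default": {"host": "example.org", "port": 443}}
        self._connections = json.loads(payload)

    def get_connection(self, conn_id: str) -> dict:
        """Return the supplied connection, or fail without a DB fallback."""
        try:
            return self._connections[conn_id]
        except KeyError:
            raise LookupError(
                f"connection {conn_id!r} was not supplied to this task"
            ) from None
```

Failing hard on an unknown conn_id, rather than falling back to the connections table, is the point of the design: a task can only use what the scheduler decided to give it.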
> >>>>>>>>>>>
> >>>>>>>>>>>> On 28 Jul 2018, at 17:41, Dan Davydov
> >>>>>>>>>>>> <ddavy...@twitter.com.INVALID> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> *So basically in the scheduler we parse the dag. Either from the
> >>>>>>>>>>>> manifest (new) or from smart parsing (probably harder, maybe some
> >>>>>>>>>>>> auto register?) we know what connections and keytabs are
> >>>>>>>>>>>> available dag wide or per task.*
> >>>>>>>>>>>> This is the hard part that I was curious about, for dynamically
> >>>>>>>>>>>> created DAGs, e.g. those generated by reading tasks in a MySQL
> >>>>>>>>>>>> database or a json file, there isn't a great way to do this.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I 100% agree with deprecating the connections table (at least for
> >>>>>>>>>>>> the secure option).
> >>>>>>>>>>>> The main work there is rewriting all hooks to take credentials
> >>>>>>>>>>>> from arbitrary data sources by allowing a customized
> >>>>>>>>>>>> CredentialsReader class. Although hooks are technically private,
> >>>>>>>>>>>> I think a lot of companies depend on them so the PMC should
> >>>>>>>>>>>> probably discuss if this is an Airflow 2.0 change or not.
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Fri, Jul 27, 2018 at 5:24 PM Bolke de Bruin
> >>>>>>>>>>>> <bdbr...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Sure. In general I consider keytabs as a part of connection
> >>>>>>>>>>>>> information. Connections should be secured by sending the
> >>>>>>>>>>>>> connection information a task needs as part of information the
> >>>>>>>>>>>>> executor gets. A task should then not need access to the
> >>>>>>>>>>>>> connection table in Airflow.
> >>>>>>>>>>>>> Keytabs could then be sent as part of the connection information
> >>>>>>>>>>>>> (base64 encoded) and set up by the executor (this key) to be
> >>>>>>>>>>>>> read-only to the task it is launching.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So basically in the scheduler we parse the dag. Either from the
> >>>>>>>>>>>>> manifest (new) or from smart parsing (probably harder, maybe
> >>>>>>>>>>>>> some auto register?) we know what connections and keytabs are
> >>>>>>>>>>>>> available dag wide or per task.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The credentials and connection information then are serialized
> >>>>>>>>>>>>> into a protobuf message and sent to the executor as part of the
> >>>>>>>>>>>>> “queue” action. The worker then deserializes the information and
> >>>>>>>>>>>>> makes it securely available to the task (which is quite hard
> >>>>>>>>>>>>> btw).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On that last bit, making the info securely available might mean
> >>>>>>>>>>>>> storing it in the Linux KEYRING (supported by python keyring).
> >>>>>>>>>>>>> Keytabs will be tough to do properly due to Java not properly
> >>>>>>>>>>>>> supporting KEYRING and only files, and these are hard to make
> >>>>>>>>>>>>> secure (due to the possibility a process will list all files in
> >>>>>>>>>>>>> /tmp and get credentials through that). Maybe storing the keytab
> >>>>>>>>>>>>> with a password and having the password in the KEYRING might
> >>>>>>>>>>>>> work. Something to find out.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> B.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Sent from my iPad
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 27 Jul 2018, at 22:04, Dan Davydov
> >>>>>>>>>>>>>> <ddavy...@twitter.com.INVALID> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm curious if you had any ideas on how to enable multi-tenancy
> >>>>>>>>>>>>>> with respect to Kerberos in Airflow.
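
One possible reading of the "base64 encoded, read-only to the task" step from the reply above is sketched below in Python. `encode_keytab` and `materialize_keytab` are hypothetical helper names, the sketch assumes a POSIX worker, and it writes to a private temporary directory instead of directly into /tmp because of the file-listing concern raised in that reply. It does not protect against another task running as the same OS user.

```python
import base64
import os
import stat
import tempfile


def encode_keytab(raw: bytes) -> str:
    """Scheduler side: make the binary keytab safe to embed in a message."""
    return base64.b64encode(raw).decode("ascii")


def materialize_keytab(encoded: str) -> str:
    """Worker side: decode the keytab into a file only the owner can read."""
    # mkdtemp creates the directory with mode 0700, so other users cannot
    # list or open its contents.
    directory = tempfile.mkdtemp(prefix="task-keytab-")
    path = os.path.join(directory, "task.keytab")
    # O_EXCL refuses to reuse an existing file; 0o600 applies from creation.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(base64.b64decode(encoded))
    os.chmod(path, stat.S_IRUSR)  # drop to read-only for the owner
    return path
```

The returned path could then be handed to kinit or to a Java client that cannot read from the keyring; the caller is responsible for deleting the file and directory once the task finishes.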
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cool. The doc will need some refinement as it isn't entirely accurate. In addition we need to distinguish between Airflow as a client of kerberized services (this is what is talked about in the astronomer doc) and kerberizing Airflow itself, which the API supports.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> In general, to access kerberized services (Airflow as a client) one needs to start the ticket renewer with a valid keytab. It isn't always required to change a hook to support this. Hadoop cli tools often just pick it up, as their client config is set to do so. Then there is another class, HTTP-like services that are accessed by urllib under the hood; these typically use SPNEGO. Hooks for these often need to be adjusted, as SPNEGO requires some urllib config. Finally, there are protocols which use SASL with Kerberos, like HDFS (not webhdfs, that uses SPNEGO). These require per-protocol implementations.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Off the top of my head, we support Kerberos client side now with:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> * Spark
> >>>>>>>>>>>>>>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs implementation)
> >>>>>>>>>>>>>>> * Hive (not metastore afaik)
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Three things to remember:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> * If a job (e.g. a Spark job) will finish later than the maximum ticket lifetime, you probably need to provide a keytab to said application. Otherwise you will get failures after the expiry.
> >>>>>>>>>>>>>>> * A keytab (used by the renewer) is a set of credentials (user and pass), so jobs are executed under the keytab in use at that moment.
> >>>>>>>>>>>>>>> * Securing keytabs in multi-tenant Airflow is a challenge. This also goes for securing connections. This we need to fix at some point. The solution for now seems to be no multi-tenancy.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Kerberos seems harder than it is btw. Still, we are sometimes moving away from it to OAUTH2 based authentication.
> >>>>>>>>>>>>>>> This gets us closer to cloud standards (but we are on prem).
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> B.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hit...@apache.org> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Taylor
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> +1 on upstreaming this. It would be great if you can submit a pull request to enhance the apache airflow docs.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> thanks
> >>>>>>>>>>>>>>>> Hitesh
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmis...@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> While we're on the topic, I'd love any feedback from Bolke or others who've used Kerberos with Airflow on this quick guide I put together yesterday. It's similar to what's in the Airflow docs but instead all on one page and slightly expanded.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
> >>>>>>>>>>>>>>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> One thing I'd like to add is a minimal example of how to Kerberize a hook.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I'd be happy to upstream this as well if it's useful (maybe a Concepts > Additional Functionality > Kerberos page?)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Taylor
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> *Taylor Edmiston*
> >>>>>>>>>>>>>>>>> Blog <https://blog.tedmiston.com/> | CV <https://stackoverflow.com/cv/taylor> | LinkedIn <https://www.linkedin.com/in/tedmiston/> | AngelList <https://angel.co/taylor> | Stack Overflow <https://stackoverflow.com/users/149428/taylor-edmiston>
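Two of the points made in this thread, that Airflow as a Kerberos client needs a ticket obtained from a valid keytab and that a job outliving the maximum ticket lifetime needs a keytab of its own, can be sketched in Python as follows. This is an illustration only and not Airflow's ticket renewer: the function names, the principal and the paths are hypothetical, and it assumes a `kinit` binary that accepts the common `-k`, `-t` and `-c` flags.

```python
import subprocess
from datetime import timedelta


def build_kinit_command(principal, keytab_path, ccache_path):
    """Build a kinit call that obtains a ticket from a keytab (no password prompt)."""
    return [
        "kinit",
        "-k",               # authenticate with a keytab
        "-t", keytab_path,  # which keytab file to use
        "-c", ccache_path,  # credential cache the client tools will read
        principal,
    ]


def obtain_ticket(principal, keytab_path, ccache_path):
    """Run kinit; raises CalledProcessError if kinit exits with an error."""
    command = build_kinit_command(principal, keytab_path, ccache_path)
    subprocess.run(command, check=True)


def job_needs_own_keytab(expected_runtime, max_ticket_lifetime):
    """True when the job is expected to outlive the ticket it starts with."""
    return expected_runtime > max_ticket_lifetime
```

For example, `job_needs_own_keytab(timedelta(hours=30), timedelta(hours=24))` flags a 30-hour Spark job against a 24-hour maximum ticket lifetime, which is the case where failures after expiry would otherwise be expected.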