I'm very intrigued, and I'm curious how this would work in a bit more detail, especially for dynamically created DAGs. For example, how would static manifests map to DAGs that are generated from rows in a MySQL table? You could of course have something like regexes in your manifest file, e.g. some_dag_framework_dag_*, but then how would you make sure that other users don't create DAGs that match the same regex?
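To make the concern concrete, here is a rough sketch of the kind of check I imagine would be needed. Everything in it is hypothetical: the MANIFESTS mapping, the function names, and the glob-style pattern syntax are made up for illustration and none of it exists in Airflow. Each manifest claims a pattern of DAG ids, and a DAG id is accepted only if the submitting user is the single owner of a matching pattern.

```python
from fnmatch import fnmatchcase

# Hypothetical manifests: submitting user -> DAG id patterns that user claims.
MANIFESTS = {
    "framework_team": ["some_dag_framework_dag_*"],
    "analytics": ["analytics_*"],
}


def owners_matching(dag_id, manifests):
    """Return every user whose manifest has a pattern matching dag_id."""
    return sorted(
        user
        for user, patterns in manifests.items()
        if any(fnmatchcase(dag_id, pattern) for pattern in patterns)
    )


def may_register(dag_id, submitting_user, manifests=MANIFESTS):
    """Accept a DAG only if exactly one manifest claims it and it is the submitter's."""
    return owners_matching(dag_id, manifests) == [submitting_user]


print(may_register("some_dag_framework_dag_42", "framework_team"))  # True
print(may_register("some_dag_framework_dag_evil", "analytics"))     # False
```

Even with a check like this, something still has to stop a second user from registering an overlapping pattern in their own manifest, which is really the heart of my question.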
On Thu, Aug 2, 2018 at 1:51 PM Bolke de Bruin <bdbr...@gmail.com> wrote:

> Hi Dan,
>
> I discussed this a little bit with one of the security architects here. We think that you can have a fair trade off between security and usability by having a kind of manifest with the dag you are submitting. This manifest can then specify what the generated tasks/dags are allowed to do and what metadata to provide to them. We could also let the scheduler generate hashes per generated DAG / task and verify those with an established version (1st run?). This limits the attack vector.
>
> A DagSerializer would be great, but I think it solves a different issue and the above is somewhat simpler to implement?
>
> Bolke
>
> > On 29 Jul 2018, at 23:47, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
> >
> > *Let’s say we trust the owner field of the DAGs I think we could do the following.*
> > *Obviously, the trusting the user part is key here. It is one of the reasons I was suggesting using “airflow submit” to update / add dags in Airflow*
> >
> > *This is the hard part about my question.*
> > I think in a true multi-tenant environment we wouldn't be able to trust the user, otherwise we wouldn't necessarily even need a mapping of Airflow DAG users to secrets, because if we trust users to set the correct Airflow user for DAGs, we are basically trusting them with all of the creds the Airflow scheduler can access for all users anyways.
> >
> > I actually had the same thought as your "airflow submit" a while ago, which I discussed with Alex, basically creating an API for adding DAGs instead of having the Scheduler parse them. FWIW I think it's superior to the git time machine approach because it's a more generic form of "serialization" and is more correct as well because the same DAG file parsed on a given git SHA can produce different DAGs.
> > Let me know what you think, and maybe I can start a more formal design doc if you are onboard:
> >
> > A user or service with an auth token sends an "airflow submit" request to a new kind of Dag Serialization service, along with the serialized DAG objects generated by parsing on the client. It's important that these serialized objects are declarative and not e.g. pickles so that the scheduler/workers can consume them and reproducibility of the DAGs is guaranteed. The service will then store each generated DAG along with its access based on the provided token e.g. using Ranger, and the scheduler/workers will use the stored DAGs for scheduling/execution. Operators would be deployed along with the Airflow code separately from the serialized DAGs.
> >
> > A serialized DAG would look something like this (basically Luigi-style :)):
> >
> > MyTask - BashOperator: {
> >   cmd: "sleep 1"
> >   user: "Foo"
> >   access: "token1", "token2"
> > }
> >
> > MyDAG: {
> >   MyTask1 >> SomeOtherTask1
> >   MyTask2 >> SomeOtherTask1
> > }
> >
> > Dynamic DAGs in this case would just consist of a service calling "Airflow Submit" that does its own form of authentication to get access to some kind of tokens (or basically just forwarding the secrets the users of the dynamic DAG submit).
> >
> > For the default Airflow implementation you can maybe just have the Dag Serialization server bundled with the Scheduler, with auth turned off, and have it periodically update the Dag Serialization store, which would emulate the current behavior closely.
> >
> > Pros:
> > 1. Consistency across running task instances in a dagrun/scheduler, reproducibility and auditability of DAGs
> > 2. Users can control when to deploy their DAGs
> > 3. Scheduler runs much faster since it doesn't have to run python files and e.g. make network calls
> > 4. Scaling the scheduler becomes easier because you can have a different service responsible for parsing DAGs which can be trivially scaled horizontally (clients are doing the parsing)
> > 5. Potentially makes creating ad-hoc DAGs/backfilling/iterating on DAGs easier? e.g. can use the Scheduler itself to schedule backfills with a slightly modified serialized version of a DAG.
> >
> > Cons:
> > 1. Have to deprecate a lot of popular features, e.g. allowing custom callbacks in operators (e.g. on_failure), and jinja_templates
> > 2. Version compatibility problems, e.g. user/service client might be serializing arguments for hooks/operators that have been deprecated in newer versions of the hooks, or the serialized DAG schema changes and old DAGs aren't automatically updated. Might want to have some kind of versioning system for serialized DAGs to at least ensure that stored DAGs are valid when the Scheduler/Worker/etc are upgraded, maybe something similar to thrift/protobuf versioning.
> > 3. Additional complexity - additional service, logic on workers/scheduler to fetch/cache serialized DAGs efficiently, expiring/archiving old DAG definitions, etc
> >
> > On Sun, Jul 29, 2018 at 3:20 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >
> >> Ah gotcha. That’s another issue actually (but related).
> >>
> >> Let’s say we trust the owner field of the DAGs I think we could do the following. We then have a table (and interface) to tell Airflow what users have access to what connections. The scheduler can then check if the task in the dag can access the conn_id it is asking for. Auto generated dags still have an owner (or should) and therefore should be fine. Some integrity checking could/should be added as we want to be sure that the task we schedule is the task we launch.
> >> So a signature calculated at the scheduler (or part of the DAG), sent as part of the metadata and checked by the executor is probably smart.
> >>
> >> You can also make this more fancy by integrating with something like Apache Ranger that allows for policy checking.
> >>
> >> Obviously, the trusting the user part is key here. It is one of the reasons I was suggesting using “airflow submit” to update / add dags in Airflow. We could enforce authentication on the DAG. It was kind of ruled out in favor of git time machines although these never happened afaik ;-).
> >>
> >> BTW: I have updated my implementation with protobuf. Metadata is now available at executor and task.
> >>
> >>> On 29 Jul 2018, at 15:47, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
> >>>
> >>> The concern is how to secure secrets on the scheduler such that only certain DAGs can access them, and in the case of files that create DAGs dynamically, only some set of DAGs should be able to access these secrets.
> >>>
> >>> e.g. if there is a secret/keytab that can be read by DAG A generated by file X, and file X generates DAG B as well, there needs to be a scheme to stop the parsing of DAG B on the scheduler from being able to read the secret in DAG A.
> >>>
> >>> Does that make sense?
> >>>
> >>> On Sun, Jul 29, 2018 at 6:14 AM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>
> >>>> I’m not sure what you mean. The example I created allows for dynamic DAGs, as the scheduler obviously knows about the tasks when they are ready to be scheduled. This isn’t any different from a static DAG or a dynamic one.
> >>>>
> >>>> For Kerberos it isn't that special. Basically a keytab holds the user's revocable credentials in a special format. The keytab itself can be protected by a password.
> >>>> So I can imagine that a connection is defined that sets a keytab location and password to access the keytab. The scheduler understands this (or maybe the Connection model) and serializes and sends it to the worker as part of the metadata. The worker then reconstructs the keytab and issues a kinit or supplies it to the other service requiring it (e.g. Spark).
> >>>>
> >>>> * Obviously the worker and scheduler need to communicate over SSL.
> >>>> * There is a challenge at the worker level. Credentials are secured against other users, but are readable by the owning user. So imagine 2 DAGs from two different users with different connections without sudo configured. If they end up at the same worker and DAG 2 is malicious, it could read files and memory created by DAG 1. This is the reason why using environment variables is NOT safe (DAG 2 could read /proc/<pid>/environ). To mitigate this we probably need to PIPE the data to the task’s STDIN. It won’t solve the issue but will make it harder as now it will only be in memory.
> >>>> * The reconstructed keytab (or the initialized version) can be stored in, most likely, the process-keyring (http://man7.org/linux/man-pages/man7/process-keyring.7.html). As mentioned earlier this poses a challenge for Java applications that cannot read from this location (keytab and ccache). Writing it out to the filesystem then becomes a possibility.
> >>>> This is essentially the same as how Spark solves it (https://spark.apache.org/docs/latest/security.html#yarn-mode).
> >>>>
> >>>> Why not work on this together? We need it as well. Airflow as it is now we consider the biggest security threat and it is really hard to secure it. The above would definitely be a serious improvement. Another step would be to stop Tasks from accessing the Airflow DB altogether.
> >>>>
> >>>> Cheers
> >>>> Bolke
> >>>>
> >>>>> On 29 Jul 2018, at 05:36, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
> >>>>>
> >>>>> This makes sense, and thanks for putting this together. I might pick this up myself depending on if we can get the rest of the multi-tenancy story nailed down, but I still think the tricky part is figuring out how to allow dynamic DAGs (e.g. DAGs created from rows in a MySQL table) to work with Kerberos, curious what your thoughts are there. How would secrets be passed securely in a multi-tenant Scheduler starting from parsing the DAGs up to the executor sending them off?
> >>>>>
> >>>>> On Sat, Jul 28, 2018 at 5:07 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>
> >>>>>> Here:
> >>>>>>
> >>>>>> https://github.com/bolkedebruin/airflow/tree/secure_connections
> >>>>>>
> >>>>>> Is a working rudimentary implementation that allows securing the connections (only LocalExecutor at the moment)
> >>>>>>
> >>>>>> * It enforces the use of “conn_id” instead of the mix that we have now
> >>>>>> * A task if using “conn_id” has ‘auto-registered’ (which is a noop) its connections
> >>>>>> * The scheduler reads the connection information and serializes it to json (which should be a different format, protobuf preferably)
> >>>>>> * The scheduler then sends this info to the executor
> >>>>>> * The executor puts this in the environment of the task (environment most likely not secure enough for us)
> >>>>>> * The BaseHook reads out this environment variable and does not need to touch the database
> >>>>>>
> >>>>>> The example_http_operator works, I haven't tested any other. To make it work I just adjusted the hook and operator to use “conn_id” instead of the non standard http_conn_id.
> >>>>>>
> >>>>>> Makes sense?
> >>>>>>
> >>>>>> B.
> >>>>>>
> >>>>>> * The BaseHook is adjusted to not connect to the database
> >>>>>>
> >>>>>>> On 28 Jul 2018, at 17:50, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Well, I don’t think a hook (or task) should obtain it by itself. It should be supplied. At the moment you start executing the task you cannot trust it anymore (i.e. it is unmanaged / non airflow code).
> >>>>>>>
> >>>>>>> So we could change the BaseHook to understand supplied credentials and populate a hash with “conn_ids”. Hooks normally call BaseHook.get_connection anyway, so it shouldn't be too hard and should in principle not require changes to the hooks themselves if they are well behaved.
> >>>>>>>
> >>>>>>> B.
> >>>>>>>
> >>>>>>>> On 28 Jul 2018, at 17:41, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
> >>>>>>>>
> >>>>>>>> *So basically in the scheduler we parse the dag. Either from the manifest (new) or from smart parsing (probably harder, maybe some auto register?) we know what connections and keytabs are available dag wide or per task.*
> >>>>>>>> This is the hard part that I was curious about, for dynamically created DAGs, e.g. those generated by reading tasks in a MySQL database or a json file, there isn't a great way to do this.
> >>>>>>>>
> >>>>>>>> I 100% agree with deprecating the connections table (at least for the secure option). The main work there is rewriting all hooks to take credentials from arbitrary data sources by allowing a customized CredentialsReader class. Although hooks are technically private, I think a lot of companies depend on them so the PMC should probably discuss if this is an Airflow 2.0 change or not.
> >>>>>>>>
> >>>>>>>> On Fri, Jul 27, 2018 at 5:24 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Sure. In general I consider keytabs as a part of connection information. Connections should be secured by sending the connection information a task needs as part of information the executor gets. A task should then not need access to the connection table in Airflow. Keytabs could then be sent as part of the connection information (base64 encoded) and set up by the executor (this key) to be read only to the task it is launching.
> >>>>>>>>>
> >>>>>>>>> So basically in the scheduler we parse the dag. Either from the manifest (new) or from smart parsing (probably harder, maybe some auto register?) we know what connections and keytabs are available dag wide or per task.
> >>>>>>>>>
> >>>>>>>>> The credentials and connection information then are serialized into a protobuf message and sent to the executor as part of the “queue” action. The worker then deserializes the information and makes it securely available to the task (which is quite hard btw).
> >>>>>>>>>
> >>>>>>>>> On that last bit making the info securely available might be storing it in the Linux KEYRING (supported by python keyring). Keytabs will be tough to do properly due to Java not properly supporting KEYRING and only files and these are hard to make secure (due to the possibility a process will list all files in /tmp and get credentials through that). Maybe storing the keytab with a password and having the password in the KEYRING might work. Something to find out.
> >>>>>>>>>
> >>>>>>>>> B.
> >>>>>>>>>
> >>>>>>>>> Sent from my iPad
> >>>>>>>>>
> >>>>>>>>>> On 27 Jul 2018, at 22:04, Dan Davydov <ddavy...@twitter.com.INVALID> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I'm curious if you had any ideas on how to enable multi-tenancy with respect to Kerberos in Airflow.
> >>>>>>>>>>
> >>>>>>>>>>> On Fri, Jul 27, 2018 at 2:38 PM Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Cool. The doc will need some refinement as it isn't entirely accurate. In addition we need to separate between Airflow as a client of kerberized services (this is what is talked about in the astronomer doc) vs kerberizing airflow itself, which the API supports.
> >>>>>>>>>>>
> >>>>>>>>>>> In general to access kerberized services (airflow as a client) one needs to start the ticket renewer with a valid keytab. For the hooks it isn't always required to change the hook to support it. Hadoop cli tools often just pick it up as their client config is set to do so. Then another class is there for HTTP-like services which are accessed by urllib under the hood, these typically use SPNEGO. These often need to be adjusted as it requires some urllib config. Finally, there are protocols which use SASL with kerberos. Like HDFS (not webhdfs, that uses SPNEGO). These require per protocol implementations.
> >>>>>>>>>>>
> >>>>>>>>>>> From the top of my head we support kerberos client side now with:
> >>>>>>>>>>>
> >>>>>>>>>>> * Spark
> >>>>>>>>>>> * HDFS (snakebite python 2.7, cli and with the upcoming libhdfs implementation)
> >>>>>>>>>>> * Hive (not metastore afaik)
> >>>>>>>>>>>
> >>>>>>>>>>> Three things to remember:
> >>>>>>>>>>>
> >>>>>>>>>>> * If a job (e.g. a Spark job) will finish later than the maximum ticket lifetime you probably need to provide a keytab to said application. Otherwise you will get failures after the expiry.
> >>>>>>>>>>> * A keytab (used by the renewer) is a set of credentials (user and pass), so jobs are executed under the keytab in use at that moment
> >>>>>>>>>>> * Securing keytab in multi tenancy airflow is a challenge. This also goes for securing connections. This we need to fix at some point. Solution for now seems to be no multi tenancy.
> >>>>>>>>>>>
> >>>>>>>>>>> Kerberos seems harder than it is btw. Still, we are sometimes moving away from it to OAUTH2 based authentication. This gets us closer to cloud standards (but we are on prem)
> >>>>>>>>>>>
> >>>>>>>>>>> B.
> >>>>>>>>>>>
> >>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>
> >>>>>>>>>>>> On 27 Jul 2018, at 17:41, Hitesh Shah <hit...@apache.org> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi Taylor
> >>>>>>>>>>>>
> >>>>>>>>>>>> +1 on upstreaming this. It would be great if you can submit a pull request to enhance the apache airflow docs.
> >>>>>>>>>>>>
> >>>>>>>>>>>> thanks
> >>>>>>>>>>>> Hitesh
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Jul 26, 2018 at 2:32 PM Taylor Edmiston <tedmis...@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> While we're on the topic, I'd love any feedback from Bolke or others who've used Kerberos with Airflow on this quick guide I put together yesterday. It's similar to what's in the Airflow docs but instead all on one page and slightly expanded.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> https://github.com/astronomerio/airflow-guides/blob/master/guides/kerberos.md
> >>>>>>>>>>>>> (or web version <https://www.astronomer.io/guides/kerberos/>)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> One thing I'd like to add is a minimal example of how to Kerberize a hook.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'd be happy to upstream this as well if it's useful (maybe a Concepts > Additional Functionality > Kerberos page?)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Taylor
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> *Taylor Edmiston*
> >>>>>>>>>>>>> Blog <https://blog.tedmiston.com/> | CV <https://stackoverflow.com/cv/taylor> | LinkedIn <https://www.linkedin.com/in/tedmiston/> | AngelList <https://angel.co/taylor> | Stack Overflow <https://stackoverflow.com/users/149428/taylor-edmiston>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Jul 26, 2018 at 5:18 PM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi Ry,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> You should ask Bolke de Bruin. He's really experienced with Kerberos and he also did the implementation for Airflow. Besides that, he also worked on implementing Kerberos in Ambari. Just want to let you know.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers, Fokko
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Thu, 26 Jul 2018 at 23:03, Ry Walker <r...@astronomer.io> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi everyone -
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> We have several bigCo's who are considering using Airflow asking into its support for Kerberos.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> We're going to work on a proof-of-concept next week, will likely record a screencast on it.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> For now, we're looking for any anecdotal information from organizations who are using Kerberos with Airflow, if anyone would be willing to share their experiences here, or reply to me personally, it would be greatly appreciated!
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> -Ry
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> *Ry Walker* | CEO, Astronomer <http://www.astronomer.io/> | 513.417.2163 | @rywalker <http://twitter.com/rywalker> | LinkedIn <http://www.linkedin.com/in/rywalker>