Bumping this thread again. We have 2 "+1" votes on it currently (Me & Jarek).
Would love to hear other's opinion and thoughts. Personally I feel this is a great addition and the one I was looking forward to for a long time. Regards, Kaxil On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <[email protected]> wrote: > Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating > external vaults and using them as connections. Was in the roadmap of items > I wanted to work on so very happy that it is coming together and we are > already discussing it. > > I have also looked at the PR and for the most part, it looks good to me. > Following are my comments > > > 1. "*SecretsBackend*" is a more intuitive name in my opinion (we can > have a vote in case we don't come to a conclusion :D) > 2. I think we should have *creds_backend_kwargs* (either separate > section or a config allowing JSON-string) or something similar to avoid > hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*) > related to a specific provider in *airflow.cfg*. > If we don't do that we would need to hardcode these configs when I (or > someone) adds support for Google's KMS and Hashicorp Vault :) > 3. How do we authenticate to the Secrets/Creds Backend? > For Google Cloud KMS or Hashicorp Vault, the main way of > authenticating I think is using Environment Variable. > Of course (2) can solve this, for example, we can pass > *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to *creds_backend_kwargs > *which would then authenticate with GCP > Or do we want to get that value from* Connection object stored in DB*? > > > Regards, > Kaxil > > > On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]> > wrote: > >> Thanks Jarek for adding the PR link. In addition to forgetting to include >> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the >> correct AIP). >> >> I like the name Secrets -- I will let the other ideas percolate and will >> look into your suggestion about using the plugin system. >> >> >> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]> >> wrote: >> >> > +1. I like it. And I like the general direction of the name as well not >> to >> > be tied with connections. >> > >> > I think it's also worth mentioning that there is PR in progress about it >> > here: https://github.com/apache/airflow/pull/6376/files >> > >> > I have one observation though. We might not necessarily go in this >> > direction or maybe defer it to later discussion, but I thought about >> > extending the use of such CredentialsBackend and make it simply a >> > SecretsBackend (that would be a nice solution to discussing various >> names >> > for it :)). Daniel - you might consider that as a "loose" comment. >> > >> > I do not want to hijack your topic here, so feel free to cut the >> discussion >> > here if you think it's going too far from the original proposal. >> > >> > Why don't we change it to a bit more generic "SecretsBackend" and >> provide a >> > more generic interface to use by the Hooks/Operators? I think the >> current >> > implementation is not very far from it. Additionally to just providing a >> > "get_connection" method it could provide much more generic "get_secret" >> and >> > this would give the Hooks/Operators an opportunity to use it to retrieve >> > all various kinds of secrets, not only Connections. We already have a >> > half-baked solution for that with the _COMMAND variables and maybe we >> can >> > incorporate this in backwards-compatible way in the "SecretBackend' >> > similarly as you did with Metastore and Env Backends. >> > >> > I think this would be a very handy feature - some of the secrets are not >> > necessary in the URI form, they might be much more complex and contain >> > better structured data and might be needed or available at different >> times >> > than when connection is retrieved. Often secrets are short-living and >> they >> > should be retrieved "just" before they are used - limiting it only to >> when >> > the Hook is created and only to URI is quite a limitation I think. >> > >> > Again - feel free to disregard that comment, I just thought it might be >> a >> > handy feature and maybe it's a good time to add it. >> > >> > J. >> > >> > >> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]> >> > wrote: >> > >> > > We can currently retrieve connections from environment variables or >> the >> > > metastore database. >> > > >> > > AIP-33 >> > > < >> > >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend> >> > > provides >> > > a way to retrieve them from other sources, for example AWS SSM >> parameter >> > > store. >> > > >> > > There are many instances in airflow where we allow for user >> customization >> > > like this, for example with auth backend and hostname callable -- it >> just >> > > hasn't been done yet with creds. >> > > >> > > *How is it implemented?* >> > > This adds a base class BaseCredsBackend which takes over BaseHook's >> > > get_connections method. Then EnvironmentVariablesCredsBackend and >> > > MetastoreCredsBackend are added as subclasses. New implementations >> can >> > be >> > > added and user can configure precedence. E.g. instead of the default >> env >> > > vars > metastore, a user could specify SSM > metastore > env vars in >> > > airflow.cfg. >> > > >> > > *Does this break anything?* >> > > No. This is a relatively simple refactor that results in no change in >> the >> > > default behavior unless user modifies config to enable an alternative >> > > backend. >> > > >> > > *Why is this worth adding?* >> > > If implemented, you could store creds anywhere, as long as you can >> write >> > a >> > > `get_connections` method that retrieves them. >> > > This can make it possible for a team of devs to share one creds source >> > > rather than each dev storing them in text files, for example. >> > > Can also make it easier to spin up a dev instance since you don't >> have to >> > > worry about loading creds. >> > > And if you don't have access to the airflow CLI (because you are on a >> > cloud >> > > platform perhaps) then this can provide an easy way to load and manage >> > > creds. >> > > >> > > *Why do you call it creds?* >> > > I don't need to call it creds. I could call it BaseConnectionBackend >> or >> > > BaseConnectionInfoBackend or any other thing -- I don't care too much. >> > > There was something about calling it connection backend that was >> > > unsatisfying. Perhaps because connection is a bit of an ambiguous >> word. >> > > So even though Connection is the name of the model that we instantiate >> > when >> > > retrieving these creds, creds seemed more specific and more >> > > representative. But again if there is consensus on something else, I >> am >> > > happy to change. >> > > This adds a `creds` package under `airflow`. Alternatively would we >> > create >> > > a `connections` package? Would this cause confusion relative to the >> > > *model* >> > > Connection? >> > > Another thought: even though today this backend will produce a >> connection >> > > object, maybe in the future it can produce a different model -- either >> > way, >> > > this backend is still about retrieving *creds*. >> > > >> > > *Outstanding questions* >> > > As mentioned above, some don't like using the word Creds because this >> is >> > > about producing connections. I totally get this. If there is general >> > > consensus around another name, I am happy to change it. >> > > I welcome any other suggestions or feedback on the structure of this >> > > implementation. >> > > >> > >> > >> > -- >> > >> > Jarek Potiuk >> > Polidea <https://www.polidea.com/> | Principal Software Engineer >> > >> > M: +48 660 796 129 <+48660796129> >> > [image: Polidea] <https://www.polidea.com/> >> > >> >
