Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating external vaults and using them as connections. Was in the roadmap of items I wanted to work on so very happy that it is coming together and we are already discussing it.
I have also looked at the PR and for the most part, it looks good to me. Following are my comments 1. "*SecretsBackend*" is a more intuitive name in my opinion (we can have a vote in case we don't come to a conclusion :D) 2. I think we should have *creds_backend_kwargs* (either separate section or a config allowing JSON-string) or something similar to avoid hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*) related to a specific provider in *airflow.cfg*. If we don't do that we would need to hardcode these configs when I (or someone) adds support for Google's KMS and Hashicorp Vault :) 3. How do we authenticate to the Secrets/Creds Backend? For Google Cloud KMS or Hashicorp Vault, the main way of authenticating I think is using Environment Variable. Of course (2) can solve this, for example, we can pass *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to *creds_backend_kwargs *which would then authenticate with GCP Or do we want to get that value from* Connection object stored in DB*? Regards, Kaxil On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]> wrote: > Thanks Jarek for adding the PR link. In addition to forgetting to include > that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the > correct AIP). > > I like the name Secrets -- I will let the other ideas percolate and will > look into your suggestion about using the plugin system. > > > On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]> > wrote: > > > +1. I like it. And I like the general direction of the name as well not > to > > be tied with connections. > > > > I think it's also worth mentioning that there is PR in progress about it > > here: https://github.com/apache/airflow/pull/6376/files > > > > I have one observation though. We might not necessarily go in this > > direction or maybe defer it to later discussion, but I thought about > > extending the use of such CredentialsBackend and make it simply a > > SecretsBackend (that would be a nice solution to discussing various names > > for it :)). Daniel - you might consider that as a "loose" comment. > > > > I do not want to hijack your topic here, so feel free to cut the > discussion > > here if you think it's going too far from the original proposal. > > > > Why don't we change it to a bit more generic "SecretsBackend" and > provide a > > more generic interface to use by the Hooks/Operators? I think the current > > implementation is not very far from it. Additionally to just providing a > > "get_connection" method it could provide much more generic "get_secret" > and > > this would give the Hooks/Operators an opportunity to use it to retrieve > > all various kinds of secrets, not only Connections. We already have a > > half-baked solution for that with the _COMMAND variables and maybe we can > > incorporate this in backwards-compatible way in the "SecretBackend' > > similarly as you did with Metastore and Env Backends. > > > > I think this would be a very handy feature - some of the secrets are not > > necessary in the URI form, they might be much more complex and contain > > better structured data and might be needed or available at different > times > > than when connection is retrieved. Often secrets are short-living and > they > > should be retrieved "just" before they are used - limiting it only to > when > > the Hook is created and only to URI is quite a limitation I think. > > > > Again - feel free to disregard that comment, I just thought it might be a > > handy feature and maybe it's a good time to add it. > > > > J. > > > > > > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]> > > wrote: > > > > > We can currently retrieve connections from environment variables or the > > > metastore database. > > > > > > AIP-33 > > > < > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend > > > > > provides > > > a way to retrieve them from other sources, for example AWS SSM > parameter > > > store. > > > > > > There are many instances in airflow where we allow for user > customization > > > like this, for example with auth backend and hostname callable -- it > just > > > hasn't been done yet with creds. > > > > > > *How is it implemented?* > > > This adds a base class BaseCredsBackend which takes over BaseHook's > > > get_connections method. Then EnvironmentVariablesCredsBackend and > > > MetastoreCredsBackend are added as subclasses. New implementations can > > be > > > added and user can configure precedence. E.g. instead of the default > env > > > vars > metastore, a user could specify SSM > metastore > env vars in > > > airflow.cfg. > > > > > > *Does this break anything?* > > > No. This is a relatively simple refactor that results in no change in > the > > > default behavior unless user modifies config to enable an alternative > > > backend. > > > > > > *Why is this worth adding?* > > > If implemented, you could store creds anywhere, as long as you can > write > > a > > > `get_connections` method that retrieves them. > > > This can make it possible for a team of devs to share one creds source > > > rather than each dev storing them in text files, for example. > > > Can also make it easier to spin up a dev instance since you don't have > to > > > worry about loading creds. > > > And if you don't have access to the airflow CLI (because you are on a > > cloud > > > platform perhaps) then this can provide an easy way to load and manage > > > creds. > > > > > > *Why do you call it creds?* > > > I don't need to call it creds. I could call it BaseConnectionBackend > or > > > BaseConnectionInfoBackend or any other thing -- I don't care too much. > > > There was something about calling it connection backend that was > > > unsatisfying. Perhaps because connection is a bit of an ambiguous > word. > > > So even though Connection is the name of the model that we instantiate > > when > > > retrieving these creds, creds seemed more specific and more > > > representative. But again if there is consensus on something else, I > am > > > happy to change. > > > This adds a `creds` package under `airflow`. Alternatively would we > > create > > > a `connections` package? Would this cause confusion relative to the > > > *model* > > > Connection? > > > Another thought: even though today this backend will produce a > connection > > > object, maybe in the future it can produce a different model -- either > > way, > > > this backend is still about retrieving *creds*. > > > > > > *Outstanding questions* > > > As mentioned above, some don't like using the word Creds because this > is > > > about producing connections. I totally get this. If there is general > > > consensus around another name, I am happy to change it. > > > I welcome any other suggestions or feedback on the structure of this > > > implementation. > > > > > > > > > -- > > > > Jarek Potiuk > > Polidea <https://www.polidea.com/> | Principal Software Engineer > > > > M: +48 660 796 129 <+48660796129> > > [image: Polidea] <https://www.polidea.com/> > > >
