+1. I like it. And I like the general direction of the name as well not to be tied with connections.
I think it's also worth mentioning that there is PR in progress about it here: https://github.com/apache/airflow/pull/6376/files I have one observation though. We might not necessarily go in this direction or maybe defer it to later discussion, but I thought about extending the use of such CredentialsBackend and make it simply a SecretsBackend (that would be a nice solution to discussing various names for it :)). Daniel - you might consider that as a "loose" comment. I do not want to hijack your topic here, so feel free to cut the discussion here if you think it's going too far from the original proposal. Why don't we change it to a bit more generic "SecretsBackend" and provide a more generic interface to use by the Hooks/Operators? I think the current implementation is not very far from it. Additionally to just providing a "get_connection" method it could provide much more generic "get_secret" and this would give the Hooks/Operators an opportunity to use it to retrieve all various kinds of secrets, not only Connections. We already have a half-baked solution for that with the _COMMAND variables and maybe we can incorporate this in backwards-compatible way in the "SecretBackend' similarly as you did with Metastore and Env Backends. I think this would be a very handy feature - some of the secrets are not necessary in the URI form, they might be much more complex and contain better structured data and might be needed or available at different times than when connection is retrieved. Often secrets are short-living and they should be retrieved "just" before they are used - limiting it only to when the Hook is created and only to URI is quite a limitation I think. Again - feel free to disregard that comment, I just thought it might be a handy feature and maybe it's a good time to add it. J. On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]> wrote: > We can currently retrieve connections from environment variables or the > metastore database. > > AIP-33 > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend> > provides > a way to retrieve them from other sources, for example AWS SSM parameter > store. > > There are many instances in airflow where we allow for user customization > like this, for example with auth backend and hostname callable -- it just > hasn't been done yet with creds. > > *How is it implemented?* > This adds a base class BaseCredsBackend which takes over BaseHook's > get_connections method. Then EnvironmentVariablesCredsBackend and > MetastoreCredsBackend are added as subclasses. New implementations can be > added and user can configure precedence. E.g. instead of the default env > vars > metastore, a user could specify SSM > metastore > env vars in > airflow.cfg. > > *Does this break anything?* > No. This is a relatively simple refactor that results in no change in the > default behavior unless user modifies config to enable an alternative > backend. > > *Why is this worth adding?* > If implemented, you could store creds anywhere, as long as you can write a > `get_connections` method that retrieves them. > This can make it possible for a team of devs to share one creds source > rather than each dev storing them in text files, for example. > Can also make it easier to spin up a dev instance since you don't have to > worry about loading creds. > And if you don't have access to the airflow CLI (because you are on a cloud > platform perhaps) then this can provide an easy way to load and manage > creds. > > *Why do you call it creds?* > I don't need to call it creds. I could call it BaseConnectionBackend or > BaseConnectionInfoBackend or any other thing -- I don't care too much. > There was something about calling it connection backend that was > unsatisfying. Perhaps because connection is a bit of an ambiguous word. > So even though Connection is the name of the model that we instantiate when > retrieving these creds, creds seemed more specific and more > representative. But again if there is consensus on something else, I am > happy to change. > This adds a `creds` package under `airflow`. Alternatively would we create > a `connections` package? Would this cause confusion relative to the > *model* > Connection? > Another thought: even though today this backend will produce a connection > object, maybe in the future it can produce a different model -- either way, > this backend is still about retrieving *creds*. > > *Outstanding questions* > As mentioned above, some don't like using the word Creds because this is > about producing connections. I totally get this. If there is general > consensus around another name, I am happy to change it. > I welcome any other suggestions or feedback on the structure of this > implementation. > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>
