Thanks Jarek for adding the PR link. In addition to forgetting to include that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the correct AIP).
I like the name Secrets -- I will let the other ideas percolate and will look into your suggestion about using the plugin system. On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]> wrote: > +1. I like it. And I like the general direction of the name as well not to > be tied with connections. > > I think it's also worth mentioning that there is PR in progress about it > here: https://github.com/apache/airflow/pull/6376/files > > I have one observation though. We might not necessarily go in this > direction or maybe defer it to later discussion, but I thought about > extending the use of such CredentialsBackend and make it simply a > SecretsBackend (that would be a nice solution to discussing various names > for it :)). Daniel - you might consider that as a "loose" comment. > > I do not want to hijack your topic here, so feel free to cut the discussion > here if you think it's going too far from the original proposal. > > Why don't we change it to a bit more generic "SecretsBackend" and provide a > more generic interface to use by the Hooks/Operators? I think the current > implementation is not very far from it. Additionally to just providing a > "get_connection" method it could provide much more generic "get_secret" and > this would give the Hooks/Operators an opportunity to use it to retrieve > all various kinds of secrets, not only Connections. We already have a > half-baked solution for that with the _COMMAND variables and maybe we can > incorporate this in backwards-compatible way in the "SecretBackend' > similarly as you did with Metastore and Env Backends. > > I think this would be a very handy feature - some of the secrets are not > necessary in the URI form, they might be much more complex and contain > better structured data and might be needed or available at different times > than when connection is retrieved. Often secrets are short-living and they > should be retrieved "just" before they are used - limiting it only to when > the Hook is created and only to URI is quite a limitation I think. > > Again - feel free to disregard that comment, I just thought it might be a > handy feature and maybe it's a good time to add it. > > J. > > > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]> > wrote: > > > We can currently retrieve connections from environment variables or the > > metastore database. > > > > AIP-33 > > < > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend> > > provides > > a way to retrieve them from other sources, for example AWS SSM parameter > > store. > > > > There are many instances in airflow where we allow for user customization > > like this, for example with auth backend and hostname callable -- it just > > hasn't been done yet with creds. > > > > *How is it implemented?* > > This adds a base class BaseCredsBackend which takes over BaseHook's > > get_connections method. Then EnvironmentVariablesCredsBackend and > > MetastoreCredsBackend are added as subclasses. New implementations can > be > > added and user can configure precedence. E.g. instead of the default env > > vars > metastore, a user could specify SSM > metastore > env vars in > > airflow.cfg. > > > > *Does this break anything?* > > No. This is a relatively simple refactor that results in no change in the > > default behavior unless user modifies config to enable an alternative > > backend. > > > > *Why is this worth adding?* > > If implemented, you could store creds anywhere, as long as you can write > a > > `get_connections` method that retrieves them. > > This can make it possible for a team of devs to share one creds source > > rather than each dev storing them in text files, for example. > > Can also make it easier to spin up a dev instance since you don't have to > > worry about loading creds. > > And if you don't have access to the airflow CLI (because you are on a > cloud > > platform perhaps) then this can provide an easy way to load and manage > > creds. > > > > *Why do you call it creds?* > > I don't need to call it creds. I could call it BaseConnectionBackend or > > BaseConnectionInfoBackend or any other thing -- I don't care too much. > > There was something about calling it connection backend that was > > unsatisfying. Perhaps because connection is a bit of an ambiguous word. > > So even though Connection is the name of the model that we instantiate > when > > retrieving these creds, creds seemed more specific and more > > representative. But again if there is consensus on something else, I am > > happy to change. > > This adds a `creds` package under `airflow`. Alternatively would we > create > > a `connections` package? Would this cause confusion relative to the > > *model* > > Connection? > > Another thought: even though today this backend will produce a connection > > object, maybe in the future it can produce a different model -- either > way, > > this backend is still about retrieving *creds*. > > > > *Outstanding questions* > > As mentioned above, some don't like using the word Creds because this is > > about producing connections. I totally get this. If there is general > > consensus around another name, I am happy to change it. > > I welcome any other suggestions or feedback on the structure of this > > implementation. > > > > > -- > > Jarek Potiuk > Polidea <https://www.polidea.com/> | Principal Software Engineer > > M: +48 660 796 129 <+48660796129> > [image: Polidea] <https://www.polidea.com/> >
