Thanks Jarek for adding the PR link.  In addition to forgetting to include
that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
correct AIP).

I like the name Secrets -- I will let the other ideas percolate and will
look into your suggestion about using the plugin system.


On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]>
wrote:

> +1. I like it. And I like the general direction of the name as well not to
> be tied with connections.
>
> I think it's also worth mentioning that there is PR in progress about it
> here: https://github.com/apache/airflow/pull/6376/files
>
> I have one observation though. We might not necessarily go in this
> direction or maybe defer it to later discussion, but I thought about
> extending the use of such CredentialsBackend and make it simply a
> SecretsBackend (that would be a nice solution to discussing various names
> for it :)).  Daniel - you might consider that as a "loose" comment.
>
> I do not want to hijack your topic here, so feel free to cut the discussion
> here if you think it's going too far from the original proposal.
>
> Why don't we change it to a bit more generic "SecretsBackend" and provide a
> more generic interface to use by the Hooks/Operators? I think the current
> implementation is not very far from it. Additionally to just providing a
> "get_connection" method it could provide much more generic "get_secret" and
> this would give the Hooks/Operators an opportunity to use it to retrieve
> all various kinds of secrets, not only Connections. We already have a
> half-baked solution for that with the _COMMAND variables and maybe we can
> incorporate this in backwards-compatible way in the "SecretBackend'
> similarly as you did with Metastore and Env Backends.
>
> I think this would be a very handy feature - some of the secrets are not
> necessary in the URI form, they might be much more complex and contain
> better structured data and might be needed or available at different times
> than when connection is retrieved. Often secrets are short-living and they
> should be retrieved "just" before they are used - limiting it only to when
> the Hook is created and only to URI is quite a limitation I think.
>
> Again - feel free to disregard that comment, I just thought it might be a
> handy feature and maybe it's a good time to add it.
>
> J.
>
>
> On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]>
> wrote:
>
> > We can currently retrieve connections from environment variables or the
> > metastore database.
> >
> > AIP-33
> > <
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
> > provides
> > a way to retrieve them from other sources, for example AWS SSM parameter
> > store.
> >
> > There are many instances in airflow where we allow for user customization
> > like this, for example with auth backend and hostname callable -- it just
> > hasn't been done yet with creds.
> >
> > *How is it implemented?*
> > This adds a base class BaseCredsBackend which takes over BaseHook's
> > get_connections method.  Then EnvironmentVariablesCredsBackend and
> > MetastoreCredsBackend are added as subclasses.  New implementations can
> be
> > added and user can configure precedence.  E.g. instead of the default env
> > vars > metastore, a user could specify SSM > metastore > env vars in
> > airflow.cfg.
> >
> > *Does this break anything?*
> > No. This is a relatively simple refactor that results in no change in the
> > default behavior unless user modifies config to enable an alternative
> > backend.
> >
> > *Why is this worth adding?*
> > If implemented, you could store creds anywhere, as long as you can write
> a
> > `get_connections` method that retrieves them.
> > This can make it possible for a team of devs to share one creds source
> > rather than each dev storing them in text files, for example.
> > Can also make it easier to spin up a dev instance since you don't have to
> > worry about loading creds.
> > And if you don't have access to the airflow CLI (because you are on a
> cloud
> > platform perhaps) then this can provide an easy way to load and manage
> > creds.
> >
> > *Why do you call it creds?*
> > I don't need to call it creds.  I could call it BaseConnectionBackend or
> > BaseConnectionInfoBackend or any other thing -- I don't care too much.
> > There was something about calling it connection backend that was
> > unsatisfying.  Perhaps because connection is a bit of an ambiguous word.
> > So even though Connection is the name of the model that we instantiate
> when
> > retrieving these creds, creds seemed more specific and more
> > representative.  But again if there is consensus on something else, I am
> > happy to change.
> > This adds a `creds` package under `airflow`.  Alternatively would we
> create
> > a `connections` package?  Would this cause confusion relative to the
> > *model*
> > Connection?
> > Another thought: even though today this backend will produce a connection
> > object, maybe in the future it can produce a different model -- either
> way,
> > this backend is still about retrieving *creds*.
> >
> > *Outstanding questions*
> > As mentioned above, some don't like using the word Creds because this is
> > about producing connections.  I totally get this.  If there is general
> > consensus around another name, I am happy to change it.
> > I welcome any other suggestions or feedback on the structure of this
> > implementation.
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Reply via email to