Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
external vaults and using them as connections. Was in the roadmap of items
I wanted to work on so very happy that it is coming together and we are
already discussing it.

I have also looked at the PR and for the most part, it looks good to me.
Following are my comments


   1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
   have a vote in case we don't come to a conclusion :D)
   2. I think we should have *creds_backend_kwargs* (either separate
   section or a config allowing JSON-string) or something similar to avoid
   hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
   related to a specific provider in *airflow.cfg*.
   If we don't do that we would need to hardcode these configs when I (or
   someone) adds support for Google's KMS and Hashicorp Vault :)
   3. How do we authenticate to the Secrets/Creds Backend?
   For Google Cloud KMS or Hashicorp Vault, the main way of authenticating
   I think is using Environment Variable.
   Of course (2) can solve this, for example, we can pass
   *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to
*creds_backend_kwargs *which
   would then authenticate with GCP
   Or do we want to get that value from* Connection object stored in DB*?


Regards,
Kaxil


On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]> wrote:

> Thanks Jarek for adding the PR link.  In addition to forgetting to include
> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
> correct AIP).
>
> I like the name Secrets -- I will let the other ideas percolate and will
> look into your suggestion about using the plugin system.
>
>
> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]>
> wrote:
>
> > +1. I like it. And I like the general direction of the name as well not
> to
> > be tied with connections.
> >
> > I think it's also worth mentioning that there is PR in progress about it
> > here: https://github.com/apache/airflow/pull/6376/files
> >
> > I have one observation though. We might not necessarily go in this
> > direction or maybe defer it to later discussion, but I thought about
> > extending the use of such CredentialsBackend and make it simply a
> > SecretsBackend (that would be a nice solution to discussing various names
> > for it :)).  Daniel - you might consider that as a "loose" comment.
> >
> > I do not want to hijack your topic here, so feel free to cut the
> discussion
> > here if you think it's going too far from the original proposal.
> >
> > Why don't we change it to a bit more generic "SecretsBackend" and
> provide a
> > more generic interface to use by the Hooks/Operators? I think the current
> > implementation is not very far from it. Additionally to just providing a
> > "get_connection" method it could provide much more generic "get_secret"
> and
> > this would give the Hooks/Operators an opportunity to use it to retrieve
> > all various kinds of secrets, not only Connections. We already have a
> > half-baked solution for that with the _COMMAND variables and maybe we can
> > incorporate this in backwards-compatible way in the "SecretBackend'
> > similarly as you did with Metastore and Env Backends.
> >
> > I think this would be a very handy feature - some of the secrets are not
> > necessary in the URI form, they might be much more complex and contain
> > better structured data and might be needed or available at different
> times
> > than when connection is retrieved. Often secrets are short-living and
> they
> > should be retrieved "just" before they are used - limiting it only to
> when
> > the Hook is created and only to URI is quite a limitation I think.
> >
> > Again - feel free to disregard that comment, I just thought it might be a
> > handy feature and maybe it's a good time to add it.
> >
> > J.
> >
> >
> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]>
> > wrote:
> >
> > > We can currently retrieve connections from environment variables or the
> > > metastore database.
> > >
> > > AIP-33
> > > <
> > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend
> >
> > > provides
> > > a way to retrieve them from other sources, for example AWS SSM
> parameter
> > > store.
> > >
> > > There are many instances in airflow where we allow for user
> customization
> > > like this, for example with auth backend and hostname callable -- it
> just
> > > hasn't been done yet with creds.
> > >
> > > *How is it implemented?*
> > > This adds a base class BaseCredsBackend which takes over BaseHook's
> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
> > > MetastoreCredsBackend are added as subclasses.  New implementations can
> > be
> > > added and user can configure precedence.  E.g. instead of the default
> env
> > > vars > metastore, a user could specify SSM > metastore > env vars in
> > > airflow.cfg.
> > >
> > > *Does this break anything?*
> > > No. This is a relatively simple refactor that results in no change in
> the
> > > default behavior unless user modifies config to enable an alternative
> > > backend.
> > >
> > > *Why is this worth adding?*
> > > If implemented, you could store creds anywhere, as long as you can
> write
> > a
> > > `get_connections` method that retrieves them.
> > > This can make it possible for a team of devs to share one creds source
> > > rather than each dev storing them in text files, for example.
> > > Can also make it easier to spin up a dev instance since you don't have
> to
> > > worry about loading creds.
> > > And if you don't have access to the airflow CLI (because you are on a
> > cloud
> > > platform perhaps) then this can provide an easy way to load and manage
> > > creds.
> > >
> > > *Why do you call it creds?*
> > > I don't need to call it creds.  I could call it BaseConnectionBackend
> or
> > > BaseConnectionInfoBackend or any other thing -- I don't care too much.
> > > There was something about calling it connection backend that was
> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
> word.
> > > So even though Connection is the name of the model that we instantiate
> > when
> > > retrieving these creds, creds seemed more specific and more
> > > representative.  But again if there is consensus on something else, I
> am
> > > happy to change.
> > > This adds a `creds` package under `airflow`.  Alternatively would we
> > create
> > > a `connections` package?  Would this cause confusion relative to the
> > > *model*
> > > Connection?
> > > Another thought: even though today this backend will produce a
> connection
> > > object, maybe in the future it can produce a different model -- either
> > way,
> > > this backend is still about retrieving *creds*.
> > >
> > > *Outstanding questions*
> > > As mentioned above, some don't like using the word Creds because this
> is
> > > about producing connections.  I totally get this.  If there is general
> > > consensus around another name, I am happy to change it.
> > > I welcome any other suggestions or feedback on the structure of this
> > > implementation.
> > >
> >
> >
> > --
> >
> > Jarek Potiuk
> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> >
> > M: +48 660 796 129 <+48660796129>
> > [image: Polidea] <https://www.polidea.com/>
> >
>

Reply via email to