Bumping this thread again. We have 2 "+1" votes on it currently (Me &
Jarek).

Would love to hear other's opinion and thoughts. Personally I feel this is
a great addition and the one I was looking forward to for a long time.

Regards,
Kaxil

On Sun, Mar 8, 2020 at 8:33 PM Kaxil Naik <[email protected]> wrote:

> Thanks, Daniel, a *big "+1"* to the AIP and the idea of integrating
> external vaults and using them as connections. Was in the roadmap of items
> I wanted to work on so very happy that it is coming together and we are
> already discussing it.
>
> I have also looked at the PR and for the most part, it looks good to me.
> Following are my comments
>
>
>    1. "*SecretsBackend*" is a more intuitive name in my opinion (we can
>    have a vote in case we don't come to a conclusion :D)
>    2. I think we should have *creds_backend_kwargs* (either separate
>    section or a config allowing JSON-string) or something similar to avoid
>    hardcoding configs (like *aws_ssm_prefix*/*aws_ssm_profile_name*)
>    related to a specific provider in *airflow.cfg*.
>    If we don't do that we would need to hardcode these configs when I (or
>    someone) adds support for Google's KMS and Hashicorp Vault :)
>    3. How do we authenticate to the Secrets/Creds Backend?
>    For Google Cloud KMS or Hashicorp Vault, the main way of
>    authenticating I think is using Environment Variable.
>    Of course (2) can solve this, for example, we can pass
>    *GOOGLE_APPLICATION_CREDENTIAL* JSON file path to *creds_backend_kwargs
>    *which would then authenticate with GCP
>    Or do we want to get that value from* Connection object stored in DB*?
>
>
> Regards,
> Kaxil
>
>
> On Sun, Mar 8, 2020 at 6:41 AM Daniel Standish <[email protected]>
> wrote:
>
>> Thanks Jarek for adding the PR link.  In addition to forgetting to include
>> that, I put the wrong AIP number in subject -- fixed now (AIP-33 is the
>> correct AIP).
>>
>> I like the name Secrets -- I will let the other ideas percolate and will
>> look into your suggestion about using the plugin system.
>>
>>
>> On Sat, Mar 7, 2020 at 8:09 PM Jarek Potiuk <[email protected]>
>> wrote:
>>
>> > +1. I like it. And I like the general direction of the name as well not
>> to
>> > be tied with connections.
>> >
>> > I think it's also worth mentioning that there is PR in progress about it
>> > here: https://github.com/apache/airflow/pull/6376/files
>> >
>> > I have one observation though. We might not necessarily go in this
>> > direction or maybe defer it to later discussion, but I thought about
>> > extending the use of such CredentialsBackend and make it simply a
>> > SecretsBackend (that would be a nice solution to discussing various
>> names
>> > for it :)).  Daniel - you might consider that as a "loose" comment.
>> >
>> > I do not want to hijack your topic here, so feel free to cut the
>> discussion
>> > here if you think it's going too far from the original proposal.
>> >
>> > Why don't we change it to a bit more generic "SecretsBackend" and
>> provide a
>> > more generic interface to use by the Hooks/Operators? I think the
>> current
>> > implementation is not very far from it. Additionally to just providing a
>> > "get_connection" method it could provide much more generic "get_secret"
>> and
>> > this would give the Hooks/Operators an opportunity to use it to retrieve
>> > all various kinds of secrets, not only Connections. We already have a
>> > half-baked solution for that with the _COMMAND variables and maybe we
>> can
>> > incorporate this in backwards-compatible way in the "SecretBackend'
>> > similarly as you did with Metastore and Env Backends.
>> >
>> > I think this would be a very handy feature - some of the secrets are not
>> > necessary in the URI form, they might be much more complex and contain
>> > better structured data and might be needed or available at different
>> times
>> > than when connection is retrieved. Often secrets are short-living and
>> they
>> > should be retrieved "just" before they are used - limiting it only to
>> when
>> > the Hook is created and only to URI is quite a limitation I think.
>> >
>> > Again - feel free to disregard that comment, I just thought it might be
>> a
>> > handy feature and maybe it's a good time to add it.
>> >
>> > J.
>> >
>> >
>> > On Sun, Mar 8, 2020 at 1:03 AM Daniel Standish <[email protected]>
>> > wrote:
>> >
>> > > We can currently retrieve connections from environment variables or
>> the
>> > > metastore database.
>> > >
>> > > AIP-33
>> > > <
>> >
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
>> > > provides
>> > > a way to retrieve them from other sources, for example AWS SSM
>> parameter
>> > > store.
>> > >
>> > > There are many instances in airflow where we allow for user
>> customization
>> > > like this, for example with auth backend and hostname callable -- it
>> just
>> > > hasn't been done yet with creds.
>> > >
>> > > *How is it implemented?*
>> > > This adds a base class BaseCredsBackend which takes over BaseHook's
>> > > get_connections method.  Then EnvironmentVariablesCredsBackend and
>> > > MetastoreCredsBackend are added as subclasses.  New implementations
>> can
>> > be
>> > > added and user can configure precedence.  E.g. instead of the default
>> env
>> > > vars > metastore, a user could specify SSM > metastore > env vars in
>> > > airflow.cfg.
>> > >
>> > > *Does this break anything?*
>> > > No. This is a relatively simple refactor that results in no change in
>> the
>> > > default behavior unless user modifies config to enable an alternative
>> > > backend.
>> > >
>> > > *Why is this worth adding?*
>> > > If implemented, you could store creds anywhere, as long as you can
>> write
>> > a
>> > > `get_connections` method that retrieves them.
>> > > This can make it possible for a team of devs to share one creds source
>> > > rather than each dev storing them in text files, for example.
>> > > Can also make it easier to spin up a dev instance since you don't
>> have to
>> > > worry about loading creds.
>> > > And if you don't have access to the airflow CLI (because you are on a
>> > cloud
>> > > platform perhaps) then this can provide an easy way to load and manage
>> > > creds.
>> > >
>> > > *Why do you call it creds?*
>> > > I don't need to call it creds.  I could call it BaseConnectionBackend
>> or
>> > > BaseConnectionInfoBackend or any other thing -- I don't care too much.
>> > > There was something about calling it connection backend that was
>> > > unsatisfying.  Perhaps because connection is a bit of an ambiguous
>> word.
>> > > So even though Connection is the name of the model that we instantiate
>> > when
>> > > retrieving these creds, creds seemed more specific and more
>> > > representative.  But again if there is consensus on something else, I
>> am
>> > > happy to change.
>> > > This adds a `creds` package under `airflow`.  Alternatively would we
>> > create
>> > > a `connections` package?  Would this cause confusion relative to the
>> > > *model*
>> > > Connection?
>> > > Another thought: even though today this backend will produce a
>> connection
>> > > object, maybe in the future it can produce a different model -- either
>> > way,
>> > > this backend is still about retrieving *creds*.
>> > >
>> > > *Outstanding questions*
>> > > As mentioned above, some don't like using the word Creds because this
>> is
>> > > about producing connections.  I totally get this.  If there is general
>> > > consensus around another name, I am happy to change it.
>> > > I welcome any other suggestions or feedback on the structure of this
>> > > implementation.
>> > >
>> >
>> >
>> > --
>> >
>> > Jarek Potiuk
>> > Polidea <https://www.polidea.com/> | Principal Software Engineer
>> >
>> > M: +48 660 796 129 <+48660796129>
>> > [image: Polidea] <https://www.polidea.com/>
>> >
>>
>

Reply via email to