We can currently retrieve connections from environment variables or the
metastore database.

AIP-33
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-33+Creds+backend>
provides a way to retrieve them from other sources, for example AWS SSM
Parameter Store.

There are many places in Airflow where we allow user customization like
this, for example the auth backend and the hostname callable -- it just
hasn't been done yet for creds.

*How is it implemented?*
This adds a base class BaseCredsBackend which takes over BaseHook's
get_connections method.  Then EnvironmentVariablesCredsBackend and
MetastoreCredsBackend are added as subclasses.  New implementations can be
added, and users can configure the precedence.  E.g. instead of the default
env vars > metastore, a user could specify SSM > metastore > env vars in
airflow.cfg.
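
To make the structure concrete, here is a rough sketch of the idea, not the
actual patch.  The class names are the ones proposed above; the method
signature, the dict standing in for the metastore, and the module-level
`get_connections` helper that walks the backends in order are my assumptions
for illustration.  Connections are represented as plain URI strings so the
sketch runs without Airflow installed.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Sequence


class BaseCredsBackend(ABC):
    """Proposed base class: given a conn_id, return matching connection URIs."""

    @abstractmethod
    def get_connections(self, conn_id: str) -> List[str]:
        """Return a (possibly empty) list of connection URIs for conn_id."""


class EnvironmentVariablesCredsBackend(BaseCredsBackend):
    """Looks up AIRFLOW_CONN_<CONN_ID>, the existing env var convention."""

    PREFIX = "AIRFLOW_CONN_"

    def get_connections(self, conn_id: str) -> List[str]:
        uri = os.environ.get(self.PREFIX + conn_id.upper())
        return [uri] if uri else []


class MetastoreCredsBackend(BaseCredsBackend):
    """Stand-in for the metastore: a dict replaces the real database query."""

    def __init__(self, rows: Optional[Dict[str, List[str]]] = None) -> None:
        self._rows = dict(rows or {})

    def get_connections(self, conn_id: str) -> List[str]:
        return list(self._rows.get(conn_id, []))


def get_connections(conn_id: str, backends: Sequence[BaseCredsBackend]) -> List[str]:
    """Hypothetical helper: try each backend in order, first non-empty result wins."""
    for backend in backends:
        conns = backend.get_connections(conn_id)
        if conns:
            return conns
    raise KeyError(f"The conn_id `{conn_id}` isn't defined in any backend")
```

With this shape, the default behavior corresponds to passing
`[EnvironmentVariablesCredsBackend(), MetastoreCredsBackend(...)]`, and
changing precedence is just a matter of reordering the list that is built
from the config.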

*Does this break anything?*
No. This is a relatively simple refactor that results in no change to the
default behavior unless the user modifies the config to enable an
alternative backend.

*Why is this worth adding?*
If implemented, you could store creds anywhere, as long as you can write a
`get_connections` method that retrieves them.
This makes it possible for a team of devs to share one creds source rather
than each dev storing them in text files, for example.
It can also make it easier to spin up a dev instance, since you don't have
to worry about loading creds.
And if you don't have access to the Airflow CLI (because you are on a cloud
platform, perhaps), then this can provide an easy way to load and manage
creds.
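
As an illustration of how little a custom backend would need, here is a
minimal sketch of one that reads creds from a shared JSON file.  The class
name `JsonFileCredsBackend` and the file layout are hypothetical and not
part of the proposal; in the actual implementation it would subclass
BaseCredsBackend, but it is left standalone here so it runs without Airflow.

```python
import json
from pathlib import Path
from typing import List


class JsonFileCredsBackend:
    """Hypothetical backend: reads connection URIs from a shared JSON file.

    Assumed file layout: {"<conn_id>": "<uri>"} or {"<conn_id>": ["<uri>", ...]}.
    """

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def get_connections(self, conn_id: str) -> List[str]:
        # Re-read the file on every call so edits are picked up immediately.
        data = json.loads(self.path.read_text())
        value = data.get(conn_id, [])
        # Normalize a single URI string to a one-element list.
        return [value] if isinstance(value, str) else list(value)
```

Pointing every dev's config at the same file (or swapping the file read for
a call to a secrets service) would give the shared source described above.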

*Why do you call it creds?*
I don't need to call it creds.  I could call it BaseConnectionBackend or
BaseConnectionInfoBackend or any other thing -- I don't care too much.
There was something about calling it connection backend that was
unsatisfying.  Perhaps because connection is a bit of an ambiguous word.
So even though Connection is the name of the model that we instantiate when
retrieving these creds, creds seemed more specific and more
representative.  But again if there is consensus on something else, I am
happy to change.
This adds a `creds` package under `airflow`.  Alternatively, should we
create a `connections` package?  Would that cause confusion with the *model*
Connection?
Another thought: even though today this backend will produce a connection
object, maybe in the future it can produce a different model -- either way,
this backend is still about retrieving *creds*.

*Outstanding questions*
As mentioned above, some don't like using the word Creds because this is
about producing connections.  I totally get this.  If there is general
consensus around another name, I am happy to change it.
I welcome any other suggestions or feedback on the structure of this
implementation.
