> Are we all talking about different things ?

Good point. I think that's the main source of confusion here: we are
thinking about different things.

> So I feel that the use case that Nathan defined can just be solved by a
> VaultHook & VaultOperator, for example.

That's what I was talking about from the beginning (maybe it was not
clear): separate hooks for each service, not a shared one. GCP
Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook - all of
them separate, in different providers, and simple hooks to be used by
whoever wants to use them in their custom operators.

We also talked about implementing operators, but there is very little
use for generic Operators for secrets, mainly because the only way
operators can pass anything to other operators (tasks) is via XCom,
which would store the secrets in plain text in the database. That
is rather bad, I am afraid. Hooks, on the other hand, can be
instantiated in the context of a running task: they use Fernet to
decrypt credentials from the Connection DB, send a request to retrieve
the secret from the backend, and pass the unencrypted secret to the
other parts of the operator - all in the context of a single worker/task.
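
To make the point above concrete, here is a minimal sketch of what I
mean by using a hook inside a single task. All names here
(InMemorySecretHook, UploadReportTask, the "reports/api_token" key) are
made up for illustration and are not existing Airflow classes; a dict
stands in for the real secret service so the snippet runs on its own.

```python
from typing import Dict, Optional


class InMemorySecretHook:
    """Hypothetical stand-in for a per-service hook (e.g. a Vault hook)."""

    def __init__(self, store: Dict[str, str]) -> None:
        # Assumption: a real hook would read its (Fernet-decrypted)
        # credentials from an Airflow Connection here and open a client
        # to the secret service. A plain dict replaces all of that.
        self._store = store

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Return the secret, or None when the key is unknown.
        return self._store.get(f"{path_prefix}/{secret_id}")


class UploadReportTask:
    """Hypothetical custom operator; only execute() is modelled."""

    def __init__(self, hook: InMemorySecretHook) -> None:
        self.hook = hook
        self.used_token = False

    def execute(self) -> None:
        token = self.hook.get_secret("reports", "api_token")
        if token is None:
            raise ValueError("secret reports/api_token not found")
        # The secret is consumed here, inside the same task and process.
        self.used_token = True
        # Nothing secret is returned, so nothing secret would be pushed
        # to XCom by the operator's return value.
        return None
```

The secret only ever lives in the memory of the worker running the
task; what gets persisted is at most the fact that the task ran.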

>
> This should not be confused with "Secrets" at all. Why do we need to create
> a generic Hooks for all Secrets Backend?

No generic hooks :). I never meant it to be generic. Maybe that's the
confusion there - I wanted to implement a separate hook for every type
of backend.

> Consider we use PostgreSQL for backend and the connection is defined in
> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
> independently to connect to those Databases, correct.
>
> But they both should not be confused to be using anything "shared".

No plans for that whatsoever.

> The proposal if I interpret correctly talks about the following:
>
> We have an idea that we might want also (on top of the above SecretManager
> > implementation) define generic Hooks for accessing secrets from those
> > services (just generic secrets, not connection, variables). Simply treat
> > each of the backends above as another "provider" and create a Hook to
> > access the service. Such Hook could have just one method:
> > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
> > It would use a connection defined (as usual) in ENV variables or database
> > of Airflow to authenticate with the secret service and retrieve the
> > secrets.

OK. Maybe the confusion is about 'generic'. By "generic" I meant "no
connections, no variables" - just retrieving a "generic" secret. A separate
implementation for Hashicorp Vault, a separate one for Secret Manager, etc.
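
Roughly, and only as a sketch, I picture something like the two
independent classes below. The class names and the way each one builds
its lookup key are my assumptions for illustration; the dicts replace
the real service clients. The only thing taken from the proposal is the
get_secret(path_prefix, secret_id) signature, and there is deliberately
no shared base class.

```python
from typing import Dict, Optional


class SketchVaultHook:
    """Hypothetical hook for Hashicorp Vault; lives in its own provider."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        # Assumption: a real hook would build a Vault client from an
        # Airflow Connection instead of receiving a dict.
        self._secrets = secrets

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Illustrative key layout: "<path_prefix>/<secret_id>".
        return self._secrets.get(f"{path_prefix}/{secret_id}")


class SketchSecretManagerHook:
    """Hypothetical hook for GCP Secret Manager; unrelated to the above."""

    def __init__(self, secrets: Dict[str, str]) -> None:
        # Assumption: a real hook would use GCP credentials from a Connection.
        self._secrets = secrets

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Illustrative key layout: "<path_prefix>-<secret_id>".
        return self._secrets.get(f"{path_prefix}-{secret_id}")


vault = SketchVaultHook({"team/db_password": "from-vault"})
gcp = SketchSecretManagerHook({"team-db_password": "from-gcp"})
vault_value = vault.get_secret("team", "db_password")
gcp_value = gcp.get_secret("team", "db_password")
```

Each hook returns a plain secret string (or None) and knows nothing
about Airflow Connections or Variables being resolved through it.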

> The connection can be defined in The Secrets backend. To make it clearer,
> "Vault" in Nathan's case is a "Service" and has nothing to do with
> SecretsBackend similar to how PostgresHook or MySQLHook has nothing to do
> with using Postgres as Airflow MetadataDB Backend.
>
> Another example is Google KMS, there is already Hook for Google KMS (
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py)
> and an Operator can be created. Same can be done for Google Secrets Manager
> and Hashicorp Vault, in which cases all of these are "Services".

That's exactly what I plan to implement. As explained above, an Operator
for secrets makes no sense because it would have to pass the secrets
via XCom :(. I did not even check that we already have a KMS hook. I was
mostly thinking about Vault, Secret Manager and AWS Secret Manager. Knowing
that we have KMS makes it even easier :).

> We could create SecretsHook similar to DbApiHook (
> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
> if we want to just define the single *get_secret* method you talked about.

In fact, I don't even plan that. I thought about implementing several
totally independent Hooks, one for each of the secret backends.

> The concept of "Secrets Backend" is to allow Managing of "Secrets used in
> Airflow" (Either to connect to an external system or Variables) in actual
> Secret Management Tools.
>

Yeah. I do not - at all - want to mess with that :)

>
> *Pros:*
> >  And I
> > well imagine this might be actually even more convenient to configure
> > connections in the DB and access secrets this way rather than having to
> > configure Secret Backends in Airflow configuration.
>
> This is exactly where both "Secrets" and the "Service" terms are mixed I
> think. Again echoing what I said above : The concept of "Secrets Backend"
> is to allow Managing of "Secrets used in Airflow".
> The Secrets Backend is so that you don't need to store secrets in Airflow
> Metadata DB whether they can be encrypted or not as there are tools that are
> specifically designed to handle "Secrets, rotation of secrets etc". Having
> the Hook and Operator to talk to the Service should be separate.

Full agreement - I do not want to intermix those. It was always
meant as a per-provider implementation of a traditional "Hook".

>
> * Another benefit of it is that it would allow people still stuck on pre
> > 1.10.10 to write custom operators that would like to use secret backends
> > (via backport operators). And still continue doing it in the future
> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
> > backend only - but continue to use connections/hooks where some specific
> > secrets should be kept in a different secret backend).
>
>
> What is the objective here: (1) is it to interact with those Services
> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and Variables
> from different Secrets Backend

Just to interact with them - no plans at all to get Airflow Connections
or Variables.

>
> Regards,
> Kaxil
>
