> GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook

Why do we even need Hooks for those? Why can't we use the existing GCP
Secret Manager class inside a custom operator? What does creating a hook
give us?

-a

On May 18 2020, at 9:50 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:

>> Are we all talking about different things ?
> 
> Good point. I think that's the main source of confusion here and we
> think about different things.
> 
>> So what I feel that the use case that Nathan defined can just be
>> solved a
>> VaultHook & VaultOperator for example.
> 
> That's what I was talking (from the beginning - maybe it was not
> clear) about separate hooks for each service. Not a shared one. GCP
> Secret Manager Hook, Vault Hook,  KMS Hook, AWS Secret Hook - all of
> them separate, in different providers, and simple hooks to be used by
> whoever wants to use them in their custom operators.
> 
> We also talked about implementing operators, but there is very little
> use of generic Operators for secrets. Mainly because the only way
> operators can pass anything to other operators (tasks) is via xcom
> which would make the secrets stored plain text in the database. That
> is rather bad I am afraid. Having Hooks make them instantiatable in
> the context of running tasks, use Fernet to decrypt credentials from
> the Connection DB, request to retrieve secret from the backend and
> pass the unencrypted secret to the other parts of the operator - all
> in the context of a single worker/task.
> 
>> 
>> This should not be confused with "Secrets" at all. Why do we need to create
>> a generic Hooks for all Secrets Backend?
> 
> No generic hooks :). I never meant it to be generic.Maybe that's a
> confusion there - I wanted to implement a separate hook for every type
> of backend.
> 
>> Consider we use PostgreSQL for backend and the connection is defined in
>> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
>> independently to connect to those Databases, correct.
>> 
>> But they both should not be confused to be using anything "shared".
> 
> No plans for that whatsoever.
> 
>> The proposal if I interpret correctly talks about the following:
>> 
>> We have an idea that we might want also (on top of the above SecretManager
>> > implementation) define generic Hooks for accessing secrets from those
>> > services (just generic secrets, not connection, variables). Simply treat
>> > each of the backends above as another "provider" and create a Hook to
>> > access the service. Such Hook could have just one method:
>> > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
>> > It would use a connection defined (as usual) in ENV variables or database
>> > of Airflow to authenticate with the secret service and retrieve the
>> > secrets.
> 
> OK. maybe confusion is about 'generic' . My "generic" was ("no
> connections, no variables") - just retrieve "generic" secret. Separate
> implementation for Hashicorp Vault, Separate for Secret Manager, etc.
> 
>> The connection can be defined in The Secrets backend. To make it clearer,
>> "Vault" in Nathan's case is a "Service" and has nothing to do with
>> SecretsBackend similar to how PostgresHook or MySQLHook has nothing
>> to do
>> with using Postgres as Airflow MetadataDB Backend.
>> 
>> Another example is Google KMS, there is already Hook for Google KMS (
>> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py)
>> and an Operator can be created. Same can be done for Google Secrets Manager
>> and Hashicorp Vault, in which cases all of these are "Services".
> 
> That's exactly what I plan to implement. As explained above - Operator
> for secrets makes no sense because it would have to pass the secrets
> via xcom :(. I did not even check that we already have KMS hook. I was
> mostly about Vault and Secret Manager and AWS Secret Manager. Knowing
> that we have KMS makes it even easier :).
> 
>> We could create SecretsHook similar to DbApiHook (
>> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
>> if we want to just define the single *get_secret* method you talked about.
> 
> I don't even plan that in fact, I thought about implementing several
> totally independent Hooks for each of the Backend Secrets.
> 
>> The concept of "Secrets Backend" is to allow Managing of "Secrets
>> used in
>> Airflow" (Either to connect to an external system or Variables) in actual
>> Secret Management Tools.
>> 
> 
> Yeah. I do not - at all - want to mess with that :)
> 
>> 
>> *Pros:*
>> >  And I
>> > well imagine this might be actually even more convenient to configure
>> > connections in the DB and access secrets this way rather than
>> having to
>> > configure Secret Backends in Airflow configuration.
>> 
>> This is exactly where both "Secrets" and the "Service" terms are
>> mixed I
>> think. Again echoing what I said above : The concept of "Secrets Backend"
>> is to allow Managing of "Secrets used in Airflow".
>> The Secrets Backend is so that you don't need to store secrets in Airflow
>> Metadata DB whether they can encrypted or not as there are tools that are
>> specifically designed to handle "Secrets, rotation of secrets etc". Having
>> the Hook and Operator to talk to the Service should be separate.
> 
> Full agreement - I do not want to intermix those. It was always
> thought as per-provider implementation of traditional "Hook".
> 
>> 
>> * Another benefit of it is that it would allow people still stuck on pre
>> > 1.10.10 to  write custom operators that would like to use secret backends
>> > (via backport operators). And still continue doing it in the future
>> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
>> > backed only - but continue ot use connections/hooks where some specific
>> > secrets shoudl be kept in different secret backend.
>> 
>> 
>> What is the objective here: (1) is it to interact with those Services
>> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and Variables
>> from different Secrets Backend
> 
> Just to interact with it - no plans at all to get Airflow Connections
> nor Variables.
> 
>> 
>> Regards,
>> Kaxil
>> 
> 

Reply via email to