This is why I was asking for a concrete example of where you'd want to
use this, right now I still can't see what problem you are aiming to
solve with this proposal.

So I'll ask again Jarek: Do you have a concrete use case that is driving
you wanting to create SecretManagerHook? 

> Straightforward API to be used by the operators.

We already have that, don't we? It's the SecretsBackend API.


Would your ask be solved by being able to configure multiple secrets
backends rather than a just a single one?


> hook = SecretManagerHook(conn_id = conn_id)
> hook.decrypt(key="KEY")

I don't think any of the  secrets backends support encrypting/decrypting
values, did you mean `hook.get_secret` here?

A counter proposal: Just use Variables within the operator. There is
essentially no difference between "a secret" and "a variable", and
doesn't introduce a whole new concept to Airflow.


The advantage of Hooks is they are know how to ask for a connection ID
to find their credentials.

But for a secrets backend/hook that all gets very self-referential. So
to use KMSHook does that mean I need to configure the secrets backend
(to look up connections) _and_ create a connection for itself in KMS so
the KMSHook and connect to it?

That is my major complaint I think. It strikes me as a very messy API
that is prone to user confusion and hard to document.


I do have one ask though if we do go down this route: that we don't end
up duplicating code to speak to the Secrets providers (i.e. in hooks and
in secrets backends) - it should live in one place only. (I'm sure you'd
do this anyway, I just wanted to state it)

-ash


On May 18 2020, at 10:42 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> On Mon, May 18, 2020 at 11:17 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> 
>> > GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook
>> 
>> Why do we even need Hooks for those? Why can't we use the existing GCP
>> Secret Manager class inside a custom operator? What does creating a hook
>> give us?
>> 
> 
> The same as all the other hooks. Common way to authenticate to the service
> (using Airflow Connection mechanism). Straightforward API to be used
> by the
> operators.
> 
> Now that Kaxil mentioned it - This is exactly what KMS hooks gives - it
> cause already defined
> connection id in Airflow DB to authenticate to KMS and encrypt/decrypt
> secret. Please take a look there:
> 
> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
> 
> Then in operator I'd use it like:
> 
> hook = KMSHook(conn_id = conn_id)
> hook.decrypt(key="KEY", cipher ="") (the cipher part is KMS-specific)..
> or
> 
> hook = SecretManagerHook(conn_id = conn_id)
> hook.decrypt(key="KEY")
> 
> Which is substantially easier than handling all the
> authentication/credential options (for example in GCP case it handles all
> the different forms of authentication 0 like variables, json file,
> connection-encoded-credentials) out of the box. This is the very same
> reason pretty much any hook exists in Airflow. The only difference for
> secrets is that it makes no sense to write operators for them because of
> having to pass decrypted secrets via XCom.
> 
> From the very beginning of this conversation, I was surprised it is at all
> a problem for anyone.
> 
> I never intended to make any shared service, I just wanted to implement
> what Kaxil described - separate hooks for all the secrets - same as any
> other service.
> I am quite surprised it is a problem for anyone (now knowing that KMS Hook
> already exist at all makes it even more surprising).
> J.
> 
> 
> 
>> 
>> -a
>> 
>> On May 18 2020, at 9:50 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>> 
>> >> Are we all talking about different things ?
>> >
>> > Good point. I think that's the main source of confusion here and we
>> > think about different things.
>> >
>> >> So what I feel that the use case that Nathan defined can just be
>> >> solved a
>> >> VaultHook & VaultOperator for example.
>> >
>> > That's what I was talking (from the beginning - maybe it was not
>> > clear) about separate hooks for each service. Not a shared one. GCP
>> > Secret Manager Hook, Vault Hook,  KMS Hook, AWS Secret Hook - all of
>> > them separate, in different providers, and simple hooks to be used by
>> > whoever wants to use them in their custom operators.
>> >
>> > We also talked about implementing operators, but there is very little
>> > use of generic Operators for secrets. Mainly because the only way
>> > operators can pass anything to other operators (tasks) is via xcom
>> > which would make the secrets stored plain text in the database. That
>> > is rather bad I am afraid. Having Hooks make them instantiatable in
>> > the context of running tasks, use Fernet to decrypt credentials from
>> > the Connection DB, request to retrieve secret from the backend and
>> > pass the unencrypted secret to the other parts of the operator - all
>> > in the context of a single worker/task.
>> >
>> >>
>> >> This should not be confused with "Secrets" at all. Why do we need to
>> create
>> >> a generic Hooks for all Secrets Backend?
>> >
>> > No generic hooks :). I never meant it to be generic.Maybe that's a
>> > confusion there - I wanted to implement a separate hook for every type
>> > of backend.
>> >
>> >> Consider we use PostgreSQL for backend and the connection is
>> defined in
>> >> airflow.cfg. Now you can still use the MySQLHook and PostgresHook
>> >> independently to connect to those Databases, correct.
>> >>
>> >> But they both should not be confused to be using anything "shared".
>> >
>> > No plans for that whatsoever.
>> >
>> >> The proposal if I interpret correctly talks about the following:
>> >>
>> >> We have an idea that we might want also (on top of the above
>> SecretManager
>> >> > implementation) define generic Hooks for accessing secrets from those
>> >> > services (just generic secrets, not connection, variables). Simply
>> treat
>> >> > each of the backends above as another "provider" and create a
>> Hook to
>> >> > access the service. Such Hook could have just one method:
>> >> > def get_secret(self, path_prefix: str, secret_id: str) ->
>> Optional[str]
>> >> > It would use a connection defined (as usual) in ENV variables or
>> database
>> >> > of Airflow to authenticate with the secret service and retrieve the
>> >> > secrets.
>> >
>> > OK. maybe confusion is about 'generic' . My "generic" was ("no
>> > connections, no variables") - just retrieve "generic" secret. Separate
>> > implementation for Hashicorp Vault, Separate for Secret Manager, etc.
>> >
>> >> The connection can be defined in The Secrets backend. To make it
>> clearer,
>> >> "Vault" in Nathan's case is a "Service" and has nothing to do with
>> >> SecretsBackend similar to how PostgresHook or MySQLHook has nothing
>> >> to do
>> >> with using Postgres as Airflow MetadataDB Backend.
>> >>
>> >> Another example is Google KMS, there is already Hook for Google
>> KMS (
>> >>
>> https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py
>> )
>> >> and an Operator can be created. Same can be done for Google Secrets
>> Manager
>> >> and Hashicorp Vault, in which cases all of these are "Services".
>> >
>> > That's exactly what I plan to implement. As explained above - Operator
>> > for secrets makes no sense because it would have to pass the secrets
>> > via xcom :(. I did not even check that we already have KMS hook. I was
>> > mostly about Vault and Secret Manager and AWS Secret Manager. Knowing
>> > that we have KMS makes it even easier :).
>> >
>> >> We could create SecretsHook similar to DbApiHook (
>> >>
>> https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
>> >> if we want to just define the single *get_secret* method you talked
>> about.
>> >
>> > I don't even plan that in fact, I thought about implementing several
>> > totally independent Hooks for each of the Backend Secrets.
>> >
>> >> The concept of "Secrets Backend" is to allow Managing of "Secrets
>> >> used in
>> >> Airflow" (Either to connect to an external system or Variables) in
>> actual
>> >> Secret Management Tools.
>> >>
>> >
>> > Yeah. I do not - at all - want to mess with that :)
>> >
>> >>
>> >> *Pros:*
>> >> >  And I
>> >> > well imagine this might be actually even more convenient to configure
>> >> > connections in the DB and access secrets this way rather than
>> >> having to
>> >> > configure Secret Backends in Airflow configuration.
>> >>
>> >> This is exactly where both "Secrets" and the "Service" terms are
>> >> mixed I
>> >> think. Again echoing what I said above : The concept of "Secrets
>> Backend"
>> >> is to allow Managing of "Secrets used in Airflow".
>> >> The Secrets Backend is so that you don't need to store secrets in
>> Airflow
>> >> Metadata DB whether they can encrypted or not as there are tools that
>> are
>> >> specifically designed to handle "Secrets, rotation of secrets etc".
>> Having
>> >> the Hook and Operator to talk to the Service should be separate.
>> >
>> > Full agreement - I do not want to intermix those. It was always
>> > thought as per-provider implementation of traditional "Hook".
>> >
>> >>
>> >> * Another benefit of it is that it would allow people still stuck
>> on pre
>> >> > 1.10.10 to  write custom operators that would like to use secret
>> backends
>> >> > (via backport operators). And still continue doing it in the future
>> >> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
>> >> > backed only - but continue ot use connections/hooks where some
>> specific
>> >> > secrets shoudl be kept in different secret backend.
>> >>
>> >>
>> >> What is the objective here: (1) is it to interact with those Services
>> >> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and
>> Variables
>> >> from different Secrets Backend
>> >
>> > Just to interact with it - no plans at all to get Airflow Connections
>> > nor Variables.
>> >
>> >>
>> >> Regards,
>> >> Kaxil
>> >>
>> >
>> 
> 
> 
> -- 
> 
> Jarek Potiuk
> 

Reply via email to