On Mon, May 18, 2020 at 11:17 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> > GCP Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook > > Why do we even need Hooks for those? Why can't we use the existing GCP > Secret Manager class inside a custom operator? What does creating a hook > give us? > The same as all the other hooks. Common way to authenticate to the service (using Airflow Connection mechanism). Straightforward API to be used by the operators. Now that Kaxil mentioned it - This is exactly what KMS hooks gives - it cause already defined connection id in Airflow DB to authenticate to KMS and encrypt/decrypt secret. Please take a look there: https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py Then in operator I'd use it like: hook = KMSHook(conn_id = conn_id) hook.decrypt(key="KEY", cipher ="") (the cipher part is KMS-specific).. or hook = SecretManagerHook(conn_id = conn_id) hook.decrypt(key="KEY") Which is substantially easier than handling all the authentication/credential options (for example in GCP case it handles all the different forms of authentication 0 like variables, json file, connection-encoded-credentials) out of the box. This is the very same reason pretty much any hook exists in Airflow. The only difference for secrets is that it makes no sense to write operators for them because of having to pass decrypted secrets via XCom. >From the very beginning of this conversation, I was surprised it is at all a problem for anyone. I never intended to make any shared service, I just wanted to implement what Kaxil described - separate hooks for all the secrets - same as any other service. I am quite surprised it is a problem for anyone (now knowing that KMS Hook already exist at all makes it even more surprising). J. > > -a > > On May 18 2020, at 9:50 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote: > > >> Are we all talking about different things ? > > > > Good point. I think that's the main source of confusion here and we > > think about different things. > > > >> So what I feel that the use case that Nathan defined can just be > >> solved a > >> VaultHook & VaultOperator for example. > > > > That's what I was talking (from the beginning - maybe it was not > > clear) about separate hooks for each service. Not a shared one. GCP > > Secret Manager Hook, Vault Hook, KMS Hook, AWS Secret Hook - all of > > them separate, in different providers, and simple hooks to be used by > > whoever wants to use them in their custom operators. > > > > We also talked about implementing operators, but there is very little > > use of generic Operators for secrets. Mainly because the only way > > operators can pass anything to other operators (tasks) is via xcom > > which would make the secrets stored plain text in the database. That > > is rather bad I am afraid. Having Hooks make them instantiatable in > > the context of running tasks, use Fernet to decrypt credentials from > > the Connection DB, request to retrieve secret from the backend and > > pass the unencrypted secret to the other parts of the operator - all > > in the context of a single worker/task. > > > >> > >> This should not be confused with "Secrets" at all. Why do we need to > create > >> a generic Hooks for all Secrets Backend? > > > > No generic hooks :). I never meant it to be generic.Maybe that's a > > confusion there - I wanted to implement a separate hook for every type > > of backend. > > > >> Consider we use PostgreSQL for backend and the connection is defined in > >> airflow.cfg. Now you can still use the MySQLHook and PostgresHook > >> independently to connect to those Databases, correct. > >> > >> But they both should not be confused to be using anything "shared". > > > > No plans for that whatsoever. > > > >> The proposal if I interpret correctly talks about the following: > >> > >> We have an idea that we might want also (on top of the above > SecretManager > >> > implementation) define generic Hooks for accessing secrets from those > >> > services (just generic secrets, not connection, variables). Simply > treat > >> > each of the backends above as another "provider" and create a Hook to > >> > access the service. Such Hook could have just one method: > >> > def get_secret(self, path_prefix: str, secret_id: str) -> > Optional[str] > >> > It would use a connection defined (as usual) in ENV variables or > database > >> > of Airflow to authenticate with the secret service and retrieve the > >> > secrets. > > > > OK. maybe confusion is about 'generic' . My "generic" was ("no > > connections, no variables") - just retrieve "generic" secret. Separate > > implementation for Hashicorp Vault, Separate for Secret Manager, etc. > > > >> The connection can be defined in The Secrets backend. To make it > clearer, > >> "Vault" in Nathan's case is a "Service" and has nothing to do with > >> SecretsBackend similar to how PostgresHook or MySQLHook has nothing > >> to do > >> with using Postgres as Airflow MetadataDB Backend. > >> > >> Another example is Google KMS, there is already Hook for Google KMS ( > >> > https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py > ) > >> and an Operator can be created. Same can be done for Google Secrets > Manager > >> and Hashicorp Vault, in which cases all of these are "Services". > > > > That's exactly what I plan to implement. As explained above - Operator > > for secrets makes no sense because it would have to pass the secrets > > via xcom :(. I did not even check that we already have KMS hook. I was > > mostly about Vault and Secret Manager and AWS Secret Manager. Knowing > > that we have KMS makes it even easier :). > > > >> We could create SecretsHook similar to DbApiHook ( > >> > https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py) > >> if we want to just define the single *get_secret* method you talked > about. > > > > I don't even plan that in fact, I thought about implementing several > > totally independent Hooks for each of the Backend Secrets. > > > >> The concept of "Secrets Backend" is to allow Managing of "Secrets > >> used in > >> Airflow" (Either to connect to an external system or Variables) in > actual > >> Secret Management Tools. > >> > > > > Yeah. I do not - at all - want to mess with that :) > > > >> > >> *Pros:* > >> > And I > >> > well imagine this might be actually even more convenient to configure > >> > connections in the DB and access secrets this way rather than > >> having to > >> > configure Secret Backends in Airflow configuration. > >> > >> This is exactly where both "Secrets" and the "Service" terms are > >> mixed I > >> think. Again echoing what I said above : The concept of "Secrets > Backend" > >> is to allow Managing of "Secrets used in Airflow". > >> The Secrets Backend is so that you don't need to store secrets in > Airflow > >> Metadata DB whether they can encrypted or not as there are tools that > are > >> specifically designed to handle "Secrets, rotation of secrets etc". > Having > >> the Hook and Operator to talk to the Service should be separate. > > > > Full agreement - I do not want to intermix those. It was always > > thought as per-provider implementation of traditional "Hook". > > > >> > >> * Another benefit of it is that it would allow people still stuck on pre > >> > 1.10.10 to write custom operators that would like to use secret > backends > >> > (via backport operators). And still continue doing it in the future > >> > (possibly migrating to 2.0/1.10.10+ in cases when there is one secret > >> > backed only - but continue ot use connections/hooks where some > specific > >> > secrets shoudl be kept in different secret backend. > >> > >> > >> What is the objective here: (1) is it to interact with those Services > >> (Vault or Secrets Manager etc) or (2) Get Airflow Connections and > Variables > >> from different Secrets Backend > > > > Just to interact with it - no plans at all to get Airflow Connections > > nor Variables. > > > >> > >> Regards, > >> Kaxil > >> > > > -- Jarek Potiuk Polidea <https://www.polidea.com/> | Principal Software Engineer M: +48 660 796 129 <+48660796129> [image: Polidea] <https://www.polidea.com/>