Are we all talking about different things 😁 ?

I feel that the use case Nathan described can be solved with just a
VaultHook & VaultOperator, for example.

This should not be confused with "Secrets" at all. Why do we need to create
generic Hooks for all Secrets Backends?

Consider that we use PostgreSQL for the backend and the connection is defined
in airflow.cfg. You can still use the MySQLHook and PostgresHook
independently to connect to those databases, correct?

But neither should be mistaken for using anything "shared".

The proposal, if I interpret it correctly, talks about the following:

> We have an idea that we might want also (on top of the above SecretManager
> implementation) define generic Hooks for accessing secrets from those
> services (just generic secrets, not connection, variables). Simply treat
> each of the backends above as another "provider" and create a Hook to
> access the service. Such Hook could have just one method:
> def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
> It would use a connection defined (as usual) in ENV variables or database
> of Airflow to authenticate with the secret service and retrieve the
> secrets.


The connection can be defined in the Secrets Backend. To make it clearer,
"Vault" in Nathan's case is a "Service" and has nothing to do with the
Secrets Backend, similar to how PostgresHook or MySQLHook has nothing to do
with using Postgres as the Airflow Metadata DB backend.

Another example is Google KMS: there is already a Hook for Google KMS (
https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/hooks/kms.py)
and an Operator can be created. The same can be done for Google Secret
Manager and Hashicorp Vault; in all of these cases they are "Services".

We could create a SecretsHook similar to DbApiHook (
https://github.com/apache/airflow/blob/master/airflow/hooks/dbapi_hook.py)
if we want to just define the single *get_secret* method you talked about.
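
As a minimal sketch of what I mean, assuming nothing beyond the method
signature from the proposal: SecretsHook and InMemorySecretsHook below are
hypothetical names, and a real version would extend Airflow's BaseHook and
authenticate through an Airflow connection instead of reading from a dict.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class SecretsHook(ABC):
    """Hypothetical base class, playing the role DbApiHook plays for databases."""

    @abstractmethod
    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        """Return the secret stored at path_prefix/secret_id, or None if absent."""


class InMemorySecretsHook(SecretsHook):
    """Stand-in for a service-specific hook (Vault, Secret Manager, ...)."""

    def __init__(self, store: Dict[str, str]) -> None:
        # A real hook would build an authenticated client from a connection here.
        self._store = store

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        return self._store.get(f"{path_prefix}/{secret_id}")


# Made-up path and value, purely for illustration.
hook = InMemorySecretsHook({"gcp/service-accounts/looker": "sa-key"})
```

A VaultHook or a Secret Manager hook would then only implement get_secret
against its own Service, and custom operators would depend on the base class.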

The concept of a "Secrets Backend" is to allow managing the "Secrets used in
Airflow" (either Connections to external systems or Variables) in actual
secret management tools.
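
Roughly, and only as an illustration rather than Airflow's actual code, the
lookup a Secrets Backend plugs into can be pictured as below: the configured
backend is asked first, then environment variables, then the Metadata DB.
DictBackend, EnvBackend and resolve_conn_uri are names made up for this
sketch, and the connection IDs and URIs are invented values.

```python
import os
from typing import Dict, Optional


class DictBackend:
    """Hypothetical stand-in for a secrets tool or for the Metadata DB."""

    def __init__(self, conn_uris: Dict[str, str]) -> None:
        self._conn_uris = conn_uris

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return self._conn_uris.get(conn_id)


class EnvBackend:
    """Reads AIRFLOW_CONN_<CONN_ID> environment variables."""

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        return os.environ.get(f"AIRFLOW_CONN_{conn_id.upper()}")


def resolve_conn_uri(conn_id: str, backends: list) -> Optional[str]:
    # The first backend that knows the connection wins.
    for backend in backends:
        uri = backend.get_conn_uri(conn_id)
        if uri is not None:
            return uri
    return None


secrets_tool = DictBackend({"looker_db": "postgresql://user:pw@managed-host/looker"})
metadata_db = DictBackend({"looker_db": "postgresql://old", "legacy_db": "mysql://legacy"})
search_path = [secrets_tool, EnvBackend(), metadata_db]
```

Nothing in this path is a Hook or an Operator: it only decides where Airflow's
own Connections and Variables come from.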


> *Pros:*
> And I
> well imagine this might be actually even more convenient to configure
> connections in the DB and access secrets this way rather than having to
> configure Secret Backends in Airflow configuration.


I think this is exactly where the terms "Secrets" and "Service" are getting
mixed up. Again, echoing what I said above: the concept of a "Secrets Backend"
is to allow managing the "Secrets used in Airflow".
The Secrets Backend exists so that you don't need to store secrets in the
Airflow Metadata DB, whether they are encrypted or not, as there are tools
specifically designed to handle secrets, rotation of secrets, etc. Having
a Hook and Operator to talk to the Service should be separate.


> * Another benefit of it is that it would allow people still stuck on pre
> 1.10.10 to write custom operators that would like to use secret backends
> (via backport operators). And still continue doing it in the future
> (possibly migrating to 2.0/1.10.10+ in cases when there is one secret
> backend only - but continue to use connections/hooks where some specific
> secrets should be kept in a different secret backend).


What is the objective here: (1) is it to interact with those Services
(Vault, Secret Manager, etc.), or (2) is it to get Airflow Connections and
Variables from different Secrets Backends?


Regards,
Kaxil


On Mon, May 18, 2020 at 8:07 PM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Thanks Nathan,
>
> I think your case is a really good example where the Hook might be really
> useful (and apparently somebody did it already via Hooks).
>
> I wonder, Nathan, if you (in the future) switch to a secret backend - would you
> use the same secret backend for Airflow connections/variables? Or do you
> foresee that you will have another backend/credentials to access it?
>
> Maybe others had similar experiences - and would like to share it here?
>
> I still think there is a valid point in having separate hooks. Those are my
> points:
>
> 1) Seems that the use pattern is close to what I described - a separate secret
> backend that contains more "dynamic" secrets. And I think still being able
> to use different connections is a nice way of accessing multiple backend
> credentials within Airflow core. I think there was a good reason why only
> one backend is considered for "core" and it is really ill-suited to support
> multiple credential backends. I can hardly imagine reading connections or
> variables from multiple secret backends. How would you choose which backend
> to use for different variables? Fallback mechanisms? I think it's hardly
> useful. Hooks on the other hand (via connections) have a built-in way to
> choose different backends and their use pattern for custom operators is
> the really standard "airflow" way.
>
> 2) Python operator is not the best idea, because you need to provide
> credentials to access the secret backend. It can be done - of course - via
> environment variables, but using a connection from Airflow has the additional
> advantage of being encrypted at rest in the database. And with Hooks being
> the common denominator of accessing external services (secret backend being
> one of them) - it can hide all the authorisation and communication details
> from the operators using the hook (this is basically what a hook is for).
>
> 3) I have a good parallel here I think. I would compare my proposal to
> the current way we use Postgres and MySQL hooks vs. using SQLAlchemy for
> Airflow itself. While Airflow uses Postgres and MySQL to provide its
> internal database, it also has the "postgres" and "MySQL" providers that
> provide hooks that access the database in a "generic" way (and those hooks
> are used by a number of operators). We still can choose various databases
> to connect to via hooks - even if "Airflow core" uses that single database.
>
> J.
>
> On Mon, May 18, 2020 at 5:57 PM Nathan Hadfield <nathan.hadfi...@king.com>
> wrote:
>
> > Yep, I understand.  I wasn't necessarily advocating for a Vault hook; just
> > wanted to give some real world colour to the conversation and what we did
> > to solve our needs prior to the secrets backend.
> >
> > I'm sure that extending the class would also enable the same functionality.
> >
> > Cheers,
> >
> > Nathan
> >
> > On 18/05/2020, 16:46, "Ash Berlin-Taylor" <a...@apache.org> wrote:
> >
> >     Accessing things that aren't connections or variables is, essentially,
> >     creating a third class of thing that Secrets store.
> >
> >     But that is a separate issue to what Jarek is proposing, which is Hooks.
> >
> >     For your use case a Python operator sounds like the best fit. A hook is
> >     going to have to target the lowest common denominator, which means
> >     vault-specific things are just a needless layer over the top.
> >
> >     Extending the existing Secrets Backend interface to support that is
> >     doable, but I don't see the need for a Hook. Not everything needs to be
> >     a hook :)
> >
> >     -ash
> >
> >
> >     On May 18 2020, at 4:41 pm, Nathan Hadfield <nathan.hadfi...@king.com> wrote:
> >
> >     > Hey,
> >     >
> >     >
> >     >
> >     > My quick two cents are that it would be good to access secrets that
> >     > are not explicitly either connections or variables
> >     >
> >     >
> >     >
> >     > We have a need for DAGs that feature more complex interactions with
> >     > Vault - which typically end up being custom operators - that I think
> >     > would be helped by more generic capabilities.
> >     >
> >     >
> >     >
> >     > For example, we have an automated system that regularly rotates GCP
> >     > service accounts across the whole company and stores them in Vault.
> >     > We then have to ensure that our different Looker environments always
> >     > have these SAs before the old ones expire every 48 hours.  To do this,
> >     > we wrote a Vault Hook and a Looker Hook and then combine them in an
> >     > operator which would read every SA from a specific Vault path and then
> >     > update the connection inside Looker.
> >     >
> >     >
> >     >
> >     > I don’t know if this will influence your thinking in any way but just
> >     > wanted to briefly share our experiences.  If anyone would like to
> >     > learn more then please reach out and I’d be happy to share more.
> >     >
> >     >
> >     >
> >     > Cheers,
> >     >
> >     > Nathan
> >     >
> >     >
> >     >
> >     > On 18/05/2020, 15:21, "Ash Berlin-Taylor" <a...@apache.org> wrote:
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >    > The good thing with it is that you could have easily multiple
> > secret
> >     >
> >     >    > backends configured to retrieve secrets for specific "service"
> > (so
> >     >
> >     >    > that you
> >     >
> >     >    > could keep "generic airflow's secerts" in one backend but
> still
> > have
> >     >
> >     >    > possibility of custom operators to use other backends (with
> > different
> >     >
> >     >    > authentication, scopes etc.).
> >     >
> >     >
> >     >
> >     >    Having the ability to configure multiple secrets backends is
> > independent
> >     >
> >     >    of this feature. The original PR/AIP to add Secrets Backends
> >     > decided to
> >     >
> >     >    leave this ability out as it was more complex to configure. We
> >     > could add
> >     >
> >     >    that back in.
> >     >
> >     >
> >     >
> >     >    I still don't quite get from your example where you are
> proposing
> > this
> >     >
> >     >    would be used? Can you give a fuller example please? Do you
> have a
> >     >
> >     >    concrete use case where you need this?
> >     >
> >     >
> >     >
> >     >    Not everything in Airflow needs to be a hook; just access the
> > secrets
> >     >
> >     >    backend directly. I'm not sure what wrapping an extra layer
> > around these
> >     >
> >     >    classes gives us?
> >     >
> >     >
> >     >
> >     >    Without a concrete example I can't see anything other than this
> >     > adds a
> >     >
> >     >    lot of complexity.
> >     >
> >     >
> >     >
> >     >    -ash
> >     >
> >     >
> >     >
> >     >
> >     >
> >     >    On May 18 2020, at 2:45 pm, Jarek Potiuk <
> > jarek.pot...@polidea.com> wrote:
> >     >
> >     >
> >     >
> >     >    > Hello Everyone,
> >     >
> >     >    >
> >     >
> >     >    > TL;DR; I was just about to start to work on a small set of
> > Hooks -
> >     >
> >     >    > dedicated to retrieving screts from the Secret Backend. I
> >     > discussed it
> >     >
> >     >    > with Ash
> >     >
> >     >    > and Kamil
> >     >
> >     >    >
> >     >
> >     >
> >     > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__apache-2Dairflow.slack.com_archives_C0145R4NPS5_p1589805908013700&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=NBBItsFcPZR-C26VepQEehBPNPEWUsxar_DatX5ulco&e=
> >     > > on
> >     >
> >     >    > Slack today. So far I thought I treat them as usual providers,
> >     > but Ash
> >     >
> >     >    > raised some valid concenrs. so I wanted to raise teh proposal
> >     > before I
> >     >
> >     >    > start working on it/
> >     >
> >     >    >
> >     >
> >     >    > *Context:*
> >     >
> >     >    >
> >     >
> >     >    > Currently we have "Secret Backend" support built in in 2.0 and
> >     >
> >     >    > 1.10.10+. It
> >     >
> >     >    > includes retrieving the variable and connections (via Secret
> >     > Manager class)
> >     >
> >     >    > for:
> >     >
> >     >    >
> >     >
> >     >    >   -  Hashicorp Vault
> >     >
> >     >    >   -  Secret Manager
> >     >
> >     >    >   -  KMS
> >     >
> >     >    >   -  AWS secret manager
> >     >
> >     >    >
> >     >
> >     >    > Those secret managers are configured in:
> >     >
> >     >    >
> >     >
> >     >    > [secret]
> >     >
> >     >    > backend=<SecretManagerClass>
> >     >
> >     >    > backend_kwargs={}
> >     >
> >     >    >
> >     >
> >     >    > Those are available for use in a nice way (via Jinja templates
> >     > and the
> >     >
> >     >    > like), but they need support in the Core of Airlfow (so
> require
> > 1.10.10+).
> >     >
> >     >    > This means that if you are on pre 1.10.10 you cannot use those
> > secrets.
> >     >
> >     >    > Currently you can only use one secret per whole Airflow
> > installation
> >     >
> >     >    > so if
> >     >
> >     >    > your secrets are split between several secret managers (or if
> >     > secrets for
> >     >
> >     >    > particular service require different credentials) - you cannot
> >     > use the
> >     >
> >     >    > mechanism to access such distributed secrets. It's not often
> >     > case, but I
> >     >
> >     >    > very well imagine it might happen that there are different
> sets
> > of
> >     >
> >     >    > credentials to access different secrets - some services might
> > have
> >     >
> >     >    > different scopes/level of access needed. .
> >     >
> >     >    >
> >     >
> >     >    > *Proposal*
> >     >
> >     >    >
> >     >
> >     >    > We have an idea that we might want also (on top of the above
> > SecretManager
> >     >
> >     >    > implementation) define generic Hooks for accessing secrets
> from
> > those
> >     >
> >     >    > services (just generic secrets, not connection, variables).
> >     > Simply treat
> >     >
> >     >    > each of the backends above as another "provider" and create a
> >     > Hook to
> >     >
> >     >    > access the service. Such Hook could have just one method:
> >     >
> >     >    >
> >     >
> >     >    > def get_secret(self, path_prefix: str, secret_id: str) ->
> > Optional[str]
> >     >
> >     >    >
> >     >
> >     >    > It would use a connection defined (as usual) in ENV variables
> > or database
> >     >
> >     >    > of Airflow to authenticate with the secret service and
> retrieve
> > the
> >     >
> >     >    > secrets.
> >     >
> >     >    >
> >     >
> >     >    > The good thing with it is that you could have easily multiple
> > secret
> >     >
> >     >    > backends configured to retrieve secrets for specific "service"
> > (so
> >     >
> >     >    > that you
> >     >
> >     >    > could keep "generic airflow's secerts" in one backend but
> still
> > have
> >     >
> >     >    > possibility of custom operators to use other backends (with
> > different
> >     >
> >     >    > authentication,  scopes etc.). And it is not touching any of
> the
> >     >
> >     >    > "core" of
> >     >
> >     >    > Airflow. It's just a set of hooks with corresponding
> connections
> >     > that work
> >     >
> >     >    > the same way as accessing any other provider in Airflow. No
> core
> >     > of Airflow
> >     >
> >     >    > will be touched with this change.
> >     >
> >     >    >
> >     >
> >     >    > *Pros/Cons*
> >     >
> >     >    >
> >     >
> >     >    > *Con:*
> >     >
> >     >    >
> >     >
> >     >    > I do realise it is a bit of duplication in functionality. We
> > already
> >     >
> >     >    > have a
> >     >
> >     >    > way to connect to a secret backend via airflow configuration
> and
> >     > we should
> >     >
> >     >    > likely promote it rather than introduce additional mechanism.
> >     >
> >     >    >
> >     >
> >     >    > *Pros:*
> >     >
> >     >    >
> >     >
> >     >    > * Most of all -> it adds flexibility of accessing several
> > secret backends
> >     >
> >     >    > for different use-cases. I looked at it so far in the way
> those
> >     > hooks are
> >     >
> >     >    > merely another set of "provider hooks". For me this is nothing
> > different
> >     >
> >     >    > than "providers" for any other services we have.  fFr example
> > "cloudant"
> >     >
> >     >    > provider has only "CloudantHook" that other custom operators
> > can use.
> >     >
> >     >    > And I
> >     >
> >     >    > well imagine this might be actually even more convenient to
> > configure
> >     >
> >     >    > connections in the DB and access secrets this way rather than
> >     > having to
> >     >
> >     >    > configure Secret Backends in Airflow configuration.
> >     >
> >     >    >
> >     >
> >     >    > * The dupication there it is very, very limited (basically a
> > method
> >     >
> >     >    > call to
> >     >
> >     >    > secret backend).
> >     >
> >     >    >
> >     >
> >     >    > * Another benefit of it is that it would allow people still
> > stuck
> >     > on pre
> >     >
> >     >    > 1.10.10 to  write custom operators that would like to use
> > secret backends
> >     >
> >     >    > (via backport operators). And still continue doing it in the
> > future
> >     >
> >     >    > (possibly migrating to 2.0/1.10.10+ in cases when there is one
> > secret
> >     >
> >     >    > backed only - but continue ot use connections/hooks where some
> > specific
> >     >
> >     >    > secrets shoudl be kept in different secret backend.
> >     >
> >     >    >
> >     >
> >     >    > I would like to hear your opinion on that.
> >     >
> >     >    >
> >     >
> >     >    > J.
> >     >
> >     >    >
> >     >
> >     >    > --
> >     >
> >     >    >
> >     >
> >     >    > Jarek Potiuk
> >     >
> >     >    > Polidea
> >     > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=
> >     > > | Principal Software Engineer
> >     >
> >     >    >
> >     >
> >     >    > M: +48 660 796 129 <+48660796129>
> >     >
> >     >    > [image: Polidea]
> >     > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=
> > >
> >     >
> >     >    >
> >     >
> >
> >
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>
