Jarek,

We are already using the secrets backend for Airflow variables. However, because
of the example I described, and because we need to update our GCP Airflow
connections programmatically every day, we still have to maintain a secondary,
custom method for Vault authentication and for manipulating other secrets.

Cheers,
 
Nathan

On 18/05/2020, 20:07, "Jarek Potiuk" <jarek.pot...@polidea.com> wrote:

    Thanks Nathan,

    I think your case is a really good example of where the Hook might be really
    useful (and apparently somebody has already done it via Hooks).

    I wonder, Nathan: if you switch to the secret backend in the future, would
    you use the same secret backend for Airflow connections/variables? Or do you
    foresee that you will have another backend/credentials to access it?

    Maybe others had similar experiences - and would like to share it here?

    I still think there is a valid point in having separate hooks. Those are my
    points:

    1) It seems that the use pattern is close to what I described: a separate
    secret backend that contains more "dynamic" secrets. And I think still being
    able to use different connections is a nice way of accessing multiple backend
    credentials within Airflow core. I think there was a good reason why only
    one backend is considered for "core", and it is really ill-suited to support
    multiple credential backends. I can hardly imagine reading connections or
    variables from multiple secret backends. How would you choose which backend
    to use for different variables? Fallback mechanisms? I think it's hardly
    useful. Hooks, on the other hand, have (via connections) a built-in way to
    choose different backends, and their use pattern for custom operators is
    the really standard "Airflow" way.

    2) The Python operator is not the best idea, because you need to provide
    credentials to access the secret backend. It can be done, of course, via
    environment variables, but using a connection from Airflow has the additional
    advantage of being encrypted at rest in the database. And with Hooks being
    the common denominator for accessing external services (the secret backend
    being one of them), the hook can hide all the authorisation and communication
    details from the operators using it (this is basically what a hook is for).

    3) I think I have a good parallel here. I would compare my proposal to
    the current way we use Postgres and MySQL hooks vs. using SQLAlchemy for
    Airflow itself. While Airflow uses Postgres and MySQL to provide its
    internal database, it also has the "postgres" and "MySQL" providers that
    provide hooks that access the database in a "generic" way (and those hooks
    are used by a number of operators). We can still choose various databases
    to connect to via hooks - even if "Airflow core" uses that single database.
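
To make the parallel in point 3 a bit more concrete, here is a minimal sketch
of how connection-selected secret hooks might look. It assumes the
single-method interface from the original proposal quoted further down
(get_secret(path_prefix, secret_id)). The class names, the connection ids and
the in-memory store are hypothetical stand-ins for illustration only; they are
not part of Airflow or of any provider.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class SecretHook(ABC):
    """Hypothetical base class for a secret hook (not an Airflow class)."""

    def __init__(self, conn_id: str) -> None:
        # In Airflow the connection id would select credentials stored in
        # the metadata database or in ENV variables; here it is only a label.
        self.conn_id = conn_id

    @abstractmethod
    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        """Return the secret value, or None when it does not exist."""


class InMemorySecretHook(SecretHook):
    """Stand-in for a Vault or Secret Manager hook, backed by a dict."""

    def __init__(self, conn_id: str, store: Dict[str, str]) -> None:
        super().__init__(conn_id)
        self._store = store

    def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]:
        # Assumption: secrets are addressed as "<path_prefix>/<secret_id>".
        return self._store.get(f"{path_prefix}/{secret_id}")


# Two independent backends, each reached through its own connection id.
core_hook = InMemorySecretHook("vault_core", {"airflow/variables/env": "prod"})
looker_hook = InMemorySecretHook(
    "vault_looker", {"gcp/service-accounts/looker": "sa-key"}
)

print(core_hook.get_secret("airflow/variables", "env"))
print(looker_hook.get_secret("gcp/service-accounts", "looker"))
```

A custom operator would pick the hook by connection id, the same way it picks
a Postgres or MySQL hook today, while Airflow core keeps its single configured
backend.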

    J.

    On Mon, May 18, 2020 at 5:57 PM Nathan Hadfield <nathan.hadfi...@king.com>
    wrote:

    > Yep, I understand.  I wasn't necessarily advocating for a Vault hook; just
    > wanted to give some real world colour to the conversation and what we did
    > to solve our needs prior to the secrets backend.
    >
    > I'm sure that extending the class would also enable the same
    > functionality.
    >
    > Cheers,
    >
    > Nathan
    >
    > On 18/05/2020, 16:46, "Ash Berlin-Taylor" <a...@apache.org> wrote:
    >
    >     Accessing things that aren't connections or variables is, essentially
    >     creating a third class of thing that Secrets store.
    >
    >     But that is a separate issue to what Jarek is proposing, which is
    >     Hooks.
    >
    >     For your use case a Python operator sounds like the best fit. A hook
    >     is going to have to target the lowest common denominator, which means
    >     vault-specific things are just a needless layer over the top.
    >
    >     Extending the existing Secrets Backend interface to support that is
    >     doable, but I don't see the need for a Hook. Not everything needs to
    >     be a hook :)
    >
    >     -ash
    >
    >
    >     On May 18 2020, at 4:41 pm, Nathan Hadfield <nathan.hadfi...@king.com>
    > wrote:
    >
    >     > Hey,
    >     >
    >     >
    >     >
    >     > My quick two cents are that it would be good to access secrets that
    >     > are not explicitly either connections or variables.
    >     >
    >     >
    >     >
    >     > We have a need for DAGs that feature more complex interactions with
    >     > Vault - which typically end up being custom operators - that I think
    >     > would be helped by more generic capabilities.
    >     >
    >     >
    >     >
    >     > For example, we have an automated system that regularly rotates GCP
    >     > service accounts across the whole company and stores them in Vault.
    >     > We then have to ensure that our different Looker environments always
    >     > have these SAs before the old ones expire every 48 hours.  To do
    >     > this, we wrote a Vault Hook and a Looker Hook and then combined them
    >     > in an operator which reads every SA from a specific Vault path and
    >     > then updates the connection inside Looker.
    >     >
    >     >
    >     >
    >     > I don’t know if this will influence your thinking in any way but I
    >     > just wanted to briefly share our experiences.  If anyone would like
    >     > to learn more then please reach out and I’d be happy to share more.
    >     >
    >     >
    >     >
    >     > Cheers,
    >     >
    >     > Nathan
    >     >
    >     >
    >     >
    >     > On 18/05/2020, 15:21, "Ash Berlin-Taylor" <a...@apache.org> wrote:
    >     >
    >     >
    >     >
    >     >
    >     >
    >     >    > The good thing with it is that you could easily have multiple secret
    >     >    > backends configured to retrieve secrets for a specific "service" (so
    >     >    > that you could keep "generic Airflow secrets" in one backend but still
    >     >    > have the possibility for custom operators to use other backends, with
    >     >    > different authentication, scopes etc.).
    >     >
    >     >
    >     >
    >     >    Having the ability to configure multiple secrets backends is
    >     >    independent of this feature. The original PR/AIP to add Secrets
    >     >    Backends decided to leave this ability out as it was more complex to
    >     >    configure. We could add that back in.
    >     >
    >     >
    >     >
    >     >    I still don't quite get from your example where you are proposing
    >     >    this would be used? Can you give a fuller example please? Do you have
    >     >    a concrete use case where you need this?
    >     >
    >     >
    >     >
    >     >    Not everything in Airflow needs to be a hook; just access the
    >     >    secrets backend directly. I'm not sure what wrapping an extra layer
    >     >    around these classes gives us?
    >     >
    >     >
    >     >
    >     >    Without a concrete example I can't see anything other than this
    >     >    adds a lot of complexity.
    >     >
    >     >
    >     >
    >     >    -ash
    >     >
    >     >
    >     >
    >     >
    >     >
    >     >    On May 18 2020, at 2:45 pm, Jarek Potiuk <jarek.pot...@polidea.com> wrote:
    >     >
    >     >
    >     >
    >     >    > Hello Everyone,
    >     >
    >     >    >
    >     >
    >     >    > TL;DR; I was just about to start work on a small set of Hooks
    >     >    > dedicated to retrieving secrets from the Secret Backend. I discussed
    >     >    > it with Ash and Kamil
    >     >    > <https://urldefense.proofpoint.com/v2/url?u=https-3A__apache-2Dairflow.slack.com_archives_C0145R4NPS5_p1589805908013700&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=NBBItsFcPZR-C26VepQEehBPNPEWUsxar_DatX5ulco&e=>
    >     >    > on Slack today. So far I thought I would treat them as usual
    >     >    > providers, but Ash raised some valid concerns, so I wanted to raise
    >     >    > the proposal before I start working on it.
    >     >
    >     >    >
    >     >
    >     >    > *Context:*
    >     >
    >     >    >
    >     >
    >     >    > Currently we have "Secret Backend" support built into 2.0 and
    >     >    > 1.10.10+. It includes retrieving variables and connections (via a
    >     >    > Secret Manager class) for:
    >     >
    >     >    >
    >     >
    >     >    >   -  Hashicorp Vault
    >     >    >   -  Secret Manager
    >     >    >   -  KMS
    >     >    >   -  AWS secret manager
    >     >
    >     >    >
    >     >
    >     >    > Those secret managers are configured in:
    >     >
    >     >    >
    >     >
    >     >    > [secrets]
    >     >    > backend=<SecretManagerClass>
    >     >    > backend_kwargs={}
    >     >
    >     >    >
    >     >
    >     >    > Those are available for use in a nice way (via Jinja templates and the
    >     >    > like), but they need support in the core of Airflow (so they require
    >     >    > 1.10.10+). This means that if you are on pre-1.10.10 you cannot use
    >     >    > those secrets. Currently you can only use one secret backend per whole
    >     >    > Airflow installation, so if your secrets are split between several
    >     >    > secret managers (or if secrets for a particular service require
    >     >    > different credentials), you cannot use the mechanism to access such
    >     >    > distributed secrets. It's not a common case, but I can very well
    >     >    > imagine that there are different sets of credentials to access
    >     >    > different secrets: some services might need different scopes/levels
    >     >    > of access.
    >     >
    >     >    >
    >     >
    >     >    > *Proposal*
    >     >
    >     >    >
    >     >
    >     >    > We have an idea that we might also want (on top of the above
    >     >    > SecretManager implementation) to define generic Hooks for accessing
    >     >    > secrets from those services (just generic secrets, not connections or
    >     >    > variables). Simply treat each of the backends above as another
    >     >    > "provider" and create a Hook to access the service. Such a Hook could
    >     >    > have just one method:
    >     >
    >     >    >
    >     >
    >     >    > def get_secret(self, path_prefix: str, secret_id: str) -> Optional[str]
    >     >
    >     >    >
    >     >
    >     >    > It would use a connection defined (as usual) in ENV variables or in the
    >     >    > Airflow database to authenticate with the secret service and retrieve
    >     >    > the secrets.
    >     >
    >     >    >
    >     >
    >     >    > The good thing with it is that you could easily have multiple secret
    >     >    > backends configured to retrieve secrets for a specific "service" (so
    >     >    > that you could keep "generic Airflow secrets" in one backend but still
    >     >    > have the possibility for custom operators to use other backends, with
    >     >    > different authentication, scopes etc.). And it does not touch any of
    >     >    > the "core" of Airflow. It's just a set of hooks with corresponding
    >     >    > connections that work the same way as accessing any other provider in
    >     >    > Airflow. No part of the Airflow core will be touched by this change.
    >     >
    >     >    >
    >     >
    >     >    > *Pros/Cons*
    >     >
    >     >    >
    >     >
    >     >    > *Con:*
    >     >
    >     >    >
    >     >
    >     >    > I do realise it is a bit of a duplication in functionality. We already
    >     >    > have a way to connect to a secret backend via Airflow configuration,
    >     >    > and we should likely promote it rather than introduce an additional
    >     >    > mechanism.
    >     >
    >     >    >
    >     >
    >     >    > *Pros:*
    >     >
    >     >    >
    >     >
    >     >    > * Most of all, it adds the flexibility of accessing several secret
    >     >    > backends for different use cases. So far I have looked at those hooks
    >     >    > as merely another set of "provider hooks". For me this is no different
    >     >    > from the "providers" for any other services we have. For example, the
    >     >    > "cloudant" provider has only "CloudantHook", which other custom
    >     >    > operators can use. And I can well imagine it might actually be even
    >     >    > more convenient to configure connections in the DB and access secrets
    >     >    > this way rather than having to configure Secret Backends in the
    >     >    > Airflow configuration.
    >     >
    >     >    >
    >     >
    >     >    > * The duplication there is very, very limited (basically a method call
    >     >    > to the secret backend).
    >     >
    >     >    >
    >     >
    >     >    > * Another benefit is that it would allow people still stuck on
    >     >    > pre-1.10.10 to write custom operators that use secret backends (via
    >     >    > backport operators), and to continue doing so in the future (possibly
    >     >    > migrating to 2.0/1.10.10+ in cases where there is only one secret
    >     >    > backend, but continuing to use connections/hooks where some specific
    >     >    > secrets should be kept in a different secret backend).
    >     >
    >     >    >
    >     >
    >     >    > I would like to hear your opinion on that.
    >     >
    >     >    >
    >     >
    >     >    > J.
    >     >
    >     >    >
    >     >
    >     >    > --
    >     >
    >     >    >
    >     >
    >     >    > Jarek Potiuk
    >     >
    >     >    > Polidea <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwICaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=AIc65Hls2sR87-APqi_0oh4N0F-NOUCC2ulfyS04GGU&s=r-sJroKu0X4XYmnboaHwbpbEIgk5TLwTErkDtwEgvog&e=> | Principal Software Engineer
    >     >
    >     >    >
    >     >
    >     >    > M: +48 660 796 129 <+48660796129>
    >     >
    >     >
    >     >    >
    >     >
    >
    >

    -- 

    Jarek Potiuk
    Polidea <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.polidea.com_&d=DwIFaQ&c=-0jfte1J3SKEE6FyZmTngg&r=cgex0jmJ1tJ3A5nVgQ7Pjo7sdo3NkXzIHPolJPlCwBw&m=2Np0DHPYBn3aIynz3Rjb_Chh91zIO8nPv_zlsiom6cU&s=bTrXyNYkUkvEsg8UsK8c5R5LlmeljEtcrp9EqmxS-hM&e=> | Principal Software Engineer

    M: +48 660 796 129 <+48660796129>
