Hello here.

I would like to propose something.

1) I really like the idea of conditionally disabling parsing - and I
would really love for that idea to be hashed out and discussed more
(especially by those who came up with it, if they have a bit more time
than me - I seem to be involved in more things recently :D). I would
really like this to happen, but I am afraid it might take more time to
discuss some of the consequences, and it would necessarily require our
users to adopt and maintain some kind of exclusion mechanism (and we
know adoption will take a lot of time).

2) On the other hand - we have a very small, very localized and
already tested change that can help a number of users: a change that
only involves keeping a shared in-memory cache in the DagFileProcessor
and that does not require any changes from the users (maybe except a
configuration parameter change to enable it). It does not really
change operational complexity, nor does it bear any risks (especially
if it is disabled by default).
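
To make it a bit more concrete, here is a rough sketch of the kind of
wrapper I have in mind - purely illustrative, the class and parameter
names are made up and the actual implementation in the PR may differ:

    import time

    class CachingSecretsWrapper:
        """Illustrative only: a process-local cache in front of whatever
        secrets backend is configured (assumed to expose get_variable)."""

        def __init__(self, backend, ttl_seconds=900):
            self.backend = backend
            self.ttl_seconds = ttl_seconds
            self._cache = {}  # key -> (expiry, value)

        def get_variable(self, key):
            entry = self._cache.get(key)
            if entry and time.monotonic() < entry[0]:
                return entry[1]  # served from memory, no network call
            value = self.backend.get_variable(key)
            self._cache[key] = (time.monotonic() + self.ttl_seconds, value)
            return value

Since the cache would live only inside the DAG file processor process,
values get re-fetched after the TTL (or after a processor restart), so
staleness is bounded and nothing changes for task execution.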

Or at least this is how I see the options we have. What I would hate
most is for neither of the two to happen because nobody wants to spend
their time discussing, approving and implementing 1), while 2) gets
blocked because 1) is a better (though only ideated) solution.

My proposal would be to get it in for the DAG file processor only,
disabled by default, with some extra documentation explaining the
consequences.
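
Enabling it could then be as simple as flipping a single configuration
option - the section and option name below are purely illustrative,
not a final proposal:

    [secrets]
    # hypothetical knob, off by default - users opt in explicitly
    use_cache = True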

Does doing so bear any risks for us that I am not aware of? Is it too
much to ask?

J.


On Mon, Mar 27, 2023 at 8:26 PM Vandon, Raphael
<[email protected]> wrote:
>
> My initial goal when working on this cache was mostly to shorten DAG parsing 
> times, simply because that's what I was looking at, so I'd be happy with 
> restricting this cache to dag parsing.
> I'm still relatively new to the Airflow codebase, so I don't know all the
> implications this change has, and I'm grateful for the comments here.
>
> The benefits can be quite noticeable, depending a lot on the context. If the 
> dag file is simple, then a network call is going to be slow in comparison.
> And if the parsing interval is short compared to the dag execution schedule, 
> then the number of calls to get secrets is going to be dominated by the dag 
> parsing rather than the executions.
>
> The scenario where this brings the most benefits is many simple dag files,
> all querying the same key from the Variables, parsed regularly, and run less
> often.
>
> @Hussein says that "the user can implement [their] own secret backend", but 
> it's not an easy task. They'd have to implement it as a wrapper around the 
> custom backend they want to use, since there can only be one custom secret 
> backend. And implementing an in-memory cache that works cross-process just as 
> a custom backend is straight up impossible.
>
> About secure caching: since I'm only caching in memory, I didn't do
> anything in that regard, but we already have something in place to encrypt
> secrets when they are saved in the metastore, using cryptography.fernet.
>
