Two points:

> I like Daniel's idea: implement this _in_ the provider itself (means you
> can't use in memory cache, but can use the DB or redis etc) then it doesn't
> even need user to upgrade to 2.6.

> Also it means that a problem with a single provider doesn't make all of core
> significantly more complex.
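(For context, a rough sketch of what such a provider-side cache could look
like - purely illustrative, not an existing API. It assumes the AWS
provider's SecretsManagerBackend and a Redis instance reachable by every
component; the class name, key prefix and TTL value here are made up. A
backend like this would be wired in the usual way via the [secrets]
backend/backend_kwargs settings:)

    import redis

    from airflow.providers.amazon.aws.secrets.secrets_manager import (
        SecretsManagerBackend,
    )


    class CachingSecretsManagerBackend(SecretsManagerBackend):
        """Hypothetical backend that memoizes variable lookups in Redis."""

        def __init__(self, cache_ttl: int = 300, **kwargs):
            super().__init__(**kwargs)
            # External cache, so it works across processes (an in-memory
            # dict would not survive the parsing subprocesses).
            self._cache = redis.Redis(host="redis", port=6379)
            self._ttl = cache_ttl

        def get_variable(self, key: str):
            cached = self._cache.get(f"airflow_var:{key}")
            if cached is not None:
                return cached.decode()
            value = super().get_variable(key)
            if value is not None:
                # Repeated DAG parses within the TTL window hit Redis
                # instead of Secrets Manager.
                self._cache.set(f"airflow_var:{key}", value, ex=self._ttl)
            return value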
I think this is not a good idea. It's not an AWS secrets provider problem -
it's the same story for all secret backends (including custom ones). People
complained a lot about Hashicorp Vault being heavily pressured by Airflow,
for example. The change would help there too - it's a general optimisation
of our secret backend architecture in this context, not a single-provider
issue (at all).

> And it doesn't tie us in to the existing multi-processing model. But to
> connect this discussion to other discussions going on right now: Why is
> parsing even a special process: I can easily see a change where "parse this
> file please" becomes a Workload, and all workloads run on the workers/runners.

I think this is a separate discussion, and this change **could** be
implemented now (2.6), while the discussion about workloads and how we
implement future logic is pretty independent - maybe 2.7, maybe 2.8, who
knows. This is really a case of a tactical solution improving our user
experience with very limited effort (which IMHO does not "heavily
complicate" core, BTW - it's very small and localized, and even if we change
the multiprocessing model we could easily adapt it) vs. a strategic
long-term improvement in the architecture. As I said - I am cool with either
leaving it as is or implementing this one in the current DFP (with a
preference for the latter). And I think we should consider it seriously
without dismissing it on the grounds of "we have bigger changes that might
or might not come", because we have a chance to improve the experience of
our users now.

J.

On Fri, Mar 24, 2023 at 11:28 AM Ash Berlin-Taylor <[email protected]> wrote:
>
> I like Daniel's idea: implement this _in_ the provider itself (means you
> can't use in memory cache, but can use the DB or redis etc) then it doesn't
> even need user to upgrade to 2.6.
>
> Also it means that a problem with a single provider doesn't make all of core
> significantly more complex.
>
> And it doesn't tie us in to the existing multi-processing model. But to
> connect this discussion to other discussions going on right now: Why is
> parsing even a special process: I can easily see a change where "parse this
> file please" becomes a Workload, and all workloads run on the workers/runners.
>
> -ash
>
> On Mar 24 2023, at 9:48 am, Elad Kalif <[email protected]> wrote:
> > Jarek, I can argue that by doing that the problem is not gone but just
> > reduced to a level where it's even harder to spot and address by the
> > users. This is the worst kind of problem to tackle!
> >
> > From my perspective, if we don't address the issue directly (a general
> > solution to top-level code issues), any attempt to minimize it must come
> > with a clear way of letting users know what is going on.
> >
> > On Fri, Mar 24, 2023, 12:37, Jarek Potiuk <[email protected]> wrote:
> > > TL;DR: I am in favour of the change (limiting it to the DagFileProcessor
> > > only) and turning the use of Variables at the top level into a good
> > > practice (2.6+).
> > >
> > > Here is why.
> > >
> > > I partially agree with Elad/Ash, but I am not 0/1 on that subject any
> > > more. I think there is a more nuanced approach that we could take
> > > here. I had a very similar view yesterday, but I slept on it and
> > > changed my mind - to an extent.
> > >
> > > Putting aside (for a moment - I will come back to it further on) the
> > > fact that this one is addressing "bad practices" of users, I want to
> > > focus on technicalities and risks first.
> > >
> > > I initially contested the idea after seeing the PR and raised my doubts
> > > (actually, the discussion here is because I asked Raphael to bring it
> > > here): https://github.com/apache/airflow/pull/30259#issuecomment-148161094
> > > The reasons I raised were based on earlier discussions and attempts to
> > > do similar things (including by myself) when I hit some problems
> > > related to multiprocessing and especially Celery. After looking at the
> > > code and hearing from Raphael, I think the solution he came up with
> > > **might** work in a stable and good way for the DagFileProcessor,
> > > whereas it is risky to implement for Celery (because of the billiard <>
> > > multiprocessing weird relation - Celery has its own multiprocessing
> > > drop-in replacement, billiard, that at times clashes with
> > > multiprocessing). The risk that this caching approach will cause
> > > problems when enabled for Celery is high, and while I see how it can
> > > work for our DagFileProcessor, it would be a strong "no" for Celery.
> > > And as Daniel rightfully noticed, with the K8S executor it is not
> > > effective at all.
> > >
> > > However, I think (again, putting aside the bad practices angle) that if
> > > we only do that for the DagFileProcessor (I think caching in workers is
> > > generally out of the question), from the technical point of view it has
> > > only benefits:
> > >
> > > * The amount of traffic it generates is WAY smaller in general. There
> > > is a nice angle to it in that it can decrease the amount of traffic
> > > generated even across multiple DAGs - imagine 100 DAGs using the same
> > > variable. With a single Scheduler/DagFileProcessor that might be a
> > > really nice resource saving (not only Secrets but also DB traffic, and
> > > in some cases even the number of connections to establish to the DB).
> > > * This will also - interestingly - help in the case of AIP-44 and the
> > > DB-less DAG file processor (in terms of a lower number of requests to
> > > the internal API).
> > > * In the case of the DagFileProcessor, my initial worry about TTL
> > > (and sometimes seeing an outdated version of the variable) is far, far
> > > smaller than in the case of workers. Unlike for tasks/workers, DAGs
> > > are parsed at essentially random intervals and there is never a
> > > "dependency" between DAGs, so whether the variable is fresh or a bit
> > > stale does not matter (the same DAG could have been parsed now or 5
> > > minutes ago, we have essentially no control over it and we cannot rely
> > > on it).
> > >
> > > So - technically speaking, to summarize - as long as we test it
> > > thoroughly, also at scale, I see no problem with getting it in, as
> > > long as we limit it to the DagFileProcessor only.
> > >
> > > Now - coming back to the question "should we make life easier for our
> > > users who violate best practices" (essentially helping them to violate
> > > those).
> > >
> > > I think in general we should not make life easier for users who
> > > violate best practices. And I agree with Elad's points. And as a side
> > > comment, I love Ash's idea about not re-parsing - this should be a
> > > discussion started separately and I would be 100% supportive of that
> > > one. It's the first time I hear it from Ash :) and I love it - there
> > > are other things to consider there, like when to trigger reparsing
> > > etc.
> > >
> > > However, I think with this change we have the opportunity of doing
> > > something different - i.e. change our best practices.
> > >
> > > Why not change our best practices to (essentially):
> > >
> > > "Do not use long-running things like API calls at the top level. You
> > > can use Airflow Variables though - they are implemented in a way that
> > > works around the limitations, at the expense of the propagation time
> > > it takes when you change the variable" (or something like that).
> > >
> > > Philosophical rant here (sorry for those who do not like such analogies):
> > >
> > > The nice thing is that many of our users (who currently violate the
> > > best practices) will suddenly, with **just** upgrading Airflow, move
> > > from the "violating best practices" camp to the "following best
> > > practices" camp. I think that could be a major win for everyone. Yes,
> > > we can preach the best practices, but seeing how often they are
> > > violated (for good reason - the alternatives are not as "easy" to
> > > use), another good approach is to embrace the behaviours and let them
> > > become good practices. That worked for major religions, for example
> > > (we are close to it - think of Easter being absorbed from the Pagan
> > > spring equinox celebrations originally). It's a very effective
> > > strategy to bring in new followers and strengthen the bond with those
> > > who struggle with their practices being touted as "bad". And all this
> > > without making a single change to their DAG files/structure (which
> > > would be necessary if we followed the "no-reparsing" idea by Ash).
> > >
> > > It would also make migration to 2.6 way more attractive for many users
> > > (if it becomes part of 2.6). Think of our message to our users: "yes,
> > > you pay a lot for Secret access, but by migrating to 2.6 your bills
> > > will drop dramatically" - with just upgrading and no changes to your
> > > DAGs. In the current economic climate, that alone might convince some
> > > decision makers to invest in the migration to 2.6.
> > >
> > > This is not a super-strong opinion - I would also be OK if we leave
> > > things as they are - but I think there are no technical reasons why
> > > this would not work. Those are more "communication" issues: how we
> > > communicate it to the users, how to be VERY CLEAR that "using
> > > variables at the top level" is only good for 2.6+ and that it is a
> > > very bad idea for pre-2.6.
> > >
> > > I think (despite many years of arguing and advocating - including my
> > > own statements/words/rants - differently) we should keep an open mind
> > > to that option.
> > >
> > > Thoughts?
> > >
> > > J.
> > >
> > > On Fri, Mar 24, 2023 at 2:42 AM Daniel Standish
> > > <[email protected]> wrote:
> > > >
> > > > It would not help with the Kubernetes executor. Did you try with the
> > > > local executor?
> > > >
> > > > Another option to consider is to implement this specifically on the
> > > > AWS Secrets Manager secrets backend itself.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
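(To make the "Variables at the top level" pattern under discussion concrete,
here is a minimal, illustrative DAG - the DAG id, variable key and default
value are made up. The only point is that Variable.get() runs at parse time,
i.e. on every DagFileProcessor loop, which is exactly the call the proposed
cache would absorb:)

    import pendulum

    from airflow.decorators import dag, task
    from airflow.models import Variable

    # Resolved every time the DagFileProcessor parses this file; with the
    # proposed cache this would hit the secrets backend at most once per
    # TTL window instead of on every parse loop.
    ENVIRONMENT = Variable.get("deployment_environment", default_var="dev")


    @dag(schedule=None, start_date=pendulum.datetime(2023, 3, 1), catchup=False)
    def example_top_level_variable():
        @task
        def report():
            print(f"Parsed for environment: {ENVIRONMENT}")

        report()


    example_top_level_variable()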
