Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

Kaxil Naik Thu, 24 Oct 2019 07:48:54 -0700

I agree with what Jarek is saying. And if just for testing a user wants to
see what his template field would look like they can used airflow render
from the cli


On Thu, Oct 24, 2019 at 3:19 PM Jarek Potiuk <[email protected]>
wrote:

> Great work!
>
> I also think it's a pragmatic and reasonable approach to add it to 1.10 and
> work on improved version for 2.0 (and have it as the only option).
> This way we do not really add unnecessary complexity - we will just improve
> the serialized implementation in our 2.0 effort, we close all the gaps we
> know now and get rid of SimpleDag.
>
> Storing rendered template seems to be a good idea.
>
> But one question here - maybe I do not understand it - why do we even
> attempt to render the templates in UI in the tasks that have not run yet?
> Does it really make sense?
> We do not know if what we have in the UI will have the actual values that
> will be used during the run. Rendering it in the UI when they have not been
> rendered yet might give different values than those actually used during
> task run.
>
> For me It would make perfect sense to keep the unrendered template before
> the task runs (and mark it as such) and rendered one (from the DB as
> suggested by Ash) after it runs.
>
> Or am I missing something?
>
> J.
>
> On Thu, Oct 24, 2019 at 4:01 PM Driesprong, Fokko <[email protected]>
> wrote:
>
> > Thank you,
> >
> > I fine with adding it to 1.10 as the current PR where is it an option,
> and
> > for now, the current behavior would be the old one, until smooth out all
> > the issues.
> >
> > For 2.0 I'd love to see it as the only option so we can remove the logic
> > and code of the SimpleDag to keep the codebase nice and clean.
> >
> > I like your thinking out loud Ash, and I think that should be a first
> > approach to storing the rendered templates.
> >
> > Cheers, Fokko
> >
> > Op do 24 okt. 2019 om 12:05 schreef Ash Berlin-Taylor <[email protected]>:
> >
> > > Speaking with my Astronomer hat on, we are going to include this in our
> > > next 1.10.x image as this improves the experience when deploying
> changes.
> > > We could not backport it to the v-1-10-test for inclusion in next
> > release,
> > > but the work to backport it is already done.
> > >
> > > Yes, making this the default (or only?) option in 2.0 sounds good to
> me.
> > >
> > > And to re-iterate, we aren't done on this bit of work (even if the AIP
> > > might be done) when this PR is merged and we will carry on working on
> the
> > > next stage of this work.
> > >
> > > Next stages of work:
> > > - Work out best way off addressing
> > > https://github.com/apache/airflow/pull/5743/files#r335284375 <
> > > https://github.com/apache/airflow/pull/5743/files#r335284375>
> (Operator
> > > links that expect an Operator instance to work.)
> > > - Porting scheduler to use the serialised version instead of Simple*
> > > (overlaps with my performance testing+improvement work)
> > > - Tackling the Rendered templates (pre baking? something else?)
> > >
> > >
> > > Thinking about the rendering: the only reason we can't render the
> strings
> > > on the webserver is that there _might_ be local user defined macros or
> > > filters for the dag. But in the case where there isn't (most of them)
> we
> > > could possibly still render the template in the webserver. For tasks
> that
> > > have already run (those that have a TaskInstance row) maybe we could
> > store
> > > the rendered version as executed in there? Just thinking out loud here.
> > >
> > > -ash
> > >
> > >
> > > > On 24 Oct 2019, at 10:56, Driesprong, Fokko <[email protected]>
> > > wrote:
> > > >
> > > > Awesome work all,
> > > >
> > > > I would be more in the opposite direction. Maybe we should not
> backport
> > > > such a huge change to 1.10 and work on it for 2.0. Also, I'd like to
> > see
> > > it
> > > > as being default in 2.0, having it as an option will introduce
> another
> > > > permutation in the configuration and this will make the whole thing
> > more
> > > > complex to maintain, and also te debug.
> > > >
> > > > I would personally not be worried too much with the pre-rendering. As
> > > long
> > > > as wel make sure that at execution time, the thing gets rendered
> > > (including
> > > > xcom at that time), and then save it again for reference. So you can
> > > always
> > > > trace what's being executed, but before execution, it might be
> > incorrect
> > > > (because the xcom etc is not properly being populated).
> > > >
> > > > Cheers, Fokko
> > > >
> > > >
> > > > Op do 24 okt. 2019 om 11:51 schreef Kevin Yang <[email protected]>:
> > > >
> > > >> Just saw the latest email from Kaxil and want to clarify. I was
> > replying
> > > >> the previous posts and making general comments instead of trying to
> > > block
> > > >> the PR from being merged. I'm fine with merging it and improve
> > > >> incrementally. It is quite polished and already huge now, I can
> > totally
> > > >> relate to the rebase pain.
> > > >>
> > > >>
> > > >> Cheers,
> > > >> Kevin Y
> > > >>
> > > >> On Thu, Oct 24, 2019 at 2:40 AM Kevin Yang <[email protected]>
> wrote:
> > > >>
> > > >>> Oops, I guess I'm too slow reading the PRs to let me vote in time
> 😛
> > > >>> Congrats Zhou, Kaxil and Ash for getting it accepted 🎉
> > > >>>
> > > >>> I'm pretty excited about this long-anticipated change and thank you
> > > guys
> > > >>> for the great work. Though I do have a few comments.
> > > >>>
> > > >>> I agree with Dan's concern about maintaining two representations
> > would
> > > >>> potentially create a lot pain in the future. If the scheduler
> rework
> > > >>> mentioned in Ash's rough long term plan can be a more immediate
> > action,
> > > >>> doesn't matter if it is in the same PR, then I won't be so worried.
> > > >>>
> > > >>> With records in this new separate table, I agree the cost of
> > > >>> rollback/upgrade won't be too much of a headache. We have those
> > > >> approaches
> > > >>> Ash mentioned or can even create another table and let them coexist
> > > for a
> > > >>> while if the change is dramastic.
> > > >>>
> > > >>> The added DB load seems pretty negligible. But just to be extra
> safe,
> > > >>> we'll try to test it within our cluster, where DB load is at the
> > > boarder
> > > >>> line, and report back.
> > > >>>
> > > >>> About the template rendering, I think we should still try to
> > pre-render
> > > >> it
> > > >>> and persist it into the DB. Rendered is a very frequently accessed
> > > >> endpoint
> > > >>> and for large DAG files( we have DAG files taking multiple mins to
> > > parse)
> > > >>> it would take too long to load, block the webserver workers if we
> got
> > > too
> > > >>> many rendered endpoint requests and create quite some confusions
> > given
> > > >> that
> > > >>> we act differently on different endpoints. In AIP-17, we pre-render
> > the
> > > >>> template when creating the task instances where we have
> > execution_date.
> > > >> If
> > > >>> XCOM is the blocker, we can maybe pre-render everything else
> besides
> > > the
> > > >>> XCOM and render the XCOM part on the fly. Since people can access
> the
> > > >>> rendered endpoint before upstream tasks finished, we'll probly have
> > to
> > > >>> exclude XCOM in pre-rendering anyway. I would be very happy if this
> > can
> > > >> be
> > > >>> a high priority follow up of the AIP.
> > > >>>
> > > >>>
> > > >>> Cheers,
> > > >>> Kevin Y
> > > >>>
> > > >>> On Wed, Oct 23, 2019 at 3:39 PM Kaxil Naik <[email protected]>
> > > wrote:
> > > >>>
> > > >>>> This vote passed (although not unanimous) and I'll mark this AIP
> as
> > > >>>> accepted.
> > > >>>>
> > > >>>> *Result*:
> > > >>>> +1 votes: 7 (6 binding and 1 non-binding vote)
> > > >>>> -1 votes: 2 (2 binding and 0 non-binding votes)
> > > >>>>
> > > >>>> *+1 (binding)*:
> > > >>>> Kaxil Naik
> > > >>>> Ash-Berlin Taylor
> > > >>>> Jarek Potiuk
> > > >>>> Kamil Breguła
> > > >>>> Fokko Driesprong
> > > >>>> Sumit Maheshwari
> > > >>>>
> > > >>>> *+1 (non-binding):*
> > > >>>> Philippe Gagnon
> > > >>>>
> > > >>>> *-1 (binding)*:
> > > >>>> Dan Davydov
> > > >>>> Alex Guziel
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Oct 17, 2019 at 5:29 PM Dan Davydov <[email protected]
> >
> > > >> wrote:
> > > >>>>
> > > >>>>> Not sure I'm convinced, I think my core concern of maintaining
> two
> > > >>>>> representations and not having a thought-out future plan of how
> the
> > > >>>> schema
> > > >>>>> will evolve/how easy this migration will be still stands. It
> feels
> > > >> like
> > > >>>> we
> > > >>>>> are trying to chase some short-term wins here. But hey it's a
> > > >> democracy,
> > > >>>>> don't let my vote block you :)!
> > > >>>>>
> > > >>>>> On Thu, Oct 17, 2019 at 10:15 AM Kaxil Naik <[email protected]
> >
> > > >>>> wrote:
> > > >>>>>
> > > >>>>>> Hi Dan, I understand your concern. Your +1 and suggestions are
> > very
> > > >>>>>> important for us so let me try to explain in more details
> > > >>>>>> and see if I can convince you.
> > > >>>>>>
> > > >>>>>> Please check replies in-line
> > > >>>>>>
> > > >>>>>> I think this is the kind of core change we should make from the
> > > >>>> get-go, my
> > > >>>>>>> feeling is that we will be paying for this decision otherwise,
> > and
> > > >> the
> > > >>>>>>> cost
> > > >>>>>>> will surpass the immediate value. Ideally we would fully
> > understand
> > > >>>> the
> > > >>>>>>> long-term plan before deciding on this serialization format,
> but
> > I
> > > >>>> think
> > > >>>>>>> at
> > > >>>>>>> least having a single serialized representation for now would
> be
> > a
> > > >>>> decent
> > > >>>>>>> compromise for me.
> > > >>>>>>> This is still a blocker to me agreeing.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Trying to do all (Serialization for Webserver, Scheduler, etc)
> at
> > > >> once
> > > >>>>>> would be error prone as we would need to cover all cases as well
> > and
> > > >>>>>> it is very difficult to test and review such a large chunk of
> work
> > > >> even
> > > >>>>>> if we have multiple people reviewing it.
> > > >>>>>>
> > > >>>>>> Doing it iteratively would allow us to roll out serialisation
> > while
> > > >>>>>> keeping backwards compatibility. Keeping backwards compatibility
> > for
> > > >>>> now
> > > >>>>>> is very important atleast for Websever as the users would have
> to
> > > >> wait
> > > >>>>>> for months or at-worse 1 more year (plus the upgrade path to
> 1.10*
> > > to
> > > >>>> 2.0 is
> > > >>>>>> already looking quite huge).
> > > >>>>>>
> > > >>>>>> Airflow 2.0 is still going to take months and is already going
> to
> > > >> have
> > > >>>>>> number of breaking changes that is going to make
> > > >>>>>> updating to it cumbersome task:
> > > >>>>>>
> > > >>>>>> - No non-RBAC UI
> > > >>>>>> - No Py2
> > > >>>>>> - Import paths have (or in the process of being) changed from
> > > >> all/most
> > > >>>>>> contrib objects
> > > >>>>>> - CLI reorganization
> > > >>>>>> - and various others:
> > > >>>>>>
> > > >>>>
> > > >>
> > >
> https://github.com/apache/airflow/blob/master/UPDATING.md#airflow-master
> > > >>>>>>
> > > >>>>>> I am listing the above for 1 reason - "How long would the users
> > have
> > > >> to
> > > >>>>>> deal with Webserver scalability issue" before they upgrade to
> 2.0
> > !
> > > >>>>>>
> > > >>>>>> However, I agree with your argument that we should ideally
> > > understand
> > > >>>> the
> > > >>>>>> long-term plan.
> > > >>>>>>
> > > >>>>>> The current serialized representation of DAG looks like the
> > > >> following:
> > > >>>>>>
> > > >>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://github.com/apache/airflow/blob/a6f37cd0da0774dff6461d3fdcb94dee9d101fda/tests/serialization/test_dag_serialization.py#L40-L93
> > > >>>>>>
> > > >>>>>> As you can see most of the information required by SimpleTask is
> > > >>>> already
> > > >>>>>> present. Currently, SimpleTask is used by the Scheduler. The
> next
> > > >> phase
> > > >>>>>> of this AIP will be to use Serialization in the Scheduler and
> > > >> removing
> > > >>>>>> SimpleTask altogether.
> > > >>>>>>
> > > >>>>>> As you said that at least having a single serialized
> > representation,
> > > >>>> for
> > > >>>>>> now, would be a decent compromise for me, how about you suggest
> > > >>>>>> or let us know if you think we are missing anything in the
> > > >> serialized
> > > >>>>>> blob example above.
> > > >>>>>>
> > > >>>>>> ------------------------
> > > >>>>>>
> > > >>>>>> As long as all of the changes are in the camp of "easy to
> > > >> migrate/undo
> > > >>>> in
> > > >>>>>>> the future" then these are not blockers for me. I'm not 100%
> > > >> convinced
> > > >>>>>>> that
> > > >>>>>>> they are, but you have more context so I'll defer to your
> > judgment.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Currently, as the Serialization applies only to the webserver,
> if
> > > >>>>>> something goes wrong, changing `store_serialized_dags=False`
> would
> > > >>>> restore
> > > >>>>>> no-serialization behavior.
> > > >>>>>>
> > > >>>>>> We are not touching Scheduler for 1.10* as it is an actual Heart
> > of
> > > >>>>>> Airflow and wouldn't feel comfortable in changing that core in a
> > > >>>>>> minor/patch fix.
> > > >>>>>>
> > > >>>>>> ------------------------
> > > >>>>>>
> > > >>>>>> I'm happy with JSON, my main point about "what if we get it
> wrong"
> > > is
> > > >>>> that
> > > >>>>>>> I feel like more planning work could have been put into this
> > > project
> > > >>>> vs
> > > >>>>>>> just a couple of rough high-level target "milestones". This
> > > >>>> significantly
> > > >>>>>>> increases the risk of the project long term in my eyes.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> Getting all the things right all at once would be harder than
> > taking
> > > >> it
> > > >>>>>> piece by piece.
> > > >>>>>> We have discovered many things in Webserver that we initially
> did
> > > not
> > > >>>>>> take into account like "Operator Links".
> > > >>>>>> Even Scheduler would have something similar which we won't know
> > now
> > > >> but
> > > >>>>>> find out later which would then require re-working and going
> > > >>>> back-and-forth.
> > > >>>>>>
> > > >>>>>> ------------------------
> > > >>>>>>
> > > >>>>>>> I think as long as we combine the serialized DAG
> representations
> > > and
> > > >>>> have
> > > >>>>>>> some confidence that the data format will be easy to switch to
> > > >> future
> > > >>>>>>> potential formats we would want (kind of hard to do without a
> > more
> > > >>>>>>> detailed
> > > >>>>>>> future plan...), then I would happily change my vote.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> For storing versioned dags, this is just a rough idea and it can
> > > >>>> change:
> > > >>>>>>
> > > >>>>>> We have a last_updated_time column in the Serialized Dag table.
> > > >>>> Currently
> > > >>>>>> we just update that dag row when we process the Dag again,
> > > >>>>>> instead, we would just store it as a different row if the DAG
> has
> > > >>>> changed
> > > >>>>>> and if it already had a DagRun with previous version i.e
> > > >>>>>> we don't need to store it every time it changes if it hasn't run
> > in
> > > >> the
> > > >>>>>> meantime.
> > > >>>>>>
> > > >>>>>> We can assign a unique number (version) to each row for a DAG.
> > > >>>>>> Sorting by `last_updated_time` column who gives us how DAG
> updated
> > > >> over
> > > >>>>>> time.
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>> Kaxil
> > > >>>>>>
> > > >>>>>> On Wed, Oct 16, 2019 at 4:43 PM Dan Davydov
> > > >>>> <[email protected]>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Responses inline.
> > > >>>>>>>
> > > >>>>>>> On Wed, Oct 16, 2019 at 6:07 AM Ash Berlin-Taylor <
> > [email protected]>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks for the feedback and discussion everyone - it's nice to
> > > >> know
> > > >>>>>>> people
> > > >>>>>>>> feel strongly about this. Let's make sure we build this right!
> > > >>>>>>>>
> > > >>>>>>>> As a reminder: the main goals of this AIP:
> > > >>>>>>>>
> > > >>>>>>>> - To speed up bootup/worker recycle time of the webserver
> > > >>>>>>>>
> > > >>>>>>>>  If you have a large number of DAGs or highly dynamic DAGs it
> is
> > > >>>>>>> possible
> > > >>>>>>>> that starting your webserver needs a Gunicorn worker timeout
> of
> > 5
> > > >>>> or 10
> > > >>>>>>>> minutes. (It is worse at start up because each of the gunicorn
> > > >>>> workers
> > > >>>>>>>> parses the dag itself, in exactly the same order, at the same
> > time
> > > >>>> as
> > > >>>>>>> the
> > > >>>>>>>> other workers, leading to disk contention, and CPU burst.)
> > > >>>>>>>>
> > > >>>>>>>>  And just yesterday I helped two people diagnose a problem
> that
> > > >> was
> > > >>>>>>>> caused by slow NFS: > time airflow list_dags` took 3:35mins to
> > go
> > > >>>> over
> > > >>>>>>> 43
> > > >>>>>>>> dags.
> > > >>>>>>>>
> > > >>>>>>>>  At Astronomer, and Google (Cloud Composer) we've seen this
> slow
> > > >>>>>>> loading
> > > >>>>>>>> many times for our customer's installs.
> > > >>>>>>>>
> > > >>>>>>>>  And not to mention this makes the webserver snappier
> > > >>>>>>>>
> > > >>>>>>>> - To make the webserver almost "stateless" (not the best term
> > for
> > > >>>> it:
> > > >>>>>>>> Diskless perhaps? Dagbag-less?).
> > > >>>>>>>>
> > > >>>>>>>>  This makes actions on the webserver quicker. We had a number
> of
> > > >>>> PRs
> > > >>>>>>> in
> > > >>>>>>>> the 1.10 series that made views and API endpoints in the
> > webserver
> > > >>>>>>> quicker
> > > >>>>>>>> By making them only load one dag from disk. This takes it
> > further
> > > >>>> and
> > > >>>>>>> means
> > > >>>>>>>> that (as of the PR) only the Rendered tab needs to go to the
> > code
> > > >> on
> > > >>>>>>> disk.
> > > >>>>>>>> As a reminder the "trigger_dag" endpoint would previously load
> > all
> > > >>>>>>> dags.
> > > >>>>>>>> Other's still do. Multiple people on have complained about
> this
> > > >>>> Slack.
> > > >>>>>>>>
> > > >>>>>>>>  (The only remaining view that has to hit the code on disk is
> > the
> > > >>>>>>>> Rendered template tab, as right now a DAG can define custom
> > macros
> > > >>>> or
> > > >>>>>>>> filters, and we don't want to have to deal with serializing
> > code.
> > > >>>> The
> > > >>>>>>> only
> > > >>>>>>>> "real" way of doing it is CloudPickle. No thank you)
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> To address the points .
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On 15 Oct 2019, at 21:14, Driesprong, Fokko
> > <[email protected]
> > > >>>
> > > >>>>>>> wrote
> > > >>>>>>>>> - Are we going to extend the existing data model, to allow
> the
> > > >>>> RDBMS
> > > >>>>>>> to
> > > >>>>>>>>> optimize queries on fields that we use a lot?
> > > >>>>>>>>
> > > >>>>>>>> Yes, as needed. Not really needed for this PR yet (it doesn't
> > > >> change
> > > >>>>>>> any
> > > >>>>>>>> query patterns in the webserver, so they are using the
> existing
> > > >>>> tables)
> > > >>>>>>>>
> > > >>>>>>>>> - How are we going to do state evolution when we extend the
> > JSON
> > > >>>>>>> model
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> The top level object we're storing has a __version field (for
> > > >>>> example
> > > >>>>>>>> `{"__version": 1, "dag": { ... } }`) so we can detect older
> > > >> versions
> > > >>>>>>> and
> > > >>>>>>>> either run a db "migration" on install, or upgrade at load
> time.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On 15 Oct 2019, at 20:04, Dan Davydov
> > > >> <[email protected]
> > > >>>>>
> > > >>>>>>>> wrote:
> > > >>>>>>>>> Having both a SimpleDagBag representation and the JSON
> > > >>>> representation
> > > >>>>>>>>> doesn't make sense to me at the moment: *"**Quoting from
> > Airflow
> > > >>>>>>> code, it
> > > >>>>>>>>> is “a simplified representation of a DAG that contains all
> > > >>>> attributes
> > > >>>>>>>>> required for instantiating and scheduling its associated
> > > >> tasks.”.
> > > >>>> It
> > > >>>>>>> does
> > > >>>>>>>>> not contain enough information required by the webserver.".
> > *Why
> > > >>>> not
> > > >>>>>>>> create
> > > >>>>>>>>> a representation that can be used by both? This is going to
> be
> > a
> > > >>>> big
> > > >>>>>>>>> headache to both understand and work with in the codebase
> since
> > > >> it
> > > >>>>>>> will
> > > >>>>>>>> be
> > > >>>>>>>>> another definition that we need to keep in sync.
> > > >>>>>>>>
> > > >>>>>>>> Honestly: because that makes the change bigger, harder to
> review
> > > >> and
> > > >>>>>>>> harder to back port. If this is a serious blocker to you
> > agreeing
> > > >> we
> > > >>>>>>> can
> > > >>>>>>>> update it to use the same structure as part of this PR, but I
> > > >> would
> > > >>>>>>> rather
> > > >>>>>>>> it was a separate change.
> > > >>>>>>>>
> > > >>>>>>> I think this is the kind of core change we should make from the
> > > >>>> get-go,
> > > >>>>>>> my
> > > >>>>>>> feeling is that we will be paying for this decision otherwise,
> > and
> > > >> the
> > > >>>>>>> cost
> > > >>>>>>> will surpass the immediate value. Ideally we would fully
> > understand
> > > >>>> the
> > > >>>>>>> long-term plan before deciding on this serialization format,
> but
> > I
> > > >>>> think
> > > >>>>>>> at
> > > >>>>>>> least having a single serialized representation for now would
> be
> > a
> > > >>>> decent
> > > >>>>>>> compromise for me.
> > > >>>>>>>
> > > >>>>>>> This is still a blocker to me agreeing.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> Not sure if fileloc/fileloc_hash is the right solution, the
> > > >>>> longterm
> > > >>>>>>>>> solution I am imagining has clients responsible for uploading
> > > >> DAGs
> > > >>>>>>> rather
> > > >>>>>>>>> than retrieving them from the filesystem so
> > fileloc/fileloc_hash
> > > >>>>>>> wouldn't
> > > >>>>>>>>> even exist (dag_id would be used for indexing here).
> > > >>>>>>>>
> > > >>>>>>>> Then we can remove it later, when that code exists. For now so
> > > >> much
> > > >>>> of
> > > >>>>>>> the
> > > >>>>>>>> rest of the code base assumes DAGs from files on disk so.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>> Versioning isn't really addressed either (e.g. if a DAG
> > topology
> > > >>>>>>> changes
> > > >>>>>>>>> with some update you want to be able to show both the old and
> > > >> new
> > > >>>>>>> ones,
> > > >>>>>>>> or
> > > >>>>>>>>> at least have a way to deal with them), there is an argument
> > > >> that
> > > >>>>>>> this is
> > > >>>>>>>>> acceptable since it isn't currently addressed now, but I'm
> > > >> worried
> > > >>>>>>> that
> > > >>>>>>>>> large schema changes should think through the long term plan
> a
> > > >> bit
> > > >>>>>>> more.
> > > >>>>>>>>
> > > >>>>>>>> Correct, and this is by design to make the change smaller and
> > > >>>> easier to
> > > >>>>>>>> review. Adding versioning using the existing serialization
> > format
> > > >> is
> > > >>>>>>> not
> > > >>>>>>>> that much work once the ground work is here and the webserver
> is
> > > >> not
> > > >>>>>>>> relying on DAGs from disk. There's also a UI question of how
> to
> > > >>>> handle
> > > >>>>>>> this
> > > >>>>>>>> on the Tree view that I don't have an immediate answer for.
> But
> > > >> yes,
> > > >>>>>>>> versioning of dag run history was in our minds when working on
> > > >> this
> > > >>>> PR.
> > > >>>>>>>>
> > > >>>>>>>> As long as all of the changes are in the camp of "easy to
> > > >>>> migrate/undo
> > > >>>>>>> in
> > > >>>>>>> the future" then these are not blockers for me. I'm not 100%
> > > >> convinced
> > > >>>>>>> that
> > > >>>>>>> they are, but you have more context so I'll defer to your
> > judgment.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>> I feel like getting this wrong is going to make it very hard
> to
> > > >>>>>>> migrate
> > > >>>>>>>>> things in the future, and make the codebase worse (another
> > > >>>>>>> representation
> > > >>>>>>>>> of DAGs that we need to support/understand/keep parity for).
> If
> > > >>>> I'm
> > > >>>>>>> wrong
> > > >>>>>>>>> about this then I would be more willing to +1 this change.
> This
> > > >>>> doc
> > > >>>>>>> is a
> > > >>>>>>>>> 1-2 pager and I feel like it is not thorough or deep enough
> and
> > > >>>>>>> doesn't
> > > >>>>>>>>> give me enough confidence that the work in the PR is going to
> > > >>>> make it
> > > >>>>>>>>> easier to complete the future milestones instead of harder.
> > > >>>>>>>>
> > > >>>>>>>> "But what if we get it wrong" is not something we can act on.
> So
> > > >>>> long
> > > >>>>>>> as
> > > >>>>>>>> you are happy with the choice of JSON as the serialization
> then
> > > >>>> since
> > > >>>>>>> we've
> > > >>>>>>>> got a version in the record we can change this relatively
> easily
> > > >> in
> > > >>>> the
> > > >>>>>>>> future - we'll know when we're dealing with an older record
> (we
> > > >>>> won't
> > > >>>>>>> have
> > > >>>>>>>> to guess) and can write a format migrator (either to upgrade
> in
> > > >>>>>>> memory, or
> > > >>>>>>>> to upgrade the record in place. And as a db "migration" or as
> a
> > > >>>> upgrade
> > > >>>>>>>> command)
> > > >>>>>>>>
> > > >>>>>>> I'm happy with JSON, my main point about "what if we get it
> > wrong"
> > > >> is
> > > >>>>>>> that
> > > >>>>>>> I feel like more planning work could have been put into this
> > > project
> > > >>>> vs
> > > >>>>>>> just a couple of rough high-level target "milestones". This
> > > >>>> significantly
> > > >>>>>>> increases the risk of the project long term in my eyes.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> We've based our serialization on Uber's Piper work <
> > > >>>>>>>> https://eng.uber.com/managing-data-workflows-at-scale/ <
> > > >>>>>>>> https://eng.uber.com/managing-data-workflows-at-scale/>> (we
> > had
> > > >> a
> > > >>>>>>> call
> > > >>>>>>>> with Alex at Uber before starting any of this work) so we've
> > piggy
> > > >>>>>>> backed
> > > >>>>>>>> of their hard work and months of running something very close
> to
> > > >>>> this
> > > >>>>>>>> format in production.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>
> > > >>>>>>> This is great to hear!
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Also let's not forget: right now we aren't storing versions,
> and
> > > >> we
> > > >>>>>>> still
> > > >>>>>>>> have DAGs on disk. If we do get it massively wrong (which I
> > don't
> > > >>>>>>> think we
> > > >>>>>>>> have) we can just drop this entire table and start again. This
> > is
> > > >>>> all
> > > >>>>>>> an
> > > >>>>>>>> internal implementation detail that is not exposed via an API.
> > > >> What
> > > >>>>>>> better
> > > >>>>>>>> way to see if this is right than by running it at scale in
> > > >>>> production
> > > >>>>>>>> across 1000s of customers (Astronomer+Cloud Composer)
> > > >>>>>>>>
> > > >>>>>>> If we have thought through future milestones and made sure that
> > we
> > > >> can
> > > >>>>>>> switch to different formats that could required for them
> easily,
> > > >> then
> > > >>>> I'm
> > > >>>>>>> not a blocker on this front.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Can you be more specific about you are worried about here?
> It's
> > a
> > > >>>> bit
> > > >>>>>>> hard
> > > >>>>>>>> to put general "but what if we get it wrong" fears to rest.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Tue, Oct 15, 2019 at 9:10 PM Alex Guziel <
> > > >> [email protected]
> > > >>>>>>>> .invalid>
> > > >>>>>>>>> We don't need to have the future plan implemented
> > > >>>>>>>>> completely but it would be nice to see more detailed notes
> > about
> > > >>>> how
> > > >>>>>>> this
> > > >>>>>>>>> will play out in the future.
> > > >>>>>>>>
> > > >>>>>>>> Fair point. Our rough long-term plan:
> > > >>>>>>>>
> > > >>>>>>>> - Add versioning to the UI, likely by storing a new version
> > linked
> > > >>>> from
> > > >>>>>>>> the DagRun, (but only when it changes, not every time to avoid
> > DB
> > > >>>>>>> bloat -
> > > >>>>>>>> multiple dag_run rows will point at the same serialized blob).
> > > >>>>>>>>
> > > >>>>>>>> - Rework the scheduler use the serialized version, not the
> > Simple*
> > > >>>>>>>> version. assuming we can do it and keep the scheduler
> performing
> > > >>>> well.
> > > >>>>>>>> (i.e. that ser/deser time isn't significant for the scheduler
> > > >> loop)
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> - Massively rework how scheduler creates TaskInstances (right
> > now
> > > >>>> this
> > > >>>>>>> is
> > > >>>>>>>> done in the Dag parsing process, not the main scheduler) and
> dag
> > > >>>> runs.
> > > >>>>>>> We
> > > >>>>>>>> have to keep an eye on scheduler performance as we do this. If
> > we
> > > >>>> can
> > > >>>>>>> pull
> > > >>>>>>>> this back in to the main scheduler then this leads us towards
> > the
> > > >>>> next
> > > >>>>>>>> points:
> > > >>>>>>>>
> > > >>>>>>>> - Split out the updating of serialized dag/parsing of DAGs
> from
> > > >> the
> > > >>>>>>>> scheduler loop so it can be a separate component/subprocess
> > (which
> > > >>>> is
> > > >>>>>>> on by
> > > >>>>>>>> default). This gets us towards being able to submit DAG
> > > >> definitions
> > > >>>>>>> via an
> > > >>>>>>>> API if we wanted. (There's a lot to work out to get that
> > working,
> > > >>>>>>> mostly
> > > >>>>>>>> around running the DAG).
> > > >>>>>>>>
> > > >>>>>>>> - Once the scheduler loop/logic is not tied to parsing the
> DAGs
> > > >> then
> > > >>>>>>> this
> > > >>>>>>>> also heads us towards being able to run multiple schedulers.
> > (This
> > > >>>>>>> isn't a
> > > >>>>>>>> requirement on HA scheduler, but it makes them easier to
> "scale
> > > >>>> out".)
> > > >>>>>>>>
> > > >>>>>>>> Hope this answers people's questions, and thanks for the
> > feedback
> > > >>>>>>>>
> > > >>>>>>>> Alex and Dan: Does this give enough information for you to
> > change
> > > >>>> your
> > > >>>>>>>> vote?
> > > >>>>>>>>
> > > >>>>>>> I think as long as we combine the serialized DAG
> representations
> > > and
> > > >>>> have
> > > >>>>>>> some confidence that the data format will be easy to switch to
> > > >> future
> > > >>>>>>> potential formats we would want (kind of hard to do without a
> > more
> > > >>>>>>> detailed
> > > >>>>>>> future plan...), then I would happily change my vote.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> -ash
> > > >>>>>>>>
> > > >>>>>>>>> On 15 Oct 2019, at 21:14, Driesprong, Fokko
> > > >> <[email protected]
> > > >>>>>
> > > >>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Big +1 from my side, looking forward to make this happen.
> > > >>>>>>>>>
> > > >>>>>>>>> Two sides that aren't completely clear to me:
> > > >>>>>>>>>
> > > >>>>>>>>>  - Are we going to extend the existing data model, to allow
> the
> > > >>>>>>> RDBMS to
> > > >>>>>>>>>  optimize queries on fields that we use a lot?
> > > >>>>>>>>>  - How are we going to do state evolution when we extend the
> > > >> JSON
> > > >>>>>>> model
> > > >>>>>>>>>
> > > >>>>>>>>> I have good confidence that we'll solve this along the way.
> > > >>>>>>>>>
> > > >>>>>>>>> Cheers, Fokko
> > > >>>>>>>>>
> > > >>>>>>>>> Op di 15 okt. 2019 om 21:29 schreef Dan Davydov
> > > >>>>>>>>> <[email protected]>:
> > > >>>>>>>>>
> > > >>>>>>>>>> I have been following it from the beginning as well. I
> > > >> understand
> > > >>>>>>> there
> > > >>>>>>>>>> would be short-term wins for some users (I don't think a
> huge
> > > >>>>>>> amount of
> > > >>>>>>>>>> users?), but I still feel like we are being a bit
> > short-sighted
> > > >>>>>>> here and
> > > >>>>>>>>>> that we are creating more work for ourselves and potentially
> > > >> our
> > > >>>>>>> users
> > > >>>>>>>> in
> > > >>>>>>>>>> the future. I also feel like there will be side effects to
> > > >> users
> > > >>>> as
> > > >>>>>>>> well,
> > > >>>>>>>>>> many of which don't care about the webserver scalability,
> such
> > > >> as
> > > >>>>>>> bugs
> > > >>>>>>>>>> caused by the addition of the new webserver representation.
> I
> > > >>>> think
> > > >>>>>>>> without
> > > >>>>>>>>>> a design that is much larger in scope I wouldn't feel
> > > >> comfortable
> > > >>>>>>> moving
> > > >>>>>>>>>> forward with this AIP.
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Tue, Oct 15, 2019 at 3:21 PM Jarek Potiuk <
> > > >>>>>>> [email protected]>
> > > >>>>>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hello Dan, Alex,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I believe all the points you make are super-valid ones. But
> > > >>>> maybe
> > > >>>>>>> you
> > > >>>>>>>> are
> > > >>>>>>>>>>> missing the full context a bit.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I followed the original discussion
> > > >>>>>>>>>>> <
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/a2d426f93c0f4e5f0347627308638b59ca4072fd022a42af1163e34a@%3Cdev.airflow.apache.org%3E
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>> from the very beginning and took part in the initial
> > > >> discussions
> > > >>>>>>> when
> > > >>>>>>>>>> this
> > > >>>>>>>>>>> topic was raised. From the discussion it is quite clear to
> me
> > > >>>> that
> > > >>>>>>> this
> > > >>>>>>>>>>> mostly a "tactical" approach to implement something that is
> > > >>>>>>>> backportable
> > > >>>>>>>>>> to
> > > >>>>>>>>>>> 1.10 and rather quick to implement. This is targeted to
> make
> > > >>>> users
> > > >>>>>>> more
> > > >>>>>>>>>>> happy with their 1.10 version without the timing
> uncertainty
> > > >> and
> > > >>>>>>> effort
> > > >>>>>>>>>> of
> > > >>>>>>>>>>> migration to 2.0. It solves the major pain point of
> stability
> > > >> of
> > > >>>>>>> the UI
> > > >>>>>>>>>> in
> > > >>>>>>>>>>> case there are complex DAGs for which parsing crashes the
> > > >>>>>>> webserver.
> > > >>>>>>>>>>> Like in "being nice to your users".
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> There will be a separate effort to make pretty much all of
> > the
> > > >>>>>>> things
> > > >>>>>>>> you
> > > >>>>>>>>>>> mentioned in 2.0 in a non-backportable way as it requires
> far
> > > >>>> too
> > > >>>>>>> many
> > > >>>>>>>>>>> changes in the way how Airflow works internally.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Maybe it needs some more explanation + long term plan that
> > > >>>> follows
> > > >>>>>>> in
> > > >>>>>>>> the
> > > >>>>>>>>>>> AIP itself to explain it to those who have not followed the
> > > >>>> initial
> > > >>>>>>>>>>> discussion, but I think it's fully justified change.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> J.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Tue, Oct 15, 2019 at 9:10 PM Alex Guziel <
> > > >>>>>>> [email protected]
> > > >>>>>>>>>>> .invalid>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> -1 (binding)
> > > >>>>>>>>>>>> Good points made by Dan. We don't need to have the future
> > > >> plan
> > > >>>>>>>>>>> implemented
> > > >>>>>>>>>>>> completely but it would be nice to see more detailed notes
> > > >>>> about
> > > >>>>>>> how
> > > >>>>>>>>>> this
> > > >>>>>>>>>>>> will play out in the future. We shouldn't walk into a
> system
> > > >>>> that
> > > >>>>>>>>>> causes
> > > >>>>>>>>>>>> more pain in the future. (I can't say for sure that it
> does,
> > > >>>> but I
> > > >>>>>>>>>> can't
> > > >>>>>>>>>>>> say that it doesn't either). I don't think the proposal is
> > > >>>>>>> necessarily
> > > >>>>>>>>>>>> wrong or bad, but I think we need some more detailed
> > planning
> > > >>>>>>> around
> > > >>>>>>>>>>> future
> > > >>>>>>>>>>>> milestones.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Tue, Oct 15, 2019 at 12:04 PM Dan Davydov
> > > >>>>>>>>>>> <[email protected]
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> -1 (binding), this may sound a bit FUD-y but I don't feel
> > > >> this
> > > >>>>>>> has
> > > >>>>>>>>>> been
> > > >>>>>>>>>>>>> thought through enough...
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Having both a SimpleDagBag representation and the JSON
> > > >>>>>>> representation
> > > >>>>>>>>>>>>> doesn't make sense to me at the moment: *"**Quoting from
> > > >>>> Airflow
> > > >>>>>>>>>> code,
> > > >>>>>>>>>>> it
> > > >>>>>>>>>>>>> is “a simplified representation of a DAG that contains
> all
> > > >>>>>>> attributes
> > > >>>>>>>>>>>>> required for instantiating and scheduling its associated
> > > >>>>>>> tasks.”. It
> > > >>>>>>>>>>> does
> > > >>>>>>>>>>>>> not contain enough information required by the
> webserver.".
> > > >>>> *Why
> > > >>>>>>> not
> > > >>>>>>>>>>>> create
> > > >>>>>>>>>>>>> a representation that can be used by both? This is going
> to
> > > >>>> be a
> > > >>>>>>> big
> > > >>>>>>>>>>>>> headache to both understand and work with in the codebase
> > > >>>> since
> > > >>>>>>> it
> > > >>>>>>>>>> will
> > > >>>>>>>>>>>> be
> > > >>>>>>>>>>>>> another definition that we need to keep in sync.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Not sure if fileloc/fileloc_hash is the right solution,
> the
> > > >>>>>>> longterm
> > > >>>>>>>>>>>>> solution I am imagining has clients responsible for
> > > >> uploading
> > > >>>>>>> DAGs
> > > >>>>>>>>>>> rather
> > > >>>>>>>>>>>>> than retrieving them from the filesystem so
> > > >>>> fileloc/fileloc_hash
> > > >>>>>>>>>>> wouldn't
> > > >>>>>>>>>>>>> even exist (dag_id would be used for indexing here).
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Versioning isn't really addressed either (e.g. if a DAG
> > > >>>> topology
> > > >>>>>>>>>>> changes
> > > >>>>>>>>>>>>> with some update you want to be able to show both the old
> > > >> and
> > > >>>> new
> > > >>>>>>>>>> ones,
> > > >>>>>>>>>>>> or
> > > >>>>>>>>>>>>> at least have a way to deal with them), there is an
> > argument
> > > >>>> that
> > > >>>>>>>>>> this
> > > >>>>>>>>>>> is
> > > >>>>>>>>>>>>> acceptable since it isn't currently addressed now, but
> I'm
> > > >>>>>>> worried
> > > >>>>>>>>>> that
> > > >>>>>>>>>>>>> large schema changes should think through the long term
> > > >> plan a
> > > >>>>>>> bit
> > > >>>>>>>>>>> more.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I feel like getting this wrong is going to make it very
> > hard
> > > >>>> to
> > > >>>>>>>>>> migrate
> > > >>>>>>>>>>>>> things in the future, and make the codebase worse
> (another
> > > >>>>>>>>>>> representation
> > > >>>>>>>>>>>>> of DAGs that we need to support/understand/keep parity
> > for).
> > > >>>> If
> > > >>>>>>> I'm
> > > >>>>>>>>>>> wrong
> > > >>>>>>>>>>>>> about this then I would be more willing to +1 this
> change.
> > > >>>> This
> > > >>>>>>> doc
> > > >>>>>>>>>> is
> > > >>>>>>>>>>> a
> > > >>>>>>>>>>>>> 1-2 pager and I feel like it is not thorough or deep
> enough
> > > >>>> and
> > > >>>>>>>>>> doesn't
> > > >>>>>>>>>>>>> give me enough confidence that the work in the PR is
> going
> > > >> to
> > > >>>>>>> make it
> > > >>>>>>>>>>>>> easier to complete the future milestones instead of
> harder.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Tue, Oct 15, 2019 at 11:26 AM Kamil Breguła <
> > > >>>>>>>>>>>> [email protected]>
> > > >>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> +1 (binding)
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Tue, Oct 15, 2019 at 2:57 AM Kaxil Naik <
> > > >>>> [email protected]
> > > >>>>>>>>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hello, Airflow community,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> This email calls for a vote to add the DAG
> Serialization
> > > >>>>>>> feature
> > > >>>>>>>>>> at
> > > >>>>>>>>>>>>>>> https://github.com/apache/airflow/pull/5743.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> *AIP*:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-24+DAG+Persistence+in+DB+using+JSON+for+Airflow+Webserver+and+%28optional%29+Scheduler
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> *Previous Mailing List discussion*:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/65d282368e0a7c19815badb8b1c6c8d72b0975ce94f601e13af44f74@%3Cdev.airflow.apache.org%3E
> > > >>>>>>>>>>>>>>> .
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> *Authors*: Kaxil Naik, Zhou Fang, Ash-Berlin Taylor
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> *Summary*:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>  - DAGs are serialized using JSON format and stored in
> a
> > > >>>>>>>>>>>>> SerializedDag
> > > >>>>>>>>>>>>>>>  table
> > > >>>>>>>>>>>>>>>  - The Webserver now instead of having to parse the DAG
> > > >>>> file
> > > >>>>>>>>>>> again,
> > > >>>>>>>>>>>>>>>  reads the serialized DAGs in JSON, de-serializes them
> > > >> and
> > > >>>>>>>>>>> creates
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>  DagBag and uses it to show in the UI.
> > > >>>>>>>>>>>>>>>  - Instead of loading an entire DagBag when the
> WebServer
> > > >>>>>>>>>> starts
> > > >>>>>>>>>>> we
> > > >>>>>>>>>>>>>>>  only load each DAG on demand from the Serialized Dag
> > > >>>> table.
> > > >>>>>>>>>> This
> > > >>>>>>>>>>>>> helps
> > > >>>>>>>>>>>>>>>  reduce Webserver startup time and memory. The
> reduction
> > > >> is
> > > >>>>>>>>>>> notable
> > > >>>>>>>>>>>>>> when you
> > > >>>>>>>>>>>>>>>  have a large number of DAGs.
> > > >>>>>>>>>>>>>>>  - A JSON Schema has been defined and we validate the
> > > >>>>>>>>>> serialized
> > > >>>>>>>>>>>> dag
> > > >>>>>>>>>>>>>>>  before writing it to the database
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> [image: image.png]
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> A PR (https://github.com/apache/airflow/pull/5743) is
> > > >> ready
> > > >>>>>>> for
> > > >>>>>>>>>>>> review
> > > >>>>>>>>>>>>>>> from the committers and community.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> We also have a WIP PR (
> > > >>>>>>>>>> https://github.com/apache/airflow/pull/5992
> > > >>>>>>>>>>> )
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>> backport this feature to 1.10.* branch.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> A big thank you to Zhou and Ash for their continuous
> help
> > > >> in
> > > >>>>>>>>>>>> improving
> > > >>>>>>>>>>>>>>> this feature/PR.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> This email is formally calling for a vote to accept the
> > > >> AIP
> > > >>>> and
> > > >>>>>>>>>> PR.
> > > >>>>>>>>>>>>>> Please
> > > >>>>>>>>>>>>>>> note that we will update the PR / feature to fix bugs
> if
> > > >> we
> > > >>>>>>> find
> > > >>>>>>>>>>> any.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>> Kaxil
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> --
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Jarek Potiuk
> > > >>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software
> > > >>>> Engineer
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> M: +48 660 796 129 <+48660796129>
> > > >>>>>>>>>>> [image: Polidea] <https://www.polidea.com/>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
> [image: Polidea] <https://www.polidea.com/>
>

Re: [VOTE] AIP-24: Persisting serialized DAG in DB for webserver scalability

Reply via email to