Yes Fokko, that is true: the overall aggregated saving from removing the
overhead is actually going to be especially large for us, as we start tens of
millions of tasks every day. Looking forward to including that change in our
code base.

Hi Jarek, automated performance testing sounds extremely tasty and we at
Airbnb would love to contribute wherever we can! Please don't hesitate to
reach out if you want to discuss or need help from us.


Cheers,
Kevin Y

On Wed, Oct 23, 2019 at 1:40 AM Jarek Potiuk <jarek.pot...@polidea.com>
wrote:

> Just to let everyone know - in this context - we are planning @Polidea to
> work on automated performance testing for Airflow. Since performance and
> reliability are super important, we are thinking about defining a set of
> consistent performance tests that we will be able to run automatically using
> different deployments (Composer, Astronomer, On-Prem Kubernetes/Helm
> Chart-installed Airflow from the official image) and provide access to graphs
> and dashboards + historical performance numbers for Airflow, including
> running some of the tests on the 1.10.x line.
>
> This is quite an ambitious venture when we think about some details, but it
> seems from the discussion above that it is highly needed.
>
> I plan to have a new AIP/discussion started this week - so I just wanted to
> give you a heads-up on that. I do not want to hijack this thread, so bear
> with me - I have a draft proposal for AIP-28 which I want to first discuss
> with a few people to get their initial input and then propose it here.
>
> J.
>
>
> On Tue, Oct 22, 2019 at 6:31 PM Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
> > Removing overhead for starting the processes would not only benefit the k8s
> > executor, but also the workers that spawn subprocesses.
> >
> > I would definitely be interested to see some numbers on the improvement of
> > AIP-17 in practice. Maybe we should build some benchmark to see if we
> > introduced a performance regression in the current master. Maybe we can do
> > something similar to the Apache Spark project, and create a preview release
> > for Airflow 2.0.
> >
> > Cheers, Fokko
> >
> > On Tue, Oct 22, 2019 at 4:58 AM Kevin Yang <yrql...@gmail.com> wrote:
> >
> > > For sure Fokko! I'll go through the PRs after finishing reading the one for
> > > AIP-24.
> > >
> > > AIP-17 does need quite some rewrites but I think we're pretty close. We
> > > plan to roll it out in our production cluster and then open source it after
> > > we believe it is stable. At the moment we're doing it by reusing the
> > > task_instance table and we expect to see a big drop in the DB load, as we
> > > believe that the huge number of heartbeats is the biggest contributor to DB
> > > load and connection issues. @yingbo.w...@airbnb.com <yingbo.w...@airbnb.com>
> > > can help to provide more details.
> > >
> > > Being able to reduce task start-up overhead is great, I think, especially
> > > for users of the K8S executor, but I guess it would not help too much in the
> > > sensor case since sensors tend to be relatively long-running tasks and
> > > don't get scheduled that often.
> > >
> > > I agree we should not wait too long with 2.0, esp. as those two items can
> > > expand into large changes. As long as we acknowledge the importance of the
> > > two items and keep them on our radar I'm happy.
> > >
> > >
> > > Cheers,
> > > Kevin Y
> > >
> > > On Mon, Oct 21, 2019 at 7:34 AM James Meickle
> > > <jmeic...@quantopian.com.invalid> wrote:
> > >
> > > > I would feel better about a faster 2.0 release if we had a better plan for
> > > > how often we'll do future major version increments. As-is this might be the
> > > > first change to break backwards compat meaningfully in a while.
> > > >
> > > > On Mon, Oct 21, 2019 at 3:03 AM Driesprong, Fokko <fo...@driesprong.frl>
> > > > wrote:
> > > >
> > > > > Thanks Kevin,
> > > > >
> > > > > Kevin, I would love to have your input on this
> > > > > <https://github.com/apache/airflow/pull/6210> PR. This one tries to
> > > > > implement an async implementation of the operator, based on the sensor by
> > > > > Seelman. And also this <https://github.com/apache/airflow/pull/6370> one,
> > > > > which is required to make it work.
> > > > >
> > > > > For me, the most important question is how we are going to batch these poke
> > > > > operations in a way that doesn't add too much complexity. AIP-17 sounds
> > > > > like a great idea but requires a lot of rewriting and also adds another
> > > > > table on which we keep state (which also will add load to the DB). Also,
> > > > > Ash has some optimizations that reduce the overhead of starting a task,
> > > > > which might also partially mitigate the problem of the overhead when
> > > > > starting a task.
> > > > >
> > > > > Personally I feel that we should not wait too long with the 2.0 release,
> > > > > and not try to cram everything in there. Right now we're already
> > > > > backporting a lot to 1.10 and the resolving of the conflicts is getting
> > > > > more tedious. This already broke the 1.10.4 release. The master branch
> > > > > already has a lot of new stuff in there, that is just waiting to be
> > > > > released.
> > > > >
> > > > > Cheers, Fokko
> > > > >
> > > > >
> > > > > On Mon, Oct 21, 2019 at 6:04 AM Kevin Yang <yrql...@gmail.com> wrote:
> > > > >
> > > > > > Thanks Ash for putting together the doc, somehow I cannot do anything on
> > > > > > confluence so I'll put my comments here.
> > > > > >
> > > > > > +1 for using this opportunity to define how we want to do releases, e.g.
> > > > > > frequency, compatibility rules, etc.
> > > > > >
> > > > > > If the DAG isolation is being worked on I would love to see it in 2.0.
> > > > > >
> > > > > > Adding two other items I think are quite important:
> > > > > >
> > > > > >    - DB reliability/performance
> > > > > >       - DB is a single point of failure just as the scheduler is, and per
> > > > > >       our experience operating a huge cluster at Airbnb (6k+ DAGs/60k+
> > > > > >       tasks), it is a bigger threat to the stability of Airflow
> > > > > >       - If the reason behind improving scheduler performance is
> > > > > >       scalability then I think we can instead work on the DB, or
> > > > > >       something like AIP-17
> > > > > >       <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-17+Airflow+sensor+optimization>
> > > > > >    - Project baseline
> > > > > >       - As we grow more mature doing releases, we should consider
> > > > > >       establishing the baseline for Airflow and thus create an easier
> > > > > >       upgrade experience, e.g. performance benchmarking, defining the API
> > > > > >       (not the web endpoint but the API like how each operator's params
> > > > > >       are used) and tests on them, etc.
> > > > > >       - Does not necessarily need to be fully included in 2.0 as I imagine
> > > > > >       this would be long incremental work, but the earlier we start the
> > > > > >       earlier we benefit
> > > > > >
> > > > > >
> > > > > > Cheers,
> > > > > > Kevin Y
> > > > > >
> > > > > > On Wed, Oct 9, 2019 at 7:00 PM Chao-Han Tsai <milton0...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Although Airflow has the concept of task priority like Ash mentioned, it
> > > > > > > does not pre-empt running tasks.
> > > > > > >
> > > > > > > On Wed, Oct 9, 2019 at 12:42 AM Ash Berlin-Taylor <a...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > There's already a concept called priority_weight on tasks
> > > > > > > > http://airflow.apache.org/concepts.html?highlight=priority_weight#pools
> > > > > > > > (the doc about it is in relation to pools, but everything is run in a
> > > > > > > > pool of "default_pool" if not specified.)
> > > > > > > >
> > > > > > > > Is that what you want?
> > > > > > > >
> > > > > > > > On 9 October 2019 07:38:38 BST, bharath palaksha <bharath...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >Hi,
> > > > > > > > >
> > > > > > > > > >Is there any discussion thread on adding priority to tasks and
> > > > > > > > > >cost-based optimization?
> > > > > > > > > >priority and pre-emption as an option to the user. If priority is
> > > > > > > > > >specified, scheduler has to schedule high priority tasks and if
> > > > > > > > > >pre-emption is true, it can pre-empt current running task which is of
> > > > > > > > > >lower priority
> > > > > > > > >
> > > > > > > > >Thanks,
> > > > > > > > >Bharath
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >On Mon, Sep 30, 2019 at 11:19 PM James Meickle
> > > > > > > > ><jmeic...@quantopian.com.invalid> wrote:
> > > > > > > > >
> > > > > > > > > >> For what I'm looking for out of a 2.0, as an operator/cluster admin
> > > > > > > > > >> (separate from what I'd like to see as a DAG developer), I'd love to see:
> > > > > > > > >>
> > > > > > > > > >> - Combine breaking changes into 2.0, and do as few as possible after
> > > > > > > > > >> - A semver policy for 2.0 and onwards. (For instance we got bit hard by a
> > > > > > > > > >> breaking API change in the k8s operator)
> > > > > > > > > >> - Regularly scheduled releases (like: "minor every other month, major every
> > > > > > > > > >> other year")
> > > > > > > > > >> - A security backport policy
> > > > > > > > > >> - Pinned deps for releases
> > > > > > > > > >> - A way to get integration/cloud vendor operator updates out-of-tree,
> > > > > > > > > >> without having to pull in unrelated Airflow updates
> > > > > > > > >>
> > > > > > > > > >> For a lot of people, Airflow is an off-the-shelf app rather than a library,
> > > > > > > > > >> but we don't actually ship or support it anything like most comparable
> > > > > > > > > >> off-the-shelf apps. It makes it much harder to support than other
> > > > > > > > > >> applications, unless you're a Python developer yourself.
> > > > > > > > >>
> > > > > > > > > >> On Mon, Sep 30, 2019 at 11:18 AM Jarek Potiuk <jarek.pot...@polidea.com>
> > > > > > > > > >> wrote:
> > > > > > > > >>
> > > > > > > > > >> > All those are very important and we are going to work on some of them as
> > > > > > > > > >> > well.
> > > > > > > > >> >
> > > > > > > > > >> > I think if there are breaking changes, we should rather try to fit them in
> > > > > > > > > >> > the 2.0 release - at least to the point that they can be a base for
> > > > > > > > > >> > extending it in later versions in a backwards-compatible way (maybe then we
> > > > > > > > > >> > should adopt SemVer officially and follow it).
> > > > > > > > >> >
> > > > > > > > >> > J.
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > On Tue, Sep 24, 2019 at 11:52 PM James Meickle
> > > > > > > > >> > <jmeic...@quantopian.com.invalid> wrote:
> > > > > > > > >> >
> > > > > > > > > >> > > My question with that is, how often do we want to do major version
> > > > > > > > > >> > > increments? There's a few API breaking changes I'd love to see, but
> > > > > > > > > >> > > whether to propose them for 2.0 depends on what the wait until 3.0 looks
> > > > > > > > > >> > > like (or whether we'll allow more minor version breakages in the future)
> > > > > > > > >> > >
> > > > > > > > > >> > > On Tue, Sep 24, 2019, 11:44 Dan Davydov <ddavy...@twitter.com.invalid>
> > > > > > > > > >> > > wrote:
> > > > > > > > >> > >
> > > > > > > > > >> > > > I think along with "Improve Webserver Performance" we should solve the
> > > > > > > > > >> > > > serialization and task execution isolation problems a little bit more
> > > > > > > > > >> > > > completely since I imagine there could be backwards compatibility issues.
> > > > > > > > > >> > > > e.g. mapping each task JSON to a Docker image or other serialized
> > > > > > > > > >> > > > representation that workers would then consume. See the attached PDF,
> > > > > > > > > >> > > > AIP-24 is a subset of the DAG Definition Serialization work, but in my
> > > > > > > > > >> > > > opinion we should still work on DAG Isolation too. My only concern is that
> > > > > > > > > >> > > > the scope is too big for 2.0.
> > > > > > > > >> > > >
> > > > > > > > > >> > > > cc @Sumit Maheshwari <smaheshw...@twitter.com> who is also looking at
> > > > > > > > > >> > > > tackling some of these problems.
> > > > > > > > >> > > >
> > > > > > > > > >> > > > On Tue, Sep 24, 2019 at 9:47 AM Ash Berlin-Taylor <a...@apache.org>
> > > > > > > > > >> > > > wrote:
> > > > > > > > >> > > >
> > > > > > > > > >> > > >> I'm also in favour of py-test (and it's what I use for most of my
> > > > > > > > > >> > > >> development) which is why I created
> > > > > > > > > >> > > >> https://issues.apache.org/jira/browse/AIRFLOW-4863, but I don't think
> > > > > > > > > >> > > >> non-user-facing/impacting changes need to go on the road map.
> > > > > > > > >> > > >>
> > > > > > > > >> > > >> -ash
> > > > > > > > >> > > >>
> > > > > > > > > >> > > >> > On 24 Sep 2019, at 13:53, Tomasz Urbaszek <tomasz.urbas...@polidea.com>
> > > > > > > > > >> > > >> > wrote:
> > > > > > > > >> > > >> >
> > > > > > > > > >> > > >> > I am thinking about proposing migration from nosetest to pytest because
> > > > > > > > > >> > > >> > it's "more up to date". I even have a POC, but a lot of tests fail,
> > > > > > > > > >> > > >> > probably due to side effects.
> > > > > > > > >> > > >> >
> > > > > > > > >> > > >> > Best,
> > > > > > > > >> > > >> > Tomek
> > > > > > > > >> > > >> >
> > > > > > > > > >> > > >> > On Tue, Sep 24, 2019 at 2:38 PM Ash Berlin-Taylor <a...@apache.org>
> > > > > > > > > >> > > >> > wrote:
> > > > > > > > >> > > >> >
> > > > > > > > > >> > > >> >> That formatted very badly in plain text. The list was:
> > > > > > > > > >> > > >> >>
> > > > > > > > > >> > > >> >>        • Knative Executor (AIP-25, currently draft. Being worked on by
> > > > > > > > > >> > > >> >>          Daniel Imberman)
> > > > > > > > > >> > > >> >>        • Improve Webserver performance (AIP-24, currently draft. Being
> > > > > > > > > >> > > >> >>          worked on by myself, Kaxil Naik and Zhou Fang)
> > > > > > > > > >> > > >> >>        • Enhanced real-time UI
> > > > > > > > > >> > > >> >>        • Improve Scheduler performance
> > > > > > > > > >> > > >> >>        • Extend/finish the API (AIP-13 is part of this, but not all)
> > > > > > > > > >> > > >> >>        • Production Docker image + Helm chart
> > > > > > > > >> > > >> >>
> > > > > > > > > >> > > >> >>> On 24 Sep 2019, at 13:36, Ash Berlin-Taylor <a...@apache.org> wrote:
> > > > > > > > >> > > >> >>>
> > > > > > > > >> > > >> >>> Hi everyone,
> > > > > > > > >> > > >> >>>
> > > > > > > > > >> > > >> >>> I'd like to start working on a concrete plan to get Airflow 2.0 out, and
> > > > > > > > > >> > > >> >>> as a result I've started updating
> > > > > > > > > >> > > >> >>> https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+2.0
> > > > > > > > >> > > >> >>>
> > > > > > > > > >> > > >> >>> In addition to all the tidy up work ("spring cleaning", finish tidy up
> > > > > > > > > >> > > >> >>> after dropping Py2 etc) I'd propose the following 6 high level items:
> > > > > > > > >> > > >> >>>
> > > > > > > > > >> > > >> >>> Knative Executor (AIP-25, currently draft. Being worked on by Daniel Imberman)
> > > > > > > > > >> > > >> >>> Improve Webserver performance (AIP-24, currently draft. Being worked on by myself, Kaxil Naik and Zhou Fang)
> > > > > > > > > >> > > >> >>> Enhanced real-time UI
> > > > > > > > > >> > > >> >>> Improve Scheduler performance
> > > > > > > > > >> > > >> >>> Extend/finish the API (AIP-13 is part of this, but not all)
> > > > > > > > > >> > > >> >>> Production Docker image + Helm chart
> > > > > > > > > >> > > >> >>> We at Astronomer are committing to work on these in roughly this order
> > > > > > > > > >> > > >> >>> if no one else gets to them first. I also propose that we create SIGs
> > > > > > > > > >> > > >> >>> (Special Interest Groups) in slack with weekly/fortnightly (every 14 days)
> > > > > > > > > >> > > >> >>> "calls"/update sessions. We already have #sig-ui and
> > > > > > > > > >> > > >> >>> #sig-dag-serialisation.
> > > > > > > > >> > > >> >>>
> > > > > > > > > >> > > >> >>> This roadmap is also not a promise that all of these will be done before
> > > > > > > > > >> > > >> >>> Airflow 2.0 - we may decide later to push something back to v2.1 etc.
> > > > > > > > >> > > >> >>>
> > > > > > > > > >> > > >> >>> Does anyone disagree strongly with these priorities, or have anything
> > > > > > > > > >> > > >> >>> you want to add that you are willing to work on?
> > > > > > > > >> > > >> >>>
> > > > > > > > >> > > >> >>> Thanks,
> > > > > > > > >> > > >> >>> Ash
> > > > > > > > >> > > >> >>
> > > > > > > > >> > > >> >>
> > > > > > > > >> > > >> >
> > > > > > > > >> > > >> > --
> > > > > > > > >> > > >> >
> > > > > > > > >> > > >> > Tomasz Urbaszek
> > > > > > > > > >> > > >> > Polidea <https://www.polidea.com/> | Junior Software Engineer
> > > > > > > > >> > > >> >
> > > > > > > > >> > > >> > M: +48 505 628 493 <+48505628493>
> > > > > > > > > >> > > >> > E: tomasz.urbas...@polidea.com <tomasz.urbasz...@polidea.com>
> > > > > > > > >> > > >> >
> > > > > > > > >> > > >> > Unique Tech
> > > > > > > > > >> > > >> > Check out our projects! <https://www.polidea.com/our-work>
> > > > > > > > >> > > >>
> > > > > > > > >> > > >>
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > --
> > > > > > > > >> >
> > > > > > > > >> > Jarek Potiuk
> > > > > > > > > >> > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > > > > > >> >
> > > > > > > > >> > M: +48 660 796 129 <+48660796129>
> > > > > > > > >> >
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Chao-Han Tsai
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129 <+48660796129>
>
