> > Move (tested) components out of contrib folder
> >
> > https://lists.apache.org/thread.html/c880ef89f8cb4a0240c404f9372615b998c4a4eeca342651927d596c@%3Cdev.airflow.apache.org%3E
AIP-21 is done. We have a follow-up in the form of system tests and backport packages, but this is not a blocker. We can go further without system tests.
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-21%3A+Changes+in+import+paths

On Thu, Apr 2, 2020 at 6:55 PM Daniel Imberman <daniel.imber...@gmail.com> wrote:

> Hello all,
>
> I've been reviewing this wiki page <https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+2.0> and I wanted to discuss which of these features are still relevant/are blockers for 2.0.
>
> I figured we should start with the status of the high-priority fixes:
>
> 1. *Knative Executor (handled by KEDA)* The KEDA autoscaler has accomplished everything that we would've hoped for from a KnativeExecutor. I think we can consider this work done.
> 2. *Improve Webserver performance* @Kaxil Naik <kaxiln...@gmail.com> It appears that DAG serialization is already merged? Are there other steps needed for webserver performance?
> 3. *Enhanced real-time UI* While this would make for a great "wow!" feature, I don't think this is necessary for a 2.0. Unless anyone wants to work on this in the short term :).
> 4. *Improve Scheduler performance and reliability* @Ash Berlin-Taylor <a...@apache.org> has been hard at work on scheduler HA, and we've already
> 5. *Extend/finish the API* @Kamil Breguła <kamil.breg...@polidea.com> I see that there is an active PR here <https://github.com/apache/airflow/pull/7549>; are we getting close on this?
> 6. *Production Docker image* The PR here should be merged, hopefully today!
>
> Now there's a second section of "needs more info" tickets, and I wanted to see which of these we could triage for after 2.0.
>
>> Rework Subdags to be less "bolted-on" and more native to the scheduler
>>
>> There are all sorts of edge cases around subdags that result from the tasks in the subdag being run by another executor, instead of being handled and scheduled by the core Scheduler. We should make the scheduler "see in" to the Subdags and make it responsible for scheduling tasks. This should make subdags less error-prone and more predictable. It may involve replacing/significantly changing the SubDagOperator.
>
> I think it's pretty damn crucial we fix subdags and backfills. I'm on the fence about this one. On the one hand it could possibly wait. On the other hand it would be embarrassing to release a 2.0 and still have this feature broken.
>
>> Move (tested) components out of contrib folder
>>
>> https://lists.apache.org/thread.html/c880ef89f8cb4a0240c404f9372615b998c4a4eeca342651927d596c@%3Cdev.airflow.apache.org%3E
>
> I think this has been something we've actively worked on. Is this task complete?
>
>> Filter passwords/sensitive info from logs.
>>
>> Jenkins does this if the password comes from a connection - it would be good if we could do this too.
>
> While I agree that this is an important issue, I think if no one has picked this up yet we should triage it for a later release (I also don't think this should break anything). (See the filter sketch below.)
>
>> Allow Backfill runs to be handled by the scheduler/triggered from UI
>>
>> It would be nice to not need console access to run airflow backfill, and to have it not stop if the SSH session is closed.
>>
>> Lots of details to work out here though around how this would work, where it would show up in the UI, priority of tasks, ways of reducing concurrency/load to allow normal tasks to run, etc.
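For the "Filter passwords/sensitive info from logs" item above, here is a minimal sketch of the kind of standard-library logging filter that could do the masking. The `SecretMaskingFilter` name and the regexes are illustrative assumptions, not an existing Airflow API; a real implementation would more likely mask the actual connection/variable secret values rather than guess by pattern.

```python
import logging
import re


class SecretMaskingFilter(logging.Filter):
    """Mask things that look like credentials before a record is emitted.

    Minimal sketch: the two patterns below (URI-style credentials and
    password=/token= pairs) are illustrative, not an exhaustive list.
    """

    RULES = [
        # user:password@host in connection URIs, e.g. postgres://airflow:s3cret@db/airflow
        (re.compile(r"(://[^:/\s]+:)[^@\s]+(@)"), r"\1***\2"),
        # key=value style, e.g. password=s3cret or token: abc123
        (re.compile(r"((?:password|pwd|secret|token)\s*[=:]\s*)\S+", re.IGNORECASE), r"\1***"),
    ]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern, replacement in self.RULES:
            message = pattern.sub(replacement, message)
        # Overwrite msg/args so every downstream handler sees the masked text.
        record.msg = message
        record.args = ()
        return True


if __name__ == "__main__":
    # Demo only; in Airflow the filter would be attached to the task-log handlers.
    logging.basicConfig(level=logging.INFO)
    logging.getLogger().addFilter(SecretMaskingFilter())
    logging.info("connecting via postgres://airflow:s3cret@db:5432/airflow")
    logging.info("using password=hunter2 for the hook")
```

Both demo lines would come out with `***` in place of the secret; wiring it to the real connection passwords is the part that still needs design work.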
> Once again, I think this is a feature that's a bit embarrassing NOT to fix for a 2.0 release, but I also understand that there's only so much manpower. I also imagine this would be a HUGE undertaking, so I think this would push back 2.0 significantly.
>
>> Rationalize HA in Connections
>>
>> Right now it is possible to create multiple connections with the same ID and *some* Connections/hooks will support this and pick a random one from the list. This feature isn't well documented or understood (and the CLI doesn't support it as well as the UI, for instance), so we should examine if this makes sense, or if we should support it individually in certain connection types instead.
>
> I honestly don't know enough about this to have a strong opinion on it.
>
>> Make setting up HTTPS connections easier/more expected
>
> It appears that @Kamil Breguła <kamil.breg...@polidea.com> has an open PR <https://github.com/apache/airflow/pull/5239/files> for this one. @Kamil Breguła <kamil.breg...@polidea.com>, do you think this would be hard to merge?
>
>> Front end/"browser" testing
>>
>> The Airflow UI is non-trivial and there have been a number of JS/HTML bugs that could have been caught by better front-end testing.
>>
>> It has been suggested to look at Cypress for this over Selenium. Whatever we choose, we need to pay careful attention to avoid slow or flaky UI tests.
>
> This, to me, is a crucial step in ensuring a smooth 2.0 transition. I've been taking time to learn Cypress recently, and once the Airflow Helm chart is merged I think merging a set of integration/behavior/UI tests is crucial.
>
> What does everyone think? Open to suggestions on this assessment!
>
> On Tue, Mar 31, 2020 at 11:46 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
>
>> Got it. Thanks Daniel for leading this.
>>
>> On Tue, Mar 31, 2020 at 7:40 PM Daniel Imberman <daniel.imber...@gmail.com> wrote:
>>
>>> I think including both is fine as long as the old one contains deprecation warnings / forces a feature flag to allow it (e.g. —allow-deprecated)
>>>
>>> On Tue, Mar 31, 2020 at 11:33 AM, Kamil Breguła <kamil.breg...@polidea.com> wrote:
>>>
>>> On Sun, Mar 22, 2020 at 9:20 AM Robin Edwards <r...@bidnamic.com> wrote:
>>>
>>> > Also, does the new API need to be feature complete, or just enough functionality to warrant removing the existing experimental one?
>>>
>>> I think we should release at least one version that will contain the new and old REST APIs simultaneously. It is not easy to upgrade two complex systems at the same time. However, if we do this, some users will have to do it. Older versions can be hidden behind the feature gate. We can also add deprecation warnings.
>>>
>>> > R
>>> >
>>> > On Fri, 20 Mar 2020, 20:29 Daniel Imberman, <daniel.imber...@gmail.com> wrote:
>>> >
>>> > > Great! Hope to get a few more folx to give +1's, but I think we have a good path forward here :)
>>> > >
>>> > > On Fri, Mar 20, 2020 at 12:51 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
>>> > >
>>> > > > > I agree, especially for larger-scale users migrations are a difficult process. Perhaps we can adopt something similar to a blockchain fork (e.g.
>>> > > > > determine X known Airflow-using companies, and start the countdown as soon as Y% of them migrate). I really just want to make sure we don't end up with a Python 2/3 situation. Even if we continue support, it should only be for bugfixes, and we should not add any new features into 1.10.
>>> > > >
>>> > > > I think we are in perfect sync - I think feature-migration should end almost immediately after we release 2.0. But bug-fixing should continue for quite some time. On that front - having backport packages will help with releasing "integrations" quite independently from the 1.10/2.0 version (which I think is good for those who are - for this or another reason - stuck on 1.10). On the other hand, we should make sure that the important stuff for 2.0 that is not "feature" is also backported to 1.10. For example, a lot of recent performance improvements that we have now in 2.0 will be possible (and not that complex) to backport to 1.10. Some of this effort is actually easier to do in 2.0 and then apply to 1.10 in a similar fashion, as it is easier to understand and reason about the 2.0 code now that we have some refactoring/pylint checks etc. in place. So we should make sure we get the latest 1.10 to a "good" state - before we freeze it for bugfix only.
>>> > > > I know it might mean that some people will stay with 1.10 for longer, but that's also OK for them. The reason to migrate to 2.0 should be not performance but some important features (like the API or HA) that come with it.
>>> > > >
>>> > > > > I couldn't agree more :). If we can start people writing (close to) 2.0-compliant DAGs before the release of 2.0, that will make the migration process so much easier :).
>>> > > >
>>> > > > Yeah. I even thought that we should write a "How good your DAGs are for 2.0" assessment tool.
>>> > > >
>>> > > > > If there aren't any extra steps or features that we need to add (beyond the ones discussed here), I think a good next step would be to create an official checklist just so we can see all of these features in one place (and hopefully start breaking them down into as small changes as possible).
>>> > > > >
>>> > > > > Does that sound ok?
>>> > > >
>>> > > > Perfectly OK for me!
>>> > > >
>>> > > > --
>>> > > > Jarek Potiuk
>>> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
>>> > > >
>>> > > > M: +48 660 796 129 <+48660796129>
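To make the "How good your DAGs are for 2.0" assessment-tool idea above a bit more concrete, here is a minimal sketch that checks only one thing: whether DAG files import from pre-AIP-21 paths such as `airflow.contrib.*`. The script, the prefix list, and the `dags` default folder are illustrative assumptions, not an existing Airflow tool; a real readiness check would need to cover far more than import paths.

```python
"""Sketch of a "how 2.0-ready are your DAGs" check (imports only)."""
import re
import sys
from pathlib import Path
from typing import List

# Illustrative, non-exhaustive prefixes whose import paths change for 2.0 (AIP-21).
DEPRECATED_PREFIXES = (
    "airflow.contrib.",
    "airflow.operators.bash_operator",
    "airflow.operators.python_operator",
)

IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([\w.]+)", re.MULTILINE)


def deprecated_imports(path: Path) -> List[str]:
    """Return the deprecated module paths imported by a single DAG file."""
    text = path.read_text(encoding="utf-8", errors="ignore")
    return [m for m in IMPORT_RE.findall(text) if m.startswith(DEPRECATED_PREFIXES)]


if __name__ == "__main__":
    dags_folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("dags")
    hits = 0
    for dag_file in sorted(dags_folder.rglob("*.py")):
        for module in deprecated_imports(dag_file):
            print(f"{dag_file}: deprecated import '{module}'")
            hits += 1
    print(f"{hits} deprecated import(s) found")
```

Run against a DAGs folder, anything it prints is a candidate for moving to the new import paths now (or to the backport packages on 1.10), which is exactly the "start writing 2.0-compliant DAGs early" idea discussed above.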