> I have some thoughts on the Subdag (will open a new thread if necessary).
> Instead of having a separate child DAG, would it be better to chain all the
> tasks from the child dag to the parent dag and then drop the child dag?
> In this way, the whole child dag (actually just the tasks in it) will be
> respected by the scheduler the same as the parent dag.
>
> To have a similar zoom in/out in the UI, we can add a column to the
> task_instance table as a marker to group these tasks together when rendered
> in the UI.
>

+1

I think this is a brilliant idea and it would simplify things a lot.

We used to use subdags but ended up replacing them with a magic template
operator where you specify a start and end task, to get round the subdag
issues.
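
As a rough illustration of the quoted idea, here is a minimal plain-Python
sketch. It does not use Airflow's API: `Task`, `inline_child` and the `group`
field are hypothetical names, and `group` only stands in for the proposed
marker column on task_instance. The sketch chains a child DAG's tasks into
the parent and then drops the placeholder task that represented the child.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: str
    upstream: List[str] = field(default_factory=list)
    # Hypothetical marker, standing in for the proposed task_instance column.
    group: Optional[str] = None


def inline_child(
    parent: Dict[str, Task], placeholder_id: str, child: Dict[str, Task]
) -> Dict[str, Task]:
    """Return a flat task map where the placeholder is replaced by the child's tasks."""
    placeholder = parent[placeholder_id]
    prefix = placeholder_id + "."

    # Child tasks that no other child task depends on are the child's leaves.
    has_downstream = {u for t in child.values() for u in t.upstream}
    leaves = [prefix + tid for tid in child if tid not in has_downstream]

    flat: Dict[str, Task] = {}
    for tid, task in parent.items():
        if tid == placeholder_id:
            continue  # the placeholder itself is dropped
        upstream: List[str] = []
        for u in task.upstream:
            # Anything that waited on the placeholder now waits on the leaves.
            upstream.extend(leaves if u == placeholder_id else [u])
        flat[tid] = Task(tid, upstream, task.group)

    for tid, task in child.items():
        # Child roots inherit the placeholder's upstream dependencies.
        upstream = [prefix + u for u in task.upstream] or list(placeholder.upstream)
        flat[prefix + tid] = Task(prefix + tid, upstream, group=placeholder_id)
    return flat


# Parent: start -> section -> end, where "section" stands for the child DAG.
parent = {
    "start": Task("start"),
    "section": Task("section", ["start"]),
    "end": Task("end", ["section"]),
}
# Child: a -> b
child = {"a": Task("a"), "b": Task("b", ["a"])}

flat = inline_child(parent, "section", child)
for task in flat.values():
    print(task.task_id, task.upstream, task.group)
```

With everything in one flat map, a single scheduler sees every task, and the
UI could collapse tasks that share the same `group` value to get the zoom
in/out behaviour.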


Let me know what you guys think.

Thanks
> Bin
>
> On Thu, Apr 2, 2020 at 9:55 AM Daniel Imberman <daniel.imber...@gmail.com>
> wrote:
>
> > Hello all,
> >
> > I've been reviewing this wiki page
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+2.0> and I
> > wanted to discuss which of these features are still relevant/are blockers
> > for 2.0
> >
> > I figured we should start with the status of the high priority fixes
> >
> >
> >    1. *Knative Executor (handled by KEDA)* The KEDA Autoscaler has
> >    accomplished everything that we would've hoped for from a
> >    KnativeExecutor. I think we can consider this work done.
> >    2. *Improve Webserver performance* @Kaxil Naik <kaxiln...@gmail.com>
> >    It appears that DAG serialization is already merged? Are there other
> >    steps needed for webserver performance?
> >    3. *Enhanced real-time UI* While this would make for a great "wow!"
> >    feature, I don't think this is necessary for a 2.0. Unless anyone
> >    wants to work on this in the short term :).
> >    4. *Improve Scheduler performance and reliability* @Ash Berlin-Taylor
> >    <a...@apache.org> has been hard at work on scheduler HA, and we've
> >    already
> >    5. *Extend/finish the API* @Kamil Breguła <kamil.breg...@polidea.com>
> >    I see that there is an active PR here
> >    <https://github.com/apache/airflow/pull/7549>, are we getting close
> >    on this?
> >    6. *Production Docker image* The PR here should be merged hopefully
> >    today!
> >
> > Now there's a second section of "needs more info" tickets and I wanted to
> > see which of these we could triage for after 2.0
> >
> > Rework Subdags to be less "bolted-on" and more native to the scheduler
> > >
> > > There are all sorts of edge cases around subdags that result from the
> > > tasks in the subdag being run by another executor, instead of being
> > > handled and scheduled by the core Scheduler. We should make the
> > > scheduler "see in" to the Subdags and make it responsible for
> > > scheduling tasks. This should make subdags less error prone and more
> > > predictable. It may involve replacing/significantly changing the
> > > SubDagOperator
> >
> >
> > I think it's pretty damn crucial we fix subdags and backfills. I'm on the
> > fence about this one. On the one hand it could possibly wait. On the
> > other hand it would be embarrassing to release a 2.0 and still have this
> > feature broken
> >
> > Move (tested) components out of contrib folder
> > >
> > > https://lists.apache.org/thread.html/c880ef89f8cb4a0240c404f9372615b998c4a4eeca342651927d596c@%3Cdev.airflow.apache.org%3E
> > >
> >
> > I think this has been something we've actively worked on. Is this task
> > complete?
> >
> > Filter passwords/sensitive info from logs.
> > >
> > > Jenkins does this if the password comes from a connection - it would be
> > > good if we could do this too
> >
> >
> > While I agree that this is an important issue, I think if no one has
> > picked this up yet we should triage for a later release (I also don't
> > think this should break anything)
> >
> > Allow Backfill runs to be handled by the scheduler/triggered from UI
> > >
> > > It would be nice to not need console access to run airflow backfill,
> > > and to have it not stop if the SSH session is closed.
> > >
> > > Lots of details to work out here though around how this would work,
> > > where it would show up in the UI, priority of tasks, ways of reducing
> > > concurrency/load to allow normal tasks to run etc.
> >
> >
> > Once again I think this is a feature that's a bit embarrassing NOT to fix
> > for a 2.0 release, but I also understand that there's only so much
> > manpower. I also imagine this would be a HUGE undertaking so I think
> > this would push back 2.0 significantly.
> >
> >
> > Rationalize HA in Connections
> > >
> > > Right now it is possible to create multiple connections with the same
> > > ID and *some* Connections/hooks will support this and pick a random
> > > one from the list. This feature isn't well documented or understood
> > > (and the CLI doesn't support it as well as the UI for instance) so we
> > > should examine if this makes sense, or if we should support it
> > > individually in certain connection types instead.
> >
> >
> > I honestly don't know enough about this to have a strong opinion on it
> >
> > Make setting up HTTPS connections easier/more expected
> >
> > It appears that @Kamil Breguła <kamil.breg...@polidea.com> has an open
> > PR <https://github.com/apache/airflow/pull/5239/files> for this one.
> > @Kamil Breguła <kamil.breg...@polidea.com> do you think this would be
> > hard to merge?
> >
> > Front end/"browser" testing
> > >
> > > The Airflow UI is non-trivial and there have been a number of JS/HTML
> > > bugs that could have been caught by better front-end testing.
> > >
> > > It has been suggested to look at Cypress for this over Selenium.
> > > Whatever we choose, we need to pay careful attention to avoid slow or
> > > flaky UI tests.
> > >
> >
> > This, to me, is a crucial step in ensuring a smooth 2.0 transition. I've
> > been taking time to learn Cypress recently, and once the Airflow Helm
> > chart is merged I think merging a set of integration/behavior/UI tests
> > is crucial
> > What does everyone think? Open to suggestions on this assessment!
> >
> >
> > On Tue, Mar 31, 2020 at 11:46 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > Got it. Thanks Daniel for leading this
> > >
> > > On Tue, Mar 31, 2020 at 7:40 PM Daniel Imberman <daniel.imber...@gmail.com>
> > > wrote:
> > >
> > >> I think including both is fine as long as the old one contains
> > >> deprecation warnings or requires a feature flag to allow it
> > >> (e.g. --allow-deprecated)
> > >>
> > >> On Tue, Mar 31, 2020 at 11:33 AM, Kamil Breguła <
> > >> kamil.breg...@polidea.com> wrote:
> > >>
> > >> On Sun, Mar 22, 2020 at 9:20 AM Robin Edwards <r...@bidnamic.com> wrote:
> > >>
> > >> > Also, does the new API need to be feature complete or have just enough
> > >> > functionality to warrant removing the existing experimental one?
> > >> >
> > >>
> > >> I think we should release at least one version that will contain the
> > >> new and old REST APIs simultaneously. It is not easy to upgrade two
> > >> complex systems at the same time. However, if we do this, some users
> > >> will have to do it. Older versions can be hidden behind the feature
> > >> gate. We can also add deprecation warnings.
> > >>
> > >> >
> > >> > R
> > >> >
> > >> > On Fri, 20 Mar 2020, 20:29 Daniel Imberman, <daniel.imber...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > Great! Hope to get a few more folx to give +1's but I think we have
> > >> > > a good path forward here :)
> > >> > >
> > >> > > On Fri, Mar 20, 2020 at 12:51 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > >> > > wrote:
> > >> > >
> > >> > > > >
> > >> > > > >
> > >> > > > > I agree, especially for larger-scale users migrations are a
> > >> > > > > difficult process. Perhaps we can adopt something similar to a
> > >> > > > > blockchain fork (e.g. determine X known Airflow-using companies,
> > >> > > > > and start the countdown as soon as Y% of them migrate). I really
> > >> > > > > just want to make sure we don't end up with a python2/3
> > >> > > > > situation. Even if we continue support, it should only be for
> > >> > > > > bugfixes and we should not add any new features into 1.10.
> > >> > > > >
> > >> > > >
> > >> > > > I think we are in perfect sync - I think feature-migration should
> > >> > > > end almost immediately after we release 2.0. But bug-fixing should
> > >> > > > continue for quite some time. On that front - having backport
> > >> > > > packages will help with releasing "integrations" quite
> > >> > > > independently from the 1.10/2.0 versions (which I think is good for
> > >> > > > those who are - for this or another reason - stuck on 1.10). On the
> > >> > > > other hand we should make sure that the important stuff for 2.0
> > >> > > > that is not "feature" is also backported to 1.10. For example a lot
> > >> > > > of recent performance improvements that we have now in 2.0 will be
> > >> > > > possible (and not that complex) to backport to 1.10. Some of this
> > >> > > > effort is actually easier to do in 2.0 and then apply to 1.10 in
> > >> > > > similar fashion as it is easier to understand and reason about the
> > >> > > > 2.0 code now that we have some refactoring/pylints etc in place. So
> > >> > > > we should make sure we get the latest 1.10 to a "good" state -
> > >> > > > before we freeze it for bugfix only.
> > >> > > > I know it might mean that some people will stay with 1.10 for
> > >> > > > longer, but that's also OK for them. The reason to migrate to 2.0
> > >> > > > should not be performance but some important features (like API or
> > >> > > > HA) that come with it.
> > >> > > >
> > >> > > > > I couldn't agree more :). If we can start people writing (close
> > >> > > > > to) 2.0 compliant DAGs before the release of 2.0 that will make
> > >> > > > > the migration process so much easier :).
> > >> > > > >
> > >> > > > >
> > >> > > >
> > >> > > > Yeah. I even thought that we should write a
> > >> > > > "How good your DAGs are for 2.0" assessment tool.
> > >> > > >
> > >> > > >
> > >> > > > > If there aren't any extra steps or features that we need to add
> > >> > > > > (beyond the ones discussed here), I think a good next step would
> > >> > > > > be to create an official checklist just so we can see all of
> > >> > > > > these features in one place (and hopefully start breaking them
> > >> > > > > down into as small of changes as possible).
> > >> > > > >
> > >> > > > > Does that sound ok?
> > >> > > > >
> > >> > > >
> > >> > > > Perfectly OK for me!
> > >> > > >
> > >> > > >
> > >> > > > >
> > >> > > > > --
> > >> > > >
> > >> > > > Jarek Potiuk
> > >> > > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >> > > >
> > >> > > > M: +48 660 796 129 <+48660796129>
> > >> > > >
> > >> > >
> > >>
> > >>
> >
>
