Re: 1.7.1 release status

Maxime Beauchemin Fri, 06 May 2016 07:54:05 -0700

I think that there are currently 4 ways to attach a task to a DAG:
* Operator(task_id='foo', dag=dag)
* dag.add_task(t)
* Context manager: `with DAG('foo') as dag:`
* inferred from set_upstream() and set_downstream in 1.7.1 (from my
understanding)


The pattern of passing the DAG object to BaseOperator constructor should
probably be mandatory as this is how the `default_args` and `params` magic
happens and it may be unclear to users that it is the case. I'm assuming
that the behavior of the context manager is equivalent to the explictit
dag=dag

Knowing this, set_upstream shouldn't have to infer dag attribution, but
simply check that the dags on both sides is the same object.

So the changes I'd suggest would be:
* The absence of dag=dag (or equivalent context manager) WARNS about
deprecation in 2.0
* DAG.add_task becomes private DAG._add_task and now WARNS about
deprecation in 2.0 on usage, and also warns when reattributing a DAG twice
* set_upstream/downstream infers where needed, but WARNS about deprecation
of inference in 2.0
* Add new methods to replace the upstream(dag.root) pattern, the terms root
is unclear (does it mean firsts or lasts?). SE I'd suggest convenience
methods `set_as_last` and `set_as_first` for clarity

Max

On Fri, May 6, 2016 at 6:32 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> I agree with disabling building broken DAGs and also disabling / not
> supporting orphaned Operators/TaskInstances. It creates so many issues down
> the line and the fixes are relatively easy. It will make airflow and the
> used DAGs more maintainable.
>
> So +1, just make sure it is well documented in UPDATING.md
>
> my 2 cents.
>
> Bolke
>
> > Op 6 mei 2016, om 07:22 heeft Jeremiah Lowin <jlo...@apache.org> het
> volgende geschreven:
> >
> > Tonight I was working with Dan on fixing the speed regression with large
> > DAGs: https://github.com/apache/incubator-airflow/pull/1470. That clears
> > the first blocker for 1.7.1 as described in AIRFLOW-52.
> >
> > We wanted to ask for the group's thoughts on the second blocker.
> Basically,
> > the issue centers on this pattern:
> >
> > ```python
> > # email is an operator WITHOUT a DAG.
> > email = Operator(...)
> >
> > # create dependencies for email
> > email.set_upstream(dag.roots)
> >
> > # add email to the DAG
> > dag.add_task(email)
> > ```
> >
> > Why is this a problem? Under Airflow 1.7.0, this DAG is actually
> completely
> > broken after the set_upstream command, because it has a dependency to a
> > task that's not in the DAG. It can't be run and will even raise an
> > exception if you do something simple like access dag.roots. HOWEVER, this
> > building this broken DAG is allowed in 1.7.0 and the user cures it in the
> > last line by explicitly adding the task.
> >
> > In https://github.com/apache/incubator-airflow/pull/1318, which will be
> > part of 1.7.1, I took steps that prevent users from creating broken DAGs
> at
> > all. The relevant fix in this case is that the email task would be
> > automatically added to the DAG (it would infer its membership from the
> > tasks in dag.roots). However, once that inference is made, the last line
> > becomes illegal, since you can't add a task to a DAG it's already in.
> >
> > So here's the thing: because the last line becomes illegal, this code
> > snippet will no longer run under 1.7.1. My understanding is that it is
> > being used in production at Airbnb, so I wanted to raise the issue to see
> > if we can get comfortable with the change.
> >
> > My opinion is that being able to build a broken DAG is *always* a bug,
> and
> > so the issue should be fixed even if that creates some incompatibilities
> > for anyone exploiting it. Particularly in this case, where the remedy is
> > simply to delete the last line.
> >
> > We thought about deprecating the behavior, but I don't see how we can
> > because 1) we don't know for sure that the user is trying to do something
> > illegal at the time of the set_upstream call, and 2) the guard against
> > double-adding a task to a DAG has been in Airflow for a very long time,
> so
> > reverting it would constitute a really massive behavior change.
> >
> > So my vote is to proceed with the fix, but as it could potentially
> > inconvenience the hand that feeds (and by "feeds" I mean "gave us
> Airflow")
> > I'd like to be sensitive to their needs.
> >
> > J
> >
> > On Thu, May 5, 2016 at 2:09 PM Dan Davydov
> <dan.davy...@airbnb.com.invalid>
> > wrote:
> >
> >> Moved discussion to https://issues.apache.org/jira/browse/AIRFLOW-52
> and
> >> updated the status of the task there.
> >>
> >> On Tue, May 3, 2016 at 2:32 AM, Dan Davydov <dan.davy...@airbnb.com>
> >> wrote:
> >>
> >>> It's per DAG unfortunately (we have some pretty funky DAGs here).
> >>> On May 2, 2016 10:26 PM, "Bolke de Bruin" <bdbr...@gmail.com> wrote:
> >>>
> >>>> Hi dan
> >>>>
> >>>> Is that per dag or per dag bag? Multiprocessing should parallelize dag
> >>>> parsing so I am very curious. Let me know if I can help out.
> >>>> Bolke
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 3 mei 2016, at 01:47, Dan Davydov <dan.davy...@airbnb.com.INVALID
> >
> >>>> wrote:
> >>>>>
> >>>>> So a quick update, unfortunately we saw some DAGBag parsing time
> >>>> increases
> >>>>> (~10x for some DAGs) on the webservers with the 1.7.1rc3. Because of
> >>>> this I
> >>>>> will be working on a staging cluster that has a copy of our
> production
> >>>>> production DAGBag, and is a copy of our production airflow
> >>>> infrastructure,
> >>>>> just without the workers. This will let us debug the release outside
> >> of
> >>>>> production.
> >>>>>
> >>>>> On Thu, Apr 28, 2016 at 10:20 AM, Dan Davydov <
> dan.davy...@airbnb.com
> >>>
> >>>>> wrote:
> >>>>>
> >>>>>> Definitely, here were the issues we hit:
> >>>>>> - airbnb/airflow#1365 occured
> >>>>>> - Webservers/scheduler were timing out and stuck in restart cycles
> >> due
> >>>> to
> >>>>>> increased time spent on parsing DAGs due to
> airbnb/airflow#1213/files
> >>>>>> - Failed tasks that ran after the upgrade and the revert (after we
> >>>>>> reverted the upgrade) were unable to be cleared (but running the
> >> tasks
> >>>>>> through the UI worked without clearing them)
> >>>>>> - The way log files were stored on S3 was changed (airflow now
> >>>> requires a
> >>>>>> connection to be setup) which broke log storage
> >>>>>> - Some DAGs were broken (unable to be parsed) due to package
> >>>>>> reorganization in open-source (the import paths were changed) (the
> >>>> utils
> >>>>>> refactor commit)
> >>>>>>
> >>>>>> On Thu, Apr 28, 2016 at 12:17 AM, Bolke de Bruin <bdbr...@gmail.com
> >
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Dan,
> >>>>>>>
> >>>>>>> Are you able to share some of the bugs you have been hitting and
> >>>>>>> connected commits?
> >>>>>>>
> >>>>>>> We could at the very least learn from them and maybe even improve
> >>>> testing.
> >>>>>>>
> >>>>>>> Bolke
> >>>>>>>
> >>>>>>>
> >>>>>>>>> Op 28 apr. 2016, om 06:51 heeft Dan Davydov
> >>>>>>>> <dan.davy...@airbnb.com.INVALID> het volgende geschreven:
> >>>>>>>>
> >>>>>>>> All of the blockers were fixed as of yesterday (there was some
> >> issue
> >>>>>>> that
> >>>>>>>> Jeremiah was looking at with the last release candidate which I
> >>>> think is
> >>>>>>>> fixed but I'm not sure). I started staging the airbnb_1.7.1rc3 tag
> >>>>>>> earlier
> >>>>>>>> today, so as long as metrics look OK and the 1.7.1rc2 issues seem
> >>>>>>> resolved
> >>>>>>>> tomorrow I will release internally either tomorrow or Monday (we
> >> try
> >>>> to
> >>>>>>>> avoid releases on Friday). If there aren't any issues we can push
> >> the
> >>>>>>> 1.7.1
> >>>>>>>> tag on Monday/Tuesday.
> >>>>>>>>
> >>>>>>>> @Sid
> >>>>>>>> I think we were originally aiming to deploy internally once every
> >> two
> >>>>>>> weeks
> >>>>>>>> but we decided to do it once a month in the end. I'm not too sure
> >>>> about
> >>>>>>>> that so Max can comment there.
> >>>>>>>>
> >>>>>>>> We have been running 1.7.0 in production for about a month now and
> >> it
> >>>>>>>> stable.
> >>>>>>>>
> >>>>>>>> I think what really slowed down this release cycle is some commits
> >>>> that
> >>>>>>>> caused severe bugs that we decided to roll-forward with instead of
> >>>>>>> rolling
> >>>>>>>> back. We can potentially try reverting these commits next time
> >> while
> >>>> the
> >>>>>>>> fixes are applied for the next version, although this is not
> always
> >>>>>>> trivial
> >>>>>>>> to do.
> >>>>>>>>
> >>>>>>>> On Wed, Apr 27, 2016 at 9:31 PM, Siddharth Anand <
> >>>>>>>> siddharthan...@yahoo.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> Btw, is anyone of the committers running 1.7.0 or later in any
> >>>> staging
> >>>>>>> or
> >>>>>>>>> production env? I have to say that given that 1.6.2 was the most
> >>>> stable
> >>>>>>>>> release and is 4 or more months old does not say much for our
> >>>> release
> >>>>>>>>> cadence or process. What's our plan for 1.7.1?
> >>>>>>>>>
> >>>>>>>>> Sent from Sid's iPhone
> >>>>>>>>>
> >>>>>>>>>>> On Apr 27, 2016, at 9:05 PM, Chris Riccomini <
> >>>> criccom...@apache.org>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey all,
> >>>>>>>>>>
> >>>>>>>>>> I just wanted to check in on the 1.7.1 release status. I know
> >> there
> >>>>>>> have
> >>>>>>>>>> been some major-ish bugs, as well as several people doing tests.
> >>>>>>> Should
> >>>>>>>>> we
> >>>>>>>>>> create a 1.7.1 release JIRA, and track outstanding issues there?
> >>>>>>>>>>
> >>>>>>>>>> Cheers,
> >>>>>>>>>> Chris
> >>>>>>
> >>>>
> >>>
> >>
>
>

Re: 1.7.1 release status

Reply via email to