IMHO #1 is a blocker. The other issues could have been mitigated, but #1 is a
dealbreaker for Airbnb. We have lots of large, critical DAGs that would come
to a standstill because of individual task failures, when in reality a lot
of progress could still be made.

Airflow should really do as much work as possible and honor the
dependencies specified by the user before giving up and requiring
intervention.
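
A minimal sketch of the behavior argued for here, in Python: when a task fails, only the tasks downstream of it are blocked, while independent branches keep running. This is a toy model, not Airflow's scheduler. `run_dag`, the task names and the DAG shape are hypothetical, and the state names only borrow Airflow's vocabulary. A task runs once all of its upstream tasks have finished, and only if they all succeeded, roughly like Airflow's default `all_success` trigger rule.

```python
from collections import deque


def run_dag(deps, failing):
    """Toy scheduler (hypothetical): honor dependencies, keep making progress.

    deps maps each task to the set of its upstream tasks; failing is the set
    of tasks that fail when run. Returns a dict of task -> final state.
    """
    # Build downstream edges and count unresolved upstream tasks per task.
    downstream = {task: set() for task in deps}
    pending = {task: len(ups) for task, ups in deps.items()}
    for task, ups in deps.items():
        for up in ups:
            downstream[up].add(task)

    state = {}
    ready = deque(task for task, count in pending.items() if count == 0)
    while ready:
        task = ready.popleft()
        if any(state[up] != "success" for up in deps[task]):
            # Something upstream failed or was blocked: skip only this branch.
            state[task] = "upstream_failed"
        else:
            state[task] = "failed" if task in failing else "success"
        # Mark this task resolved for its children; queue the fully resolved ones.
        for child in downstream[task]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    return state


if __name__ == "__main__":
    example = {
        "extract_a": set(),
        "extract_b": set(),
        "load_a": {"extract_a"},
        "load_b": {"extract_b"},
        "report": {"load_a", "load_b"},
    }
    print(run_dag(example, failing={"extract_a"}))
```

With `extract_a` failing, `load_a` and `report` are blocked, but `extract_b` and `load_b` still run, so the failure only costs the work that actually depends on it.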

Max

On Thu, Feb 23, 2017 at 1:10 PM, Chris Riccomini <criccom...@apache.org>
wrote:

> My 2c:
>
> I observed both #1 and #2 in Dan's list. I figured y'all had had a
> discussion about the change in behavior. :) In any case, I made my peace
> with it, and we've been running happily in production for weeks now, so I
> personally don't see it as a blocker. Obviously, if it's an issue for you
> guys at Airbnb, a patch and merge to master is critical, but I still think
> we should fix this stuff as part of 1.8.1.
>
> One compelling counterargument to this is that there's a bit of whiplash
> in terms of behavior, where 1.7.1.* behaves one way, then 1.8.0 behaves
> another, then 1.8.1 goes back to the old way again. I guess I'm just not
> that worried about it.
>
> Anyway, take it or leave it. :)
>
> Cheers,
> Chris
>
> On Thu, Feb 23, 2017 at 12:31 PM, Bolke de Bruin <bdbr...@gmail.com>
> wrote:
>
> > Gotcha. Will be patient. Good luck.
> >
> > Bolke
> >
> > > On 23 Feb 2017, at 21:12, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> > >
> > > Here is an example for 1; you can see that there are some white tasks
> > > that should have been run. I don't have time to create a skeleton DAG at
> > > the moment, unfortunately, because of release-related firefighting. Will
> > > hopefully post back here later once the firefighting is done.
> > >
> > >
> > > On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > Hey Dan, Alex,
> > >
> > > Indeed #1 seems serious, specifically the second part: skipping the
> > > root task (the root task of the whole DAG?). Do you have a skeleton DAG
> > > that exposes the issue? Is there a root cause analysis? When was the
> > > issue introduced? On the issue Alex mentioned, we don’t see that, and I
> > > cannot really align the description of the issue with the PR yet, i.e. I
> > > need clarification.
> > >
> > > Obviously, I’m not very happy if we indeed need to retract the release,
> > > as we are ~12 hours away from the close of the vote on the IPMC mailing
> > > list (strangely enough, no one has voted yet). However, if it is so
> > > serious that it cannot wait for 1.8.1, then we need to do it. I would
> > > define “serious” as: many people are going to be affected by it and they
> > > will not have a workaround available to them (i.e. patching code or the
> > > database), but the opinion of the community might differ.
> > >
> > > Cheers
> > > Bolke
> > >
> > > P.S. I am also interested in #3, as it sounds like an integrity issue
> > > (which verify_integrity should catch), but it may also be too strong an
> > > assumption that such a task should exist (i.e. a task was added to a DAG
> > > at a later stage).
> > >
> > >
> > > > On 23 Feb 2017, at 20:15, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> > > >
> > > > Some more issues were found by our users, in addition to the one Alex
> > > > reported and the UI issue when a dagrun doesn't have a start date:
> > > > 1. If a task fails, the whole dagrun immediately fails. This is a very
> > > > large change to how control flow works, as the rest of the tasks in the
> > > > DAG are not run (not even, e.g., leaf tasks). The same is true of the
> > > > skipped status (if a leaf task is skipped, then the root task of the
> > > > DAG will get skipped and none of the other tasks in the DAG will run).
> > > > 2. The black squares in the UI for tasks that aren't ready to run yet
> > > > are confusing and make it hard for users to see which tasks haven't run
> > > > yet (lower contrast). We should never initialize tasks in the DB that do
> > > > not have a state (or at the least these should be white).
> > > > 3. The Dagrun has a get_task_instance method that will fail if a dagrun
> > > > doesn't have a copy of a task instance created, which we have seen
> > > > happen for some DAGs. This prevents those tasks from getting scheduled.
> > > >
> > > > I already patched 3 (and have a PR in flight for open source), and am
> > > > working on a patch for 1 internally. 1 should be a blocker for
> > > > releasing.
> > > >
> > > > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel <alex.guz...@airbnb.com.invalid> wrote:
> > > >
> > > >> I have some concern that this change
> > > >> https://github.com/apache/incubator-airflow/pull/1939
> > > >> [AIRFLOW-679] may be having issues because we are seeing lots of
> > > >> double triggers of tasks and tasks being killed as a result.
> > > >>
> > > >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davy...@airbnb.com.INVALID wrote:
> > > >> Bumping the thread so another user can comment.
> > > >>
> > > >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> > > >>
> > > >>> What I meant to ask is "how much engineering effort does it take to
> > > >>> bake a single RC?" I guess it depends on how much git-fu is necessary,
> > > >>> plus some overhead cost of doing the series of
> > > >>> actions/commands/emails/jira.
> > > >>>
> > > >>> I can volunteer for 1.8.1 (hopefully I can get another Airbnb
> > > >>> engineer/volunteer to tag along) and will try to document/automate
> > > >>> everything I can as I go through the process. The goal of 1.8.1 could
> > > >>> be to basically package 1.8.0 + Dan's bugfix, and for Airbnb to get
> > > >>> familiar with the process.
> > > >>>
> > > >>> It'd be great if you could dump your whole process on the wiki, and
> > > >>> we'll improve it on this next pass.
> > > >>>
> > > >>> Thanks again for the mountain of work that went into packaging this
> > > >>> release.
> > > >>>
> > > >>> Max
> > > >>>
> > > >>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >>>
> > > >>>> I thought you volunteered to babysit 1.8.1, Chris ;-)?
> > > >>>>
> > > >>>> Sent from my iPhone
> > > >>>>
> > > >>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <criccom...@apache.org> wrote:
> > > >>>>>
> > > >>>>> I'm +1 for doing a 1.8.1 fast follow-on
> > > >>>>>
> > > >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Our database may have edge cases that could be associated with
> > > >>>>>> running any previous version that may or may not have been part of
> > > >>>>>> an official release.
> > > >>>>>>
> > > >>>>>> Let's see if anyone else reports the issue. If no one does, one
> > > >>>>>> option is to release 1.8.0 as is, with a comment in the release
> > > >>>>>> notes, and have a future official minor Apache release, 1.8.1, that
> > > >>>>>> would fix these minor issues that are not deal breakers.
> > > >>>>>>
> > > >>>>>> @bolke, I'm curious, how long does it take you to go through one
> > > >>>>>> release cycle? Oh, and do you have a documented step-by-step
> > > >>>>>> process for releasing? I'd like to add the PyPI part to this doc
> > > >>>>>> and give interested committers rights on the project on PyPI.
> > > >>>>>>
> > > >>>>>> Max
> > > >>>>>>
> > > >>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>> So it is a database integrity issue? AFAIK a start_date should
> > > >>>>>>> always be set for a DagRun (create_dagrun does so), but I didn't
> > > >>>>>>> check the code.
> > > >>>>>>>
> > > >>>>>>> Sent from my iPhone
> > > >>>>>>>
> > > >>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> > > >>>>>>>>
> > > >>>>>>>> Should clarify: this occurs when a dagrun does not have a start
> > > >>>>>>>> date, not a dag (which makes it even less likely to happen). I
> > > >>>>>>>> don't think this is a blocker for releasing.
> > > >>>>>>>>
> > > >>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davy...@airbnb.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> I rolled this out in our prod and the webservers failed to load
> > > >>>>>>>>> due to this commit:
> > > >>>>>>>>>
> > > >>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> > > >>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> > > >>>>>>>>>
> > > >>>>>>>>> This fixed it:
> > > >>>>>>>>> - </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> > > >>>>>>>>> + </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> > > >>>>>>>>>
> > > >>>>>>>>> This is caused by assuming that all DAGs have start dates set, so
> > > >>>>>>>>> a broken DAG will take down the whole UI. Not sure if we want to
> > > >>>>>>>>> make this a blocker for the release or not; I'm guessing for most
> > > >>>>>>>>> deployments this would occur pretty rarely. I'll submit a PR to
> > > >>>>>>>>> fix it soon.
> > > >>>>>>>>>
> > > >>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccom...@apache.org> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> IPMC Voting can be found here:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Kind regards,
> > > >>>>>>>>>>> Bolke
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Hello,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 9 “+1” votes received:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> - Maxime Beauchemin (binding)
> > > >>>>>>>>>>>> - Arthur Wiedmer (binding)
> > > >>>>>>>>>>>> - Dan Davydov (binding)
> > > >>>>>>>>>>>> - Jeremiah Lowin (binding)
> > > >>>>>>>>>>>> - Siddharth Anand (binding)
> > > >>>>>>>>>>>> - Alex van Boxel (binding)
> > > >>>>>>>>>>>> - Bolke de Bruin (binding)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> > > >>>>>>>>>>>> - Yi (non-binding)
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Vote thread (start):
> > > >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6c92c31a2...@gmail.com%3e
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Next steps:
> > > >>>>>>>>>>>> 1) I will start the voting process on the IPMC mailing list. I do
> > > >>>>>>>>>>>> expect some changes to be required, mostly in documentation, maybe
> > > >>>>>>>>>>>> a license here and there. So we might end up with changes to
> > > >>>>>>>>>>>> stable. As long as these are not (significant) code changes, I
> > > >>>>>>>>>>>> will not re-raise the vote.
> > > >>>>>>>>>>>> 2) Only after a positive vote on the IPMC and finalisation will I
> > > >>>>>>>>>>>> rebrand the RC to Release.
> > > >>>>>>>>>>>> 3) I will upload it to the incubator release page; then the
> > > >>>>>>>>>>>> tarball needs to propagate to the mirrors.
> > > >>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> > > >>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems we
> > > >>>>>>>>>>>> can keep the Apache branding, as Libcloud is doing this as well
> > > >>>>>>>>>>>> (https://libcloud.apache.org/downloads.html#pypi-package).
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Yippee!
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Bolke
>
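
On the webserver crash reported further down the thread: the template called `strftime` on `last_run.start_date`, which cannot work when a DagRun has no start date (the value is `None`). The patch in the thread removes the tooltip; another option is to guard the value before formatting it. A minimal Python sketch of such a guard follows; `format_start_date` and its `"n/a"` fallback are hypothetical and not part of Airflow.

```python
from datetime import datetime
from typing import Optional


def format_start_date(start_date: Optional[datetime], fallback: str = "n/a") -> str:
    """Format a start date for a status tooltip, tolerating a missing value.

    Hypothetical helper: in plain Python, None.strftime(...) raises
    AttributeError, which mirrors the failure described in the thread.
    """
    if start_date is None:
        return fallback
    return start_date.strftime("%Y-%m-%d %H:%M")


print(format_start_date(datetime(2017, 2, 22, 13, 15)))  # 2017-02-22 13:15
print(format_start_date(None))  # n/a
```

In the Jinja template itself, the analogous guard would be to emit the `title` attribute only inside an `{% if last_run.start_date %}` block.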
