To expand on Max's point: my concern is not that this is a blocker for Airbnb. The new behavior is not logical, and I'm sure many companies rely on the previous behavior (which I would say is the logically correct one). We are already running a fork of the release internally, so we are unaffected. I'm more concerned about:
a) Airflow 1.8.0 having a huge issue/regression in behavior that causes a lot of companies to revert or patch after upgrading.
b) An illogical change being made in Airflow that makes the behavior non-intuitive.
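To make the behavior I would expect concrete, here is a toy sketch. It is not Airflow's scheduler code: `run_dag`, the task names, and the dict-based DAG are all hypothetical, and only the state names mirror Airflow's. It shows a run where one task fails, everything not downstream of the failure still runs, and the run is marked failed only once no more work can be done.

```python
def run_dag(upstream, failing):
    """Toy model of one DAG run (hypothetical helper, not an Airflow API).

    upstream: dict mapping every task name to the set of tasks it depends on.
    failing:  set of task names that fail when they are executed.
    Returns (run_state, states), where states maps task name -> final state.
    """
    states = {}
    pending = set(upstream)
    while pending:
        # A task is ready once every one of its upstream tasks has a final state.
        ready = [t for t in pending if all(d in states for d in upstream[t])]
        if not ready:
            raise ValueError("cycle or unknown upstream task")
        for task in sorted(ready):
            if all(states[d] == "success" for d in upstream[task]):
                # Dependencies are met, so the task actually runs.
                states[task] = "failed" if task in failing else "success"
            else:
                # Something upstream did not succeed, so the task never runs.
                states[task] = "upstream_failed"
            pending.remove(task)
    # The run as a whole is judged only after every task has a final state.
    run_ok = all(s == "success" for s in states.values())
    return ("success" if run_ok else "failed"), states


dag = {
    "extract_a": set(),
    "extract_b": set(),
    "load_a": {"extract_a"},
    "load_b": {"extract_b"},
    "report": {"load_a", "load_b"},
}
run_state, states = run_dag(dag, failing={"extract_a"})
print(run_state)          # failed
print(states["load_b"])   # success: the independent branch still ran
print(states["report"])   # upstream_failed
```

The point of the sketch is the ordering: the failure of `extract_a` blocks only its own downstream tasks, and the overall run state is decided at the end rather than at the first failure.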
Here are my PRs to fix the various issues (we might as well merge all of them in the next RC if we have one):

Here is the fix for the dagruns ending prematurely:
https://github.com/apache/incubator-airflow/pull/2099

Here is the fix for dagruns in a bad state crashing the UI (not a blocker, but we might as well include it in the next RC if we create one):
https://github.com/apache/incubator-airflow/pull/2094

Black Squares in UI: No fix yet (will try to work on one shortly), but it's not a blocker.

Double Trigger Issue That Alex G Mentioned: We have been seeing tasks in the running state get run by another worker almost exactly 1 hour after they start running. Double triggers are pretty unacceptable in Airflow, and this is still pretty scary, but I'm not counting it as a blocker because I don't fully understand what is happening. Internally we have a patch that mitigates this to some degree, but Alex G is still investigating.

On Thu, Feb 23, 2017 at 1:49 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> I’m not particularly against another RC. On the IPMC there were some
> issues mentioned regarding licensing, which probably are blocking as well
> (eg. no LICENSE etc in the tar ball). I found some HighCharts left overs as
> well, while addressing the licensing issues. PR here:
> https://github.com/apache/incubator-airflow/pull/2098 , will be merged
> shortly.
>
> I just hope we can get our own vote to pass quickly(!) and not have
> another last minute blocker :P.
>
> Cheers
> Bolke
>
> > On 23 Feb 2017, at 22:41, Maxime Beauchemin <maximebeauche...@gmail.com>
> > wrote:
> >
> > IMHO 1 is a blocker. The other issues could have been mitigated but 1 is a
> > dealbreaker for Airbnb. We have lots of large, critical DAGs that would be
> > in a standstill because of individual task failures, where in reality a lot
> > of progress can be made.
> >
> > Airflow should really do as much work as possible and honor the
> > dependencies specified by the user before giving up and requiring
> > intervention.
> >
> > Max
> >
> > On Thu, Feb 23, 2017 at 1:10 PM, Chris Riccomini <criccom...@apache.org>
> > wrote:
> >
> >> My 2c:
> >>
> >> I observed both #1 and #2 in Dan's list. I figured y'all had had a
> >> discussion about the change in behavior. :) In any case, I made my peace
> >> with it, and we've been running happily in production for weeks now, so I
> >> personally don't see it as a blocker. Obviously, if it's an issue for you
> >> guys at AirBNB, a patch and merge to master is critical, but I still think
> >> we should fix this stuff as part of 1.8.1.
> >>
> >> One compelling counter argument to this is that there's a bit of whiplash
> >> in terms of behavior, where 1.7.1.* behaves one way, then 1.8.0 behaves
> >> another, then 1.8.1 goes back to the old way again. I guess I'm just not
> >> that worried about it.
> >>
> >> Anyway.. take it or leave it. :)
> >>
> >> Cheers,
> >> Chris
> >>
> >> On Thu, Feb 23, 2017 at 12:31 PM, Bolke de Bruin <bdbr...@gmail.com>
> >> wrote:
> >>
> >>> Gotcha. Will be patient. Good luck.
> >>>
> >>> Bolke
> >>>
> >>>> On 23 Feb 2017, at 21:12, Dan Davydov <dan.davy...@airbnb.com.INVALID>
> >>>> wrote:
> >>>>
> >>>> Here is an example for 1, you can see that there are some white tasks
> >>>> that should have been run. I don't have time to create a skeleton DAG at
> >>>> the moment unfortunately because of release-related firefighting. Will
> >>>> hopefully post back here later once firefighting is done.
> >>>>
> >>>>
> >>>> On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <bdbr...@gmail.com>
> >>>> wrote:
> >>>> Hey Dan, Alex,
> >>>>
> >>>> Indeed #1 seems serious, specifically the second part - skipping the
> >>>> root task (root task of the whole DAG?). Do you have a skeleton DAG that
> >>>> exposes the issue?
> >>>> Is there a root cause analysis? When was the issue introduced? On the
> >>>> issue Alex mentioned, we don’t see that and I cannot really align the
> >>>> description of the issue with the PR yet, ie. I need clarification.
> >>>>
> >>>> Obviously, I’m not very happy if we indeed need to retract the release
> >>>> as we are ~12 hours away from closing of the vote at the IPMC mailinglist
> >>>> (strangely enough no one has voted yet). However, if it is that serious
> >>>> that it cannot wait for 1.8.1 then we need to do it. I would define
> >>>> “serious” as many people are going to be affected by it and they will not
> >>>> have a workaround available to them (ie. patching code or database), but
> >>>> the opinion of the community might differ.
> >>>>
> >>>> Cheers
> >>>> Bolke
> >>>>
> >>>> P.S. I am also interested in #3, as it sounds like an integrity issue
> >>>> (which verify_integrity should catch) but also maybe too strong an
> >>>> assumption that such a task should exist (ie. a task was added to a Dag in
> >>>> a later stage).
> >>>>
> >>>>
> >>>>> On 23 Feb 2017, at 20:15, Dan Davydov <dan.davy...@airbnb.com.INVALID>
> >>>>> wrote:
> >>>>>
> >>>>> Some more issues found by our users in addition to the one Alex reported
> >>>>> and the UI issue when a dagrun doesn't have a start date:
> >>>>> 1. If a task fails, the whole dagrun immediately fails. This is a
> >>>>> very large change to how control flow works as the rest of the tasks in the
> >>>>> DAG are not run (even e.g. leaf tasks). The same is true of the skipped
> >>>>> status (if a leaf task is skipped then the root task for the DAG will get
> >>>>> skipped and none of the other tasks in the DAG will run).
> >>>>> 2. The black squares in the UI for tasks that aren't ready to run yet are
> >>>>> confusing and make it hard for users to see which tasks haven't run yet
> >>>>> (lower contrast).
> >>>>> We should never initialize tasks in the DB that do not
> >>>>> have a state (or at the least these should be white).
> >>>>> 3. The Dagrun has a get_task_instance method that will fail if a dagrun
> >>>>> doesn't have a copy of a task instance created which we have seen happen
> >>>>> for some DAGs. This prevents those tasks from getting scheduled.
> >>>>>
> >>>>> I already patched 3 (and have a PR in flight for open source), and am
> >>>>> working on a patch for 1 internally. 1 should be a blocker for releasing.
> >>>>>
> >>>>> On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel <alex.guz...@airbnb.com.invalid>
> >>>>> wrote:
> >>>>>
> >>>>>> I have some concern that this change
> >>>>>> https://github.com/apache/incubator-airflow/pull/1939
> >>>>>> [AIRFLOW-679] may be having issues because we are seeing lots of double
> >>>>>> triggers of tasks and tasks being killed as a result.
> >>>>>>
> >>>>>> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davy...@airbnb.com.INVALID
> >>>>>> wrote:
> >>>>>> Bumping the thread so another user can comment.
> >>>>>>
> >>>>>> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin <maximebeauche...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> What I meant to ask is "how much engineering effort it takes to bake a
> >>>>>>> single RC?", I guess it depends on how much git-fu is necessary plus some
> >>>>>>> overhead cost of doing the series of actions/commands/emails/jira.
> >>>>>>>
> >>>>>>> I can volunteer for 1.8.1 (hopefully I can get another Airbnb
> >>>>>>> engineer/volunteer to tag along) and will try to document/automate
> >>>>>>> everything I can as I go through the process. The goal of 1.8.1 could be to
> >>>>>>> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar with
> >>>>>>> the process.
> >>>>>>>
> >>>>>>> It'd be great if you can dump your whole process on the wiki, and we'll
> >>>>>>> improve it on this next pass.
> >>>>>>>
> >>>>>>> Thanks again for the mountain of work that went into packaging this
> >>>>>>> release.
> >>>>>>>
> >>>>>>> Max
> >>>>>>>
> >>>>>>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <bdbr...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I thought you volunteered to baby sit 1.8.1 Chris ;-)?
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <criccom...@apache.org>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> I'm +1 for doing a 1.8.1 fast follow-on
> >>>>>>>>>
> >>>>>>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <maximebeauche...@gmail.com>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Our database may have edge cases that could be associated with running any
> >>>>>>>>>> previous version that may or may not have been part of an official release.
> >>>>>>>>>>
> >>>>>>>>>> Let's see if anyone else reports the issue.
> >>>>>>>>>> If no one does, one option is
> >>>>>>>>>> to release 1.8.0 as is with a comment in the release notes, and have a
> >>>>>>>>>> future official minor apache release 1.8.1 that would fix these minor
> >>>>>>>>>> issues that are not deal breakers.
> >>>>>>>>>>
> >>>>>>>>>> @bolke, I'm curious, how long does it take you to go through one release
> >>>>>>>>>> cycle? Oh, and do you have a documented step by step process for releasing?
> >>>>>>>>>> I'd like to add the Pypi part to this doc and add committers that are
> >>>>>>>>>> interested to have rights on the project on Pypi.
> >>>>>>>>>>
> >>>>>>>>>> Max
> >>>>>>>>>>
> >>>>>>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbr...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> So it is a database integrity issue? Afaik a start_date should always be
> >>>>>>>>>>> set for a DagRun (create_dagrun does so). I didn't check the code though.
> >>>>>>>>>>>
> >>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>
> >>>>>>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <dan.davy...@airbnb.com.INVALID>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Should clarify this occurs when a dagrun does not have a start date, not a
> >>>>>>>>>>>> dag (which makes it even less likely to happen). I don't think this is a
> >>>>>>>>>>>> blocker for releasing.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davy...@airbnb.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I rolled this out in our prod and the webservers failed to load due to
> >>>>>>>>>>>>> this commit:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> >>>>>>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This fixed it:
> >>>>>>>>>>>>> - </a> <span id="statuses_info"
> >>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date:
> >>>>>>>>>>>>> {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> >>>>>>>>>>>>> + </a> <span id="statuses_info"
> >>>>>>>>>>>>> class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken
> >>>>>>>>>>>>> DAG will take down the whole UI. Not sure if we want to make this a blocker
> >>>>>>>>>>>>> for the release or not, I'm guessing for most deployments this would occur
> >>>>>>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccom...@apache.org>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbr...@gmail.com>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> IPMC Voting can be found here:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Kind regards,
> >>>>>>>>>>>>>>> Bolke
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbr...@gmail.com>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 9 “+1” votes received:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Maxime Beauchemin (binding)
> >>>>>>>>>>>>>>>> - Arthur Wiedmer (binding)
> >>>>>>>>>>>>>>>> - Dan Davydov (binding)
> >>>>>>>>>>>>>>>> - Jeremiah Lowin (binding)
> >>>>>>>>>>>>>>>> - Siddharth Anand (binding)
> >>>>>>>>>>>>>>>> - Alex van Boxel (binding)
> >>>>>>>>>>>>>>>> - Bolke de Bruin (binding)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> >>>>>>>>>>>>>>>> - Yi (non-binding)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Vote thread (start):
> >>>>>>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6c92c31a2...@gmail.com%3e
> >>>>>>>>>>>>>>>> <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-15f44376a...@gmail.com%3E>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Next steps:
> >>>>>>>>>>>>>>>> 1) will start the voting process at the IPMC mailinglist. I do expect
> >>>>>>>>>>>>>>>> some changes to be required mostly in documentation maybe a license here
> >>>>>>>>>>>>>>>> and there. So, we might end up with changes to stable. As long as these are
> >>>>>>>>>>>>>>>> not (significant) code changes I will not re-raise the vote.
> >>>>>>>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation I will
> >>>>>>>>>>>>>>>> rebrand the RC to Release.
> >>>>>>>>>>>>>>>> 3) I will upload it to the incubator release page, then the tar ball
> >>>>>>>>>>>>>>>> needs to propagate to the mirrors.
> >>>>>>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> >>>>>>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi. It seems we can keep
> >>>>>>>>>>>>>>>> the apache branding as lib cloud is doing this as well
> >>>>>>>>>>>>>>>> (https://libcloud.apache.org/downloads.html#pypi-package).
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Jippie!
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Bolke