Here is an example for #1: you can see that there are some white tasks that should have been run. Unfortunately, I don't have time to create a skeleton DAG at the moment because of release-related firefighting. I will hopefully post back here later once the firefighting is done.

[image: Inline image 1]

On Thu, Feb 23, 2017 at 12:00 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:

> Hey Dan, Alex,
>
> Indeed #1 seems serious, specifically the second part - skipping the
> root task (root task of the whole DAG?). Do you have a skeleton DAG that
> exposes the issue? Is there a root cause analysis? When was the issue
> introduced? On the issue Alex mentioned, we don’t see that and I cannot
> really align the description of the issue with the PR yet, i.e. I need
> clarification.
>
> Obviously, I’m not very happy if we indeed need to retract the release as
> we are ~12 hours away from closing of the vote at the IPMC mailing list
> (strangely enough no one has voted yet). However, if it is so serious
> that it cannot wait for 1.8.1 then we need to do it. I would define
> “serious” as many people are going to be affected by it and they will not
> have a workaround available to them (i.e. patching code or database), but
> the opinion of the community might differ.
>
> Cheers
> Bolke
>
> P.S. I am also interested in #3, as it sounds like an integrity issue
> (which verify_integrity should catch) but also maybe too strong an
> assumption that such a task should exist (i.e. a task was added to a DAG
> at a later stage).
>
> On 23 Feb 2017, at 20:15, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> >
> > Some more issues found by our users in addition to the one Alex reported
> > and the UI issue when a dagrun doesn't have a start date:
> > 1. If a task fails, the whole dagrun immediately fails. This is a
> > very large change to how control flow works, as the rest of the tasks in
> > the DAG are not run (even e.g. leaf tasks). The same is true of the
> > skipped status (if a leaf task is skipped then the root task for the DAG
> > will get skipped and none of the other tasks in the DAG will run).
> > 2. The black squares in the UI for tasks that aren't ready to run yet are
> > confusing and make it hard for users to see which tasks haven't run yet
> > (lower contrast). We should never initialize tasks in the DB that do not
> > have a state (or at the least these should be white).
> > 3. The Dagrun has a get_task_instance method that will fail if a dagrun
> > doesn't have a copy of a task instance created, which we have seen happen
> > for some DAGs. This prevents those tasks from getting scheduled.
> >
> > I already patched 3 (and have a PR in flight for open source), and am
> > working on a patch for 1 internally. 1 should be a blocker for releasing.
> >
> > On Wed, Feb 22, 2017 at 4:38 PM, Alex Guziel
> > <alex.guz...@airbnb.com.invalid> wrote:
> >
> >> I have some concern that this change
> >> https://github.com/apache/incubator-airflow/pull/1939
> >> [AIRFLOW-679] may be having issues because we are seeing lots of double
> >> triggers of tasks and tasks being killed as a result.
> >>
> >> On Wed, Feb 22, 2017 4:35 PM, Dan Davydov dan.davy...@airbnb.com.INVALID
> >> wrote:
> >> Bumping the thread so another user can comment.
> >>
> >> On Wed, Feb 22, 2017 at 3:12 PM, Maxime Beauchemin
> >> <maximebeauche...@gmail.com> wrote:
> >>
> >>> What I meant to ask is "how much engineering effort it takes to bake a
> >>> single RC?", I guess it depends on how much git-fu is necessary plus some
> >>> overhead cost of doing the series of actions/commands/emails/jira.
> >>>
> >>> I can volunteer for 1.8.1 (hopefully I can get another Airbnb
> >>> engineer/volunteer to tag along) and will try to document/automate
> >>> everything I can as I go through the process. The goal of 1.8.1 could be to
> >>> basically package 1.8.0 + Dan's bugfix, and for Airbnb to get familiar with
> >>> the process.
> >>>
> >>> It'd be great if you can dump your whole process on the wiki, and we'll
> >>> improve it on this next pass.
> >>>
> >>> Thanks again for the mountain of work that went into packaging this
> >>> release.
> >>>
> >>> Max
> >>>
> >>> On Wed, Feb 22, 2017 at 2:44 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>
> >>>> I thought you volunteered to baby sit 1.8.1 Chris ;-)?
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On 22 Feb 2017, at 23:31, Chris Riccomini <criccom...@apache.org> wrote:
> >>>>>
> >>>>> I'm +1 for doing a 1.8.1 fast follow-on
> >>>>>
> >>>>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin
> >>>>> <maximebeauche...@gmail.com> wrote:
> >>>>>
> >>>>>> Our database may have edge cases that could be associated with running any
> >>>>>> previous version that may or may not have been part of an official release.
> >>>>>>
> >>>>>> Let's see if anyone else reports the issue. If no one does, one option is
> >>>>>> to release 1.8.0 as is with a comment in the release notes, and have a
> >>>>>> future official minor apache release 1.8.1 that would fix these minor
> >>>>>> issues that are not deal breakers.
> >>>>>>
> >>>>>> @bolke, I'm curious, how long does it take you to go through one release
> >>>>>> cycle? Oh, and do you have a documented step by step process for releasing?
> >>>>>> I'd like to add the Pypi part to this doc and add committers that are
> >>>>>> interested to have rights on the project on Pypi.
> >>>>>>
> >>>>>> Max
> >>>>>>
> >>>>>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> So it is a database integrity issue?
> >>>>>>> AFAIK a start_date should always be set for a DagRun (create_dagrun
> >>>>>>> does so); I didn't check the code though.
> >>>>>>>
> >>>>>>> Sent from my iPhone
> >>>>>>>
> >>>>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> >>>>>>>>
> >>>>>>>> Should clarify this occurs when a dagrun does not have a start date, not a
> >>>>>>>> dag (which makes it even less likely to happen). I don't think this is a
> >>>>>>>> blocker for releasing.
> >>>>>>>>
> >>>>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davy...@airbnb.com> wrote:
> >>>>>>>>>
> >>>>>>>>> I rolled this out in our prod and the webservers failed to load due to
> >>>>>>>>> this commit:
> >>>>>>>>>
> >>>>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> >>>>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> >>>>>>>>>
> >>>>>>>>> This fixed it:
> >>>>>>>>> - </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> >>>>>>>>> + </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> >>>>>>>>>
> >>>>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken
> >>>>>>>>> DAG will take down the whole UI. Not sure if we want to make this a blocker
> >>>>>>>>> for the release or not, I'm guessing for most deployments this would occur
> >>>>>>>>> pretty rarely. I'll submit a PR to fix it soon.
> >>>>>>>>>
> >>>>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccom...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> IPMC Voting can be found here:
> >>>>>>>>>>>
> >>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e
> >>>>>>>>>>> <http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3E>
> >>>>>>>>>>>
> >>>>>>>>>>> Kind regards,
> >>>>>>>>>>> Bolke
> >>>>>>>>>>>
> >>>>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 9 “+1” votes received:
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Maxime Beauchemin (binding)
> >>>>>>>>>>>> - Arthur Wiedmer (binding)
> >>>>>>>>>>>> - Dan Davydov (binding)
> >>>>>>>>>>>> - Jeremiah Lowin (binding)
> >>>>>>>>>>>> - Siddharth Anand (binding)
> >>>>>>>>>>>> - Alex van Boxel (binding)
> >>>>>>>>>>>> - Bolke de Bruin (binding)
> >>>>>>>>>>>>
> >>>>>>>>>>>> - Jayesh Senjaliya (non-binding)
> >>>>>>>>>>>> - Yi (non-binding)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Vote thread (start):
> >>>>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6c92c31a2...@gmail.com%3e
> >>>>>>>>>>>> <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-15f44376a...@gmail.com%3E>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Next steps:
> >>>>>>>>>>>> 1) will start the voting process at the IPMC mailinglist. I do expect
> >>>>>>>>>>>> some changes to be required mostly in documentation maybe a license here
> >>>>>>>>>>>> and there. So, we might end up with changes to stable. As long as these are
> >>>>>>>>>>>> not (significant) code changes I will not re-raise the vote.
> >>>>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation I will
> >>>>>>>>>>>> rebrand the RC to Release.
> >>>>>>>>>>>> 3) I will upload it to the incubator release page, then the tar ball
> >>>>>>>>>>>> needs to propagate to the mirrors.
> >>>>>>>>>>>> 4) Update the website (can someone volunteer please?)
> >>>>>>>>>>>> 5) Finally, I will ask Maxime to upload it to pypi.
> >>>>>>>>>>>> It seems we can keep the apache branding as lib cloud is doing this as well
> >>>>>>>>>>>> (https://libcloud.apache.org/downloads.html#pypi-package
> >>>>>>>>>>>> <https://libcloud.apache.org/downloads.html#pypi-package>).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Jippie!
> >>>>>>>>>>>>
> >>>>>>>>>>>> Bolke