Interesting if this is related to what I was seeing -- but to be clear the error I observed is non-deterministic and doesn't happen every time (obviously, because otherwise there would be no passing Travis runs). Is that the case for what you're describing, Dan/Alex?
On Sat, Feb 25, 2017 at 4:13 AM Alex Van Boxel <a...@vanboxel.be> wrote:

About: Skipped tasks potentially cause a dagrun to be marked failure/success prematurely.

Isn't that related to the discussion I had with Max about the ONE_SUCCESS trigger? For now, when skipping tasks you need to set ONE_SUCCESS. I had a fix of sorts, but it was rejected because it changed behaviour.

On Sat, Feb 25, 2017 at 9:19 AM Bolke de Bruin <bdbr...@gmail.com> wrote:

> Not trying to muddy the waters, but the observation of Jeremiah (non-deterministic outcomes) might have something to do with #3. I didn’t dive in deeper, yet.
>
> ======================================================================
> ERROR: test_backfill_examples (tests.BackfillJobTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/travis/build/apache/incubator-airflow/tests/jobs.py", line 164, in test_backfill_examples
>     job.run()
>   File "/home/travis/build/apache/incubator-airflow/airflow/jobs.py", line 200, in run
>     self._execute()
>   File "/home/travis/build/apache/incubator-airflow/airflow/jobs.py", line 1999, in _execute
>     raise AirflowException(err)
> AirflowException: ---------------------------------------------------
> Some task instances failed:
> set([('example_short_circuit_operator', 'condition_is_True', datetime.datetime(2016, 1, 1, 0, 0))])
>
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/204780706/log.txt
>
> Bolke
>
> > On 25 Feb 2017, at 09:07, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >
> > Hi Dan,
> >
> > - Backfill indeed runs only one dagrun at a time, see line 1755 of jobs.py. I’ll think about how to fix this over the weekend (I think it was my change that introduced this). Suggestions always welcome. Depending on the impact it is a blocker or not.
> > We don’t often use backfills, and definitely not at your size, which is why it didn’t pop up with us. I’m assuming blocker for now, btw.
> >
> > - Speculation on the High DB Load. I’m not sure what your benchmark is here (1.7.1 + multi processor dags?), but as you mentioned, in the code dependencies are checked a couple of times for one run and even task instance. Dependency checking requires aggregation on the DB, which is a performance killer. Annoying but not a blocker.
> >
> > - Skipped tasks potentially cause a dagrun to be marked failure/success prematurely. BranchOperators are widely used; if it affects these operators, then it is a blocker.
> >
> > - Bolke
> >
> >> On 25 Feb 2017, at 02:04, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> >>
> >> Update on old pending issues:
> >> - Black Squares in UI: Fix merged
> >> - Double Trigger Issue That Alex G Mentioned: Alex has a PR in flight
> >>
> >> New Issues:
> >> - Backfill seems to be having issues (only running one dagrun at a time), we are still investigating - might be a blocker
> >> - High DB Load (~8x more than 1.7) - We are still investigating but it's probably not a blocker for the release
> >> - Skipped tasks potentially cause a dagrun to be marked as failure/success prematurely - not sure whether or not to classify this as a blocker (only really an issue for users who use the BranchPythonOperator, which AirBnB does)
> >>
> >> On Thu, Feb 23, 2017 at 5:59 PM, siddharth anand <san...@apache.org> wrote:
> >>
> >>> IMHO, a DAG run without a start date is nonsensical, but this is not enforced. That said, our UI allows for the manual creation of DAG Runs without a start date, as shown in the images below:
> >>>
> >>> - https://www.dropbox.com/s/3sxcqh04eztpl7p/Screenshot%202017-02-22%2016.00.40.png?dl=0
> >>> - https://www.dropbox.com/s/4q6rr9dwghag1yy/Screenshot%202017-02-22%2016.02.22.png?dl=0
> >>>
> >>> On Wed, Feb 22, 2017 at 2:26 PM, Maxime Beauchemin <maximebeauche...@gmail.com> wrote:
> >>>
> >>>> Our database may have edge cases that could be associated with running any previous version that may or may not have been part of an official release.
> >>>>
> >>>> Let's see if anyone else reports the issue. If no one does, one option is to release 1.8.0 as is with a comment in the release notes, and have a future official minor Apache release 1.8.1 that would fix these minor issues that are not deal breakers.
> >>>>
> >>>> @bolke, I'm curious, how long does it take you to go through one release cycle? Oh, and do you have a documented step-by-step process for releasing? I'd like to add the PyPI part to this doc and give interested committers rights on the project on PyPI.
> >>>>
> >>>> Max
> >>>>
> >>>> On Wed, Feb 22, 2017 at 2:00 PM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>
> >>>>> So it is a database integrity issue? Afaik a start_date should always be set for a DagRun (create_dagrun does so). I didn't check the code though.
> >>>>>
> >>>>> Sent from my iPhone
> >>>>>
> >>>>>> On 22 Feb 2017, at 22:19, Dan Davydov <dan.davy...@airbnb.com.INVALID> wrote:
> >>>>>>
> >>>>>> Should clarify this occurs when a dagrun does not have a start date, not a dag (which makes it even less likely to happen). I don't think this is a blocker for releasing.
> >>>>>>
> >>>>>>> On Wed, Feb 22, 2017 at 1:15 PM, Dan Davydov <dan.davy...@airbnb.com> wrote:
> >>>>>>>
> >>>>>>> I rolled this out in our prod and the webservers failed to load due to this commit:
> >>>>>>>
> >>>>>>> [AIRFLOW-510] Filter Paused Dags, show Last Run & Trigger Dag
> >>>>>>> 7c94d81c390881643f94d5e3d7d6fb351a445b72
> >>>>>>>
> >>>>>>> This fixed it:
> >>>>>>> - </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true" title="Start Date: {{last_run.start_date.strftime('%Y-%m-%d %H:%M')}}"></span>
> >>>>>>> + </a> <span id="statuses_info" class="glyphicon glyphicon-info-sign" aria-hidden="true"></span>
> >>>>>>>
> >>>>>>> This is caused by assuming that all DAGs have start dates set, so a broken DAG will take down the whole UI. Not sure if we want to make this a blocker for the release or not, I'm guessing for most deployments this would occur pretty rarely. I'll submit a PR to fix it soon.
> >>>>>>>
> >>>>>>> On Tue, Feb 21, 2017 at 9:49 AM, Chris Riccomini <criccom...@apache.org> wrote:
> >>>>>>>
> >>>>>>>> Ack that the vote has already passed, but belated +1 (binding)
> >>>>>>>>
> >>>>>>>> On Tue, Feb 21, 2017 at 7:42 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> IPMC Voting can be found here:
> >>>>>>>>>
> >>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-general/201702.mbox/%3c676bdc9f-1b55-4469-92a7-9ff309ad0...@gmail.com%3e
> >>>>>>>>>
> >>>>>>>>> Kind regards,
> >>>>>>>>> Bolke
> >>>>>>>>>
> >>>>>>>>>> On 21 Feb 2017, at 08:20, Bolke de Bruin <bdbr...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello,
> >>>>>>>>>>
> >>>>>>>>>> Apache Airflow (incubating) 1.8.0 (based on RC4) has been accepted.
> >>>>>>>>>>
> >>>>>>>>>> 9 “+1” votes received:
> >>>>>>>>>>
> >>>>>>>>>> - Maxime Beauchemin (binding)
> >>>>>>>>>> - Arthur Wiedmer (binding)
> >>>>>>>>>> - Dan Davydov (binding)
> >>>>>>>>>> - Jeremiah Lowin (binding)
> >>>>>>>>>> - Siddharth Anand (binding)
> >>>>>>>>>> - Alex van Boxel (binding)
> >>>>>>>>>> - Bolke de Bruin (binding)
> >>>>>>>>>>
> >>>>>>>>>> - Jayesh Senjaliya (non-binding)
> >>>>>>>>>> - Yi (non-binding)
> >>>>>>>>>>
> >>>>>>>>>> Vote thread (start):
> >>>>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3cD360D9BE-C358-42A1-9188-6c92c31a2...@gmail.com%3e <http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/201702.mbox/%3C7EB7B6D6-092E-48D2-AA0F-15f44376a...@gmail.com%3E>
> >>>>>>>>>>
> >>>>>>>>>> Next steps:
> >>>>>>>>>> 1) I will start the voting process on the IPMC mailing list. I do expect some changes to be required, mostly in documentation, maybe a license here and there. So, we might end up with changes to stable. As long as these are not (significant) code changes I will not re-raise the vote.
> >>>>>>>>>> 2) Only after the positive voting on the IPMC and finalisation I will rebrand the RC to Release.
> >>>>>>>>>> 3) I will upload it to the incubator release page, then the tarball needs to propagate to the mirrors.
> >>>>>>>>>> 4) Update the website (can someone volunteer please?)
> >>>>>>>>>> 5) Finally, I will ask Maxime to upload it to PyPI. It seems we can keep the Apache branding, as Libcloud is doing this as well (https://libcloud.apache.org/downloads.html#pypi-package).
> >>>>>>>>>>
> >>>>>>>>>> Jippie!
> >>>>>>>>>>
> >>>>>>>>>> Bolke

--
_/
_/ Alex Van Boxel
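
For reference, the webserver failure quoted earlier in this thread comes from calling `strftime` on a DagRun `start_date` that is not set. The fix Dan describes simply drops the tooltip. A minimal sketch of an alternative guard follows, assuming a hypothetical helper named `format_start_date` that is not part of Airflow and only illustrates the None check:

```python
from datetime import datetime
from typing import Optional


def format_start_date(start_date: Optional[datetime],
                      fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Return a display string for a start date, or "" when it is missing.

    Hypothetical helper for illustration only; not an Airflow API.
    """
    if start_date is None:
        # A run without a start date should not break page rendering.
        return ""
    return start_date.strftime(fmt)


# A run with a start date formats normally; a run without one yields "".
print(format_start_date(datetime(2017, 2, 22, 16, 0)))
print(repr(format_start_date(None)))
```

The same check could presumably live in the template itself; keeping it in a small function just makes it easy to test in isolation.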