State of the to-be Alpha 3:

Blockers:
* XCom throws a duplicate / locking error (new - confirmed)
* one_failed task not being run: now suddenly seems to pass (so fixed?) -> need to investigate why

Fixed issues:
* Regression in email
* LDAP case sensitivity

Pending features:
* DAG.catchup: minor changes needed, documentation still required, integration tests seem to pass flawlessly
* Cgroups + impersonation: cleanup of patches ongoing, more tests and more elaborate documentation required. Integration tests not executed yet
* Schedule all pending DAG runs in a single scheduler loop: no progress (**)
* Email attachments
* Add execution_date to trigger_dag

If the pending features are merged within a reasonable time frame (except for **, as there is currently no progress), then I am planning to mark the tarball as Beta and only allow bug fixes and (very) minor features. Hopefully end of next week.

Bolke.

> On 5 Jan 2017, at 20:07, Chris Riccomini <criccom...@apache.org> wrote:
>
> I have merged Robin's LDAP patch.
>
> On Thu, Jan 5, 2017 at 10:39 AM, Chris Riccomini <criccom...@apache.org> wrote:
>
>> I have found the email problem:
>>
>> https://issues.apache.org/jira/browse/AIRFLOW-734
>>
>> Working on a fix.
>>
>> On Thu, Jan 5, 2017 at 9:10 AM, Chris Riccomini <criccom...@apache.org> wrote:
>>
>>> Also, I'm seeing a second issue:
>>>
>>> SMTP doesn't seem to work for us anymore:
>>>
>>> [2017-01-05 15:10:13,666] {models.py:1378} ERROR - SMTP AUTH extension not supported by server.
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1374, in handle_failure
>>>     self.email_alert(error, is_retry=False)
>>>   File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1521, in email_alert
>>>     send_email(task.email, title, body)
>>>   File "/usr/lib/python2.7/site-packages/airflow/utils/email.py", line 43, in send_email
>>>     return backend(to, subject, html_content, files=files, dryrun=dryrun, cc=cc, bcc=bcc, mime_subtype=mime_subtype)
>>>   File "/usr/lib/python2.7/site-packages/airflow/utils/email.py", line 84, in send_email_smtp
>>>     send_MIME_email(SMTP_MAIL_FROM, recipients, msg, dryrun)
>>>   File "/usr/lib/python2.7/site-packages/airflow/utils/email.py", line 100, in send_MIME_email
>>>     s.login(SMTP_USER, SMTP_PASSWORD)
>>>   File "/usr/lib64/python2.7/smtplib.py", line 584, in login
>>>     raise SMTPException("SMTP AUTH extension not supported by server.")
>>> SMTPException: SMTP AUTH extension not supported by server.
>>>
>>> This was working on 1.7.1.2. Having a look now.
>>>
>>> On Thu, Jan 5, 2017 at 9:09 AM, Chris Riccomini <criccom...@apache.org> wrote:
>>>
>>>> Hey Robin,
>>>>
>>>> Awesome, thanks! I love open source. :) Let me try your patch out and merge it.
>>>>
>>>> Cheers,
>>>> Chris
>>>>
>>>> On Thu, Jan 5, 2017 at 1:34 AM, Miller, Robin <robin.mil...@affiliate.oliverwyman.com> wrote:
>>>>
>>>>> Hi Chris,
>>>>>
>>>>> I think I ran into this issue when setting up LDAP Auth in our environment (we're using very close to master as we needed some of the newer features/bugfixes). The problem turned out to be that the search was finding no results, so the line:
>>>>>
>>>>> groups_list = [regex.search(i).group(1) for i in user_groups]
>>>>>
>>>>> would fail because it had no matching groups to return.
>>>>> This turned out to be because Windows Active Directory (the LDAP server we're using) returned capitals "CN=" where the code expected lowercase: regex = re.compile("cn=([^,]*).*")
>>>>>
>>>>> I haven't looked it up, but Windows Active Directory is case insensitive when it comes to usernames and groups, so I wouldn't be surprised if the protocol itself is case insensitive and both "cn=" and "CN=" should be considered valid. As such I've a PR open for a simple fix to make this regex case insensitive: https://github.com/apache/incubator-airflow/pull/1945
>>>>>
>>>>> Hopefully this helps,
>>>>>
>>>>> Robin Miller
>>>>> OLIVER WYMAN
>>>>> robin.mil...@affiliate.oliverwyman.com
>>>>> www.oliverwyman.com
>>>>>
>>>>> ________________________________
>>>>> From: Chris Riccomini <criccom...@apache.org>
>>>>> Sent: 05 January 2017 00:34:27
>>>>> To: dev@airflow.incubator.apache.org
>>>>> Subject: Re: Airflow 1.8.0 alpha 2
>>>>>
>>>>> I am now running 1.8.0a2 in our dev environment. It seems to be functioning well.
>>>>>
>>>>> One issue we've hit is that the LDAP auth plugin isn't working for us anymore:
>>>>>
>>>>> Traceback (most recent call last):
>>>>>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
>>>>>     response = self.full_dispatch_request()
>>>>>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
>>>>>     rv = self.handle_user_exception(e)
>>>>>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1544, in handle_user_exception
>>>>>     reraise(exc_type, exc_value, tb)
>>>>>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
>>>>>     rv = self.dispatch_request()
>>>>>   File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
>>>>>     return self.view_functions[rule.endpoint](**req.view_args)
>>>>>   File "/usr/lib64/python2.7/site-packages/flask_admin/base.py", line 69, in inner
>>>>>     return self._run_view(f, *args, **kwargs)
>>>>>   File "/usr/lib64/python2.7/site-packages/flask_admin/base.py", line 368, in _run_view
>>>>>     return fn(self, *args, **kwargs)
>>>>>   File "/usr/lib/python2.7/site-packages/airflow/www/views.py", line 657, in login
>>>>>     return airflow.login.login(self, request)
>>>>>   File "/usr/lib/python2.7/site-packages/airflow/contrib/auth/backends/ldap_auth.py", line 276, in login
>>>>>     flask_login.login_user(LdapUser(user))
>>>>>   File "<string>", line 4, in __init__
>>>>>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/state.py", line 306, in _initialize_instance
>>>>>     manager.dispatch.init_failure(self, args, kwargs)
>>>>>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 60, in __exit__
>>>>>     compat.reraise(exc_type, exc_value, exc_tb)
>>>>>   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/state.py", line 303, in _initialize_instance
>>>>>     return manager.original_init(*mixed[1:], **kwargs)
>>>>>   File "/usr/lib/python2.7/site-packages/airflow/contrib/auth/backends/ldap_auth.py", line 148, in __init__
>>>>>     user.username)
>>>>>   File "/usr/lib/python2.7/site-packages/airflow/contrib/auth/backends/ldap_auth.py", line 106, in groups_user
>>>>>     groups_list = [regex.search(i).group(1) for i in user_groups]
>>>>> AttributeError: 'NoneType' object has no attribute 'group'
>>>>>
>>>>> I believe it's from this patch:
>>>>>
>>>>> https://github.com/apache/incubator-airflow/commit/d6d3f53673ba3736d7a858531823933cfef2bb4e
>>>>>
>>>>> I haven't dug into it yet. We'll need to fix it, though. In the meantime, I just commented out the call to `groups_user`. More to come.
>>>>>
>>>>> On Wed, Jan 4, 2017 at 12:32 PM, Alex Van Boxel <a...@vanboxel.be> wrote:
>>>>>
>>>>>> I have another fix that certainly need to be in the final release, but not ready to merge due to failed tests:
>>>>>>
>>>>>> https://github.com/apache/incubator-airflow/pull/1961
>>>>>>
>>>>>> On Wed, Jan 4, 2017 at 8:48 PM Chris Riccomini <criccom...@apache.org> wrote:
>>>>>>
>>>>>>> Great, I will work toward deploying this in our dev cluster today. :D
>>>>>>>
>>>>>>> On Wed, Jan 4, 2017 at 11:47 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Some issues remain:
>>>>>>>>
>>>>>>>> * one_failed not executed as dag run is marked failed seemingly prematurely (@chris yes you should see this, see below for an example that is not working properly), confirmed regression
>>>>>>>> * celery instability Alex
>>>>>>>> * Wrong DAG state after failure inside branch
>>>>>>>>
>>>>>>>> So Alpha 2 is definitely not ready for production, but please do put in your canary dags and let them run. I am still quite concerned about the scheduler integrity and stability.
>>>>>>>>
>>>>>>>> - Bolke
>>>>>>>>
>>>>>>>> one_failed_not_executed.py
>>>>>>>> ====
>>>>>>>> from airflow import DAG
>>>>>>>> from airflow.operators.bash_operator import BashOperator
>>>>>>>> from airflow.operators.dummy_operator import DummyOperator
>>>>>>>> from datetime import datetime, timedelta
>>>>>>>>
>>>>>>>> default_args = {
>>>>>>>>     'owner': 'airflow',
>>>>>>>>     'depends_on_past': False,
>>>>>>>>     'start_date': datetime(2016,10,5,19),
>>>>>>>>     'email': ['airf...@airflow.com'],
>>>>>>>>     'email_on_failure': False,
>>>>>>>>     'email_on_retry': False,
>>>>>>>>     'retries': 1,
>>>>>>>>     'retry_delay': timedelta(seconds=1),
>>>>>>>> }
>>>>>>>>
>>>>>>>> dag = DAG('tutorial', default_args=default_args, schedule_interval='@once')
>>>>>>>>
>>>>>>>> task1 = BashOperator(
>>>>>>>>     task_id='first_one',
>>>>>>>>     bash_command='date',
>>>>>>>>     dag=dag)
>>>>>>>>
>>>>>>>> task2 = BashOperator(
>>>>>>>>     task_id='second_one',
>>>>>>>>     bash_command='this_should_not_work',
>>>>>>>>     dag=dag)
>>>>>>>>
>>>>>>>> task2.set_upstream(task1)
>>>>>>>>
>>>>>>>> task3 = BashOperator(
>>>>>>>>     task_id='third_one',
>>>>>>>>     bash_command='random_command_third',
>>>>>>>>     dag=dag)
>>>>>>>>
>>>>>>>> task3.set_upstream(task2)
>>>>>>>>
>>>>>>>> fail_task = DummyOperator(
>>>>>>>>     task_id='one_failed',
>>>>>>>>     trigger_rule='one_failed',
>>>>>>>>     dag=dag)
>>>>>>>>
>>>>>>>> fail_task.set_upstream([task1,task2,task3])
>>>>>>>>
>>>>>>>>> On 4 Jan 2017, at 20:35, Chris Riccomini <criccom...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> Bolke, can you describe the current state of the alpha 2 release? I saw some comments from Alex yesterday about celery instability. If I'm running on LocalExecutor, should I be seeing any issues?
>>>>>>>>>
>>>>>>>>> On Wed, Jan 4, 2017 at 8:20 AM, Bolke de Bruin <bdbr...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> I have put up Airflow 1.8.0 alpha 2 in https://people.apache.org/~bolke/ .
>>>>>>>>>>
>>>>>>>>>> Note: This still cannot be considered an Apache release. Working on this.
>>>>>>>>>>
>>>>>>>>>> This build is signed (note it is served over https).
>>>>>>>>>>
>>>>>>>>>> Changes are in the area of scheduler stability.
>>>>>>>>>>
>>>>>>>>>> - Bolke
>>>>>>
>>>>>> --
>>>>>> _/
>>>>>> _/ Alex Van Boxel
>>>>>
>>>>> ________________________________
>>>>> This e-mail and any attachments may be confidential or legally privileged. If you received this message in error or are not the intended recipient, you should destroy the e-mail message and any attachments or copies, and you are prohibited from retaining, distributing, disclosing or using any information contained herein. Please inform us of the erroneous delivery by return e-mail. Thank you for your cooperation.
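
For reference, the LDAP group-parsing failure discussed in this thread (a lowercase-only "cn=" regex meeting Active Directory's "CN=") can be illustrated with a minimal sketch. It reuses the pattern quoted in Robin's message and adds re.IGNORECASE, which is how the thread describes the fix. The helper name extract_group_names and the sample DNs are hypothetical, and skipping entries that do not match is an extra assumption of this sketch, not something the thread says the PR does.

```python
import re

# Pattern quoted in the thread, with re.IGNORECASE added so "CN=" also matches.
GROUP_RE = re.compile(r"cn=([^,]*).*", re.IGNORECASE)


def extract_group_names(user_groups):
    """Return the CN value of each DN (hypothetical helper, not Airflow's API)."""
    names = []
    for dn in user_groups:
        match = GROUP_RE.search(dn)
        # search() returns None when nothing matches; calling .group(1) on
        # None is what raises the AttributeError seen in the LDAP traceback.
        if match is not None:
            names.append(match.group(1))
    return names


# Made-up distinguished names, for illustration only.
sample = [
    "CN=DataEng,OU=Groups,DC=example,DC=com",
    "cn=analysts,ou=groups,dc=example,dc=com",
    "OU=NoCommonName,DC=example,DC=com",
]
print(extract_group_names(sample))
```

With the case-sensitive pattern, the first sample entry would not match at all, which reproduces the 'NoneType' object has no attribute 'group' error when .group(1) is called unconditionally.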