[jira] [Created] (AIRFLOW-578) BaseJob does not check return code of a process
Ze Wang created AIRFLOW-578: --- Summary: BaseJob does not check return code of a process Key: AIRFLOW-578 URL: https://issues.apache.org/jira/browse/AIRFLOW-578 Project: Apache Airflow Issue Type: Bug Components: scheduler Reporter: Ze Wang BaseJob ignores the return code of the spawned process, so even if that process is killed or exits abnormally, BaseJob still treats it as having finished successfully. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AIRFLOW-575) Improve tutorial information about default_args
[ https://issues.apache.org/jira/browse/AIRFLOW-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-575. Resolution: Fixed > Improve tutorial information about default_args > --- > > Key: AIRFLOW-575 > URL: https://issues.apache.org/jira/browse/AIRFLOW-575 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Reporter: Laura Lorenz >Assignee: Laura Lorenz >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-575) Improve tutorial information about default_args
[ https://issues.apache.org/jira/browse/AIRFLOW-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582719#comment-15582719 ] ASF subversion and git services commented on AIRFLOW-575: - Commit 80d3c8d461f1c95d173aa72a055737d8ad379ae1 in incubator-airflow's branch refs/heads/master from [~lauralorenz] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=80d3c8d ] [AIRFLOW-575] Clarify tutorial and FAQ about `schedule_interval` always inheriting from DAG object - Update the tutorial with a comment helping to explain the use of default_args and include all the possible parameters in line - Clarify in the FAQ the possibility of an unexpected default `schedule_interval` in case airflow users mistakenly try to overwrite the default `schedule_interval` in a DAG's `default_args` parameter > Improve tutorial information about default_args > --- > > Key: AIRFLOW-575 > URL: https://issues.apache.org/jira/browse/AIRFLOW-575 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Reporter: Laura Lorenz >Assignee: Laura Lorenz >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[1/2] incubator-airflow git commit: [AIRFLOW-575] Clarify tutorial and FAQ about `schedule_interval` always inheriting from DAG object
Repository: incubator-airflow Updated Branches: refs/heads/master 0235d59d0 -> 916f1eb2f [AIRFLOW-575] Clarify tutorial and FAQ about `schedule_interval` always inheriting from DAG object - Update the tutorial with a comment helping to explain the use of default_args and include all the possible parameters in line - Clarify in the FAQ the possibility of an unexpected default `schedule_interval` in case airflow users mistakenly try to overwrite the default `schedule_interval` in a DAG's `default_args` parameter Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/80d3c8d4 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/80d3c8d4 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/80d3c8d4 Branch: refs/heads/master Commit: 80d3c8d461f1c95d173aa72a055737d8ad379ae1 Parents: 11ad53a Author: lauralorenz Authored: Tue Apr 19 17:03:46 2016 -0400 Committer: lauralorenz Committed: Mon Oct 17 12:36:38 2016 -0400 -- airflow/example_dags/tutorial.py | 14 -- docs/faq.rst | 5 + 2 files changed, 17 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/80d3c8d4/airflow/example_dags/tutorial.py -- diff --git a/airflow/example_dags/tutorial.py b/airflow/example_dags/tutorial.py index 9462463..e929389 100644 --- a/airflow/example_dags/tutorial.py +++ b/airflow/example_dags/tutorial.py @@ -10,6 +10,8 @@ from datetime import datetime, timedelta seven_days_ago = datetime.combine(datetime.today() - timedelta(7), datetime.min.time()) +# these args will get passed on to each operator +# you can override them on a per-task basis during operator initialization default_args = { 'owner': 'airflow', 'depends_on_past': False, @@ -22,11 +24,19 @@ default_args = { # 'queue': 'bash_queue', # 'pool': 'backfill', # 'priority_weight': 10, -# 'schedule_interval': timedelta(1), # 'end_date': datetime(2016, 1, 1), +# 'wait_for_downstream': 
False, +# 'dag': dag, +# 'adhoc':False, +# 'sla': timedelta(hours=2), +# 'execution_timeout': timedelta(seconds=300), +# 'on_failure_callback': some_function, +# 'on_success_callback': some_other_function, +# 'on_retry_callback': another_function, +# 'trigger_rule': u'all_success' } -dag = DAG('tutorial', default_args=default_args) +dag = DAG('tutorial', default_args=default_args, schedule_interval=timedelta(days=1)) # t1, t2 and t3 are examples of tasks created by instantiating operators t1 = BashOperator( http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/80d3c8d4/docs/faq.rst -- diff --git a/docs/faq.rst b/docs/faq.rst index 6418dcb..b5b28af 100644 --- a/docs/faq.rst +++ b/docs/faq.rst @@ -17,6 +17,11 @@ Here are some of the common causes: - Is your ``start_date`` set properly? The Airflow scheduler triggers the task soon after the ``start_date + scheduler_interval`` is passed. +- Is your ``schedule_interval`` set properly? The default ``schedule_interval`` + is one day (``datetime.timedelta(1)``). You must specify a different ``schedule_interval`` + directly to the DAG object you instantiate, not as a ``default_param``, as task instances + do not override their parent DAG's ``schedule_interval``. + - Is your ``start_date`` beyond where you can see it in the UI? If you set your it to some time say 3 months ago, you won't be able to see it in the main view in the UI, but you should be able to see it in the
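The pitfall the FAQ change documents can be shown with a simplified sketch (these are not Airflow's real classes, just an illustration of the behaviour described above): `default_args` is passed down to *operators*, while `schedule_interval` is a property of the DAG itself, so burying `schedule_interval` inside `default_args` has no effect on scheduling.

```python
from datetime import timedelta

# Simplified stand-in for airflow.models.DAG, for illustration only:
# default_args feeds operators; only the explicit schedule_interval
# argument is consulted by the scheduler.
class Dag:
    def __init__(self, dag_id, default_args=None,
                 schedule_interval=timedelta(days=1)):
        self.dag_id = dag_id
        self.default_args = default_args or {}
        self.schedule_interval = schedule_interval  # only this one counts

# Mistake: schedule_interval hidden in default_args is carried along for
# tasks but never read, so the DAG silently keeps the one-day default.
wrong = Dag('demo', default_args={'schedule_interval': timedelta(hours=1)})

# Correct: pass schedule_interval directly to the DAG constructor.
right = Dag('demo', default_args={'owner': 'airflow'},
            schedule_interval=timedelta(hours=1))
```

Here `wrong.schedule_interval` is still one day despite the hourly value in `default_args`, which is exactly the surprise the FAQ entry warns about.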
[2/2] incubator-airflow git commit: Merge pull request #1402 from lauralorenz/schedule_interval_default_args_docs
Merge pull request #1402 from lauralorenz/schedule_interval_default_args_docs Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/916f1eb2 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/916f1eb2 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/916f1eb2 Branch: refs/heads/master Commit: 916f1eb2feedae4f4d827466cfe91821ef30f885 Parents: 0235d59 80d3c8d Author: Arthur Wiedmer Authored: Mon Oct 17 09:46:57 2016 -0700 Committer: Arthur Wiedmer Committed: Mon Oct 17 09:46:57 2016 -0700 -- airflow/example_dags/tutorial.py | 14 -- docs/faq.rst | 5 + 2 files changed, 17 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/916f1eb2/airflow/example_dags/tutorial.py -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/916f1eb2/docs/faq.rst --
[jira] [Updated] (AIRFLOW-575) Improve tutorial information about default_args
[ https://issues.apache.org/jira/browse/AIRFLOW-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laura Lorenz updated AIRFLOW-575: - External issue URL: https://github.com/apache/incubator-airflow/pull/1402 > Improve tutorial information about default_args > --- > > Key: AIRFLOW-575 > URL: https://issues.apache.org/jira/browse/AIRFLOW-575 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Reporter: Laura Lorenz >Assignee: Laura Lorenz >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (AIRFLOW-575) Improve tutorial information about default_args
[ https://issues.apache.org/jira/browse/AIRFLOW-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-575 started by Laura Lorenz. > Improve tutorial information about default_args > --- > > Key: AIRFLOW-575 > URL: https://issues.apache.org/jira/browse/AIRFLOW-575 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Reporter: Laura Lorenz >Assignee: Laura Lorenz >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (AIRFLOW-577) BigQuery Hook failure message too opaque
[ https://issues.apache.org/jira/browse/AIRFLOW-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-577. --- Resolution: Fixed Fix Version/s: Airflow 1.8 Merged! > BigQuery Hook failure message too opaque > > > Key: AIRFLOW-577 > URL: https://issues.apache.org/jira/browse/AIRFLOW-577 > Project: Apache Airflow > Issue Type: Bug >Reporter: Georg Walther > Fix For: Airflow 1.8 > > > The BigQuery service returns routinely opaque error messages such as "Too > many errors ..." - the Airflow BigQuery hook returns this opaque error > message by accessing the respective keys in the job dictionary: > "job['status']['errorResult']" > When debugging BigQuery issues in Airflow we routinely need to try and step > into the BigQuery hook to inspect the job dictionary for further hints at > what caused the error. Therefore it would help to output the BigQuery hook > job dictionary in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
incubator-airflow git commit: [AIRFLOW-577] Output BigQuery job for improved debugging
Repository: incubator-airflow Updated Branches: refs/heads/master e36f9a750 -> 0235d59d0 [AIRFLOW-577] Output BigQuery job for improved debugging Closes #1838 from waltherg/fix/bq_error_message Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0235d59d Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0235d59d Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0235d59d Branch: refs/heads/master Commit: 0235d59d052524d0d773e07b13867691223f9904 Parents: e36f9a7 Author: Georg Walther Authored: Mon Oct 17 08:51:56 2016 -0700 Committer: Chris Riccomini Committed: Mon Oct 17 08:51:56 2016 -0700 -- airflow/contrib/hooks/bigquery_hook.py | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0235d59d/airflow/contrib/hooks/bigquery_hook.py -- diff --git a/airflow/contrib/hooks/bigquery_hook.py b/airflow/contrib/hooks/bigquery_hook.py index c5b57a9..e8528ac 100644 --- a/airflow/contrib/hooks/bigquery_hook.py +++ b/airflow/contrib/hooks/bigquery_hook.py @@ -435,7 +435,10 @@ class BigQueryBaseCursor(object): # Check if job had errors. if 'errorResult' in job['status']: raise Exception( -'BigQuery job failed. Final error was: %s', job['status']['errorResult']) +'BigQuery job failed. Final error was: {}. The job was: {}'.format( +job['status']['errorResult'], job +) +) return job_id
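The effect of the patch above can be seen with a made-up job payload (the field values below are hypothetical, chosen only to show the shape of the richer exception text):

```python
# Hypothetical BigQuery job payload, illustrating the message produced by
# the format string introduced in the commit above.
job = {
    'jobReference': {'jobId': 'job_123'},
    'status': {'errorResult': {'reason': 'invalid',
                               'message': 'Too many errors.'}},
}

def format_failure(job):
    # Same format string as the patch: include the full job dict so the
    # error can be debugged without stepping into the hook.
    return 'BigQuery job failed. Final error was: {}. The job was: {}'.format(
        job['status']['errorResult'], job)
```

The old `raise Exception('...: %s', job['status']['errorResult'])` also passed the `%s` string and its argument as two separate exception args instead of interpolating them, so the patch fixes that formatting bug as well as adding the full job dictionary.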
[jira] [Comment Edited] (AIRFLOW-139) Executing VACUUM with PostgresOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582253#comment-15582253 ] Daniel Zohar edited comment on AIRFLOW-139 at 10/17/16 1:24 PM: Looking back at the original commit - https://github.com/apache/incubator-airflow/commit/28da05d860147b5e0df37d998f437af6a5d4d178 No tests were added, and no real justification for the fix aside for a link to PG release notes. I'd think with the current project standards it wouldn't have been merged in that state. [~underyx] could you please provide more insight into why this was added maybe I'm missing something here. was (Author: dan...@memrise.com): Looking back at the original commit - https://github.com/Memrise/incubator-airflow/commit/28da05d860147b5e0df37d998f437af6a5d4d178 No tests were added, and no real justification for the fix aside for a link to PG release notes. I'd think with the current project standards it wouldn't have been merged in that state. [~underyx] could you please provide more insight into why this was added maybe I'm missing something here. > Executing VACUUM with PostgresOperator > -- > > Key: AIRFLOW-139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-139 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.0 >Reporter: Rafael > > Dear Airflow Maintainers, > h1. Environment > * Airflow version: *v1.7.0* > * Airflow components: *PostgresOperator* > * Python Version: *Python 3.5.1* > * Operating System: *15.4.0 Darwin* > h1. 
Description of Issue > I am trying to execute a `VACUUM` command as part of DAG with the > `PostgresOperator`, which fails with the following error: > {quote} > [2016-05-14 16:14:01,849] {__init__.py:36} INFO - Using executor > SequentialExecutor > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 15, in > args.func(args) > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/bin/cli.py", > line 203, in run > pool=args.pool, > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/models.py", > line 1067, in run > result = task_copy.execute(context=context) > File > "/usr/local/lib/python3.5/site-packages/airflow/operators/postgres_operator.py", > line 39, in execute > self.hook.run(self.sql, self.autocommit, parameters=self.parameters) > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py", > line 109, in run > cur.execute(s) > psycopg2.InternalError: VACUUM cannot run inside a transaction block > {quote} > I could create a small python script that performs the operation, as > explained in [this stackoverflow > entry](http://stackoverflow.com/questions/1017463/postgresql-how-to-run-vacuum-from-code-outside-transaction-block). > However, I would like to know first if the `VACUUM` command should be > supported by the `PostgresOperator`. > h1. Reproducing the Issue > The operator can be declared as follows: > {quote} > conn = ('postgres_default') > t4 = PostgresOperator( > task_id='vacuum', > postgres_conn_id=conn, > sql=("VACUUM public.table"), > dag=dag > ) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-139) Executing VACUUM with PostgresOperator
[ https://issues.apache.org/jira/browse/AIRFLOW-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582253#comment-15582253 ] Daniel Zohar commented on AIRFLOW-139: -- Looking back at the original commit - https://github.com/Memrise/incubator-airflow/commit/28da05d860147b5e0df37d998f437af6a5d4d178 No tests were added, and no real justification for the fix aside from a link to PG release notes. I'd think with the current project standards it wouldn't have been merged in that state. [~underyx] could you please provide more insight into why this was added? Maybe I'm missing something here. > Executing VACUUM with PostgresOperator > -- > > Key: AIRFLOW-139 > URL: https://issues.apache.org/jira/browse/AIRFLOW-139 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.0 >Reporter: Rafael > > Dear Airflow Maintainers, > h1. Environment > * Airflow version: *v1.7.0* > * Airflow components: *PostgresOperator* > * Python Version: *Python 3.5.1* > * Operating System: *15.4.0 Darwin* > h1. 
Description of Issue > I am trying to execute a `VACUUM` command as part of DAG with the > `PostgresOperator`, which fails with the following error: > {quote} > [2016-05-14 16:14:01,849] {__init__.py:36} INFO - Using executor > SequentialExecutor > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 15, in > args.func(args) > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/bin/cli.py", > line 203, in run > pool=args.pool, > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/models.py", > line 1067, in run > result = task_copy.execute(context=context) > File > "/usr/local/lib/python3.5/site-packages/airflow/operators/postgres_operator.py", > line 39, in execute > self.hook.run(self.sql, self.autocommit, parameters=self.parameters) > File > "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/airflow/hooks/dbapi_hook.py", > line 109, in run > cur.execute(s) > psycopg2.InternalError: VACUUM cannot run inside a transaction block > {quote} > I could create a small python script that performs the operation, as > explained in [this stackoverflow > entry](http://stackoverflow.com/questions/1017463/postgresql-how-to-run-vacuum-from-code-outside-transaction-block). > However, I would like to know first if the `VACUUM` command should be > supported by the `PostgresOperator`. > h1. Reproducing the Issue > The operator can be declared as follows: > {quote} > conn = ('postgres_default') > t4 = PostgresOperator( > task_id='vacuum', > postgres_conn_id=conn, > sql=("VACUUM public.table"), > dag=dag > ) > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
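The "VACUUM cannot run inside a transaction block" failure above can be reproduced and worked around in a self-contained way. The sketch below uses sqlite3 as a stand-in for Postgres, since SQLite rejects VACUUM inside a transaction for the same reason; the fix is the same idea in both databases, namely running the statement in autocommit mode (with psycopg2 the equivalent is setting `conn.autocommit = True` on the connection before executing VACUUM):

```python
import os
import sqlite3
import tempfile

# Sketch using sqlite3 as an analogy for Postgres. isolation_level=None
# puts the sqlite3 module in autocommit mode; the default ("") opens an
# implicit transaction on the INSERT below, which makes VACUUM fail.
def try_vacuum(autocommit):
    path = os.path.join(tempfile.mkdtemp(), 'demo.db')
    conn = sqlite3.connect(path,
                           isolation_level=None if autocommit else '')
    try:
        conn.execute('CREATE TABLE t (x INTEGER)')
        conn.execute('INSERT INTO t VALUES (1)')  # opens a transaction
        conn.execute('VACUUM')  # rejected inside a transaction
        return 'ok'
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
```

With `autocommit=False` the call returns the "cannot VACUUM from within a transaction" error, mirroring the psycopg2 traceback in the report; with `autocommit=True` the VACUUM succeeds. This suggests the PostgresOperator would need an autocommit option (or to honor its existing one) for VACUUM to be supported.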
[jira] [Created] (AIRFLOW-577) BigQuery Hook failure message too opaque
Georg Walther created AIRFLOW-577: - Summary: BigQuery Hook failure message too opaque Key: AIRFLOW-577 URL: https://issues.apache.org/jira/browse/AIRFLOW-577 Project: Apache Airflow Issue Type: Bug Reporter: Georg Walther The BigQuery service returns routinely opaque error messages such as "Too many errors ..." - the Airflow BigQuery hook returns this opaque error message by accessing the respective keys in the job dictionary: "job['status']['errorResult']" When debugging BigQuery issues in Airflow we routinely need to try and step into the BigQuery hook to inspect the job dictionary for further hints at what caused the error. Therefore it would help to output the BigQuery hook job dictionary in the first place. -- This message was sent by Atlassian JIRA (v6.3.4#6332)