[jira] [Updated] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Bodley updated AIRFLOW-179: Description: The DbApiHook.insert_rows(...) method tries to serialize all values to strings using the ASCII codec, this is problematic if the cell contains non-ASCII characters, i.e. >>> from airflow.hooks import DbApiHook >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng') Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell return "'" + str(cell).replace("'", "''") + "'" File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__ return super(newstr, cls).__new__(cls, value) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128) Rather than manually trying to serialize and escape values to an ASCII string one should try to serialize the value to string using the character set of the corresponding target database leveraging the connection to mutate the object to the SQL string literal. Note an exception should still be thrown if the target encoding is not compatible with the source encoding. was: The DbApiHook.insert_rows(...) method tries to serialize all values to strings using the ASCII codec, this is problematic if the cell contains non-ASCII characters, i.e. 
>>> from airflow.hooks import DbApiHook >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng') Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell return "'" + str(cell).replace("'", "''") + "'" File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__ return super(newstr, cls).__new__(cls, value) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128) Rather than manually trying to serialize values to an ASCII string one should try to serialize the value to string using the character set of the corresponding target database leveraging the connection to mutate an object to the SQL string literal. Note an exception should still be thrown if the target encoding is not compatible with the source encoding. > DbApiHook string serialization fails when string contains non-ASCII characters > -- > > Key: AIRFLOW-179 > URL: https://issues.apache.org/jira/browse/AIRFLOW-179 > Project: Apache Airflow > Issue Type: Bug > Components: hooks >Reporter: John Bodley >Assignee: John Bodley > > The DbApiHook.insert_rows(...) method tries to serialize all values to > strings using the ASCII codec, this is problematic if the cell contains > non-ASCII characters, i.e. 
> >>> from airflow.hooks import DbApiHook > >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng') > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", > line 196, in _serialize_cell > return "'" + str(cell).replace("'", "''") + "'" > File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line > 102, in __new__ > return super(newstr, cls).__new__(cls, value) > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: > ordinal not in range(128) > Rather than manually trying to serialize and escape values to an ASCII string > one should try to serialize the value to string using the character set of > the corresponding target database leveraging the connection to mutate the > object to the SQL string literal. > Note an exception should still be thrown if the target encoding is not > compatible with the source encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
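The fix described above can be sketched in miniature: instead of forcing `str()` (which applies the ASCII codec under Python 2's `future.newstr`), decode bytes using the target database's character set and let incompatible encodings raise, as the issue requests. The function and the `target_charset` parameter below are illustrative, not Airflow's actual `_serialize_cell` signature.

```python
def serialize_cell(cell, target_charset="utf-8"):
    """Render a cell as a SQL string literal without assuming ASCII.

    Hypothetical sketch of the proposed fix: the target database's
    charset (here defaulting to UTF-8) drives the decode, and a
    UnicodeDecodeError still propagates when the encodings are
    incompatible, which is the desired behaviour per the issue.
    """
    if cell is None:
        return "NULL"
    if isinstance(cell, bytes):
        # Decode using the connection's charset rather than ASCII.
        cell = cell.decode(target_charset)
    else:
        cell = str(cell)
    # Escape embedded single quotes the SQL way.
    return "'" + cell.replace("'", "''") + "'"
```

With this sketch, `serialize_cell('Nguyễn Tấn Dũng')` produces a quoted literal instead of raising `UnicodeDecodeError`.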
[jira] [Work started] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
[ https://issues.apache.org/jira/browse/AIRFLOW-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-179 started by John Bodley. --- > DbApiHook string serialization fails when string contains non-ASCII characters > -- > > Key: AIRFLOW-179 > URL: https://issues.apache.org/jira/browse/AIRFLOW-179 > Project: Apache Airflow > Issue Type: Bug > Components: hooks >Reporter: John Bodley >Assignee: John Bodley > > The DbApiHook.insert_rows(...) method tries to serialize all values to > strings using the ASCII codec, this is problematic if the cell contains > non-ASCII characters, i.e. > >>> from airflow.hooks import DbApiHook > >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng') > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", > line 196, in _serialize_cell > return "'" + str(cell).replace("'", "''") + "'" > File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line > 102, in __new__ > return super(newstr, cls).__new__(cls, value) > UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: > ordinal not in range(128) > Rather than manually trying to serialize values to an ASCII string one should > try to serialize the value to string using the character set of the > corresponding target database leveraging the connection to mutate an object > to the SQL string literal. > Note an exception should still be thrown if the target encoding is not > compatible with the source encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-179) DbApiHook string serialization fails when string contains non-ASCII characters
John Bodley created AIRFLOW-179: --- Summary: DbApiHook string serialization fails when string contains non-ASCII characters Key: AIRFLOW-179 URL: https://issues.apache.org/jira/browse/AIRFLOW-179 Project: Apache Airflow Issue Type: Bug Components: hooks Reporter: John Bodley Assignee: John Bodley The DbApiHook.insert_rows(...) method tries to serialize all values to strings using the ASCII codec; this is problematic if the cell contains non-ASCII characters, e.g. >>> from airflow.hooks import DbApiHook >>> DbApiHook._serialize_cell('Nguyễn Tấn Dũng') Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.7/dist-packages/airflow/hooks/dbapi_hook.py", line 196, in _serialize_cell return "'" + str(cell).replace("'", "''") + "'" File "/usr/local/lib/python2.7/dist-packages/future/types/newstr.py", line 102, in __new__ return super(newstr, cls).__new__(cls, value) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 4: ordinal not in range(128) Rather than manually trying to serialize values to an ASCII string, one should serialize the value to a string using the character set of the corresponding target database, leveraging the connection to mutate an object into the SQL string literal. Note that an exception should still be thrown if the target encoding is not compatible with the source encoding. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-178) Zip files in DAG folder do not get picked up by Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joy Gao updated AIRFLOW-178: External issue URL: https://github.com/apache/incubator-airflow/pull/1545 > Zip files in DAG folder do not get picked up by Airflow > - > > Key: AIRFLOW-178 > URL: https://issues.apache.org/jira/browse/AIRFLOW-178 > Project: Apache Airflow > Issue Type: Bug >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Minor > > The collect_dags method in the DagBag class currently skips any file that does > not end in '.py', thereby skipping potential zip files in the DAG folder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301539#comment-15301539 ] Bolke de Bruin edited comment on AIRFLOW-168 at 5/26/16 5:08 AM: - The double scheduling is indeed a bug on master, also with the updated scheduler from 124, that I will need to fix. was (Author: bolke): The double scheduling is indeed a bug, also with the updated scheduler from 124, that I will need to fix. > schedule_interval @once scheduling dag at least twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at the example_xcom example and found that it got scheduled twice: > once at the start_time and once at the current time. To confirm, I tried > multiple times (by reloading the db) and it's the same. > I am on airflow master, using the sequential executor with sqlite3. Though it > works as expected on a prod env running v1.7 with celery workers and > a mysql backend. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301539#comment-15301539 ] Bolke de Bruin commented on AIRFLOW-168: The double scheduling is indeed a bug, also with the updated scheduler from 124, that I will need to fix. > schedule_interval @once scheduling dag at least twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at the example_xcom example and found that it got scheduled twice: > once at the start_time and once at the current time. To confirm, I tried > multiple times (by reloading the db) and it's the same. > I am on airflow master, using the sequential executor with sqlite3. Though it > works as expected on a prod env running v1.7 with celery workers and > a mysql backend. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2
[ https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301533#comment-15301533 ] Bolke de Bruin commented on AIRFLOW-171: It should be mentioned in UPDATING.md, but I don't think it is. > Email does not work in 1.7.1.2 > -- > > Key: AIRFLOW-171 > URL: https://issues.apache.org/jira/browse/AIRFLOW-171 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: AWS Amazon Linux Image >Reporter: Hao Ye > > Job failure emails were working in 1.7.0. They seem to have stopped working in > 1.7.1. > The error is > {quote} > [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: > ['em...@email.com'] > [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no > attribute 'send_email_smtp' > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 42, in send_email > backend = getattr(module, attr) > AttributeError: 'module' object has no attribute 'send_email_smtp' > {quote} > The file exists and the method exists. It seems to work fine when called in python > directly. > Maybe it's loading the wrong email module. > I tried setting PYTHONPATH to have > /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that > didn't seem to work either. > Could this be related to the utils refactoring that happened between 1.7.0 > and 1.7.1? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
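The failing line in the traceback above (`backend = getattr(module, attr)`) is a dotted-path lookup: the configured backend name is split into a module path and an attribute, the module is imported, and the attribute fetched. A simplified sketch of that pattern (not Airflow's exact code; the function name is illustrative) shows why a stale or shadowed module raises exactly this `AttributeError`:

```python
from importlib import import_module

def resolve_email_backend(dotted_path):
    """Resolve 'package.module.function' the way send_email() does.

    If a shadowed or stale copy of the module wins the import on
    sys.path, the final getattr raises AttributeError ('module'
    object has no attribute ...), matching the report above.
    """
    module_path, attr = dotted_path.rsplit('.', 1)
    module = import_module(module_path)
    # While debugging, module.__file__ shows which file actually won
    # the import, confirming or refuting the "wrong module" theory.
    return getattr(module, attr)
```

For example, `resolve_email_backend('os.path.join')` returns the `join` function; printing `module.__file__` inside would reveal which copy of `airflow.utils.email` was actually loaded.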
[jira] [Created] (AIRFLOW-178) Zip files in DAG folder do not get picked up by Airflow
Joy Gao created AIRFLOW-178: --- Summary: Zip files in DAG folder does not get picked up by Ariflow Key: AIRFLOW-178 URL: https://issues.apache.org/jira/browse/AIRFLOW-178 Project: Apache Airflow Issue Type: Bug Reporter: Joy Gao Assignee: Joy Gao Priority: Minor The collect_dags method in DagBag class currently skips any file that does not end in '.py', thereby skipping potential zip files in the DAG folder. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
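The fix amounts to widening the file filter: accept zip archives as DAG containers in addition to '.py' files. A minimal sketch of that filter, assuming a hypothetical helper rather than the actual DagBag.collect_dags code:

```python
import os
import zipfile

def is_candidate_dag_file(path):
    """Illustrative filter for the fix described above: accept .py
    files and well-formed zip archives instead of .py files only.
    (Hypothetical helper, not Airflow's implementation.)
    """
    if path.endswith('.py'):
        return True
    # zipfile.is_zipfile inspects the file's magic bytes, so archives
    # are detected even when the extension is missing or unusual.
    return os.path.isfile(path) and zipfile.is_zipfile(path)
```

A collect_dags-style walk would then pass every candidate to the DAG loader, which can import modules from inside a zip via the archive path.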
[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301507#comment-15301507 ] Bolke de Bruin edited comment on AIRFLOW-168 at 5/26/16 4:44 AM: - Yes, sorry, I mentioned this on gitter. With master, deadlock detection is broken due to the eager creation, and the scheduler will not check for existing task_instances before creation, hence the constraint error. It will need the follow-up patch from AIRFLOW-128. Black is indeed the color (basically undefined) for task instances that are created but have not been picked up by the scheduler. was (Author: bolke): Yes sorry I mentioned this on gitter. With master deadlock detection is broken due to the eager creation. It will need the follow up patch from AIRFLOW-128. Black is indeed the color (basically undefined) > schedule_interval @once scheduling dag at least twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at the example_xcom example and found that it got scheduled twice: > once at the start_time and once at the current time. To confirm, I tried > multiple times (by reloading the db) and it's the same. > I am on airflow master, using the sequential executor with sqlite3. Though it > works as expected on a prod env running v1.7 with celery workers and > a mysql backend. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301507#comment-15301507 ] Bolke de Bruin edited comment on AIRFLOW-168 at 5/26/16 4:42 AM: - Yes, sorry, I mentioned this on gitter. With master, deadlock detection is broken due to the eager creation. It will need the follow-up patch from AIRFLOW-128. Black is indeed the color (basically undefined) was (Author: bolke): Yes sorry I mentioned this on gitter. With master deadlock detection is broken due to the eager creation. It will need the follow up patch from AIRFLOW-128. > schedule_interval @once scheduling dag at least twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at the example_xcom example and found that it got scheduled twice: > once at the start_time and once at the current time. To confirm, I tried > multiple times (by reloading the db) and it's the same. > I am on airflow master, using the sequential executor with sqlite3. Though it > works as expected on a prod env running v1.7 with celery workers and > a mysql backend. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (AIRFLOW-161) Redirection to external url
[ https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Maheshwari reassigned AIRFLOW-161: Assignee: Sumit Maheshwari > Redirection to external url > --- > > Key: AIRFLOW-161 > URL: https://issues.apache.org/jira/browse/AIRFLOW-161 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Sumit Maheshwari >Assignee: Sumit Maheshwari > > Hi, > I am not able to find a good way (apart from loading everything upfront) > to redirect someone to an external service URL using the information > stored in airflow. There could be many use cases, like downloading a signed > file from s3, redirecting to the hadoop job tracker, or the direct case I am > working on, which is linking airflow tasks to qubole commands. > I already have a working model and will open a PR soon. Please let me know if > there are existing ways already. > Thanks, > Sumit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-177) Resume a failed dag
Sumit Maheshwari created AIRFLOW-177: Summary: Resume a failed dag Key: AIRFLOW-177 URL: https://issues.apache.org/jira/browse/AIRFLOW-177 Project: Apache Airflow Issue Type: New Feature Components: core Reporter: Sumit Maheshwari Say I have a dag with 10 nodes and one of the dag runs failed at the 5th node. Now if I want to resume that dag, I have to go and run the individual tasks one by one. Is there any way by which I can just specify dag_id and execution_date (or run_id) and it automatically retries only the failed tasks? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
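Conceptually, the requested resume operation resets only the failed task instances of the given dag run so the scheduler re-runs them and skips what already succeeded. A minimal sketch of the selection step (the state names match Airflow's, but `tasks_to_clear` is a hypothetical helper, not an existing API; `airflow clear` with failure-only filtering may also cover this, depending on version):

```python
# States a hypothetical "resume" would reset; 'failed' and
# 'upstream_failed' are the failure states Airflow records.
FAILED_STATES = {'failed', 'upstream_failed'}

def tasks_to_clear(task_states):
    """Given {task_id: state} for one dag run (identified elsewhere
    by dag_id + execution_date or run_id), return the task_ids a
    resume operation would clear so only failures re-run."""
    return sorted(t for t, s in task_states.items() if s in FAILED_STATES)
```

Clearing those task instances (setting their state back to none) would then let the scheduler pick them up on its next loop.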
[jira] [Assigned] (AIRFLOW-167) Get dag state for a given execution date.
[ https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Maheshwari reassigned AIRFLOW-167: Assignee: Sumit Maheshwari > Get dag state for a given execution date. > - > > Key: AIRFLOW-167 > URL: https://issues.apache.org/jira/browse/AIRFLOW-167 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Reporter: Sumit Maheshwari >Assignee: Sumit Maheshwari > > I was trying to get the state for a particular dag-run programmatically, but > couldn't find a way. > If we could have a REST call like > `/admin/dagrun?dag_id=&execution_date=` and get the output, that > would be best. Currently we have to do HTML parsing to get the same. > The other (and easier) way is to add CLI support like we have for `task_state`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
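The proposed lookup is a simple filter over dag-run records: match on dag_id and execution_date, return the state. A sketch over plain dicts (the record shape is illustrative; Airflow keeps these rows in its DagRun table):

```python
def dag_run_state(dag_runs, dag_id, execution_date):
    """Return the state of the matching dag run, or None if absent,
    mirroring the proposed /admin/dagrun?dag_id=...&execution_date=...
    endpoint and a task_state-style CLI command."""
    for run in dag_runs:
        if run['dag_id'] == dag_id and run['execution_date'] == execution_date:
            return run['state']
    return None
```

A CLI wrapper would print this value, removing the need for the HTML parsing mentioned above.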
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301507#comment-15301507 ] Bolke de Bruin commented on AIRFLOW-168: Yes, sorry, I mentioned this on gitter. With master, deadlock detection is broken due to the eager creation. It will need the follow-up patch from AIRFLOW-128. > schedule_interval @once scheduling dag at least twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at the example_xcom example and found that it got scheduled twice: > once at the start_time and once at the current time. To confirm, I tried > multiple times (by reloading the db) and it's the same. > I am on airflow master, using the sequential executor with sqlite3. Though it > works as expected on a prod env running v1.7 with celery workers and > a mysql backend. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Maheshwari updated AIRFLOW-168: - Description: I was looking at the example_xcom example and found that it got scheduled twice: once at the start_time and once at the current time. To confirm, I tried multiple times (by reloading the db) and it's the same. I am on airflow master, using the sequential executor with sqlite3. Though it works as expected on a prod env running v1.7 with celery workers and a mysql backend. was: I was looking at example_xcom example and found that it got scheduled twice. Ones at the start_time and ones at the current time. To be correct I tried multiple times (by reloading db) and its same. I am on airflow master, using sequential executor with sqlite3. > schedule_interval @once scheduling dag at least twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at the example_xcom example and found that it got scheduled twice: > once at the start_time and once at the current time. To confirm, I tried > multiple times (by reloading the db) and it's the same. > I am on airflow master, using the sequential executor with sqlite3. Though it > works as expected on a prod env running v1.7 with celery workers and > a mysql backend. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301461#comment-15301461 ] Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:24 AM: -- I noticed that the scheduler log shows (stacktrace at bottom): {noformat} [2016-05-25 20:22:37,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs [2016-05-25 20:22:37,933] {jobs.py:732} INFO - Starting 0 scheduler jobs [2016-05-25 20:22:37,933] {jobs.py:747} INFO - Done queuing tasks, calling the executor's heartbeat [2016-05-25 20:22:37,933] {jobs.py:750} INFO - Loop took: 0.011795 seconds [2016-05-25 20:22:37,936] {models.py:308} INFO - Finding 'running' jobs without a recent heartbeat [2016-05-25 20:22:37,937] {models.py:314} INFO - Failing jobs without heartbeat after 2016-05-25 20:20:22.937222 [2016-05-25 20:22:42,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs [2016-05-25 20:22:42,934] {jobs.py:732} INFO - Starting 1 scheduler jobs [2016-05-25 20:22:42,977] {models.py:2703} INFO - Checking state for [2016-05-25 20:22:42,983] {jobs.py:504} INFO - Getting list of tasks to skip for active runs. 
[2016-05-25 20:22:42,986] {jobs.py:520} INFO - Checking dependencies on 3 tasks instances, minus 0 skippable ones [2016-05-25 20:22:42,991] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:42,993] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:43,011] {jobs.py:747} INFO - Done queuing tasks, calling the executor's heartbeat [2016-05-25 20:22:43,012] {jobs.py:750} INFO - Loop took: 0.089461 seconds [2016-05-25 20:22:43,018] {models.py:308} INFO - Finding 'running' jobs without a recent heartbeat [2016-05-25 20:22:43,019] {models.py:314} INFO - Failing jobs without heartbeat after 2016-05-25 20:20:28.019143 [2016-05-25 20:22:43,028] {sequential_executor.py:26} INFO - Executing command: airflow run example_xcom push 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:43,453] {__init__.py:36} INFO - Using executor SequentialExecutor Logging into: /Users/chrisr/airflow/logs/example_xcom/push/2016-05-25T20:22:42.953808 [2016-05-25 20:22:44,300] {__init__.py:36} INFO - Using executor SequentialExecutor [2016-05-25 20:22:48,937] {sequential_executor.py:26} INFO - Executing command: airflow run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:49,366] {__init__.py:36} INFO - Using executor SequentialExecutor Logging into: /Users/chrisr/airflow/logs/example_xcom/push_by_returning/2016-05-25T20:22:42.953808 [2016-05-25 20:22:50,210] {__init__.py:36} INFO - Using executor SequentialExecutor [2016-05-25 20:22:54,844] {jobs.py:580} INFO - Prioritizing 0 queued jobs [2016-05-25 20:22:54,853] {jobs.py:732} INFO - Starting 1 scheduler jobs [2016-05-25 20:22:54,903] {models.py:2703} INFO - Checking 
state for [2016-05-25 20:22:54,907] {models.py:2703} INFO - Checking state for [2016-05-25 20:22:54,911] {jobs.py:504} INFO - Getting list of tasks to skip for active runs. [2016-05-25 20:22:54,913] {jobs.py:520} INFO - Checking dependencies on 6 tasks instances, minus 2 skippable ones [2016-05-25 20:22:54,920] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push 2015-01-01T00:00:00 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:54,921] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:54,935] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom puller 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:54,954] {jobs.py:747} INFO - Done queuing tasks, calling the executor's heartbeat [2016-05-25 20:22:54,954] {jobs.py:750} INFO - Loop took: 0.113319 seconds [2016-05-25 20:22:54,960] {models.py:308} INFO - Finding 'running' jobs without a recent heartbeat [2016-05-25 20:22:54,960] {models.py:314} INFO - Failing jobs without heartbeat after 2016-05-25 20:20:39.960629 [2016-05-25 20:22:54,978] {sequential_executor.py:26} INFO - Executing command: airflow run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:55,410] {__init__.py:36} INFO - Using executor SequentialExecutor Logging into: /Users/chrisr/airflow/logs/example_xcom/push_by_returning/2015-01-01T00:00:00 [2016-05-25 20:22:56,239] {__init__.py:36} INFO - Using executor SequentialExecutor [2016-05-25 20:23:00,873] {sequential_executor
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301461#comment-15301461 ] Chris Riccomini commented on AIRFLOW-168: - I noticed that the scheduler log shows: {noformat} [2016-05-25 20:22:37,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs [2016-05-25 20:22:37,933] {jobs.py:732} INFO - Starting 0 scheduler jobs [2016-05-25 20:22:37,933] {jobs.py:747} INFO - Done queuing tasks, calling the executor's heartbeat [2016-05-25 20:22:37,933] {jobs.py:750} INFO - Loop took: 0.011795 seconds [2016-05-25 20:22:37,936] {models.py:308} INFO - Finding 'running' jobs without a recent heartbeat [2016-05-25 20:22:37,937] {models.py:314} INFO - Failing jobs without heartbeat after 2016-05-25 20:20:22.937222 [2016-05-25 20:22:42,925] {jobs.py:580} INFO - Prioritizing 0 queued jobs [2016-05-25 20:22:42,934] {jobs.py:732} INFO - Starting 1 scheduler jobs [2016-05-25 20:22:42,977] {models.py:2703} INFO - Checking state for [2016-05-25 20:22:42,983] {jobs.py:504} INFO - Getting list of tasks to skip for active runs. 
[2016-05-25 20:22:42,986] {jobs.py:520} INFO - Checking dependencies on 3 tasks instances, minus 0 skippable ones [2016-05-25 20:22:42,991] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:42,993] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:43,011] {jobs.py:747} INFO - Done queuing tasks, calling the executor's heartbeat [2016-05-25 20:22:43,012] {jobs.py:750} INFO - Loop took: 0.089461 seconds [2016-05-25 20:22:43,018] {models.py:308} INFO - Finding 'running' jobs without a recent heartbeat [2016-05-25 20:22:43,019] {models.py:314} INFO - Failing jobs without heartbeat after 2016-05-25 20:20:28.019143 [2016-05-25 20:22:43,028] {sequential_executor.py:26} INFO - Executing command: airflow run example_xcom push 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:43,453] {__init__.py:36} INFO - Using executor SequentialExecutor Logging into: /Users/chrisr/airflow/logs/example_xcom/push/2016-05-25T20:22:42.953808 [2016-05-25 20:22:44,300] {__init__.py:36} INFO - Using executor SequentialExecutor [2016-05-25 20:22:48,937] {sequential_executor.py:26} INFO - Executing command: airflow run example_xcom push_by_returning 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:49,366] {__init__.py:36} INFO - Using executor SequentialExecutor Logging into: /Users/chrisr/airflow/logs/example_xcom/push_by_returning/2016-05-25T20:22:42.953808 [2016-05-25 20:22:50,210] {__init__.py:36} INFO - Using executor SequentialExecutor [2016-05-25 20:22:54,844] {jobs.py:580} INFO - Prioritizing 0 queued jobs [2016-05-25 20:22:54,853] {jobs.py:732} INFO - Starting 1 scheduler jobs [2016-05-25 20:22:54,903] {models.py:2703} INFO - Checking 
state for [2016-05-25 20:22:54,907] {models.py:2703} INFO - Checking state for [2016-05-25 20:22:54,911] {jobs.py:504} INFO - Getting list of tasks to skip for active runs. [2016-05-25 20:22:54,913] {jobs.py:520} INFO - Checking dependencies on 6 tasks instances, minus 2 skippable ones [2016-05-25 20:22:54,920] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push 2015-01-01T00:00:00 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:54,921] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:54,935] {base_executor.py:36} INFO - Adding to queue: airflow run example_xcom puller 2016-05-25T20:22:42.953808 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:54,954] {jobs.py:747} INFO - Done queuing tasks, calling the executor's heartbeat [2016-05-25 20:22:54,954] {jobs.py:750} INFO - Loop took: 0.113319 seconds [2016-05-25 20:22:54,960] {models.py:308} INFO - Finding 'running' jobs without a recent heartbeat [2016-05-25 20:22:54,960] {models.py:314} INFO - Failing jobs without heartbeat after 2016-05-25 20:20:39.960629 [2016-05-25 20:22:54,978] {sequential_executor.py:26} INFO - Executing command: airflow run example_xcom push_by_returning 2015-01-01T00:00:00 --local -sd DAGS_FOLDER/example_dags/example_xcom.py [2016-05-25 20:22:55,410] {__init__.py:36} INFO - Using executor SequentialExecutor Logging into: /Users/chrisr/airflow/logs/example_xcom/push_by_returning/2015-01-01T00:00:00 [2016-05-25 20:22:56,239] {__init__.py:36} INFO - Using executor SequentialExecutor [2016-05-25 20:23:00,873] {sequential_executor.py:26} INFO - Executing command: airflow run example_xcom push 2015-01
[jira] [Updated] (AIRFLOW-161) Redirection to external url
[ https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumit Maheshwari updated AIRFLOW-161: - External issue URL: https://github.com/apache/incubator-airflow/pull/1538/files > Redirection to external url > --- > > Key: AIRFLOW-161 > URL: https://issues.apache.org/jira/browse/AIRFLOW-161 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Sumit Maheshwari > > Hi, > I am not able to find a good way (apart from loading everything upfront) > to redirect someone to an external service URL using the information > stored in airflow. There could be many use cases, like downloading a signed > file from s3, redirecting to the hadoop job tracker, or the direct case I am > working on, which is linking airflow tasks to qubole commands. > I already have a working model and will open a PR soon. Please let me know if > there are existing ways already. > Thanks, > Sumit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301449#comment-15301449 ] Chris Riccomini commented on AIRFLOW-168: - The comment in [this|https://github.com/apache/incubator-airflow/pull/1506] pull request from [~bolke] reads: {quote} This creates dagrun from a Dag. It also creates the TaskInstances from the tasks known at instantiation time. By having taskinstances created at dagrun instantiation time, deadlocks that were tested for will not take place anymore (@jlowin, correct? different test required?). *For now, the visual consequence of having these taskinstances already there is that they will be black in the tree view.* {quote} > schedule_interval @once scheduling dag atleast twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at example_xcom example and found that it got scheduled twice. > Ones at the start_time and ones at the current time. To be correct I tried > multiple times (by reloading db) and its same. > I am on airflow master, using sequential executor with sqlite3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-161) Redirection to external url
[ https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301450#comment-15301450 ] Sumit Maheshwari commented on AIRFLOW-161: -- Sure, can you please cc the top contributors on the PR or here?
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301444#comment-15301444 ] Sumit Maheshwari commented on AIRFLOW-168: -- Actually I am getting the same: 2 schedules, 5 task_instances (instead of 6).
[jira] [Comment Edited] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442 ] Chris Riccomini edited comment on AIRFLOW-168 at 5/26/16 3:16 AM: -- I was able to reproduce this. My results were even stranger. One of the tasks is showing up as black in the treeview. [~bolke], I'm wondering if this is related to the scheduler work you're doing? !screenshot-1.png||width=300! was (Author: criccomini): I was able to reproduce this. My results were even stranger. One of the tasks is showing up as black in the treeview. [~bolke], I'm wondering if this is related to the scheduler work you're doing? !screenshot-1.png|thumbnail! > schedule_interval @once scheduling dag atleast twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.1.2 >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png, > screenshot-1.png > > > I was looking at example_xcom example and found that it got scheduled twice. > Ones at the start_time and ones at the current time. To be correct I tried > multiple times (by reloading db) and its same. > I am on airflow master, using sequential executor with sqlite3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301442#comment-15301442 ] Chris Riccomini commented on AIRFLOW-168: - I was able to reproduce this. My results were even stranger. One of the tasks is showing up as black in the treeview. [~bolke], I'm wondering if this is related to the scheduler work you're doing?
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301443#comment-15301443 ] Chris Riccomini commented on AIRFLOW-168: - Note: My machine is running on PST, not UTC.
[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-168: Attachment: screenshot-1.png
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301440#comment-15301440 ] Sumit Maheshwari commented on AIRFLOW-168: -- No, it's set to IST, will that be a concern?
[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-168: Affects Version/s: Airflow 1.7.1.2
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301436#comment-15301436 ] Chris Riccomini commented on AIRFLOW-168: - Is the timezone on the machine set to UTC?
[jira] [Commented] (AIRFLOW-161) Redirection to external url
[ https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301433#comment-15301433 ] Chris Riccomini commented on AIRFLOW-161: - Yea, I wouldn't object if there were a generic way to redirect from the UI. My objection is more to hard-coding Qubole stuff in generic Airflow files (views.py, dag.html). I think we'd also need to loop in a few more committers to make sure everyone agrees on the approach taken.
[jira] [Commented] (AIRFLOW-161) Redirection to external url
[ https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301400#comment-15301400 ] Sumit Maheshwari commented on AIRFLOW-161: -- Fair enough, I can't challenge that decision, as Qubole is not as big as AWS or GCE :). However, since that link will be visible only for qubole_operator-type tasks, it implies the user is using Qubole, and having that link will help them. Also, I think Airflow is going to need a /redirect (or similar) route in the near future.
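The generic redirect route discussed in this thread could be sketched roughly as follows. This is a hypothetical illustration, not the actual PR: the registry, function names, and URL patterns are all assumptions; in the webserver the resolved URL would be returned as an HTTP 302 redirect.

```python
# Hypothetical sketch of the lookup behind a generic "/redirect" webserver
# route: operators register a URL builder for their link "kind", and the
# route resolves stored task metadata to an external URL.
URL_BUILDERS = {
    # Illustrative URL patterns only -- not real endpoints.
    "qubole": lambda ref: "https://qubole.example.com/analyze?command_id={}".format(ref),
    "jobtracker": lambda ref: "http://jobtracker.example.com:50030/jobdetails.jsp?jobid={}".format(ref),
}

def resolve_external_url(kind, ref):
    """Return the external URL for a registered link kind, or None.

    In a webserver view, a non-None result would become a 302 redirect
    and None would map to a 404.
    """
    builder = URL_BUILDERS.get(kind)
    return builder(ref) if builder else None
```

Keeping the registry generic is what addresses the objection above: operator-specific entries (like Qubole's) could live in the operator or a plugin rather than in views.py or dag.html.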
[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.
[ https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301398#comment-15301398 ] Chris Riccomini commented on AIRFLOW-167: - Commented, thanks! > Get dag state for a given execution date. > - > > Key: AIRFLOW-167 > URL: https://issues.apache.org/jira/browse/AIRFLOW-167 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Reporter: Sumit Maheshwari > > I was trying to get state for a particular dag-run programmatically, but > couldn't find a way. > If we could have a rest call like > `/admin/dagrun?dag_id=&execution_date=` and get the output that > would be best. Currently we've to do html parsing to get the same. > Other (and easier) way is to add a cli support like we have for `task_state`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-169) Hide expire dags in UI
[ https://issues.apache.org/jira/browse/AIRFLOW-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301346#comment-15301346 ] Sumit Maheshwari commented on AIRFLOW-169: -- I am referring to the landing page, i.e. /admin. Expired dags means dags which were supposed to run @once and have already run, or dags whose end_date is in the past. Similarly, in the cli we could pass an option (say -e) to the list_dags command to ignore those expired dags. > Hide expire dags in UI > -- > > Key: AIRFLOW-169 > URL: https://issues.apache.org/jira/browse/AIRFLOW-169 > Project: Apache Airflow > Issue Type: Wish > Components: ui >Reporter: Sumit Maheshwari > > It would be great if we've option to hide expired schedules from UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
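The "expired" test described in that comment could be sketched as a small predicate. This is a hypothetical helper, not part of Airflow; the field names mirror Airflow's DAG model, but the function itself is an assumption.

```python
from datetime import datetime, timedelta

def is_expired(schedule_interval, end_date, has_run_once, now=None):
    """Sketch of the 'expired DAG' test described above: a DAG is
    expired if it was scheduled @once and has already run, or if its
    end_date lies in the past. Hypothetical helper, not Airflow code."""
    now = now or datetime.utcnow()
    if schedule_interval == "@once" and has_run_once:
        return True
    return end_date is not None and end_date < now
```

A landing-page filter or a `list_dags -e` flag would then just skip DAGs for which this predicate returns True.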
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301339#comment-15301339 ] Sumit Maheshwari commented on AIRFLOW-168: -- I was on the latest master.
[jira] [Updated] (AIRFLOW-167) Get dag state for a given execution date.
[ https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-167: External issue URL: https://github.com/apache/incubator-airflow/pull/1541
[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.
[ https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301338#comment-15301338 ] Sumit Maheshwari commented on AIRFLOW-167: -- Yup, already opened https://github.com/apache/incubator-airflow/pull/1541/files.
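The CLI approach AIRFLOW-167 asks for (a `dag_state` command mirroring `task_state`) amounts to a lookup against the DagRun table. A minimal sketch, with an in-memory list standing in for the metadata DB that the real implementation would query:

```python
# Sketch of a `dag_state` lookup in the spirit of the existing
# `airflow task_state` command. DAG_RUNS is an in-memory stand-in for
# the DagRun table; the real PR queries the metadata database instead.
DAG_RUNS = [
    {"dag_id": "example_xcom", "execution_date": "2015-01-01T00:00:00", "state": "success"},
    {"dag_id": "example_xcom", "execution_date": "2016-05-25T20:22:42", "state": "running"},
]

def dag_state(dag_id, execution_date):
    """Return the state of the dag run at the given execution date,
    or None if no such run exists."""
    for run in DAG_RUNS:
        if run["dag_id"] == dag_id and run["execution_date"] == execution_date:
            return run["state"]
    return None
```

A CLI wrapper would then print the returned state, exactly as `task_state` does for task instances, avoiding the HTML parsing the reporter describes.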
[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2
[ https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301175#comment-15301175 ] Hao Ye commented on AIRFLOW-171: Thanks, I got it working. We had an older version of the config that pointed to email_backend = airflow.utils.send_email_smtp instead of email_backend = airflow.utils.email.send_email_smtp. On that note, is there an easy way to detect config changes when upgrading? We currently keep our config across upgrades and so may not pick up new changes. > Email does not work in 1.7.1.2 > -- > > Key: AIRFLOW-171 > URL: https://issues.apache.org/jira/browse/AIRFLOW-171 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: AWS Amazon Linux Image >Reporter: Hao Ye > > Job failure emails was working in 1.7.0. They seem to have stopped working in > 1.7.1. > Error is > {quote} > [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: > ['em...@email.com'] > [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no > attribute 'send_email_smtp' > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 42, in send_email > backend = getattr(module, attr) > AttributeError: 'module' object has no attribute 'send_email_smtp' > {quote} > File exists and method exists. Seems to work fine when called in python > directly. > Maybe it's loading the wrong email module. > Tried to set PYTHONPATH to have > /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that > didn't seem to work either. > Could this be related to the utils refactoring that happened between 1.7.0 > and 1.7.1? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
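The failure mode in AIRFLOW-171 follows from how the `email_backend` config value is resolved: the traceback above shows `backend = getattr(module, attr)` in airflow/utils/email.py. A minimal sketch of that dotted-path resolution (the helper name is an assumption; the mechanism is what the traceback shows):

```python
import importlib

def load_backend(path):
    """Resolve a dotted config value like 'airflow.utils.email.send_email_smtp':
    import everything up to the last dot as a module, then look up the final
    attribute on it. With the pre-1.7.1 value 'airflow.utils.send_email_smtp',
    the refactored airflow.utils module no longer has a send_email_smtp
    attribute, producing the AttributeError quoted above."""
    module_name, attr = path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr)  # raises AttributeError if attr is missing
```

This is why keeping an old airflow.cfg across upgrades can silently break: the stale dotted path only fails at the moment an email is sent.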
[1/2] incubator-airflow git commit: AIRFLOW-45: Support Hidden Airflow Variables
Repository: incubator-airflow
Updated Branches: refs/heads/master 7332c40c2 -> 456dada69

AIRFLOW-45: Support Hidden Airflow Variables

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3e309415
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3e309415
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3e309415

Branch: refs/heads/master
Commit: 3e3094157eb516ad37c4691ddfcdda5c9444352e
Parents: 7332c40
Author: Matthew Chen
Authored: Wed May 25 08:45:24 2016 -0700
Committer: Matthew Chen
Committed: Wed May 25 08:45:24 2016 -0700

----------------------------------------------------------------------
 airflow/configuration.py     |   9 +-
 airflow/www/views.py         |  36 +++-
 docs/img/variable_hidden.png | Bin 0 -> 154299 bytes
 docs/ui.rst                  |  13 +
 4 files changed, 56 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

diff --git a/airflow/configuration.py b/airflow/configuration.py
index 13bb344..582bc7c 100644
--- a/airflow/configuration.py
+++ b/airflow/configuration.py
@@ -156,7 +156,10 @@ defaults = {
     },
     'github_enterprise': {
         'api_rev': 'v3'
-    }
+    },
+    'admin': {
+        'hide_sensitive_variable_fields': True,
+    },
 }
 
 DEFAULT_CONFIG = """\
@@ -386,6 +389,10 @@ authenticate = False
 # default_principal = admin
 # default_secret = admin
 
+[admin]
+# UI to hide sensitive variable fields when set to True
+hide_sensitive_variable_fields = True
+
 """
 
 TEST_CONFIG = """\

diff --git a/airflow/www/views.py b/airflow/www/views.py
index bcd390c..78f9677 100644
--- a/airflow/www/views.py
+++ b/airflow/www/views.py
@@ -82,6 +82,17 @@
 current_user = airflow.login.current_user
 logout_user = airflow.login.logout_user
 
 FILTER_BY_OWNER = False
+
+DEFAULT_SENSITIVE_VARIABLE_FIELDS = (
+    'password',
+    'secret',
+    'passwd',
+    'authorization',
+    'api_key',
+    'apikey',
+    'access_token',
+)
+
 if conf.getboolean('webserver', 'FILTER_BY_OWNER'):
     # filter_by_owner if authentication is enabled and filter_by_owner is true
     FILTER_BY_OWNER = not current_app.config['LOGIN_DISABLED']
@@ -265,6 +276,11 @@ def recurse_tasks(tasks, task_ids, dag_ids, task_id_to_dag):
         task_id_to_dag[tasks.task_id] = tasks.dag
 
 
+def should_hide_value_for_key(key_name):
+    return any(s in key_name for s in DEFAULT_SENSITIVE_VARIABLE_FIELDS) \
+        and conf.getboolean('admin', 'hide_sensitive_variable_fields')
+
+
 class Airflow(BaseView):
 
     def is_visible(self):
@@ -2015,11 +2031,17 @@ admin.add_view(mv)
 class VariableView(wwwutils.LoginMixin, AirflowModelView):
     verbose_name = "Variable"
     verbose_name_plural = "Variables"
+
+    def hidden_field_formatter(view, context, model, name):
+        if should_hide_value_for_key(model.key):
+            return Markup('*' * 8)
+        return getattr(model, name)
+
     form_columns = (
         'key',
         'val',
     )
-    column_list = ('key', 'is_encrypted',)
+    column_list = ('key', 'val', 'is_encrypted',)
     column_filters = ('key', 'val')
     column_searchable_list = ('key', 'val')
     form_widget_args = {
@@ -2028,6 +2050,18 @@ class VariableView(wwwutils.LoginMixin, AirflowModelView):
             'rows': 20,
         }
     }
+    column_sortable_list = (
+        'key',
+        'val',
+        'is_encrypted',
+    )
+    column_formatters = {
+        'val': hidden_field_formatter
+    }
+
+    def on_form_prefill(self, form, id):
+        if should_hide_value_for_key(form.key.data):
+            form.val.data = '*' * 8
 
 class JobModelView(ModelViewOnly):

diff --git a/docs/img/variable_hidden.png b/docs/img/variable_hidden.png
new file mode 100644
index 000..e081ca3
Binary files /dev/null and b/docs/img/variable_hidden.png differ

diff --git a/docs/ui.rst b/docs/ui.rst
index 112804e..4b232fa 100644
--- a/docs/ui.rst
+++ b/docs/ui.rst
@@ -41,6 +41,19 @@ dependencies and their current status for a specific run.
 
+Variable View
+.
+The variable view allows you to list, create, edit or delete the key-value pair
+of a variable used during jobs. Value of
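Distilled from the commit above, the masking check behaves like this minimal standalone version. The only liberty taken is replacing the `conf.getboolean('admin', 'hide_sensitive_variable_fields')` lookup with a plain boolean parameter so the sketch is self-contained.

```python
# Standalone distillation of should_hide_value_for_key from the merged
# AIRFLOW-45 commit: a variable's value is masked when its key contains
# any sensitive substring AND the hide option is enabled. The real code
# reads the option from airflow.cfg via conf.getboolean.
DEFAULT_SENSITIVE_VARIABLE_FIELDS = (
    'password', 'secret', 'passwd', 'authorization',
    'api_key', 'apikey', 'access_token',
)

def should_hide_value_for_key(key_name, hide_sensitive=True):
    return hide_sensitive and any(
        s in key_name for s in DEFAULT_SENSITIVE_VARIABLE_FIELDS)
```

Note the check is a substring match on the key, so e.g. `db_password` and `slack_api_key` are both masked without needing exact-name registration.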
[jira] [Closed] (AIRFLOW-45) Support hidden Airflow variables
[ https://issues.apache.org/jira/browse/AIRFLOW-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-45. -- Resolution: Fixed +1 Merged. Thanks! [~cheny258]. > Support hidden Airflow variables > > > Key: AIRFLOW-45 > URL: https://issues.apache.org/jira/browse/AIRFLOW-45 > Project: Apache Airflow > Issue Type: Improvement > Components: security >Reporter: Chris Riccomini >Assignee: Matthew Chen > > We have a use case where someone wants to set a variable for their DAG, but > they don't want it visible via the UI. I see that variables are encrypted in > the DB (if the crypto package is installed), but the variables are still > visible via the UI, which is a little annoying. > Obviously, this is not 100% secure, since you can still create a DAG to read > the variable, but it will at least keep arbitrary users from logging > in/loading the UI and seeing the variable. > I propose basically handling this the same way that DB hook passwords are > handled. Don't show them in the UI when the edit button is clicked, but allow > the variables to be editable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-45) Support hidden Airflow variables
[ https://issues.apache.org/jira/browse/AIRFLOW-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301084#comment-15301084 ] ASF subversion and git services commented on AIRFLOW-45: Commit 3e3094157eb516ad37c4691ddfcdda5c9444352e in incubator-airflow's branch refs/heads/master from [~cheny258] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=3e30941 ] AIRFLOW-45: Support Hidden Airflow Variables > Support hidden Airflow variables > > > Key: AIRFLOW-45 > URL: https://issues.apache.org/jira/browse/AIRFLOW-45 > Project: Apache Airflow > Issue Type: Improvement > Components: security >Reporter: Chris Riccomini >Assignee: Matthew Chen > > We have a use case where someone wants to set a variable for their DAG, but > they don't want it visible via the UI. I see that variables are encrypted in > the DB (if the crypto package is installed), but the variables are still > visible via the UI, which is a little annoying. > Obviously, this is not 100% secure, since you can still create a DAG to read > the variable, but it will at least keep arbitrary users from logging > in/loading the UI and seeing the variable. > I propose basically handling this the same way that DB hook passwords are > handled. Don't show them in the UI when the edit button is clicked, but allow > the variables to be editable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-176) PR tool crashes with non-integer JIRA ids
[ https://issues.apache.org/jira/browse/AIRFLOW-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301115#comment-15301115 ] Chris Riccomini commented on AIRFLOW-176: - +1 > PR tool crashes with non-integer JIRA ids > - > > Key: AIRFLOW-176 > URL: https://issues.apache.org/jira/browse/AIRFLOW-176 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Affects Versions: Airflow 1.7.1.2 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > The PR tool crashes if a non-integer id is passed. This includes the default > ID (AIRFLOW-XXX) so it affects folks who don't type in a new ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[2/2] incubator-airflow git commit: Merge pull request #1530 from mattuuh7/hidden-fields
Merge pull request #1530 from mattuuh7/hidden-fields Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/456dada6 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/456dada6 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/456dada6 Branch: refs/heads/master Commit: 456dada695174989e6785f08f58112e760b72d8b Parents: 7332c40 3e30941 Author: Chris Riccomini Authored: Wed May 25 16:15:01 2016 -0700 Committer: Chris Riccomini Committed: Wed May 25 16:15:01 2016 -0700 -- airflow/configuration.py | 9 - airflow/www/views.py | 36 +++- docs/img/variable_hidden.png | Bin 0 -> 154299 bytes docs/ui.rst | 13 + 4 files changed, 56 insertions(+), 2 deletions(-) --
[jira] [Work started] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-173 started by Andre. - > Create a FileSensor / NFSFileSensor sensor > -- > > Key: AIRFLOW-173 > URL: https://issues.apache.org/jira/browse/AIRFLOW-173 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Andre >Assignee: Andre >Priority: Minor > > While HDFS and WebHDFS suit most organisations using Hadoop, for some shops > running MapR-FS, Airflow implementation is simplified by the use of plain > files pointing to MapR's NFS gateways. > A FileSensor and/or a NFSFileSensor would assist the adoption of Airflow > within the MapR customer base, but more importantly, help those who are using > POSIX compliant distributed filesystems that can be mounted on Unix > derivative systems (e.g. as MapR-FS (via NFS), CephFS, GlusterFS, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andre updated AIRFLOW-173: -- Assignee: (was: Andre) > Create a FileSensor / NFSFileSensor sensor > -- > > Key: AIRFLOW-173 > URL: https://issues.apache.org/jira/browse/AIRFLOW-173 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Andre >Priority: Minor > > While HDFS and WebHDFS suit most organisations using Hadoop, for some shops > running MapR-FS, Airflow implementation is simplified by the use of plain > files pointing to MapR's NFS gateways. > A FileSensor and/or a NFSFileSensor would assist the adoption of Airflow > within the MapR customer base, but more importantly, help those who are using > POSIX compliant distributed filesystems that can be mounted on Unix > derivative systems (e.g. as MapR-FS (via NFS), CephFS, GlusterFS, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-167) Get dag state for a given execution date.
[ https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-167: Component/s: cli > Get dag state for a given execution date. > - > > Key: AIRFLOW-167 > URL: https://issues.apache.org/jira/browse/AIRFLOW-167 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Reporter: Sumit Maheshwari > > I was trying to get state for a particular dag-run programmatically, but > couldn't find a way. > If we could have a rest call like > `/admin/dagrun?dag_id=&execution_date=` and get the output that > would be best. Currently we've to do html parsing to get the same. > Other (and easier) way is to add a cli support like we have for `task_state`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"
[ https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301030#comment-15301030 ] Andre commented on AIRFLOW-172: --- indeed. Noticed that. May need to re-read the documentation and possibly suggest some changes to tutorial > All example DAGs report "Only works with the CeleryExecutor, sorry" > --- > > Key: AIRFLOW-172 > URL: https://issues.apache.org/jira/browse/AIRFLOW-172 > Project: Apache Airflow > Issue Type: Bug > Components: executor >Affects Versions: Airflow 1.7.1 >Reporter: Andre > > After installing airflow and trying to run some example DAGs I was faced with > {{Only works with the CeleryExecutor, sorry}} > on every DAG I tried to run. > {code}$ pip list > airflow (1.7.1.2) > alembic (0.8.6) > Babel (1.3) > bitarray (0.8.1) > cffi (1.6.0) > chartkick (0.4.2) > croniter (0.3.12) > cryptography (1.3.2) > dill (0.2.5) > docutils (0.12) > Flask (0.10.1) > Flask-Admin (1.4.0) > Flask-Cache (0.13.1) > Flask-Login (0.2.11) > Flask-WTF (0.12) > funcsigs (0.4) > future (0.15.2) > google-apputils (0.4.2) > gunicorn (19.3.0) > hive-thrift-py (0.0.1) > idna (2.1) > impyla (0.13.7) > itsdangerous (0.24) > Jinja2 (2.8) > lockfile (0.12.2) > Mako (1.0.4) > Markdown (2.6.6) > MarkupSafe (0.23) > mysqlclient (1.3.7) > numpy (1.11.0) > pandas (0.18.1) > pip (8.1.2) > ply (3.8) > protobuf (2.6.1) > pyasn1 (0.1.9) > pycparser (2.14) > Pygments (2.1.3) > PyHive (0.1.8) > pykerberos (1.1.10) > python-daemon (2.1.1) > python-dateutil (2.5.3) > python-editor (1.0) > python-gflags (3.0.5) > pytz (2016.4) > requests (2.10.0) > setproctitle (1.1.10) > setuptools (21.2.1) > six (1.10.0) > snakebite (2.9.0) > SQLAlchemy (1.0.13) > thrift (0.9.3) > thriftpy (0.3.8) > unicodecsv (0.14.1) > Werkzeug (0.11.10) > WTForms (2.1) > {code} > {code} > $ airflow webserver -p 8088 > [2016-05-25 15:22:48,204] {__init__.py:36} INFO - Using executor LocalExecutor > _ > |__( )_ __/__ / __ > /| |_ /__ ___/_ /_ __ 
/_ __ \_ | /| / / > ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ / > _/_/ |_/_/ /_//_//_/ \//|__/ > [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 > with a timeout of 120... > [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0 > [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 > (20191) > [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync > [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197 > [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198 > [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199 > [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200 > [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15301022#comment-15301022 ] Andre commented on AIRFLOW-173: --- inotify is more efficient, but it is not portable to some file systems... For example, these are DFSs that can be mounted as normal filesystems but where, I suspect, the inotify approach wouldn't play ball nicely: Ceph: http://www.spinics.net/lists/ceph-users/msg23087.html Gluster: https://www.gluster.org/pipermail/gluster-users/2012-September/011276.html As a consequence, when writing https://github.com/apache/incubator-airflow/pull/1543, I ended up using a more "unsophisticated" approach of polling the file (very much like the WebHDFS and HDFS sensors do, due to the lack of inotify). > Create a FileSensor / NFSFileSensor sensor > -- > > Key: AIRFLOW-173 > URL: https://issues.apache.org/jira/browse/AIRFLOW-173 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Andre >Priority: Minor > > While HDFS and WebHDFS suit most organisations using Hadoop, for some shops > running MapR-FS, Airflow implementation is simplified by the use of plain > files pointing to MapR's NFS gateways. > A FileSensor and/or a NFSFileSensor would assist the adoption of Airflow > within the MapR customer base, but more importantly, help those who are using > POSIX compliant distributed filesystems that can be mounted on Unix > derivative systems (e.g. as MapR-FS (via NFS), CephFS, GlusterFS, etc). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
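The polling approach described in the comment above can be sketched without any Airflow dependencies. The class name and the `poke`/`run` methods below are hypothetical, loosely modeled on Airflow's sensor convention; this is not the code from PR 1543:

```python
import os
import time

class PollingFileSensor:
    """Hypothetical polling sensor: check for a path on a mounted
    filesystem (NFS, CephFS, GlusterFS, ...) at a fixed interval,
    avoiding inotify, which some distributed filesystems don't support."""

    def __init__(self, filepath, poke_interval=60, timeout=3600):
        self.filepath = filepath
        self.poke_interval = poke_interval
        self.timeout = timeout

    def poke(self):
        # One poll: True as soon as the file exists.
        return os.path.exists(self.filepath)

    def run(self):
        # Poll until the file appears or the timeout elapses.
        deadline = time.monotonic() + self.timeout
        while time.monotonic() < deadline:
            if self.poke():
                return True
            time.sleep(self.poke_interval)
        raise TimeoutError("%s never appeared" % self.filepath)
```

Polling trades latency (at most one `poke_interval`) for portability, which is exactly the tradeoff the comment describes for Ceph and Gluster mounts.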
[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.
[ https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300973#comment-15300973 ] Chris Riccomini commented on AIRFLOW-167: - This sounds reasonable to me. Want to send a PR? > Get dag state for a given execution date. > - > > Key: AIRFLOW-167 > URL: https://issues.apache.org/jira/browse/AIRFLOW-167 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Reporter: Sumit Maheshwari > > I was trying to get state for a particular dag-run programmatically, but > couldn't find a way. > If we could have a rest call like > `/admin/dagrun?dag_id=&execution_date=` and get the output that > would be best. Currently we've to do html parsing to get the same. > Other (and easier) way is to add a cli support like we have for `task_state`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-160) Parse DAG files through child processes
[ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300992#comment-15300992 ] Chris Riccomini commented on AIRFLOW-160: - {quote} We've also seen an unusual case where modules loaded by the user DAG affect operation of the scheduler {quote} We're also very concerned about security, and having DAGs evaluated in-process in the scheduler is pretty dangerous, since it allows DAGs to take over the scheduler. Definite +1 to making DAG parsing a subprocess. As a separate ticket, we will also probably want to make the subprocesses run as a DAG-specific user (e.g. owner). This will prevent DAGs from messing with the Airflow files on the file system, killing Airflow processes, etc. {quote} I think inotify is more suitable or an API call to refresh the dagbag if triggered externally. API call is also nicer because it can update all processes that require a load of the dagbag. {quote} +1 to this comment as well. Our ops folks were actually asking today if there's an API to trigger a DAG refresh. They are going to push DAGs to a folder via a deploy script, and would like to tell Airflow to refresh accordingly. Polling other than during this operation is pointless. inotify would also work (and is probably a better solution than the API, even). > Parse DAG files through child processes > --- > > Key: AIRFLOW-160 > URL: https://issues.apache.org/jira/browse/AIRFLOW-160 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Reporter: Paul Yang >Assignee: Paul Yang > > Currently, the Airflow scheduler parses all user DAG files in the same > process as the scheduler itself. We've seen issues in production where bad > DAG files cause scheduler to fail. A simple example is if the user script > calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an > unusual case where modules loaded by the user DAG affect operation of the > scheduler. 
For better uptime, the scheduler should be resistant to these > problematic user DAGs. > The proposed solution is to parse and schedule user DAGs through child > processes. This way, the main scheduler process is more isolated from bad > DAGs. There's a side benefit as well - since parsing is distributed among > multiple processes, it's possible to parse the DAG files more frequently, > reducing the latency between when a DAG is modified and when the changes are > picked up. > Another issue right now is that all DAGs must be scheduled before any tasks > are sent to the executor. This means that the frequency of task scheduling is > limited by the slowest DAG to schedule. The changes needed for scheduling > DAGs through child processes will also make it easy to decouple this process > and allow tasks to be scheduled and sent to the executor in a more > independent fashion. This way, overall scheduling won't be held back by a > slow DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
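The isolation property this ticket asks for can be demonstrated with the standard library alone: executing a DAG file in a child interpreter means a `sys.exit(1)` or crash in user code kills only the child, and the parent merely observes a non-zero exit code. This is a minimal stdlib sketch, not the scheduler implementation proposed in the ticket:

```python
import subprocess
import sys

def parse_in_subprocess(path, timeout=30):
    """Run a DAG file in a child Python interpreter and report whether
    it executed cleanly. A sys.exit(1) or an unhandled exception in the
    user's file kills only the child, so the parent "scheduler" process
    survives -- the isolation property described above."""
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Runaway DAG file (e.g. an infinite loop): give up on it.
        return False
    return result.returncode == 0
```

A pool of such child processes also enables the parallel, per-file parsing the ticket mentions, since each file's latency no longer blocks the others.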
[jira] [Commented] (AIRFLOW-101) Access the tree view of the Web UI instead of the graph view when clicking on a dag
[ https://issues.apache.org/jira/browse/AIRFLOW-101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300985#comment-15300985 ] Chris Riccomini commented on AIRFLOW-101: - [~bolke], which PR was this part of? > Acces the tree view of the Web UI instead of the graph view when clicking on > a dag > -- > > Key: AIRFLOW-101 > URL: https://issues.apache.org/jira/browse/AIRFLOW-101 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Affects Versions: Airflow 1.7.0 > Environment: All >Reporter: Michal TOMA >Priority: Minor > Fix For: Airflow 1.7.1 > > Original Estimate: 1h > Remaining Estimate: 1h > > I'd like to have a config parameter that would allow to access directly the > tree view of the DAG tasks instead of the current graph view. > I my environment failed tasks are very common and I need to have a quick view > of what failed and when in the past. As of now I must click either the DAG > and than click the tree view menu or click the very small tree view icon. > For me the DAG graph is not that important and I'd like to see the tree view > when clicking on the name of the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-167) Get dag state for a given execution date.
[ https://issues.apache.org/jira/browse/AIRFLOW-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300977#comment-15300977 ] Chris Riccomini commented on AIRFLOW-167: - Specifically, the CLI solution sounds reasonable. The REST solution is better, but we haven't yet set up a REST API for Airflow. In the meantime, want to send a PR for the CLI {{dag_state}} command? > Get dag state for a given execution date. > - > > Key: AIRFLOW-167 > URL: https://issues.apache.org/jira/browse/AIRFLOW-167 > Project: Apache Airflow > Issue Type: Improvement > Components: cli >Reporter: Sumit Maheshwari > > I was trying to get state for a particular dag-run programmatically, but > couldn't find a way. > If we could have a rest call like > `/admin/dagrun?dag_id=&execution_date=` and get the output that > would be best. Currently we've to do html parsing to get the same. > Other (and easier) way is to add a cli support like we have for `task_state`. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-176) PR tool crashes with non-integer JIRA ids
[ https://issues.apache.org/jira/browse/AIRFLOW-176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin updated AIRFLOW-176: --- External issue URL: https://github.com/apache/incubator-airflow/pull/1544 > PR tool crashes with non-integer JIRA ids > - > > Key: AIRFLOW-176 > URL: https://issues.apache.org/jira/browse/AIRFLOW-176 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Affects Versions: Airflow 1.7.1.2 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > The PR tool crashes if a non-integer id is passed. This includes the default > ID (AIRFLOW-XXX) so it affects folks who don't type in a new ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-168: Affects Version/s: Airflow 1.7.1.2 > schedule_interval @once scheduling dag atleast twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png > > > I was looking at example_xcom example and found that it got scheduled twice. > Ones at the start_time and ones at the current time. To be correct I tried > multiple times (by reloading db) and its same. > I am on airflow master, using sequential executor with sqlite3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300951#comment-15300951 ] Chris Riccomini commented on AIRFLOW-168: - What version of Airflow are you running? > schedule_interval @once scheduling dag atleast twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png > > > I was looking at example_xcom example and found that it got scheduled twice. > Ones at the start_time and ones at the current time. To be correct I tried > multiple times (by reloading db) and its same. > I am on airflow master, using sequential executor with sqlite3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-168) schedule_interval @once scheduling dag atleast twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-168: Affects Version/s: (was: Airflow 1.7.1.2) > schedule_interval @once scheduling dag atleast twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png > > > I was looking at example_xcom example and found that it got scheduled twice. > Ones at the start_time and ones at the current time. To be correct I tried > multiple times (by reloading db) and its same. > I am on airflow master, using sequential executor with sqlite3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-169) Hide expire dags in UI
[ https://issues.apache.org/jira/browse/AIRFLOW-169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-169: Component/s: ui > Hide expire dags in UI > -- > > Key: AIRFLOW-169 > URL: https://issues.apache.org/jira/browse/AIRFLOW-169 > Project: Apache Airflow > Issue Type: Wish > Components: ui >Reporter: Sumit Maheshwari > > It would be great if we've option to hide expired schedules from UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-169) Hide expire dags in UI
[ https://issues.apache.org/jira/browse/AIRFLOW-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300948#comment-15300948 ] Chris Riccomini commented on AIRFLOW-169: - What do you mean by expired schedules? You mean DagRuns that have finished? Which page in the UI are you referring to? > Hide expire dags in UI > -- > > Key: AIRFLOW-169 > URL: https://issues.apache.org/jira/browse/AIRFLOW-169 > Project: Apache Airflow > Issue Type: Wish > Components: ui >Reporter: Sumit Maheshwari > > It would be great if we've option to hide expired schedules from UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"
[ https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300921#comment-15300921 ] Chris Riccomini commented on AIRFLOW-172: - Try turning on the scheduler: {{airflow scheduler}} > All example DAGs report "Only works with the CeleryExecutor, sorry" > --- > > Key: AIRFLOW-172 > URL: https://issues.apache.org/jira/browse/AIRFLOW-172 > Project: Apache Airflow > Issue Type: Bug > Components: executor >Affects Versions: Airflow 1.7.1 >Reporter: Andre > > After installing airflow and trying to run some example DAGs I was faced with > {{Only works with the CeleryExecutor, sorry}} > on every DAG I tried to run. > {code}$ pip list > airflow (1.7.1.2) > alembic (0.8.6) > Babel (1.3) > bitarray (0.8.1) > cffi (1.6.0) > chartkick (0.4.2) > croniter (0.3.12) > cryptography (1.3.2) > dill (0.2.5) > docutils (0.12) > Flask (0.10.1) > Flask-Admin (1.4.0) > Flask-Cache (0.13.1) > Flask-Login (0.2.11) > Flask-WTF (0.12) > funcsigs (0.4) > future (0.15.2) > google-apputils (0.4.2) > gunicorn (19.3.0) > hive-thrift-py (0.0.1) > idna (2.1) > impyla (0.13.7) > itsdangerous (0.24) > Jinja2 (2.8) > lockfile (0.12.2) > Mako (1.0.4) > Markdown (2.6.6) > MarkupSafe (0.23) > mysqlclient (1.3.7) > numpy (1.11.0) > pandas (0.18.1) > pip (8.1.2) > ply (3.8) > protobuf (2.6.1) > pyasn1 (0.1.9) > pycparser (2.14) > Pygments (2.1.3) > PyHive (0.1.8) > pykerberos (1.1.10) > python-daemon (2.1.1) > python-dateutil (2.5.3) > python-editor (1.0) > python-gflags (3.0.5) > pytz (2016.4) > requests (2.10.0) > setproctitle (1.1.10) > setuptools (21.2.1) > six (1.10.0) > snakebite (2.9.0) > SQLAlchemy (1.0.13) > thrift (0.9.3) > thriftpy (0.3.8) > unicodecsv (0.14.1) > Werkzeug (0.11.10) > WTForms (2.1) > {code} > {code} > $ airflow webserver -p 8088 > [2016-05-25 15:22:48,204] {__init__.py:36} INFO - Using executor LocalExecutor > _ > |__( )_ __/__ / __ > /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / / > ___ ___ | / _ / _ __/ _ / / 
/_/ /_ |/ |/ / > _/_/ |_/_/ /_//_//_/ \//|__/ > [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 > with a timeout of 120... > [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0 > [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 > (20191) > [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync > [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197 > [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198 > [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199 > [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200 > [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-161) Redirection to external url
[ https://issues.apache.org/jira/browse/AIRFLOW-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300938#comment-15300938 ] Chris Riccomini commented on AIRFLOW-161: - I don't think that we want to embed Qubole logic directly in Airflow. I'm a bit out of my element on the UI front, though. Perhaps there's a way to achieve this through plugins, or by simply putting the link in the logs? > Redirection to external url > --- > > Key: AIRFLOW-161 > URL: https://issues.apache.org/jira/browse/AIRFLOW-161 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Sumit Maheshwari > > Hi, > I am not able to find a good way (apart from loading everything upfront), > where I can redirect someone to a external service url, using the information > stored in airflow. There could be many use cases like downloading a signed > file from s3, redirecting to hadoop job tracker, or a direct case on which I > am working which is linking airflow tasks to qubole commands. > I already have a working model and will open a PR soon. Please let me know if > there existing ways already. > Thanks, > Sumit -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"
[ https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-172. --- Resolution: Not A Bug Please re-open if you have further questions. > All example DAGs report "Only works with the CeleryExecutor, sorry" > --- > > Key: AIRFLOW-172 > URL: https://issues.apache.org/jira/browse/AIRFLOW-172 > Project: Apache Airflow > Issue Type: Bug > Components: executor >Affects Versions: Airflow 1.7.1 >Reporter: Andre > > After installing airflow and trying to run some example DAGs I was faced with > {{Only works with the CeleryExecutor, sorry}} > on every DAG I tried to run. > {code}$ pip list > airflow (1.7.1.2) > alembic (0.8.6) > Babel (1.3) > bitarray (0.8.1) > cffi (1.6.0) > chartkick (0.4.2) > croniter (0.3.12) > cryptography (1.3.2) > dill (0.2.5) > docutils (0.12) > Flask (0.10.1) > Flask-Admin (1.4.0) > Flask-Cache (0.13.1) > Flask-Login (0.2.11) > Flask-WTF (0.12) > funcsigs (0.4) > future (0.15.2) > google-apputils (0.4.2) > gunicorn (19.3.0) > hive-thrift-py (0.0.1) > idna (2.1) > impyla (0.13.7) > itsdangerous (0.24) > Jinja2 (2.8) > lockfile (0.12.2) > Mako (1.0.4) > Markdown (2.6.6) > MarkupSafe (0.23) > mysqlclient (1.3.7) > numpy (1.11.0) > pandas (0.18.1) > pip (8.1.2) > ply (3.8) > protobuf (2.6.1) > pyasn1 (0.1.9) > pycparser (2.14) > Pygments (2.1.3) > PyHive (0.1.8) > pykerberos (1.1.10) > python-daemon (2.1.1) > python-dateutil (2.5.3) > python-editor (1.0) > python-gflags (3.0.5) > pytz (2016.4) > requests (2.10.0) > setproctitle (1.1.10) > setuptools (21.2.1) > six (1.10.0) > snakebite (2.9.0) > SQLAlchemy (1.0.13) > thrift (0.9.3) > thriftpy (0.3.8) > unicodecsv (0.14.1) > Werkzeug (0.11.10) > WTForms (2.1) > {code} > {code} > $ airflow webserver -p 8088 > [2016-05-25 15:22:48,204] {__init__.py:36} INFO - Using executor LocalExecutor > _ > |__( )_ __/__ / __ > /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / / > ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ / > _/_/ |_/_/ 
/_//_//_/ \//|__/ > [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 > with a timeout of 120... > [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0 > [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 > (20191) > [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync > [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197 > [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198 > [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199 > [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200 > [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (AIRFLOW-171) Email does not work in 1.7.1.2
[ https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-171. --- Resolution: Information Provided Please re-open if you have more questions. > Email does not work in 1.7.1.2 > -- > > Key: AIRFLOW-171 > URL: https://issues.apache.org/jira/browse/AIRFLOW-171 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: AWS Amazon Linux Image >Reporter: Hao Ye > > Job failure emails was working in 1.7.0. They seem to have stopped working in > 1.7.1. > Error is > {quote} > [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: > ['em...@email.com'] > [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no > attribute 'send_email_smtp' > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 42, in send_email > backend = getattr(module, attr) > AttributeError: 'module' object has no attribute 'send_email_smtp' > {quote} > File exists and method exists. Seems to work fine when called in python > directly. > Maybe it's loading the wrong email module. > Tried to set PYTHONPATH to have > /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that > didn't seem to work either. > Could this be related to the utils refactoring that happened between 1.7.0 > and 1.7.1? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes
[ https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin resolved AIRFLOW-175. Resolution: Fixed Merged in https://github.com/apache/incubator-airflow/pull/1534 > PR merge tool needs to reset environment after work_local finishes > -- > > Key: AIRFLOW-175 > URL: https://issues.apache.org/jira/browse/AIRFLOW-175 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Affects Versions: Airflow 1.7.1.2 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > If you use the pr tool to work locally ({{airflow-pr work_local}}) and make > changes to the files, then an error is raised when you try to exit the PR > tool because git refuses to overwrite the changes. The tool needs to call > {{git reset --hard}} before exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2
[ https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300911#comment-15300911 ] Chris Riccomini commented on AIRFLOW-171: - This looks like something is wrong with your environment. > Email does not work in 1.7.1.2 > -- > > Key: AIRFLOW-171 > URL: https://issues.apache.org/jira/browse/AIRFLOW-171 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: AWS Amazon Linux Image >Reporter: Hao Ye > > Job failure emails was working in 1.7.0. They seem to have stopped working in > 1.7.1. > Error is > {quote} > [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: > ['em...@email.com'] > [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no > attribute 'send_email_smtp' > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 42, in send_email > backend = getattr(module, attr) > AttributeError: 'module' object has no attribute 'send_email_smtp' > {quote} > File exists and method exists. Seems to work fine when called in python > directly. > Maybe it's loading the wrong email module. > Tried to set PYTHONPATH to have > /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that > didn't seem to work either. > Could this be related to the utils refactoring that happened between 1.7.0 > and 1.7.1? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-108) Add data retention policy to Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300909#comment-15300909 ] Chris Riccomini commented on AIRFLOW-108: - I spoke with [~maxime.beauche...@apache.org] about this a bit. One trick that I didn't realize is that you can delete all of the task instances after their DagRun is marked as success/failed. Once the DagRun is marked as such, if the tasks are deleted, the scheduler won't try to re-run them because the DagRun is already showing as a terminal state. This is a bit hacky, but does work. I still think a retention policy that allows us to delete TaskInstances *and* DagRuns would be useful, but due to the trick described above, I think this JIRA is probably lower priority than it was when I initially filed this ticket. > Add data retention policy to Airflow > > > Key: AIRFLOW-108 > URL: https://issues.apache.org/jira/browse/AIRFLOW-108 > Project: Apache Airflow > Issue Type: Wish > Components: db, scheduler >Reporter: Chris Riccomini > > Airflow's DB currently holds the entire history of all executions for all > time. This is problematic as the DB grows. The UI starts to get slower, and > the DB's disk usage grows. There is no bound to how large the DB will grow. > It would be useful to add a feature in Airflow to do two things: > # Delete old data from the DB > # Mark some lower watermark, past which DAG executions are ignored > For example, (2) would allow you to tell the scheduler "ignore all data prior > to a year ago". And (1) would allow Airflow to delete all data prior to > January 1, 2015. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
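The purge trick described in the comment above can be sketched as a single correlated delete. This is only an illustration: the schema below is a simplified stand-in for Airflow's real {{task_instance}} and {{dag_run}} tables, which have many more columns.

```python
import sqlite3

# Hypothetical, simplified schema illustrating the retention trick:
# task instances may be deleted once their DagRun is in a terminal state,
# because the scheduler no longer inspects tasks of a finished DagRun.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag_run (dag_id TEXT, execution_date TEXT, state TEXT);
    CREATE TABLE task_instance (dag_id TEXT, execution_date TEXT,
                                task_id TEXT, state TEXT);
""")
conn.executemany("INSERT INTO dag_run VALUES (?, ?, ?)", [
    ("etl", "2015-01-01", "success"),   # terminal: safe to purge its tasks
    ("etl", "2016-05-25", "running"),   # still active: tasks must stay
])
conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", [
    ("etl", "2015-01-01", "load", "success"),
    ("etl", "2016-05-25", "load", "running"),
])

# Delete task instances whose parent DagRun already reached success/failed.
conn.execute("""
    DELETE FROM task_instance
    WHERE EXISTS (
        SELECT 1 FROM dag_run dr
        WHERE dr.dag_id = task_instance.dag_id
          AND dr.execution_date = task_instance.execution_date
          AND dr.state IN ('success', 'failed')
    )
""")
remaining = [r[0] for r in
             conn.execute("SELECT execution_date FROM task_instance")]
# Only the task of the still-running DagRun survives the purge.
```

A full retention policy would additionally delete the old {{dag_run}} rows themselves past some cutoff date, which is the part this trick does not cover.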
[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2
[ https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300917#comment-15300917 ] Chris Riccomini commented on AIRFLOW-171: - Or this config value is wrong, as [~bolke] said: {code} path, attr = configuration.get('email', 'EMAIL_BACKEND').rsplit('.', 1) {code} > Email does not work in 1.7.1.2 > -- > > Key: AIRFLOW-171 > URL: https://issues.apache.org/jira/browse/AIRFLOW-171 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: AWS Amazon Linux Image >Reporter: Hao Ye > > Job failure emails was working in 1.7.0. They seem to have stopped working in > 1.7.1. > Error is > {quote} > [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: > ['em...@email.com'] > [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no > attribute 'send_email_smtp' > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 42, in send_email > backend = getattr(module, attr) > AttributeError: 'module' object has no attribute 'send_email_smtp' > {quote} > File exists and method exists. Seems to work fine when called in python > directly. > Maybe it's loading the wrong email module. > Tried to set PYTHONPATH to have > /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that > didn't seem to work either. > Could this be related to the utils refactoring that happened between 1.7.0 > and 1.7.1? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
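As a sketch of how that config value is consumed: the line quoted above splits the {{EMAIL_BACKEND}} string into a module path and an attribute name, and a stale or mistyped value produces exactly the reported {{AttributeError}}. The helper below is illustrative (the real code also reads the value via {{configuration.get}}); {{json.dumps}} stands in for a real backend such as {{airflow.utils.email.send_email_smtp}} so the example runs anywhere.

```python
import importlib


def resolve_email_backend(dotted_path):
    # Mirrors the quoted logic: split off the attribute name, import the
    # module, then look the attribute up on it.
    path, attr = dotted_path.rsplit('.', 1)
    module = importlib.import_module(path)
    return getattr(module, attr)  # raises AttributeError if misconfigured


# A well-formed dotted path resolves to a callable:
backend = resolve_email_backend('json.dumps')

# A path whose attribute does not exist on the module reproduces the
# failure mode reported in this ticket:
try:
    resolve_email_backend('json.send_email_smtp')
    lookup_failed = False
except AttributeError:
    lookup_failed = True
```

This is why checking the `[email]` section of `airflow.cfg` is the first thing to try: the module can import fine while the attribute half of the path is wrong.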
[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2
[ https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300901#comment-15300901 ] Chris Riccomini commented on AIRFLOW-171: - Just a note: I have confirmed that email is working for me on 1.7.1.2. We had a task with retry=2, retry_delay=5. It failed twice, and an email was sent. > Email does not work in 1.7.1.2 > -- > > Key: AIRFLOW-171 > URL: https://issues.apache.org/jira/browse/AIRFLOW-171 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: AWS Amazon Linux Image >Reporter: Hao Ye > > Job failure emails was working in 1.7.0. They seem to have stopped working in > 1.7.1. > Error is > {quote} > [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: > ['em...@email.com'] > [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no > attribute 'send_email_smtp' > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 42, in send_email > backend = getattr(module, attr) > AttributeError: 'module' object has no attribute 'send_email_smtp' > {quote} > File exists and method exists. Seems to work fine when called in python > directly. > Maybe it's loading the wrong email module. > Tried to set PYTHONPATH to have > /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that > didn't seem to work either. > Could this be related to the utils refactoring that happened between 1.7.0 > and 1.7.1? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-160) Parse DAG files through child processes
[ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300747#comment-15300747 ] Bolke de Bruin commented on AIRFLOW-160: +1 on the idea, -1 on more polling. I think inotify is more suitable or an API call to refresh the dagbag if triggered externally. API call is also nicer because it can update all processes that require a load of the dagbag. > Parse DAG files through child processes > --- > > Key: AIRFLOW-160 > URL: https://issues.apache.org/jira/browse/AIRFLOW-160 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Reporter: Paul Yang >Assignee: Paul Yang > > Currently, the Airflow scheduler parses all user DAG files in the same > process as the scheduler itself. We've seen issues in production where bad > DAG files cause scheduler to fail. A simple example is if the user script > calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an > unusual case where modules loaded by the user DAG affect operation of the > scheduler. For better uptime, the scheduler should be resistant to these > problematic user DAGs. > The proposed solution is to parse and schedule user DAGs through child > processes. This way, the main scheduler process is more isolated from bad > DAGs. There's a side benefit as well - since parsing is distributed among > multiple processes, it's possible to parse the DAG files more frequently, > reducing the latency between when a DAG is modified and when the changes are > picked up. > Another issue right now is that all DAGs must be scheduled before any tasks > are sent to the executor. This means that the frequency of task scheduling is > limited by the slowest DAG to schedule. The changes needed for scheduling > DAGs through child processes will also make it easy to decouple this process > and allow tasks to be scheduled and sent to the executor in a more > independent fashion. 
This way, overall scheduling won't be held back by a > slow DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
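The isolation proposed above can be sketched by running DAG-definition code in a child interpreter. {{parse_dag_source_isolated}} is a hypothetical helper, not the scheduler's actual API; it only demonstrates that a {{sys.exit(1)}} in a user file kills the child process rather than the parent.

```python
import subprocess
import sys


def parse_dag_source_isolated(source, timeout=60):
    """Run (possibly misbehaving) DAG-definition code in a child Python
    process; a sys.exit() or hard crash there kills only the child."""
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.returncode


# A DAG file that calls sys.exit(1) no longer takes the caller down with
# it -- the parent simply observes a non-zero exit code and moves on.
bad_rc = parse_dag_source_isolated("import sys; sys.exit(1)")
good_rc = parse_dag_source_isolated("print('dag parsed')")
```

A real implementation would also need to ship the parsed DAG metadata back to the parent (e.g. over a pipe or the metadata DB), which is the harder part of the proposal.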
[jira] [Resolved] (AIRFLOW-101) Access the tree view of the Web UI instead of the graph view when clicking on a DAG
[ https://issues.apache.org/jira/browse/AIRFLOW-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-101. Resolution: Fixed This has been done. > Access the tree view of the Web UI instead of the graph view when clicking on > a DAG > -- > > Key: AIRFLOW-101 > URL: https://issues.apache.org/jira/browse/AIRFLOW-101 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Affects Versions: Airflow 1.7.0 > Environment: All >Reporter: Michal TOMA >Priority: Minor > Fix For: Airflow 1.7.1 > > Original Estimate: 1h > Remaining Estimate: 1h > > I'd like to have a config parameter that would allow direct access to the > tree view of the DAG tasks instead of the current graph view. > In my environment failed tasks are very common and I need to have a quick view > of what failed and when in the past. As of now I must click either the DAG > and then click the tree view menu or click the very small tree view icon. > For me the DAG graph is not that important and I'd like to see the tree view > when clicking on the name of the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-166) Webserver times out using systemd script
[ https://issues.apache.org/jira/browse/AIRFLOW-166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300739#comment-15300739 ] Bolke de Bruin commented on AIRFLOW-166: It probably is due to some locations not being writable by airflow. Check if passing --pid helps. Maybe even --log --stdout --stderr is needed > Webserver times out using systemd script > > > Key: AIRFLOW-166 > URL: https://issues.apache.org/jira/browse/AIRFLOW-166 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: CentOS 7 >Reporter: Yuri Bendana > > I just upgraded to 1.7.1.2 from 1.6 and I'm having a problem starting the > webserver using the systemd script. This used to work fine. The issue is > that it starts and then just hangs, no error is reported and it finally times > out after about a minute. I tried starting it from the command line and it > works fine without timing out. I also ran it in daemon mode with -D and > again it seems to be fine. Any thoughts on how to debug this? > Here's the log output: > {code} > May 23 16:27:50 ybendana-linux systemd: Starting Airflow webserver daemon... 
> May 23 16:27:51 ybendana-linux airflow: [2016-05-23 16:27:51,444] > {__init__.py:36} INFO - Using executor LocalExecutor > May 23 16:27:53 ybendana-linux airflow: _ > May 23 16:27:53 ybendana-linux airflow: |__( )_ __/__ > / __ > May 23 16:27:53 ybendana-linux airflow: /| |_ /__ ___/_ /_ __ /_ > __ \_ | /| / / > May 23 16:27:53 ybendana-linux airflow: ___ ___ | / _ / _ __/ _ / / > /_/ /_ |/ |/ / > May 23 16:27:53 ybendana-linux airflow: _/_/ |_/_/ /_//_//_/ > \//|__/ > May 23 16:27:53 ybendana-linux airflow: [2016-05-23 16:27:53,446] > {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [15960] > [INFO] Starting gunicorn 19.3.0 > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [15960] > [INFO] Listening at: http://0.0.0.0:8080 (15960) > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [15960] > [INFO] Using worker: sync > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16067] > [INFO] Booting worker with pid: 16067 > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16069] > [INFO] Booting worker with pid: 16069 > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16070] > [INFO] Booting worker with pid: 16070 > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55 +] [16071] > [INFO] Booting worker with pid: 16071 > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,876] > {__init__.py:36} INFO - Using executor LocalExecutor > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,950] > {__init__.py:36} INFO - Using executor LocalExecutor > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,972] > {__init__.py:36} INFO - Using executor LocalExecutor > May 23 16:27:55 ybendana-linux airflow: [2016-05-23 16:27:55,997] > {__init__.py:36} INFO - Using executor LocalExecutor > May 23 16:27:57 ybendana-linux airflow: [2016-05-23 16:27:57,885] > {models.py:154} INFO - Filling up 
the DagBag from /opt/airflow/dags > May 23 16:27:57 ybendana-linux airflow: [2016-05-23 16:27:57,951] > {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags > May 23 16:27:57 ybendana-linux airflow: [2016-05-23 16:27:57,983] > {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags > May 23 16:27:58 ybendana-linux airflow: [2016-05-23 16:27:58,014] > {models.py:154} INFO - Filling up the DagBag from /opt/airflow/dags > May 23 16:29:20 ybendana-linux systemd: airflow-webserver.service start > operation timed out. Terminating. > May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16070] > [INFO] Worker exiting (pid: 16070) > May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [15960] > [INFO] Handling signal: term > May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16071] > [INFO] Worker exiting (pid: 16071) > May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16069] > [INFO] Worker exiting (pid: 16069) > May 23 16:29:20 ybendana-linux airflow: [2016-05-23 16:29:20 +] [16067] > [INFO] Worker exiting (pid: 16067) > May 23 16:29:21 ybendana-linux airflow: [2016-05-23 16:29:21 +] [15960] > [INFO] Shutting down: Master > May 23 16:29:21 ybendana-linux systemd: Failed to start Airflow webserver > daemon. > May 23 16:29:21 ybendana-linux systemd: Unit airflow-webserver.service > entered failed state. > May 23 16:29:21 ybendana-linux systemd: airflow-webserver.service failed.
[jira] [Commented] (AIRFLOW-171) Email does not work in 1.7.1.2
[ https://issues.apache.org/jira/browse/AIRFLOW-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300733#comment-15300733 ] Bolke de Bruin commented on AIRFLOW-171: Have you checked your config and properly configured the backend? > Email does not work in 1.7.1.2 > -- > > Key: AIRFLOW-171 > URL: https://issues.apache.org/jira/browse/AIRFLOW-171 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1 > Environment: AWS Amazon Linux Image >Reporter: Hao Ye > > Job failure emails was working in 1.7.0. They seem to have stopped working in > 1.7.1. > Error is > {quote} > [2016-05-25 00:48:02,334] {models.py:1311} ERROR - Failed to send email to: > ['em...@email.com'] > [2016-05-25 00:48:02,334] {models.py:1312} ERROR - 'module' object has no > attribute 'send_email_smtp' > Traceback (most recent call last): > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, > in handle_failure > self.email_alert(error, is_retry=False) > File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, > in email_alert > send_email(task.email, title, body) > File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line > 42, in send_email > backend = getattr(module, attr) > AttributeError: 'module' object has no attribute 'send_email_smtp' > {quote} > File exists and method exists. Seems to work fine when called in python > directly. > Maybe it's loading the wrong email module. > Tried to set PYTHONPATH to have > /usr/local/lib/python2.7/site-packages/airflow earlier in the path, but that > didn't seem to work either. > Could this be related to the utils refactoring that happened between 1.7.0 > and 1.7.1? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300728#comment-15300728 ] Bolke de Bruin commented on AIRFLOW-173: I like it, but wouldn't an inotify combination with a triggered dag_run be more efficient? > Create a FileSensor / NFSFileSensor sensor > -- > > Key: AIRFLOW-173 > URL: https://issues.apache.org/jira/browse/AIRFLOW-173 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Andre >Priority: Minor > > While HDFS and WebHDFS suit most organisations using Hadoop, for some shops > running MapR-FS, Airflow implementation is simplified by the use of plain > files pointing to MapR's NFS gateways. > A FileSensor and/or a NFSFileSensor would assist the adoption of Airflow > within the MapR customer base, but more importantly, help those who are using > POSIX-compliant distributed filesystems that can be mounted on Unix > derivative systems (e.g. MapR-FS (via NFS), CephFS, GlusterFS, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
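A minimal sketch of the poke loop such a FileSensor might run. These are plain functions with illustrative names, not Airflow's actual {{BaseSensorOperator}} API; a real sensor would inherit from it and let Airflow drive the poke interval and timeout.

```python
import os
import time


def poke(path):
    # The per-interval check a FileSensor's poke() would perform.
    return os.path.exists(path)


def wait_for_file(path, poke_interval=0.1, timeout=2.0):
    """Poll until the file appears or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if poke(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)
```

Because the check is a plain {{os.path.exists}}, it works unchanged on any POSIX-mountable filesystem (MapR-FS via NFS, CephFS, GlusterFS), which is the portability argument made in the ticket; the inotify approach suggested in the comment would avoid the polling but ties the implementation to Linux.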
[jira] [Resolved] (AIRFLOW-157) Minor fixes for PR merge tool
[ https://issues.apache.org/jira/browse/AIRFLOW-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Lowin resolved AIRFLOW-157. Resolution: Fixed Fix Version/s: (was: Airflow 1.8) Merged in https://github.com/apache/incubator-airflow/pull/1534 > Minor fixes for PR merge tool > - > > Key: AIRFLOW-157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-157 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Minor > > 1. subscripting a {{filter}} object fails in Python3 > 2. JIRA issue inference looks for a 4 or 5 digit issue number... we're not > quite there yet! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-176) PR tool crashes with non-integer JIRA ids
Jeremiah Lowin created AIRFLOW-176: -- Summary: PR tool crashes with non-integer JIRA ids Key: AIRFLOW-176 URL: https://issues.apache.org/jira/browse/AIRFLOW-176 Project: Apache Airflow Issue Type: Bug Components: PR tool Affects Versions: Airflow 1.7.1.2 Reporter: Jeremiah Lowin Assignee: Jeremiah Lowin The PR tool crashes if a non-integer id is passed. This includes the default ID (AIRFLOW-XXX) so it affects folks who don't type in a new ID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-157) Minor fixes for PR merge tool
[ https://issues.apache.org/jira/browse/AIRFLOW-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300240#comment-15300240 ] ASF subversion and git services commented on AIRFLOW-157: - Commit 805944b74744b34e1510c2f5d080de98704705d0 in incubator-airflow's branch refs/heads/master from [~jlowin] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=805944b ] [AIRFLOW-157] Make PR tool Py3-compat; add JIRA command - Adds Python3 compatibility (filter objects can't be indexed) - Adds JIRA command to close issues without merging a PR - Adds general usability fixes and starts cleaning up code > Minor fixes for PR merge tool > - > > Key: AIRFLOW-157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-157 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Minor > Fix For: Airflow 1.8 > > > 1. subscripting a {{filter}} object fails in Python3 > 2. JIRA issue inference looks for a 4 or 5 digit issue number... we're not > quite there yet! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[1/3] incubator-airflow git commit: [AIRFLOW-157] Make PR tool Py3-compat; add JIRA command
Repository: incubator-airflow Updated Branches: refs/heads/master ac96fbf85 -> 7332c40c2 [AIRFLOW-157] Make PR tool Py3-compat; add JIRA command - Adds Python3 compatibility (filter objects can't be indexed) - Adds JIRA command to close issues without merging a PR - Adds general usability fixes and starts cleaning up code Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/805944b7 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/805944b7 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/805944b7 Branch: refs/heads/master Commit: 805944b74744b34e1510c2f5d080de98704705d0 Parents: 98f10d5 Author: jlowin Authored: Fri May 20 17:15:07 2016 -0400 Committer: jlowin Committed: Wed May 25 10:52:13 2016 -0400 -- dev/README.md | 5 +- dev/airflow-pr | 220 +++- 2 files changed, 134 insertions(+), 91 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/805944b7/dev/README.md -- diff --git a/dev/README.md b/dev/README.md index 59ea024..a0c185e 100755 --- a/dev/README.md +++ b/dev/README.md @@ -8,7 +8,6 @@ It is very important that PRs reference a JIRA issue. The preferred way to do th __Please note:__ this tool will restore your current branch when it finishes, but you will lose any uncommitted changes. Make sure you commit any changes you wish to keep before proceeding. -Also, do not run this tool from inside the `dev` folder if you are working with a PR that predates the `dev` directory. It will be unable to restore itself from a nonexistent location. Run it from the main airflow directory instead: `dev/airflow-pr`. ### Execution Simply execute the `airflow-pr` tool: @@ -28,6 +27,7 @@ Options: --help Show this message and exit. 
Commands: + close_jira Close a JIRA issue (without merging a PR) merge Merge a GitHub PR into Airflow master work_local Clone a GitHub PR locally for testing (no push) ``` @@ -38,8 +38,7 @@ Execute `airflow-pr merge` to be interactively guided through the process of mer Execute `airflow-pr work_local` to only merge the PR locally. The tool will pause once the merge is complete, allowing the user to explore the PR, and then will delete the merge and restore the original development environment. -Both commands can be followed by a PR number (`airflow-pr merge 42`); otherwise the tool will prompt for one. - +Execute `airflow-pr close_jira` to close a JIRA issue without needing to merge a PR. You will be prompted for an issue number and close comment. ### Configuration http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/805944b7/dev/airflow-pr -- diff --git a/dev/airflow-pr b/dev/airflow-pr index 918ad54..dab9540 100755 --- a/dev/airflow-pr +++ b/dev/airflow-pr @@ -35,6 +35,7 @@ import os import re import subprocess import sys +import textwrap # Python 3 compatibility try: @@ -95,41 +96,32 @@ def get_json(url): if ( "X-RateLimit-Remaining" in e.headers and e.headers["X-RateLimit-Remaining"] == '0'): -print( +click.echo( "Exceeded the GitHub API rate limit; set the environment " "variable GITHUB_OAUTH_KEY in order to make authenticated " "GitHub requests.") else: -print("Unable to fetch URL, exiting: %s" % url) +click.echo("Unable to fetch URL, exiting: %s" % url) sys.exit(-1) def fail(msg): -print(msg) +click.echo(msg) clean_up() sys.exit(-1) def run_cmd(cmd): if isinstance(cmd, list): -print(' {}'.format(' '.join(cmd))) +click.echo('>> Running command: {}'.format(' '.join(cmd))) return subprocess.check_output(cmd).decode('utf-8') else: -print(' {}'.format(cmd)) +click.echo('>> Running command: {}'.format(cmd)) return subprocess.check_output(cmd.split(" ")).decode('utf-8') -def get_yes_no(prompt): -while True: -result = raw_input("\n%s (y/n): " % prompt) -if 
result.lower() not in ('y', 'n'): -print('Invalid response.') -else: -break -return result.lower() == 'y' - def continue_maybe(prompt): -if not get_yes_no(prompt): +if not click.confirm(prompt): fail("Okay, exiting.") @@ -137,13 +129,13 @@ def clean_up(): if 'original_head' not in globals(): return -print("Restoring head pointer to %s" % original_head) +click.echo("Restoring head pointer to %s" % original_head) run_cmd("git checkout %s" % o
[2/3] incubator-airflow git commit: [AIRFLOW-175] Run git-reset before checkout in PR tool
[AIRFLOW-175] Run git-reset before checkout in PR tool If the user made any changes, git checkout will fail because the changes would be overwritten. Running git reset blows the changes away. Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/6d87679a Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/6d87679a Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/6d87679a Branch: refs/heads/master Commit: 6d87679a56b7fd6f918439db953ca6b959752721 Parents: 805944b Author: jlowin Authored: Wed May 25 10:49:10 2016 -0400 Committer: jlowin Committed: Wed May 25 10:53:22 2016 -0400 -- dev/airflow-pr | 3 +++ 1 file changed, 3 insertions(+) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/6d87679a/dev/airflow-pr -- diff --git a/dev/airflow-pr b/dev/airflow-pr index dab9540..8dd8df7 100755 --- a/dev/airflow-pr +++ b/dev/airflow-pr @@ -129,6 +129,9 @@ def clean_up(): if 'original_head' not in globals(): return +click.echo('Resetting git to remove any changes') +run_cmd('git reset --hard') + click.echo("Restoring head pointer to %s" % original_head) run_cmd("git checkout %s" % original_head)
[3/3] incubator-airflow git commit: Merge pull request #1534 from jlowin/pr-tool-2
Merge pull request #1534 from jlowin/pr-tool-2 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/7332c40c Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/7332c40c Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/7332c40c Branch: refs/heads/master Commit: 7332c40c24f85ca3be20511af1c6b618b5adfe7f Parents: ac96fbf 6d87679 Author: jlowin Authored: Wed May 25 11:40:53 2016 -0400 Committer: jlowin Committed: Wed May 25 11:40:53 2016 -0400 -- dev/README.md | 5 +- dev/airflow-pr | 223 +++- 2 files changed, 137 insertions(+), 91 deletions(-) --
[jira] [Commented] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes
[ https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300241#comment-15300241 ] ASF subversion and git services commented on AIRFLOW-175: - Commit 6d87679a56b7fd6f918439db953ca6b959752721 in incubator-airflow's branch refs/heads/master from [~jlowin] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=6d87679 ] [AIRFLOW-175] Run git-reset before checkout in PR tool If the user made any changes, git checkout will fail because the changes would be overwritten. Running git reset blows the changes away. > PR merge tool needs to reset environment after work_local finishes > -- > > Key: AIRFLOW-175 > URL: https://issues.apache.org/jira/browse/AIRFLOW-175 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Affects Versions: Airflow 1.7.1.2 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > If you use the pr tool to work locally ({{airflow-pr work_local}}) and make > changes to the files, then an error is raised when you try to exit the PR > tool because git refuses to overwrite the changes. The tool needs to call > {{git reset --hard}} before exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes
[ https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300226#comment-15300226 ] Chris Riccomini commented on AIRFLOW-175: - +1 > PR merge tool needs to reset environment after work_local finishes > -- > > Key: AIRFLOW-175 > URL: https://issues.apache.org/jira/browse/AIRFLOW-175 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Affects Versions: Airflow 1.7.1.2 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > If you use the pr tool to work locally ({{airflow-pr work_local}}) and make > changes to the files, then an error is raised when you try to exit the PR > tool because git refuses to overwrite the changes. The tool needs to call > {{git reset --hard}} before exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes
[ https://issues.apache.org/jira/browse/AIRFLOW-175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-175: External issue URL: https://github.com/apache/incubator-airflow/pull/1534 > PR merge tool needs to reset environment after work_local finishes > -- > > Key: AIRFLOW-175 > URL: https://issues.apache.org/jira/browse/AIRFLOW-175 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Affects Versions: Airflow 1.7.1.2 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin > > If you use the pr tool to work locally ({{airflow-pr work_local}}) and make > changes to the files, then an error is raised when you try to exit the PR > tool because git refuses to overwrite the changes. The tool needs to call > {{git reset --hard}} before exiting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (AIRFLOW-157) Minor fixes for PR merge tool
[ https://issues.apache.org/jira/browse/AIRFLOW-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini updated AIRFLOW-157: Component/s: PR tool > Minor fixes for PR merge tool > - > > Key: AIRFLOW-157 > URL: https://issues.apache.org/jira/browse/AIRFLOW-157 > Project: Apache Airflow > Issue Type: Bug > Components: PR tool >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Minor > Fix For: Airflow 1.8 > > > 1. subscripting a {{filter}} object fails in Python3 > 2. JIRA issue inference looks for a 4 or 5 digit issue number... we're not > quite there yet! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-175) PR merge tool needs to reset environment after work_local finishes
Jeremiah Lowin created AIRFLOW-175: -- Summary: PR merge tool needs to reset environment after work_local finishes Key: AIRFLOW-175 URL: https://issues.apache.org/jira/browse/AIRFLOW-175 Project: Apache Airflow Issue Type: Bug Components: PR tool Affects Versions: 1.7.1.2 Reporter: Jeremiah Lowin Assignee: Jeremiah Lowin If you use the pr tool to work locally ({{airflow-pr work_local}}) and make changes to the files, then an error is raised when you try to exit the PR tool because git refuses to overwrite the changes. The tool needs to call {{git reset --hard}} before exiting.
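The failure mode and the proposed fix can be sketched in shell. This builds a throwaway repo to simulate the situation (it is not the PR tool's actual code): once the checked-out branch and the working tree diverge, {{git checkout}} refuses to switch branches until {{git reset --hard}} discards the local edits.

```shell
#!/bin/sh
# Sketch of why the PR tool needs `git reset --hard` before exiting:
# a dirty working tree makes `git checkout` refuse to switch branches.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email demo@example.com
git config user.name demo
echo original > file.txt
git add file.txt
git commit -qm "initial"
main=$(git symbolic-ref --short HEAD)    # default branch name varies by git version

git checkout -qb work_local              # simulate `airflow-pr work_local`
echo "pr changes" > file.txt
git commit -qam "apply PR locally"
echo "local edits" > file.txt            # uncommitted edits made during review
# `git checkout $main` would now fail: local changes would be overwritten.

git reset --hard -q                      # what the tool should call on exit
git checkout -q "$main"                  # succeeds: working tree is clean again
cat file.txt                             # prints: original
```

The `-q` flags only suppress progress chatter; the behavior is identical without them.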
[jira] [Closed] (AIRFLOW-52) Release airflow 1.7.1
[ https://issues.apache.org/jira/browse/AIRFLOW-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Riccomini closed AIRFLOW-52. -- Resolution: Done Closing. > Release airflow 1.7.1 > - > > Key: AIRFLOW-52 > URL: https://issues.apache.org/jira/browse/AIRFLOW-52 > Project: Apache Airflow > Issue Type: Task > Components: release >Reporter: Dan Davydov >Assignee: Dan Davydov > Labels: release > > Release the airflow 1.7.1 tag. > Current status: > There are three issues blocking this release caused by this commit: > https://github.com/apache/incubator-airflow/commit/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6 > -1. DAGs with a lot of tasks take much longer to parse (~25x slowdown)- > -2. The following kind of patterns fail:- > {code} > email.set_upstream(dag.roots) > dag.add_task(email) > {code} > This is because set_upstream now calls add_task and a task can't be added > more than once. > -3. Airflow losing queued tasks (see linked issue)- > -4. Airflow putting dags in a stuck state (AIRFLOW-92)- > I'm working with the owner of the commit to resolve these issues. > The way to catch (1) in the future is an integration test that asserts a > given non-trivial DAG parses under X seconds
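The failing pattern quoted above ({{email.set_upstream(dag.roots)}} followed by {{dag.add_task(email)}}) can be mimicked with a toy model: if {{set_upstream}} implicitly registers the task with the DAG, the subsequent explicit {{add_task}} hits the duplicate-add guard. The classes below are schematic illustrations, not Airflow's real implementation:

```python
class Dag:
    """Toy stand-in for a DAG, keeping only the duplicate-add guard."""
    def __init__(self):
        self.task_ids = set()

    def add_task(self, task):
        if task.task_id in self.task_ids:
            raise ValueError("task %s already added" % task.task_id)
        self.task_ids.add(task.task_id)


class Task:
    """Toy stand-in for an operator."""
    def __init__(self, task_id):
        self.task_id = task_id

    def set_upstream(self, others, dag):
        dag.add_task(self)  # implicit registration, as described in the issue
        # (actual dependency wiring omitted)


dag = Dag()
roots = [Task("root")]
email = Task("email")
email.set_upstream(roots, dag)  # adds `email` to the dag as a side effect
try:
    dag.add_task(email)         # explicit add now fails
except ValueError as exc:
    print(exc)                  # task email already added
```

Under this model, code written against the old API must drop the explicit `add_task` call (or the guard must tolerate re-adding the same task), which is exactly the breakage described in item 2.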
[jira] [Commented] (AIRFLOW-172) All example DAGs report "Only works with the CeleryExecutor, sorry"
[ https://issues.apache.org/jira/browse/AIRFLOW-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300096#comment-15300096 ] Jeremiah Lowin commented on AIRFLOW-172: Are you trying to run tasks by hand in the Airflow UI? I think that's the only place where that error message exists. That's different than running a DAG, it's more for maintenance. > All example DAGs report "Only works with the CeleryExecutor, sorry" > --- > > Key: AIRFLOW-172 > URL: https://issues.apache.org/jira/browse/AIRFLOW-172 > Project: Apache Airflow > Issue Type: Bug > Components: executor >Affects Versions: Airflow 1.7.1 >Reporter: Andre > > After installing airflow and trying to run some example DAGs I was faced with > {{Only works with the CeleryExecutor, sorry}} > on every DAG I tried to run. > {code}$ pip list > airflow (1.7.1.2) > alembic (0.8.6) > Babel (1.3) > bitarray (0.8.1) > cffi (1.6.0) > chartkick (0.4.2) > croniter (0.3.12) > cryptography (1.3.2) > dill (0.2.5) > docutils (0.12) > Flask (0.10.1) > Flask-Admin (1.4.0) > Flask-Cache (0.13.1) > Flask-Login (0.2.11) > Flask-WTF (0.12) > funcsigs (0.4) > future (0.15.2) > google-apputils (0.4.2) > gunicorn (19.3.0) > hive-thrift-py (0.0.1) > idna (2.1) > impyla (0.13.7) > itsdangerous (0.24) > Jinja2 (2.8) > lockfile (0.12.2) > Mako (1.0.4) > Markdown (2.6.6) > MarkupSafe (0.23) > mysqlclient (1.3.7) > numpy (1.11.0) > pandas (0.18.1) > pip (8.1.2) > ply (3.8) > protobuf (2.6.1) > pyasn1 (0.1.9) > pycparser (2.14) > Pygments (2.1.3) > PyHive (0.1.8) > pykerberos (1.1.10) > python-daemon (2.1.1) > python-dateutil (2.5.3) > python-editor (1.0) > python-gflags (3.0.5) > pytz (2016.4) > requests (2.10.0) > setproctitle (1.1.10) > setuptools (21.2.1) > six (1.10.0) > snakebite (2.9.0) > SQLAlchemy (1.0.13) > thrift (0.9.3) > thriftpy (0.3.8) > unicodecsv (0.14.1) > Werkzeug (0.11.10) > WTForms (2.1) > {code} > {code} > $ airflow webserver -p 8088 > [2016-05-25 15:22:48,204] {__init__.py:36} INFO 
- Using executor LocalExecutor > [Airflow ASCII-art banner] > [2016-05-25 15:22:49,066] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > Running the Gunicorn server with 4 syncworkers on host 0.0.0.0 and port 8088 > with a timeout of 120... > [2016-05-25 15:22:49 +1000] [20191] [INFO] Starting gunicorn 19.3.0 > [2016-05-25 15:22:49 +1000] [20191] [INFO] Listening at: http://0.0.0.0:8088 > (20191) > [2016-05-25 15:22:49 +1000] [20191] [INFO] Using worker: sync > [2016-05-25 15:22:49 +1000] [20197] [INFO] Booting worker with pid: 20197 > [2016-05-25 15:22:49 +1000] [20198] [INFO] Booting worker with pid: 20198 > [2016-05-25 15:22:49 +1000] [20199] [INFO] Booting worker with pid: 20199 > [2016-05-25 15:22:49 +1000] [20200] [INFO] Booting worker with pid: 20200 > [2016-05-25 15:22:50,086] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,176] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,262] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,364] {__init__.py:36} INFO - Using executor LocalExecutor > [2016-05-25 15:22:50,931] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,000] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,093] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > [2016-05-25 15:22:51,191] {models.py:154} INFO - Filling up the DagBag from > /opt/airflow/production/dags > {code}
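The logs above show the reporter is on LocalExecutor, which matches Jeremiah's diagnosis: the "Only works with the CeleryExecutor, sorry" message comes from the web UI's manual "Run" button, not from scheduling a DAG. The executor is chosen in airflow.cfg; a sketch of the relevant fragment (assuming the stock 1.7.x executor names):

```
# airflow.cfg (sketch): the web UI's "Run" button dispatches task instances
# through the executor, and in 1.7.x only CeleryExecutor supports that path;
# SequentialExecutor and LocalExecutor trigger the message quoted above.
[core]
executor = CeleryExecutor   # alternatives: SequentialExecutor, LocalExecutor
```

Triggering a DAG run via the scheduler (or `airflow trigger_dag`) works with any executor; only the per-task "Run" button is Celery-only.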
[jira] [Commented] (AIRFLOW-52) Release airflow 1.7.1
[ https://issues.apache.org/jira/browse/AIRFLOW-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300086#comment-15300086 ] Jeremiah Lowin commented on AIRFLOW-52: --- Should this be closed? > Release airflow 1.7.1 > - > > Key: AIRFLOW-52 > URL: https://issues.apache.org/jira/browse/AIRFLOW-52 > Project: Apache Airflow > Issue Type: Task > Components: release >Reporter: Dan Davydov >Assignee: Dan Davydov > Labels: release > > Release the airflow 1.7.1 tag. > Current status: > There are three issues blocking this release caused by this commit: > https://github.com/apache/incubator-airflow/commit/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6 > -1. DAGs with a lot of tasks take much longer to parse (~25x slowdown)- > -2. The following kind of patterns fail:- > {code} > email.set_upstream(dag.roots) > dag.add_task(email) > {code} > This is because set_upstream now calls add_task and a task can't be added > more than once. > -3. Airflow losing queued tasks (see linked issue)- > -4. Airflow putting dags in a stuck state (AIRFLOW-92)- > I'm working with the owner of the commit to resolve these issues. > The way to catch (1) in the future is an integration test that asserts a > given non-trivial DAG parses under X seconds
[jira] [Created] (AIRFLOW-174) Add --debug option to scheduler
Jeremiah Lowin created AIRFLOW-174: -- Summary: Add --debug option to scheduler Key: AIRFLOW-174 URL: https://issues.apache.org/jira/browse/AIRFLOW-174 Project: Apache Airflow Issue Type: Improvement Components: scheduler Affects Versions: Airflow 1.7.1 Reporter: Jeremiah Lowin Assignee: Bolke de Bruin Priority: Minor {{airflow webserver}} has a {{--debug}} param which enables the use of interactive debuggers like {{ipdb}} (among other side effects). Unfortunately the {{airflow scheduler}} process does not respect debugger instructions, which makes tracing errors very difficult. It just prints the following error and resumes operation: {code} Traceback (most recent call last): File "/Users/jlowin/git/airflow/airflow/jobs.py", line 690, in _do_dags self.process_dag(dag, tis_out) File "/Users/jlowin/git/airflow/airflow/jobs.py", line 521, in process_dag run.update_state() File "/Users/jlowin/git/airflow/airflow/utils/db.py", line 53, in wrapper result = func(*args, **kwargs) File "/Users/jlowin/git/airflow/airflow/models.py", line 3471, in update_state all_deadlocked = (has_unfinished_tasks and no_dependencies_met) File "/Users/jlowin/git/airflow/airflow/models.py", line 3471, in update_state all_deadlocked = (has_unfinished_tasks and no_dependencies_met) File "/Users/jlowin/anaconda3/lib/python3.5/bdb.py", line 48, in trace_dispatch return self.dispatch_line(frame) File "/Users/jlowin/anaconda3/lib/python3.5/bdb.py", line 67, in dispatch_line if self.quitting: raise BdbQuit bdb.BdbQuit {code} [~bolke] I'm assigning this to you for now because I suspect it's related to the subprocess/daemonizing changes you made though I'm not sure. If we can enable {{ipdb}} it will make future scheduler work so much easier!
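The traceback hints at why the scheduler "prints the error and resumes": {{bdb.BdbQuit}} (raised when the debugger unwinds) is a subclass of {{Exception}}, so a catch-all error handler around per-DAG processing swallows it and the loop keeps going. A minimal sketch of the problem and one possible shape of the fix, where a {{--debug}} flag re-raises debugger exceptions; this is illustrative, not the scheduler's actual code:

```python
import bdb

def process_dags(dags, debug=False):
    """Toy loop mimicking a scheduler that logs per-DAG errors and continues."""
    errors = []
    for dag_callable in dags:
        try:
            dag_callable()
        except bdb.BdbQuit:
            # Without this clause the debugger's quit signal is just another
            # Exception: it gets logged and the loop resumes, as reported above.
            if debug:
                raise
            errors.append("BdbQuit swallowed")
        except Exception as exc:
            errors.append(str(exc))
    return errors

def broken_dag():
    # Simulate a user pressing 'q' inside ipdb during DAG processing.
    raise bdb.BdbQuit()

print(process_dags([broken_dag]))  # ['BdbQuit swallowed']
```

With `debug=True` the `BdbQuit` propagates and the process stops at the debugger's request instead of silently resuming.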
[jira] [Commented] (AIRFLOW-168) schedule_interval @once scheduling dag at least twice
[ https://issues.apache.org/jira/browse/AIRFLOW-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299668#comment-15299668 ] Sumit Maheshwari commented on AIRFLOW-168: -- Also heard that if we change the start_date of that dag, the scheduler creates instances for all the dates between the earlier one and the changed one as well. > schedule_interval @once scheduling dag at least twice > > > Key: AIRFLOW-168 > URL: https://issues.apache.org/jira/browse/AIRFLOW-168 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Reporter: Sumit Maheshwari > Attachments: Screen Shot 2016-05-24 at 9.51.50 PM.png > > > I was looking at the example_xcom example and found that it got scheduled twice: once at the start_time and once at the current time. To be sure, I tried multiple times (by reloading the db) and it's the same. > I am on airflow master, using the sequential executor with sqlite3.
[jira] [Created] (AIRFLOW-173) Create a FileSensor / NFSFileSensor sensor
Andre created AIRFLOW-173: - Summary: Create a FileSensor / NFSFileSensor sensor Key: AIRFLOW-173 URL: https://issues.apache.org/jira/browse/AIRFLOW-173 Project: Apache Airflow Issue Type: Improvement Reporter: Andre Priority: Minor While HDFS and WebHDFS suit most organisations using Hadoop, for some shops running MapR-FS, Airflow deployment is simplified by using plain files served through MapR's NFS gateways. A FileSensor and/or an NFSFileSensor would assist the adoption of Airflow within the MapR customer base, but more importantly, help those who use POSIX-compliant distributed filesystems that can be mounted on Unix-derivative systems (e.g. MapR-FS (via NFS), CephFS, GlusterFS, etc.).
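The proposed sensor's core logic can be sketched without any Airflow dependency; a real implementation would subclass Airflow's BaseSensorOperator and put the path check in its poke() method, but the essential behavior is a timed poll of the filesystem. The function name and parameters below are hypothetical:

```python
import os
import time

def file_sensor(path, poke_interval=5.0, timeout=60.0,
                clock=time.monotonic, sleep=time.sleep):
    """Poll until `path` exists on a locally mounted filesystem
    (NFS, MapR-FS via an NFS gateway, CephFS, GlusterFS, ...).

    Returns True once the file appears; raises TimeoutError otherwise.
    `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("no file at %r after %.0fs" % (path, timeout))
        sleep(min(poke_interval, remaining))
```

Because the check is just `os.path.exists`, the same sensor covers any POSIX-mountable filesystem, which is the portability argument the issue makes.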