[jira] [Work started] (AIRFLOW-946) Virtualenv not explicitly used by webserver/worker subprocess

2017-03-07 Thread Daniel Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-946 started by Daniel Huang.

> Virtualenv not explicitly used by webserver/worker subprocess
> -
>
> Key: AIRFLOW-946
> URL: https://issues.apache.org/jira/browse/AIRFLOW-946
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Daniel Huang
>Assignee: Daniel Huang
>Priority: Minor
>
> I have airflow installed in a virtualenv. I'd expect calling 
> {{/path/to/venv/bin/airflow webserver}} or {{/path/to/venv/bin/airflow 
> worker}} *without activating my virtualenv* to work. However, they both fail 
> to run properly because they spawn a process that is called without 
> specifying the virtualenv.
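A minimal sketch of the usual fix pattern for this class of bug (illustrative only, not the actual Airflow patch; `build_subcommand` is a hypothetical helper): instead of spawning the bare `airflow` command and hoping PATH resolves it, prefix the subcommand with the interpreter that is currently running, which lives inside the virtualenv.

```python
import os
import sys

def build_subcommand(args):
    """Prefix a CLI invocation with the currently running interpreter so a
    subprocess inherits the virtualenv, even when the parent was started as
    /path/to/venv/bin/airflow without activating the venv first.
    `args` is a hypothetical argument list, e.g. ['airflow', 'run', ...]."""
    # sys.executable points inside the venv (e.g. /path/to/venv/bin/python),
    # so the spawned process sees the same site-packages.
    script = os.path.join(os.path.dirname(sys.executable), args[0])
    return [sys.executable, script] + args[1:]
```

The returned list can then be handed to `subprocess.Popen` in place of the bare command.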



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AIRFLOW-955) job failed to execute tasks

2017-03-07 Thread Jeff Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Liu updated AIRFLOW-955:
-
Attachment: browse-task-instances-example-03-01-2017.png

> job failed to execute tasks
> ---
>
> Key: AIRFLOW-955
> URL: https://issues.apache.org/jira/browse/AIRFLOW-955
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jeff Liu
> Attachments: browse-task-instances-example-03-01-2017.png
>
>
> I have recently run into a very strange issue where the job consistently 
> fails to run tasks or hangs in a running state without running any tasks.
> My dag structure has four levels: the main dag -> a subdag (level 1) -> a 
> subdag (level 2) -> 5 tasks.
> The log shows that sometimes the level2 subdag fails to execute tasks, but 
> when I look for the task logs, I don't see any records that the tasks 
> were ever executed (on the matching ds date).
> Here are some logs from a level2 dag example:
> {noformat}
> 
> Attempt 1 out of 2
> 
> [2017-03-07 06:45:55,230] {models.py:1041} INFO - Executing 
>  aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units>
> on 2017-03-01 16:00:00
> [2017-03-07 06:45:55,462] {base_executor.py:34} INFO - Adding to queue: 
> airflow run 
> aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_pr
> od_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units 
> mysql_to_gcs 2017-03-01T16:00:00 --local -s 2017-03-01T16:00:00 -sd 
> DAGS_FOLDER/aws_prod_s
> tore_hybris_to_bq.py
> [2017-03-07 06:46:00,017] {celery_executor.py:62} INFO - [celery] queuing 
> (u'aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_stor
> e_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units', 
> 'mysql_to_gcs', datetime.datetime(2017, 3, 1, 16, 0)) through celery, 
> queue=default
> [2017-03-07 06:46:00,169] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:05,094] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:10,015] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:15,023] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:20,020] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:25,031] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:30,032] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:35,034] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:40,015] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:45,022] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:50,023] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:46:55,018] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:47:00,027] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:47:05,033] {jobs.py:813} INFO - [backfill progress] waiting: 6 
> | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
> [2017-03-07 06:47:10,043] {base_executor.py:34} INFO - Adding to queue: 
> airflow run 
> aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_pr
> od_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units gcs_to_bq 
> 2017-03-01T16:00:00 --local -s 2017-03-01T16:00:00 -sd 
> DAGS_FOLDER/aws_prod_stor
> e_hybris_to_bq.py
> [2017-03-07 06:47:15,023] {celery_executor.py:62} INFO - [celery] queuing 
> (u'aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_stor
> e_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units', 'gcs_to_bq', 
> datetime.datetime(2017, 3, 1, 16, 0)) through celery, queue=default
> [2017-03-07 06:47:15,035] {jobs.py:813} INFO - [backfill progress] waiting: 5 
> | succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
> [2017-03-07 06:47:20,073] {jobs.py:813} INFO - [backfill progress] waiting: 5 
> | succ

[jira] [Created] (AIRFLOW-955) job failed to execute tasks

2017-03-07 Thread Jeff Liu (JIRA)
Jeff Liu created AIRFLOW-955:


 Summary: job failed to execute tasks
 Key: AIRFLOW-955
 URL: https://issues.apache.org/jira/browse/AIRFLOW-955
 Project: Apache Airflow
  Issue Type: Bug
Reporter: Jeff Liu


I have recently run into a very strange issue where the job consistently fails 
to run tasks or hangs in a running state without running any tasks.

My dag structure has four levels: the main dag -> a subdag (level 1) -> a 
subdag (level 2) -> 5 tasks.
The log shows that sometimes the level2 subdag fails to execute tasks, but when 
I look for the task logs, I don't see any records that the tasks were ever 
executed (on the matching ds date).

Here are some logs from a level2 dag example:
{noformat}

Attempt 1 out of 2


[2017-03-07 06:45:55,230] {models.py:1041} INFO - Executing 

on 2017-03-01 16:00:00
[2017-03-07 06:45:55,462] {base_executor.py:34} INFO - Adding to queue: airflow 
run aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_pr
od_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units 
mysql_to_gcs 2017-03-01T16:00:00 --local -s 2017-03-01T16:00:00 -sd 
DAGS_FOLDER/aws_prod_s
tore_hybris_to_bq.py
[2017-03-07 06:46:00,017] {celery_executor.py:62} INFO - [celery] queuing 
(u'aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_stor
e_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units', 'mysql_to_gcs', 
datetime.datetime(2017, 3, 1, 16, 0)) through celery, queue=default
[2017-03-07 06:46:00,169] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:05,094] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:10,015] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:15,023] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:20,020] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:25,031] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:30,032] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:35,034] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:40,015] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:45,022] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:50,023] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:55,018] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:47:00,027] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:47:05,033] {jobs.py:813} INFO - [backfill progress] waiting: 6 | 
succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:47:10,043] {base_executor.py:34} INFO - Adding to queue: airflow 
run aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_pr
od_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units gcs_to_bq 
2017-03-01T16:00:00 --local -s 2017-03-01T16:00:00 -sd DAGS_FOLDER/aws_prod_stor
e_hybris_to_bq.py
[2017-03-07 06:47:15,023] {celery_executor.py:62} INFO - [celery] queuing 
(u'aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_stor
e_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units', 'gcs_to_bq', 
datetime.datetime(2017, 3, 1, 16, 0)) through celery, queue=default
[2017-03-07 06:47:15,035] {jobs.py:813} INFO - [backfill progress] waiting: 5 | 
succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:20,073] {jobs.py:813} INFO - [backfill progress] waiting: 5 | 
succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:25,025] {jobs.py:813} INFO - [backfill progress] waiting: 5 | 
succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:30,029] {jobs.py:813} INFO - [backfill progress] waiting: 5 | 
succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:35,020] {jobs.py:813} INFO - [backfill progress] waiting: 5 | 
succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:40,025] {jobs

[jira] [Created] (AIRFLOW-954) Installing future 0.16 breaks airflow initdb on Python 2.7

2017-03-07 Thread Sean Cronin (JIRA)
Sean Cronin created AIRFLOW-954:
---

 Summary: Installing future 0.16 breaks airflow initdb on Python 2.7
 Key: AIRFLOW-954
 URL: https://issues.apache.org/jira/browse/AIRFLOW-954
 Project: Apache Airflow
  Issue Type: Bug
 Environment: Python 2.7.3 on Ubuntu 12.04 with a clean virtualenv and 
HEAD of airflow master installed
Reporter: Sean Cronin


On master, if you run {{airflow initdb}} after installing HEAD of airflow 
master with Python 2.7, you get the following error:

{code}
Traceback (most recent call last):
  File "/home/sean/.virtualenvs/airflow-test/bin/airflow", line 17, in <module>
    from airflow import configuration
  File "/home/sean/.virtualenvs/airflow-test/local/lib/python2.7/site-packages/airflow/__init__.py", line 29, in <module>
    from airflow import configuration as conf
  File "/home/sean/.virtualenvs/airflow-test/local/lib/python2.7/site-packages/airflow/configuration.py", line 33, in <module>
    from configparser import ConfigParser
ImportError: No module named configparser
{code}

This seems to be due to https://github.com/apache/incubator-airflow/pull/2091, 
which bumps {{future}} in {{setup.py}} so that future 0.16 can be installed.

Python future [got rid 
of](http://python-future.org/whatsnew.html#what-s-new-in-version-0-16-0-2016-10-27)
 its {{configparser}} alias in 0.16.

The recommended way to fix this is to keep {{future==0.16}} and also install 
the {{configparser}} backport.
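The standard compatibility pattern for this (a sketch of the general technique, not necessarily the patch Airflow shipped) is a try/except import that prefers the Python 3 name and falls back to the Python 2 stdlib module:

```python
try:
    # Python 3, or Python 2 with the `configparser` backport installed
    from configparser import ConfigParser
except ImportError:
    # plain Python 2 fallback: the stdlib module is named ConfigParser there
    from ConfigParser import ConfigParser

# Either way, the rest of the code uses one name.
parser = ConfigParser()
parser.read_string(u"[core]\ndags_folder = /tmp/dags\n")
```

With the backport installed, both Python versions take the first branch and behave identically.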





[jira] [Created] (AIRFLOW-953) Tasks marked as successful should have start/end_date/duration set

2017-03-07 Thread Dan Davydov (JIRA)
Dan Davydov created AIRFLOW-953:
---

 Summary: Tasks marked as successful should have 
start/end_date/duration set
 Key: AIRFLOW-953
 URL: https://issues.apache.org/jira/browse/AIRFLOW-953
 Project: Apache Airflow
  Issue Type: Bug
  Components: webserver
Reporter: Dan Davydov


Tasks marked as successful should have start/end_date/duration/operator set. 
Start/end dates should be the same and denote the time the task was marked as 
successful, duration should be 0, and the operator should be filled in 
correctly with the task's operator.

This should be fixed because otherwise the task instance state is not complete, 
which could break some operations in Airflow, and it prevents things like 
scripts that delete old tasks from Airflow (since e.g. start_date is not 
specified for these tasks).
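The requested behaviour can be sketched as follows (illustrative only: `mark_success` and `FakeTI` are hypothetical stand-ins, not Airflow's actual TaskInstance model or UI handler):

```python
import datetime

class FakeTI(object):
    """Stand-in for a TaskInstance row (illustration only)."""
    state = start_date = end_date = duration = operator = None

def mark_success(ti, operator_name, now=None):
    """Fill in the state fields when a task is marked successful,
    instead of leaving them NULL as the ticket describes."""
    now = now or datetime.datetime.utcnow()
    ti.state = 'success'
    ti.start_date = now        # the moment it was marked successful
    ti.end_date = now          # same instant as start_date...
    ti.duration = 0            # ...so duration is zero, per the ticket
    ti.operator = operator_name  # the task's actual operator class name
    return ti
```

With every field populated, cleanup scripts that filter on `start_date` would no longer skip these rows.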





[jira] [Updated] (AIRFLOW-953) Tasks marked as successful should have complete state set

2017-03-07 Thread Dan Davydov (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dan Davydov updated AIRFLOW-953:

Summary: Tasks marked as successful should have complete state set  (was: 
Tasks marked as successful should have start/end_date/duration set)

> Tasks marked as successful should have complete state set
> -
>
> Key: AIRFLOW-953
> URL: https://issues.apache.org/jira/browse/AIRFLOW-953
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Dan Davydov
>  Labels: beginner, starter
>
> Tasks marked as successful should have start/end_date/duration/operator set. 
> Start/end dates should be the same and denote the time the task was marked as 
> successful, duration should be 0, and the operator should be filled in 
> correctly with the task's operator.
> This should be fixed because otherwise the task instance state is not 
> complete, which could break some operations in Airflow, and it prevents 
> things like scripts that delete old tasks from Airflow (since e.g. start_date 
> is not specified for these tasks).





[jira] [Created] (AIRFLOW-952) Cannot save an empty extra field via the connections UI

2017-03-07 Thread Vijay Bhat (JIRA)
Vijay Bhat created AIRFLOW-952:
--

 Summary: Cannot save an empty extra field via the connections UI 
 Key: AIRFLOW-952
 URL: https://issues.apache.org/jira/browse/AIRFLOW-952
 Project: Apache Airflow
  Issue Type: Bug
  Components: models, ui
Reporter: Vijay Bhat
Assignee: Vijay Bhat
Priority: Minor


Once you fill out the extra field parameter in the connections web UI, you 
cannot clear it out. 

Steps to reproduce:

- Open the default mysql connection via the web UI.
- Enter a JSON string for the extra field and save.
- Go back to editing the mysql connection and clear out the extra field.
- Hit save, then return to the mysql connection edit UI: the JSON string 
will still be there.
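A likely cause (an assumption for illustration, not confirmed by the ticket) is a truthiness guard in the save path that treats an empty string the same as "field absent", so a cleared value is never written back. `apply_extra` below is a hypothetical reduction of that logic:

```python
def apply_extra(stored_extra, submitted_extra):
    """Sketch of the suspected bug and its fix. A guard written as
    `if submitted_extra:` skips empty strings, so clearing the field
    keeps the old JSON. Comparing against None instead lets an empty
    string overwrite the stored value while still ignoring a missing
    field."""
    if submitted_extra is not None:   # buggy version: `if submitted_extra:`
        return submitted_extra
    return stored_extra
```

Under this fix, submitting an empty string clears the stored extra, while a missing field (None) leaves it unchanged.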







[jira] [Created] (AIRFLOW-951) Missing AWS operators for docs::api_reference

2017-03-07 Thread msempere (JIRA)
msempere created AIRFLOW-951:


 Summary: Missing AWS operators for docs::api_reference
 Key: AIRFLOW-951
 URL: https://issues.apache.org/jira/browse/AIRFLOW-951
 Project: Apache Airflow
  Issue Type: Bug
  Components: docs
Reporter: msempere
Priority: Minor


AWS-related operators are missing from 
https://airflow.incubator.apache.org/code.html?highlight=operators#operators

AWS-related contributed operators are also missing.

The code is there, but the documentation has not been updated to reflect those 
operators.





[jira] [Created] (AIRFLOW-950) Missing AWS integrations on documentation::integrations

2017-03-07 Thread msempere (JIRA)
msempere created AIRFLOW-950:


 Summary: Missing AWS integrations on documentation::integrations
 Key: AIRFLOW-950
 URL: https://issues.apache.org/jira/browse/AIRFLOW-950
 Project: Apache Airflow
  Issue Type: Bug
  Components: docs
Reporter: msempere
Priority: Minor


See: https://airflow.incubator.apache.org/integration.html#aws

Documentation is missing current AWS integrations like:

 - redshift_to_s3_operator
 - s3_file_transform_operator
 - s3_to_hive_operator
 - ecs_operator
 - emr_add_steps_operator
 - emr_create_job_flow_operator
 - emr_terminate_job_flow_operator
 - etc.







[jira] [Created] (AIRFLOW-949) kill_process_tree does not kill the root process

2017-03-07 Thread Erik Cederstrand (JIRA)
Erik Cederstrand created AIRFLOW-949:


 Summary: kill_process_tree does not kill the root process
 Key: AIRFLOW-949
 URL: https://issues.apache.org/jira/browse/AIRFLOW-949
 Project: Apache Airflow
  Issue Type: Bug
  Components: utils
Affects Versions: 1.8.0rc4
Reporter: Erik Cederstrand
 Attachments: helpers.patch

The kill_process_tree() function in airflow/utils/helpers.py does not attempt 
to kill the root process. Since there's also a kill_descendant_processes() 
function, I assume that was the intent.

Also, according to the comments, the intent is to first send SIGTERM, and then 
SIGKILL, to descendant processes. But in fact, SIGTERM is sent twice.

The attached patch fixes both problems.

This was found while investigating why the airflow worker would not kill 
certain jobs that had crashed. 
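The intended behaviour can be sketched like this (a hedged illustration of the two fixes described above, not the attached patch itself; `kill_tree` is a hypothetical name, with the kill function injectable so the sketch can be exercised without real processes):

```python
import os
import signal

def kill_tree(root_pid, child_pids, kill=os.kill):
    """Send SIGTERM to the root *and* its descendants, then escalate
    to SIGKILL -- rather than omitting the root and sending SIGTERM
    twice, which is the buggy behaviour reported here."""
    everyone = [root_pid] + list(child_pids)   # include the root this time
    for pid in everyone:
        try:
            kill(pid, signal.SIGTERM)          # polite shutdown first
        except OSError:
            pass                               # process already gone
    # (real code would wait a grace period here before escalating)
    for pid in everyone:
        try:
            kill(pid, signal.SIGKILL)          # escalate, not a second SIGTERM
        except OSError:
            pass
```

Injecting a recording function in place of `os.kill` makes the signal sequence easy to verify in a test.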


