[jira] [Work started] (AIRFLOW-946) Virtualenv not explicitly used by webserver/worker subprocess
[ https://issues.apache.org/jira/browse/AIRFLOW-946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-946 started by Daniel Huang.

> Virtualenv not explicitly used by webserver/worker subprocess
> -------------------------------------------------------------
>
>                 Key: AIRFLOW-946
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-946
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: cli
>            Reporter: Daniel Huang
>            Assignee: Daniel Huang
>            Priority: Minor
>
> I have airflow installed in a virtualenv. I'd expect calling
> {{/path/to/venv/bin/airflow webserver}} or {{/path/to/venv/bin/airflow worker}}
> *without activating my virtualenv* to work. However, they both fail
> to run properly because they spawn a process that is called without
> specifying the virtualenv.
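A common fix for this class of bug is to resolve the spawned command against the running interpreter's own location rather than whatever {{airflow}} happens to be on {{$PATH}}. A minimal sketch of that pattern in Python, assuming the child is started with {{subprocess}}; {{spawn_airflow_subcommand}} is a hypothetical helper, not Airflow's actual code:

{code}
import os
import subprocess
import sys


def spawn_airflow_subcommand(args):
    """Spawn an `airflow ...` subcommand using the same interpreter
    environment (e.g. the virtualenv's bin/ directory) as the current
    process, instead of whatever `airflow` resolves to on $PATH."""
    # In a virtualenv, the directory of the running interpreter is
    # <venv>/bin, which also contains the `airflow` entry point.
    bin_dir = os.path.dirname(sys.executable)
    airflow_bin = os.path.join(bin_dir, "airflow")
    # Prepend bin_dir to PATH so any further subprocesses the child
    # spawns also resolve into the virtualenv.
    env = dict(os.environ)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return subprocess.Popen([airflow_bin] + list(args), env=env)


if __name__ == "__main__":
    spawn_airflow_subcommand(["version"]).wait()
{code}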
[jira] [Updated] (AIRFLOW-955) job failed to execute tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Liu updated AIRFLOW-955:
-----------------------------
    Attachment: browse-task-instances-example-03-01-2017.png

> job failed to execute tasks
> ---------------------------
>
>                 Key: AIRFLOW-955
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-955
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Jeff Liu
>         Attachments: browse-task-instances-example-03-01-2017.png
>
> I have recently run into a very strange issue: the job consistently fails
> to run tasks, or hangs in a running state without running any tasks.
> My dag structure has four levels: the main dag -> a subdag (level 1) ->
> a subdag (level 2) -> 5 tasks.
> The log shows that sometimes the level 2 subdag fails to execute tasks, but
> when I look at the task logs, I don't see any record that the tasks were
> ever executed (for the matching ds date).
[jira] [Created] (AIRFLOW-955) job failed to execute tasks
Jeff Liu created AIRFLOW-955:
--------------------------------

             Summary: job failed to execute tasks
                 Key: AIRFLOW-955
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-955
             Project: Apache Airflow
          Issue Type: Bug
            Reporter: Jeff Liu


I have recently run into a very strange issue: the job consistently fails to run tasks, or hangs in a running state without running any tasks.

My dag structure has four levels: the main dag -> a subdag (level 1) -> a subdag (level 2) -> 5 tasks.

The log shows that sometimes the level 2 subdag fails to execute tasks, but when I look at the task logs, I don't see any record that the tasks were ever executed (for the matching ds date).

Here are some logs from a level 2 dag example:

{noformat}
Attempt 1 out of 2

[2017-03-07 06:45:55,230] {models.py:1041} INFO - Executing aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units> on 2017-03-01 16:00:00
[2017-03-07 06:45:55,462] {base_executor.py:34} INFO - Adding to queue: airflow run aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units mysql_to_gcs 2017-03-01T16:00:00 --local -s 2017-03-01T16:00:00 -sd DAGS_FOLDER/aws_prod_store_hybris_to_bq.py
[2017-03-07 06:46:00,017] {celery_executor.py:62} INFO - [celery] queuing (u'aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units', 'mysql_to_gcs', datetime.datetime(2017, 3, 1, 16, 0)) through celery, queue=default
[2017-03-07 06:46:00,169] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:05,094] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:10,015] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:15,023] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:20,020] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:25,031] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:30,032] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:35,034] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:40,015] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:45,022] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:50,023] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:46:55,018] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:47:00,027] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:47:05,033] {jobs.py:813} INFO - [backfill progress] waiting: 6 | succeeded: 0 | kicked_off: 1 | failed: 0 | wont_run: 0
[2017-03-07 06:47:10,043] {base_executor.py:34} INFO - Adding to queue: airflow run aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units gcs_to_bq 2017-03-01T16:00:00 --local -s 2017-03-01T16:00:00 -sd DAGS_FOLDER/aws_prod_store_hybris_to_bq.py
[2017-03-07 06:47:15,023] {celery_executor.py:62} INFO - [celery] queuing (u'aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris.aws_prod_store_hybris_mysql_to_bq_dag.mysql_proxy_prod_store_hybris_units', 'gcs_to_bq', datetime.datetime(2017, 3, 1, 16, 0)) through celery, queue=default
[2017-03-07 06:47:15,035] {jobs.py:813} INFO - [backfill progress] waiting: 5 | succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:20,073] {jobs.py:813} INFO - [backfill progress] waiting: 5 | succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:25,025] {jobs.py:813} INFO - [backfill progress] waiting: 5 | succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:30,029] {jobs.py:813} INFO - [backfill progress] waiting: 5 | succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:35,020] {jobs.py:813} INFO - [backfill progress] waiting: 5 | succeeded: 1 | kicked_off: 2 | failed: 0 | wont_run: 0
[2017-03-07 06:47:40,025] {jobs
{noformat}
[jira] [Created] (AIRFLOW-954) Installing future 0.16 breaks airflow initdb on Python 2.7
Sean Cronin created AIRFLOW-954:
-----------------------------------

             Summary: Installing future 0.16 breaks airflow initdb on Python 2.7
                 Key: AIRFLOW-954
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-954
             Project: Apache Airflow
          Issue Type: Bug
         Environment: Python 2.7.3 on Ubuntu 12.04 with a clean virtualenv and HEAD of airflow master installed
            Reporter: Sean Cronin


If you run {{airflow initdb}} after installing HEAD of airflow master with Python 2.7, you get the following error:

{code}
Traceback (most recent call last):
  File "/home/sean/.virtualenvs/airflow-test/bin/airflow", line 17, in <module>
    from airflow import configuration
  File "/home/sean/.virtualenvs/airflow-test/local/lib/python2.7/site-packages/airflow/__init__.py", line 29, in <module>
    from airflow import configuration as conf
  File "/home/sean/.virtualenvs/airflow-test/local/lib/python2.7/site-packages/airflow/configuration.py", line 33, in <module>
    from configparser import ConfigParser
ImportError: No module named configparser
{code}

This seems to be due to https://github.com/apache/incubator-airflow/pull/2091, which bumps {{future}} in {{setup.py}} so that future 0.16 can be installed. Python future [got rid of|http://python-future.org/whatsnew.html#what-s-new-in-version-0-16-0-2016-10-27] its {{configparser}} alias in 0.16.

The recommended way to fix this is to install {{future==0.16}} and also install {{configparser}}.
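For context, the usual Python 2/3 compatibility pattern here (not necessarily the fix Airflow adopted) is to fall back across module names, with the {{configparser}} PyPI backport covering Python 2:

{code}
# ConfigParser import that works on both Python 2 and 3.
# Python 3 (or Python 2 with the `configparser` backport installed)
# provides the lowercase module; plain Python 2 only ships the
# CamelCase `ConfigParser` module.
try:
    from configparser import ConfigParser
except ImportError:  # Python 2 without the backport
    from ConfigParser import ConfigParser
{code}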
[jira] [Created] (AIRFLOW-953) Tasks marked as successful should have start/end_date/duration set
Dan Davydov created AIRFLOW-953:
-----------------------------------

             Summary: Tasks marked as successful should have start/end_date/duration set
                 Key: AIRFLOW-953
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-953
             Project: Apache Airflow
          Issue Type: Bug
          Components: webserver
            Reporter: Dan Davydov


Tasks marked as successful should have start/end_date/duration/operator set. The start/end dates should be the same and denote the time the task was marked as successful, the duration should be 0, and the operator should be filled in correctly with the task's operator.

This should be fixed because otherwise the task instance state is incomplete, which could break some operations in Airflow and prevents things like scripts that delete old tasks from working (since e.g. start_date is not set for these tasks).
[jira] [Updated] (AIRFLOW-953) Tasks marked as successful should have complete state set
[ https://issues.apache.org/jira/browse/AIRFLOW-953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dan Davydov updated AIRFLOW-953:
--------------------------------
    Summary: Tasks marked as successful should have complete state set  (was: Tasks marked as successful should have start/end_date/duration set)

> Tasks marked as successful should have complete state set
> ----------------------------------------------------------
>
>                 Key: AIRFLOW-953
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-953
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: webserver
>            Reporter: Dan Davydov
>              Labels: beginner, starter
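A sketch of what "complete state" could mean when a task instance is marked successful. The fields ({{state}}, {{start_date}}, {{end_date}}, {{duration}}, {{operator}}) follow the report's description of the TaskInstance model, but {{mark_success}} itself is a hypothetical helper, not Airflow's actual webserver code:

{code}
from datetime import datetime


def mark_success(ti, task):
    """Hypothetical helper: mark a task instance successful while
    filling in the bookkeeping fields downstream tooling relies on."""
    now = datetime.utcnow()
    ti.state = "success"
    # Per the report: start and end dates should both record the time
    # the task was marked successful...
    ti.start_date = now
    ti.end_date = now
    ti.duration = 0  # ...the duration should be 0...
    # ...and the operator should reflect the task's real operator.
    ti.operator = task.__class__.__name__
{code}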
[jira] [Created] (AIRFLOW-952) Cannot save an empty extra field via the connections UI
Vijay Bhat created AIRFLOW-952:
----------------------------------

             Summary: Cannot save an empty extra field via the connections UI
                 Key: AIRFLOW-952
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-952
             Project: Apache Airflow
          Issue Type: Bug
          Components: models, ui
            Reporter: Vijay Bhat
            Assignee: Vijay Bhat
            Priority: Minor


Once you fill out the extra field parameter in the connections web UI, you cannot clear it out. Steps to reproduce:

- Open the default mysql connection via the web UI.
- Enter a JSON string for the extra field and save.
- Go back to editing the mysql connection and clear out the extra field.
- Hit save, then return to the mysql connection edit UI: the JSON string will still be there.
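One plausible cause, sketched below as an illustration (this is a guess at the failure mode, not the actual Airflow/Flask-Admin code; {{process_form_buggy}} and {{process_form_fixed}} are hypothetical): if the save handler treats an empty string as "field not provided", a cleared value is never written back.

{code}
def process_form_buggy(conn, form_data):
    """Illustration of the suspected bug: a truthiness check means an
    explicitly cleared field ('') is silently skipped on save."""
    extra = form_data.get("extra")
    if extra:  # '' is falsy, so clearing the field never persists
        conn.extra = extra


def process_form_fixed(conn, form_data):
    """Check for presence instead of truthiness, so an emptied field
    overwrites the stored value."""
    if "extra" in form_data:
        conn.extra = form_data["extra"]
{code}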
[jira] [Created] (AIRFLOW-951) Missing AWS operators for docs::api_reference
msempere created AIRFLOW-951:
--------------------------------

             Summary: Missing AWS operators for docs::api_reference
                 Key: AIRFLOW-951
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-951
             Project: Apache Airflow
          Issue Type: Bug
          Components: docs
            Reporter: msempere
            Priority: Minor


AWS-related operators are missing from https://airflow.incubator.apache.org/code.html?highlight=operators#operators. AWS-related contributed operators are missing as well. The code is there, but the documentation has not been updated to reflect those operators.
[jira] [Created] (AIRFLOW-950) Missing AWS integrations on documentation::integrations
msempere created AIRFLOW-950:
--------------------------------

             Summary: Missing AWS integrations on documentation::integrations
                 Key: AIRFLOW-950
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-950
             Project: Apache Airflow
          Issue Type: Bug
          Components: docs
            Reporter: msempere
            Priority: Minor


See: https://airflow.incubator.apache.org/integration.html#aws

The documentation is missing current AWS integrations such as:

- redshift_to_s3_operator
- s3_file_transform_operator
- s3_to_hive_operator
- ecs_operator
- emr_add_steps_operator
- emr_create_job_flow_operator
- emr_terminate_job_flow_operator
- etc.
[jira] [Created] (AIRFLOW-949) kill_process_tree does not kill the root process
Erik Cederstrand created AIRFLOW-949:
----------------------------------------

             Summary: kill_process_tree does not kill the root process
                 Key: AIRFLOW-949
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-949
             Project: Apache Airflow
          Issue Type: Bug
          Components: utils
    Affects Versions: 1.8.0rc4
            Reporter: Erik Cederstrand
         Attachments: helpers.patch


The kill_process_tree() function in airflow/utils/helpers.py does not attempt to kill the root process. Since there is also a kill_descendant_processes() function, I assume that was the intent.

Also, according to the comments, the intent is to first send SIGTERM, and then SIGKILL, to descendant processes. But in fact, SIGTERM is sent twice. The attached patch fixes both problems.

This was found while investigating why the airflow worker would not kill certain jobs that had crashed.
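For reference, a minimal sketch of the intended semantics written against psutil (an illustration of the TERM-then-KILL escalation including the root process, assuming psutil is available; this is not the code from the attached patch):

{code}
import signal

import psutil


def kill_process_tree(root_pid, term_timeout=5):
    """Terminate a whole process tree, root included: send SIGTERM
    first, allow a grace period, then SIGKILL any survivors."""
    try:
        root = psutil.Process(root_pid)
    except psutil.NoSuchProcess:
        return
    # Collect all descendants first, then include the root itself.
    procs = root.children(recursive=True) + [root]

    for proc in procs:  # polite phase: exactly one SIGTERM per process
        try:
            proc.send_signal(signal.SIGTERM)
        except psutil.NoSuchProcess:
            pass

    # Give processes term_timeout seconds to exit cleanly, then
    # escalate to SIGKILL for anything still alive.
    _, alive = psutil.wait_procs(procs, timeout=term_timeout)
    for proc in alive:
        try:
            proc.kill()
        except psutil.NoSuchProcess:
            pass
{code}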