[jira] [Work stopped] (AIRFLOW-6799) webgui cannot display all tasks

2020-02-14 Thread Soeren Laursen (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-6799 stopped by Soeren Laursen.
---
> webgui cannot display all tasks
> ---
>
> Key: AIRFLOW-6799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6799
> Project: Apache Airflow
> Issue Type: Bug
> Components: ui, webserver
> Affects Versions: 1.10.7, 1.10.9
> Environment: linux in a docker container.
> Reporter: Soeren Laursen
> Priority: Blocker
>
> When we have "too many" tasks, the graph rendering stops with an
> "Edge 'undefined' is not in graph" JavaScript error, and no graph is shown
> in the web GUI. Lowering the number of tasks enables the rendering again.
> Example code:
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from datetime import datetime
> from datetime import timedelta
> DAG_task_concurrency = 30
> DAG_max_active_runs = 10
> MAIN_DAG_ID = 'BUG_IN_GRAPH_DISPLAY'
> default_args = {
>     'owner': 'prod',
>     'depends_on_past': False,
>     'email': ['s...@fcoo.dk'],
>     'email_on_failure': False,
>     'email_on_retry': False,
>     'retries': 3,
>     'retry_delay': timedelta(seconds=30),
>     'queue': 'default'}
> BUG_DAG = DAG(MAIN_DAG_ID,
>               default_args=default_args,
>               catchup=False,
>               orientation='LR',
>               concurrency=DAG_task_concurrency,
>               schedule_interval='@once',
>               max_active_runs=DAG_max_active_runs,
>               start_date=datetime(2020, 2, 5)
> )
> # Too many tasks
> max_task = 160
> # 156 ok
> task_range = list(range(0, max_task + 1))
> start_task = DummyOperator(task_id='start_task', dag=BUG_DAG)
> after_all_complete = DummyOperator(task_id='after_all_complete', dag=BUG_DAG)
> for task_step in task_range:
>     task1 = DummyOperator(task_id='task_{0}'.format(task_step), dag=BUG_DAG)
>     start_task >> task1 >> after_all_complete



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-6799) webgui cannot display all tasks

2020-02-14 Thread Soeren Laursen (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soeren Laursen updated AIRFLOW-6799:

Component/s: ui

> webgui cannot display all tasks
> ---
>
> Key: AIRFLOW-6799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6799
> Project: Apache Airflow
> Issue Type: Bug
> Components: ui, webserver
> Affects Versions: 1.10.7, 1.10.9
> Environment: linux in a docker container.
> Reporter: Soeren Laursen
> Priority: Blocker
>





[jira] [Updated] (AIRFLOW-6799) webgui cannot display all tasks

2020-02-14 Thread Soeren Laursen (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soeren Laursen updated AIRFLOW-6799:

Priority: Blocker  (was: Major)

> webgui cannot display all tasks
> ---
>
> Key: AIRFLOW-6799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6799
> Project: Apache Airflow
> Issue Type: Bug
> Components: webserver
> Affects Versions: 1.10.7, 1.10.9
> Environment: linux in a docker container.
> Reporter: Soeren Laursen
> Priority: Blocker
>





[jira] [Assigned] (AIRFLOW-6799) webgui cannot display all tasks

2020-02-14 Thread Soeren Laursen (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soeren Laursen reassigned AIRFLOW-6799:
---

Assignee: (was: Soeren Laursen)

> webgui cannot display all tasks
> ---
>
> Key: AIRFLOW-6799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6799
> Project: Apache Airflow
> Issue Type: Bug
> Components: webserver
> Affects Versions: 1.10.7, 1.10.9
> Environment: linux in a docker container.
> Reporter: Soeren Laursen
> Priority: Major
>





[jira] [Work started] (AIRFLOW-6799) webgui cannot display all tasks

2020-02-14 Thread Soeren Laursen (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-6799 started by Soeren Laursen.
---
> webgui cannot display all tasks
> ---
>
> Key: AIRFLOW-6799
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6799
> Project: Apache Airflow
> Issue Type: Bug
> Components: webserver
> Affects Versions: 1.10.7, 1.10.9
> Environment: linux in a docker container.
> Reporter: Soeren Laursen
> Assignee: Soeren Laursen
> Priority: Major
>





[jira] [Closed] (AIRFLOW-2195) Task get terminatet before timeout reached

2020-02-14 Thread Soeren Laursen (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soeren Laursen closed AIRFLOW-2195.
---
Resolution: Fixed

> Task get terminatet before timeout reached
> --
>
> Key: AIRFLOW-2195
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2195
> Project: Apache Airflow
> Issue Type: Bug
> Affects Versions: 1.9.0
> Environment: linux, ubuntu 16.04
> Reporter: Soeren Laursen
> Priority: Major
>
> We have a task which is running for more than 6 hours. It is a backup job 
> that uses dirvish.
>  
> After precisely 6 hours it gets terminated:
> [2018-03-07 11:55:31,928] {base_task_runner.py:98} INFO - Subtask: [2018-03-07 11:55:31,928] {bash_operator.py:80} INFO - Temporary script location: /tmp/airflowtmpwx57pf7q//tmp/airflowtmpwx57pf7q/Backup_of_arch-fcoo-getm-ns1cdvo33ldi
> [2018-03-07 11:55:31,928] {base_task_runner.py:98} INFO - Subtask: [2018-03-07 11:55:31,928] {bash_operator.py:88} INFO - Running command: sudo /backup/dirvish/scripts/airflow-dirvish.sh arch-fcoo-getm-ns1c
> [2018-03-07 11:55:31,933] {base_task_runner.py:98} INFO - Subtask: [2018-03-07 11:55:31,933] {bash_operator.py:97} INFO - Output:
> [2018-03-07 17:57:56,378] {cli.py:374} INFO - Running on host storage-bck02
> [2018-03-07 17:57:56,421] {models.py:1190} INFO - Dependencies not met for  Dirvish_on_storage-bck02.Dirvish_job_FCOO_GETM.Backup_of_arch-fcoo-getm-ns1c 2018-03-06 07:00:00 [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is already running, it started on 2018-03-07 10:55:30.946641.
> [2018-03-07 17:57:56,421] {models.py:1190} INFO - Dependencies not met for  Dirvish_on_storage-bck02.Dirvish_job_FCOO_GETM.Backup_of_arch-fcoo-getm-ns1c 2018-03-06 07:00:00 [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' state which is not a valid state for execution. The task must be cleared in order to be run.
> [2018-03-07 17:58:05,261] {helpers.py:233} INFO - Terminating descendant processes of ['/usr/bin/python3 /usr/local/bin/airflow run Dirvish_on_storage-bck02.Dirvish_job_FCOO_GETM Backup_of_arch-fcoo-getm-ns1c 2018-03-06T07:00:00 --job_id 32501 --raw -sd /home/airflow/airflow/airflow/dags/dirvish_on_storage-bck02.py'] PID: 5169
> [2018-03-07 17:58:05,261] {helpers.py:237} INFO - Terminating descendant process ['bash', '/tmp/airflowtmpwx57pf7q/Backup_of_arch-fcoo-getm-ns1cdvo33ldi'] PID: 5180
> [2018-03-07 17:58:05,268] {helpers.py:195} ERROR - b''
> The dirvish script continues in the background and finishes as it should, but
> the tasks that depend on the backup job stop.
> This happens even with:
> execution_timeout=None
> I have made a small test DAG to test execution_timeout, and it works as
> expected: tasks (Bash scripts that use sleep) get stopped when they reach the
> timeout.
> My colleague has found a reference:
> https://github.com/apache/incubator-airflow/blob/master/airflow/config_templates/default_celery.py
> There we have visibility_timeout=21600
> In the default airflow.cfg it is described as:
> [celery_broker_transport_options]
> # The visibility timeout defines the number of seconds to wait for the worker
> # to acknowledge the task before the message is redelivered to another worker.
> # Make sure to increase the visibility timeout to match the time of the longest
> # ETA you're planning to use. Especially important in case of using Redis or SQS
> visibility_timeout = 21600
> Is the problem that our tasks are somehow not acknowledged in Celery?
> best regards
>  
>  
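The value 21600 quoted above is in seconds, which is exactly 6 hours and so lines up with the cutoff in the log. A minimal sketch of that arithmetic, using two timestamps taken from the log; the variable names are illustrative only, and whether the visibility timeout is really the cause remains the reporter's hypothesis:

```python
from datetime import datetime, timedelta

# Default from the quoted airflow.cfg section (seconds).
visibility_timeout = timedelta(seconds=21600)

# Timestamps copied from the log: the bash command starts, and later the
# same task instance is picked up a second time ("Running on host ...").
command_started = datetime(2018, 3, 7, 11, 55, 31)
second_attempt = datetime(2018, 3, 7, 17, 57, 56)

elapsed = second_attempt - command_started

print(visibility_timeout)  # 6:00:00
print(elapsed)             # 6:02:25
```

The second attempt shows up a little over two minutes after the 6-hour mark, which is consistent with a redelivery triggered by the timeout, though it does not prove it.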





[jira] [Closed] (AIRFLOW-1947) airflow json file created i /tmp get wrong permission when using run_as_user

2020-02-13 Thread Soeren Laursen (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Soeren Laursen closed AIRFLOW-1947.
---
Resolution: Fixed

> airflow json file created i /tmp get wrong permission when using run_as_user
> 
>
> Key: AIRFLOW-1947
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1947
> Project: Apache Airflow
> Issue Type: Bug
> Components: DagRun
> Affects Versions: 1.8.0
> Environment: ubuntu 16.04 LTS
> Reporter: Soeren Laursen
> Priority: Critical
>
> We are using run_as_user on two specific tasks to make sure that the
> resulting files are assigned to the correct user.
> If we run the tasks as the airflow user, they complete as expected.
> *DAG START*
> from airflow import DAG
> from airflow.operators.bash_operator import BashOperator
> from datetime import datetime, timedelta
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2015, 6, 1),
>     'email': ['s...@fcoo.dk'],
>     'email_on_failure': False,
>     'email_on_retry': False,
>     'retries': 1,
>     'retry_delay': timedelta(minutes=5),
>     'queue': 'storage-arch03',
>     'dagrun_timeout': timedelta(minutes=60)
>     # 'pool': 'backfill',
>     # 'priority_weight': 10,
>     # 'end_date': datetime(2016, 1, 1),
> }
> dag = DAG('Archive_Sentinel-1_data_from_FCOO_ftp_server',
>           default_args=default_args, schedule_interval=timedelta(1))
> archivingTodaysData = BashOperator(
>     task_id='Archive_todays_data',
>     bash_command='/home/airflow/airflowScripts/archive-Sentinel-1-data.sh 0 ',
>     dag=dag)
> archivingYesterdaysData = BashOperator(
>     task_id='Archive_yesterdays_data',
>     bash_command='/home/airflow/airflowScripts/archive-Sentinel-1-data.sh 1 ',
>     dag=dag)
> # First archive the newest data, then the data from yesterday.
> archivingYesterdaysData.set_upstream(archivingTodaysData)
> *DAG END*
> When we run the task as a user called prod by using run_as_user, the
> file(s) are generated in /tmp:
> -rw-------  1 airflow airflow 2205 dec 19 11:46 tmpicu87_au
> But the prod user cannot read the file. From the log file we have:
> [2017-12-19 11:46:31,803] {base_task_runner.py:112} INFO - Running: ['bash', 
> '-c', 'sudo -H -u prod airflow run 
> Archive_Sentinel-1_data_from_FCOO_ftp_server Archive_yesterdays_data 
> 2017-12-19T00:00:00 --job_id 1047 --raw -sd 
> DAGS_FOLDER/archive-Sentinel-1-data-from-ftp-server.py --cfg_path 
> /tmp/tmpicu87_au']
> [2017-12-19 11:46:32,463] {base_task_runner.py:95} INFO - Subtask: 
> [2017-12-19 11:46:32,462] {__init__.py:57} INFO - Using executor 
> SequentialExecutor
> [2017-12-19 11:46:32,587] {base_task_runner.py:95} INFO - Subtask: 
> [2017-12-19 11:46:32,587] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.5/lib2to3/Grammar.txt
> [2017-12-19 11:46:32,630] {base_task_runner.py:95} INFO - Subtask: 
> [2017-12-19 11:46:32,630] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
> [2017-12-19 11:46:33,124] {base_task_runner.py:95} INFO - Subtask: 
> /usr/local/lib/python3.5/dist-packages/airflow/www/app.py:23: 
> FlaskWTFDeprecationWarning: "flask_wtf.CsrfProtect" has been renamed to 
> "CSRFProtect" and will be removed in 1.0.
> [2017-12-19 11:46:33,124] {base_task_runner.py:95} INFO - Subtask:   csrf = 
> CsrfProtect()
> [2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/bin/airflow", line 28, in 
> [2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask: 
> args.func(args)
> [2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python3.5/dist-packages/airflow/bin/cli.py", line 329, in run
> [2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask: with 
> open(args.cfg_path, 'r') as conf_file:
> [2017-12-19 11:46:33,344] {base_task_runner.py:95} INFO - Subtask: 
> PermissionError: [Errno 13] Permission denied: '/tmp/tmpicu87_au'
> [2017-12-19 11:46:36,770] {jobs.py:2125} INFO - Task exited with return code 1
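The owner-only permissions in the listing above are what Python's tempfile module produces by default on POSIX systems. A minimal sketch of that behaviour follows; whether Airflow 1.8.0 creates its temporary config file in exactly this way is an assumption, and the suffix and variable names are illustrative only:

```python
import os
import stat
import tempfile

# Files created through the tempfile module are readable and writable
# by the creating user only (mode 0600 on POSIX).
with tempfile.NamedTemporaryFile(mode="w", suffix=".json") as tmp:
    mode = stat.S_IMODE(os.stat(tmp.name).st_mode)
    other_can_read = bool(mode & stat.S_IROTH)

print(oct(mode))
print(other_can_read)
```

A process started with sudo -u for a different user would get the same PermissionError when opening such a file, unless the ownership or mode is changed first.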





[jira] [Created] (AIRFLOW-6799) webgui cannot display all tasks

2020-02-13 Thread Soeren Laursen (Jira)
Soeren Laursen created AIRFLOW-6799:
---

 Summary: webgui cannot display all tasks
 Key: AIRFLOW-6799
 URL: https://issues.apache.org/jira/browse/AIRFLOW-6799
 Project: Apache Airflow
  Issue Type: Bug
  Components: webserver
Affects Versions: 1.10.9, 1.10.7
 Environment: linux in a docker container.
Reporter: Soeren Laursen


When we have "too many" tasks, the graph rendering stops with an
"Edge 'undefined' is not in graph" JavaScript error, and no graph is shown
in the web GUI.

Lowering the number of tasks enables the rendering again.

Example code:
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

from datetime import datetime
from datetime import timedelta

DAG_task_concurrency = 30
DAG_max_active_runs = 10

MAIN_DAG_ID = 'BUG_IN_GRAPH_DISPLAY'

default_args = {
    'owner': 'prod',
    'depends_on_past': False,
    'email': ['s...@fcoo.dk'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(seconds=30),
    'queue': 'default'}

BUG_DAG = DAG(MAIN_DAG_ID,
              default_args=default_args,
              catchup=False,
              orientation='LR',
              concurrency=DAG_task_concurrency,
              schedule_interval='@once',
              max_active_runs=DAG_max_active_runs,
              start_date=datetime(2020, 2, 5)
)

# Too many tasks
max_task = 160
# 156 ok

task_range = list(range(0, max_task + 1))

start_task = DummyOperator(task_id='start_task', dag=BUG_DAG)
after_all_complete = DummyOperator(task_id='after_all_complete', dag=BUG_DAG)

for task_step in task_range:
    task1 = DummyOperator(task_id='task_{0}'.format(task_step), dag=BUG_DAG)
    start_task >> task1 >> after_all_complete
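To give a sense of the graph sizes involved, the following sketch counts the nodes and edges of the fan-out/fan-in shape built above, without importing Airflow. The helper name fan_out_fan_in_size is hypothetical. The report only states that 156 works and 160 fails, so the exact threshold between them is not known:

```python
def fan_out_fan_in_size(max_task):
    """Return (nodes, edges) for start -> task_0..task_<max_task> -> end."""
    middle = len(range(0, max_task + 1))  # same range as in the report
    nodes = middle + 2   # plus start_task and after_all_complete
    edges = 2 * middle   # one incoming and one outgoing edge per middle task
    return nodes, edges


for max_task in (156, 160):
    nodes, edges = fan_out_fan_in_size(max_task)
    print(max_task, nodes, edges)
```

With max_task = 156 the graph has 159 nodes and 314 edges; with max_task = 160 it has 163 nodes and 322 edges.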


