[jira] [Commented] (AIRFLOW-265) Custom parameters for DockerOperator

2020-03-31 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071824#comment-17071824
 ] 

Daniel Imberman commented on AIRFLOW-265:
-

[~Becky] I've moved this to a github issue. Please assign it to yourself:
https://github.com/apache/airflow/issues/7921

> Custom parameters for DockerOperator
> 
>
> Key: AIRFLOW-265
> URL: https://issues.apache.org/jira/browse/AIRFLOW-265
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Alexandr Nikitin
>Assignee: Ngwe Becky
>Priority: Major
>  Labels: docker, gsoc, gsoc2020, mentor
>
> Add ability to specify custom parameters to docker cli. E.g. 
> "--volume-driver=""" or --net="bridge" or any other



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-265) Custom parameters for DockerOperator

2020-03-31 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-265.
---
Resolution: Auto Closed

> Custom parameters for DockerOperator
> 
>
> Key: AIRFLOW-265
> URL: https://issues.apache.org/jira/browse/AIRFLOW-265
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Alexandr Nikitin
>Assignee: Ngwe Becky
>Priority: Major
>  Labels: docker, gsoc, gsoc2020, mentor
>
> Add ability to specify custom parameters to docker cli. E.g. 
> "--volume-driver=""" or --net="bridge" or any other



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-254) Webserver should refresh all workers in case of a dag refresh / update

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-254.
---
Resolution: Auto Closed

> Webserver should refresh all workers in case of a dag refresh / update
> --
>
> Key: AIRFLOW-254
> URL: https://issues.apache.org/jira/browse/AIRFLOW-254
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Bolke de Bruin
>Assignee: Chinmay Kousik
>Priority: Major
>
> The webserver only refreshes one process in case a dag refresh is demanded or 
> an update is made to a dag. This is annoying as you might end up with old 
> code in the views or the dreaded "scheduler has put this in the db, but the 
> webserver hasn't got it yet".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-254) Webserver should refresh all workers in case of a dag refresh / update

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071359#comment-17071359
 ] 

Daniel Imberman commented on AIRFLOW-254:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Webserver should refresh all workers in case of a dag refresh / update
> --
>
> Key: AIRFLOW-254
> URL: https://issues.apache.org/jira/browse/AIRFLOW-254
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Bolke de Bruin
>Assignee: Chinmay Kousik
>Priority: Major
>
> The webserver only refreshes one process in case a dag refresh is demanded or 
> an update is made to a dag. This is annoying as you might end up with old 
> code in the views or the dreaded "scheduler has put this in the db, but the 
> webserver hasn't got it yet".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-235) Improve connectors interface

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071358#comment-17071358
 ] 

Daniel Imberman commented on AIRFLOW-235:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Improve connectors interface
> 
>
> Key: AIRFLOW-235
> URL: https://issues.apache.org/jira/browse/AIRFLOW-235
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.7.1.2
>Reporter: Jakob Homan
>Assignee: Chi Su
>Priority: Major
>
> Right now the connections interface has the same fields for all connectors, 
> whether or not they apply.  Per-connector values are stuffed into the extra 
> field, which doesn't have any description or clarification.  Connectors don't 
> have any way of displaying what extra information they require.
> It would be better if connectors could define what fields they specified 
> through the interface (a map of field name to type, description, validator, 
> etc).  The connector web page could then render these and pass them back to 
> the connector when it is instantiated. 
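A rough sketch of the kind of per-connector field specification described above; the {{extra_fields}} attribute and its shape are purely illustrative, not an existing Airflow API:

{code}
# Hypothetical declaration a hook could expose so the connection form
# knows which extra fields to render, how to describe them, and how to
# validate them.
class MyServiceHook(object):
    extra_fields = {
        # field name: (type, description, validator)
        "endpoint": (str, "Base URL of the service API",
                     lambda value: value.startswith("http")),
        "timeout": (int, "Request timeout in seconds",
                    lambda value: value > 0),
    }
{code}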



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-221) Improve task instance timeout logic to work in LocalExecutor

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-221.
---
Resolution: Auto Closed

> Improve task instance timeout logic to work in LocalExecutor
> 
>
> Key: AIRFLOW-221
> URL: https://issues.apache.org/jira/browse/AIRFLOW-221
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Maxime Beauchemin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-235) Improve connectors interface

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-235.
---
Resolution: Auto Closed

> Improve connectors interface
> 
>
> Key: AIRFLOW-235
> URL: https://issues.apache.org/jira/browse/AIRFLOW-235
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: webserver
>Affects Versions: 1.7.1.2
>Reporter: Jakob Homan
>Assignee: Chi Su
>Priority: Major
>
> Right now the connections interface has the same fields for all connectors, 
> whether or not they apply.  Per-connector values are stuffed into the extra 
> field, which doesn't have any description or clarification.  Connectors don't 
> have any way of displaying what extra information they require.
> It would be better if connectors could define what fields they specified 
> through the interface (a map of field name to type, description, validator, 
> etc).  The connector web page could then render these and pass them back to 
> the connector when it is instantiated. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-221) Improve task instance timeout logic to work in LocalExecutor

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071357#comment-17071357
 ] 

Daniel Imberman commented on AIRFLOW-221:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Improve task instance timeout logic to work in LocalExecutor
> 
>
> Key: AIRFLOW-221
> URL: https://issues.apache.org/jira/browse/AIRFLOW-221
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Maxime Beauchemin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-212) Bad plugin code can prevent Airflow from loading - add timeout

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-212.
---
Resolution: Auto Closed

> Bad plugin code can prevent Airflow from loading - add timeout
> --
>
> Key: AIRFLOW-212
> URL: https://issues.apache.org/jira/browse/AIRFLOW-212
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: plugins
>Reporter: Maxime Beauchemin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-204) Tasks from manually triggered dag runs do not show up in Queued status on the UI

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-204.
---
Resolution: Auto Closed

> Tasks from manually triggered dag runs do not show up in Queued status on the 
> UI
> 
>
> Key: AIRFLOW-204
> URL: https://issues.apache.org/jira/browse/AIRFLOW-204
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.7.1.2
>Reporter: Sergei Iakhnin
>Priority: Major
>
> When I manually trigger dag runs I see that the scheduler is trying to add 
> the tasks to the work queue, but these tasks never appear in queued status on 
> the webserver UI, thus there is a lack of visibility into these outstanding 
> tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-204) Tasks from manually triggered dag runs do not show up in Queued status on the UI

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071355#comment-17071355
 ] 

Daniel Imberman commented on AIRFLOW-204:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Tasks from manually triggered dag runs do not show up in Queued status on the 
> UI
> 
>
> Key: AIRFLOW-204
> URL: https://issues.apache.org/jira/browse/AIRFLOW-204
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Affects Versions: 1.7.1.2
>Reporter: Sergei Iakhnin
>Priority: Major
>
> When I manually trigger dag runs I see that the scheduler is trying to add 
> the tasks to the work queue, but these tasks never appear in queued status on 
> the webserver UI, thus there is a lack of visibility into these outstanding 
> tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-212) Bad plugin code can prevent Airflow from loading - add timeout

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071356#comment-17071356
 ] 

Daniel Imberman commented on AIRFLOW-212:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Bad plugin code can prevent Airflow from loading - add timeout
> --
>
> Key: AIRFLOW-212
> URL: https://issues.apache.org/jira/browse/AIRFLOW-212
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: plugins
>Reporter: Maxime Beauchemin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-154) Handle xcom parameters when running a single task

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071354#comment-17071354
 ] 

Daniel Imberman commented on AIRFLOW-154:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Handle xcom parameters when running a single task
> -
>
> Key: AIRFLOW-154
> URL: https://issues.apache.org/jira/browse/AIRFLOW-154
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: xcom
>Reporter: Timo
>Priority: Minor
>
> What would be the correct way to handle xcom parameters when running a single
> task in a DAG? Simplified example:
> task 1:
>   - Push parameter
> task 2:
>   - Pull parameter from task 1
>   - Push parameter
> task 3:
>   - Pull parameter from task 2
> I would like to only run task 2 manually.
>   1. Is it possible to specify any parameters from the commandline that would 
> substitute the xcom pull from task 1?  [Related issue with 
> PR|https://issues.apache.org/jira/browse/AIRFLOW-152]
>   2. What happens with the parameters pushed in task 2? Are they left behind 
> somewhere?
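A short sketch of the xcom flow described above, assuming PythonOperator callables with {{provide_context=True}} (Airflow 1.x style); the callable names are illustrative:

{code}
def task1_callable(**context):
    # task 1 pushes a parameter for downstream tasks
    context["ti"].xcom_push(key="param", value="computed-in-task-1")

def task2_callable(**context):
    # task 2 pulls what task 1 pushed; a manual run of task 2 alone would
    # find nothing here unless the value is supplied some other way
    upstream = context["ti"].xcom_pull(task_ids="task1", key="param")
    context["ti"].xcom_push(key="param", value=(upstream or "") + "-processed")
{code}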



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-146) Clear status button on task instance page

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071353#comment-17071353
 ] 

Daniel Imberman commented on AIRFLOW-146:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Clear status button on task instance page
> -
>
> Key: AIRFLOW-146
> URL: https://issues.apache.org/jira/browse/AIRFLOW-146
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Reporter: Chris Riccomini
>Priority: Major
>
> It would be nice to be able to clear the status (including past/future, 
> upstream/downstream) of a task from the task instance page, similar to the 
> way you can when you click on a task in the tree view.
> The use case is that we had an outage on one of our systems, so several 
> hundred tasks failed. We wanted to clear all of their statuses, including 
> their downstream tasks. We had no way to do this without doing one at a time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-154) Handle xcom parameters when running a single task

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-154.
---
Resolution: Auto Closed

> Handle xcom parameters when running a single task
> -
>
> Key: AIRFLOW-154
> URL: https://issues.apache.org/jira/browse/AIRFLOW-154
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: xcom
>Reporter: Timo
>Priority: Minor
>
> What would be the correct way to handle xcom parameters when running a single
> task in a DAG? Simplified example:
> task 1:
>   - Push parameter
> task 2:
>   - Pull parameter from task 1
>   - Push parameter
> task 3:
>   - Pull parameter from task 2
> I would like to only run task 2 manually.
>   1. Is it possible to specify any parameters from the commandline that would 
> substitute the xcom pull from task 1?  [Related issue with 
> PR|https://issues.apache.org/jira/browse/AIRFLOW-152]
>   2. What happens with the parameters pushed in task 2? Are they left behind 
> somewhere?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-146) Clear status button on task instance page

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-146.
---
Resolution: Auto Closed

> Clear status button on task instance page
> -
>
> Key: AIRFLOW-146
> URL: https://issues.apache.org/jira/browse/AIRFLOW-146
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: ui
>Reporter: Chris Riccomini
>Priority: Major
>
> It would be nice to be able to clear the status (including past/future, 
> upstream/downstream) of a task from the task instance page, similar to the 
> way you can when you click on a task in the tree view.
> The use case is that we had an outage on one of our systems, so several 
> hundred tasks failed. We wanted to clear all of their statuses, including 
> their downstream tasks. We had no way to do this without doing one at a time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-98) Using Flask extensions from a plugin

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-98?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-98.
--
Resolution: Auto Closed

> Using Flask extensions from a plugin
> 
>
> Key: AIRFLOW-98
> URL: https://issues.apache.org/jira/browse/AIRFLOW-98
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Timo
>Priority: Major
>
> I am creating a plugin which should be able to use some Flask extensions. Is 
> there any way to do this?
> Basically I need to import the airflow.www.app.app object in the plugin to 
> initialise the extension with.
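One possible workaround sketch (not an official Airflow API): initialise the extension lazily through a Flask Blueprint registered via the plugin's {{flask_blueprints}}, so the real app object is received at registration time instead of importing {{airflow.www.app.app}} directly; Flask-Caching is just an example extension here:

{code}
from flask import Blueprint
from flask_caching import Cache  # any Flask extension with init_app()

from airflow.plugins_manager import AirflowPlugin

cache = Cache()
bp = Blueprint("my_plugin", __name__)

# record_once fires when the webserver registers the blueprint,
# handing us the actual Flask app to initialise the extension with.
bp.record_once(lambda state: cache.init_app(state.app))

class MyPlugin(AirflowPlugin):
    name = "my_plugin"
    flask_blueprints = [bp]
{code}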



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-133) SLAs don't seem to work with schedule_interval=None

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-133.
---
Resolution: Auto Closed

> SLAs don't seem to work with schedule_interval=None
> ---
>
> Key: AIRFLOW-133
> URL: https://issues.apache.org/jira/browse/AIRFLOW-133
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.7.0
>Reporter: Eric Johnson
>Assignee: Siddharth Anand
>Priority: Minor
>
> The issue is pretty simple. It looks like if you have a DAG with
> {{schedule_interval=None}}, you can't use an SLA with it. I'm running Airflow
> 1.7.0 and it runs into trouble in jobs.py, around line 255, at this line:
> {{dttm = dag.following_schedule(dttm)}}
> I assume because there is no schedule to follow.
> I've provided a simple example to illustrate the issue. It's a task that will
> take 2 minutes but the SLA is set at 1 minute. The SLA is not enforced.
> {code}
> from builtins import range
> from airflow.operators import BashOperator, DummyOperator, TimeSensor
> from airflow.models import DAG
> from datetime import datetime, timedelta, time
> one_day_ago = datetime.combine(datetime.today() - timedelta(1),
>                                datetime.min.time())
> args = {
>     'owner': 'ejohnson',
>     'start_date': one_day_ago,
>     'email': "ejohn...@sample.com",
>     'email_on_failure': True
> }
> # This sets up the daily build jobs
> build_dir = DAG(
>     dag_id='DailyBuild',
>     default_args=args,
>     schedule_interval=None)
> build = BashOperator(
>     task_id='build',
>     bash_command='sleep 2m',
>     sla=timedelta(minutes=1),
>     dag=build_dir)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-133) SLAs don't seem to work with schedule_interval=None

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071352#comment-17071352
 ] 

Daniel Imberman commented on AIRFLOW-133:
-

This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> SLAs don't seem to work with schedule_interval=None
> ---
>
> Key: AIRFLOW-133
> URL: https://issues.apache.org/jira/browse/AIRFLOW-133
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.7.0
>Reporter: Eric Johnson
>Assignee: Siddharth Anand
>Priority: Minor
>
> The issue is pretty simple. It looks like if you have a DAG with
> {{schedule_interval=None}}, you can't use an SLA with it. I'm running Airflow
> 1.7.0 and it runs into trouble in jobs.py, around line 255, at this line:
> {{dttm = dag.following_schedule(dttm)}}
> I assume because there is no schedule to follow.
> I've provided a simple example to illustrate the issue. It's a task that will
> take 2 minutes but the SLA is set at 1 minute. The SLA is not enforced.
> {code}
> from builtins import range
> from airflow.operators import BashOperator, DummyOperator, TimeSensor
> from airflow.models import DAG
> from datetime import datetime, timedelta, time
> one_day_ago = datetime.combine(datetime.today() - timedelta(1),
>                                datetime.min.time())
> args = {
>     'owner': 'ejohnson',
>     'start_date': one_day_ago,
>     'email': "ejohn...@sample.com",
>     'email_on_failure': True
> }
> # This sets up the daily build jobs
> build_dir = DAG(
>     dag_id='DailyBuild',
>     default_args=args,
>     schedule_interval=None)
> build = BashOperator(
>     task_id='build',
>     bash_command='sleep 2m',
>     sla=timedelta(minutes=1),
>     dag=build_dir)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-98) Using Flask extensions from a plugin

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071351#comment-17071351
 ] 

Daniel Imberman commented on AIRFLOW-98:


This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Using Flask extensions from a plugin
> 
>
> Key: AIRFLOW-98
> URL: https://issues.apache.org/jira/browse/AIRFLOW-98
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Timo
>Priority: Major
>
> I am creating a plugin which should be able to use some Flask extensions. Is 
> there any way to do this?
> Basically I need to import the airflow.www.app.app object in the plugin to 
> initialise the extension with.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-60) Implement a GenericBulkTransfer operator

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-60?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071348#comment-17071348
 ] 

Daniel Imberman commented on AIRFLOW-60:


This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Implement a GenericBulkTransfer operator
> 
>
> Key: AIRFLOW-60
> URL: https://issues.apache.org/jira/browse/AIRFLOW-60
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Reporter: Bence Nagy
>Priority: Minor
>
> This would be almost the same as GenericTransfer, except it would use 
> {{bulk_dump}} and {{bulk_load}} instead of {{get_records}} and 
> {{insert_rows}} to be able to transfer data larger than what fits into memory.
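A rough sketch of what such an operator's {{execute()}} could look like, assuming source and destination hooks whose {{bulk_dump}} / {{bulk_load}} implementations exist (e.g. MySqlHook); the operator class itself is hypothetical:

{code}
import tempfile

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults

class GenericBulkTransfer(BaseOperator):
    @apply_defaults
    def __init__(self, source_hook, dest_hook, source_table, dest_table,
                 *args, **kwargs):
        super(GenericBulkTransfer, self).__init__(*args, **kwargs)
        self.source_hook = source_hook
        self.dest_hook = dest_hook
        self.source_table = source_table
        self.dest_table = dest_table

    def execute(self, context):
        # stream through a temp file instead of holding all rows in memory
        with tempfile.NamedTemporaryFile() as tmp:
            self.source_hook.bulk_dump(self.source_table, tmp.name)
            self.dest_hook.bulk_load(self.dest_table, tmp.name)
{code}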



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-70) Use dag_run_id in TaskInstances for lineage

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-70?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071349#comment-17071349
 ] 

Daniel Imberman commented on AIRFLOW-70:


This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Use dag_run_id in TaskInstances for lineage
> ---
>
> Key: AIRFLOW-70
> URL: https://issues.apache.org/jira/browse/AIRFLOW-70
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 1.7.1
>Reporter: Bolke de Bruin
>Assignee: zgl
>Priority: Major
> Fix For: 1.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-70) Use dag_run_id in TaskInstances for lineage

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-70?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-70.
--
Resolution: Auto Closed

> Use dag_run_id in TaskInstances for lineage
> ---
>
> Key: AIRFLOW-70
> URL: https://issues.apache.org/jira/browse/AIRFLOW-70
> Project: Apache Airflow
>  Issue Type: Sub-task
>  Components: scheduler
>Affects Versions: 1.7.1
>Reporter: Bolke de Bruin
>Assignee: zgl
>Priority: Major
> Fix For: 1.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-72) Implement proper capacity scheduler

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071350#comment-17071350
 ] 

Daniel Imberman commented on AIRFLOW-72:


This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Implement proper capacity scheduler
> ---
>
> Key: AIRFLOW-72
> URL: https://issues.apache.org/jira/browse/AIRFLOW-72
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Bolke de Bruin
>Priority: Major
>  Labels: pool, queue, scheduler
> Fix For: 2.0.0
>
>
> The scheduler is supposed to maintain queues and pools according to a
> "capacity" model. However, it is currently not properly implemented, and
> therefore issues such as being able to oversubscribe pools, race
> conditions for queuing/dequeuing, and probably others exist.
> This Jira Epic is to track all issues related to pooling/queuing and the
> (tbd) roadmap to a proper capacity scheduler.
> Why queuing / scheduling is broken:
> Locking is not properly implemented and cannot be, as the check for slot
> availability is spread throughout the scheduler, taskinstance and executor.
> This makes obtaining a slot non-atomic and results in oversubscribing. In
> addition it leads to race conditions, such as two tasks being picked from
> the queue at the same time when the scheduler determines that a queued task
> still needs to be sent to the executor, while in an earlier run this already
> happened.
> In order to fix this, Pool handling needs to be centralized (code-wise) and
> work with a mutex (with_for_update()) on the database records. The
> scheduler/taskinstance can then do something like:
> slot = Pool.obtain_slot(pool_id)
> Pool.release_slot(slot)
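A standalone SQLAlchemy sketch of the mutex-style slot accounting proposed above ({{SELECT ... FOR UPDATE}} via {{with_for_update()}}); the model, column names, and helper functions are illustrative, not Airflow's actual Pool implementation:

{code}
from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Pool(Base):
    __tablename__ = "slot_pool"
    id = Column(Integer, primary_key=True)
    pool = Column(String(50), unique=True)
    slots = Column(Integer, default=0)
    used_slots = Column(Integer, default=0)  # illustrative bookkeeping column

def obtain_slot(session, pool_id):
    # lock the row so two schedulers cannot grab the last slot at once
    pool = (session.query(Pool)
                   .filter(Pool.pool == pool_id)
                   .with_for_update()
                   .one())
    if pool.used_slots >= pool.slots:
        session.rollback()
        return False
    pool.used_slots += 1
    session.commit()  # committing releases the row lock
    return True

def release_slot(session, pool_id):
    pool = (session.query(Pool)
                   .filter(Pool.pool == pool_id)
                   .with_for_update()
                   .one())
    pool.used_slots -= 1
    session.commit()
{code}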



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-72) Implement proper capacity scheduler

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-72.
--
Resolution: Auto Closed

> Implement proper capacity scheduler
> ---
>
> Key: AIRFLOW-72
> URL: https://issues.apache.org/jira/browse/AIRFLOW-72
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Bolke de Bruin
>Priority: Major
>  Labels: pool, queue, scheduler
> Fix For: 2.0.0
>
>
> The scheduler is supposed to maintain queues and pools according to a
> "capacity" model. However, it is currently not properly implemented, and
> therefore issues such as being able to oversubscribe pools, race
> conditions for queuing/dequeuing, and probably others exist.
> This Jira Epic is to track all issues related to pooling/queuing and the
> (tbd) roadmap to a proper capacity scheduler.
> Why queuing / scheduling is broken:
> Locking is not properly implemented and cannot be, as the check for slot
> availability is spread throughout the scheduler, taskinstance and executor.
> This makes obtaining a slot non-atomic and results in oversubscribing. In
> addition it leads to race conditions, such as two tasks being picked from
> the queue at the same time when the scheduler determines that a queued task
> still needs to be sent to the executor, while in an earlier run this already
> happened.
> In order to fix this, Pool handling needs to be centralized (code-wise) and
> work with a mutex (with_for_update()) on the database records. The
> scheduler/taskinstance can then do something like:
> slot = Pool.obtain_slot(pool_id)
> Pool.release_slot(slot)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-60) Implement a GenericBulkTransfer operator

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-60?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-60.
--
Resolution: Auto Closed

> Implement a GenericBulkTransfer operator
> 
>
> Key: AIRFLOW-60
> URL: https://issues.apache.org/jira/browse/AIRFLOW-60
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: operators
>Reporter: Bence Nagy
>Priority: Minor
>
> This would be almost the same as GenericTransfer, except it would use 
> {{bulk_dump}} and {{bulk_load}} instead of {{get_records}} and 
> {{insert_rows}} to be able to transfer data larger than what fits into memory.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-30) Make preoperators part of the same transaction as the actual operation

2020-03-30 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071346#comment-17071346
 ] 

Daniel Imberman commented on AIRFLOW-30:


This issue has been closed.

If you still feel this ticket is relevant, please submit a github issue here:
https://github.com/apache/airflow/issues/new/choose

> Make preoperators part of the same transaction as the actual operation
> --
>
> Key: AIRFLOW-30
> URL: https://issues.apache.org/jira/browse/AIRFLOW-30
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Bence Nagy
>Assignee: Kevin Mandich
>Priority: Major
>
> All my use cases would work better if each operator would execute everything 
> in one transaction. Two examples:
> - I want to {{GenericTransfer}} a set of rows from one DB to another, and I 
> have to create the table first in the destination DB. I feel like it'd be a
> lot cleaner if I didn't have empty tables lying around if the insertion
> fails for some reason later on.
> - I want to {{GenericTransfer}} all rows from an entire table periodically to
> sync it from one DB to another. To do this correctly I want to clear the
> destination table first to make sure I end up with no duplicate rows, so I'd
> have a {{DELETE * FROM dst_table}} preoperator. If the insertions fail
> afterwards, I'd end up with no data (it would be better in most cases to fall
> back to the old data), and even if everything is working correctly, I'll have
> an empty table while the insertions are still executing.
> To fix this, the relevant {{DbApiHook}} methods could support a new kwarg to 
> set whether it should commit at the end.
> Thoughts?
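One way to get the single-transaction behaviour today, shown as a sketch: drive the destination hook's raw DB-API connection directly, so the preoperator and the load commit (or roll back) together. The function and its arguments are illustrative; {{get_records()}} and {{get_conn()}} are existing {{DbApiHook}} methods:

{code}
def transfer_in_one_transaction(src_hook, dst_hook,
                                select_sql, preoperator_sql, insert_sql):
    rows = src_hook.get_records(select_sql)   # e.g. SELECT ... FROM src_table
    conn = dst_hook.get_conn()                # raw DB-API connection
    try:
        cur = conn.cursor()
        cur.execute(preoperator_sql)          # e.g. DELETE FROM dst_table
        cur.executemany(insert_sql, rows)     # parameterised INSERT statement
        conn.commit()                         # preoperator + load commit together
    except Exception:
        conn.rollback()                       # nothing partial is left behind
        raise
    finally:
        conn.close()
{code}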



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-30) Make preoperators part of the same transaction as the actual operation

2020-03-30 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-30?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-30.
--
Resolution: Auto Closed

> Make preoperators part of the same transaction as the actual operation
> --
>
> Key: AIRFLOW-30
> URL: https://issues.apache.org/jira/browse/AIRFLOW-30
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: operators
>Reporter: Bence Nagy
>Assignee: Kevin Mandich
>Priority: Major
>
> All my use cases would work better if each operator would execute everything 
> in one transaction. Two examples:
> - I want to {{GenericTransfer}} a set of rows from one DB to another, and I 
> have to create the table first in the destination DB. I feel like it'd be a
> lot cleaner if I didn't have empty tables lying around if the insertion
> fails for some reason later on.
> - I want to {{GenericTransfer}} all rows from an entire table periodically to
> sync it from one DB to another. To do this correctly I want to clear the
> destination table first to make sure I end up with no duplicate rows, so I'd
> have a {{DELETE * FROM dst_table}} preoperator. If the insertions fail
> afterwards, I'd end up with no data (it would be better in most cases to fall
> back to the old data), and even if everything is working correctly, I'll have
> an empty table while the insertions are still executing.
> To fix this, the relevant {{DbApiHook}} methods could support a new kwarg to 
> set whether it should commit at the end.
> Thoughts?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-905) Support querying dag_run state by run_id

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-905.
---
Resolution: Auto Closed

> Support querying dag_run state by run_id
> 
>
> Key: AIRFLOW-905
> URL: https://issues.apache.org/jira/browse/AIRFLOW-905
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Reporter: Shashi Kant Sharms
>Priority: Minor
>
> Support querying dag_run and task_instances by run_id. Currently run_id is
> supported in the trigger dag command but it is not used in any querying command.
> Existing commands:
> # airflow dag_state [-h] [-sd SUBDIR] dag_id execution_date
> # airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date
> New commands:
> # airflow dag_state [-h] [-sd SUBDIR] dag_id execution_date|run_id
> # airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date|run_id
> For dag state we can pass the run id from the CLI and pass the same to the
> find function for DAGs.
> For task_state we need to modify the TaskInstance model to include run id,
> and then we can find based upon run id as well. As part of this we will make
> execution date an optional attribute of the task instance find function.
> As part of this change I'll also include run id in the "airflow run" and
> "airflow render" commands. Details will follow around this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-905) Support querying dag_run state by run_id

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070622#comment-17070622
 ] 

Daniel Imberman commented on AIRFLOW-905:
-

This issue has been moved to https://github.com/apache/airflow/issues/8001

> Support querying dag_run state by run_id
> 
>
> Key: AIRFLOW-905
> URL: https://issues.apache.org/jira/browse/AIRFLOW-905
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Reporter: Shashi Kant Sharms
>Priority: Minor
>
> Support querying dag_run and task_instances by run_id. Currently run_id is
> supported in the trigger dag command but it is not used in any querying command.
> Existing commands:
> # airflow dag_state [-h] [-sd SUBDIR] dag_id execution_date
> # airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date
> New commands:
> # airflow dag_state [-h] [-sd SUBDIR] dag_id execution_date|run_id
> # airflow task_state [-h] [-sd SUBDIR] dag_id task_id execution_date|run_id
> For dag state we can pass the run id from the CLI and pass the same to the
> find function for DAGs.
> For task_state we need to modify the TaskInstance model to include run id,
> and then we can find based upon run id as well. As part of this we will make
> execution date an optional attribute of the task instance find function.
> As part of this change I'll also include run id in the "airflow run" and
> "airflow render" commands. Details will follow around this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-849) Show whether spark job is "running" or "waiting"

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-849.
---
Resolution: Auto Closed

> Show whether spark job is "running" or "waiting"
> 
>
> Key: AIRFLOW-849
> URL: https://issues.apache.org/jira/browse/AIRFLOW-849
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Daniel Blazevski
>Priority: Minor
>
> As of now, Airflow shows that a task is "running" in Airflow when it could be 
> listed as "WAITING" in the Spark UI (and can be listed as "WAITING" for hours 
> when the Airflow UI might indicate it is running if I forget to check the 
> Spark UI). 
> Would be awesome once ticket 802 for spark integration is complete to add 
> this enhancement of more granular detail about the task. 
> Note (Daniel 3/29/20): We should add an "external" field to task state to 
> store task-specific data like spark cluster state



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-849) Show whether spark job is "running" or "waiting"

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070620#comment-17070620
 ] 

Daniel Imberman commented on AIRFLOW-849:
-

This issue has been moved to https://github.com/apache/airflow/issues/8000

> Show whether spark job is "running" or "waiting"
> 
>
> Key: AIRFLOW-849
> URL: https://issues.apache.org/jira/browse/AIRFLOW-849
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Daniel Blazevski
>Priority: Minor
>
> As of now, Airflow shows that a task is "running" in Airflow when it could be 
> listed as "WAITING" in the Spark UI (and can be listed as "WAITING" for hours 
> when the Airflow UI might indicate it is running if I forget to check the 
> Spark UI). 
> Would be awesome once ticket 802 for spark integration is complete to add 
> this enhancement of more granular detail about the task. 
> Note (Daniel 3/29/20): We should add an "external" field to task state to 
> store task-specific data like spark cluster state



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-849) Show whether spark job is "running" or "waiting"

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman updated AIRFLOW-849:

Description: 
As of now, Airflow shows that a task is "running" in Airflow when it could be 
listed as "WAITING" in the Spark UI (and can be listed as "WAITING" for hours 
when the Airflow UI might indicate it is running if I forget to check the Spark 
UI). 

Would be awesome once ticket 802 for spark integration is complete to add this 
enhancement of more granular detail about the task. 

Note (Daniel 3/29/20): We should add an "external" field to task state to store 
task-specific data like spark cluster state

  was:
As of now, Airflow shows that a task is "running" in Airflow when it could be 
listed as "WAITING" in the Spark UI (and can be listed as "WAITING" for hours 
when the Airflow UI might indicate it is running if I forget to check the Spark 
UI). 

Would be awesome once ticket 802 for spark integration is complete to add this 
enhancement of more granular detail about the task. 


> Show whether spark job is "running" or "waiting"
> 
>
> Key: AIRFLOW-849
> URL: https://issues.apache.org/jira/browse/AIRFLOW-849
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Daniel Blazevski
>Priority: Minor
>
> As of now, Airflow shows that a task is "running" in Airflow when it could be 
> listed as "WAITING" in the Spark UI (and can be listed as "WAITING" for hours 
> when the Airflow UI might indicate it is running if I forget to check the 
> Spark UI). 
> Would be awesome once ticket 802 for spark integration is complete to add 
> this enhancement of more granular detail about the task. 
> Note (Daniel 3/29/20): We should add an "external" field to task state to 
> store task-specific data like spark cluster state



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-901) airflow interface shows jobs that no longer exist

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070618#comment-17070618
 ] 

Daniel Imberman commented on AIRFLOW-901:
-

[~antoine.grouazel] do you know if this issue is still present?

> airflow interface shows jobs that no longer exist
> -
>
> Key: AIRFLOW-901
> URL: https://issues.apache.org/jira/browse/AIRFLOW-901
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: api
>Affects Versions: 1.8.0
> Environment: ubuntu 12.04
>Reporter: Antoine Grouazel
>Priority: Major
>
> The web interface with version 1.8 seems to show no longer existing jobs with
> status 'running'.
> When I look for the given PID I get a "No such process" message.
> Possible origin: start/stop of workers with cold shutdown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-902) display the missing conditions for running a task

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-902.
---
Resolution: Auto Closed

> display the missing conditions for running a task
> -
>
> Key: AIRFLOW-902
> URL: https://issues.apache.org/jira/browse/AIRFLOW-902
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: api
>Affects Versions: 2.0.0
>Reporter: Antoine Grouazel
>Priority: Minor
>
> One piece of information that is missing in the interface is the reason why a
> queued task is not being executed by the workers.
> There are many possible reasons:
> -dag concurrency
> -workers down
> -process concurrency
> -status of the parent tasks
> It could be nice to have this information on the home page of the web 
> interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-902) display the missing conditions for running a task

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070619#comment-17070619
 ] 

Daniel Imberman commented on AIRFLOW-902:
-

This issue has been moved to https://github.com/apache/airflow/issues/7998

> display the missing conditions for running a task
> -
>
> Key: AIRFLOW-902
> URL: https://issues.apache.org/jira/browse/AIRFLOW-902
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: api
>Affects Versions: 2.0.0
>Reporter: Antoine Grouazel
>Priority: Minor
>
> One piece of information that is missing in the interface is the reason why a
> queued task is not being executed by the workers.
> There are many possible reasons:
> -dag concurrency
> -workers down
> -process concurrency
> -status of the parent tasks
> It could be nice to have this information on the home page of the web 
> interface.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-890) Having a REST layer for all command line interface of Airflow

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070617#comment-17070617
 ] 

Daniel Imberman commented on AIRFLOW-890:
-

This issue has been moved to https://github.com/apache/airflow/issues/7997

> Having a REST layer for all command line interface of Airflow
> -
>
> Key: AIRFLOW-890
> URL: https://issues.apache.org/jira/browse/AIRFLOW-890
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: cli
>Reporter: Amit Ghosh
>Priority: Major
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We are planning to use Airflow and were wondering whether we can have a REST
> interface for the whole command line interface. This would be a cool feature
> to have. If you all wish, then I can take up that task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-890) Having a REST layer for all command line interface of Airflow

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-890.
---
Resolution: Auto Closed

> Having a REST layer for all command line interface of Airflow
> -
>
> Key: AIRFLOW-890
> URL: https://issues.apache.org/jira/browse/AIRFLOW-890
> Project: Apache Airflow
>  Issue Type: Wish
>  Components: cli
>Reporter: Amit Ghosh
>Priority: Major
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We are planning to use Airflow and were wondering whether we can have a REST
> interface for the whole command line interface. This would be a cool feature
> to have. If you all wish, then I can take up that task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070616#comment-17070616
 ] 

Daniel Imberman commented on AIRFLOW-883:
-

This issue has been moved to https://github.com/apache/airflow/issues/7996

> Assigning operator to DAG via bitwise composition does not pickup default args
> --
>
> Key: AIRFLOW-883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-883
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.3
>Reporter: Daniel Huang
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>
> This is only the case when the operator does not specify {{dag=dag}} and is 
> not initialized within a DAG's context manager (due to 
> https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)
> Example:
> {code}
> default_args = {
>     'owner': 'airflow',
>     'start_date': datetime(2017, 2, 1)
> }
> dag = DAG('my_dag', default_args=default_args)
> dummy = DummyOperator(task_id='dummy')
> dag >> dummy
> {code}
> This will raise a {{Task is missing the start_date parameter}}. I _think_ 
> this should probably be allowed because I assume the purpose of supporting 
> {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe to fix this, on assignment, we would need to go back and go through 
> dag.default_args to see if any of those attrs weren't explicitly set on 
> task...not the cleanest. 
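Until assignment-time propagation of {{default_args}} exists, two known workarounds, sketched against the example above (same {{dag}} and {{default_args}}):

{code}
from airflow.operators.dummy_operator import DummyOperator

# 1) pass the DAG explicitly, so apply_defaults sees it at construction time
dummy = DummyOperator(task_id='dummy', dag=dag)

# 2) or build the operator inside the DAG's context manager
with dag:
    dummy2 = DummyOperator(task_id='dummy2')
{code}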



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-883.
---
Resolution: Auto Closed

> Assigning operator to DAG via bitwise composition does not pickup default args
> --
>
> Key: AIRFLOW-883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-883
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.3
>Reporter: Daniel Huang
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>
> This is only the case when the operator does not specify {{dag=dag}} and is 
> not initialized within a DAG's context manager (due to 
> https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)
> Example:
> {code}
> default_args = {
>     'owner': 'airflow',
>     'start_date': datetime(2017, 2, 1)
> }
> dag = DAG('my_dag', default_args=default_args)
> dummy = DummyOperator(task_id='dummy')
> dag >> dummy
> {code}
> This will raise a {{Task is missing the start_date parameter}}. I _think_ 
> this should probably be allowed because I assume the purpose of supporting 
> {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe to fix this, on assignment, we would need to go back and go through 
> dag.default_args to see if any of those attrs weren't explicitly set on 
> task...not the cleanest. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-883) Assigning operator to DAG via bitwise composition does not pickup default args

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman updated AIRFLOW-883:

Affects Version/s: 1.10.3

> Assigning operator to DAG via bitwise composition does not pickup default args
> --
>
> Key: AIRFLOW-883
> URL: https://issues.apache.org/jira/browse/AIRFLOW-883
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Affects Versions: 1.10.3
>Reporter: Daniel Huang
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>
> This is only the case when the operator does not specify {{dag=dag}} and is 
> not initialized within a DAG's context manager (due to 
> https://github.com/apache/incubator-airflow/blob/fb0c5775cda4f84c07d8d5c0e6277fc387c172e6/airflow/utils/decorators.py#L50)
> Example:
> {code}
> default_args = {
>     'owner': 'airflow',
>     'start_date': datetime(2017, 2, 1)
> }
> dag = DAG('my_dag', default_args=default_args)
> dummy = DummyOperator(task_id='dummy')
> dag >> dummy
> {code}
> This will raise a {{Task is missing the start_date parameter}}. I _think_ 
> this should probably be allowed because I assume the purpose of supporting 
> {{dag >> op}} was to allow delayed assignment of an operator to a DAG. 
> I believe to fix this, on assignment, we would need to go back and go through 
> dag.default_args to see if any of those attrs weren't explicitly set on 
> task...not the cleanest. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-879) apply_defaults ignored with BaseOperator._set_relatives dag assignment

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-879.
---
Resolution: Auto Closed

> apply_defaults ignored with BaseOperator._set_relatives dag assignment
> --
>
> Key: AIRFLOW-879
> URL: https://issues.apache.org/jira/browse/AIRFLOW-879
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Reporter: George Leslie-Waksman
>Priority: Major
>
> When a task is instantiated, the apply_defaults decorator retrieves
> default_args and params from the task's dag's default_args and params if a
> dag is specified.
> During set_upstream/set_downstream if task A has a dag assigned but task B 
> does not, task B will be assigned the dag from task A.
> The set_upstream/set_downstream implicit dag assignment occurs after 
> apply_defaults has been processed so the task will not receive the dag's 
> default args.
> {code:title=bad_arg_dag.py}
> import datetime
> import airflow.models
> from airflow.operators.dummy_operator import DummyOperator
>
> DAG = airflow.models.DAG(
>     dag_id='test_dag',
>     schedule_interval=None,
>     start_date=datetime.datetime(2017, 2, 14),
>     default_args={'owner': 'airflow', 'queue': 'some_queue'},
> )
> TASK1 = DummyOperator(
>     task_id='task1',
>     dag=DAG,
> )
> TASK2 = DummyOperator(
>     task_id='task2',
> )
> TASK2.set_upstream(TASK1)
> {code}
> In this case, both TASK1 and TASK2 will be assigned to DAG and TASK1 will 
> receive the dag default queue of 'some_queue' but TASK2 will receive the 
> airflow configuration default queue of 'default'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-879) apply_defaults ignored with BaseOperator._set_relatives dag assignment

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070614#comment-17070614
 ] 

Daniel Imberman commented on AIRFLOW-879:
-

This issue has been moved to https://github.com/apache/airflow/issues/7995

> apply_defaults ignored with BaseOperator._set_relatives dag assignment
> --
>
> Key: AIRFLOW-879
> URL: https://issues.apache.org/jira/browse/AIRFLOW-879
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Reporter: George Leslie-Waksman
>Priority: Major
>
> When a task is instantiated, the apply_defaults decorator retrieves
> default_args and params from the task's dag's default_args and params if a
> dag is specified.
> During set_upstream/set_downstream if task A has a dag assigned but task B 
> does not, task B will be assigned the dag from task A.
> The set_upstream/set_downstream implicit dag assignment occurs after 
> apply_defaults has been processed so the task will not receive the dag's 
> default args.
> {code:title=bad_arg_dag.py}
> import datetime
> import airflow.models
> from airflow.operators.dummy_operator import DummyOperator
>
> DAG = airflow.models.DAG(
>     dag_id='test_dag',
>     schedule_interval=None,
>     start_date=datetime.datetime(2017, 2, 14),
>     default_args={'owner': 'airflow', 'queue': 'some_queue'},
> )
> TASK1 = DummyOperator(
>     task_id='task1',
>     dag=DAG,
> )
> TASK2 = DummyOperator(
>     task_id='task2',
> )
> TASK2.set_upstream(TASK1)
> {code}
> In this case, both TASK1 and TASK2 will be assigned to DAG and TASK1 will 
> receive the dag default queue of 'some_queue' but TASK2 will receive the 
> airflow configuration default queue of 'default'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-878) FileNotFoundError: 'gunicorn' after initial setup

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-878.
---
Resolution: Auto Closed

> FileNotFoundError: 'gunicorn' after initial setup
> -
>
> Key: AIRFLOW-878
> URL: https://issues.apache.org/jira/browse/AIRFLOW-878
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: webserver
>Reporter: Manuel Barkhau
>Priority: Major
>
> I get the following error after installing airflow and doing {{airflow init}}
> {code}
> $ venv/bin/airflow webserver 
> [2017-02-15 12:24:52,813] {__init__.py:56} INFO - Using executor 
> SequentialExecutor
> [2017-02-15 12:24:52,886] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.5/lib2to3/Grammar.txt
> [2017-02-15 12:24:52,904] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python3.5/lib2to3/PatternGrammar.txt
>      _
>  |__( )_  __/__  /  __
>   /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
> ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
>  _/_/  |_/_/  /_//_//_/  \//|__/
>  
> /home/mbarkhau/testproject/venv/lib/python3.5/site-packages/flask/exthook.py:71:
>  ExtDeprecationWarning: Importing flask.ext.cache is deprecated, use
> flask_cache instead.
>   .format(x=modname), ExtDeprecationWarning
> [2017-02-15 12:24:53,145] [11995] {models.py:167} INFO - Filling up the 
> DagBag from /home/mbarkhau/airflow/dags
> Traceback (most recent call last):
>   File "venv/bin/airflow", line 6, in 
> exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/home/mbarkhau/testproject/venv/src/airflow/airflow/bin/airflow", 
> line 28, in 
> args.func(args)
>   File "/home/mbarkhau/testproject/venv/src/airflow/airflow/bin/cli.py", line 
> 791, in webserver
> gunicorn_master_proc = subprocess.Popen(run_args)
>   File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
> restore_signals, start_new_session)
>   File "/usr/lib/python3.5/subprocess.py", line 1551, in _execute_child
> raise child_exception_type(errno_num, err_msg)
> FileNotFoundError: [Errno 2] No such file or directory: 'gunicorn'
> Running the Gunicorn Server with:
> Workers: 4 sync
> Host: 0.0.0.0:8080
> Timeout: 120
> Logfiles: - -
> =
> {code}
> My setup
> {code}
> $ venv/bin/python --version
> Python 3.5.2
> $ venv/bin/pip freeze | grep airflow
> -e 
> git+ssh://g...@github.com/apache/incubator-airflow.git@debc69e2787542cd56ab28b6c48db01c65ad05c4#egg=airflow
> $ uname -a
> Linux mbarkhau-office 4.4.0-62-generic #83-Ubuntu SMP Wed Jan 18 14:10:15 UTC 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-851) Docs: Info how to cancel a running job

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070613#comment-17070613
 ] 

Daniel Imberman commented on AIRFLOW-851:
-

This issue has been moved to https://github.com/apache/airflow/issues/7994

> Docs: Info how to cancel a running job
> --
>
> Key: AIRFLOW-851
> URL: https://issues.apache.org/jira/browse/AIRFLOW-851
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Andi Pl
>Priority: Major
>
> I am not sure how to kill a running Job.
> We have long-running bash jobs and sometimes find out that there is an issue 
> and we want to cancel them. 
> Right now we have to use ssh and do:
> {code}
> ps -aux  | grep "bash script name"
> kill pid
> {code}
> Maybe there is already a way to do it from the UI, but we could not figure it 
> out. I would like to add documentation on how to do it.
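
Until such documentation exists, a hedged sketch of one common workaround (Airflow 1.10-era imports; dag_id and task_id are placeholders): flip the running task instance's state in the metadata DB, which the worker's LocalTaskJob heartbeat detects as an external change and then terminates the process.

{code}
from airflow import settings
from airflow.models import TaskInstance
from airflow.utils.state import State

session = settings.Session()
running = (
    session.query(TaskInstance)
    .filter(
        TaskInstance.dag_id == 'my_dag',
        TaskInstance.task_id == 'my_task',
        TaskInstance.state == State.RUNNING,
    )
    .all()
)
for ti in running:
    ti.state = State.FAILED  # the heartbeat of the running job notices this change
session.commit()
session.close()
{code}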



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-851) Docs: Info how to cancel a running job

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070611#comment-17070611
 ] 

Daniel Imberman commented on AIRFLOW-851:
-

This issue has been moved to https://github.com/apache/airflow/issues/7992

> Docs: Info how to cancel a running job
> --
>
> Key: AIRFLOW-851
> URL: https://issues.apache.org/jira/browse/AIRFLOW-851
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Andi Pl
>Priority: Major
>
> I am not sure how to kill a running Job.
> We have long-running bash jobs and sometimes find out that there is an issue 
> and we want to cancel them. 
> Right now we have to use ssh and do:
> {code}
> ps -aux  | grep "bash script name"
> kill pid
> {code}
> Maybe there is already a way to do it from the UI, but we could not figure it 
> out. I would like to add documentation on how to do it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-851) Docs: Info how to cancel a running job

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070612#comment-17070612
 ] 

Daniel Imberman commented on AIRFLOW-851:
-

This issue has been moved to https://github.com/apache/airflow/issues/7993

> Docs: Info how to cancel a running job
> --
>
> Key: AIRFLOW-851
> URL: https://issues.apache.org/jira/browse/AIRFLOW-851
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Andi Pl
>Priority: Major
>
> I am not sure how to kill a running Job.
> We have long-running bash jobs and sometimes find out that there is an issue 
> and we want to cancel them. 
> Right now we have to use ssh and do:
> {code}
> ps -aux  | grep "bash script name"
> kill pid
> {code}
> Maybe there is already a way to do it from the UI, but we could not figure it 
> out. I would like to add documentation on how to do it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-851) Docs: Info how to cancel a running job

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-851.
---
Resolution: Auto Closed

> Docs: Info how to cancel a running job
> --
>
> Key: AIRFLOW-851
> URL: https://issues.apache.org/jira/browse/AIRFLOW-851
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Andi Pl
>Priority: Major
>
> I am not sure how to kill a running Job.
> We have long-running bash jobs and sometimes find out that there is an issue 
> and we want to cancel them. 
> Right now we have to use ssh and do:
> {code}
> ps -aux  | grep "bash script name"
> kill pid
> {code}
> Maybe there is already a way to do it from the UI, but we could not figure it 
> out. I would like to add documentation on how to do it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-849) Show whether spark job is "running" or "waiting"

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070610#comment-17070610
 ] 

Daniel Imberman commented on AIRFLOW-849:
-

I'm on the fence about this. It could be nice to have a "preparing" or 
"starting" state.

[~ash] [~potiuk] What do you think?

> Show whether spark job is "running" or "waiting"
> 
>
> Key: AIRFLOW-849
> URL: https://issues.apache.org/jira/browse/AIRFLOW-849
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: ui
>Reporter: Daniel Blazevski
>Priority: Minor
>
> As of now, Airflow shows a task as "running" when it could be 
> listed as "WAITING" in the Spark UI (and it can stay "WAITING" for hours 
> while the Airflow UI indicates it is running, if I forget to check the 
> Spark UI). 
> It would be awesome, once ticket 802 for Spark integration is complete, to add 
> this enhancement for more granular detail about the task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-848) Flower does not cleanup pid file on SIGTERM

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070608#comment-17070608
 ] 

Daniel Imberman commented on AIRFLOW-848:
-

[~turbaszek] did your recent PRs fix this issue?

> Flower does not cleanup pid file on SIGTERM
> ---
>
> Key: AIRFLOW-848
> URL: https://issues.apache.org/jira/browse/AIRFLOW-848
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.7.1.3
>Reporter: Rahul
>Priority: Minor
>
> I am running a monit restart for Celery Flower. However, when I send a 
> SIGTERM signal to Flower and start it again, Flower doesn't start up. This is 
> because of the stale pid file. I think Flower should remove it on SIGTERM/SIGINT.
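
Illustrative only (not Flower's actual code), a sketch of the cleanup behaviour being requested: remove the pid file on SIGTERM/SIGINT so a subsequent start is not blocked by a stale file.

{code}
import atexit
import os
import signal
import sys

PID_FILE = '/var/run/airflow-flower.pid'  # placeholder path

def remove_pid_file():
    if os.path.exists(PID_FILE):
        os.remove(PID_FILE)

def handle_signal(signum, frame):
    sys.exit(0)  # raises SystemExit, which triggers the atexit handler

atexit.register(remove_pid_file)
signal.signal(signal.SIGTERM, handle_signal)
signal.signal(signal.SIGINT, handle_signal)
{code}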



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-804) Airflow installs latest Celery which does not support `sqla` transport

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-804.
---
Resolution: Feedback Received

> Airflow installs latest Celery which does not support `sqla` transport
> --
>
> Key: AIRFLOW-804
> URL: https://issues.apache.org/jira/browse/AIRFLOW-804
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors
>Reporter: Mohamed Zenadi
>Priority: Minor
>
> Kombu dropped support for `sqla` transport in its version 4.x. With the 
> default configuration this will raise an exception:
> {code}
> $ airflow flower
> [2017-01-25 17:20:05,064] {__init__.py:57} INFO - Using executor 
> SequentialExecutor
> [I 170125 17:20:07 command:136] Visit me at http://0.0.0.0:
> Unknown Celery version
> Traceback (most recent call last):
>   File 
> "/home/10025051/.linuxbrew/opt/python3/lib/python3.6/site-packages/kombu/transport/__init__.py",
>  line 53, in resolve_transport
> transport = TRANSPORT_ALIASES[transport]
> KeyError: 'sqla'
> {code}
> A fix would be to change the default configuration to not use `sqla`, to 
> avoid any confusing errors for new users.
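
For reference, a hedged airflow.cfg sketch of the kind of default change suggested above: point the Celery broker at a transport that Kombu 4.x still supports (the URLs below are placeholders, not recommended production values).

{code}
[celery]
# instead of the old sqla+mysql://... default broker
broker_url = amqp://guest:guest@localhost:5672//
# or, for example:
# broker_url = redis://localhost:6379/0
{code}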



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-708) SSHExecuteOperator dont respect multiline output from the command

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-708.
---
Resolution: Auto Closed

> SSHExecuteOperator dont respect multiline output from the command 
> --
>
> Key: AIRFLOW-708
> URL: https://issues.apache.org/jira/browse/AIRFLOW-708
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jay Sen
>Assignee: Jay Sen
>Priority: Major
>
> I find the following piece of code works when you have one-line output but simply 
> can't work for multiline output from the given bash_command (it will print 
> proper multiline output in the log though):
> {code}
> line = ''
> for line in iter(sp.stdout.readline, b''):
> line = line.decode().strip()
> logging.info(line)
> {code}
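
A sketch of the kind of fix implied here (assumption: the goal is to keep the full output, e.g. for XCom, rather than only the last line), reusing sp from the snippet above:

{code}
import logging

output_lines = []
for raw_line in iter(sp.stdout.readline, b''):
    decoded = raw_line.decode().strip()
    logging.info(decoded)
    output_lines.append(decoded)
output = '\n'.join(output_lines)  # the complete multiline output
{code}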



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-636) Warn on DagBag skipping modules

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-636.
---
Resolution: Auto Closed

> Warn on DagBag skipping modules
> ---
>
> Key: AIRFLOW-636
> URL: https://issues.apache.org/jira/browse/AIRFLOW-636
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Maxime Beauchemin
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-669) JobScheduler skips the first dag run

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070488#comment-17070488
 ] 

Daniel Imberman commented on AIRFLOW-669:
-

[~ericmoritz] Is this ticket still relevant in 1.10?

> JobScheduler skips the first dag run
> 
>
> Key: AIRFLOW-669
> URL: https://issues.apache.org/jira/browse/AIRFLOW-669
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 1.7.1.3, 1.8.0
>Reporter: Eric Moritz
>Priority: Major
>
> There appears to be a bug in Airflow's JobScheduler where it does not create 
> the first scheduled dag run until the time of the second scheduled run.
> For instance for a DAG with the following properties:
> {code}
> dag = DAG(
> 'test_scheduler_dagrun_skips_first',
> start_date=datetime.datetime(2016, 11, 1, 12),
> schedule_interval="00 * * * *"
> )
> {code}
> When the current time is {{2016-11-01T12:01}} the expected behavior is that a 
> dag run will be scheduled for {{2016-11-01T12:00}}, however the 
> {{JobSchedule.create_dag_run()}} method returns {{None}}.
> When the current time is {{2016-11-01T13:01}} the expected behavior is that a 
> dag run will be scheduled for {{2016-11-01T13:00}}, however the 
> {{JobSchedule.create_dag_run()}} creates a DagRun model with an 
> {{execution_date}} of {{2016-11-01T12:00}}.
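
For context, a short worked example of the standard scheduling semantics at play here (stated as background, not as the resolution of this report): a run for a given interval is only created once that interval has passed, and it carries the interval's start as its execution_date.

{code}
from datetime import datetime, timedelta

start = datetime(2016, 11, 1, 12)        # start_date
interval = timedelta(hours=1)            # "00 * * * *"

first_execution_date = start             # the run covers [12:00, 13:00)
first_run_created_at = start + interval  # but is only created at/after 13:00
{code}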



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-644) Issue with past runs when using starttime as datetime.now()

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-644.
---
Resolution: Auto Closed

> Issue with past runs when using starttime as datetime.now()
> ---
>
> Key: AIRFLOW-644
> URL: https://issues.apache.org/jira/browse/AIRFLOW-644
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun
>Reporter: Puneeth Potu
>Priority: Major
>
> Hi, we used the following snippet in the DAG parameters:
> {code}
> default_args = {
>     'owner': 'dwh',
>     'depends_on_past': True,
>     'wait_for_downstream': True,
>     'start_date': datetime.now(),
> }
> {code}
> When datetime.now() is used with a @daily schedule I see the last 5 runs 
> in my graph view, and the DAG status of all the previous runs is "FAILED".
> When datetime.now() is used with a @monthly schedule I see the last 14 
> runs in my graph view, and the DAG status of all the previous runs is 
> "FAILED".
> When datetime.now() is used with a @weekly schedule I see the last 53 
> runs in my graph view, and the DAG status of all the previous runs is 
> "FAILED".
> For monthly and weekly it is not showing either the current week or month. I 
> activated my DAGs today (11/22/2016).
> I see weekly runs populated from 2015-11-15 to 2016-11-13, and I don't see 
> 2016-11-20, which is the latest.
> I see monthly runs populated from 2015-09-01 to 2016-10-01, and I don't see 
> 2016-11-01, which is the latest.
> Please advise if this is the expected behavior.
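
The usual guidance is to use a fixed start_date rather than datetime.now(); a minimal sketch of the suggested change (the date below is a placeholder):

{code}
from datetime import datetime

default_args = {
    'owner': 'dwh',
    'depends_on_past': True,
    'wait_for_downstream': True,
    # A static date in the past. With datetime.now() every scheduler pass sees a
    # different start_date, which leads to the confusing extra "FAILED" runs.
    'start_date': datetime(2016, 11, 22),
}
{code}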



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-636) Warn on DagBag skipping modules

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070485#comment-17070485
 ] 

Daniel Imberman commented on AIRFLOW-636:
-

This issue has been moved to https://github.com/apache/airflow/issues/7990

> Warn on DagBag skipping modules
> ---
>
> Key: AIRFLOW-636
> URL: https://issues.apache.org/jira/browse/AIRFLOW-636
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Maxime Beauchemin
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-673) Add operational metrics test for SchedulerJob

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-673.
---
Resolution: Auto Closed

> Add operational metrics test for SchedulerJob
> -
>
> Key: AIRFLOW-673
> URL: https://issues.apache.org/jira/browse/AIRFLOW-673
> Project: Apache Airflow
>  Issue Type: Test
>  Components: core, scheduler, tests
>Reporter: Vijay Bhat
>Assignee: Vijay Bhat
>Priority: Major
>
> Add a performance test to supply operational metrics for SchedulerJob. We 
> want to know if any DAG is starved of resources, and this will be reflected 
> in the stats printed out at the end of the test run.
> Please refer to the discussion in the PR - 
> https://github.com/apache/incubator-airflow/pull/1906.
> @bolkedbruin - This has operational consequences which I cannot fully 
> oversee, such as the scheduling now being biased towards one DAG (i.e. do 
> other DAGs run at all?). While this issue is already there, it might become 
> more prevalent because of this.
> Can you supply operational metrics @vijaysbhat?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-593) Tasks do not get backfilled sequentially

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-593.
---
Resolution: Auto Closed

> Tasks do not get backfilled sequentially
> 
>
> Key: AIRFLOW-593
> URL: https://issues.apache.org/jira/browse/AIRFLOW-593
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Jong Kim
>Priority: Minor
> Attachments: Screen Shot 2018-07-20 at 10.04.24 AM.png
>
>
> I need to have the tasks within a DAG complete in order when running 
> backfills. I am running locally on my Mac using the SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a 
> start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks, 
> which must complete in order. task0 -> task1 -> task2. This dependency is set 
> using .set_downstream().
> Today (2016/10/22) I reset the database, turn on the DAG run using the on/off 
> toggle in the webserver, and issue "airflow scheduler", which will 
> automatically backfill starting from start_date.
> It will backfill for 2016/10/20 and 2016/10/21.  I expect backfill to run 
> like the following sequentially:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow running tasks grouped by 
> sequence number something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to 
> run like what I need to, but instead it runs some tasks out of order like 
> this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0   <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2   <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> Is this a bug? If not, am I understanding 'depends_on_past' and 
> 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG: 
> https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1
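
A hedged sketch of one common mitigation (not necessarily the project's answer): limit the DAG to a single active run so a later execution date cannot start before the previous one has finished task0 -> task1 -> task2.

{code}
from datetime import datetime

from airflow import DAG

dag = DAG(
    'sequential_backfill_example',
    start_date=datetime(2016, 10, 20, 11, 0, 0),
    schedule_interval='0 11 * * *',
    max_active_runs=1,  # one DAG run at a time, so backfills stay strictly in order
    default_args={'depends_on_past': True, 'wait_for_downstream': True},
)
{code}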



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-594) Load plugins and workflows from installed packages

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-594.
---
Resolution: Auto Closed

> Load plugins and workflows from installed packages
> --
>
> Key: AIRFLOW-594
> URL: https://issues.apache.org/jira/browse/AIRFLOW-594
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: models, plugins
>Reporter: Malte Harder
>Assignee: Drew Sonne
>Priority: Minor
>
> Within my company (http://blue-yonder.com) we are using Airflow. As our 
> infrastructure is geared towards the deployment of Python packages, we 
> currently have a workaround in place to link modules from entry points in 
> packages to the corresponding Airflow directories. We would be happy if 
> Airflow would directly load plugins and DAGs from installed packages if 
> they are exposed via entry points.
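
A hypothetical setup.py sketch of what "expose plugins via entry points" could look like; the entry-point group name 'airflow.plugins' and MyPlugin are illustrative assumptions, not an API that existed at the time of this report.

{code}
from setuptools import setup

setup(
    name='my-company-airflow-plugins',
    version='0.1',
    packages=['my_company_airflow'],
    entry_points={
        'airflow.plugins': [
            'my_plugin = my_company_airflow.plugin:MyPlugin',
        ],
    },
)
{code}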



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-593) Tasks do not get backfilled sequentially

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070484#comment-17070484
 ] 

Daniel Imberman commented on AIRFLOW-593:
-

This issue has been moved to https://github.com/apache/airflow/issues/7989

> Tasks do not get backfilled sequentially
> 
>
> Key: AIRFLOW-593
> URL: https://issues.apache.org/jira/browse/AIRFLOW-593
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: DagRun, scheduler
>Affects Versions: 1.7.1.3
>Reporter: Jong Kim
>Priority: Minor
> Attachments: Screen Shot 2018-07-20 at 10.04.24 AM.png
>
>
> I need to have the tasks within a DAG complete in order when running 
> backfills. I am running locally on my Mac using the SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with a 
> start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks, 
> which must complete in order. task0 -> task1 -> task2. This dependency is set 
> using .set_downstream().
> Today (2016/10/22) I reset the database, turn on the DAG run using the on/off 
> toggle in the webserver, and issue "airflow scheduler", which will 
> automatically backfill starting from start_date.
> It will backfill for 2016/10/20 and 2016/10/21.  I expect backfill to run 
> like the following sequentially:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow running tasks grouped by 
> sequence number something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to 
> run like what I need to, but instead it runs some tasks out of order like 
> this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0   <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2   <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> Is this a bug? If not, am I understanding 'depends_on_past' and 
> 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG: 
> https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-152) Add --task_params option to 'airflow run'

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-152.
---
Resolution: Fixed

> Add --task_params option to 'airflow run'
> -
>
> Key: AIRFLOW-152
> URL: https://issues.apache.org/jira/browse/AIRFLOW-152
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Reporter: Jeffrey Picard
>Priority: Minor
>
> Currently there is a 'task_params' option which can add to or override
> values in the params dictionary for a task, but it is only available
> when running a task with 'airflow test'.
> By accepting this parameter in 'airflow run' and then passing it to the
> subprocess through the command method in the TaskInstance class, this
> option can be supported.
> This has use cases in running tasks in an ad-hoc manner, where a
> parameter may define an environment (e.g. testing vs. production) or
> input / output locations and a developer may want to tweak them on the
> fly.
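
For illustration, a minimal sketch of what task_params feeds into: the params dict that templated fields can read. Overriding it already works with 'airflow test ... --task_params'; the request here is the same flag on 'airflow run'.

{code}
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('task_params_example', start_date=datetime(2017, 1, 1), schedule_interval=None)

run_env = BashOperator(
    task_id='my_task',
    bash_command='echo running against {{ params.env }}',
    params={'env': 'testing'},  # defaults that --task_params would add to or override
    dag=dag,
)
{code}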



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-137) Airflow does not respect 'max_active_runs' when task from multiple dag runs cleared

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070483#comment-17070483
 ] 

Daniel Imberman commented on AIRFLOW-137:
-

[~ash] is this ticket still relevant?

> Airflow does not respect 'max_active_runs' when task from multiple dag runs 
> cleared
> ---
>
> Key: AIRFLOW-137
> URL: https://issues.apache.org/jira/browse/AIRFLOW-137
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Tomasz Bartczak
>Priority: Minor
>
> Also requested at https://github.com/apache/incubator-airflow/issues/1442
> Dear Airflow Maintainers,
> Environment
> Before I tell you about my issue, let me describe my Airflow environment:
> Please fill out any appropriate fields:
> Airflow version: 1.7.0
> Airflow components: webserver, mysql, scheduler with celery executor
> Python Version: 2.7.6
> Operating System: Linux Ubuntu 3.19.0-26-generic
> The scheduler runs with --num-runs and gets restarted around every minute or so.
> Description of Issue
> Now that you know a little about me, let me tell you about the issue I am 
> having:
> What did you expect to happen?
> After running 'airflow clear -t spark_final_observations2csv -s 
> 2016-04-07T01:00:00 -e 2016-04-11T01:00:00 MODELLING_V6' I expected that this 
> task gets executed in all dag runs in the specified time range, 
> respecting 'max_active_runs'.
> Dag configuration:
> concurrency= 3,
> max_active_runs = 2,
> What happened instead?
> Airflow at first started executing 3 of those tasks, which already 
> violates 'max_active_runs', but it looks like 'concurrency' was the applied 
> limit here.
> [screenshot: 3_running_2_pending]
> After the first task was done, Airflow scheduled all the other tasks, making it 5 
> running DAG runs at the same time, which violates all specified limits.
> In the GUI we saw a red warning (5/2 DAGs running ;-) )
> Reproducing the Issue
> max_active_runs is respected on a day-to-day basis - when one of the tasks was 
> stuck, Airflow didn't start more than 2 DAG runs concurrently.
> [screenshots in the original issue: 
> https://github.com/apache/incubator-airflow/issues/1442]
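
For reference, the configuration from the report as it would appear on the DAG object (the dag id and dates are placeholders):

{code}
from datetime import datetime

from airflow import DAG

dag = DAG(
    'max_active_runs_example',
    start_date=datetime(2016, 4, 1),
    concurrency=3,       # max task instances of this DAG running at once
    max_active_runs=2,   # max concurrent DAG runs -- the limit reportedly violated
)
{code}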



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-98) Using Flask extensions from a plugin

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070482#comment-17070482
 ] 

Daniel Imberman commented on AIRFLOW-98:


[~kaxilnaik] is this ticket still relevant?

> Using Flask extensions from a plugin
> 
>
> Key: AIRFLOW-98
> URL: https://issues.apache.org/jira/browse/AIRFLOW-98
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Timo
>Priority: Major
>
> I am creating a plugin which should be able to use some Flask extensions. Is 
> there any way to do this?
> Basically I need to import the airflow.www.app.app object in the plugin to 
> initialise the extension with.
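
A minimal sketch, assuming the plugin manager's flask_blueprints hook covers part of this use case: it lets a plugin register Flask blueprints without importing the webserver's app object directly.

{code}
from airflow.plugins_manager import AirflowPlugin
from flask import Blueprint

bp = Blueprint('my_plugin_bp', __name__, url_prefix='/my_plugin')

class MyFlaskPlugin(AirflowPlugin):
    name = 'my_flask_plugin'
    flask_blueprints = [bp]
{code}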



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-32) Remove deprecated features prior to releasing Airflow 2.0

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-32.
--
Resolution: Auto Closed

> Remove deprecated features prior to releasing Airflow 2.0
> -
>
> Key: AIRFLOW-32
> URL: https://issues.apache.org/jira/browse/AIRFLOW-32
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Priority: Major
>  Labels: deprecated
> Fix For: 2.0.0
>
>
> A number of features have been marked for deprecation in Airflow 2.0. They 
> need to be deleted prior to release. 
> Usually the error message or comments will mention Airflow 2.0 with either a 
> #TODO or #FIXME.
> Tracking list (not necessarily complete!):
> JIRA:
> AIRFLOW-31
> AIRFLOW-200
> GitHub:
> https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
> https://github.com/airbnb/airflow/pull/1219
> https://github.com/airbnb/airflow/pull/1285



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-81) Scheduler blackout time period

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070481#comment-17070481
 ] 

Daniel Imberman commented on AIRFLOW-81:


This issue has been moved to https://github.com/apache/airflow/issues/7988

> Scheduler blackout time period
> --
>
> Key: AIRFLOW-81
> URL: https://issues.apache.org/jira/browse/AIRFLOW-81
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Sean McIntyre
>Priority: Minor
>  Labels: features
>
> I have the need for a scheduler blackout time period in Airflow.
> My team, which uses Airflow, has been asked to not query one of my company's 
> data sources between midnight and 7 AM. When we launch big backfills on this 
> data source, it would be nice to have the Scheduler not schedule some 
> TaskInstances during the blackout hours.
> We (@r39132 and @ledsusop) brainstormed a few ideas on gitter on how to do 
> this...
> (1) Put more state/logic in the TaskInstance and Scheduler like this:
> my_task = PythonOperator(
> task_id='my_task',
> python_callable=my_command_that_access_the_datasource,
> provide_context=True,
> dag=dag,
> blackout=my_blackout_logic_for_the_datasource # <---
> )
> where my_blackout_logic is some function I provide that the scheduler calls 
> to determine whether or not it is the blackout period.
> (2) Pause DAGs on a nightly basis. This can be done with the `pause_dag` CLI 
> command scheduled by cron / Jenkins. However, could this be considered a core 
> feature to bring into the Airflow UI and scheduling system?
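
A hedged workaround sketch rather than a built-in blackout feature, reusing my_task and dag from the snippet above (Airflow 1.x import path assumed): gate the sensitive task behind a TimeSensor so nothing queries the data source before 07:00.

{code}
from datetime import time

from airflow.operators.sensors import TimeSensor

wait_until_after_blackout = TimeSensor(
    task_id='wait_until_after_blackout',
    target_time=time(7, 0),  # succeed only once the current time is past 07:00
    dag=dag,
)
wait_until_after_blackout.set_downstream(my_task)
{code}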



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-81) Scheduler blackout time period

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-81?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-81.
--
Resolution: Auto Closed

> Scheduler blackout time period
> --
>
> Key: AIRFLOW-81
> URL: https://issues.apache.org/jira/browse/AIRFLOW-81
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: scheduler
>Reporter: Sean McIntyre
>Priority: Minor
>  Labels: features
>
> I have the need for a scheduler blackout time period in Airflow.
> My team, which uses Airflow, has been asked to not query one of my company's 
> data sources between midnight and 7 AM. When we launch big backfills on this 
> data source, it would be nice to have the Scheduler not schedule some 
> TaskInstances during the blackout hours.
> We (@r39132 and @ledsusop) brainstormed a few ideas on gitter on how to do 
> this...
> (1) Put more state/logic in the TaskInstance and Scheduler like this:
> my_task = PythonOperator(
> task_id='my_task',
> python_callable=my_command_that_access_the_datasource,
> provide_context=True,
> dag=dag,
> blackout=my_blackout_logic_for_the_datasource # <---
> )
> where my_blackout_logic is some function I provide that the scheduler calls 
> to determine whether or not it is the blackout period.
> (2) Pause DAGs on a nightly basis. This can be done with the `pause_dag` CLI 
> command scheduled by cron / Jenkins. However, could this be considered a core 
> feature to bring into the Airflow UI and scheduling system?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-32) Remove deprecated features prior to releasing Airflow 2.0

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070480#comment-17070480
 ] 

Daniel Imberman commented on AIRFLOW-32:


This issue has been moved to https://github.com/apache/airflow/issues/7987

> Remove deprecated features prior to releasing Airflow 2.0
> -
>
> Key: AIRFLOW-32
> URL: https://issues.apache.org/jira/browse/AIRFLOW-32
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Jeremiah Lowin
>Priority: Major
>  Labels: deprecated
> Fix For: 2.0.0
>
>
> A number of features have been marked for deprecation in Airflow 2.0. They 
> need to be deleted prior to release. 
> Usually the error message or comments will mention Airflow 2.0 with either a 
> #TODO or #FIXME.
> Tracking list (not necessarily complete!):
> JIRA:
> AIRFLOW-31
> AIRFLOW-200
> GitHub:
> https://github.com/airbnb/airflow/pull/1137/files#diff-1c2404a3a60f829127232842250ff406R233
> https://github.com/airbnb/airflow/pull/1219
> https://github.com/airbnb/airflow/pull/1285



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-20) Improving the scheduler by making dag runs more coherent

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-20?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-20.
--
Resolution: Auto Closed

> Improving the scheduler by making dag runs more coherent
> 
>
> Key: AIRFLOW-20
> URL: https://issues.apache.org/jira/browse/AIRFLOW-20
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Bolke de Bruin
>Assignee: zgl
>Priority: Major
>  Labels: backfill, database, scheduler
>
> The need to align the start_date with the interval is counter-intuitive
> and leads to a lot of questions and issue creation, although it is in the 
> documentation. If we are
> able to fix this with little or no consequences for current setups, that 
> should be preferred, I think.
> The dependency explainer is really great work, but it doesn’t address the 
> core issue.
> If you consider a DAG a description of cohesion between work items (in OOP 
> java terms
> a class), then a DagRun is the instantiation of a DAG in time (in OOP java 
> terms an instance). 
> Tasks are then the description of a work item and a TaskInstance the 
> instantiation of the Task in time.
> In my opinion issues pop up due to the current paradigm of considering the 
> TaskInstance
> the smallest unit of work and asking it to maintain its own state in relation 
> to other TaskInstances
> in a DagRun and in a previous DagRun of which it has no (real) perception. 
> Tasks are instantiated
> by a Cartesian product with the dates of the DagRun instead of the DagRuns 
> themselves.
> The very loose coupling between DagRuns and TaskInstances can be improved 
> while maintaining
> flexibility to run tasks without a DagRun. This would help with a couple of 
> things:
> 1. start_date can be used as a ‘execution_date’ or a point in time when to 
> start looking
> 2. a new interval for a dag will maintain depends_on_past
> 3. paused dags do not give trouble
> 4. tasks will be executed in order 
> 5. the ignore_first_depend_on_past could be removed as a task will now know 
> if it is really the first
> In PR-1431 a lot of this work has been done by:
> 1. Adding a “previous” field to a DagRun allowing it to connect to its 
> predecessor
> 2. Adding a dag_run_id to TaskInstances so a TaskInstance knows about the 
> DagRun if needed
> 3. Using start_date + interval as the first run date unless start_date is on 
> the interval then start_date is the first run date



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-835) SMTP Mail delivery fails with server using CRAM-MD5 auth

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-835.
---
Resolution: Auto Closed

> SMTP Mail delivery fails with server using CRAM-MD5 auth
> 
>
> Key: AIRFLOW-835
> URL: https://issues.apache.org/jira/browse/AIRFLOW-835
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: utils
>Affects Versions: 1.7.1, 1.9.0
> Environment: https://hub.docker.com/_/python/ (debian:jessie + 
> python2.7 in docker)
>Reporter: Joseph Harris
>Priority: Minor
>
> Traceback when sending email from smtp-server configured to offer CRAM-MD5 
> (in all cases, tls included). This occurs because the configuration module 
> returns the password as a futures.types.newstr, instead of a plain str (see 
> below for gory details of why this breaks).
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 43, in send_email
> return backend(to, subject, html_content, files=files, dryrun=dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 79, in send_email_smtp
> send_MIME_email(SMTP_MAIL_FROM, to, msg, dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 95, in send_MIME_email
> s.login(SMTP_USER, SMTP_PASSWORD)
>   File "/usr/local/lib/python2.7/smtplib.py", line 607, in login
> (code, resp) = self.docmd(encode_cram_md5(resp, user, password))
>   File "/usr/local/lib/python2.7/smtplib.py", line 571, in encode_cram_md5
> response = user + " " + hmac.HMAC(password, challenge).hexdigest()
>   File "/usr/local/lib/python2.7/hmac.py", line 75, in __init__
> self.outer.update(key.translate(trans_5C))
>   File "/usr/local/lib/python2.7/site-packages/future/types/newstr.py", line 
> 390, in translate
> if ord(c) in table:
> TypeError: 'in ' requires string as left operand, not int
> SMTP configs:
> [email]
> email_backend = airflow.utils.email.send_email_smtp
> [smtp]
> smtp_host = {a_smtp_server}
> smtp_port = 587
> smtp_starttls = True
> smtp_ssl = False
> smtp_user = {a_username}
> smtp_password = {a_password}
> smtp_mail_from = {a_email_addr}
> *Gory details
> If the server offers CRAM-MD5, smtplib prefers this by default, and will try 
> to use hmac.HMAC to hash the password:
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l602
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l571
> But if the password is a newstr, newstr.translate expects a dict mapping 
> instead of str, and raises an exception.
> https://hg.python.org/cpython/file/2.7/Lib/hmac.py#l75
> All of this occurs after a successful SMTP.ehlo(), so it's probably not crap 
> container networking
> Could be resolved by passing the smtp password as a futures.types.newbytes, 
> as this behaves as expected:
> from future.types import newstr, newbytes
> import hmac
> # Make str / newstr types
> test = 'a_string'
> test_newstr = newstr(test)
> test_newbytes = newbytes(test)
> msg = 'future problems'
> # Test 1 - Try to do a HMAC:
> # fine
> hmac.HMAC(test, msg)
> # fails horribly
> hmac.HMAC(test_newstr, msg)
> # is completely fine
> hmac.HMAC(test_newbytes, msg)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-835) SMTP Mail delivery fails with server using CRAM-MD5 auth

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070476#comment-17070476
 ] 

Daniel Imberman commented on AIRFLOW-835:
-

This issue has been moved to https://github.com/apache/airflow/issues/7986

> SMTP Mail delivery fails with server using CRAM-MD5 auth
> 
>
> Key: AIRFLOW-835
> URL: https://issues.apache.org/jira/browse/AIRFLOW-835
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: utils
>Affects Versions: 1.7.1, 1.9.0
> Environment: https://hub.docker.com/_/python/ (debian:jessie + 
> python2.7 in docker)
>Reporter: Joseph Harris
>Priority: Minor
>
> Traceback when sending email from smtp-server configured to offer CRAM-MD5 
> (in all cases, tls included). This occurs because the configuration module 
> returns the password as a futures.types.newstr, instead of a plain str (see 
> below for gory details of why this breaks).
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 43, in send_email
> return backend(to, subject, html_content, files=files, dryrun=dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 79, in send_email_smtp
> send_MIME_email(SMTP_MAIL_FROM, to, msg, dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 95, in send_MIME_email
> s.login(SMTP_USER, SMTP_PASSWORD)
>   File "/usr/local/lib/python2.7/smtplib.py", line 607, in login
> (code, resp) = self.docmd(encode_cram_md5(resp, user, password))
>   File "/usr/local/lib/python2.7/smtplib.py", line 571, in encode_cram_md5
> response = user + " " + hmac.HMAC(password, challenge).hexdigest()
>   File "/usr/local/lib/python2.7/hmac.py", line 75, in __init__
> self.outer.update(key.translate(trans_5C))
>   File "/usr/local/lib/python2.7/site-packages/future/types/newstr.py", line 
> 390, in translate
> if ord(c) in table:
> TypeError: 'in ' requires string as left operand, not int
> SMTP configs:
> [email]
> email_backend = airflow.utils.email.send_email_smtp
> [smtp]
> smtp_host = {a_smtp_server}
> smtp_port = 587
> smtp_starttls = True
> smtp_ssl = False
> smtp_user = {a_username}
> smtp_password = {a_password}
> smtp_mail_from = {a_email_addr}
> *Gory details
> If the server offers CRAM-MD5, smtplib prefers this by default, and will try 
> to use hmac.HMAC to hash the password:
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l602
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l571
> But if the password is a newstr, newstr.translate expects a dict mapping 
> instead of str, and raises an exception.
> https://hg.python.org/cpython/file/2.7/Lib/hmac.py#l75
> All of this occurs after a successful SMTP.ehlo(), so it's probably not crap 
> container networking
> Could be resolved by passing the smtp password as a futures.types.newbytes, 
> as this behaves as expected:
> from future.types import newstr, newbytes
> import hmac
> # Make str / newstr types
> test = 'a_string'
> test_newstr = newstr(test)
> test_newbytes = newbytes(test)
> msg = 'future problems'
> # Test 1 - Try to do a HMAC:
> # fine
> hmac.HMAC(test, msg)
> # fails horribly
> hmac.HMAC(test_newstr, msg)
> # is completely fine
> hmac.HMAC(test_newbytes, msg)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (AIRFLOW-835) SMTP Mail delivery fails with server using CRAM-MD5 auth

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman updated AIRFLOW-835:

Affects Version/s: 1.9.0

> SMTP Mail delivery fails with server using CRAM-MD5 auth
> 
>
> Key: AIRFLOW-835
> URL: https://issues.apache.org/jira/browse/AIRFLOW-835
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: utils
>Affects Versions: 1.7.1, 1.9.0
> Environment: https://hub.docker.com/_/python/ (debian:jessie + 
> python2.7 in docker)
>Reporter: Joseph Harris
>Priority: Minor
>
> Traceback when sending email from smtp-server configured to offer CRAM-MD5 
> (in all cases, tls included). This occurs because the configuration module 
> returns the password as a futures.types.newstr, instead of a plain str (see 
> below for gory details of why this breaks).
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1308, 
> in handle_failure
> self.email_alert(error, is_retry=False)
>   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1425, 
> in email_alert
> send_email(task.email, title, body)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 43, in send_email
> return backend(to, subject, html_content, files=files, dryrun=dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 79, in send_email_smtp
> send_MIME_email(SMTP_MAIL_FROM, to, msg, dryrun)
>   File "/usr/local/lib/python2.7/site-packages/airflow/utils/email.py", line 
> 95, in send_MIME_email
> s.login(SMTP_USER, SMTP_PASSWORD)
>   File "/usr/local/lib/python2.7/smtplib.py", line 607, in login
> (code, resp) = self.docmd(encode_cram_md5(resp, user, password))
>   File "/usr/local/lib/python2.7/smtplib.py", line 571, in encode_cram_md5
> response = user + " " + hmac.HMAC(password, challenge).hexdigest()
>   File "/usr/local/lib/python2.7/hmac.py", line 75, in __init__
> self.outer.update(key.translate(trans_5C))
>   File "/usr/local/lib/python2.7/site-packages/future/types/newstr.py", line 
> 390, in translate
> if ord(c) in table:
> TypeError: 'in ' requires string as left operand, not int
> SMTP configs:
> [email]
> email_backend = airflow.utils.email.send_email_smtp
> [smtp]
> smtp_host = {a_smtp_server}
> smtp_port = 587
> smtp_starttls = True
> smtp_ssl = False
> smtp_user = {a_username}
> smtp_password = {a_password}
> smtp_mail_from = {a_email_addr}
> *Gory details
> If the server offers CRAM-MD5, smtplib prefers this by default, and will try 
> to use hmac.HMAC to hash the password:
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l602
> https://hg.python.org/cpython/file/2.7/Lib/smtplib.py#l571
> But if the password is a newstr, newstr.translate expects a dict mapping 
> instead of str, and raises an exception.
> https://hg.python.org/cpython/file/2.7/Lib/hmac.py#l75
> All of this occurs after a successful SMTP.ehlo(), so it's probably not crap 
> container networking
> Could be resolved by passing the smtp password as a futures.types.newbytes, 
> as this behaves as expected:
> from future.types import newstr, newbytes
> import hmac
> # Make str / newstr types
> test = 'a_string'
> test_newstr = newstr(test)
> test_newbytes = newbytes(test)
> msg = 'future problems'
> # Test 1 - Try to do a HMAC:
> # fine
> hmac.HMAC(test, msg)
> # fails horribly
> hmac.HMAC(test_newstr, msg)
> # is completely fine
> hmac.HMAC(test_newbytes, msg)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-833) Sub-command to generate the default config

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-833.
---
Resolution: Auto Closed

> Sub-command to generate the default config
> --
>
> Key: AIRFLOW-833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-833
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Reporter: George Sakkis
>Assignee: George Sakkis
>Priority: Major
>
> 1. Currently airflow autogenerates a default {{$AIRFLOW_HOME/airflow.cfg}} as 
> soon as any command is issued or even when running {{import airflow}} from 
> python, which IMO is rather intrusive (even more so when AIRFLOW_HOME 
> defaults to the user's home directory).
> 2. Moreover, once the file is created it is not trivial to generate a default 
> config again; one has to either move the existing one to another location or 
> change (at least temporarily) AIRFLOW_HOME.
> Both issues can be addressed with the extension of the CLI with a subcommand 
> to generate the default config explicitly at will instead of implicitly at 
> import time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-833) Sub-command to generate the default config

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070470#comment-17070470
 ] 

Daniel Imberman commented on AIRFLOW-833:
-

This issue has been moved to https://github.com/apache/airflow/issues/7985

> Sub-command to generate the default config
> --
>
> Key: AIRFLOW-833
> URL: https://issues.apache.org/jira/browse/AIRFLOW-833
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: cli
>Reporter: George Sakkis
>Assignee: George Sakkis
>Priority: Major
>
> 1. Currently airflow autogenerates a default {{$AIRFLOW_HOME/airflow.cfg}} as 
> soon as any command is issued or even when running {{import airflow}} from 
> python, which IMO is rather intrusive (even more so when AIRFLOW_HOME 
> defaults to the user's home directory).
> 2. Moreover, once the file is created it is not trivial to generate a default 
> config again; one has to either move the existing one to another location or 
> change (at least temporarily) AIRFLOW_HOME.
> Both issues can be addressed with the extension of the CLI with a subcommand 
> to generate the default config explicitly at will instead of implicitly at 
> import time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-811) Bash_operator dont read multiline output

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-811.
---
Resolution: Auto Closed

> Bash_operator dont read multiline output
> 
>
> Key: AIRFLOW-811
> URL: https://issues.apache.org/jira/browse/AIRFLOW-811
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jay Sen
>Assignee: Jay Sen
>Priority: Minor
>
> The following piece of code is the root cause of it:
> {code}
> line = ''
> for line in iter(sp.stdout.readline, b''):
> line = line.decode(self.output_encoding).strip()
> logging.info(line)
> {code}
> I plan to fix it using a string buffer instead of just a one-line string 
> variable here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-811) Bash_operator dont read multiline output

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070468#comment-17070468
 ] 

Daniel Imberman commented on AIRFLOW-811:
-

This issue has been moved to https://github.com/apache/airflow/issues/7983

> Bash_operator dont read multiline output
> 
>
> Key: AIRFLOW-811
> URL: https://issues.apache.org/jira/browse/AIRFLOW-811
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jay Sen
>Assignee: Jay Sen
>Priority: Minor
>
> The following piece of code is the root cause of it:
> {code}
> line = ''
> for line in iter(sp.stdout.readline, b''):
> line = line.decode(self.output_encoding).strip()
> logging.info(line)
> {code}
> I plan to fix it using a string buffer instead of just a one-line string 
> variable here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-819) Dateutil macro is currently useless

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-819.
---
Resolution: Auto Closed

> Dateutil macro is currently useless
> ---
>
> Key: AIRFLOW-819
> URL: https://issues.apache.org/jira/browse/AIRFLOW-819
> Project: Apache Airflow
>  Issue Type: Bug
>Affects Versions: 1.7.1
>Reporter: Brandon Humpert
>Priority: Minor
>
> Line {{macros/__init__.py:4}} reads
> {noformat}
> import dateutil
> {noformat}
> which turns out to not actually do anything, due to {{dateutil}} having a 
> completely empty root {{__init__.py}}:
> {noformat}
> In [1]: import dateutil
> In [2]: dir(dateutil)
> Out[2]:
> ['__builtins__',
>  '__doc__',
>  '__file__',
>  '__name__',
>  '__package__',
>  '__path__',
>  '__version__']
> {noformat}
> I suspect that instead, you should replace that line with:
> {noformat}
> import dateutil.parser
> import dateutil.rrule
> ...
> {noformat}
> or obviously delete it entirely and rely on user macros for dateutil 
> implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-824) Allow writing to XCOM values via API

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-824.
---
Resolution: Auto Closed

> Allow writing to XCOM values via API
> 
>
> Key: AIRFLOW-824
> URL: https://issues.apache.org/jira/browse/AIRFLOW-824
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Robin Miller
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-818) User env variables overridden when using kerberos

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070465#comment-17070465
 ] 

Daniel Imberman commented on AIRFLOW-818:
-

Hi [~bolke] is this ticket still relevant? We're porting tickets to 
github/doing spring cleaning.

> User env variables overridden when using kerberos
> -
>
> Key: AIRFLOW-818
> URL: https://issues.apache.org/jira/browse/AIRFLOW-818
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Major
>
> If a user wants to use Kerberos authentication and airflow.cfg lists kerberos 
> for security, the user's env variables will be overwritten by the cfg 
> settings, resulting in not being able to use Kerberos authentication.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-805) When creating a DagRun from the interface Run Id should be mandatory

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-805.
---
Resolution: Auto Closed

> When creating a DagRun from the interface Run Id should be mandatory
> 
>
> Key: AIRFLOW-805
> URL: https://issues.apache.org/jira/browse/AIRFLOW-805
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Mohamed Zenadi
>Priority: Minor
>
> If the user forgets to set the Run Id, he'll be welcomed (in the scheduler) 
> with the following exception:
> {code}
>   File "projects/incubator-airflow/airflow/models.py", line 4085, in 
> is_backfill
> if "backfill" in self.run_id:
> TypeError: argument of type 'NoneType' is not iterable
> {code}
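
A sketch of the defensive check implied by the traceback (inside the DagRun model; the project's actual fix may differ):

{code}
@property
def is_backfill(self):
    # Guard against run_id being None before testing for the "backfill" prefix.
    return self.run_id is not None and "backfill" in self.run_id
{code}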



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-804) Airflow installs latest Celery which does not support `sqla` transport

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070462#comment-17070462
 ] 

Daniel Imberman commented on AIRFLOW-804:
-

Hi [~zeapo] is this ticket still relevant?

> Airflow installs latest Celery which does not support `sqla` transport
> --
>
> Key: AIRFLOW-804
> URL: https://issues.apache.org/jira/browse/AIRFLOW-804
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: executors
>Reporter: Mohamed Zenadi
>Priority: Minor
>
> Kombu dropped support for `sqla` transport in its version 4.x. With the 
> default configuration this will raise an exception:
> {code}
> $ airflow flower
> [2017-01-25 17:20:05,064] {__init__.py:57} INFO - Using executor 
> SequentialExecutor
> [I 170125 17:20:07 command:136] Visit me at http://0.0.0.0:
> Unknown Celery version
> Traceback (most recent call last):
>   File 
> "/home/10025051/.linuxbrew/opt/python3/lib/python3.6/site-packages/kombu/transport/__init__.py",
>  line 53, in resolve_transport
> transport = TRANSPORT_ALIASES[transport]
> KeyError: 'sqla'
> {code}
> A fix would be to change the default configuration to not use `sqla`, to 
> avoid any confusing errors for new users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-795) Creating temporary file in /tmp is not happening in RHEL

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-795.
---
Resolution: Auto Closed

> Creating temporary file in /tmp is not happening in RHEL
> 
>
> Key: AIRFLOW-795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-795
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, contrib, operators
>Reporter: Vinish
>Priority: Major
> Attachments: Selection_028.png
>
>
> subprocess.Popen is unable to create temporary files on RHEL machines 
> unless the additional parameter 'shell=True' is given in every operator.py file.
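
For comparison, roughly the pattern the bash-style operators follow (illustrative, not the actual operator code): write the command to a temporary script and execute it explicitly, which should not require shell=True.

{code}
import subprocess
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(mode='w', suffix='.sh', delete=False) as f:
    f.write('echo hello from RHEL\n')
    script_path = f.name

sp = subprocess.Popen(['bash', script_path],
                      stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
print(sp.communicate()[0])
{code}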



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-798) LocalTaskJob can terminate prematurely

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-798.
---
Resolution: Auto Closed

> LocalTaskJob can terminate prematurely
> --
>
> Key: AIRFLOW-798
> URL: https://issues.apache.org/jira/browse/AIRFLOW-798
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Major
>
> LocalTaskJob monitors the task instance for changes and terminates if it is 
> changed outside its scope.
> This can also happen to tasks finishing properly.
> In addition, the log reports a wrong state, as the TaskInstance in LocalTaskJob 
> is not refreshed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-791) At start up all running dag_runs are being checked, but not fixed

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070460#comment-17070460
 ] 

Daniel Imberman commented on AIRFLOW-791:
-

Hi [~bolke] is this ticket still relevant?

> At start up all running dag_runs are being checked, but not fixed
> -
>
> Key: AIRFLOW-791
> URL: https://issues.apache.org/jira/browse/AIRFLOW-791
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Major
>
> At start up all running dag_runs are checked; we seemed to have a lot of 
> “left over” dag_runs (a couple of thousand).
> - Checking was logged at INFO level, which requires an fsync for every log 
> message and makes it very slow.
> - Checking would happen at every restart, but the dag_runs’ states were not 
> being updated.
> - These dag_runs would never be marked anything other than running, for some 
> reason.
> -> Applied a workaround: updated all dag_runs before a certain date to 
> finished directly in SQL (see the sketch below).
> -> Need to investigate why dag_runs did not get marked “finished/failed”.
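> A hedged sketch of that workaround using the ORM rather than a hand-written 
> SQL statement (the cutoff date and the target state are assumptions, not the 
> exact query that was run):
> {code}
> from datetime import datetime
>
> from airflow import settings
> from airflow.models import DagRun
> from airflow.utils.state import State
>
> # Mark all dag_runs that started before a cutoff as failed so the scheduler
> # stops re-checking them at every start-up.
> cutoff = datetime(2017, 1, 1)  # illustrative cutoff
> session = settings.Session()
> (session.query(DagRun)
>         .filter(DagRun.state == State.RUNNING,
>                 DagRun.execution_date < cutoff)
>         .update({DagRun.state: State.FAILED}, synchronize_session=False))
> session.commit()
> session.close()
> {code}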



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-795) Creating temporary file in /tmp is not happening in RHEL

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070461#comment-17070461
 ] 

Daniel Imberman commented on AIRFLOW-795:
-

This issue has been moved to https://github.com/apache/airflow/issues/7981

> Creating temporary file in /tmp is not happening in RHEL
> 
>
> Key: AIRFLOW-795
> URL: https://issues.apache.org/jira/browse/AIRFLOW-795
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli, contrib, operators
>Reporter: Vinish
>Priority: Major
> Attachments: Selection_028.png
>
>
> subprocess.Popen is unable to create temporary files on RHEL machines unless 
> the additional parameter 'shell=True' is passed in every operator.py file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-788) Context unexpectedly added to hive conf

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-788.
---
Resolution: Auto Closed

> Context unexpectedly added to hive conf
> ---
>
> Key: AIRFLOW-788
> URL: https://issues.apache.org/jira/browse/AIRFLOW-788
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Reporter: Bolke de Bruin
>Assignee: Alan Ma
>Priority: Major
>  Labels: hive, hive-hooks
> Fix For: 1.10.0
>
>
> When specifying hive_conf to run_cli, extra variables are added from the 
> context, e.g. airflow.ctx.dag.dag_id.
> In secured environments this can require a configuration change, as these 
> variables might not be whitelisted.
> Secondly, one could regard it as information leakage, as it is added without 
> the user's consent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-772) Handle failed DAG parsing on workers

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070459#comment-17070459
 ] 

Daniel Imberman commented on AIRFLOW-772:
-

Hey [~aoen] is this ticket still needed?

> Handle failed DAG parsing on workers
> 
>
> Key: AIRFLOW-772
> URL: https://issues.apache.org/jira/browse/AIRFLOW-772
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Dan Davydov
>Priority: Major
>
> At the moment "airflow run" commands for DAGs that parse successfully on the 
> scheduler but not on workers just get stuck in the queued state with no 
> failure emails/logging. We need to figure out a way to fail these DAGs on 
> workers instead of keeping them stuck in the queued state.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-769) Schedule requirement for subdags

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-769.
---
Resolution: Auto Closed

> Schedule requirement for subdags
> 
>
> Key: AIRFLOW-769
> URL: https://issues.apache.org/jira/browse/AIRFLOW-769
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.7.1.3
>Reporter: Dirk Gorissen
>Priority: Major
>  Labels: subdag
>
> I suspect this is a known problem, but I failed to find an issue about it. 
> It's confusing enough that I thought to raise it just in case.
> The docs say:
> >SubDAGs must have a schedule and be enabled. If the SubDAG’s schedule 
> >is set to None or @once, the SubDAG will succeed without having done 
> >anything
> Further googling reveals this is because subdags are implemented as a 
> backfill job.
> As a user I find this very confusing. I see subdags as just a convenient 
> logical grouping. Why should they need their own schedule, never mind it not 
> being allowed certain values? I'm assuming you can just put whatever value in 
> and that it simply gets ignored, since it wouldn't make logical sense to have 
> your subdag run on a different schedule than the encompassing dag. But I 
> admit to not having explicitly tested that yet.
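> For context, the usual pattern is to give the subdag the same 
> schedule_interval and start_date as its parent; a minimal sketch (dag ids and 
> dates are illustrative):
> {code}
> from datetime import datetime
>
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.operators.subdag_operator import SubDagOperator
>
> SCHEDULE = "@daily"
> START = datetime(2017, 1, 1)
>
> parent = DAG("parent_dag", schedule_interval=SCHEDULE, start_date=START)
>
>
> def make_subdag(parent_id, child_id, schedule, start_date):
>     # The subdag id must be "<parent>.<child>"; reusing the parent's schedule
>     # avoids the None/@once behaviour quoted from the docs above.
>     sub = DAG("%s.%s" % (parent_id, child_id),
>               schedule_interval=schedule, start_date=start_date)
>     DummyOperator(task_id="do_nothing", dag=sub)
>     return sub
>
>
> SubDagOperator(
>     task_id="child",
>     subdag=make_subdag("parent_dag", "child", SCHEDULE, START),
>     dag=parent,
> )
> {code}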



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-767) Scheduler Processor creates logging directories as 777 instead of 755

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-767.
---
Resolution: Auto Closed

> Scheduler Processor creates logging directories as 777 instead of 755
> -
>
> Key: AIRFLOW-767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-767
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Bolke de Bruin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-767) Scheduler Processor creates logging directories as 777 instead of 755

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070414#comment-17070414
 ] 

Daniel Imberman commented on AIRFLOW-767:
-

This issue has been moved to https://github.com/apache/airflow/issues/7978

> Scheduler Processor creates logging directories as 777 instead of 755
> -
>
> Key: AIRFLOW-767
> URL: https://issues.apache.org/jira/browse/AIRFLOW-767
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: scheduler
>Reporter: Bolke de Bruin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-769) Schedule requirement for subdags

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070415#comment-17070415
 ] 

Daniel Imberman commented on AIRFLOW-769:
-

This issue has been moved to https://github.com/apache/airflow/issues/7979

> Schedule requirement for subdags
> 
>
> Key: AIRFLOW-769
> URL: https://issues.apache.org/jira/browse/AIRFLOW-769
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: operators
>Affects Versions: 1.7.1.3
>Reporter: Dirk Gorissen
>Priority: Major
>  Labels: subdag
>
> I suspect this is a known problem, but I failed to find an issue about it. 
> It's confusing enough that I thought to raise it just in case.
> The docs say:
> >SubDAGs must have a schedule and be enabled. If the SubDAG’s schedule 
> >is set to None or @once, the SubDAG will succeed without having done 
> >anything
> Further googling reveals this is because subdags are implemented as a 
> backfill job.
> As a user I find this very confusing. I see subdags as just a convenient 
> logical grouping. Why should they need their own schedule, never mind it not 
> being allowed certain values? I'm assuming you can just put whatever value in 
> and that it simply gets ignored, since it wouldn't make logical sense to have 
> your subdag run on a different schedule than the encompassing dag. But I 
> admit to not having explicitly tested that yet.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-758) airflow webserver does not properly detach when using '--daemon'

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070413#comment-17070413
 ] 

Daniel Imberman commented on AIRFLOW-758:
-

Hi [~bolke], is this ticket still relevant?

> airflow webserver does not properly detach when using '--daemon'
> 
>
> Key: AIRFLOW-758
> URL: https://issues.apache.org/jira/browse/AIRFLOW-758
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Bolke de Bruin
>Priority: Major
>
> The new rolling restart functionality does not properly detach when using 
> "--daemon": it wrongly daemonizes the gunicorn subprocess while it needs to 
> stay running itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-754) Operator for IBM BigSQL

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-754.
---
Resolution: Auto Closed

> Operator for IBM BigSQL
> ---
>
> Key: AIRFLOW-754
> URL: https://issues.apache.org/jira/browse/AIRFLOW-754
> Project: Apache Airflow
>  Issue Type: New Feature
>  Components: hooks, operators
>Reporter: Michael Gonzalez
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-733) Deferred DAG assignment breaks with default_args

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070412#comment-17070412
 ] 

Daniel Imberman commented on AIRFLOW-733:
-

This issue has been moved to https://github.com/apache/airflow/issues/7977

> Deferred DAG assignment breaks with default_args
> 
>
> Key: AIRFLOW-733
> URL: https://issues.apache.org/jira/browse/AIRFLOW-733
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Reporter: George Sakkis
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>
> Deferred DAG assignment raises {{AirflowException}} if the dag has 
> {{default_args}} instead of {{start_date}}:
> {noformat}
> default_args = {'start_date': datetime(2016, 1, 1)}
> dag = DAG('my_dag2', default_args=default_args)
> deferred_op = DummyOperator(task_id='dummy')
> deferred_op.dag = dag
> ---
> AirflowException  Traceback (most recent call last)
>  in ()
> > 1 deferred_op.dag = dag
> ...
> AirflowException: Task is missing the start_date parameter
> {noformat}
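> For comparison, a minimal sketch of the non-deferred form, which does apply 
> default_args and therefore does not raise (assuming the usual constructor 
> behaviour; the ids are illustrative):
> {code}
> from datetime import datetime
>
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
>
> default_args = {"start_date": datetime(2016, 1, 1)}
> dag = DAG("my_dag2", default_args=default_args)
>
> # Passing dag= at construction time merges the DAG's default_args into the
> # task, so start_date is present and no AirflowException is raised.
> op_ok = DummyOperator(task_id="dummy_ok", dag=dag)
>
> # The failing case from the report, for contrast:
> # deferred_op = DummyOperator(task_id="dummy")
> # deferred_op.dag = dag  # raises: Task is missing the start_date parameter
> {code}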



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-745) DAG Links Ordering Does not Match Ordering on Other Pages

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-745.
---
Resolution: Auto Closed

> DAG Links Ordering Does not Match Ordering on Other Pages
> -
>
> Key: AIRFLOW-745
> URL: https://issues.apache.org/jira/browse/AIRFLOW-745
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.7.1.3
>Reporter: Bryant Moscon
>Priority: Minor
> Attachments: DAGs.png, graphView.png
>
>
> On the DAGs page (localhost:8080/admin), on the left side of the DAG table 
> under Links, the links are in the following order:
> Tree, Graph, Task Duration, Landing Times, Gantt, Code, Details
> (see the DAGs attachment)
> On the task page, the links are in the following order:
> Graph, Tree, Task Duration, Landing Times, Gantt, Details, Code
> (see the graphView attachment)
> Any reason these are not in the same order?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-733) Deferred DAG assignment breaks with default_args

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-733.
---
Resolution: Auto Closed

> Deferred DAG assignment breaks with default_args
> 
>
> Key: AIRFLOW-733
> URL: https://issues.apache.org/jira/browse/AIRFLOW-733
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Reporter: George Sakkis
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>
> Deferred DAG assignment raises {{AirflowException}} if the dag has 
> {{default_args}} instead of {{start_date}}:
> {noformat}
> default_args = {'start_date': datetime(2016, 1, 1)}
> dag = DAG('my_dag2', default_args=default_args)
> deferred_op = DummyOperator(task_id='dummy')
> deferred_op.dag = dag
> ---
> AirflowException  Traceback (most recent call last)
>  in ()
> > 1 deferred_op.dag = dag
> ...
> AirflowException: Task is missing the start_date parameter
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-733) Deferred DAG assignment breaks with default_args

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070411#comment-17070411
 ] 

Daniel Imberman commented on AIRFLOW-733:
-

This issue has been moved to https://github.com/apache/airflow/issues/7976

> Deferred DAG assignment breaks with default_args
> 
>
> Key: AIRFLOW-733
> URL: https://issues.apache.org/jira/browse/AIRFLOW-733
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: models
>Reporter: George Sakkis
>Assignee: Ash Berlin-Taylor
>Priority: Minor
>
> Deferred DAG assignment raises {{AirflowException}} if the dag has 
> {{default_args}} instead of {{start_date}}:
> {noformat}
> default_args = {'start_date': datetime(2016, 1, 1)}
> dag = DAG('my_dag2', default_args=default_args)
> deferred_op = DummyOperator(task_id='dummy')
> deferred_op.dag = dag
> ---
> AirflowException  Traceback (most recent call last)
>  in ()
> > 1 deferred_op.dag = dag
> ...
> AirflowException: Task is missing the start_date parameter
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-708) SSHExecuteOperator dont respect multiline output from the command

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070409#comment-17070409
 ] 

Daniel Imberman commented on AIRFLOW-708:
-

This issue has been moved to https://github.com/apache/airflow/issues/7973

> SSHExecuteOperator dont respect multiline output from the command 
> --
>
> Key: AIRFLOW-708
> URL: https://issues.apache.org/jira/browse/AIRFLOW-708
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: Jay Sen
>Assignee: Jay Sen
>Priority: Major
>
> I find that the following piece of code works when you have one-line output 
> but simply cannot work for multiline output from the given bash_command (it 
> will print the proper multiline output in the log, though):
> {code}
> line = ''
> for line in iter(sp.stdout.readline, b''):
> line = line.decode().strip()
> logging.info(line)
> {code}
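> A hedged sketch of how the loop could preserve the whole multiline output by 
> accumulating lines instead of overwriting a single variable (sp and logging 
> come from the surrounding operator code; the other names are illustrative):
> {code}
> lines = []
> for raw in iter(sp.stdout.readline, b''):
>     decoded = raw.decode().strip()
>     logging.info(decoded)
>     lines.append(decoded)
> # Join everything so downstream consumers (e.g. an XCom push) see the full
> # output rather than only the last line.
> output = '\n'.join(lines)
> {code}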



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-714) PrestoHook - Add session properties

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-714.
---
Resolution: Auto Closed

> PrestoHook - Add session properties
> ---
>
> Key: AIRFLOW-714
> URL: https://issues.apache.org/jira/browse/AIRFLOW-714
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 2.0.0
>Reporter: Teresa Fontanella De Santis
>Assignee: Teresa Fontanella De Santis
>Priority: Minor
> Fix For: 2.0.0
>
>
> In presto, there are some session properties 
> (https://prestodb.io/docs/current/sql/set-session.html) that can be used in 
> order to make queries more efficient. The idea would be to add session 
> properties in the "Extra Fields" param in Presto Connection.
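> As a hedged sketch of how that might look (this is a proposal, not implemented 
> behaviour; the "session_properties" key and the property values are 
> assumptions), the connection's Extra field could carry the properties as JSON:
> {code}
> import json
>
> from airflow.models import Connection
>
> # Illustrative only: storing Presto session properties in the connection's
> # Extra field for a future PrestoHook to read and apply via SET SESSION.
> presto_conn = Connection(
>     conn_id="presto_default",
>     conn_type="presto",
>     host="presto.example.com",
>     port=8080,
>     extra=json.dumps({
>         "session_properties": {  # assumed key name
>             "query_max_run_time": "30m",
>             "hash_partition_count": 16,
>         }
>     }),
> )
> {code}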



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AIRFLOW-714) PrestoHook - Add session properties

2020-03-29 Thread Daniel Imberman (Jira)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17070410#comment-17070410
 ] 

Daniel Imberman commented on AIRFLOW-714:
-

This issue has been moved to https://github.com/apache/airflow/issues/7974

> PrestoHook - Add session properties
> ---
>
> Key: AIRFLOW-714
> URL: https://issues.apache.org/jira/browse/AIRFLOW-714
> Project: Apache Airflow
>  Issue Type: Improvement
>  Components: hooks
>Affects Versions: 2.0.0
>Reporter: Teresa Fontanella De Santis
>Assignee: Teresa Fontanella De Santis
>Priority: Minor
> Fix For: 2.0.0
>
>
> In presto, there are some session properties 
> (https://prestodb.io/docs/current/sql/set-session.html) that can be used in 
> order to make queries more efficient. The idea would be to add session 
> properties in the "Extra Fields" param in Presto Connection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AIRFLOW-711) stopping scheduler makes running sensor fail

2020-03-29 Thread Daniel Imberman (Jira)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Imberman closed AIRFLOW-711.
---
Resolution: Auto Closed

> stopping scheduler makes running sensor fail
> 
>
> Key: AIRFLOW-711
> URL: https://issues.apache.org/jira/browse/AIRFLOW-711
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: core
>Reporter: Jay Sen
>Priority: Major
>
> On the master branch, when I stop the scheduler (by stopping it via my IDE), 
> the sensor, which is in the running state and has already run fine a couple 
> of times, will instantly fail with the following error:
> {code}
> [2016-12-21 19:25:08,588] {models.py:1347} ERROR - 
> Traceback (most recent call last):
>   File "/src/apache/airflow/airflow/models.py", line 1304, in run
> result = task_copy.execute(context=context)
>   File "/src/apache/airflow/airflow/operators/sensors.py", line 79, in execute
> sleep(self.poke_interval)
> KeyboardInterrupt
> {code}
> On the IDE side I can see the following traceback generated:
> {code}
> Traceback (most recent call last):
>   File "./airflow", line 28, in 
> args.func(args)
>   File "/src/apache/airflow/airflow/bin/cli.py", line 370, in run
> run_job.run()
>   File "/src/apache/airflow/airflow/jobs.py", line 202, in run
> self._execute()
>   File "/src/apache/airflow/airflow/jobs.py", line 2028, in _execute
> self.heartbeat()
>   File "/src/apache/airflow/airflow/jobs.py", line 177, in heartbeat
> sleep(sleep_for)
> KeyboardInterrupt
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

