[jira] [Commented] (AIRFLOW-2489) Align Flask dependencies with FlaskAppBuilder
[ https://issues.apache.org/jira/browse/AIRFLOW-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601636#comment-16601636 ] Apache Spark commented on AIRFLOW-2489: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3382 > Align Flask dependencies with FlaskAppBuilder > - > > Key: AIRFLOW-2489 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2489 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 2.0.0 > > > Right now it might take a while to update the dependencies. And we would like > to update the dependencies to make sure that we don't have any version > conflicts like: > Traceback (most recent call last): > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/bin/airflow", > line 4, in > > __import__('pkg_resources').require('apache-airflow==2.0.0.dev0+incubating') > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 3086, in > @_call_aside > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 3070, in _call_aside > f(*args, **kwargs) > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 3099, in _initialize_master_working_set > working_set = WorkingSet._build_master() > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 576, in _build_master > return cls._build_from_requirements(__requires__) > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 589, in _build_from_requirements > dists = ws.resolve(reqs, 
Environment()) > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 783, in resolve > raise VersionConflict(dist, req).with_context(dependent_req) > pkg_resources.ContextualVersionConflict: (Flask 0.12.4 > (/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages), > Requirement.parse('Flask<0.12.2,>=0.10.0'), set(['flask-appbuilder'])) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
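The conflict above is mechanical: Flask 0.12.4 is installed, but flask-appbuilder pins `Flask<0.12.2,>=0.10.0`. A toy sketch of how such a specifier check fails, using plain version tuples; the `parse`/`satisfies` helpers are illustrative, not setuptools API (real resolution is done by pkg_resources/pip):

```python
import operator

def parse(version):
    # "0.12.4" -> (0, 12, 4); ignores pre-release tags for simplicity
    return tuple(int(p) for p in version.split(".") if p.isdigit())

def satisfies(installed, spec):
    """Check an installed version against a comma-separated specifier
    such as "<0.12.2,>=0.10.0" (the pin from the traceback above)."""
    ops = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
           "<": operator.lt, ">": operator.gt}
    for clause in spec.split(","):
        for sym, op in ops.items():
            if clause.startswith(sym):
                if not op(parse(installed), parse(clause[len(sym):])):
                    return False
                break
    return True

# The exact situation from the traceback: Flask 0.12.4 vs FAB's pin
print(satisfies("0.12.4", "<0.12.2,>=0.10.0"))  # -> False, hence VersionConflict
```

Aligning Airflow's own Flask pin with FlaskAppBuilder's makes the two specifiers jointly satisfiable, which is the point of the issue.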
[jira] [Assigned] (AIRFLOW-2476) tabulate update: 0.8.2 is tested
[ https://issues.apache.org/jira/browse/AIRFLOW-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2476: - Assignee: Ruslan Dautkhanov (was: Holden Karau's magical unicorn) > tabulate update: 0.8.2 is tested > > > Key: AIRFLOW-2476 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2476 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.8, 1.9.0, 1.10.0, 2.0.0, 1.10 >Reporter: Ruslan Dautkhanov >Assignee: Ruslan Dautkhanov >Priority: Major > > As discussed on the dev list, tabulate==0.8.2 is good to go with Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2484) Dictionary contains duplicate keys in MySQL to GCS Op
[ https://issues.apache.org/jira/browse/AIRFLOW-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601633#comment-16601633 ] Apache Spark commented on AIRFLOW-2484: --- User 'kaxil' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3376 > Dictionary contains duplicate keys in MySQL to GCS Op > - > > Key: AIRFLOW-2484 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2484 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Affects Versions: 1.9.0, 1.10.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 1.10.0 > > > Helper function that maps from MySQL fields to BigQuery fields `type_map` > contains duplicate keys in MySQL to GCS Op -- This message was sent by Atlassian JIRA (v7.6.3#76005)
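The failure mode here is easy to reproduce: a Python dict literal with a duplicated key keeps only the last value, silently. A hypothetical fragment (these are not the operator's actual mappings):

```python
# 'INT' appears twice; Python keeps only the last value for a duplicated
# key in a dict literal, so the first mapping is silently discarded.
type_map = {
    'INT': 'INTEGER',
    'TINY': 'INTEGER',
    'INT': 'STRING',   # duplicate key: overwrites 'INTEGER' above
}
print(type_map['INT'])  # -> STRING
print(len(type_map))    # -> 2, not 3
```

This is why the duplication in `type_map` is a bug worth fixing even though it raises no error: the earlier entry is dead code and the effective mapping may not be the intended one.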
[jira] [Commented] (AIRFLOW-2420) Add functionality for Azure Data Lake
[ https://issues.apache.org/jira/browse/AIRFLOW-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601635#comment-16601635 ] Apache Spark commented on AIRFLOW-2420: --- User 'marcusrehm' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/ > Add functionality for Azure Data Lake > - > > Key: AIRFLOW-2420 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2420 > Project: Apache Airflow > Issue Type: New Feature > Components: hooks >Reporter: Marcus Rehm >Assignee: Marcus Rehm >Priority: Major > Fix For: 2.0.0 > > > Currently Airflow has a hook for Azure Blob Storage but it does not support > Azure Data Lake. > As a first step, a hook would interface with Azure Data Lake via the Python > SDK over the adl protocol. > > The hook would have a simple interface to upload and download files with all > parameters available in the ADL SDK, and also a check-for-file method to query whether a file > exists in the data lake. These last functions will enable sensor development > in the future. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
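A minimal sketch of the proposed hook surface (upload, download, existence check). All names here are illustrative, not the eventual hook API; an in-memory dict stands in for the data lake, where a real implementation would wrap the azure-datalake-store (adl) Python SDK:

```python
class AzureDataLakeHookSketch:
    """Illustrative stand-in for the proposed hook interface."""

    def __init__(self):
        self._store = {}  # remote_path -> bytes, simulating the lake

    def upload_file(self, data, remote_path):
        self._store[remote_path] = data

    def download_file(self, remote_path):
        return self._store[remote_path]

    def check_for_file(self, remote_path):
        # The existence check that would later back a sensor.
        return remote_path in self._store

hook = AzureDataLakeHookSketch()
hook.upload_file(b"rows", "raw/2018/05/data.csv")
print(hook.check_for_file("raw/2018/05/data.csv"))  # -> True
print(hook.check_for_file("raw/2018/06/data.csv"))  # -> False
```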
[jira] [Assigned] (AIRFLOW-2496) v1-10-test branch reports version 2.0 instead of 1.10
[ https://issues.apache.org/jira/browse/AIRFLOW-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2496: - Assignee: Holden Karau's magical unicorn > v1-10-test branch reports version 2.0 instead of 1.10 > - > > Key: AIRFLOW-2496 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2496 > Project: Apache Airflow > Issue Type: Bug > Components: release >Affects Versions: 1.10 >Reporter: Craig Rodrigues >Assignee: Holden Karau's magical unicorn >Priority: Minor > Fix For: 1.10 > > > I created a requirements.txt with one line: git+ > [https://github.com/apache/incubator-airflow@v1-10-test#egg=apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,redis,slack,s3]] > I then did: 1. create a virtual environment 2. pip install -r > requirements.txt 3. airflow webserver When I look at the version in the web > interface, it shows a version of: 2.0.0.dev0+incubating even though I used > the v1-10-test branch. This seems wrong. It looks like these two commits got > merged to the v1-10-test branch, which bumped the version to 2.0: > [https://github.com/apache/incubator-airflow/commit/305a787] > [https://github.com/apache/incubator-airflow/commit/a30acaf] > That seems wrong for the v1-10-test branch. It would be nice if this version was > 1.10.0.dev0+incubating (or whatever), since it looks like I will need to > deploy the v1-10-test branch to prod this week, and then very soon after when > 1.10 is released, re-deploy airflow 1.10. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2460) KubernetesPodOperator should be able to attach to volume mounts and configmaps
[ https://issues.apache.org/jira/browse/AIRFLOW-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601634#comment-16601634 ] Apache Spark commented on AIRFLOW-2460: --- User 'dimberman' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3356 > KubernetesPodOperator should be able to attach to volume mounts and configmaps > -- > > Key: AIRFLOW-2460 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2460 > Project: Apache Airflow > Issue Type: Bug >Reporter: Daniel Imberman >Assignee: Daniel Imberman >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > In order to run tasks using the KubernetesPodOperator in a production > setting, users need to be able to access pre-existing data through > PersistentVolumes or ConfigMaps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
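At the Kubernetes API level, the requested feature amounts to adding `volumes` plus matching `volumeMounts` to the pod spec. A dict-based sketch (illustrative only; the real operator builds this through its own Pod/Volume models rather than raw dicts):

```python
def pod_spec_with_volume(image, claim_name, mount_path):
    """Minimal pod spec dict mounting a PersistentVolumeClaim."""
    return {
        "containers": [{
            "name": "base",
            "image": image,
            # The mount's name must match a volume declared below.
            "volumeMounts": [{"name": "data", "mountPath": mount_path}],
        }],
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": claim_name},
        }],
    }

spec = pod_spec_with_volume("python:3.6", "shared-dags", "/usr/local/airflow/dags")
print(spec["volumes"][0]["persistentVolumeClaim"]["claimName"])  # -> shared-dags
```

ConfigMaps follow the same two-part pattern, with a `configMap` volume source in place of `persistentVolumeClaim`.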
[jira] [Assigned] (AIRFLOW-2496) v1-10-test branch reports version 2.0 instead of 1.10
[ https://issues.apache.org/jira/browse/AIRFLOW-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2496: - Assignee: (was: Holden Karau's magical unicorn) > v1-10-test branch reports version 2.0 instead of 1.10 > - > > Key: AIRFLOW-2496 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2496 > Project: Apache Airflow > Issue Type: Bug > Components: release >Affects Versions: 1.10 >Reporter: Craig Rodrigues >Priority: Minor > Fix For: 1.10 > > > I created a requirements.txt with one line: git+ > [https://github.com/apache/incubator-airflow@v1-10-test#egg=apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,redis,slack,s3]] > I then did: 1. create a virtual environment 2. pip install -r > requirements.txt 3. airflow webserver When I look at the version in the web > interface, it shows a version of: 2.0.0.dev0+incubating even though I used > the v1-10-test branch. This seems wrong. It looks like these two commits got > merged to the v1-10-test branch, which bumped the version to 2.0: > [https://github.com/apache/incubator-airflow/commit/305a787] > [https://github.com/apache/incubator-airflow/commit/a30acaf] > That seems wrong for the v1-10-test branch. It would be nice if this version was > 1.10.0.dev0+incubating (or whatever), since it looks like I will need to > deploy the v1-10-test branch to prod this week, and then very soon after when > 1.10 is released, re-deploy airflow 1.10. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2476) tabulate update: 0.8.2 is tested
[ https://issues.apache.org/jira/browse/AIRFLOW-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2476: - Assignee: Holden Karau's magical unicorn (was: Ruslan Dautkhanov) > tabulate update: 0.8.2 is tested > > > Key: AIRFLOW-2476 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2476 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.8, 1.9.0, 1.10.0, 2.0.0, 1.10 >Reporter: Ruslan Dautkhanov >Assignee: Holden Karau's magical unicorn >Priority: Major > > As discussed on the dev list, tabulate==0.8.2 is good to go with Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2486) Extra slash in base_url when port provided
[ https://issues.apache.org/jira/browse/AIRFLOW-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601632#comment-16601632 ] Apache Spark commented on AIRFLOW-2486: --- User 'jason-udacity' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3379 > Extra slash in base_url when port provided > -- > > Key: AIRFLOW-2486 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2486 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jason Shao >Assignee: Jason Shao >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > {{Issue in > }}[incubator-airflow|https://github.com/jason-udacity/incubator-airflow]/{{[airflow/hooks/http_hook.py|https://github.com/apache/incubator-airflow/pull/3377/files#diff-80514189dfbbac3803594380c3a714f1]}} > {{self.base_url}} includes an unnecessary slash when {{conn.port}} is > specified. > This often leads to unintended redirects that are especially problematic when > a request body is needed. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
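The general shape of the fix is to normalize the slashes at the join point. A sketch (the hook's actual patch lives in the linked PR; this helper is illustrative):

```python
def join_url(base_url, endpoint):
    """Strip a trailing slash from the base and a leading slash from the
    endpoint so exactly one separator remains."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")

# With a port, a naive f"{base}/{endpoint}" yields host:8080//api/v1/run,
# which many servers answer with a redirect that drops the request body.
print(join_url("http://example.com:8080/", "/api/v1/run"))
# -> http://example.com:8080/api/v1/run
```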
[jira] [Commented] (AIRFLOW-2482) Add test for rewrite method for GCS Hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601637#comment-16601637 ] Apache Spark commented on AIRFLOW-2482: --- User 'kaxil' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3374 > Add test for rewrite method for GCS Hook > > > Key: AIRFLOW-2482 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2482 > Project: Apache Airflow > Issue Type: Test > Components: contrib, gcp >Affects Versions: 1.10.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > Tests for the rewrite method in the GCS hook are missing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1840) Fix Celery config
[ https://issues.apache.org/jira/browse/AIRFLOW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601630#comment-16601630 ] Apache Spark commented on AIRFLOW-1840: --- User 'ashb' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3549 > Fix Celery config > - > > Key: AIRFLOW-1840 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1840 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.10.0 > > > While configuring the Celery executor I keep running into this problem: > ==> /var/log/airflow/scheduler.log <== > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", > line 83, in sync > state = async.state > File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 394, > in state > return self._get_task_meta()['status'] > File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 339, > in _get_task_meta > return self._maybe_set_cache(self.backend.get_task_meta(self.id)) > File "/usr/local/lib/python2.7/dist-packages/celery/backends/base.py", line > 307, in get_task_meta > meta = self._get_task_meta_for(task_id) > AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
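The `'DisabledBackend' object has no attribute '_get_task_meta_for'` error is what querying task state looks like when no result backend reaches the Celery app, for example when config keys are spelled for the wrong Celery version. A sketch with placeholder URLs (not recommended values):

```python
# Illustrative Celery settings; both URLs are placeholders.
celery_config = {
    "broker_url": "redis://localhost:6379/0",          # where tasks are queued
    "result_backend": "db+postgresql://airflow@localhost/airflow",  # where task state lives
}

def can_poll_state(config):
    """True if task state (the `async.state` call in the traceback) can be
    queried; without result_backend, Celery falls back to DisabledBackend."""
    return bool(config.get("result_backend"))

print(can_poll_state(celery_config))                             # -> True
print(can_poll_state({"broker_url": "redis://localhost:6379/0"}))  # -> False
```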
[jira] [Assigned] (AIRFLOW-2733) Airflow webserver crashes with refresh interval <= 0 in daemon mode
[ https://issues.apache.org/jira/browse/AIRFLOW-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2733: - Assignee: Holden Karau's magical unicorn > Airflow webserver crashes with refresh interval <= 0 in daemon mode > --- > > Key: AIRFLOW-2733 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2733 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Assignee: Holden Karau's magical unicorn >Priority: Major > > In airflow/bin/cli.py, calls to the monitor_gunicorn sub-function > ([https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L841)] > are made using a mix of psutil.Process objects and subprocess.Popen objects. > > Case 1: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L878] > Case 2: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L884] > > In the event worker_refresh_interval is <= 0 and we are in daemon mode, we end > up calling a non-existent `poll` function on a psutil.Process object: > https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L847 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2625) Create an API to list all the available DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2625: - Assignee: Holden Karau's magical unicorn > Create an API to list all the available DAGs > > > Key: AIRFLOW-2625 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2625 > Project: Apache Airflow > Issue Type: New Feature > Components: api, DAG >Reporter: Verdan Mahmood >Assignee: Holden Karau's magical unicorn >Priority: Major > Labels: api, api-required > > There should be an API to list all the available DAGs in the system. (This is > basically the same as the DAGs list page, aka the Airflow home page.) > This should include all the basic information related to a DAG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
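A sketch of what such an endpoint might serialize. Here `dagbag` is a plain dict of dag_id to metadata standing in for Airflow's DagBag, and the field names are illustrative, not a proposed schema:

```python
import json

def list_dags(dagbag):
    """Serialize basic per-DAG info, as the home page displays it."""
    payload = [{"dag_id": dag_id, **meta} for dag_id, meta in sorted(dagbag.items())]
    return json.dumps({"dags": payload})

dagbag = {
    "tutorial": {"owner": "airflow", "is_paused": False},
    "etl_daily": {"owner": "data-eng", "is_paused": True},
}
print(list_dags(dagbag))
```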
[jira] [Assigned] (AIRFLOW-2537) clearing tasks shouldn't set backfill DAG runs to `running`
[ https://issues.apache.org/jira/browse/AIRFLOW-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2537: - Assignee: Holden Karau's magical unicorn > clearing tasks shouldn't set backfill DAG runs to `running` > --- > > Key: AIRFLOW-2537 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2537 > Project: Apache Airflow > Issue Type: Bug >Reporter: Maxime Beauchemin >Assignee: Holden Karau's magical unicorn >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2633) Retry loop on AWSBatchOperator won't quit
[ https://issues.apache.org/jira/browse/AIRFLOW-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2633: - Assignee: Holden Karau's magical unicorn (was: Sebastian Schwartz) > Retry loop on AWSBatchOperator won't quit > - > > Key: AIRFLOW-2633 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2633 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Holden Karau's magical unicorn >Priority: Major > Labels: patch, pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > The exponential backoff retry loop that is a fallback for AWSBatchOperator as > a strategy for polling job success does not quit until the maximum number of retries is > reached, due to a control-flow error. This is a simple one-line fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
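The correct loop shape exits as soon as the job reaches a terminal state; the early `return` below is the control-flow piece the report says was missing. `poll_job` stands in for the AWS Batch `describe_jobs` status lookup, and the whole function is a sketch, not the operator code:

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}

def wait_for_job(poll_job, max_retries=5, base_delay=0.01):
    """Exponential-backoff polling that quits on a terminal state."""
    for attempt in range(max_retries):
        status = poll_job()
        if status in TERMINAL_STATES:   # exit immediately on completion
            return status
        time.sleep(base_delay * (2 ** attempt))
    raise TimeoutError("job did not reach a terminal state")

# Simulated job that succeeds on the third poll
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(statuses)))  # -> SUCCEEDED
```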
[jira] [Assigned] (AIRFLOW-2642) [kubernetes executor worker] the value of git-sync init container ENV GIT_SYNC_ROOT is wrong
[ https://issues.apache.org/jira/browse/AIRFLOW-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2642: - Assignee: pengchen (was: Holden Karau's magical unicorn) > [kubernetes executor worker] the value of git-sync init container ENV > GIT_SYNC_ROOT is wrong > > > Key: AIRFLOW-2642 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2642 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 2.0.0, 1.10 >Reporter: pengchen >Assignee: pengchen >Priority: Major > Fix For: 1.10 > > > There are two ways of syncing dags, pvc and git-sync. When we use git-sync > this way, the generated worker pod yaml file fragment is as follows > > {code:java} > worker container: > --- > containers: > - args: > - airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd > /root/airflow/dags/dags/example_dags/tutorial1.py > command: > - bash > - -cx > - -- > env: > - name: AIRFLOW__CORE__AIRFLOW_HOME > value: /root/airflow > - name: AIRFLOW__CORE__EXECUTOR > value: LocalExecutor > - name: AIRFLOW__CORE__DAGS_FOLDER > value: /tmp/dags > - name: SQL_ALCHEMY_CONN > valueFrom: > secretKeyRef: > key: sql_alchemy_conn > name: airflow-secrets > init container: > --- > initContainers: > - env: > - name: GIT_SYNC_REPO > value: https://code.devops.xiaohongshu.com/pengchen/Airflow-DAGs.git > - name: GIT_SYNC_BRANCH > value: master > - name: GIT_SYNC_ROOT > value: /tmp > - name: GIT_SYNC_DEST > value: dags > - name: GIT_SYNC_ONE_TIME > value: "true" > - name: GIT_SYNC_USERNAME > value: XXX > - name: GIT_SYNC_PASSWORD > value: XXX > image: library/git-sync-amd64:v2.0.5 > imagePullPolicy: IfNotPresent > name: git-sync-clone > resources: {} > securityContext: > runAsUser: 0 > terminationMessagePath: /dev/termination-log > terminationMessagePolicy: File > volumeMounts: > - mountPath: /root/airflow/dags/ > name: airflow-dags > - mountPath: /root/airflow/logs > name: airflow-logs > - mountPath: /root/airflow/airflow.cfg > name: airflow-config > readOnly: true > subPath: airflow.cfg > - mountPath: /var/run/secrets/kubernetes.io/serviceaccount > name: default-token-xz87t > readOnly: true > {code} > According to the configuration, git-sync will synchronize dags to the /tmp/dags > directory. However, the worker container command args (airflow run tutorial1 > print_date 2018-06-19T07:57:15.011693+00:00 --local -sd > /root/airflow/dags/dags/example_dags/tutorial1.py) are generated by the > scheduler. Therefore, the task error is as follows > {code:java} > + airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,075] {settings.py:174} INFO - setting.configure_orm(): > Using pool settings. pool_size=5, pool_recycle=1800 > [2018-06-19 07:57:29,232] {__init__.py:51} INFO - Using executor LocalExecutor > [2018-06-19 07:57:29,373] {models.py:219} INFO - Filling up the DagBag from > /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,648] {models.py:310} INFO - File > /usr/local/lib/python2.7/dist-packages/airflow/example_dags/__init__.py > assumed to contain no DAGs. Skipping. > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 32, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/cli.py", line 74, > in wrapper > return f(*args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 475, > in run > dag = get_dag(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 146, > in get_dag > 'parse.'.format(args.dag_id)) > airflow.exceptions.AirflowException: dag_id could not be found: tutorial1. > Either the dag did not exist or it failed to parse. > {code} > > The log shows that the worker cannot find the corresponding dag, so I think > the environment variable GIT_SYNC_ROOT should be consistent with > dag_volume_mount_path. 
> The worker's environment variable AIRFLOW__CORE__DAGS_FOLDER is invalid, and > AIRFLOW__CORE__EXECUTOR is also invalid > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
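One way to keep the two settings consistent, as the report suggests, is to derive the init container's git-sync env from the same path the worker mounts the DAG volume at, so the checkout lands where the scheduler-generated `--subdir` path expects it. A sketch only; names are illustrative and the real builder lives in the Kubernetes executor's pod-generation code:

```python
def git_sync_env(dags_mount_path, dest="dags"):
    """Derive git-sync env vars from the DAG volume mount path."""
    return {
        "GIT_SYNC_ROOT": dags_mount_path.rstrip("/"),
        "GIT_SYNC_DEST": dest,
    }

env = git_sync_env("/root/airflow/dags/")
# Checkout path becomes /root/airflow/dags/dags, consistent with the
# "-sd /root/airflow/dags/dags/example_dags/tutorial1.py" worker args
# above, instead of the mismatched /tmp/dags in the reported pod spec.
print(env["GIT_SYNC_ROOT"])  # -> /root/airflow/dags
```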
[jira] [Assigned] (AIRFLOW-2642) [kubernetes executor worker] the value of git-sync init container ENV GIT_SYNC_ROOT is wrong
[ https://issues.apache.org/jira/browse/AIRFLOW-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2642: - Assignee: Holden Karau's magical unicorn (was: pengchen) > [kubernetes executor worker] the value of git-sync init container ENV > GIT_SYNC_ROOT is wrong > > > Key: AIRFLOW-2642 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2642 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 2.0.0, 1.10 >Reporter: pengchen >Assignee: Holden Karau's magical unicorn >Priority: Major > Fix For: 1.10 > > > There are two ways of syncing dags, pvc and git-sync. When we use git-sync > this way, the generated worker pod yaml file fragment is as follows > > {code:java} > worker container: > --- > containers: > - args: > - airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd > /root/airflow/dags/dags/example_dags/tutorial1.py > command: > - bash > - -cx > - -- > env: > - name: AIRFLOW__CORE__AIRFLOW_HOME > value: /root/airflow > - name: AIRFLOW__CORE__EXECUTOR > value: LocalExecutor > - name: AIRFLOW__CORE__DAGS_FOLDER > value: /tmp/dags > - name: SQL_ALCHEMY_CONN > valueFrom: > secretKeyRef: > key: sql_alchemy_conn > name: airflow-secrets > init container: > --- > initContainers: > - env: > - name: GIT_SYNC_REPO > value: https://code.devops.xiaohongshu.com/pengchen/Airflow-DAGs.git > - name: GIT_SYNC_BRANCH > value: master > - name: GIT_SYNC_ROOT > value: /tmp > - name: GIT_SYNC_DEST > value: dags > - name: GIT_SYNC_ONE_TIME > value: "true" > - name: GIT_SYNC_USERNAME > value: XXX > - name: GIT_SYNC_PASSWORD > value: XXX > image: library/git-sync-amd64:v2.0.5 > imagePullPolicy: IfNotPresent > name: git-sync-clone > resources: {} > securityContext: > runAsUser: 0 > terminationMessagePath: /dev/termination-log > terminationMessagePolicy: File > volumeMounts: > - mountPath: /root/airflow/dags/ > name: airflow-dags > - mountPath: /root/airflow/logs > name: airflow-logs > - mountPath: /root/airflow/airflow.cfg > name: airflow-config > readOnly: true > subPath: airflow.cfg > - mountPath: /var/run/secrets/kubernetes.io/serviceaccount > name: default-token-xz87t > readOnly: true > {code} > According to the configuration, git-sync will synchronize dags to the /tmp/dags > directory. However, the worker container command args (airflow run tutorial1 > print_date 2018-06-19T07:57:15.011693+00:00 --local -sd > /root/airflow/dags/dags/example_dags/tutorial1.py) are generated by the > scheduler. Therefore, the task error is as follows > {code:java} > + airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,075] {settings.py:174} INFO - setting.configure_orm(): > Using pool settings. pool_size=5, pool_recycle=1800 > [2018-06-19 07:57:29,232] {__init__.py:51} INFO - Using executor LocalExecutor > [2018-06-19 07:57:29,373] {models.py:219} INFO - Filling up the DagBag from > /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,648] {models.py:310} INFO - File > /usr/local/lib/python2.7/dist-packages/airflow/example_dags/__init__.py > assumed to contain no DAGs. Skipping. > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 32, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/cli.py", line 74, > in wrapper > return f(*args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 475, > in run > dag = get_dag(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 146, > in get_dag > 'parse.'.format(args.dag_id)) > airflow.exceptions.AirflowException: dag_id could not be found: tutorial1. > Either the dag did not exist or it failed to parse. > {code} > > The log shows that the worker cannot find the corresponding dag, so I think > the environment variable GIT_SYNC_ROOT should be consistent with > dag_volume_mount_path. 
> The worker's environment variable AIRFLOW__CORE__DAGS_FOLDER is invalid, and > AIRFLOW__CORE__EXECUTOR is also invalid > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2733) Airflow webserver crashes with refresh interval <= 0 in daemon mode
[ https://issues.apache.org/jira/browse/AIRFLOW-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601631#comment-16601631 ] Apache Spark commented on AIRFLOW-2733: --- User 'gwax' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3586 > Airflow webserver crashes with refresh interval <= 0 in daemon mode > --- > > Key: AIRFLOW-2733 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2733 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Major > > In airflow/bin/cli.py, calls to the monitor_gunicorn sub-function > ([https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L841)] > are made using a mix of psutil.Process objects and subprocess.Popen objects. > > Case 1: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L878] > Case 2: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L884] > > In the event worker_refresh_interval is <= 0 and we are in daemon mode, we end > up calling a non-existent `poll` function on a psutil.Process object: > https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L847 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
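Since `subprocess.Popen` exposes `poll()` while `psutil.Process` exposes `is_running()`, one defensive way to tolerate the mix is a duck-typed liveness check. This is a sketch of that idea, not the merged patch:

```python
import subprocess
import sys

def still_running(proc):
    """Return True if `proc` is alive, accepting either a subprocess.Popen
    (which has .poll()) or a psutil.Process (which has .is_running())."""
    if hasattr(proc, "poll"):            # subprocess.Popen
        return proc.poll() is None
    return proc.is_running()             # psutil.Process

p = subprocess.Popen([sys.executable, "-c", "pass"])
p.wait()
print(still_running(p))  # -> False: the child has exited
```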
[jira] [Assigned] (AIRFLOW-2616) Pluggable class-based views for APIs
[ https://issues.apache.org/jira/browse/AIRFLOW-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2616: - Assignee: (was: Holden Karau's magical unicorn) > Pluggable class-based views for APIs > > > Key: AIRFLOW-2616 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2616 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Reporter: Verdan Mahmood >Priority: Major > Labels: api_endpoints, architecture > > With the increase of API code base, the current architecture (functional > views) will become messy in no time. Same routes with different http methods > become more confusing in the code base. > We can either use Flask's Pluggable views, which are inspired by Django's > generic class-based views to make our API structure more modular, or we can > look for Flask-RESTful framework. > > http://flask.pocoo.org/docs/0.12/views/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
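The pattern the issue proposes, as in the linked Flask pluggable-views docs, is one class per route with one method per HTTP verb. A dependency-free stand-in for Flask's `MethodView` so the shape is runnable here; the `DagRunsView` resource and its payloads are hypothetical:

```python
class MethodViewSketch:
    """Minimal stand-in for Flask's MethodView: dispatch picks the
    handler method matching the HTTP verb's lowercase name."""

    def dispatch(self, method, *args, **kwargs):
        handler = getattr(self, method.lower(), None)
        if handler is None:
            return 405, "Method Not Allowed"
        return handler(*args, **kwargs)

class DagRunsView(MethodViewSketch):
    # Hypothetical resource: GET lists runs, POST triggers one.
    def get(self, dag_id):
        return 200, {"dag_id": dag_id, "runs": []}

    def post(self, dag_id):
        return 201, {"dag_id": dag_id, "triggered": True}

view = DagRunsView()
print(view.dispatch("GET", "tutorial"))     # -> (200, {'dag_id': 'tutorial', 'runs': []})
print(view.dispatch("DELETE", "tutorial"))  # -> (405, 'Method Not Allowed')
```

The win over functional views is that the same route stops branching on `request.method` inside one function; each verb gets its own method.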
[jira] [Assigned] (AIRFLOW-2711) zendesk hook doesn't handle search endpoint properly
[ https://issues.apache.org/jira/browse/AIRFLOW-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2711: - Assignee: Holden Karau's magical unicorn > zendesk hook doesn't handle search endpoint properly > > > Key: AIRFLOW-2711 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2711 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Chris Chow >Assignee: Holden Karau's magical unicorn >Priority: Major > > The zendesk hook assumes that the API's response includes the expected result > in the key with the same name as the API endpoint, e.g. that the response to a > query to /api/v2/users.json includes the key 'users'. /api/v2/search.json > actually includes results under the key 'results' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
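One way to express the fix is to special-case the search endpoint when choosing the payload key. An illustrative helper, not the actual hook code:

```python
def results_key(endpoint):
    """Pick the JSON key that holds the payload for a Zendesk endpoint.
    Most endpoints mirror their own name ('users.json' -> 'users'), but
    the search endpoint nests everything under 'results'."""
    name = endpoint.rsplit("/", 1)[-1].replace(".json", "")
    return "results" if name == "search" else name

print(results_key("/api/v2/users.json"))   # -> users
print(results_key("/api/v2/search.json"))  # -> results
```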
[jira] [Assigned] (AIRFLOW-2537) clearing tasks shouldn't set backfill DAG runs to `running`
[ https://issues.apache.org/jira/browse/AIRFLOW-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2537: - Assignee: (was: Holden Karau's magical unicorn) > clearing tasks shouldn't set backfill DAG runs to `running` > --- > > Key: AIRFLOW-2537 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2537 > Project: Apache Airflow > Issue Type: Bug >Reporter: Maxime Beauchemin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2701) Clean up dangling backfill dagrun
[ https://issues.apache.org/jira/browse/AIRFLOW-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2701: - Assignee: Tao Feng (was: Holden Karau's magical unicorn) > Clean up dangling backfill dagrun > - > > Key: AIRFLOW-2701 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2701 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > > When a user tries to backfill and hits Ctrl+C, the backfill dagrun will stay in the > running state. We should set it to failed if it has unfinished tasks. > > In our production, we see lots of these dangling backfill dagruns, each of which will > count as one active dagrun in the next backfill. This may prevent users from > backfilling if max_active_runs is reached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
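The cleanup rule described above can be sketched as follows. Runs are plain dicts standing in for Airflow's DagRun model, and the `backfill_` run_id prefix convention is used only for illustration:

```python
def reap_dangling_backfills(dagruns):
    """Mark interrupted backfill runs as failed so they stop counting
    toward max_active_runs."""
    for run in dagruns:
        if (run["run_id"].startswith("backfill_")
                and run["state"] == "running"
                and run["unfinished_tasks"] > 0):
            run["state"] = "failed"
    return dagruns

runs = [
    {"run_id": "backfill_2018-07-01", "state": "running", "unfinished_tasks": 3},
    {"run_id": "scheduled__2018-07-01", "state": "running", "unfinished_tasks": 2},
]
reap_dangling_backfills(runs)
print([r["state"] for r in runs])  # -> ['failed', 'running']
```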
[jira] [Assigned] (AIRFLOW-2632) AWSBatchOperator allow ints in overrides
[ https://issues.apache.org/jira/browse/AIRFLOW-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2632: - Assignee: Sebastian Schwartz (was: Holden Karau's magical unicorn) > AWSBatchOperator allow ints in overrides > > > Key: AIRFLOW-2632 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2632 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Sebastian Schwartz >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > {{The AWSBatchOperator takes an *overrides* dict as a templated parameter: > [https://airflow.readthedocs.io/en/latest/integration.html#aws]}} > > However, the templating does not support ints. This is an issue because in > *overrides,* the *vcpus* and *memory* parameters must be ints for the AWS > client to correctly submit the job. Removing templating on the *overrides* > field solves this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
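Jinja templating renders every templated field to a string, so `vcpus`/`memory` arrive as `"2"` instead of `2`. Besides the fix discussed above (dropping `overrides` from the templated fields), a workaround at the call site could cast the known numeric keys back; this helper is illustrative, not part of the operator:

```python
def restore_numeric_overrides(overrides):
    """Cast containerOverrides fields that AWS Batch requires as integers
    back from the strings Jinja templating produces."""
    fixed = dict(overrides)
    for key in ("vcpus", "memory"):
        if key in fixed:
            fixed[key] = int(fixed[key])
    return fixed

rendered = {"vcpus": "2", "memory": "1024", "command": ["echo", "hello"]}
print(restore_numeric_overrides(rendered))  # vcpus and memory are ints again
```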
[jira] [Assigned] (AIRFLOW-2632) AWSBatchOperator allow ints in overrides
[ https://issues.apache.org/jira/browse/AIRFLOW-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2632: - Assignee: Holden Karau's magical unicorn (was: Sebastian Schwartz) > AWSBatchOperator allow ints in overrides > > > Key: AIRFLOW-2632 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2632 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Holden Karau's magical unicorn >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > {{The AWSBatchOperator takes an *overrides* dict as a templated parameter: > [https://airflow.readthedocs.io/en/latest/integration.html#aws]}} > > However, the templating does not support ints. This is an issue because in > *overrides,* the *vcpus* and *memory* parameters must be ints for the AWS > client to correctly submit the job. Removing templating on the *overrides* > field solves this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2711) zendesk hook doesn't handle search endpoint properly
[ https://issues.apache.org/jira/browse/AIRFLOW-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2711: - Assignee: (was: Holden Karau's magical unicorn) > zendesk hook doesn't handle search endpoint properly > > > Key: AIRFLOW-2711 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2711 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Chris Chow >Priority: Major > > the zendesk hook assumes that the api's response includes the expected result > in the key with the same name as the api endpoint, e.g. that the results of a > query to /api/v2/users.json includes the key 'users'. /api/v2/search.json > actually includes results under the key 'results' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
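The endpoint-to-key mismatch described above can be sketched with a hypothetical helper (`extract_results` is illustrative and not part of the hook's real API): the payload key normally matches the endpoint name, with search as the exception.

```python
def extract_results(endpoint, response):
    """Pick the payload key from a Zendesk API JSON response.

    Hypothetical helper: most endpoints (e.g. /api/v2/users.json) nest
    their payload under a key named after the endpoint ('users'), but
    /api/v2/search.json nests it under 'results' instead.
    """
    key = "results" if endpoint == "search" else endpoint
    return response[key]
```

Assuming the endpoint name always maps to the key, as the hook currently does, a call against the search endpoint raises a KeyError.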
[jira] [Assigned] (AIRFLOW-2633) Retry loop on AWSBatchOperator won't quit
[ https://issues.apache.org/jira/browse/AIRFLOW-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2633: - Assignee: Sebastian Schwartz (was: Holden Karau's magical unicorn) > Retry loop on AWSBatchOperator won't quit > - > > Key: AIRFLOW-2633 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2633 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Sebastian Schwartz >Priority: Major > Labels: patch, pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > The exponential backoff retry loop that is a fallback for AWSBatchOperator as > a strategy for polling job success does not quit until the maximum number of retries is > reached, due to a control-flow error. This is a simple one-line fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2733) Airflow webserver crashes with refresh interval <= 0 in daemon mode
[ https://issues.apache.org/jira/browse/AIRFLOW-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2733: - Assignee: (was: Holden Karau's magical unicorn) > Airflow webserver crashes with refresh interval <= 0 in daemon mode > --- > > Key: AIRFLOW-2733 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2733 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Major > > In airflow/bin/cli.py, calls in the monitor_gunicorn sub-function > ([https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L841)] > are made using a mix of psutil.Process objects and subprocess.Popen objects. > > Case 1: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L878] > Case 2: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L884] > > In the event worker_refresh_interval is <=0 and we are in daemon mode, we end > up calling a non-existent `poll` function on a psutil.Process object: > https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L847 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
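The API mismatch can be sketched with a hypothetical shim (`still_running` below is illustrative; cli.py has no such helper): `subprocess.Popen` exposes `poll()`, which returns None while the child is alive, whereas `psutil.Process` exposes `is_running()` and has no `poll()` at all.

```python
import subprocess
import sys

def still_running(proc):
    """Hypothetical shim over the two process types mixed in cli.py.

    subprocess.Popen: poll() returns None while the child is running.
    psutil.Process: has is_running() but no poll(), so calling poll()
    on it raises AttributeError -- the crash described in the issue.
    """
    if hasattr(proc, "poll"):          # subprocess.Popen
        return proc.poll() is None
    return proc.is_running()           # psutil.Process

# A short-lived child demonstrates the Popen branch.
proc = subprocess.Popen([sys.executable, "-c", "pass"])
proc.wait()
```

Normalizing both objects behind one predicate like this (or using psutil consistently) would avoid the AttributeError regardless of refresh interval and daemon mode.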
[jira] [Assigned] (AIRFLOW-2701) Clean up dangling backfill dagrun
[ https://issues.apache.org/jira/browse/AIRFLOW-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2701: - Assignee: Holden Karau's magical unicorn (was: Tao Feng) > Clean up dangling backfill dagrun > - > > Key: AIRFLOW-2701 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2701 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tao Feng >Assignee: Holden Karau's magical unicorn >Priority: Major > > When a user tries to backfill and hits Ctrl+C, the backfill dagrun will stay in the > running state. We should set it to failed if it has unfinished tasks. > > In our production, we see lots of these dangling backfill dagruns, each of which > counts as one active dagrun in the next backfill. This may prevent users from > backfilling if max_active_runs is reached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2616) Pluggable class-based views for APIs
[ https://issues.apache.org/jira/browse/AIRFLOW-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2616: - Assignee: Holden Karau's magical unicorn > Pluggable class-based views for APIs > > > Key: AIRFLOW-2616 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2616 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Reporter: Verdan Mahmood >Assignee: Holden Karau's magical unicorn >Priority: Major > Labels: api_endpoints, architecture > > As the API code base grows, the current architecture (functional > views) will become messy in no time. The same routes with different HTTP methods > make the code base more confusing. > We can either use Flask's Pluggable Views, which are inspired by Django's > generic class-based views, to make our API structure more modular, or we can > look at the Flask-RESTful framework. > > http://flask.pocoo.org/docs/0.12/views/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2625) Create an API to list all the available DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2625: - Assignee: (was: Holden Karau's magical unicorn) > Create an API to list all the available DAGs > > > Key: AIRFLOW-2625 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2625 > Project: Apache Airflow > Issue Type: New Feature > Components: api, DAG >Reporter: Verdan Mahmood >Priority: Major > Labels: api, api-required > > There should be an API to list all the available DAGs in the system (this is > basically the same as the DAGs list page, aka the Airflow home page). > This should include all the basic information related to a DAG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1737) set_task_instance_state fails because of strptime
[ https://issues.apache.org/jira/browse/AIRFLOW-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1737: - Assignee: Tao Feng (was: Holden Karau's magical unicorn) > set_task_instance_state fails because of strptime > - > > Key: AIRFLOW-1737 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1737 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Reporter: Andre Boechat >Assignee: Tao Feng >Priority: Minor > Attachments: Screenshot_2017-10-18_15-58-29.png > > > Context: > * DAG run triggered manually > * Using the web application to change the state of a task > When trying to set the state of a task, an exception is thrown: *ValueError: > unconverted data remains: ..372649* (look at the attached screenshot). > I think the problem comes from the "execution date" created by manually > triggered DAGs, since the date-time includes a fractional part. In my > database, I see scheduled DAGs with execution dates like "10-18T15:00:00", > while manually triggered ones with dates like "09-21T16:36:16.170988". If we > look at the method *set_task_instance_state* in *airflow.www.views*, we see > that the format string used with *strptime* doesn't consider any fractional > part. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
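A fix along the lines the reporter suggests can be sketched as follows. `parse_execution_date` is a hypothetical helper, not the actual airflow.www.views code; the point is that `%f` consumes the fractional part that manually triggered runs carry, while the format without it still handles scheduled runs.

```python
from datetime import datetime

# Manually triggered DAG runs carry microseconds; scheduled runs do not.
FMT_WITH_FRACTION = "%Y-%m-%dT%H:%M:%S.%f"
FMT_PLAIN = "%Y-%m-%dT%H:%M:%S"

def parse_execution_date(value):
    """Hypothetical fix sketch: try the format with '%f' first, then
    fall back to the plain one, so both kinds of execution date parse."""
    for fmt in (FMT_WITH_FRACTION, FMT_PLAIN):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized execution date: %r" % value)
```

With only FMT_PLAIN, a value like "2017-09-21T16:36:16.170988" raises "ValueError: unconverted data remains: .170988", which is the error in the attached screenshot.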
[jira] [Assigned] (AIRFLOW-2885) A Bug in www_rbac.utils.get_params
[ https://issues.apache.org/jira/browse/AIRFLOW-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2885: - Assignee: Holden Karau's magical unicorn (was: Xiaodong DENG) > A Bug in www_rbac.utils.get_params > -- > > Key: AIRFLOW-2885 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2885 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Reporter: Xiaodong DENG >Assignee: Holden Karau's magical unicorn >Priority: Critical > > *get_params(page=0, search="abc",showPaused=False)* returns > "_search=abc&showPaused=False_", while it's supposed to return > "page=0&search=abc&showPaused=False". > This is because Python takes 0 as False when it's used in a conditional > statement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
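The falsy-zero pitfall can be reproduced with a simplified reconstruction. Both functions below are illustrative, not the actual www_rbac.utils.get_params; the assumption is that the buggy version guards the page parameter with a plain truthiness test.

```python
def get_params_buggy(page=None, **kwargs):
    """Simplified reconstruction of the bug: `if page:` drops page=0,
    because Python treats 0 as False in a boolean context."""
    params = {}
    if page:
        params["page"] = page
    params.update({k: v for k, v in kwargs.items() if v is not None})
    return "&".join("%s=%s" % kv for kv in params.items())

def get_params_fixed(page=None, **kwargs):
    """Testing against None keeps falsy-but-meaningful values like 0."""
    params = {}
    if page is not None:
        params["page"] = page
    params.update({k: v for k, v in kwargs.items() if v is not None})
    return "&".join("%s=%s" % kv for kv in params.items())
```

The buggy version reproduces the reported "search=abc&showPaused=False" output for page=0, and the `is not None` check restores "page=0&search=abc&showPaused=False".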
[jira] [Assigned] (AIRFLOW-2780) Adds IMAP Hook to interact with a mail server
[ https://issues.apache.org/jira/browse/AIRFLOW-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2780: - Assignee: Felix Uellendall (was: Holden Karau's magical unicorn) > Adds IMAP Hook to interact with a mail server > - > > Key: AIRFLOW-2780 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2780 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Felix Uellendall >Assignee: Felix Uellendall >Priority: Major > > This Hook connects to a mail server via IMAP to be able to retrieve email > attachments by using [Python's IMAP > Library.|https://docs.python.org/3.6/library/imaplib.html] > Features: > - `has_mail_attachment`: Can be used in a `Sensor` to check if there is an > attachment on the mail server with the given name. > - `retrieve_mail_attachments`: Can be used in an `Operator` to do something with > the attachments, returned as a list of tuples. > - `download_mail_attachments`: Can be used in an `Operator` to download the > attachments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2885) A Bug in www_rbac.utils.get_params
[ https://issues.apache.org/jira/browse/AIRFLOW-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2885: - Assignee: Xiaodong DENG (was: Holden Karau's magical unicorn) > A Bug in www_rbac.utils.get_params > -- > > Key: AIRFLOW-2885 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2885 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > *get_params(page=0, search="abc",showPaused=False)* returns > "_search=abc&showPaused=False_", while it's supposed to return > "page=0&search=abc&showPaused=False". > This is because Python takes 0 as False when it's used in a conditional > statement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2428: - Assignee: (was: Holden Karau's magical unicorn) > Add AutoScalingRole key to emr_hook > --- > > Key: AIRFLOW-2428 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2428 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Kyle Hamlin >Priority: Minor > Fix For: 1.10.0 > > > Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` > method for EMR autoscaling to work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2877) Make docs site URL consistent everywhere
[ https://issues.apache.org/jira/browse/AIRFLOW-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2877: - Assignee: Taylor Edmiston (was: Holden Karau's magical unicorn) > Make docs site URL consistent everywhere > > > Key: AIRFLOW-2877 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2877 > Project: Apache Airflow > Issue Type: Improvement > Components: docs >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > We currently have several references to multiple docs sites throughout the > repo (https://airflow.readthedocs.io/, https://airflow.apache.org/, > https://airflow.incubator.apache.org/, etc). > This PR makes the docs site URL consistent everywhere. > All references to the docs site now point to the latest stable version, with > the one exception being the top-level dev docs site on master in the readme. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2840) cli option to update existing connection
[ https://issues.apache.org/jira/browse/AIRFLOW-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2840: - Assignee: Holden Karau's magical unicorn (was: David) > cli option to update existing connection > > > Key: AIRFLOW-2840 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2840 > Project: Apache Airflow > Issue Type: Wish >Reporter: David >Assignee: Holden Karau's magical unicorn >Priority: Major > > Add cli options to update existing airflow connection > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2875) Env variables should have percent signs escaped before writing to tmp config
[ https://issues.apache.org/jira/browse/AIRFLOW-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2875: - Assignee: (was: Holden Karau's magical unicorn) > Env variables should have percent signs escaped before writing to tmp config > > > Key: AIRFLOW-2875 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2875 > Project: Apache Airflow > Issue Type: Bug > Components: configuration > Environment: Ubuntu > Airflow 1.10rc2 >Reporter: William Horton >Priority: Major > > I encountered this when I was using an environment variable for > `AIRFLOW__CELERY__BROKER_URL`. The airflow worker was able to run and > communicate with the SQS queue, but when it received a task and began to run > it, I encountered an error with this trace: > {code:java} > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring Traceback (most recent call last): > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File "/opt/airflow/venv/bin/airflow", line 32, in > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring args.func(args) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/utils/cli.py", > line 74, in wrapper > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring return f(*args, **kwargs) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", > line 460, in run > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring conf.set(section, option, value) > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/__init__.py", 
> line 1239, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring super(ConfigParser, self).set(section, option, value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/__init__.py", > line 914, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/__init__.py", > line 392, in before_set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring "position %d" % (value, tmp_value.find('%'))) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring ValueError: invalid interpolation syntax in > {code} > The issue was that the broker url had a percent sign, and when the cli called > `conf.set(section, option, value)`, it was throwing because it interpreted > the percent as an interpolation. > To avoid this issue, I would propose that the environment variables be > escaped when being written in `utils.configuration.tmp_configuration_copy`, > so that when `conf.set` is called in `bin/cli`, it doesn't throw on these > unescaped values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
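The proposed escaping can be sketched as below. `escape_percents` is a hypothetical helper, not the actual tmp_configuration_copy code, and the sketch uses the stdlib configparser rather than the backports package; the failure mode is the same because ConfigParser's BasicInterpolation rejects a bare '%' not followed by '%' or '('.

```python
import configparser

def escape_percents(value):
    """Hypothetical fix sketch: double '%' so ConfigParser's
    interpolation accepts values like percent-encoded broker URLs."""
    return value.replace("%", "%%")

conf = configparser.ConfigParser()
conf.add_section("celery")
broker_url = "sqs://AKIAXXXX%2Fsecret@"   # '%2F' is URL percent-encoding

# Setting the raw value raises ValueError: invalid interpolation syntax,
# the same error as in the trace above.
try:
    conf.set("celery", "broker_url", broker_url)
    raised = False
except ValueError:
    raised = True

# Escaping before conf.set makes the value round-trip cleanly:
# interpolation turns '%%' back into '%' on read.
conf.set("celery", "broker_url", escape_percents(broker_url))
```

The AWS access key in the URL is a placeholder; any value containing a '%' triggers the same ValueError.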
[jira] [Assigned] (AIRFLOW-2876) Bump version of Tenacity
[ https://issues.apache.org/jira/browse/AIRFLOW-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2876: - Assignee: Holden Karau's magical unicorn > Bump version of Tenacity > > > Key: AIRFLOW-2876 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2876 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Assignee: Holden Karau's magical unicorn >Priority: Major > > Since 4.8.0 is not Python 3.7 compatible, we want to bump the version to > 4.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2874) Enable Flask App Builder theme support
[ https://issues.apache.org/jira/browse/AIRFLOW-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2874: - Assignee: Verdan Mahmood (was: Holden Karau's magical unicorn) > Enable Flask App Builder theme support > -- > > Key: AIRFLOW-2874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2874 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Verdan Mahmood >Priority: Major > > To customize the look and feel of Apache Airflow (an effort towards making > Airflow a whitelabel application), we should enable support for FAB's > themes, which can be set in configuration. > The theme can be used in conjunction with the existing `navbar_color` configuration, or > can be used separately by simply unsetting the navbar_color config. > > http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1737) set_task_instance_state fails because of strptime
[ https://issues.apache.org/jira/browse/AIRFLOW-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1737: - Assignee: Holden Karau's magical unicorn (was: Tao Feng) > set_task_instance_state fails because of strptime > - > > Key: AIRFLOW-1737 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1737 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Reporter: Andre Boechat >Assignee: Holden Karau's magical unicorn >Priority: Minor > Attachments: Screenshot_2017-10-18_15-58-29.png > > > Context: > * DAG run triggered manually > * Using the web application to change the state of a task > When trying to set the state of a task, an exception is thrown: *ValueError: > unconverted data remains: ..372649* (look at the attached screenshot). > I think the problem comes from the "execution date" created by manually > triggered DAGs, since the date-time includes a fractional part. In my > database, I see scheduled DAGs with execution dates like "10-18T15:00:00", > while manually triggered ones with dates like "09-21T16:36:16.170988". If we > look at the method *set_task_instance_state* in *airflow.www.views*, we see > that the format string used with *strptime* doesn't consider any fractional > part. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2874) Enable Flask App Builder theme support
[ https://issues.apache.org/jira/browse/AIRFLOW-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2874: - Assignee: Holden Karau's magical unicorn (was: Verdan Mahmood) > Enable Flask App Builder theme support > -- > > Key: AIRFLOW-2874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2874 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Holden Karau's magical unicorn >Priority: Major > > To customize the look and feel of Apache Airflow (an effort towards making > Airflow a whitelabel application), we should enable support for FAB's > themes, which can be set in configuration. > The theme can be used in conjunction with the existing `navbar_color` configuration, or > can be used separately by simply unsetting the navbar_color config. > > http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2877) Make docs site URL consistent everywhere
[ https://issues.apache.org/jira/browse/AIRFLOW-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2877: - Assignee: Holden Karau's magical unicorn (was: Taylor Edmiston) > Make docs site URL consistent everywhere > > > Key: AIRFLOW-2877 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2877 > Project: Apache Airflow > Issue Type: Improvement > Components: docs >Reporter: Taylor Edmiston >Assignee: Holden Karau's magical unicorn >Priority: Minor > > We currently have several references to multiple docs sites throughout the > repo (https://airflow.readthedocs.io/, https://airflow.apache.org/, > https://airflow.incubator.apache.org/, etc). > This PR makes the docs site URL consistent everywhere. > All references to the docs site now point to the latest stable version, with > the one exception being the top-level dev docs site on master in the readme. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2840) cli option to update existing connection
[ https://issues.apache.org/jira/browse/AIRFLOW-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2840: - Assignee: David (was: Holden Karau's magical unicorn) > cli option to update existing connection > > > Key: AIRFLOW-2840 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2840 > Project: Apache Airflow > Issue Type: Wish >Reporter: David >Assignee: David >Priority: Major > > Add cli options to update existing airflow connection > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2780) Adds IMAP Hook to interact with a mail server
[ https://issues.apache.org/jira/browse/AIRFLOW-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2780: - Assignee: Holden Karau's magical unicorn (was: Felix Uellendall) > Adds IMAP Hook to interact with a mail server > - > > Key: AIRFLOW-2780 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2780 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Felix Uellendall >Assignee: Holden Karau's magical unicorn >Priority: Major > > This Hook connects to a mail server via IMAP to be able to retrieve email > attachments by using [Python's IMAP > Library.|https://docs.python.org/3.6/library/imaplib.html] > Features: > - `has_mail_attachment`: Can be used in a `Sensor` to check if there is an > attachment on the mail server with the given name. > - `retrieve_mail_attachments`: Can be used in an `Operator` to do something with > the attachments, returned as a list of tuples. > - `download_mail_attachments`: Can be used in an `Operator` to download the > attachments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1737) set_task_instance_state fails because of strptime
[ https://issues.apache.org/jira/browse/AIRFLOW-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601629#comment-16601629 ] Apache Spark commented on AIRFLOW-1737: --- User '7yl4r' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3754 > set_task_instance_state fails because of strptime > - > > Key: AIRFLOW-1737 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1737 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Reporter: Andre Boechat >Assignee: Tao Feng >Priority: Minor > Attachments: Screenshot_2017-10-18_15-58-29.png > > > Context: > * DAG run triggered manually > * Using the web application to change the state of a task > When trying to set the state of a task, an exception is thrown: *ValueError: > unconverted data remains: ..372649* (look at the attached screenshot). > I think the problem comes from the "execution date" created by manually > triggered DAGs, since the date-time includes a fractional part. In my > database, I see scheduled DAGs with execution dates like "10-18T15:00:00", > while manually triggered ones with dates like "09-21T16:36:16.170988". If we > look at the method *set_task_instance_state* in *airflow.www.views*, we see > that the format string used with *strptime* doesn't consider any fractional > part. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2428: - Assignee: Holden Karau's magical unicorn > Add AutoScalingRole key to emr_hook > --- > > Key: AIRFLOW-2428 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2428 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Kyle Hamlin >Assignee: Holden Karau's magical unicorn >Priority: Minor > Fix For: 1.10.0 > > > Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` > method for EMR autoscaling to work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2829) Brush up the CI script for minikube
[ https://issues.apache.org/jira/browse/AIRFLOW-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2829: - Assignee: Holden Karau's magical unicorn (was: Kengo Seki) > Brush up the CI script for minikube > --- > > Key: AIRFLOW-2829 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2829 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Reporter: Kengo Seki >Assignee: Holden Karau's magical unicorn >Priority: Major > > Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found > some points that can be improved: > - minikube version is hard-coded > - Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}} > - Undefined variables: {{$unameOut}} > - The following lines cause warnings if download is skipped: > {code} > 69 sudo mv bin/minikube /usr/local/bin/minikube > 70 sudo mv bin/kubectl /usr/local/bin/kubectl > {code} > - The {{return}}s at lines 81 and 96 won't work since they are outside of a function > - To run this script as a non-root user, {{-E}} is required for {{sudo}}. See > https://github.com/kubernetes/minikube/issues/1883. > {code} > 105 _MINIKUBE="sudo PATH=$PATH minikube" > 106 > 107 $_MINIKUBE config set bootstrapper localkube > 108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION} > --vm-driver=none > 109 $_MINIKUBE update-context > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2875) Env variables should have percent signs escaped before writing to tmp config
[ https://issues.apache.org/jira/browse/AIRFLOW-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2875: - Assignee: Holden Karau's magical unicorn > Env variables should have percent signs escaped before writing to tmp config > > > Key: AIRFLOW-2875 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2875 > Project: Apache Airflow > Issue Type: Bug > Components: configuration > Environment: Ubuntu > Airflow 1.10rc2 >Reporter: William Horton >Assignee: Holden Karau's magical unicorn >Priority: Major > > I encountered this when I was using an environment variable for > `AIRFLOW__CELERY__BROKER_URL`. The airflow worker was able to run and > communicate with the SQS queue, but when it received a task and began to run > it, I encountered an error with this trace: > {code:java} > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring Traceback (most recent call last): > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File "/opt/airflow/venv/bin/airflow", line 32, in > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring args.func(args) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/utils/cli.py", > line 74, in wrapper > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring return f(*args, **kwargs) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", > line 460, in run > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring conf.set(section, option, value) > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > 
"/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", > line 1239, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring super(ConfigParser, self).set(section, option, value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", > line 914, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", > line 392, in before_set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring "position %d" % (value, tmp_value.find('%'))) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring ValueError: invalid interpolation syntax in > {code} > The issue was that the broker url had a percent sign, and when the cli called > `conf.set(section, option, value)`, it was throwing because it interpreted > the percent as an interpolation. > To avoid this issue, I would propose that the environment variables be > escaped when being written in `utils.configuration.tmp_configuration_copy`, > so that when `conf.set` is called in `bin/cli`, it doesn't throw on these > unescaped values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2876) Bump version of Tenacity
[ https://issues.apache.org/jira/browse/AIRFLOW-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2876: - Assignee: (was: Holden Karau's magical unicorn) > Bump version of Tenacity > > > Key: AIRFLOW-2876 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2876 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > > Since 4.8.0 is not Python 3.7 compatible, we want to bump the version to > 4.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2829) Brush up the CI script for minikube
[ https://issues.apache.org/jira/browse/AIRFLOW-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2829: - Assignee: Kengo Seki (was: Holden Karau's magical unicorn) > Brush up the CI script for minikube > --- > > Key: AIRFLOW-2829 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2829 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Major > > Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found > some points that can be improved: > - minikube version is hard-coded > - Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}} > - Undefined variables: {{$unameOut}} > - The following lines cause warnings if download is skipped: > {code} > 69 sudo mv bin/minikube /usr/local/bin/minikube > 70 sudo mv bin/kubectl /usr/local/bin/kubectl > {code} > - {{return}} s at line 81 and 96 won't work since it's outside of a function > - To run this script as a non-root user, {{-E}} is required for {{sudo}}. See > https://github.com/kubernetes/minikube/issues/1883. > {code} > 105 _MINIKUBE="sudo PATH=$PATH minikube" > 106 > 107 $_MINIKUBE config set bootstrapper localkube > 108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION} > --vm-driver=none > 109 $_MINIKUBE update-context > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2916) Add argument `verify` for AwsHook() and S3 related sensors/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2916: - Assignee: Xiaodong DENG (was: Holden Karau's magical unicorn) > Add argument `verify` for AwsHook() and S3 related sensors/operators > > > Key: AIRFLOW-2916 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2916 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks, operators >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > The AwsHook() and S3-related operators/sensors are depending on package boto3. > In boto3, when we initiate a client or a resource, argument `verify` is > provided (https://boto3.readthedocs.io/en/latest/reference/core/session.html > ). > It is useful when > # users want to use a different CA cert bundle than the one used by botocore. > # users want to have '--no-verify-ssl'. This is especially useful when we're > using on-premises S3 or other implementations of object storage, like IBM's > Cloud Object Storage. > However, this feature is not provided in Airflow for S3 yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
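A minimal sketch of how such a `verify` argument could be threaded from a hook into boto3 (the class and method names below are illustrative, not Airflow's actual API):

```python
class AwsHookSketch:
    """Illustrative only: forwards `verify` to boto3's client factory."""

    def __init__(self, verify=None):
        # None  -> boto3 default (verify TLS against botocore's CA bundle)
        # str   -> path to an alternative CA cert bundle
        # False -> disable verification (the '--no-verify-ssl' case)
        self.verify = verify

    def get_client(self, session, service_name="s3"):
        # `session` is expected to behave like boto3.session.Session, whose
        # client() method accepts `verify` as a keyword argument.
        return session.client(service_name, verify=self.verify)
```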
[jira] [Commented] (AIRFLOW-1978) WinRM hook and operator
[ https://issues.apache.org/jira/browse/AIRFLOW-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601621#comment-16601621 ] Apache Spark commented on AIRFLOW-1978: --- User 'cloneluke' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3316 > WinRM hook and operator > --- > > Key: AIRFLOW-1978 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1978 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib >Affects Versions: 1.9.0 >Reporter: Luke Bodeen >Assignee: Luke Bodeen >Priority: Minor > Labels: features, windows > Fix For: 2.0.0 > > > I would like to connect and run windows job via winrm protocol. This could > then run any job on windows that you can run via command window in Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2315) S3Hook Extra Extras
[ https://issues.apache.org/jira/browse/AIRFLOW-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2315: - Assignee: Josh Bacon (was: Holden Karau's magical unicorn) > S3Hook Extra Extras > > > Key: AIRFLOW-2315 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2315 > Project: Apache Airflow > Issue Type: Improvement > Components: boto3 >Affects Versions: 1.9.0 >Reporter: Josh Bacon >Assignee: Josh Bacon >Priority: Minor > Labels: beginner, features, newbie, pull-request-available, > starter > Fix For: 1.10.0 > > > Feature improvement request to S3Hook to support additional JSON extra > arguments to apply to both upload and download ExtraArgs. > Allowed Upload Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS] > Allowed Download Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
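The shape of the requested change can be sketched as a pass-through with validation; the allowed-args list below is a short excerpt, and the authoritative lists live at the boto3 URLs above:

```python
# Excerpt only -- see boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS for the
# full list.
ALLOWED_UPLOAD_ARGS = ["ACL", "ContentType", "Metadata",
                       "ServerSideEncryption", "StorageClass"]

def validate_extra_args(extra_args, allowed):
    """Reject ExtraArgs keys boto3's transfer layer would not accept."""
    bad = sorted(k for k in extra_args if k not in allowed)
    if bad:
        raise ValueError("ExtraArgs not allowed: {}".format(bad))
    return extra_args

# The validated dict would then be handed straight to boto3, e.g.:
#   client.upload_file(filename, bucket, key, ExtraArgs=extra_args)
```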
[jira] [Assigned] (AIRFLOW-2920) Kubernetes pod operator: namespace is a hard requirement
[ https://issues.apache.org/jira/browse/AIRFLOW-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2920: - Assignee: (was: Holden Karau's magical unicorn) > Kubernetes pod operator: namespace is a hard requirement > > > Key: AIRFLOW-2920 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2920 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jon Davies >Priority: Major > > Hello, > I'm using the Kubernetes pod operator for my DAGs, I install Airflow to its > own namespace within my Kubernetes cluster (for example: "testing-airflow") > and I would like pods spun up by that Airflow instance to live in that > namespace. > However, I have to hardcode the namespace into my DAG definition code and so > I have to rebuild the Docker image for Airflow to be able to spin up a > "production-airflow" namespace as the namespace is a hard requirement in the > Python code - it'd be nice if the DAG could just default to its own namespace > if none is defined. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
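A hedged sketch of the requested fallback: in-cluster pods get their own namespace mounted by the service-account volume, so the operator could default to that when the DAG omits one (`resolve_namespace` is a hypothetical helper, not Airflow's API):

```python
import os

# Standard in-cluster mount point for a pod's own namespace.
_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

def resolve_namespace(explicit=None, default="default"):
    """Prefer the DAG's value, then the pod's own namespace, then a default."""
    if explicit:
        return explicit
    if os.path.exists(_NAMESPACE_FILE):
        with open(_NAMESPACE_FILE) as f:
            return f.read().strip()
    return default
```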
[jira] [Assigned] (AIRFLOW-1886) Failed jobs are not being counted towards max_active_runs_per_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1886: - Assignee: Holden Karau's magical unicorn (was: Oleg Yamin) > Failed jobs are not being counted towards max_active_runs_per_dag > - > > Key: AIRFLOW-1886 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1886 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.8.1 >Reporter: Oleg Yamin >Assignee: Holden Karau's magical unicorn >Priority: Major > > # Currently, I have setup max_active_runs_per_dag = 2 in airflow.cfg but when > a DAG aborts, it will keep submitting next DAG in the queue not counting the > current incomplete DAG that is already in the queue. I am using 1.8.1 but i > see that the jobs.py in latest version is still not addressing this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division
[ https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1158: - Assignee: Holden Karau's magical unicorn (was: Maksim Pecherskiy) > Multipart uploads to s3 cut off at nearest division > --- > > Key: AIRFLOW-1158 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1158 > Project: Apache Airflow > Issue Type: Bug > Components: aws, hooks >Reporter: Maksim Pecherskiy >Assignee: Holden Karau's magical unicorn >Priority: Minor > > When I try to upload a file of say 104MBs, using multipart uploads of 10MB > chunks, I get 10 chunks of 10MBs and that's it. The 4MBs left over do not > get uploaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
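The arithmetic of the reported symptom: computing the part count with floor division drops the trailing partial chunk, while ceiling division includes it:

```python
import math

size_mb, chunk_mb = 104, 10  # the file and chunk sizes from the report

floor_parts = size_mb // chunk_mb              # 10 parts -> the last 4 MB are lost
correct_parts = math.ceil(size_mb / chunk_mb)  # 11 parts -> final partial chunk included
```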
[jira] [Commented] (AIRFLOW-1952) Add the navigation bar color parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601623#comment-16601623 ] Apache Spark commented on AIRFLOW-1952: --- User 'Licht-T' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2903 > Add the navigation bar color parameter > -- > > Key: AIRFLOW-1952 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1952 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Licht Takeuchi >Assignee: Licht Takeuchi >Priority: Major > Fix For: 2.0.0 > > > We operate multiple Airflow instances (e.g. Production, Staging, etc.), so we > cannot tell which instance we are looking at. This feature lets us distinguish > the Airflow instances by the color of the navigation bar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2315) S3Hook Extra Extras
[ https://issues.apache.org/jira/browse/AIRFLOW-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2315: - Assignee: Holden Karau's magical unicorn (was: Josh Bacon) > S3Hook Extra Extras > > > Key: AIRFLOW-2315 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2315 > Project: Apache Airflow > Issue Type: Improvement > Components: boto3 >Affects Versions: 1.9.0 >Reporter: Josh Bacon >Assignee: Holden Karau's magical unicorn >Priority: Minor > Labels: beginner, features, newbie, pull-request-available, > starter > Fix For: 1.10.0 > > > Feature improvement request to S3Hook to support additional JSON extra > arguments to apply to both upload and download ExtraArgs. > Allowed Upload Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS] > Allowed Download Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2407) Undefined names in Python code
[ https://issues.apache.org/jira/browse/AIRFLOW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601624#comment-16601624 ] Apache Spark commented on AIRFLOW-2407: --- User 'cclauss' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3299 > Undefined names in Python code > -- > > Key: AIRFLOW-2407 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2407 > Project: Apache Airflow > Issue Type: Bug >Reporter: cclauss >Priority: Minor > Fix For: 2.0.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > flake8 testing of https://github.com/apache/incubator-airflow on Python 3.6.3 > $ *flake8 . --count --select=E901,E999,F821,F822,F823 --show-source > --statistics* > {noformat} > ./airflow/contrib/auth/backends/kerberos_auth.py:67:13: F821 undefined name > 'logging' > logging.error('Password validation for principal %s failed %s', > user_principal, e) > ^ > ./airflow/contrib/hooks/aws_hook.py:75:13: F821 undefined name 'logging' > logging.warning("Option Error in parsing s3 config file") > ^ > ./airflow/contrib/operators/datastore_export_operator.py:105:19: F821 > undefined name 'AirflowException' > raise AirflowException('Operation failed: > result={}'.format(result)) > ^ > ./airflow/contrib/operators/datastore_import_operator.py:94:19: F821 > undefined name 'AirflowException' > raise AirflowException('Operation failed: > result={}'.format(result)) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:62:9: F821 undefined name 'this' > this.log.info('Poking: %s', self.data) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:68:13: F821 undefined name > 'logging' > logging.exception(e) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:71:9: F821 undefined name 'this' > this.log.info('Status of this Poke: %s', status) > ^ > ./airflow/www/app.py:148:17: F821 undefined name 'reload' > reload(e) > ^ > ./tests/operators/hive_operator.py:178:27: F821 undefined name 'cursor_mock' > 
__enter__=cursor_mock, > ^ > ./tests/operators/hive_operator.py:184:27: F821 undefined name 'get_conn_mock' > __enter__=get_conn_mock, > ^ > ./tests/operators/test_virtualenv_operator.py:166:19: F821 undefined name > 'virtualenv_string_args' > print(virtualenv_string_args) > ^ > ./tests/operators/test_virtualenv_operator.py:167:16: F821 undefined name > 'virtualenv_string_args' > if virtualenv_string_args[0] != virtualenv_string_args[2]: >^ > ./tests/operators/test_virtualenv_operator.py:167:45: F821 undefined name > 'virtualenv_string_args' > if virtualenv_string_args[0] != virtualenv_string_args[2]: > ^ > 13F821 undefined name 'logging' > 13 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
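Most of the hits above follow the same pattern: a name such as `logging` is referenced only on an error path, so the missing import surfaces as a NameError at failure time instead of at import time. A minimal illustration (the helper name is made up):

```python
import logging  # the one-line fix that flake8's F821 check points at

def get_option(options, key):
    try:
        return options[key]
    except KeyError:
        # Without the import above, this line would raise NameError -- but
        # only when the except branch actually runs, which is why F821
        # catches bugs that tests of the happy path never hit.
        logging.warning("Option Error in parsing s3 config file")
        return None
```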
[jira] [Commented] (AIRFLOW-1812) Update Logging config example in Updating.md
[ https://issues.apache.org/jira/browse/AIRFLOW-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601614#comment-16601614 ] Apache Spark commented on AIRFLOW-1812: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2784 > Update Logging config example in Updating.md > > > Key: AIRFLOW-1812 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1812 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division
[ https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601627#comment-16601627 ] Apache Spark commented on AIRFLOW-1158: --- User 'stellah' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2337 > Multipart uploads to s3 cut off at nearest division > --- > > Key: AIRFLOW-1158 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1158 > Project: Apache Airflow > Issue Type: Bug > Components: aws, hooks >Reporter: Maksim Pecherskiy >Assignee: Maksim Pecherskiy >Priority: Minor > > When I try to upload a file of say 104MBs, using multipart uploads of 10MB > chunks, I get 10 chunks of 10MBs and that's it. The 4MBs left over do not > get uploaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2920) Kubernetes pod operator: namespace is a hard requirement
[ https://issues.apache.org/jira/browse/AIRFLOW-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2920: - Assignee: Holden Karau's magical unicorn > Kubernetes pod operator: namespace is a hard requirement > > > Key: AIRFLOW-2920 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2920 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jon Davies >Assignee: Holden Karau's magical unicorn >Priority: Major > > Hello, > I'm using the Kubernetes pod operator for my DAGs, I install Airflow to its > own namespace within my Kubernetes cluster (for example: "testing-airflow") > and I would like pods spun up by that Airflow instance to live in that > namespace. > However, I have to hardcode the namespace into my DAG definition code and so > I have to rebuild the Docker image for Airflow to be able to spin up a > "production-airflow" namespace as the namespace is a hard requirement in the > Python code - it'd be nice if the DAG could just default to its own namespace > if none is defined. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1488) Add a sensor operator to wait on DagRuns
[ https://issues.apache.org/jira/browse/AIRFLOW-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1488: - Assignee: Yati (was: Holden Karau's magical unicorn) > Add a sensor operator to wait on DagRuns > > > Key: AIRFLOW-1488 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1488 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, operators >Reporter: Yati >Assignee: Yati >Priority: Major > > The > [ExternalTaskSensor|https://airflow.incubator.apache.org/code.html#airflow.operators.ExternalTaskSensor] > operator already allows for encoding dependencies on tasks in external DAGs. > However, when you have teams, each owning multiple small-to-medium sized > DAGs, it is desirable to be able to wait on an external DagRun as a whole. > This allows the owners of an upstream DAG to refactor their code freely by > splitting/squashing task responsibilities, without worrying about dependent > DAGs breaking. > I'll now enumerate the easiest ways of achieving this that come to mind: > * Make all DAGs always have a join DummyOperator in the end, with a task id > that follows some convention, e.g., "{{ dag_id }}.__end__". > * Make ExternalTaskSensor poke for a DagRun instead of TaskInstances when the > external_task_id argument is None. > * Implement a separate DagRunSensor operator. > After considerations, we decided to implement a separate operator, which > we've been using in the team for our workflows, and I think it would make a > good addition to contrib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
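The third option can be sketched as below; `get_dag_run_state` is an injected lookup (hypothetical, not Airflow's API), and a real implementation would subclass BaseSensorOperator:

```python
class DagRunSensorSketch:
    """Poke until the external DagRun as a whole reaches 'success'."""

    def __init__(self, external_dag_id, execution_date, get_dag_run_state):
        self.external_dag_id = external_dag_id
        self.execution_date = execution_date
        self._get_state = get_dag_run_state  # (dag_id, execution_date) -> state

    def poke(self):
        # True once the whole upstream run is done, regardless of how its
        # owners split or squash tasks internally.
        return self._get_state(self.external_dag_id, self.execution_date) == "success"
```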
[jira] [Assigned] (AIRFLOW-2697) Drop snakebite in favour of hdfs3
[ https://issues.apache.org/jira/browse/AIRFLOW-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2697: - Assignee: Julian de Ruiter (was: Holden Karau's magical unicorn) > Drop snakebite in favour of hdfs3 > - > > Key: AIRFLOW-2697 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2697 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Julian de Ruiter >Assignee: Julian de Ruiter >Priority: Major > > The current HdfsHook relies on the snakebite library, which is unfortunately > not compatible with Python 3. To add Python 3 support for the HdfsHook > requires switching to a different library for interacting with HDFS. The > hdfs3 library is an attractive alternative, as it supports Python 3 and seems > to be stable and relatively well supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2374) Airflow fails to show logs
[ https://issues.apache.org/jira/browse/AIRFLOW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601628#comment-16601628 ] Apache Spark commented on AIRFLOW-2374: --- User 'berislavlopac' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3265 > Airflow fails to show logs > -- > > Key: AIRFLOW-2374 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2374 > Project: Apache Airflow > Issue Type: Bug >Reporter: Berislav Lopac >Assignee: Berislav Lopac >Priority: Blocker > > When viewing a log in the webserver, the page shows a loading gif and the log > never appears. Looking in the Javascript console, the problem appears to be > error 500 when loading the {{get_logs_with_metadata}} endpoint, givving the > following trace: > {code:java} > / ( () ) \___ > /( ( ( ) _)) ) )\ >(( ( )() ) ( ) ) > ((/ ( _( ) ( _) ) ( () ) ) > ( ( ( (_) ((( ) .((_ ) . )_ >( ( )( ( )) ) . ) ( ) > ( ( ( ( ) ( _ ( _) ). ) . ) ) ( ) > ( ( ( ) ( ) ( )) ) _)( ) ) ) > ( ( ( \ ) ((_ ( ) ( ) ) ) ) )) ( ) > ( ( ( ( (_ ( ) ( _) ) ( ) ) ) > ( ( ( ( ( ) (_ ) ) ) _) ) _( ( ) > (( ( )(( _) _) _(_ ( (_ ) >(_((__(_(__(( ( ( | ) ) ) )_))__))_)___) >((__)\\||lll|l||/// \_)) > ( /(/ ( ) ) )\ ) > (( ( ( | | ) ) )\ ) >( /(| / ( )) ) ) )) ) > ( ( _(|)_) ) > ( ||\(|(|)|/|| ) > (|(||(||)) > ( //|/l|||)|\\ \ ) > (/ / // /|//\\ \ \ \ _) > --- > Node: airflow-nods-dev > --- > Traceback (most recent call last): > File > "/opt/airflow/src/apache-airflow/airflow/utils/log/gcs_task_handler.py", line > 113, in _read > remote_log = self.gcs_read(remote_loc) > File > "/opt/airflow/src/apache-airflow/airflow/utils/log/gcs_task_handler.py", line > 131, in gcs_read > return self.hook.download(bkt, blob).decode() > File "/opt/airflow/src/apache-airflow/airflow/contrib/hooks/gcs_hook.py", > line 107, in download > .get_media(bucket=bucket, object=object) \ > File "/usr/local/lib/python3.6/dist-packages/oauth2client/_helpers.py", > line 133, in 
positional_wrapper > return wrapped(*args, **kwargs) > File "/usr/local/lib/python3.6/dist-packages/googleapiclient/http.py", line > 841, in execute > raise HttpError(resp, content, uri=self.uri) > googleapiclient.errors.HttpError: https://www.googleapis.com/storage/v1/b/bucket-af/o/test-logs%2Fgeneric_transfer_single%2Ftransfer_file%2F2018-04-25T13%3A00%3A51.250983%2B00%3A00%2F1.log?alt=media > returned "Not Found"> > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1982, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1614, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1517, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 33, in > reraise > raise value > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1612, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1598, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python3.6/dist-packages/flask_admin/base.py", line 69, > in inner > return self._run_view(f, *args, **kwargs) > File "/usr/local/lib/python3.6/dist-packages/flask_admin/base.py", line > 368, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python3.6/dist-packages/flask_login.py", line 758, in > decorated_view > return func(*args, **kwargs) > File "/opt/airflow/src/apache-airflow/airflow/www/utils.py", line 269, in > wrapper > return f(*args, **kwargs) > File "/opt/airfl
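The 500 comes from letting the handler's HttpError propagate out of the view; the usual shape of such a fix is to degrade a missing remote log into a readable message (`gcs_read` below stands in for GCSTaskHandler.gcs_read, and the helper is illustrative):

```python
def read_remote_log(gcs_read, remote_loc):
    """Return the remote log body, or a readable message instead of a 500."""
    try:
        return gcs_read(remote_loc)
    except Exception as e:
        # A 404 from GCS (or any read failure) becomes part of the log view,
        # not an unhandled exception in the webserver.
        return "*** Unable to read remote log at {}: {}".format(remote_loc, e)
```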
[jira] [Assigned] (AIRFLOW-2928) Use uuid.uuid4 to create unique job name
[ https://issues.apache.org/jira/browse/AIRFLOW-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2928: - Assignee: (was: Holden Karau's magical unicorn) > Use uuid.uuid4 to create unique job name > > > Key: AIRFLOW-2928 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2928 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Ken Kawamoto >Priority: Minor > > some components in Airflow use the first 8 bytes of _uuid.uuid1_ to generate > a unique job name. The first 8 bytes, however, seem to come from clock. so if > this is called multiple times in a short time period, two ids will likely > collide. > _uuid.uuid4_ provides random values. > {code} > Python 2.7.15 (default, Jun 17 2018, 12:46:58) > [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import uuid > >>> for i in range(10): > ... uuid.uuid1() > ... > UUID('e8bc9959-a586-11e8-ab8c-8c859010d0c2') > UUID('e8c254e3-a586-11e8-ac39-8c859010d0c2') > UUID('e8c2560f-a586-11e8-8251-8c859010d0c2') > UUID('e8c256c2-a586-11e8-994a-8c859010d0c2') > UUID('e8c25759-a586-11e8-9ba6-8c859010d0c2') > UUID('e8c257e6-a586-11e8-a854-8c859010d0c2') > UUID('e8c2587d-a586-11e8-89e9-8c859010d0c2') > UUID('e8c2590a-a586-11e8-a825-8c859010d0c2') > UUID('e8c25994-a586-11e8-9421-8c859010d0c2') > UUID('e8c25a21-a586-11e8-83fd-8c859010d0c2') > >>> for i in range(10): > ... uuid.uuid4() > ... 
> UUID('f1eba69f-18ea-467e-a414-b18d67f34a51') > UUID('aaa4e18e-d4e6-42c9-905c-3cde714c2741') > UUID('82f55c27-69ae-474b-ab9a-afcc7891587c') > UUID('fab63643-ad33-4307-837b-68444fce7240') > UUID('c4efca6c-3d1b-436c-8b09-e9b7f55ccefb') > UUID('58de3a76-9d98-4427-8232-d6d7df2a1904') > UUID('4f0a55e8-1357-4697-a345-e60891685b00') > UUID('0fed47a3-07b6-423e-ae2e-d821c440cb63') > UUID('144b2c55-a9bd-431d-b536-239fb2048a5e') > UUID('d47fd8a0-48e9-4dcc-87f7-42c022c309a8') > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
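The practical difference for job naming: uuid1's leading field is the clock-derived `time_low`, so a truncated prefix can recur whenever the clock value does (e.g. across processes or restarts), while every uuid4 byte comes from a random source:

```python
import uuid

# Truncated job names as described above; the 'job-' prefix is illustrative.
job_name_v1 = "job-" + str(uuid.uuid1())[:8]  # first 8 hex chars == time_low (clock)
job_name_v4 = "job-" + str(uuid.uuid4())[:8]  # first 8 hex chars are random
```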
[jira] [Assigned] (AIRFLOW-1886) Failed jobs are not being counted towards max_active_runs_per_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1886: - Assignee: Oleg Yamin (was: Holden Karau's magical unicorn) > Failed jobs are not being counted towards max_active_runs_per_dag > - > > Key: AIRFLOW-1886 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1886 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.8.1 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > > # Currently, I have setup max_active_runs_per_dag = 2 in airflow.cfg but when > a DAG aborts, it will keep submitting next DAG in the queue not counting the > current incomplete DAG that is already in the queue. I am using 1.8.1 but i > see that the jobs.py in latest version is still not addressing this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2899) Sensitive data exposed when Exporting Variables
[ https://issues.apache.org/jira/browse/AIRFLOW-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2899: - Assignee: Kaxil Naik (was: Holden Karau's magical unicorn) > Sensitive data exposed when Exporting Variables > --- > > Key: AIRFLOW-2899 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2899 > Project: Apache Airflow > Issue Type: Task > Components: security >Affects Versions: 1.9.0, 1.8.2, 1.10.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Major > Fix For: 2.0.0 > > Attachments: image-2018-08-14-15-39-17-680.png > > > Currently, the sensitive variable is hidden from being exposed in the Web UI. > However, if the UI is compromised, someone can export variables where all the > sensitive variables are exported in plain text format. > !image-2018-08-14-15-39-17-680.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
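A sketch of the usual remediation: mask values whose keys look sensitive before serializing the export, mirroring what the UI already does when displaying variables (the marker list and function are illustrative, not Airflow's implementation):

```python
SENSITIVE_MARKERS = ("password", "secret", "passwd", "api_key", "apikey")

def mask_variables(variables):
    """Replace sensitive-looking values with '***' before export."""
    masked = {}
    for key, value in variables.items():
        if any(m in key.lower() for m in SENSITIVE_MARKERS):
            masked[key] = "***"
        else:
            masked[key] = value
    return masked
```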
[jira] [Commented] (AIRFLOW-2394) Kubernetes operator should not require cmd and arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601617#comment-16601617 ] Apache Spark commented on AIRFLOW-2394: --- User 'ese' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3289 > Kubernetes operator should not require cmd and arguments > > > Key: AIRFLOW-2394 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2394 > Project: Apache Airflow > Issue Type: Bug >Reporter: Sergio B >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > KubernetesOperator should not require and rely on docker entrypoint for cmds > and docker command for arguments. > If you do not define them in the container spec, kubernetes rely on the > docker entrypoint and command. > https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#container-v1-core > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
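The shape of the fix: only emit `command`/`args` on the container spec when the operator was actually given them, so Kubernetes falls back to the image's ENTRYPOINT and CMD (the dict mirrors the v1 Container fields; the helper itself is hypothetical):

```python
def build_container_spec(image, cmds=None, arguments=None):
    """Omit command/args when unset so the image's ENTRYPOINT/CMD apply."""
    spec = {"name": "base", "image": image}
    if cmds:
        spec["command"] = cmds       # overrides the image ENTRYPOINT
    if arguments:
        spec["args"] = arguments     # overrides the image CMD
    return spec
```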
[jira] [Assigned] (AIRFLOW-2697) Drop snakebite in favour of hdfs3
[ https://issues.apache.org/jira/browse/AIRFLOW-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2697: - Assignee: Holden Karau's magical unicorn (was: Julian de Ruiter) > Drop snakebite in favour of hdfs3 > - > > Key: AIRFLOW-2697 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2697 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Julian de Ruiter >Assignee: Holden Karau's magical unicorn >Priority: Major > > The current HdfsHook relies on the snakebite library, which is unfortunately > not compatible with Python 3. To add Python 3 support for the HdfsHook > requires switching to a different library for interacting with HDFS. The > hdfs3 library is an attractive alternative, as it supports Python 3 and seems > to be stable and relatively well supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2404) Message for why a DAG run has not been scheduled missing information
[ https://issues.apache.org/jira/browse/AIRFLOW-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601626#comment-16601626 ] Apache Spark commented on AIRFLOW-2404: --- User 'AetherUnbound' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3286 > Message for why a DAG run has not been scheduled missing information > > > Key: AIRFLOW-2404 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2404 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Matthew Bowden >Assignee: Matthew Bowden >Priority: Major > Fix For: 2.0.0 > > > The webserver lists the following reasons for why a DAG run/task instance > might not be started: > * The scheduler is down or under heavy load > * This task instance already ran and had its state changed manually (e.g. > cleared in the UI) > Another reason that the task might not have been started is because of the > following: > * The {{parallelism}} configuration value may be too low > * The {{dag_concurrency}} configuration value may be too low > * The {{max_active_dag_runs_per_dag}} configuration value may be too low > * The {{non_pooled_task_slot_count}} configuration value may be too low -- This message was sent by Atlassian JIRA (v7.6.3#76005)
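For reference, the settings named above live in `airflow.cfg` under `[core]` (the active-runs key appears there as `max_active_runs_per_dag`); the values below are illustrative, not recommendations:

```ini
[core]
# Total task instances allowed to run concurrently across the installation.
parallelism = 32
# Task instances allowed to run concurrently within a single DAG.
dag_concurrency = 16
# Active DagRuns allowed per DAG.
max_active_runs_per_dag = 16
# Slots for tasks that are not assigned to any pool.
non_pooled_task_slot_count = 128
```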
[jira] [Commented] (AIRFLOW-1488) Add a sensor operator to wait on DagRuns
[ https://issues.apache.org/jira/browse/AIRFLOW-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601625#comment-16601625 ] Apache Spark commented on AIRFLOW-1488: --- User 'milanvdm' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3234 > Add a sensor operator to wait on DagRuns > > > Key: AIRFLOW-1488 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1488 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, operators >Reporter: Yati >Assignee: Yati >Priority: Major > > The > [ExternalTaskSensor|https://airflow.incubator.apache.org/code.html#airflow.operators.ExternalTaskSensor] > operator already allows for encoding dependencies on tasks in external DAGs. > However, when you have teams, each owning multiple small-to-medium sized > DAGs, it is desirable to be able to wait on an external DagRun as a whole. > This allows the owners of an upstream DAG to refactor their code freely by > splitting/squashing task responsibilities, without worrying about dependent > DAGs breaking. > I'll now enumerate the easiest ways of achieving this that come to mind: > * Make all DAGs always have a join DummyOperator in the end, with a task id > that follows some convention, e.g., "{{ dag_id }}.__end__". > * Make ExternalTaskSensor poke for a DagRun instead of TaskInstances when the > external_task_id argument is None. > * Implement a separate DagRunSensor operator. > After considerations, we decided to implement a separate operator, which > we've been using in the team for our workflows, and I think it would make a > good addition to contrib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2916) Add argument `verify` for AwsHook() and S3 related sensors/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2916: - Assignee: Holden Karau's magical unicorn (was: Xiaodong DENG) > Add argument `verify` for AwsHook() and S3 related sensors/operators > > > Key: AIRFLOW-2916 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2916 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks, operators >Reporter: Xiaodong DENG >Assignee: Holden Karau's magical unicorn >Priority: Minor > > The AwsHook() and S3-related operators/sensors are depending on package boto3. > In boto3, when we initiate a client or a resource, argument `verify` is > provided (https://boto3.readthedocs.io/en/latest/reference/core/session.html > ). > It is useful when > # users want to use a different CA cert bundle than the one used by botocore. > # users want to have '--no-verify-ssl'. This is especially useful when we're > using on-premises S3 or other implementations of object storage, like IBM's > Cloud Object Storage. > However, this feature is not provided in Airflow for S3 yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-855) Security - Airflow SQLAlchemy PickleType Allows for Code Execution
[ https://issues.apache.org/jira/browse/AIRFLOW-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601620#comment-16601620 ] Apache Spark commented on AIRFLOW-855: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2132 > Security - Airflow SQLAlchemy PickleType Allows for Code Execution > -- > > Key: AIRFLOW-855 > URL: https://issues.apache.org/jira/browse/AIRFLOW-855 > Project: Apache Airflow > Issue Type: Bug >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Attachments: test_dag.txt > > > Impact: Anyone able to modify the application's underlying database, or a > computer where certain DAG tasks are executed, may execute arbitrary code on > the Airflow host. > Location: The XCom class in /airflow-internal-master/airflow/models.py > Description: Airflow uses the SQLAlchemy object-relational mapping (ORM) to > allow for a database-agnostic, object-oriented manipulation of application > data. You express database tables and values as Python classes (in this > application's case), and the ORM transparently manipulates the underlying > database when you programmatically access these structures. > Airflow defines the following class, defining the XCom ORM model: > {code} > class XCom(Base): > """ > Base class for XCom objects. > """ > __tablename__ = "xcom" > id = Column(Integer, primary_key=True) > key = Column(String(512)) > value = Column(PickleType(pickler=dill)) > timestamp = Column( > DateTime, default=func.now(), nullable=False) > execution_date = Column(DateTime, nullable=False) > {code} > XComs are used for inter-task communication, and their values are either > defined in a DAG, or the return value of the python_callable() function or > the task's execute() method, executed on a remote host. 
XCom values are, > according to this model, of the PickleType, meaning that objects assigned to > the value column are transparently serialized (when being written to) and > deserialized (when being read from). The deserialization of user-controlled > pickle objects allows for the execution of arbitrary code. This means that > "slaves" (where DAG code is executed) can compromise "masters" (where DAGs > are defined in code) by returning an object that, when serialized (and > subsequently deserialized), causes remote code execution. This can also be > triggered by anyone who has write access to this portion of the database. > Note: NCC Group plans to meet with developers in the coming days to discuss > this finding, and it will be updated to reflect any additional insight > provided by this meeting. > Reproduction Steps: > 1. Configure a local instance of Airflow. > 2. Insert the attached DAG into your AIRFLOW_HOME/dags directory. > This example models a slave returning a malicious object to a task's > python_callable by creating a portable object (with {{__reduce__}}) containing a > reverse shell and pushing it as an XCom's value. This value is serialized > upon xcom_push and deserialized upon xcom_pull. > In an actual exploit scenario, this value would be the DAG function's return > value, as assigned by code within the function, executing on a malicious > remote machine. > 3. Start a netcat listener on your machine's port > 4. Execute this task from the command line with airflow run push 2016-11-17. > Note that your netcat listener has received a shell connect-back. > Remediation: Consider the use of a custom SQLAlchemy data type that performs > this transparent serialization and deserialization, but with JSON (a > text-based exchange format), rather than pickles (which may contain code). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
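The remediation suggested above (a JSON-backed column instead of PickleType) can be sketched with plain json helpers; wrapping them in a SQLAlchemy TypeDecorator is omitted here, and the function names are illustrative, not Airflow's:

```python
import json

def serialize_xcom_value(value):
    # json.dumps only accepts plain data (dict/list/str/int/float/bool/None),
    # so an attacker-supplied object graph cannot smuggle code the way a
    # pickle payload can.
    return json.dumps(value)

def deserialize_xcom_value(blob):
    # Parsing JSON never executes attacker-controlled code, unlike
    # pickle.loads / dill.loads, which run __reduce__ callables.
    return json.loads(blob)

roundtrip = deserialize_xcom_value(serialize_xcom_value({"key": "value", "n": 1}))
```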
[jira] [Assigned] (AIRFLOW-1488) Add a sensor operator to wait on DagRuns
[ https://issues.apache.org/jira/browse/AIRFLOW-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1488: - Assignee: Holden Karau's magical unicorn (was: Yati) > Add a sensor operator to wait on DagRuns > > > Key: AIRFLOW-1488 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1488 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, operators >Reporter: Yati >Assignee: Holden Karau's magical unicorn >Priority: Major > > The > [ExternalTaskSensor|https://airflow.incubator.apache.org/code.html#airflow.operators.ExternalTaskSensor] > operator already allows for encoding dependencies on tasks in external DAGs. > However, when you have teams, each owning multiple small-to-medium sized > DAGs, it is desirable to be able to wait on an external DagRun as a whole. > This allows the owners of an upstream DAG to refactor their code freely by > splitting/squashing task responsibilities, without worrying about dependent > DAGs breaking. > I'll now enumerate the easiest ways of achieving this that come to mind: > * Make all DAGs always have a join DummyOperator at the end, with a task id > that follows some convention, e.g., "{{ dag_id }}.__end__". > * Make ExternalTaskSensor poke for a DagRun instead of TaskInstances when the > external_task_id argument is None. > * Implement a separate DagRunSensor operator. > After consideration, we decided to implement a separate operator, which > we've been using in the team for our workflows, and I think it would make a > good addition to contrib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2315) S3Hook Extra Extras
[ https://issues.apache.org/jira/browse/AIRFLOW-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601618#comment-16601618 ] Apache Spark commented on AIRFLOW-2315: --- User 'jbacon' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3475 > S3Hook Extra Extras > > > Key: AIRFLOW-2315 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2315 > Project: Apache Airflow > Issue Type: Improvement > Components: boto3 >Affects Versions: 1.9.0 >Reporter: Josh Bacon >Assignee: Josh Bacon >Priority: Minor > Labels: beginner, features, newbie, pull-request-available, > starter > Fix For: 1.10.0 > > > Feature improvement request to S3Hook to support additional JSON extra > arguments to apply to both upload and download ExtraArgs. > Allowed Upload Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS] > Allowed Download Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2961) Speed up test_backfill_examples test
[ https://issues.apache.org/jira/browse/AIRFLOW-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2961: - Assignee: (was: Holden Karau's magical unicorn) > Speed up test_backfill_examples test > > > Key: AIRFLOW-2961 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2961 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2899) Sensitive data exposed when Exporting Variables
[ https://issues.apache.org/jira/browse/AIRFLOW-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2899: - Assignee: Holden Karau's magical unicorn (was: Kaxil Naik) > Sensitive data exposed when Exporting Variables > --- > > Key: AIRFLOW-2899 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2899 > Project: Apache Airflow > Issue Type: Task > Components: security >Affects Versions: 1.9.0, 1.8.2, 1.10.0 >Reporter: Kaxil Naik >Assignee: Holden Karau's magical unicorn >Priority: Major > Fix For: 2.0.0 > > Attachments: image-2018-08-14-15-39-17-680.png > > > Currently, sensitive variables are hidden in the Web UI. However, if the UI > is compromised, someone can export the variables, and all the sensitive > values are exported in plain text. > !image-2018-08-14-15-39-17-680.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2928) Use uuid.uuid4 to create unique job name
[ https://issues.apache.org/jira/browse/AIRFLOW-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2928: - Assignee: Holden Karau's magical unicorn > Use uuid.uuid4 to create unique job name > > > Key: AIRFLOW-2928 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2928 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Ken Kawamoto >Assignee: Holden Karau's magical unicorn >Priority: Minor > > Some components in Airflow use the first 8 bytes of _uuid.uuid1_ to generate > a unique job name. The first 8 bytes, however, seem to come from the clock, so if > this is called multiple times in a short time period, two ids will likely > collide. > _uuid.uuid4_ provides random values. > {code} > Python 2.7.15 (default, Jun 17 2018, 12:46:58) > [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import uuid > >>> for i in range(10): > ... uuid.uuid1() > ... > UUID('e8bc9959-a586-11e8-ab8c-8c859010d0c2') > UUID('e8c254e3-a586-11e8-ac39-8c859010d0c2') > UUID('e8c2560f-a586-11e8-8251-8c859010d0c2') > UUID('e8c256c2-a586-11e8-994a-8c859010d0c2') > UUID('e8c25759-a586-11e8-9ba6-8c859010d0c2') > UUID('e8c257e6-a586-11e8-a854-8c859010d0c2') > UUID('e8c2587d-a586-11e8-89e9-8c859010d0c2') > UUID('e8c2590a-a586-11e8-a825-8c859010d0c2') > UUID('e8c25994-a586-11e8-9421-8c859010d0c2') > UUID('e8c25a21-a586-11e8-83fd-8c859010d0c2') > >>> for i in range(10): > ... uuid.uuid4() > ... 
> UUID('f1eba69f-18ea-467e-a414-b18d67f34a51') > UUID('aaa4e18e-d4e6-42c9-905c-3cde714c2741') > UUID('82f55c27-69ae-474b-ab9a-afcc7891587c') > UUID('fab63643-ad33-4307-837b-68444fce7240') > UUID('c4efca6c-3d1b-436c-8b09-e9b7f55ccefb') > UUID('58de3a76-9d98-4427-8232-d6d7df2a1904') > UUID('4f0a55e8-1357-4697-a345-e60891685b00') > UUID('0fed47a3-07b6-423e-ae2e-d821c440cb63') > UUID('144b2c55-a9bd-431d-b536-239fb2048a5e') > UUID('d47fd8a0-48e9-4dcc-87f7-42c022c309a8') > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
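The listing above shows the problem directly: the first 8 hex characters of every uuid1 are nearly identical because they hold the low bits of the timestamp, while uuid4's are random. A sketch of a uuid4-based name generator; the `job-` prefix is illustrative, not Airflow's actual naming scheme:

```python
import uuid

def unique_job_name(prefix="job"):
    # uuid4 draws from os.urandom, so even names generated within the
    # same clock tick are overwhelmingly unlikely to collide, unlike the
    # timestamp-derived prefix of uuid1.
    return "{}-{}".format(prefix, str(uuid.uuid4())[:8])

names = [unique_job_name() for _ in range(10)]
```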
[jira] [Assigned] (AIRFLOW-2961) Speed up test_backfill_examples test
[ https://issues.apache.org/jira/browse/AIRFLOW-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2961: - Assignee: Holden Karau's magical unicorn > Speed up test_backfill_examples test > > > Key: AIRFLOW-2961 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2961 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Assignee: Holden Karau's magical unicorn >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division
[ https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1158: - Assignee: Maksim Pecherskiy (was: Holden Karau's magical unicorn) > Multipart uploads to s3 cut off at nearest division > --- > > Key: AIRFLOW-1158 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1158 > Project: Apache Airflow > Issue Type: Bug > Components: aws, hooks >Reporter: Maksim Pecherskiy >Assignee: Maksim Pecherskiy >Priority: Minor > > When I try to upload a file of, say, 104 MB, using multipart uploads of 10 MB > chunks, I get 10 chunks of 10 MB and that's it. The 4 MB left over does not > get uploaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
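The bug described is the classic floor-division off-by-one: the chunk count must be rounded up so the final partial chunk is included. A sketch of the correct split (pure arithmetic, not the actual hook code):

```python
import math

def chunk_ranges(total_size, chunk_size):
    """Yield (offset, length) for each multipart chunk, including the
    final partial chunk that plain floor division would drop."""
    n_chunks = math.ceil(total_size / chunk_size)  # 104 MB / 10 MB -> 11, not 10
    for i in range(n_chunks):
        offset = i * chunk_size
        yield offset, min(chunk_size, total_size - offset)

MB = 1024 * 1024
parts = list(chunk_ranges(104 * MB, 10 * MB))  # 10 full chunks + one 4 MB chunk
```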
[jira] [Commented] (AIRFLOW-2697) Drop snakebite in favour of hdfs3
[ https://issues.apache.org/jira/browse/AIRFLOW-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601619#comment-16601619 ] Apache Spark commented on AIRFLOW-2697: --- User 'jrderuiter' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3560 > Drop snakebite in favour of hdfs3 > - > > Key: AIRFLOW-2697 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2697 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Julian de Ruiter >Assignee: Julian de Ruiter >Priority: Major > > The current HdfsHook relies on the snakebite library, which is unfortunately > not compatible with Python 3. To add Python 3 support for the HdfsHook > requires switching to a different library for interacting with HDFS. The > hdfs3 library is an attractive alternative, as it supports Python 3 and seems > to be stable and relatively well supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2574) initdb fails when mysql password contains percent sign
[ https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2574: - Assignee: Holden Karau's magical unicorn > initdb fails when mysql password contains percent sign > -- > > Key: AIRFLOW-2574 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2574 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Zihao Zhang >Assignee: Holden Karau's magical unicorn >Priority: Minor > > [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345] > uses > [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option] > which says "A raw percent sign not part of an interpolation symbol must > therefore be escaped" > When there is a percent sign in database connection string, this will crash > due to bad interpolation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
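Since alembic's {{set_main_option}} passes the value through ConfigParser interpolation, a literal percent sign in the connection string has to be doubled before it is handed over. A minimal sketch of the escaping (the fix eventually applied in Airflow may differ):

```python
def escape_for_alembic(conn_string):
    # ConfigParser treats '%' as the start of an interpolation expression;
    # '%%' is the documented escape for a literal percent sign.
    return conn_string.replace("%", "%%")

# config.set_main_option('sqlalchemy.url', escape_for_alembic(conn_string))
# would then no longer crash on passwords containing '%'.
escaped = escape_for_alembic("mysql://user:pa%ss@localhost/airflow")
```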
[jira] [Commented] (AIRFLOW-1886) Failed jobs are not being counted towards max_active_runs_per_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601615#comment-16601615 ] Apache Spark commented on AIRFLOW-1886: --- User 'oyamin' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2846 > Failed jobs are not being counted towards max_active_runs_per_dag > - > > Key: AIRFLOW-1886 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1886 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.8.1 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > > # Currently, I have set max_active_runs_per_dag = 2 in airflow.cfg, but when > a DAG aborts, the scheduler keeps submitting the next DAG in the queue without > counting the current incomplete DAG that is already in it. I am using 1.8.1, but I > see that jobs.py in the latest version still does not address this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2408) Remove coveralls deps
[ https://issues.apache.org/jira/browse/AIRFLOW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601616#comment-16601616 ] Apache Spark commented on AIRFLOW-2408: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3295 > Remove coveralls deps > - > > Key: AIRFLOW-2408 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2408 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2222) GoogleCloudStorageHook.copy fails for large files between locations
[ https://issues.apache.org/jira/browse/AIRFLOW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601622#comment-16601622 ] Apache Spark commented on AIRFLOW-2222: --- User 'berislavlopac' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3264 > GoogleCloudStorageHook.copy fails for large files between locations > --- > > Key: AIRFLOW-2222 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2222 > Project: Apache Airflow > Issue Type: Bug >Reporter: Berislav Lopac >Assignee: Berislav Lopac >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > When copying large files (confirmed for around 3GB) between buckets in > different projects, the operation fails and the Google API returns error > [413—Payload Too > Large|https://cloud.google.com/storage/docs/json_api/v1/status-codes#413_Payload_Too_Large]. > The documentation for the error says: > {quote}The Cloud Storage JSON API supports up to 5 TB objects. > This error may, alternatively, arise if copying objects between locations > and/or storage classes can not complete within 30 seconds. In this case, use > the > [Rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite] > method instead.{quote} > The reason seems to be that {{GoogleCloudStorageHook.copy}} is using the > {{copy}} API method. > h3. Proposed Solution > There are two potential solutions: > # Implement a {{GoogleCloudStorageHook.rewrite}} method which can be called > from operators and other objects to ensure successful execution. This method > is more flexible but requires changes both in the {{GoogleCloudStorageHook}} > class and any other classes that use it for copying files to ensure that they > explicitly call {{rewrite}} when needed. > # Modify {{GoogleCloudStorageHook.copy}} to determine when to use {{rewrite}} > instead of {{copy}} underneath. 
This requires updating only the > {{GoogleCloudStorageHook}} class, but the logic might not cover all the edge > cases and could be difficult to implement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
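Option 1 above amounts to calling the Rewrite endpoint in a loop, because a single rewrite call may return before the copy finishes, handing back a continuation token. A sketch of that loop against a duck-typed blob object; the real hook would go through the Google API client, and `rewrite` here only mirrors the semantics of the JSON API's rewrite method (each call returns a token plus progress counters, with a None token meaning done):

```python
def rewrite_until_done(dest_blob, source_blob):
    """Repeatedly call rewrite until the service reports the copy complete.

    Assumes dest_blob.rewrite(source, token=...) returns a tuple of
    (next_token, bytes_rewritten, total_bytes), where next_token is None
    once the rewrite has finished.
    """
    token = None
    while True:
        token, bytes_rewritten, total_bytes = dest_blob.rewrite(source_blob, token=token)
        if token is None:
            return bytes_rewritten, total_bytes
```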
[jira] [Assigned] (AIRFLOW-2574) initdb fails when mysql password contains percent sign
[ https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2574: - Assignee: (was: Holden Karau's magical unicorn) > initdb fails when mysql password contains percent sign > -- > > Key: AIRFLOW-2574 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2574 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Zihao Zhang >Priority: Minor > > [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345] > uses > [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option] > which says "A raw percent sign not part of an interpolation symbol must > therefore be escaped" > When there is a percent sign in database connection string, this will crash > due to bad interpolation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2457) Upgrade FAB version in setup.py to support timezone
[ https://issues.apache.org/jira/browse/AIRFLOW-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601609#comment-16601609 ] Apache Spark commented on AIRFLOW-2457: --- User 'jgao54' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3349 > Upgrade FAB version in setup.py to support timezone > --- > > Key: AIRFLOW-2457 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2457 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10 >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Major > Fix For: 1.10.0 > > > FAB 1.9.6 doesn't support datetime with timezones, upgrade to 1.10.0 will fix > this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1514) View Log goes out of memory
[ https://issues.apache.org/jira/browse/AIRFLOW-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601601#comment-16601601 ] Apache Spark commented on AIRFLOW-1514: --- User 'NielsZeilemaker' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2526 > View Log goes out of memory > --- > > Key: AIRFLOW-1514 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1514 > Project: Apache Airflow > Issue Type: Bug >Reporter: Niels Zeilemaker >Assignee: Niels Zeilemaker >Priority: Major > > If you attempt to view a logfile which is big, we get an out-of-memory > exception from jinja. > Let's only show the tail of the logfile + a link to a raw log page which > doesn't use jinja. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
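Showing only the tail can be done with bounded memory via a fixed-size deque; this is a generic sketch of the technique, not the view-level change or raw-log route from the pull request:

```python
from collections import deque

def tail_lines(path, n=50):
    # A deque with maxlen keeps only the last n lines while streaming the
    # file line by line, so memory use stays bounded no matter how large
    # the logfile grows.
    with open(path) as f:
        return list(deque(f, maxlen=n))
```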
[jira] [Assigned] (AIRFLOW-1867) sendgrid fails on python3 with attachments
[ https://issues.apache.org/jira/browse/AIRFLOW-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1867: - Assignee: Holden Karau's magical unicorn > sendgrid fails on python3 with attachments > -- > > Key: AIRFLOW-1867 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1867 > Project: Apache Airflow > Issue Type: Bug >Reporter: Scott Kruger >Assignee: Holden Karau's magical unicorn >Priority: Minor > > Sendgrid emails raise an exception on python 3 when attaching files due to > {{base64.b64encode}} returning {{bytes}} rather than {{unicode/string}} (see: > https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/utils/sendgrid.py#L69). > The fix is simple: decode the base64 data to `utf-8`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
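The one-line fix described above, demonstrated on its own outside the sendgrid util (the function name is illustrative):

```python
import base64

def encode_attachment(data):
    # On Python 3, base64.b64encode returns bytes; sendgrid's attachment
    # content field expects str, so decode the base64 output explicitly.
    return base64.b64encode(data).decode("utf-8")

content = encode_attachment(b"hello attachment")
```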
[jira] [Commented] (AIRFLOW-2409) At user creation allow the password as a parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601596#comment-16601596 ] Apache Spark commented on AIRFLOW-2409: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3302 > At user creation allow the password as a parameter > -- > > Key: AIRFLOW-2409 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2409 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.10.0, 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1867) sendgrid fails on python3 with attachments
[ https://issues.apache.org/jira/browse/AIRFLOW-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601600#comment-16601600 ] Apache Spark commented on AIRFLOW-1867: --- User 'thesquelched' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2824 > sendgrid fails on python3 with attachments > -- > > Key: AIRFLOW-1867 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1867 > Project: Apache Airflow > Issue Type: Bug >Reporter: Scott Kruger >Priority: Minor > > Sendgrid emails raise an exception on python 3 when attaching files due to > {{base64.b64encode}} returning {{bytes}} rather than {{unicode/string}} (see: > https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/utils/sendgrid.py#L69). > The fix is simple: decode the base64 data to `utf-8`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-51) docker_operator - Improve the integration with swarm
[ https://issues.apache.org/jira/browse/AIRFLOW-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601602#comment-16601602 ] Apache Spark commented on AIRFLOW-51: - User 'asnir' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2491 > docker_operator - Improve the integration with swarm > > > Key: AIRFLOW-51 > URL: https://issues.apache.org/jira/browse/AIRFLOW-51 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Amikam Snir >Priority: Minor > Labels: operators > > Swarm is not well supported by docker_operator, due to this issue: > https://github.com/docker/swarm/issues/475 > In order to fix it, we will use cpu_shares instead of cpus. > p.s. The default value is None. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1998) Implement Databricks Operator for jobs/run-now endpoint
[ https://issues.apache.org/jira/browse/AIRFLOW-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1998: - Assignee: Israel Knight (was: Holden Karau's magical unicorn) > Implement Databricks Operator for jobs/run-now endpoint > --- > > Key: AIRFLOW-1998 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1998 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks, operators >Affects Versions: 1.9.0 >Reporter: Diego Rabatone Oliveira >Assignee: Israel Knight >Priority: Major > > Implement a Operator to deal with Databricks '2.0/jobs/run-now' API Endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2437) Add PubNub to README
[ https://issues.apache.org/jira/browse/AIRFLOW-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601611#comment-16601611 ] Apache Spark commented on AIRFLOW-2437: --- User 'jzucker2' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3332 > Add PubNub to README > > > Key: AIRFLOW-2437 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2437 > Project: Apache Airflow > Issue Type: Wish >Reporter: Jordan Zucker >Assignee: Jordan Zucker >Priority: Trivial > Fix For: 2.0.0 > > > Add PubNub to current list of Airflow users -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1423) Enhance scheduler logs to better explain DAG runs decisions
[ https://issues.apache.org/jira/browse/AIRFLOW-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1423: - Assignee: Holden Karau's magical unicorn > Enhance scheduler logs to better explain DAG runs decisions > --- > > Key: AIRFLOW-1423 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1423 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Reporter: Ultrabug >Assignee: Holden Karau's magical unicorn >Priority: Major > Attachments: add_scheduler_logs.patch > > > One of the most frustrating topics for users is understanding the scheduler's > decisions about whether or not to run a DAG. > It would be wise to add more logs around the job-creation decisions so that it > is clearer whether a DAG is run or not, and why. > This patch adds such simple and useful logs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2333) Add Segment Hook to Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601610#comment-16601610 ] Apache Spark commented on AIRFLOW-2333: --- User 'jzucker2' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3237 > Add Segment Hook to Airflow > --- > > Key: AIRFLOW-2333 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2333 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, hooks >Reporter: Jordan Zucker >Assignee: Jordan Zucker >Priority: Minor > Fix For: 1.10.0, 2.0.0 > > > [Segment|https://segment.com/] is used by many to track analytics. It would > be nice to allow Airflow to interact with Segment and store the username and > password encrypted in its database. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1514) View Log goes out of memory
[ https://issues.apache.org/jira/browse/AIRFLOW-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1514: - Assignee: Holden Karau's magical unicorn (was: Niels Zeilemaker) > View Log goes out of memory > --- > > Key: AIRFLOW-1514 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1514 > Project: Apache Airflow > Issue Type: Bug >Reporter: Niels Zeilemaker >Assignee: Holden Karau's magical unicorn >Priority: Major > > If you attempt to view a logfile which is big, we get an out-of-memory > exception from jinja. > Let's only show the tail of the logfile + a link to a raw log page which > doesn't use jinja. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1342) S3 connections use host "extra" parameter, not Host parameter on connection
[ https://issues.apache.org/jira/browse/AIRFLOW-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1342: - Assignee: Holden Karau's magical unicorn (was: Victor Duarte Diniz Monteiro) > S3 connections use host "extra" parameter, not Host parameter on connection > --- > > Key: AIRFLOW-1342 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1342 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8, 1.8.1, 1.8.2 >Reporter: Bryan Vanderhoof >Assignee: Holden Karau's magical unicorn >Priority: Trivial > > The ability to connect to S3 using Sigv4 was added to resolve AIRFLOW-1034. > That implementation expects a "host" parameter to be added to the JSON object > in the Extra field on the connection object. > However, the S3 connection object contains a top-level Host variable, which > remains unused. This is deeply counterintuitive. The default should be to > leverage this Host variable (and optionally the "host" parameter in the Extra > object, to maintain compatibility with the existing implementation). -- This message was sent by Atlassian JIRA (v7.6.3#76005)