[jira] [Commented] (AIRFLOW-2489) Align Flask dependencies with FlaskAppBuilder
[ https://issues.apache.org/jira/browse/AIRFLOW-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601636#comment-16601636 ] Apache Spark commented on AIRFLOW-2489: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3382 > Align Flask dependencies with FlaskAppBuilder > - > > Key: AIRFLOW-2489 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2489 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 2.0.0 > > > Right now it might take a while to update the dependencies. And we would like > to update the dependencies to make sure that we don't have any version > conflicts like: > Traceback (most recent call last): > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/bin/airflow", > line 4, in > > __import__('pkg_resources').require('apache-airflow==2.0.0.dev0+incubating') > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 3086, in > @_call_aside > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 3070, in _call_aside > f(*args, **kwargs) > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 3099, in _initialize_master_working_set > working_set = WorkingSet._build_master() > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 576, in _build_master > return cls._build_from_requirements(__requires__) > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 589, in _build_from_requirements > dists = ws.resolve(reqs, 
Environment()) > File > "/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages/pkg_resources/__init__.py", > line 783, in resolve > raise VersionConflict(dist, req).with_context(dependent_req) > pkg_resources.ContextualVersionConflict: (Flask 0.12.4 > (/home/travis/build/Fokko/incubator-airflow/.tox/py27-backend_postgres/lib/python2.7/site-packages), > Requirement.parse('Flask<0.12.2,>=0.10.0'), set(['flask-appbuilder'])) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
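The conflict above is mechanical: Flask 0.12.4 is installed, but flask-appbuilder pins `Flask<0.12.2,>=0.10.0`. A toy sketch of how such a specifier check fails, using plain version tuples; the `parse`/`satisfies` helpers are illustrative, not setuptools API (real resolution is done by pkg_resources/pip):

```python
import operator

def parse(version):
    # "0.12.4" -> (0, 12, 4); ignores pre-release tags for simplicity
    return tuple(int(p) for p in version.split(".") if p.isdigit())

def satisfies(installed, spec):
    """Check an installed version against a comma-separated specifier
    such as "<0.12.2,>=0.10.0" (the pin from the traceback above)."""
    ops = {"<=": operator.le, ">=": operator.ge, "==": operator.eq,
           "<": operator.lt, ">": operator.gt}
    for clause in spec.split(","):
        for sym, op in ops.items():
            if clause.startswith(sym):
                if not op(parse(installed), parse(clause[len(sym):])):
                    return False
                break
    return True

# The exact situation from the traceback: Flask 0.12.4 vs FAB's pin
print(satisfies("0.12.4", "<0.12.2,>=0.10.0"))  # -> False, hence VersionConflict
```

Aligning Airflow's own Flask pin with FlaskAppBuilder's makes the two specifiers jointly satisfiable, which is the point of the issue.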
[jira] [Assigned] (AIRFLOW-2476) tabulate update: 0.8.2 is tested
[ https://issues.apache.org/jira/browse/AIRFLOW-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2476: - Assignee: Ruslan Dautkhanov (was: Holden Karau's magical unicorn) > tabulate update: 0.8.2 is tested > > > Key: AIRFLOW-2476 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2476 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.8, 1.9.0, 1.10.0, 2.0.0, 1.10 >Reporter: Ruslan Dautkhanov >Assignee: Ruslan Dautkhanov >Priority: Major > > As discussed on the dev list, tabulate==0.8.2 is good to go with Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2484) Dictionary contains duplicate keys in MySQL to GCS Op
[ https://issues.apache.org/jira/browse/AIRFLOW-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601633#comment-16601633 ] Apache Spark commented on AIRFLOW-2484: --- User 'kaxil' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3376 > Dictionary contains duplicate keys in MySQL to GCS Op > - > > Key: AIRFLOW-2484 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2484 > Project: Apache Airflow > Issue Type: Task > Components: contrib, gcp >Affects Versions: 1.9.0, 1.10.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 1.10.0 > > > Helper function that maps from MySQL fields to BigQuery fields `type_map` > contains duplicate keys in MySQL to GCS Op -- This message was sent by Atlassian JIRA (v7.6.3#76005)
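The failure mode here is easy to reproduce: a Python dict literal with a duplicated key keeps only the last value, silently. A hypothetical fragment (these are not the operator's actual mappings):

```python
# 'INT' appears twice; Python keeps only the last value for a duplicated
# key in a dict literal, so the first mapping is silently discarded.
type_map = {
    'INT': 'INTEGER',
    'TINY': 'INTEGER',
    'INT': 'STRING',   # duplicate key: overwrites 'INTEGER' above
}
print(type_map['INT'])  # -> STRING
print(len(type_map))    # -> 2, not 3
```

This is why the duplication in `type_map` is a bug worth fixing even though it raises no error: the earlier entry is dead code and the effective mapping may not be the intended one.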
[jira] [Commented] (AIRFLOW-2420) Add functionality for Azure Data Lake
[ https://issues.apache.org/jira/browse/AIRFLOW-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601635#comment-16601635 ] Apache Spark commented on AIRFLOW-2420: --- User 'marcusrehm' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/ > Add functionality for Azure Data Lake > - > > Key: AIRFLOW-2420 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2420 > Project: Apache Airflow > Issue Type: New Feature > Components: hooks >Reporter: Marcus Rehm >Assignee: Marcus Rehm >Priority: Major > Fix For: 2.0.0 > > > Currently Airflow has a hook for Azure Blob Storage but it does not support > Azure Data Lake. > As a first step, a hook would interface with Azure Data Lake via the Python > SDK over the adl protocol. > > The hook would have a simple interface to upload and download files with all > parameters available in the ADL SDK, and also a check-for-file method to query whether a file > exists in the data lake. These last functions will enable sensor development > in the future. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
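A minimal sketch of the proposed hook surface (upload, download, existence check). All names here are illustrative, not the eventual hook API; an in-memory dict stands in for the data lake, where a real implementation would wrap the azure-datalake-store (adl) Python SDK:

```python
class AzureDataLakeHookSketch:
    """Illustrative stand-in for the proposed hook interface."""

    def __init__(self):
        self._store = {}  # remote_path -> bytes, simulating the lake

    def upload_file(self, data, remote_path):
        self._store[remote_path] = data

    def download_file(self, remote_path):
        return self._store[remote_path]

    def check_for_file(self, remote_path):
        # The existence check that would later back a sensor.
        return remote_path in self._store

hook = AzureDataLakeHookSketch()
hook.upload_file(b"rows", "raw/2018/05/data.csv")
print(hook.check_for_file("raw/2018/05/data.csv"))  # -> True
print(hook.check_for_file("raw/2018/06/data.csv"))  # -> False
```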
[jira] [Assigned] (AIRFLOW-2496) v1-10-test branch reports version 2.0 instead of 1.10
[ https://issues.apache.org/jira/browse/AIRFLOW-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2496: - Assignee: Holden Karau's magical unicorn > v1-10-test branch reports version 2.0 instead of 1.10 > - > > Key: AIRFLOW-2496 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2496 > Project: Apache Airflow > Issue Type: Bug > Components: release >Affects Versions: 1.10 >Reporter: Craig Rodrigues >Assignee: Holden Karau's magical unicorn >Priority: Minor > Fix For: 1.10 > > > I created a requirements.txt with one line: git+ > [https://github.com/apache/incubator-airflow@v1-10-test#egg=apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,redis,slack,s3]] > I then did: 1. create a virtual environment 2. pip install -r > requirements.txt 3. airflow webserver When I look at the version in the web > interface, it shows a version of: 2.0.0.dev0+incubating even though I used > the v1-10-test branch. This seems wrong. It looks like these two commits got > merged to the v1-10-test branch, which bumped the version to 2.0: > [https://github.com/apache/incubator-airflow/commit/305a787] > [https://github.com/apache/incubator-airflow/commit/a30acaf] > That seems wrong for the v1-10-test branch. It would be nice if this version was > 1.10.0.dev0+incubating (or whatever), since it looks like I will need to > deploy the v1-10-test branch to prod this week, and then very soon after when > 1.10 is released, re-deploy airflow 1.10. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2460) KubernetesPodOperator should be able to attach to volume mounts and configmaps
[ https://issues.apache.org/jira/browse/AIRFLOW-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601634#comment-16601634 ] Apache Spark commented on AIRFLOW-2460: --- User 'dimberman' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3356 > KubernetesPodOperator should be able to attach to volume mounts and configmaps > -- > > Key: AIRFLOW-2460 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2460 > Project: Apache Airflow > Issue Type: Bug >Reporter: Daniel Imberman >Assignee: Daniel Imberman >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > In order to run tasks using the KubernetesPodOperator in a production > setting, users need to be able to access pre-existing data through > PersistentVolumes or ConfigMaps. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
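At the Kubernetes API level, the requested feature amounts to adding `volumes` plus matching `volumeMounts` to the pod spec. A dict-based sketch (illustrative only; the real operator builds this through its own Pod/Volume models rather than raw dicts):

```python
def pod_spec_with_volume(image, claim_name, mount_path):
    """Minimal pod spec dict mounting a PersistentVolumeClaim."""
    return {
        "containers": [{
            "name": "base",
            "image": image,
            # The mount's name must match a volume declared below.
            "volumeMounts": [{"name": "data", "mountPath": mount_path}],
        }],
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": claim_name},
        }],
    }

spec = pod_spec_with_volume("python:3.6", "shared-dags", "/usr/local/airflow/dags")
print(spec["volumes"][0]["persistentVolumeClaim"]["claimName"])  # -> shared-dags
```

ConfigMaps follow the same two-part pattern, with a `configMap` volume source in place of `persistentVolumeClaim`.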
[jira] [Assigned] (AIRFLOW-2496) v1-10-test branch reports version 2.0 instead of 1.10
[ https://issues.apache.org/jira/browse/AIRFLOW-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2496: - Assignee: (was: Holden Karau's magical unicorn) > v1-10-test branch reports version 2.0 instead of 1.10 > - > > Key: AIRFLOW-2496 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2496 > Project: Apache Airflow > Issue Type: Bug > Components: release >Affects Versions: 1.10 >Reporter: Craig Rodrigues >Priority: Minor > Fix For: 1.10 > > > I created a requirements.txt with one line: git+ > [https://github.com/apache/incubator-airflow@v1-10-test#egg=apache-airflow[celery,crypto,emr,hive,hdfs,ldap,mysql,postgres,redis,slack,s3]] > I then did: 1. create a virtual environment 2. pip install -r > requirements.txt 3. airflow webserver When I look at the version in the web > interface, it shows a version of: 2.0.0.dev0+incubating even though I used > the v1-10-test branch. This seems wrong. It looks like these two commits got > merged to the v1-10-test branch, which bumped the version to 2.0: > [https://github.com/apache/incubator-airflow/commit/305a787] > [https://github.com/apache/incubator-airflow/commit/a30acaf] > That seems wrong for the v1-10-test branch. It would be nice if this version was > 1.10.0.dev0+incubating (or whatever), since it looks like I will need to > deploy the v1-10-test branch to prod this week, and then very soon after when > 1.10 is released, re-deploy airflow 1.10. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2476) tabulate update: 0.8.2 is tested
[ https://issues.apache.org/jira/browse/AIRFLOW-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2476: - Assignee: Holden Karau's magical unicorn (was: Ruslan Dautkhanov) > tabulate update: 0.8.2 is tested > > > Key: AIRFLOW-2476 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2476 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: Airflow 1.8, 1.9.0, 1.10.0, 2.0.0, 1.10 >Reporter: Ruslan Dautkhanov >Assignee: Holden Karau's magical unicorn >Priority: Major > > As discussed on the dev list, tabulate==0.8.2 is good to go with Airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2486) Extra slash in base_url when port provided
[ https://issues.apache.org/jira/browse/AIRFLOW-2486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601632#comment-16601632 ] Apache Spark commented on AIRFLOW-2486: --- User 'jason-udacity' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3379 > Extra slash in base_url when port provided > -- > > Key: AIRFLOW-2486 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2486 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jason Shao >Assignee: Jason Shao >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > {{Issue in > }}[incubator-airflow|https://github.com/jason-udacity/incubator-airflow]/{{[airflow/hooks/http_hook.py|https://github.com/apache/incubator-airflow/pull/3377/files#diff-80514189dfbbac3803594380c3a714f1]}} > {{self.base_url}} includes an unnecessary slash when {{conn.port}} is > specified. > This often leads to unintended redirects that are especially problematic when > a request body is needed. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
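The general shape of the fix is to normalize the slashes at the join point. A sketch (the hook's actual patch lives in the linked PR; this helper is illustrative):

```python
def join_url(base_url, endpoint):
    """Strip a trailing slash from the base and a leading slash from the
    endpoint so exactly one separator remains."""
    return base_url.rstrip("/") + "/" + endpoint.lstrip("/")

# With a port, a naive f"{base}/{endpoint}" yields host:8080//api/v1/run,
# which many servers answer with a redirect that drops the request body.
print(join_url("http://example.com:8080/", "/api/v1/run"))
# -> http://example.com:8080/api/v1/run
```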
[jira] [Commented] (AIRFLOW-2482) Add test for rewrite method for GCS Hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601637#comment-16601637 ] Apache Spark commented on AIRFLOW-2482: --- User 'kaxil' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3374 > Add test for rewrite method for GCS Hook > > > Key: AIRFLOW-2482 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2482 > Project: Apache Airflow > Issue Type: Test > Components: contrib, gcp >Affects Versions: 1.10.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Minor > Fix For: 2.0.0 > > > Tests for the rewrite method in the GCS hook are missing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1840) Fix Celery config
[ https://issues.apache.org/jira/browse/AIRFLOW-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601630#comment-16601630 ] Apache Spark commented on AIRFLOW-1840: --- User 'ashb' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3549 > Fix Celery config > - > > Key: AIRFLOW-1840 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1840 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.10.0 > > > While configuring the Celery executor I keep running into this problem: > ==> /var/log/airflow/scheduler.log <== > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", > line 83, in sync > state = async.state > File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 394, > in state > return self._get_task_meta()['status'] > File "/usr/local/lib/python2.7/dist-packages/celery/result.py", line 339, > in _get_task_meta > return self._maybe_set_cache(self.backend.get_task_meta(self.id)) > File "/usr/local/lib/python2.7/dist-packages/celery/backends/base.py", line > 307, in get_task_meta > meta = self._get_task_meta_for(task_id) > AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
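The `'DisabledBackend' object has no attribute '_get_task_meta_for'` error is what querying task state looks like when no result backend reaches the Celery app, for example when config keys are spelled for the wrong Celery version. A sketch with placeholder URLs (not recommended values):

```python
# Illustrative Celery settings; both URLs are placeholders.
celery_config = {
    "broker_url": "redis://localhost:6379/0",          # where tasks are queued
    "result_backend": "db+postgresql://airflow@localhost/airflow",  # where task state lives
}

def can_poll_state(config):
    """True if task state (the `async.state` call in the traceback) can be
    queried; without result_backend, Celery falls back to DisabledBackend."""
    return bool(config.get("result_backend"))

print(can_poll_state(celery_config))                             # -> True
print(can_poll_state({"broker_url": "redis://localhost:6379/0"}))  # -> False
```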
[jira] [Assigned] (AIRFLOW-2733) Airflow webserver crashes with refresh interval <= 0 in daemon mode
[ https://issues.apache.org/jira/browse/AIRFLOW-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2733: - Assignee: Holden Karau's magical unicorn > Airflow webserver crashes with refresh interval <= 0 in daemon mode > --- > > Key: AIRFLOW-2733 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2733 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Assignee: Holden Karau's magical unicorn >Priority: Major > > In airflow/bin/cli.py, calls to the monitor_gunicorn sub-function > ([https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L841)] > are made using a mix of psutil.Process objects and subprocess.Popen objects. > > Case 1: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L878] > Case 2: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L884] > > In the event worker_refresh_interval is <= 0 and we are in daemon mode, we end > up calling a non-existent `poll` function on a psutil.Process object: > https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L847 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2625) Create an API to list all the available DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2625: - Assignee: Holden Karau's magical unicorn > Create an API to list all the available DAGs > > > Key: AIRFLOW-2625 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2625 > Project: Apache Airflow > Issue Type: New Feature > Components: api, DAG >Reporter: Verdan Mahmood >Assignee: Holden Karau's magical unicorn >Priority: Major > Labels: api, api-required > > There should be an API to list all the available DAGs in the system. (This is > basically the same as the DAGs list page, aka the Airflow home page.) > This should include all the basic information related to a DAG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
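A sketch of what such an endpoint might serialize. Here `dagbag` is a plain dict of dag_id to metadata standing in for Airflow's DagBag, and the field names are illustrative, not a proposed schema:

```python
import json

def list_dags(dagbag):
    """Serialize basic per-DAG info, as the home page displays it."""
    payload = [{"dag_id": dag_id, **meta} for dag_id, meta in sorted(dagbag.items())]
    return json.dumps({"dags": payload})

dagbag = {
    "tutorial": {"owner": "airflow", "is_paused": False},
    "etl_daily": {"owner": "data-eng", "is_paused": True},
}
print(list_dags(dagbag))
```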
[jira] [Assigned] (AIRFLOW-2537) clearing tasks shouldn't set backfill DAG runs to `running`
[ https://issues.apache.org/jira/browse/AIRFLOW-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2537: - Assignee: Holden Karau's magical unicorn > clearing tasks shouldn't set backfill DAG runs to `running` > --- > > Key: AIRFLOW-2537 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2537 > Project: Apache Airflow > Issue Type: Bug >Reporter: Maxime Beauchemin >Assignee: Holden Karau's magical unicorn >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2633) Retry loop on AWSBatchOperator won't quit
[ https://issues.apache.org/jira/browse/AIRFLOW-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2633: - Assignee: Holden Karau's magical unicorn (was: Sebastian Schwartz) > Retry loop on AWSBatchOperator won't quit > - > > Key: AIRFLOW-2633 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2633 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Holden Karau's magical unicorn >Priority: Major > Labels: patch, pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > The exponential backoff retry loop that is a fallback for AWSBatchOperator as > a strategy for polling job success does not quit until the maximum number of retries is > reached, due to a control-flow error. This is a simple one-line fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
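The correct loop shape exits as soon as the job reaches a terminal state; the early `return` below is the control-flow piece the report says was missing. `poll_job` stands in for the AWS Batch `describe_jobs` status lookup, and the whole function is a sketch, not the operator code:

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}

def wait_for_job(poll_job, max_retries=5, base_delay=0.01):
    """Exponential-backoff polling that quits on a terminal state."""
    for attempt in range(max_retries):
        status = poll_job()
        if status in TERMINAL_STATES:   # exit immediately on completion
            return status
        time.sleep(base_delay * (2 ** attempt))
    raise TimeoutError("job did not reach a terminal state")

# Simulated job that succeeds on the third poll
statuses = iter(["RUNNING", "RUNNING", "SUCCEEDED"])
print(wait_for_job(lambda: next(statuses)))  # -> SUCCEEDED
```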
[jira] [Assigned] (AIRFLOW-2642) [kubernetes executor worker] the value of git-sync init container ENV GIT_SYNC_ROOT is wrong
[ https://issues.apache.org/jira/browse/AIRFLOW-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2642: - Assignee: pengchen (was: Holden Karau's magical unicorn) > [kubernetes executor worker] the value of git-sync init container ENV > GIT_SYNC_ROOT is wrong > > > Key: AIRFLOW-2642 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2642 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 2.0.0, 1.10 >Reporter: pengchen >Assignee: pengchen >Priority: Major > Fix For: 1.10 > > > There are two ways of syncing dags, pvc and git-sync. When we use git-sync > this way, the generated worker pod yaml file fragment is as follows > > {code:java} > worker container: > --- > containers: > - args: > - airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd > /root/airflow/dags/dags/example_dags/tutorial1.py > command: > - bash > - -cx > - -- > env: > - name: AIRFLOW__CORE__AIRFLOW_HOME > value: /root/airflow > - name: AIRFLOW__CORE__EXECUTOR > value: LocalExecutor > - name: AIRFLOW__CORE__DAGS_FOLDER > value: /tmp/dags > - name: SQL_ALCHEMY_CONN > valueFrom: > secretKeyRef: > key: sql_alchemy_conn > name: airflow-secrets > init container: > --- > initContainers: > - env: > - name: GIT_SYNC_REPO > value: https://code.devops.xiaohongshu.com/pengchen/Airflow-DAGs.git > - name: GIT_SYNC_BRANCH > value: master > - name: GIT_SYNC_ROOT > value: /tmp > - name: GIT_SYNC_DEST > value: dags > - name: GIT_SYNC_ONE_TIME > value: "true" > - name: GIT_SYNC_USERNAME > value: XXX > - name: GIT_SYNC_PASSWORD > value: XXX > image: library/git-sync-amd64:v2.0.5 > imagePullPolicy: IfNotPresent > name: git-sync-clone > resources: {} > securityContext: > runAsUser: 0 > terminationMessagePath: /dev/termination-log > terminationMessagePolicy: File > volumeMounts: > - mountPath: /root/airflow/dags/ > name: airflow-dags > - mountPath: /root/airflow/logs > name: airflow-logs > - mountPath: /root/airflow/airflow.cfg > name: airflow-config > readOnly: true > subPath: airflow.cfg > - mountPath: /var/run/secrets/kubernetes.io/serviceaccount > name: default-token-xz87t > readOnly: true > {code} > According to the configuration, git-sync will synchronize dags to the /tmp/dags > directory. However, the worker container command args (airflow run tutorial1 > print_date 2018-06-19T07:57:15.011693+00:00 --local -sd > /root/airflow/dags/dags/example_dags/tutorial1.py) are generated by the > scheduler. Therefore, the task error is as follows > {code:java} > + airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,075] {settings.py:174} INFO - setting.configure_orm(): > Using pool settings. pool_size=5, pool_recycle=1800 > [2018-06-19 07:57:29,232] {__init__.py:51} INFO - Using executor LocalExecutor > [2018-06-19 07:57:29,373] {models.py:219} INFO - Filling up the DagBag from > /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,648] {models.py:310} INFO - File > /usr/local/lib/python2.7/dist-packages/airflow/example_dags/__init__.py > assumed to contain no DAGs. Skipping. > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 32, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/cli.py", line 74, > in wrapper > return f(*args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 475, > in run > dag = get_dag(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 146, > in get_dag > 'parse.'.format(args.dag_id)) > airflow.exceptions.AirflowException: dag_id could not be found: tutorial1. > Either the dag did not exist or it failed to parse. > {code} > > The log shows that the worker cannot find the corresponding dag, so I think > the environment variable GIT_SYNC_ROOT should be consistent with > dag_volume_mount_path. 
> The worker's environment variable AIRFLOW__CORE__DAGS_FOLDER is invalid, and > AIRFLOW__CORE__EXECUTOR is also invalid > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
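One way to keep the two settings consistent, as the report suggests, is to derive the init container's git-sync env from the same path the worker mounts the DAG volume at, so the checkout lands where the scheduler-generated `--subdir` path expects it. A sketch only; names are illustrative and the real builder lives in the Kubernetes executor's pod-generation code:

```python
def git_sync_env(dags_mount_path, dest="dags"):
    """Derive git-sync env vars from the DAG volume mount path."""
    return {
        "GIT_SYNC_ROOT": dags_mount_path.rstrip("/"),
        "GIT_SYNC_DEST": dest,
    }

env = git_sync_env("/root/airflow/dags/")
# Checkout path becomes /root/airflow/dags/dags, consistent with the
# "-sd /root/airflow/dags/dags/example_dags/tutorial1.py" worker args
# above, instead of the mismatched /tmp/dags in the reported pod spec.
print(env["GIT_SYNC_ROOT"])  # -> /root/airflow/dags
```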
[jira] [Assigned] (AIRFLOW-2642) [kubernetes executor worker] the value of git-sync init container ENV GIT_SYNC_ROOT is wrong
[ https://issues.apache.org/jira/browse/AIRFLOW-2642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2642: - Assignee: Holden Karau's magical unicorn (was: pengchen) > [kubernetes executor worker] the value of git-sync init container ENV > GIT_SYNC_ROOT is wrong > > > Key: AIRFLOW-2642 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2642 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 2.0.0, 1.10 >Reporter: pengchen >Assignee: Holden Karau's magical unicorn >Priority: Major > Fix For: 1.10 > > > There are two ways of syncing dags, pvc and git-sync. When we use git-sync > this way, the generated worker pod yaml file fragment is as follows > > {code:java} > worker container: > --- > containers: > - args: > - airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd > /root/airflow/dags/dags/example_dags/tutorial1.py > command: > - bash > - -cx > - -- > env: > - name: AIRFLOW__CORE__AIRFLOW_HOME > value: /root/airflow > - name: AIRFLOW__CORE__EXECUTOR > value: LocalExecutor > - name: AIRFLOW__CORE__DAGS_FOLDER > value: /tmp/dags > - name: SQL_ALCHEMY_CONN > valueFrom: > secretKeyRef: > key: sql_alchemy_conn > name: airflow-secrets > init container: > --- > initContainers: > - env: > - name: GIT_SYNC_REPO > value: https://code.devops.xiaohongshu.com/pengchen/Airflow-DAGs.git > - name: GIT_SYNC_BRANCH > value: master > - name: GIT_SYNC_ROOT > value: /tmp > - name: GIT_SYNC_DEST > value: dags > - name: GIT_SYNC_ONE_TIME > value: "true" > - name: GIT_SYNC_USERNAME > value: XXX > - name: GIT_SYNC_PASSWORD > value: XXX > image: library/git-sync-amd64:v2.0.5 > imagePullPolicy: IfNotPresent > name: git-sync-clone > resources: {} > securityContext: > runAsUser: 0 > terminationMessagePath: /dev/termination-log > terminationMessagePolicy: File > volumeMounts: > - mountPath: /root/airflow/dags/ > name: airflow-dags > - mountPath: /root/airflow/logs > name: airflow-logs > - mountPath: /root/airflow/airflow.cfg > name: airflow-config > readOnly: true > subPath: airflow.cfg > - mountPath: /var/run/secrets/kubernetes.io/serviceaccount > name: default-token-xz87t > readOnly: true > {code} > According to the configuration, git-sync will synchronize dags to the /tmp/dags > directory. However, the worker container command args (airflow run tutorial1 > print_date 2018-06-19T07:57:15.011693+00:00 --local -sd > /root/airflow/dags/dags/example_dags/tutorial1.py) are generated by the > scheduler. Therefore, the task error is as follows > {code:java} > + airflow run tutorial1 print_date 2018-06-19T07:57:15.011693+00:00 --local > -sd /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,075] {settings.py:174} INFO - setting.configure_orm(): > Using pool settings. pool_size=5, pool_recycle=1800 > [2018-06-19 07:57:29,232] {__init__.py:51} INFO - Using executor LocalExecutor > [2018-06-19 07:57:29,373] {models.py:219} INFO - Filling up the DagBag from > /root/airflow/dags/dags/example_dags/tutorial1.py > [2018-06-19 07:57:29,648] {models.py:310} INFO - File > /usr/local/lib/python2.7/dist-packages/airflow/example_dags/__init__.py > assumed to contain no DAGs. Skipping. > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 32, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/cli.py", line 74, > in wrapper > return f(*args, **kwargs) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 475, > in run > dag = get_dag(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 146, > in get_dag > 'parse.'.format(args.dag_id)) > airflow.exceptions.AirflowException: dag_id could not be found: tutorial1. > Either the dag did not exist or it failed to parse. > {code} > > The log shows that the worker cannot find the corresponding dag, so I think > the environment variable GIT_SYNC_ROOT should be consistent with > dag_volume_mount_path. 
> The worker's environment variable AIRFLOW__CORE__DAGS_FOLDER is invalid, and > AIRFLOW__CORE__EXECUTOR is also invalid > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2733) Airflow webserver crashes with refresh interval <= 0 in daemon mode
[ https://issues.apache.org/jira/browse/AIRFLOW-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601631#comment-16601631 ] Apache Spark commented on AIRFLOW-2733: --- User 'gwax' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3586 > Airflow webserver crashes with refresh interval <= 0 in daemon mode > --- > > Key: AIRFLOW-2733 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2733 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Major > > In airflow/bin/cli.py, calls to the monitor_gunicorn sub-function > ([https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L841)] > are made using a mix of psutil.Process objects and subprocess.Popen objects. > > Case 1: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L878] > Case 2: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L884] > > In the event worker_refresh_interval is <= 0 and we are in daemon mode, we end > up calling a non-existent `poll` function on a psutil.Process object: > https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L847 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
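Since `subprocess.Popen` exposes `poll()` while `psutil.Process` exposes `is_running()`, one defensive way to tolerate the mix is a duck-typed liveness check. This is a sketch of that idea, not the merged patch:

```python
import subprocess
import sys

def still_running(proc):
    """Return True if `proc` is alive, accepting either a subprocess.Popen
    (which has .poll()) or a psutil.Process (which has .is_running())."""
    if hasattr(proc, "poll"):            # subprocess.Popen
        return proc.poll() is None
    return proc.is_running()             # psutil.Process

p = subprocess.Popen([sys.executable, "-c", "pass"])
p.wait()
print(still_running(p))  # -> False: the child has exited
```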
[jira] [Assigned] (AIRFLOW-2616) Pluggable class-based views for APIs
[ https://issues.apache.org/jira/browse/AIRFLOW-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2616: - Assignee: (was: Holden Karau's magical unicorn) > Pluggable class-based views for APIs > > > Key: AIRFLOW-2616 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2616 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Reporter: Verdan Mahmood >Priority: Major > Labels: api_endpoints, architecture > > With the increase of API code base, the current architecture (functional > views) will become messy in no time. Same routes with different http methods > become more confusing in the code base. > We can either use Flask's Pluggable views, which are inspired by Django's > generic class-based views to make our API structure more modular, or we can > look for Flask-RESTful framework. > > http://flask.pocoo.org/docs/0.12/views/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
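The pattern the issue proposes, as in the linked Flask pluggable-views docs, is one class per route with one method per HTTP verb. A dependency-free stand-in for Flask's `MethodView` so the shape is runnable here; the `DagRunsView` resource and its payloads are hypothetical:

```python
class MethodViewSketch:
    """Minimal stand-in for Flask's MethodView: dispatch picks the
    handler method matching the HTTP verb's lowercase name."""

    def dispatch(self, method, *args, **kwargs):
        handler = getattr(self, method.lower(), None)
        if handler is None:
            return 405, "Method Not Allowed"
        return handler(*args, **kwargs)

class DagRunsView(MethodViewSketch):
    # Hypothetical resource: GET lists runs, POST triggers one.
    def get(self, dag_id):
        return 200, {"dag_id": dag_id, "runs": []}

    def post(self, dag_id):
        return 201, {"dag_id": dag_id, "triggered": True}

view = DagRunsView()
print(view.dispatch("GET", "tutorial"))     # -> (200, {'dag_id': 'tutorial', 'runs': []})
print(view.dispatch("DELETE", "tutorial"))  # -> (405, 'Method Not Allowed')
```

The win over functional views is that the same route stops branching on `request.method` inside one function; each verb gets its own method.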
[jira] [Assigned] (AIRFLOW-2711) zendesk hook doesn't handle search endpoint properly
[ https://issues.apache.org/jira/browse/AIRFLOW-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2711: - Assignee: Holden Karau's magical unicorn > zendesk hook doesn't handle search endpoint properly > > > Key: AIRFLOW-2711 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2711 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Chris Chow >Assignee: Holden Karau's magical unicorn >Priority: Major > > The zendesk hook assumes that the API's response includes the expected result > in the key with the same name as the API endpoint, e.g. that the response to a > query to /api/v2/users.json includes the key 'users'. /api/v2/search.json > actually includes results under the key 'results' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
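One way to express the fix is to special-case the search endpoint when choosing the payload key. An illustrative helper, not the actual hook code:

```python
def results_key(endpoint):
    """Pick the JSON key that holds the payload for a Zendesk endpoint.
    Most endpoints mirror their own name ('users.json' -> 'users'), but
    the search endpoint nests everything under 'results'."""
    name = endpoint.rsplit("/", 1)[-1].replace(".json", "")
    return "results" if name == "search" else name

print(results_key("/api/v2/users.json"))   # -> users
print(results_key("/api/v2/search.json"))  # -> results
```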
[jira] [Assigned] (AIRFLOW-2537) clearing tasks shouldn't set backfill DAG runs to `running`
[ https://issues.apache.org/jira/browse/AIRFLOW-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2537: - Assignee: (was: Holden Karau's magical unicorn) > clearing tasks shouldn't set backfill DAG runs to `running` > --- > > Key: AIRFLOW-2537 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2537 > Project: Apache Airflow > Issue Type: Bug >Reporter: Maxime Beauchemin >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2701) Clean up dangling backfill dagrun
[ https://issues.apache.org/jira/browse/AIRFLOW-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2701: - Assignee: Tao Feng (was: Holden Karau's magical unicorn) > Clean up dangling backfill dagrun > - > > Key: AIRFLOW-2701 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2701 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tao Feng >Assignee: Tao Feng >Priority: Major > > When a user tries to backfill and hits Ctrl+C, the backfill dagrun will stay in the > running state. We should set it to failed if it has unfinished tasks. > > In our production, we see lots of these dangling backfill dagruns, each of which will > count as one active dagrun in the next backfill. This may prevent users from > backfilling if max_active_runs is reached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
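The cleanup rule described above can be sketched as follows. Runs are plain dicts standing in for Airflow's DagRun model, and the `backfill_` run_id prefix convention is used only for illustration:

```python
def reap_dangling_backfills(dagruns):
    """Mark interrupted backfill runs as failed so they stop counting
    toward max_active_runs."""
    for run in dagruns:
        if (run["run_id"].startswith("backfill_")
                and run["state"] == "running"
                and run["unfinished_tasks"] > 0):
            run["state"] = "failed"
    return dagruns

runs = [
    {"run_id": "backfill_2018-07-01", "state": "running", "unfinished_tasks": 3},
    {"run_id": "scheduled__2018-07-01", "state": "running", "unfinished_tasks": 2},
]
reap_dangling_backfills(runs)
print([r["state"] for r in runs])  # -> ['failed', 'running']
```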
[jira] [Assigned] (AIRFLOW-2632) AWSBatchOperator allow ints in overrides
[ https://issues.apache.org/jira/browse/AIRFLOW-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2632: - Assignee: Sebastian Schwartz (was: Holden Karau's magical unicorn) > AWSBatchOperator allow ints in overrides > > > Key: AIRFLOW-2632 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2632 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Sebastian Schwartz >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > {{The AWSBatchOperator takes an *overrides* dict as a templated parameter: > [https://airflow.readthedocs.io/en/latest/integration.html#aws]}} > > However, the templating does not support ints. This is an issue because in > *overrides,* the *vcpus* and *memory* parameters must be ints for the AWS > client to correctly submit the job. Removing templating on the *overrides* > field solves this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
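Jinja templating renders every templated field to a string, so `vcpus`/`memory` arrive as `"2"` instead of `2`. Besides the fix discussed above (dropping `overrides` from the templated fields), a workaround at the call site could cast the known numeric keys back; this helper is illustrative, not part of the operator:

```python
def restore_numeric_overrides(overrides):
    """Cast containerOverrides fields that AWS Batch requires as integers
    back from the strings Jinja templating produces."""
    fixed = dict(overrides)
    for key in ("vcpus", "memory"):
        if key in fixed:
            fixed[key] = int(fixed[key])
    return fixed

rendered = {"vcpus": "2", "memory": "1024", "command": ["echo", "hello"]}
print(restore_numeric_overrides(rendered))  # vcpus and memory are ints again
```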
[jira] [Assigned] (AIRFLOW-2632) AWSBatchOperator allow ints in overrides
[ https://issues.apache.org/jira/browse/AIRFLOW-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2632: - Assignee: Holden Karau's magical unicorn (was: Sebastian Schwartz) > AWSBatchOperator allow ints in overrides > > > Key: AIRFLOW-2632 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2632 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Holden Karau's magical unicorn >Priority: Minor > Labels: pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > {{The AWSBatchOperator takes an *overrides* dict as a templated parameter: > [https://airflow.readthedocs.io/en/latest/integration.html#aws]}} > > However, the templating does not support ints. This is an issue because in > *overrides,* the *vcpus* and *memory* parameters must be ints for the AWS > client to correctly submit the job. Removing templating on the *overrides* > field solves this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2711) zendesk hook doesn't handle search endpoint properly
[ https://issues.apache.org/jira/browse/AIRFLOW-2711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2711: - Assignee: (was: Holden Karau's magical unicorn) > zendesk hook doesn't handle search endpoint properly > > > Key: AIRFLOW-2711 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2711 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.9.0 >Reporter: Chris Chow >Priority: Major > > the zendesk hook assumes that the api's response includes the expected result > in the key with the same name as the api endpoint, e.g. that the results of a > query to /api/v2/users.json includes the key 'users'. /api/v2/search.json > actually includes results under the key 'results' -- This message was sent by Atlassian JIRA (v7.6.3#76005)
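The endpoint-to-key mismatch described above can be sketched with a hypothetical helper (`extract_results` is illustrative and not part of the hook's real API): the payload key normally matches the endpoint name, with search as the exception.

```python
def extract_results(endpoint, response):
    """Pick the payload key from a Zendesk API JSON response.

    Hypothetical helper: most endpoints (e.g. /api/v2/users.json) nest
    their payload under a key named after the endpoint ('users'), but
    /api/v2/search.json nests it under 'results' instead.
    """
    key = "results" if endpoint == "search" else endpoint
    return response[key]
```

Assuming the endpoint name always maps to the key, as the hook currently does, a call against the search endpoint raises a KeyError.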
[jira] [Assigned] (AIRFLOW-2633) Retry loop on AWSBatchOperator won't quit
[ https://issues.apache.org/jira/browse/AIRFLOW-2633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2633: - Assignee: Sebastian Schwartz (was: Holden Karau's magical unicorn) > Retry loop on AWSBatchOperator won't quit > - > > Key: AIRFLOW-2633 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2633 > Project: Apache Airflow > Issue Type: Improvement > Components: operators >Affects Versions: 2.0.0 >Reporter: Sebastian Schwartz >Assignee: Sebastian Schwartz >Priority: Major > Labels: patch, pull-request-available > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > The exponential backoff retry loop that is a fallback for AWSBatchOperator as > a strategy for polling job success does not quit until the maximum number of retries is > reached, due to a control-flow error. This is a simple one-line fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2733) Airflow webserver crashes with refresh interval <= 0 in daemon mode
[ https://issues.apache.org/jira/browse/AIRFLOW-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2733: - Assignee: (was: Holden Karau's magical unicorn) > Airflow webserver crashes with refresh interval <= 0 in daemon mode > --- > > Key: AIRFLOW-2733 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2733 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Priority: Major > > In airflow/bin/cli.py, calls in the monitor_gunicorn sub-function > ([https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L841)] > are made using a mix of psutil.Process objects and subprocess.Popen objects. > > Case 1: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L878] > Case 2: > [https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L884] > > In the event worker_refresh_interval is <=0 and we are in daemon mode, we end > up calling a non-existent `poll` function on a psutil.Process object: > https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L847 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
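The API mismatch can be sketched with a hypothetical shim (`still_running` below is illustrative; cli.py has no such helper): `subprocess.Popen` exposes `poll()`, which returns None while the child is alive, whereas `psutil.Process` exposes `is_running()` and has no `poll()` at all.

```python
import subprocess
import sys

def still_running(proc):
    """Hypothetical shim over the two process types mixed in cli.py.

    subprocess.Popen: poll() returns None while the child is running.
    psutil.Process: has is_running() but no poll(), so calling poll()
    on it raises AttributeError -- the crash described in the issue.
    """
    if hasattr(proc, "poll"):          # subprocess.Popen
        return proc.poll() is None
    return proc.is_running()           # psutil.Process

# A short-lived child demonstrates the Popen branch.
proc = subprocess.Popen([sys.executable, "-c", "pass"])
proc.wait()
```

Normalizing both objects behind one predicate like this (or using psutil consistently) would avoid the AttributeError regardless of refresh interval and daemon mode.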
[jira] [Assigned] (AIRFLOW-2701) Clean up dangling backfill dagrun
[ https://issues.apache.org/jira/browse/AIRFLOW-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2701: - Assignee: Holden Karau's magical unicorn (was: Tao Feng) > Clean up dangling backfill dagrun > - > > Key: AIRFLOW-2701 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2701 > Project: Apache Airflow > Issue Type: Bug >Reporter: Tao Feng >Assignee: Holden Karau's magical unicorn >Priority: Major > > When a user tries to backfill and hits Ctrl+C, the backfill dagrun will stay in the > running state. We should set it to failed if it has unfinished tasks. > > In our production, we see lots of these dangling backfill dagruns, each of which > counts as one active dagrun in the next backfill. This may prevent users from > backfilling if max_active_runs is reached. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2616) Pluggable class-based views for APIs
[ https://issues.apache.org/jira/browse/AIRFLOW-2616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2616: - Assignee: Holden Karau's magical unicorn > Pluggable class-based views for APIs > > > Key: AIRFLOW-2616 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2616 > Project: Apache Airflow > Issue Type: Improvement > Components: api >Reporter: Verdan Mahmood >Assignee: Holden Karau's magical unicorn >Priority: Major > Labels: api_endpoints, architecture > > As the API code base grows, the current architecture (functional > views) will become messy in no time. The same routes with different HTTP methods > make the code base more confusing. > We can either use Flask's Pluggable Views, which are inspired by Django's > generic class-based views, to make our API structure more modular, or we can > look at the Flask-RESTful framework. > > http://flask.pocoo.org/docs/0.12/views/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2625) Create an API to list all the available DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2625: - Assignee: (was: Holden Karau's magical unicorn) > Create an API to list all the available DAGs > > > Key: AIRFLOW-2625 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2625 > Project: Apache Airflow > Issue Type: New Feature > Components: api, DAG >Reporter: Verdan Mahmood >Priority: Major > Labels: api, api-required > > There should be an API to list all the available DAGs in the system (this is > basically the same as the DAGs list page, aka the Airflow home page). > This should include all the basic information related to a DAG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1737) set_task_instance_state fails because of strptime
[ https://issues.apache.org/jira/browse/AIRFLOW-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1737: - Assignee: Tao Feng (was: Holden Karau's magical unicorn) > set_task_instance_state fails because of strptime > - > > Key: AIRFLOW-1737 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1737 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Reporter: Andre Boechat >Assignee: Tao Feng >Priority: Minor > Attachments: Screenshot_2017-10-18_15-58-29.png > > > Context: > * DAG run triggered manually > * Using the web application to change the state of a task > When trying to set the state of a task, an exception is thrown: *ValueError: > unconverted data remains: ..372649* (look at the attached screenshot). > I think the problem comes from the "execution date" created by manually > triggered DAGs, since the date-time includes a fractional part. In my > database, I see scheduled DAGs with execution dates like "10-18T15:00:00", > while manually triggered ones with dates like "09-21T16:36:16.170988". If we > look at the method *set_task_instance_state* in *airflow.www.views*, we see > that the format string used with *strptime* doesn't consider any fractional > part. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
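A fix along the lines the reporter suggests can be sketched as follows. `parse_execution_date` is a hypothetical helper, not the actual airflow.www.views code; the point is that `%f` consumes the fractional part that manually triggered runs carry, while the format without it still handles scheduled runs.

```python
from datetime import datetime

# Manually triggered DAG runs carry microseconds; scheduled runs do not.
FMT_WITH_FRACTION = "%Y-%m-%dT%H:%M:%S.%f"
FMT_PLAIN = "%Y-%m-%dT%H:%M:%S"

def parse_execution_date(value):
    """Hypothetical fix sketch: try the format with '%f' first, then
    fall back to the plain one, so both kinds of execution date parse."""
    for fmt in (FMT_WITH_FRACTION, FMT_PLAIN):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized execution date: %r" % value)
```

With only FMT_PLAIN, a value like "2017-09-21T16:36:16.170988" raises "ValueError: unconverted data remains: .170988", which is the error in the attached screenshot.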
[jira] [Assigned] (AIRFLOW-2885) A Bug in www_rbac.utils.get_params
[ https://issues.apache.org/jira/browse/AIRFLOW-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2885: - Assignee: Holden Karau's magical unicorn (was: Xiaodong DENG) > A Bug in www_rbac.utils.get_params > -- > > Key: AIRFLOW-2885 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2885 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Reporter: Xiaodong DENG >Assignee: Holden Karau's magical unicorn >Priority: Critical > > *get_params(page=0, search="abc",showPaused=False)* returns > "_search=abc&showPaused=False_", while it's supposed to return > "page=0&search=abc&showPaused=False". > This is because Python takes 0 as False when it's used in a conditional > statement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
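The falsy-zero pitfall can be reproduced with a simplified reconstruction. Both functions below are illustrative, not the actual www_rbac.utils.get_params; the assumption is that the buggy version guards the page parameter with a plain truthiness test.

```python
def get_params_buggy(page=None, **kwargs):
    """Simplified reconstruction of the bug: `if page:` drops page=0,
    because Python treats 0 as False in a boolean context."""
    params = {}
    if page:
        params["page"] = page
    params.update({k: v for k, v in kwargs.items() if v is not None})
    return "&".join("%s=%s" % kv for kv in params.items())

def get_params_fixed(page=None, **kwargs):
    """Testing against None keeps falsy-but-meaningful values like 0."""
    params = {}
    if page is not None:
        params["page"] = page
    params.update({k: v for k, v in kwargs.items() if v is not None})
    return "&".join("%s=%s" % kv for kv in params.items())
```

The buggy version reproduces the reported "search=abc&showPaused=False" output for page=0, and the `is not None` check restores "page=0&search=abc&showPaused=False".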
[jira] [Assigned] (AIRFLOW-2780) Adds IMAP Hook to interact with a mail server
[ https://issues.apache.org/jira/browse/AIRFLOW-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2780: - Assignee: Felix Uellendall (was: Holden Karau's magical unicorn) > Adds IMAP Hook to interact with a mail server > - > > Key: AIRFLOW-2780 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2780 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Felix Uellendall >Assignee: Felix Uellendall >Priority: Major > > This Hook connects to a mail server via IMAP to be able to retrieve email > attachments by using [Python's IMAP > Library.|https://docs.python.org/3.6/library/imaplib.html] > Features: > - `has_mail_attachment`: Can be used in a `Sensor` to check if there is an > attachment on the mail server with the given name. > - `retrieve_mail_attachments`: Can be used in an `Operator` to do something with > the attachments, returned as a list of tuples. > - `download_mail_attachments`: Can be used in an `Operator` to download the > attachments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2885) A Bug in www_rbac.utils.get_params
[ https://issues.apache.org/jira/browse/AIRFLOW-2885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2885: - Assignee: Xiaodong DENG (was: Holden Karau's magical unicorn) > A Bug in www_rbac.utils.get_params > -- > > Key: AIRFLOW-2885 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2885 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Critical > > *get_params(page=0, search="abc",showPaused=False)* returns > "_search=abc&showPaused=False_", while it's supposed to return > "page=0&search=abc&showPaused=False". > This is because Python takes 0 as False when it's used in a conditional > statement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2428: - Assignee: (was: Holden Karau's magical unicorn) > Add AutoScalingRole key to emr_hook > --- > > Key: AIRFLOW-2428 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2428 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Kyle Hamlin >Priority: Minor > Fix For: 1.10.0 > > > Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` > method for EMR autoscaling to work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2877) Make docs site URL consistent everywhere
[ https://issues.apache.org/jira/browse/AIRFLOW-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2877: - Assignee: Taylor Edmiston (was: Holden Karau's magical unicorn) > Make docs site URL consistent everywhere > > > Key: AIRFLOW-2877 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2877 > Project: Apache Airflow > Issue Type: Improvement > Components: docs >Reporter: Taylor Edmiston >Assignee: Taylor Edmiston >Priority: Minor > > We currently have several references to multiple docs sites throughout the > repo (https://airflow.readthedocs.io/, https://airflow.apache.org/, > https://airflow.incubator.apache.org/, etc). > This PR makes the docs site URL consistent everywhere. > All references to the docs site now point to the latest stable version, with > the one exception being the top-level dev docs site on master in the readme. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2840) cli option to update existing connection
[ https://issues.apache.org/jira/browse/AIRFLOW-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2840: - Assignee: Holden Karau's magical unicorn (was: David) > cli option to update existing connection > > > Key: AIRFLOW-2840 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2840 > Project: Apache Airflow > Issue Type: Wish >Reporter: David >Assignee: Holden Karau's magical unicorn >Priority: Major > > Add cli options to update existing airflow connection > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2875) Env variables should have percent signs escaped before writing to tmp config
[ https://issues.apache.org/jira/browse/AIRFLOW-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2875: - Assignee: (was: Holden Karau's magical unicorn) > Env variables should have percent signs escaped before writing to tmp config > > > Key: AIRFLOW-2875 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2875 > Project: Apache Airflow > Issue Type: Bug > Components: configuration > Environment: Ubuntu > Airflow 1.10rc2 >Reporter: William Horton >Priority: Major > > I encountered this when I was using an environment variable for > `AIRFLOW__CELERY__BROKER_URL`. The airflow worker was able to run and > communicate with the SQS queue, but when it received a task and began to run > it, I encountered an error with this trace: > {code:java} > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring Traceback (most recent call last): > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File "/opt/airflow/venv/bin/airflow", line 32, in > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring args.func(args) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/utils/cli.py", > line 74, in wrapper > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring return f(*args, **kwargs) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", > line 460, in run > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring conf.set(section, option, value) > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/__init__.py", 
> line 1239, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring super(ConfigParser, self).set(section, option, value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/__init__.py", > line 914, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/__init__.py", > line 392, in before_set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring "position %d" % (value, tmp_value.find('%'))) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring ValueError: invalid interpolation syntax in > {code} > The issue was that the broker url had a percent sign, and when the cli called > `conf.set(section, option, value)`, it was throwing because it interpreted > the percent as an interpolation. > To avoid this issue, I would propose that the environment variables be > escaped when being written in `utils.configuration.tmp_configuration_copy`, > so that when `conf.set` is called in `bin/cli`, it doesn't throw on these > unescaped values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
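The proposed escaping can be sketched as below. `escape_percents` is a hypothetical helper, not the actual tmp_configuration_copy code, and the sketch uses the stdlib configparser rather than the backports package; the failure mode is the same because ConfigParser's BasicInterpolation rejects a bare '%' not followed by '%' or '('.

```python
import configparser

def escape_percents(value):
    """Hypothetical fix sketch: double '%' so ConfigParser's
    interpolation accepts values like percent-encoded broker URLs."""
    return value.replace("%", "%%")

conf = configparser.ConfigParser()
conf.add_section("celery")
broker_url = "sqs://AKIAXXXX%2Fsecret@"   # '%2F' is URL percent-encoding

# Setting the raw value raises ValueError: invalid interpolation syntax,
# the same error as in the trace above.
try:
    conf.set("celery", "broker_url", broker_url)
    raised = False
except ValueError:
    raised = True

# Escaping before conf.set makes the value round-trip cleanly:
# interpolation turns '%%' back into '%' on read.
conf.set("celery", "broker_url", escape_percents(broker_url))
```

The AWS access key in the URL is a placeholder; any value containing a '%' triggers the same ValueError.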
[jira] [Assigned] (AIRFLOW-2876) Bump version of Tenacity
[ https://issues.apache.org/jira/browse/AIRFLOW-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2876: - Assignee: Holden Karau's magical unicorn > Bump version of Tenacity > > > Key: AIRFLOW-2876 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2876 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Assignee: Holden Karau's magical unicorn >Priority: Major > > Since 4.8.0 is not Python 3.7 compatible, we want to bump the version to > 4.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2874) Enable Flask App Builder theme support
[ https://issues.apache.org/jira/browse/AIRFLOW-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2874: - Assignee: Verdan Mahmood (was: Holden Karau's magical unicorn) > Enable Flask App Builder theme support > -- > > Key: AIRFLOW-2874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2874 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Verdan Mahmood >Priority: Major > > To customize the look and feel of Apache Airflow (an effort towards making > Airflow a whitelabel application), we should enable support for FAB's > themes, which can be set in configuration. > The theme can be used in conjunction with the existing `navbar_color` configuration, or > can be used separately by simply unsetting the navbar_color config. > > http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1737) set_task_instance_state fails because of strptime
[ https://issues.apache.org/jira/browse/AIRFLOW-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1737: - Assignee: Holden Karau's magical unicorn (was: Tao Feng) > set_task_instance_state fails because of strptime > - > > Key: AIRFLOW-1737 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1737 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Reporter: Andre Boechat >Assignee: Holden Karau's magical unicorn >Priority: Minor > Attachments: Screenshot_2017-10-18_15-58-29.png > > > Context: > * DAG run triggered manually > * Using the web application to change the state of a task > When trying to set the state of a task, an exception is thrown: *ValueError: > unconverted data remains: ..372649* (look at the attached screenshot). > I think the problem comes from the "execution date" created by manually > triggered DAGs, since the date-time includes a fractional part. In my > database, I see scheduled DAGs with execution dates like "10-18T15:00:00", > while manually triggered ones with dates like "09-21T16:36:16.170988". If we > look at the method *set_task_instance_state* in *airflow.www.views*, we see > that the format string used with *strptime* doesn't consider any fractional > part. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2874) Enable Flask App Builder theme support
[ https://issues.apache.org/jira/browse/AIRFLOW-2874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2874: - Assignee: Holden Karau's magical unicorn (was: Verdan Mahmood) > Enable Flask App Builder theme support > -- > > Key: AIRFLOW-2874 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2874 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Verdan Mahmood >Assignee: Holden Karau's magical unicorn >Priority: Major > > To customize the look and feel of Apache Airflow (an effort towards making > Airflow a whitelabel application), we should enable support for FAB's > themes, which can be set in configuration. > The theme can be used in conjunction with the existing `navbar_color` configuration, or > can be used separately by simply unsetting the navbar_color config. > > http://flask-appbuilder.readthedocs.io/en/latest/customizing.html#changing-themes -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2877) Make docs site URL consistent everywhere
[ https://issues.apache.org/jira/browse/AIRFLOW-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2877: - Assignee: Holden Karau's magical unicorn (was: Taylor Edmiston) > Make docs site URL consistent everywhere > > > Key: AIRFLOW-2877 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2877 > Project: Apache Airflow > Issue Type: Improvement > Components: docs >Reporter: Taylor Edmiston >Assignee: Holden Karau's magical unicorn >Priority: Minor > > We currently have several references to multiple docs sites throughout the > repo (https://airflow.readthedocs.io/, https://airflow.apache.org/, > https://airflow.incubator.apache.org/, etc). > This PR makes the docs site URL consistent everywhere. > All references to the docs site now point to the latest stable version, with > the one exception being the top-level dev docs site on master in the readme. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2840) cli option to update existing connection
[ https://issues.apache.org/jira/browse/AIRFLOW-2840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2840: - Assignee: David (was: Holden Karau's magical unicorn) > cli option to update existing connection > > > Key: AIRFLOW-2840 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2840 > Project: Apache Airflow > Issue Type: Wish >Reporter: David >Assignee: David >Priority: Major > > Add cli options to update existing airflow connection > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2780) Adds IMAP Hook to interact with a mail server
[ https://issues.apache.org/jira/browse/AIRFLOW-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2780: - Assignee: Holden Karau's magical unicorn (was: Felix Uellendall) > Adds IMAP Hook to interact with a mail server > - > > Key: AIRFLOW-2780 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2780 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Felix Uellendall >Assignee: Holden Karau's magical unicorn >Priority: Major > > This Hook connects to a mail server via IMAP to be able to retrieve email > attachments by using [Python's IMAP > Library.|https://docs.python.org/3.6/library/imaplib.html] > Features: > - `has_mail_attachment`: Can be used in a `Sensor` to check if there is an > attachment on the mail server with the given name. > - `retrieve_mail_attachments`: Can be used in an `Operator` to do something with > the attachments, returned as a list of tuples. > - `download_mail_attachments`: Can be used in an `Operator` to download the > attachments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1737) set_task_instance_state fails because of strptime
[ https://issues.apache.org/jira/browse/AIRFLOW-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601629#comment-16601629 ] Apache Spark commented on AIRFLOW-1737: --- User '7yl4r' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3754 > set_task_instance_state fails because of strptime > - > > Key: AIRFLOW-1737 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1737 > Project: Apache Airflow > Issue Type: Bug > Components: webapp >Reporter: Andre Boechat >Assignee: Tao Feng >Priority: Minor > Attachments: Screenshot_2017-10-18_15-58-29.png > > > Context: > * DAG run triggered manually > * Using the web application to change the state of a task > When trying to set the state of a task, an exception is thrown: *ValueError: > unconverted data remains: ..372649* (look at the attached screenshot). > I think the problem comes from the "execution date" created by manually > triggered DAGs, since the date-time includes a fractional part. In my > database, I see scheduled DAGs with execution dates like "10-18T15:00:00", > while manually triggered ones with dates like "09-21T16:36:16.170988". If we > look at the method *set_task_instance_state* in *airflow.www.views*, we see > that the format string used with *strptime* doesn't consider any fractional > part. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook
[ https://issues.apache.org/jira/browse/AIRFLOW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2428: - Assignee: Holden Karau's magical unicorn > Add AutoScalingRole key to emr_hook > --- > > Key: AIRFLOW-2428 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2428 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Kyle Hamlin >Assignee: Holden Karau's magical unicorn >Priority: Minor > Fix For: 1.10.0 > > > Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` > method for EMR autoscaling to work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2829) Brush up the CI script for minikube
[ https://issues.apache.org/jira/browse/AIRFLOW-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2829: - Assignee: Holden Karau's magical unicorn (was: Kengo Seki) > Brush up the CI script for minikube > --- > > Key: AIRFLOW-2829 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2829 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Reporter: Kengo Seki >Assignee: Holden Karau's magical unicorn >Priority: Major > > Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found > some points that can be improved: > - minikube version is hard-coded > - Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}} > - Undefined variables: {{$unameOut}} > - The following lines cause warnings if download is skipped: > {code} > 69 sudo mv bin/minikube /usr/local/bin/minikube > 70 sudo mv bin/kubectl /usr/local/bin/kubectl > {code} > - The {{return}}s at lines 81 and 96 won't work since they are outside of a function > - To run this script as a non-root user, {{-E}} is required for {{sudo}}. See > https://github.com/kubernetes/minikube/issues/1883. > {code} > 105 _MINIKUBE="sudo PATH=$PATH minikube" > 106 > 107 $_MINIKUBE config set bootstrapper localkube > 108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION} > --vm-driver=none > 109 $_MINIKUBE update-context > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2875) Env variables should have percent signs escaped before writing to tmp config
[ https://issues.apache.org/jira/browse/AIRFLOW-2875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2875: - Assignee: Holden Karau's magical unicorn > Env variables should have percent signs escaped before writing to tmp config > > > Key: AIRFLOW-2875 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2875 > Project: Apache Airflow > Issue Type: Bug > Components: configuration > Environment: Ubuntu > Airflow 1.10rc2 >Reporter: William Horton >Assignee: Holden Karau's magical unicorn >Priority: Major > > I encountered this when I was using an environment variable for > `AIRFLOW__CELERY__BROKER_URL`. The airflow worker was able to run and > communicate with the SQS queue, but when it received a task and began to run > it, I encountered an error with this trace: > {code:java} > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring Traceback (most recent call last): > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File "/opt/airflow/venv/bin/airflow", line 32, in > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring args.func(args) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/utils/cli.py", > line 74, in wrapper > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring return f(*args, **kwargs) > [2018-08-08 15:19:24,402] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/airflow/bin/cli.py", > line 460, in run > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring conf.set(section, option, value) > [2018-08-08 15:19:24,403] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > 
"/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", > line 1239, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring super(ConfigParser, self).set(section, option, value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", > line 914, in set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring value) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring File > "/opt/airflow/venv/local/lib/python2.7/site-packages/backports/configparser/_init_.py", > line 392, in before_set > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring "position %d" % (value, tmp_value.find('%'))) > [2018-08-08 15:19:24,406] {base_task_runner.py:108} INFO - Job 13898: Subtask > mirroring ValueError: invalid interpolation syntax in > {code} > The issue was that the broker url had a percent sign, and when the cli called > `conf.set(section, option, value)`, it was throwing because it interpreted > the percent as an interpolation. > To avoid this issue, I would propose that the environment variables be > escaped when being written in `utils.configuration.tmp_configuration_copy`, > so that when `conf.set` is called in `bin/cli`, it doesn't throw on these > unescaped values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2876) Bump version of Tenacity
[ https://issues.apache.org/jira/browse/AIRFLOW-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2876: - Assignee: (was: Holden Karau's magical unicorn) > Bump version of Tenacity > > > Key: AIRFLOW-2876 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2876 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > > Since 4.8.0 is not Python 3.7 compatible, we want to bump the version to > 4.12.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2829) Brush up the CI script for minikube
[ https://issues.apache.org/jira/browse/AIRFLOW-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2829: - Assignee: Kengo Seki (was: Holden Karau's magical unicorn) > Brush up the CI script for minikube > --- > > Key: AIRFLOW-2829 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2829 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Major > > Ran {{scripts/ci/kubernetes/minikube/start_minikube.sh}} locally and found > some points that can be improved: > - minikube version is hard-coded > - Defined but unused variables: {{$_HELM_VERSION}}, {{$_VM_DRIVER}} > - Undefined variables: {{$unameOut}} > - The following lines cause warnings if download is skipped: > {code} > 69 sudo mv bin/minikube /usr/local/bin/minikube > 70 sudo mv bin/kubectl /usr/local/bin/kubectl > {code} > - {{return}} s at line 81 and 96 won't work since it's outside of a function > - To run this script as a non-root user, {{-E}} is required for {{sudo}}. See > https://github.com/kubernetes/minikube/issues/1883. > {code} > 105 _MINIKUBE="sudo PATH=$PATH minikube" > 106 > 107 $_MINIKUBE config set bootstrapper localkube > 108 $_MINIKUBE start --kubernetes-version=${_KUBERNETES_VERSION} > --vm-driver=none > 109 $_MINIKUBE update-context > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2916) Add argument `verify` for AwsHook() and S3 related sensors/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2916: - Assignee: Xiaodong DENG (was: Holden Karau's magical unicorn) > Add argument `verify` for AwsHook() and S3 related sensors/operators > > > Key: AIRFLOW-2916 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2916 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks, operators >Reporter: Xiaodong DENG >Assignee: Xiaodong DENG >Priority: Minor > > The AwsHook() and S3-related operators/sensors are depending on package boto3. > In boto3, when we initiate a client or a resource, argument `verify` is > provided (https://boto3.readthedocs.io/en/latest/reference/core/session.html > ). > It is useful when > # users want to use a different CA cert bundle than the one used by botocore. > # users want to have '--no-verify-ssl'. This is especially useful when we're > using on-premises S3 or other implementations of object storage, like IBM's > Cloud Object Storage. > However, this feature is not provided in Airflow for S3 yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
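A minimal sketch of how such a `verify` argument could be threaded from a hook into boto3 (the class and method names below are illustrative, not Airflow's actual API):

```python
class AwsHookSketch:
    """Illustrative only: forwards `verify` to boto3's client factory."""

    def __init__(self, verify=None):
        # None  -> boto3 default (verify TLS against botocore's CA bundle)
        # str   -> path to an alternative CA cert bundle
        # False -> disable verification (the '--no-verify-ssl' case)
        self.verify = verify

    def get_client(self, session, service_name="s3"):
        # `session` is expected to behave like boto3.session.Session, whose
        # client() method accepts `verify` as a keyword argument.
        return session.client(service_name, verify=self.verify)
```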
[jira] [Commented] (AIRFLOW-1978) WinRM hook and operator
[ https://issues.apache.org/jira/browse/AIRFLOW-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601621#comment-16601621 ] Apache Spark commented on AIRFLOW-1978: --- User 'cloneluke' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3316 > WinRM hook and operator > --- > > Key: AIRFLOW-1978 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1978 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib >Affects Versions: 1.9.0 >Reporter: Luke Bodeen >Assignee: Luke Bodeen >Priority: Minor > Labels: features, windows > Fix For: 2.0.0 > > > I would like to connect and run windows job via winrm protocol. This could > then run any job on windows that you can run via command window in Windows. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2315) S3Hook Extra Extras
[ https://issues.apache.org/jira/browse/AIRFLOW-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2315: - Assignee: Josh Bacon (was: Holden Karau's magical unicorn) > S3Hook Extra Extras > > > Key: AIRFLOW-2315 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2315 > Project: Apache Airflow > Issue Type: Improvement > Components: boto3 >Affects Versions: 1.9.0 >Reporter: Josh Bacon >Assignee: Josh Bacon >Priority: Minor > Labels: beginner, features, newbie, pull-request-available, > starter > Fix For: 1.10.0 > > > Feature improvement request to S3Hook to support additional JSON extra > arguments to apply to both upload and download ExtraArgs. > Allowed Upload Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS] > Allowed Download Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
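The shape of the requested change can be sketched as a pass-through with validation; the allowed-args list below is a short excerpt, and the authoritative lists live at the boto3 URLs above:

```python
# Excerpt only -- see boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS for the
# full list.
ALLOWED_UPLOAD_ARGS = ["ACL", "ContentType", "Metadata",
                       "ServerSideEncryption", "StorageClass"]

def validate_extra_args(extra_args, allowed):
    """Reject ExtraArgs keys boto3's transfer layer would not accept."""
    bad = sorted(k for k in extra_args if k not in allowed)
    if bad:
        raise ValueError("ExtraArgs not allowed: {}".format(bad))
    return extra_args

# The validated dict would then be handed straight to boto3, e.g.:
#   client.upload_file(filename, bucket, key, ExtraArgs=extra_args)
```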
[jira] [Assigned] (AIRFLOW-2920) Kubernetes pod operator: namespace is a hard requirement
[ https://issues.apache.org/jira/browse/AIRFLOW-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2920: - Assignee: (was: Holden Karau's magical unicorn) > Kubernetes pod operator: namespace is a hard requirement > > > Key: AIRFLOW-2920 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2920 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jon Davies >Priority: Major > > Hello, > I'm using the Kubernetes pod operator for my DAGs, I install Airflow to its > own namespace within my Kubernetes cluster (for example: "testing-airflow") > and I would like pods spun up by that Airflow instance to live in that > namespace. > However, I have to hardcode the namespace into my DAG definition code and so > I have to rebuild the Docker image for Airflow to be able to spin up a > "production-airflow" namespace as the namespace is a hard requirement in the > Python code - it'd be nice if the DAG could just default to its own namespace > if none is defined. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
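A hedged sketch of the requested fallback: in-cluster pods get their own namespace mounted by the service-account volume, so the operator could default to that when the DAG omits one (`resolve_namespace` is a hypothetical helper, not Airflow's API):

```python
import os

# Standard in-cluster mount point for a pod's own namespace.
_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"

def resolve_namespace(explicit=None, default="default"):
    """Prefer the DAG's value, then the pod's own namespace, then a default."""
    if explicit:
        return explicit
    if os.path.exists(_NAMESPACE_FILE):
        with open(_NAMESPACE_FILE) as f:
            return f.read().strip()
    return default
```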
[jira] [Assigned] (AIRFLOW-1886) Failed jobs are not being counted towards max_active_runs_per_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1886: - Assignee: Holden Karau's magical unicorn (was: Oleg Yamin) > Failed jobs are not being counted towards max_active_runs_per_dag > - > > Key: AIRFLOW-1886 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1886 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.8.1 >Reporter: Oleg Yamin >Assignee: Holden Karau's magical unicorn >Priority: Major > > # Currently, I have setup max_active_runs_per_dag = 2 in airflow.cfg but when > a DAG aborts, it will keep submitting next DAG in the queue not counting the > current incomplete DAG that is already in the queue. I am using 1.8.1 but i > see that the jobs.py in latest version is still not addressing this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division
[ https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1158: - Assignee: Holden Karau's magical unicorn (was: Maksim Pecherskiy) > Multipart uploads to s3 cut off at nearest division > --- > > Key: AIRFLOW-1158 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1158 > Project: Apache Airflow > Issue Type: Bug > Components: aws, hooks >Reporter: Maksim Pecherskiy >Assignee: Holden Karau's magical unicorn >Priority: Minor > > When I try to upload a file of say 104MBs, using multipart uploads of 10MB > chunks, I get 10 chunks of 10MBs and that's it. The 4MBs left over do not > get uploaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
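The arithmetic of the reported symptom: computing the part count with floor division drops the trailing partial chunk, while ceiling division includes it:

```python
import math

size_mb, chunk_mb = 104, 10  # the file and chunk sizes from the report

floor_parts = size_mb // chunk_mb              # 10 parts -> the last 4 MB are lost
correct_parts = math.ceil(size_mb / chunk_mb)  # 11 parts -> final partial chunk included
```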
[jira] [Commented] (AIRFLOW-1952) Add the navigation bar color parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601623#comment-16601623 ] Apache Spark commented on AIRFLOW-1952: --- User 'Licht-T' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2903 > Add the navigation bar color parameter > -- > > Key: AIRFLOW-1952 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1952 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Licht Takeuchi >Assignee: Licht Takeuchi >Priority: Major > Fix For: 2.0.0 > > > We operate multiple Airflow instances (e.g. Production, Staging, etc.), so we > cannot tell which instance we are looking at. This feature lets us distinguish > the Airflow instances by the color of the navigation bar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2315) S3Hook Extra Extras
[ https://issues.apache.org/jira/browse/AIRFLOW-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2315: - Assignee: Holden Karau's magical unicorn (was: Josh Bacon) > S3Hook Extra Extras > > > Key: AIRFLOW-2315 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2315 > Project: Apache Airflow > Issue Type: Improvement > Components: boto3 >Affects Versions: 1.9.0 >Reporter: Josh Bacon >Assignee: Holden Karau's magical unicorn >Priority: Minor > Labels: beginner, features, newbie, pull-request-available, > starter > Fix For: 1.10.0 > > > Feature improvement request to S3Hook to support additional JSON extra > arguments to apply to both upload and download ExtraArgs. > Allowed Upload Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS] > Allowed Download Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2407) Undefined names in Python code
[ https://issues.apache.org/jira/browse/AIRFLOW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601624#comment-16601624 ] Apache Spark commented on AIRFLOW-2407: --- User 'cclauss' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3299 > Undefined names in Python code > -- > > Key: AIRFLOW-2407 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2407 > Project: Apache Airflow > Issue Type: Bug >Reporter: cclauss >Priority: Minor > Fix For: 2.0.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > flake8 testing of https://github.com/apache/incubator-airflow on Python 3.6.3 > $ *flake8 . --count --select=E901,E999,F821,F822,F823 --show-source > --statistics* > {noformat} > ./airflow/contrib/auth/backends/kerberos_auth.py:67:13: F821 undefined name > 'logging' > logging.error('Password validation for principal %s failed %s', > user_principal, e) > ^ > ./airflow/contrib/hooks/aws_hook.py:75:13: F821 undefined name 'logging' > logging.warning("Option Error in parsing s3 config file") > ^ > ./airflow/contrib/operators/datastore_export_operator.py:105:19: F821 > undefined name 'AirflowException' > raise AirflowException('Operation failed: > result={}'.format(result)) > ^ > ./airflow/contrib/operators/datastore_import_operator.py:94:19: F821 > undefined name 'AirflowException' > raise AirflowException('Operation failed: > result={}'.format(result)) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:62:9: F821 undefined name 'this' > this.log.info('Poking: %s', self.data) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:68:13: F821 undefined name > 'logging' > logging.exception(e) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:71:9: F821 undefined name 'this' > this.log.info('Status of this Poke: %s', status) > ^ > ./airflow/www/app.py:148:17: F821 undefined name 'reload' > reload(e) > ^ > ./tests/operators/hive_operator.py:178:27: F821 undefined name 'cursor_mock' > 
__enter__=cursor_mock, > ^ > ./tests/operators/hive_operator.py:184:27: F821 undefined name 'get_conn_mock' > __enter__=get_conn_mock, > ^ > ./tests/operators/test_virtualenv_operator.py:166:19: F821 undefined name > 'virtualenv_string_args' > print(virtualenv_string_args) > ^ > ./tests/operators/test_virtualenv_operator.py:167:16: F821 undefined name > 'virtualenv_string_args' > if virtualenv_string_args[0] != virtualenv_string_args[2]: >^ > ./tests/operators/test_virtualenv_operator.py:167:45: F821 undefined name > 'virtualenv_string_args' > if virtualenv_string_args[0] != virtualenv_string_args[2]: > ^ > 13F821 undefined name 'logging' > 13 > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
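Most of the hits above follow the same pattern: a name such as `logging` is referenced only on an error path, so the missing import surfaces as a NameError at failure time instead of at import time. A minimal illustration (the helper name is made up):

```python
import logging  # the one-line fix that flake8's F821 check points at

def get_option(options, key):
    try:
        return options[key]
    except KeyError:
        # Without the import above, this line would raise NameError -- but
        # only when the except branch actually runs, which is why F821
        # catches bugs that tests of the happy path never hit.
        logging.warning("Option Error in parsing s3 config file")
        return None
```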
[jira] [Commented] (AIRFLOW-1812) Update Logging config example in Updating.md
[ https://issues.apache.org/jira/browse/AIRFLOW-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601614#comment-16601614 ] Apache Spark commented on AIRFLOW-1812: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2784 > Update Logging config example in Updating.md > > > Key: AIRFLOW-1812 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1812 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division
[ https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601627#comment-16601627 ] Apache Spark commented on AIRFLOW-1158: --- User 'stellah' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2337 > Multipart uploads to s3 cut off at nearest division > --- > > Key: AIRFLOW-1158 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1158 > Project: Apache Airflow > Issue Type: Bug > Components: aws, hooks >Reporter: Maksim Pecherskiy >Assignee: Maksim Pecherskiy >Priority: Minor > > When I try to upload a file of say 104MBs, using multipart uploads of 10MB > chunks, I get 10 chunks of 10MBs and that's it. The 4MBs left over do not > get uploaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2920) Kubernetes pod operator: namespace is a hard requirement
[ https://issues.apache.org/jira/browse/AIRFLOW-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2920: - Assignee: Holden Karau's magical unicorn > Kubernetes pod operator: namespace is a hard requirement > > > Key: AIRFLOW-2920 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2920 > Project: Apache Airflow > Issue Type: Bug >Reporter: Jon Davies >Assignee: Holden Karau's magical unicorn >Priority: Major > > Hello, > I'm using the Kubernetes pod operator for my DAGs, I install Airflow to its > own namespace within my Kubernetes cluster (for example: "testing-airflow") > and I would like pods spun up by that Airflow instance to live in that > namespace. > However, I have to hardcode the namespace into my DAG definition code and so > I have to rebuild the Docker image for Airflow to be able to spin up a > "production-airflow" namespace as the namespace is a hard requirement in the > Python code - it'd be nice if the DAG could just default to its own namespace > if none is defined. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1488) Add a sensor operator to wait on DagRuns
[ https://issues.apache.org/jira/browse/AIRFLOW-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1488: - Assignee: Yati (was: Holden Karau's magical unicorn) > Add a sensor operator to wait on DagRuns > > > Key: AIRFLOW-1488 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1488 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, operators >Reporter: Yati >Assignee: Yati >Priority: Major > > The > [ExternalTaskSensor|https://airflow.incubator.apache.org/code.html#airflow.operators.ExternalTaskSensor] > operator already allows for encoding dependencies on tasks in external DAGs. > However, when you have teams, each owning multiple small-to-medium sized > DAGs, it is desirable to be able to wait on an external DagRun as a whole. > This allows the owners of an upstream DAG to refactor their code freely by > splitting/squashing task responsibilities, without worrying about dependent > DAGs breaking. > I'll now enumerate the easiest ways of achieving this that come to mind: > * Make all DAGs always have a join DummyOperator in the end, with a task id > that follows some convention, e.g., "{{ dag_id }}.__end__". > * Make ExternalTaskSensor poke for a DagRun instead of TaskInstances when the > external_task_id argument is None. > * Implement a separate DagRunSensor operator. > After considerations, we decided to implement a separate operator, which > we've been using in the team for our workflows, and I think it would make a > good addition to contrib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
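The third option can be sketched as below; `get_dag_run_state` is an injected lookup (hypothetical, not Airflow's API), and a real implementation would subclass BaseSensorOperator:

```python
class DagRunSensorSketch:
    """Poke until the external DagRun as a whole reaches 'success'."""

    def __init__(self, external_dag_id, execution_date, get_dag_run_state):
        self.external_dag_id = external_dag_id
        self.execution_date = execution_date
        self._get_state = get_dag_run_state  # (dag_id, execution_date) -> state

    def poke(self):
        # True once the whole upstream run is done, regardless of how its
        # owners split or squash tasks internally.
        return self._get_state(self.external_dag_id, self.execution_date) == "success"
```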
[jira] [Assigned] (AIRFLOW-2697) Drop snakebite in favour of hdfs3
[ https://issues.apache.org/jira/browse/AIRFLOW-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2697: - Assignee: Julian de Ruiter (was: Holden Karau's magical unicorn) > Drop snakebite in favour of hdfs3 > - > > Key: AIRFLOW-2697 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2697 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Julian de Ruiter >Assignee: Julian de Ruiter >Priority: Major > > The current HdfsHook relies on the snakebite library, which is unfortunately > not compatible with Python 3. To add Python 3 support for the HdfsHook > requires switching to a different library for interacting with HDFS. The > hdfs3 library is an attractive alternative, as it supports Python 3 and seems > to be stable and relatively well supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2374) Airflow fails to show logs
[ https://issues.apache.org/jira/browse/AIRFLOW-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601628#comment-16601628 ] Apache Spark commented on AIRFLOW-2374: --- User 'berislavlopac' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3265 > Airflow fails to show logs > -- > > Key: AIRFLOW-2374 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2374 > Project: Apache Airflow > Issue Type: Bug >Reporter: Berislav Lopac >Assignee: Berislav Lopac >Priority: Blocker > > When viewing a log in the webserver, the page shows a loading gif and the log > never appears. Looking in the Javascript console, the problem appears to be > error 500 when loading the {{get_logs_with_metadata}} endpoint, givving the > following trace: > {code:java} > / ( () ) \___ > /( ( ( ) _)) ) )\ >(( ( )() ) ( ) ) > ((/ ( _( ) ( _) ) ( () ) ) > ( ( ( (_) ((( ) .((_ ) . )_ >( ( )( ( )) ) . ) ( ) > ( ( ( ( ) ( _ ( _) ). ) . ) ) ( ) > ( ( ( ) ( ) ( )) ) _)( ) ) ) > ( ( ( \ ) ((_ ( ) ( ) ) ) ) )) ( ) > ( ( ( ( (_ ( ) ( _) ) ( ) ) ) > ( ( ( ( ( ) (_ ) ) ) _) ) _( ( ) > (( ( )(( _) _) _(_ ( (_ ) >(_((__(_(__(( ( ( | ) ) ) )_))__))_)___) >((__)\\||lll|l||/// \_)) > ( /(/ ( ) ) )\ ) > (( ( ( | | ) ) )\ ) >( /(| / ( )) ) ) )) ) > ( ( _(|)_) ) > ( ||\(|(|)|/|| ) > (|(||(||)) > ( //|/l|||)|\\ \ ) > (/ / // /|//\\ \ \ \ _) > --- > Node: airflow-nods-dev > --- > Traceback (most recent call last): > File > "/opt/airflow/src/apache-airflow/airflow/utils/log/gcs_task_handler.py", line > 113, in _read > remote_log = self.gcs_read(remote_loc) > File > "/opt/airflow/src/apache-airflow/airflow/utils/log/gcs_task_handler.py", line > 131, in gcs_read > return self.hook.download(bkt, blob).decode() > File "/opt/airflow/src/apache-airflow/airflow/contrib/hooks/gcs_hook.py", > line 107, in download > .get_media(bucket=bucket, object=object) \ > File "/usr/local/lib/python3.6/dist-packages/oauth2client/_helpers.py", > line 133, in 
positional_wrapper > return wrapped(*args, **kwargs) > File "/usr/local/lib/python3.6/dist-packages/googleapiclient/http.py", line > 841, in execute > raise HttpError(resp, content, uri=self.uri) > googleapiclient.errors.HttpError: https://www.googleapis.com/storage/v1/b/bucket-af/o/test-logs%2Fgeneric_transfer_single%2Ftransfer_file%2F2018-04-25T13%3A00%3A51.250983%2B00%3A00%2F1.log?alt=media > returned "Not Found"> > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1982, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1614, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1517, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python3.6/dist-packages/flask/_compat.py", line 33, in > reraise > raise value > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1612, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python3.6/dist-packages/flask/app.py", line 1598, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python3.6/dist-packages/flask_admin/base.py", line 69, > in inner > return self._run_view(f, *args, **kwargs) > File "/usr/local/lib/python3.6/dist-packages/flask_admin/base.py", line > 368, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python3.6/dist-packages/flask_login.py", line 758, in > decorated_view > return func(*args, **kwargs) > File "/opt/airflow/src/apache-airflow/airflow/www/utils.py", line 269, in > wrapper > return f(*args, **kwargs) > File "/opt/airfl
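The 500 comes from letting the handler's HttpError propagate out of the view; the usual shape of such a fix is to degrade a missing remote log into a readable message (`gcs_read` below stands in for GCSTaskHandler.gcs_read, and the helper is illustrative):

```python
def read_remote_log(gcs_read, remote_loc):
    """Return the remote log body, or a readable message instead of a 500."""
    try:
        return gcs_read(remote_loc)
    except Exception as e:
        # A 404 from GCS (or any read failure) becomes part of the log view,
        # not an unhandled exception in the webserver.
        return "*** Unable to read remote log at {}: {}".format(remote_loc, e)
```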
[jira] [Assigned] (AIRFLOW-2928) Use uuid.uuid4 to create unique job name
[ https://issues.apache.org/jira/browse/AIRFLOW-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2928: - Assignee: (was: Holden Karau's magical unicorn) > Use uuid.uuid4 to create unique job name > > > Key: AIRFLOW-2928 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2928 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Ken Kawamoto >Priority: Minor > > some components in Airflow use the first 8 bytes of _uuid.uuid1_ to generate > a unique job name. The first 8 bytes, however, seem to come from clock. so if > this is called multiple times in a short time period, two ids will likely > collide. > _uuid.uuid4_ provides random values. > {code} > Python 2.7.15 (default, Jun 17 2018, 12:46:58) > [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import uuid > >>> for i in range(10): > ... uuid.uuid1() > ... > UUID('e8bc9959-a586-11e8-ab8c-8c859010d0c2') > UUID('e8c254e3-a586-11e8-ac39-8c859010d0c2') > UUID('e8c2560f-a586-11e8-8251-8c859010d0c2') > UUID('e8c256c2-a586-11e8-994a-8c859010d0c2') > UUID('e8c25759-a586-11e8-9ba6-8c859010d0c2') > UUID('e8c257e6-a586-11e8-a854-8c859010d0c2') > UUID('e8c2587d-a586-11e8-89e9-8c859010d0c2') > UUID('e8c2590a-a586-11e8-a825-8c859010d0c2') > UUID('e8c25994-a586-11e8-9421-8c859010d0c2') > UUID('e8c25a21-a586-11e8-83fd-8c859010d0c2') > >>> for i in range(10): > ... uuid.uuid4() > ... 
> UUID('f1eba69f-18ea-467e-a414-b18d67f34a51') > UUID('aaa4e18e-d4e6-42c9-905c-3cde714c2741') > UUID('82f55c27-69ae-474b-ab9a-afcc7891587c') > UUID('fab63643-ad33-4307-837b-68444fce7240') > UUID('c4efca6c-3d1b-436c-8b09-e9b7f55ccefb') > UUID('58de3a76-9d98-4427-8232-d6d7df2a1904') > UUID('4f0a55e8-1357-4697-a345-e60891685b00') > UUID('0fed47a3-07b6-423e-ae2e-d821c440cb63') > UUID('144b2c55-a9bd-431d-b536-239fb2048a5e') > UUID('d47fd8a0-48e9-4dcc-87f7-42c022c309a8') > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
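The practical difference for job naming: uuid1's leading field is the clock-derived `time_low`, so a truncated prefix can recur whenever the clock value does (e.g. across processes or restarts), while every uuid4 byte comes from a random source:

```python
import uuid

# Truncated job names as described above; the 'job-' prefix is illustrative.
job_name_v1 = "job-" + str(uuid.uuid1())[:8]  # first 8 hex chars == time_low (clock)
job_name_v4 = "job-" + str(uuid.uuid4())[:8]  # first 8 hex chars are random
```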
[jira] [Assigned] (AIRFLOW-1886) Failed jobs are not being counted towards max_active_runs_per_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1886: - Assignee: Oleg Yamin (was: Holden Karau's magical unicorn) > Failed jobs are not being counted towards max_active_runs_per_dag > - > > Key: AIRFLOW-1886 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1886 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.8.1 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > > # Currently, I have setup max_active_runs_per_dag = 2 in airflow.cfg but when > a DAG aborts, it will keep submitting next DAG in the queue not counting the > current incomplete DAG that is already in the queue. I am using 1.8.1 but i > see that the jobs.py in latest version is still not addressing this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2899) Sensitive data exposed when Exporting Variables
[ https://issues.apache.org/jira/browse/AIRFLOW-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2899: - Assignee: Kaxil Naik (was: Holden Karau's magical unicorn) > Sensitive data exposed when Exporting Variables > --- > > Key: AIRFLOW-2899 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2899 > Project: Apache Airflow > Issue Type: Task > Components: security >Affects Versions: 1.9.0, 1.8.2, 1.10.0 >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Major > Fix For: 2.0.0 > > Attachments: image-2018-08-14-15-39-17-680.png > > > Currently, the sensitive variable is hidden from being exposed in the Web UI. > However, if the UI is compromised, someone can export variables where all the > sensitive variables are exported in plain text format. > !image-2018-08-14-15-39-17-680.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
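A sketch of the usual remediation: mask values whose keys look sensitive before serializing the export, mirroring what the UI already does when displaying variables (the marker list and function are illustrative, not Airflow's implementation):

```python
SENSITIVE_MARKERS = ("password", "secret", "passwd", "api_key", "apikey")

def mask_variables(variables):
    """Replace sensitive-looking values with '***' before export."""
    masked = {}
    for key, value in variables.items():
        if any(m in key.lower() for m in SENSITIVE_MARKERS):
            masked[key] = "***"
        else:
            masked[key] = value
    return masked
```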
[jira] [Commented] (AIRFLOW-2394) Kubernetes operator should not require cmd and arguments
[ https://issues.apache.org/jira/browse/AIRFLOW-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601617#comment-16601617 ] Apache Spark commented on AIRFLOW-2394: --- User 'ese' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3289 > Kubernetes operator should not require cmd and arguments > > > Key: AIRFLOW-2394 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2394 > Project: Apache Airflow > Issue Type: Bug >Reporter: Sergio B >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > KubernetesOperator should not require and rely on docker entrypoint for cmds > and docker command for arguments. > If you do not define them in the container spec, kubernetes rely on the > docker entrypoint and command. > https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.10/#container-v1-core > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
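The shape of the fix: only emit `command`/`args` on the container spec when the operator was actually given them, so Kubernetes falls back to the image's ENTRYPOINT and CMD (the dict mirrors the v1 Container fields; the helper itself is hypothetical):

```python
def build_container_spec(image, cmds=None, arguments=None):
    """Omit command/args when unset so the image's ENTRYPOINT/CMD apply."""
    spec = {"name": "base", "image": image}
    if cmds:
        spec["command"] = cmds       # overrides the image ENTRYPOINT
    if arguments:
        spec["args"] = arguments     # overrides the image CMD
    return spec
```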
[jira] [Assigned] (AIRFLOW-2697) Drop snakebite in favour of hdfs3
[ https://issues.apache.org/jira/browse/AIRFLOW-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2697: - Assignee: Holden Karau's magical unicorn (was: Julian de Ruiter) > Drop snakebite in favour of hdfs3 > - > > Key: AIRFLOW-2697 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2697 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Julian de Ruiter >Assignee: Holden Karau's magical unicorn >Priority: Major > > The current HdfsHook relies on the snakebite library, which is unfortunately > not compatible with Python 3. To add Python 3 support for the HdfsHook > requires switching to a different library for interacting with HDFS. The > hdfs3 library is an attractive alternative, as it supports Python 3 and seems > to be stable and relatively well supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2404) Message for why a DAG run has not been scheduled missing information
[ https://issues.apache.org/jira/browse/AIRFLOW-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601626#comment-16601626 ] Apache Spark commented on AIRFLOW-2404: --- User 'AetherUnbound' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3286 > Message for why a DAG run has not been scheduled missing information > > > Key: AIRFLOW-2404 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2404 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Matthew Bowden >Assignee: Matthew Bowden >Priority: Major > Fix For: 2.0.0 > > > The webserver lists the following reasons for why a DAG run/task instance > might not be started: > * The scheduler is down or under heavy load > * This task instance already ran and had its state changed manually (e.g. > cleared in the UI) > Another reason that the task might not have been started is because of the > following: > * The {{parallelism}} configuration value may be too low > * The {{dag_concurrency}} configuration value may be too low > * The {{max_active_dag_runs_per_dag}} configuration value may be too low > * The {{non_pooled_task_slot_count}} configuration value may be too low -- This message was sent by Atlassian JIRA (v7.6.3#76005)
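For reference, the settings named above live in `airflow.cfg` under `[core]` (the active-runs key appears there as `max_active_runs_per_dag`); the values below are illustrative, not recommendations:

```ini
[core]
# Total task instances allowed to run concurrently across the installation.
parallelism = 32
# Task instances allowed to run concurrently within a single DAG.
dag_concurrency = 16
# Active DagRuns allowed per DAG.
max_active_runs_per_dag = 16
# Slots for tasks that are not assigned to any pool.
non_pooled_task_slot_count = 128
```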
[jira] [Commented] (AIRFLOW-1488) Add a sensor operator to wait on DagRuns
[ https://issues.apache.org/jira/browse/AIRFLOW-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601625#comment-16601625 ] Apache Spark commented on AIRFLOW-1488: --- User 'milanvdm' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3234 > Add a sensor operator to wait on DagRuns > > > Key: AIRFLOW-1488 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1488 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, operators >Reporter: Yati >Assignee: Yati >Priority: Major > > The > [ExternalTaskSensor|https://airflow.incubator.apache.org/code.html#airflow.operators.ExternalTaskSensor] > operator already allows for encoding dependencies on tasks in external DAGs. > However, when you have teams, each owning multiple small-to-medium sized > DAGs, it is desirable to be able to wait on an external DagRun as a whole. > This allows the owners of an upstream DAG to refactor their code freely by > splitting/squashing task responsibilities, without worrying about dependent > DAGs breaking. > I'll now enumerate the easiest ways of achieving this that come to mind: > * Make all DAGs always have a join DummyOperator in the end, with a task id > that follows some convention, e.g., "{{ dag_id }}.__end__". > * Make ExternalTaskSensor poke for a DagRun instead of TaskInstances when the > external_task_id argument is None. > * Implement a separate DagRunSensor operator. > After considerations, we decided to implement a separate operator, which > we've been using in the team for our workflows, and I think it would make a > good addition to contrib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2916) Add argument `verify` for AwsHook() and S3 related sensors/operators
[ https://issues.apache.org/jira/browse/AIRFLOW-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2916: - Assignee: Holden Karau's magical unicorn (was: Xiaodong DENG) > Add argument `verify` for AwsHook() and S3 related sensors/operators > > > Key: AIRFLOW-2916 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2916 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks, operators >Reporter: Xiaodong DENG >Assignee: Holden Karau's magical unicorn >Priority: Minor > > The AwsHook() and S3-related operators/sensors are depending on package boto3. > In boto3, when we initiate a client or a resource, argument `verify` is > provided (https://boto3.readthedocs.io/en/latest/reference/core/session.html > ). > It is useful when > # users want to use a different CA cert bundle than the one used by botocore. > # users want to have '--no-verify-ssl'. This is especially useful when we're > using on-premises S3 or other implementations of object storage, like IBM's > Cloud Object Storage. > However, this feature is not provided in Airflow for S3 yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-855) Security - Airflow SQLAlchemy PickleType Allows for Code Execution
[ https://issues.apache.org/jira/browse/AIRFLOW-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601620#comment-16601620 ] Apache Spark commented on AIRFLOW-855: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2132 > Security - Airflow SQLAlchemy PickleType Allows for Code Execution > -- > > Key: AIRFLOW-855 > URL: https://issues.apache.org/jira/browse/AIRFLOW-855 > Project: Apache Airflow > Issue Type: Bug >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Attachments: test_dag.txt > > > Impact: Anyone able to modify the application's underlying database, or a > computer where certain DAG tasks are executed, may execute arbitrary code on > the Airflow host. > Location: The XCom class in /airflow-internal-master/airflow/models.py > Description: Airflow uses the SQLAlchemy object-relational mapping (ORM) to > allow for a database-agnostic, object-oriented manipulation of application > data. You express database tables and values as Python classes (in this > application's case), and the ORM transparently manipulates the underlying > database when you programmatically access these structures. > Airflow defines the following class, defining the XCom ORM model: > {code} > class XCom(Base): > """ > Base class for XCom objects. > """ > __tablename__ = "xcom" > id = Column(Integer, primary_key=True) > key = Column(String(512)) > value = Column(PickleType(pickler=dill)) > timestamp = Column( > DateTime, default=func.now(), nullable=False) > execution_date = Column(DateTime, nullable=False) > {code} > XComs are used for inter-task communication, and their values are either > defined in a DAG, or the return value of the python_callable() function or > the task's execute() method, executed on a remote host. 
XCom values are, > according to this model, of the PickleType, meaning that objects assigned to > the value column are transparently serialized (when being written to) and > deserialized (when being read from). The deserialization of user-controlled > pickle objects allows for the execution of arbitrary code. This means that > "slaves" (where DAG code is executed) can compromise "masters" (where DAGs > are defined in code) by returning an object that, when serialized (and > subsequently deserialized), causes remote code execution. This can also be > triggered by anyone who has write access to this portion of the database. > Note: NCC Group plans to meet with developers in the coming days to discuss > this finding, and it will be updated to reflect any additional insight > provided by this meeting. > Reproduction Steps: > 1. Configure a local instance of Airflow. > 2. Insert the attached DAG into your AIRFLOW_HOME/dags directory. > This example models a slave returning a malicious object to a task's > python_callable by creating a portable object (with {{__reduce__}}) containing a > reverse shell and pushing it as an XCom's value. This value is serialized > upon xcom_push and deserialized upon xcom_pull. > In an actual exploit scenario, this value would be the DAG function's return > value, as assigned by code within the function, executing on a malicious > remote machine. > 3. Start a netcat listener on your machine's port > 4. Execute this task from the command line with airflow run push 2016-11-17. > Note that your netcat listener has received a shell connect-back. > Remediation: Consider the use of a custom SQLAlchemy data type that performs > this transparent serialization and deserialization, but with JSON (a > text-based exchange format), rather than pickles (which may contain code). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
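The remediation suggested above (a JSON-backed column instead of PickleType) can be sketched with plain json helpers; wrapping them in a SQLAlchemy TypeDecorator is omitted here, and the function names are illustrative, not Airflow's:

```python
import json

def serialize_xcom_value(value):
    # json.dumps only accepts plain data (dict/list/str/int/float/bool/None),
    # so an attacker-supplied object graph cannot smuggle code the way a
    # pickle payload can.
    return json.dumps(value)

def deserialize_xcom_value(blob):
    # Parsing JSON never executes attacker-controlled code, unlike
    # pickle.loads / dill.loads, which run __reduce__ callables.
    return json.loads(blob)

roundtrip = deserialize_xcom_value(serialize_xcom_value({"key": "value", "n": 1}))
```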
[jira] [Assigned] (AIRFLOW-1488) Add a sensor operator to wait on DagRuns
[ https://issues.apache.org/jira/browse/AIRFLOW-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1488: - Assignee: Holden Karau's magical unicorn (was: Yati) > Add a sensor operator to wait on DagRuns > > > Key: AIRFLOW-1488 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1488 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, operators >Reporter: Yati >Assignee: Holden Karau's magical unicorn >Priority: Major > > The > [ExternalTaskSensor|https://airflow.incubator.apache.org/code.html#airflow.operators.ExternalTaskSensor] > operator already allows for encoding dependencies on tasks in external DAGs. > However, when you have teams, each owning multiple small-to-medium sized > DAGs, it is desirable to be able to wait on an external DagRun as a whole. > This allows the owners of an upstream DAG to refactor their code freely by > splitting/squashing task responsibilities, without worrying about dependent > DAGs breaking. > I'll now enumerate the easiest ways of achieving this that come to mind: > * Make all DAGs always have a join DummyOperator at the end, with a task id > that follows some convention, e.g., "{{ dag_id }}.__end__". > * Make ExternalTaskSensor poke for a DagRun instead of TaskInstances when the > external_task_id argument is None. > * Implement a separate DagRunSensor operator. > After consideration, we decided to implement a separate operator, which > we've been using in the team for our workflows, and I think it would make a > good addition to contrib. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2315) S3Hook Extra Extras
[ https://issues.apache.org/jira/browse/AIRFLOW-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601618#comment-16601618 ] Apache Spark commented on AIRFLOW-2315: --- User 'jbacon' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3475 > S3Hook Extra Extras > > > Key: AIRFLOW-2315 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2315 > Project: Apache Airflow > Issue Type: Improvement > Components: boto3 >Affects Versions: 1.9.0 >Reporter: Josh Bacon >Assignee: Josh Bacon >Priority: Minor > Labels: beginner, features, newbie, pull-request-available, > starter > Fix For: 1.10.0 > > > Feature improvement request to S3Hook to support additional JSON extra > arguments to apply to both upload and download ExtraArgs. > Allowed Upload Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS] > Allowed Download Arguments: > [http://boto3.readthedocs.io/en/latest/reference/customizations/s3.html#boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS] > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2961) Speed up test_backfill_examples test
[ https://issues.apache.org/jira/browse/AIRFLOW-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2961: - Assignee: (was: Holden Karau's magical unicorn) > Speed up test_backfill_examples test > > > Key: AIRFLOW-2961 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2961 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2899) Sensitive data exposed when Exporting Variables
[ https://issues.apache.org/jira/browse/AIRFLOW-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2899: - Assignee: Holden Karau's magical unicorn (was: Kaxil Naik) > Sensitive data exposed when Exporting Variables > --- > > Key: AIRFLOW-2899 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2899 > Project: Apache Airflow > Issue Type: Task > Components: security >Affects Versions: 1.9.0, 1.8.2, 1.10.0 >Reporter: Kaxil Naik >Assignee: Holden Karau's magical unicorn >Priority: Major > Fix For: 2.0.0 > > Attachments: image-2018-08-14-15-39-17-680.png > > > Currently, sensitive variables are hidden in the Web UI. However, if the UI > is compromised, someone can export the variables, and all the sensitive > values are exported in plain text. > !image-2018-08-14-15-39-17-680.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2928) Use uuid.uuid4 to create unique job name
[ https://issues.apache.org/jira/browse/AIRFLOW-2928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2928: - Assignee: Holden Karau's magical unicorn > Use uuid.uuid4 to create unique job name > > > Key: AIRFLOW-2928 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2928 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Ken Kawamoto >Assignee: Holden Karau's magical unicorn >Priority: Minor > > Some components in Airflow use the first 8 bytes of _uuid.uuid1_ to generate > a unique job name. The first 8 bytes, however, seem to come from the clock, so if > this is called multiple times in a short time period, two ids will likely > collide. > _uuid.uuid4_ provides random values. > {code} > Python 2.7.15 (default, Jun 17 2018, 12:46:58) > [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> import uuid > >>> for i in range(10): > ... uuid.uuid1() > ... > UUID('e8bc9959-a586-11e8-ab8c-8c859010d0c2') > UUID('e8c254e3-a586-11e8-ac39-8c859010d0c2') > UUID('e8c2560f-a586-11e8-8251-8c859010d0c2') > UUID('e8c256c2-a586-11e8-994a-8c859010d0c2') > UUID('e8c25759-a586-11e8-9ba6-8c859010d0c2') > UUID('e8c257e6-a586-11e8-a854-8c859010d0c2') > UUID('e8c2587d-a586-11e8-89e9-8c859010d0c2') > UUID('e8c2590a-a586-11e8-a825-8c859010d0c2') > UUID('e8c25994-a586-11e8-9421-8c859010d0c2') > UUID('e8c25a21-a586-11e8-83fd-8c859010d0c2') > >>> for i in range(10): > ... uuid.uuid4() > ... 
> UUID('f1eba69f-18ea-467e-a414-b18d67f34a51') > UUID('aaa4e18e-d4e6-42c9-905c-3cde714c2741') > UUID('82f55c27-69ae-474b-ab9a-afcc7891587c') > UUID('fab63643-ad33-4307-837b-68444fce7240') > UUID('c4efca6c-3d1b-436c-8b09-e9b7f55ccefb') > UUID('58de3a76-9d98-4427-8232-d6d7df2a1904') > UUID('4f0a55e8-1357-4697-a345-e60891685b00') > UUID('0fed47a3-07b6-423e-ae2e-d821c440cb63') > UUID('144b2c55-a9bd-431d-b536-239fb2048a5e') > UUID('d47fd8a0-48e9-4dcc-87f7-42c022c309a8') > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
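The listing above shows the problem directly: the first 8 hex characters of every uuid1 are nearly identical because they hold the low bits of the timestamp, while uuid4's are random. A sketch of a uuid4-based name generator; the `job-` prefix is illustrative, not Airflow's actual naming scheme:

```python
import uuid

def unique_job_name(prefix="job"):
    # uuid4 draws from os.urandom, so even names generated within the
    # same clock tick are overwhelmingly unlikely to collide, unlike the
    # timestamp-derived prefix of uuid1.
    return "{}-{}".format(prefix, str(uuid.uuid4())[:8])

names = [unique_job_name() for _ in range(10)]
```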
[jira] [Assigned] (AIRFLOW-2961) Speed up test_backfill_examples test
[ https://issues.apache.org/jira/browse/AIRFLOW-2961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2961: - Assignee: Holden Karau's magical unicorn > Speed up test_backfill_examples test > > > Key: AIRFLOW-2961 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2961 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Assignee: Holden Karau's magical unicorn >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1158) Multipart uploads to s3 cut off at nearest division
[ https://issues.apache.org/jira/browse/AIRFLOW-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1158: - Assignee: Maksim Pecherskiy (was: Holden Karau's magical unicorn) > Multipart uploads to s3 cut off at nearest division > --- > > Key: AIRFLOW-1158 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1158 > Project: Apache Airflow > Issue Type: Bug > Components: aws, hooks >Reporter: Maksim Pecherskiy >Assignee: Maksim Pecherskiy >Priority: Minor > > When I try to upload a file of, say, 104 MB, using multipart uploads of 10 MB > chunks, I get 10 chunks of 10 MB and that's it. The 4 MB left over does not > get uploaded. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
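The bug described is the classic floor-division off-by-one: the chunk count must be rounded up so the final partial chunk is included. A sketch of the correct split (pure arithmetic, not the actual hook code):

```python
import math

def chunk_ranges(total_size, chunk_size):
    """Yield (offset, length) for each multipart chunk, including the
    final partial chunk that plain floor division would drop."""
    n_chunks = math.ceil(total_size / chunk_size)  # 104 MB / 10 MB -> 11, not 10
    for i in range(n_chunks):
        offset = i * chunk_size
        yield offset, min(chunk_size, total_size - offset)

MB = 1024 * 1024
parts = list(chunk_ranges(104 * MB, 10 * MB))  # 10 full chunks + one 4 MB chunk
```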
[jira] [Commented] (AIRFLOW-2697) Drop snakebite in favour of hdfs3
[ https://issues.apache.org/jira/browse/AIRFLOW-2697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601619#comment-16601619 ] Apache Spark commented on AIRFLOW-2697: --- User 'jrderuiter' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3560 > Drop snakebite in favour of hdfs3 > - > > Key: AIRFLOW-2697 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2697 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.9.0 >Reporter: Julian de Ruiter >Assignee: Julian de Ruiter >Priority: Major > > The current HdfsHook relies on the snakebite library, which is unfortunately > not compatible with Python 3. To add Python 3 support for the HdfsHook > requires switching to a different library for interacting with HDFS. The > hdfs3 library is an attractive alternative, as it supports Python 3 and seems > to be stable and relatively well supported. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2574) initdb fails when mysql password contains percent sign
[ https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2574: - Assignee: Holden Karau's magical unicorn > initdb fails when mysql password contains percent sign > -- > > Key: AIRFLOW-2574 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2574 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Zihao Zhang >Assignee: Holden Karau's magical unicorn >Priority: Minor > > [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345] > uses > [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option] > which says "A raw percent sign not part of an interpolation symbol must > therefore be escaped" > When there is a percent sign in database connection string, this will crash > due to bad interpolation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
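Since alembic's {{set_main_option}} passes the value through ConfigParser interpolation, a literal percent sign in the connection string has to be doubled before it is handed over. A minimal sketch of the escaping (the fix eventually applied in Airflow may differ):

```python
def escape_for_alembic(conn_string):
    # ConfigParser treats '%' as the start of an interpolation expression;
    # '%%' is the documented escape for a literal percent sign.
    return conn_string.replace("%", "%%")

# config.set_main_option('sqlalchemy.url', escape_for_alembic(conn_string))
# would then no longer crash on passwords containing '%'.
escaped = escape_for_alembic("mysql://user:pa%ss@localhost/airflow")
```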
[jira] [Commented] (AIRFLOW-1886) Failed jobs are not being counted towards max_active_runs_per_dag
[ https://issues.apache.org/jira/browse/AIRFLOW-1886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601615#comment-16601615 ] Apache Spark commented on AIRFLOW-1886: --- User 'oyamin' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2846 > Failed jobs are not being counted towards max_active_runs_per_dag > - > > Key: AIRFLOW-1886 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1886 > Project: Apache Airflow > Issue Type: Bug > Components: DagRun >Affects Versions: 1.8.1 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > > # Currently, I have set max_active_runs_per_dag = 2 in airflow.cfg, but when > a DAG aborts, the scheduler keeps submitting the next DAG in the queue without > counting the current incomplete DAG that is already in it. I am using 1.8.1, but I > see that jobs.py in the latest version still does not address this issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2408) Remove coveralls deps
[ https://issues.apache.org/jira/browse/AIRFLOW-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601616#comment-16601616 ] Apache Spark commented on AIRFLOW-2408: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3295 > Remove coveralls deps > - > > Key: AIRFLOW-2408 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2408 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2222) GoogleCloudStorageHook.copy fails for large files between locations
[ https://issues.apache.org/jira/browse/AIRFLOW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601622#comment-16601622 ] Apache Spark commented on AIRFLOW-2222: --- User 'berislavlopac' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3264 > GoogleCloudStorageHook.copy fails for large files between locations > --- > > Key: AIRFLOW-2222 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2222 > Project: Apache Airflow > Issue Type: Bug >Reporter: Berislav Lopac >Assignee: Berislav Lopac >Priority: Major > Fix For: 1.10.0, 2.0.0 > > > When copying large files (confirmed for around 3GB) between buckets in > different projects, the operation fails and the Google API returns error > [413—Payload Too > Large|https://cloud.google.com/storage/docs/json_api/v1/status-codes#413_Payload_Too_Large]. > The documentation for the error says: > {quote}The Cloud Storage JSON API supports up to 5 TB objects. > This error may, alternatively, arise if copying objects between locations > and/or storage classes can not complete within 30 seconds. In this case, use > the > [Rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite] > method instead.{quote} > The reason seems to be that {{GoogleCloudStorageHook.copy}} is using the > {{copy}} API method. > h3. Proposed Solution > There are two potential solutions: > # Implement a {{GoogleCloudStorageHook.rewrite}} method which can be called > from operators and other objects to ensure successful execution. This method > is more flexible but requires changes both in the {{GoogleCloudStorageHook}} > class and any other classes that use it for copying files to ensure that they > explicitly call {{rewrite}} when needed. > # Modify {{GoogleCloudStorageHook.copy}} to determine when to use {{rewrite}} > instead of {{copy}} underneath. 
This requires updating only the > {{GoogleCloudStorageHook}} class, but the logic might not cover all the edge > cases and could be difficult to implement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
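Option 1 above amounts to calling the Rewrite endpoint in a loop, because a single rewrite call may return before the copy finishes, handing back a continuation token. A sketch of that loop against a duck-typed blob object; the real hook would go through the Google API client, and `rewrite` here only mirrors the semantics of the JSON API's rewrite method (each call returns a token plus progress counters, with a None token meaning done):

```python
def rewrite_until_done(dest_blob, source_blob):
    """Repeatedly call rewrite until the service reports the copy complete.

    Assumes dest_blob.rewrite(source, token=...) returns a tuple of
    (next_token, bytes_rewritten, total_bytes), where next_token is None
    once the rewrite has finished.
    """
    token = None
    while True:
        token, bytes_rewritten, total_bytes = dest_blob.rewrite(source_blob, token=token)
        if token is None:
            return bytes_rewritten, total_bytes
```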
[jira] [Assigned] (AIRFLOW-2574) initdb fails when mysql password contains percent sign
[ https://issues.apache.org/jira/browse/AIRFLOW-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-2574: - Assignee: (was: Holden Karau's magical unicorn) > initdb fails when mysql password contains percent sign > -- > > Key: AIRFLOW-2574 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2574 > Project: Apache Airflow > Issue Type: Bug > Components: db >Reporter: Zihao Zhang >Priority: Minor > > [db.py|https://github.com/apache/incubator-airflow/blob/3358551c8e73d9019900f7a85f18ebfd88591450/airflow/utils/db.py#L345] > uses > [config.set_main_option|http://alembic.zzzcomputing.com/en/latest/api/config.html#alembic.config.Config.set_main_option] > which says "A raw percent sign not part of an interpolation symbol must > therefore be escaped" > When there is a percent sign in database connection string, this will crash > due to bad interpolation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2457) Upgrade FAB version in setup.py to support timezone
[ https://issues.apache.org/jira/browse/AIRFLOW-2457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601609#comment-16601609 ] Apache Spark commented on AIRFLOW-2457: --- User 'jgao54' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3349 > Upgrade FAB version in setup.py to support timezone > --- > > Key: AIRFLOW-2457 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2457 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: 1.10 >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Major > Fix For: 1.10.0 > > > FAB 1.9.6 doesn't support datetime with timezones, upgrade to 1.10.0 will fix > this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1514) View Log goes out of memory
[ https://issues.apache.org/jira/browse/AIRFLOW-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601601#comment-16601601 ] Apache Spark commented on AIRFLOW-1514: --- User 'NielsZeilemaker' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2526 > View Log goes out of memory > --- > > Key: AIRFLOW-1514 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1514 > Project: Apache Airflow > Issue Type: Bug >Reporter: Niels Zeilemaker >Assignee: Niels Zeilemaker >Priority: Major > > If you attempt to view a logfile which is big, we get an out-of-memory > exception from jinja. > Let's only show the tail of the logfile + a link to a raw log page which > doesn't use jinja. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
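Showing only the tail can be done with bounded memory via a fixed-size deque; this is a generic sketch of the technique, not the view-level change or raw-log route from the pull request:

```python
from collections import deque

def tail_lines(path, n=50):
    # A deque with maxlen keeps only the last n lines while streaming the
    # file line by line, so memory use stays bounded no matter how large
    # the logfile grows.
    with open(path) as f:
        return list(deque(f, maxlen=n))
```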
[jira] [Assigned] (AIRFLOW-1867) sendgrid fails on python3 with attachments
[ https://issues.apache.org/jira/browse/AIRFLOW-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1867: - Assignee: Holden Karau's magical unicorn > sendgrid fails on python3 with attachments > -- > > Key: AIRFLOW-1867 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1867 > Project: Apache Airflow > Issue Type: Bug >Reporter: Scott Kruger >Assignee: Holden Karau's magical unicorn >Priority: Minor > > Sendgrid emails raise an exception on python 3 when attaching files due to > {{base64.b64encode}} returning {{bytes}} rather than {{unicode/string}} (see: > https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/utils/sendgrid.py#L69). > The fix is simple: decode the base64 data to `utf-8`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
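The one-line fix described above, demonstrated on its own outside the sendgrid util (the function name is illustrative):

```python
import base64

def encode_attachment(data):
    # On Python 3, base64.b64encode returns bytes; sendgrid's attachment
    # content field expects str, so decode the base64 output explicitly.
    return base64.b64encode(data).decode("utf-8")

content = encode_attachment(b"hello attachment")
```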
[jira] [Commented] (AIRFLOW-2409) At user creation allow the password as a parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601596#comment-16601596 ] Apache Spark commented on AIRFLOW-2409: --- User 'Fokko' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3302 > At user creation allow the password as a parameter > -- > > Key: AIRFLOW-2409 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2409 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Fix For: 1.10.0, 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1867) sendgrid fails on python3 with attachments
[ https://issues.apache.org/jira/browse/AIRFLOW-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601600#comment-16601600 ] Apache Spark commented on AIRFLOW-1867: --- User 'thesquelched' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2824 > sendgrid fails on python3 with attachments > -- > > Key: AIRFLOW-1867 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1867 > Project: Apache Airflow > Issue Type: Bug >Reporter: Scott Kruger >Priority: Minor > > Sendgrid emails raise an exception on python 3 when attaching files due to > {{base64.b64encode}} returning {{bytes}} rather than {{unicode/string}} (see: > https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/utils/sendgrid.py#L69). > The fix is simple: decode the base64 data to `utf-8`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-51) docker_operator - Improve the integration with swarm
[ https://issues.apache.org/jira/browse/AIRFLOW-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601602#comment-16601602 ] Apache Spark commented on AIRFLOW-51: - User 'asnir' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/2491 > docker_operator - Improve the integration with swarm > > > Key: AIRFLOW-51 > URL: https://issues.apache.org/jira/browse/AIRFLOW-51 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Amikam Snir >Priority: Minor > Labels: operators > > Swarm is not well supported by docker_operator, due to this issue: > https://github.com/docker/swarm/issues/475 > In order to fix it, we will use cpu_shares instead of cpus. > p.s. The default value is None. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1998) Implement Databricks Operator for jobs/run-now endpoint
[ https://issues.apache.org/jira/browse/AIRFLOW-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1998: - Assignee: Israel Knight (was: Holden Karau's magical unicorn) > Implement Databricks Operator for jobs/run-now endpoint > --- > > Key: AIRFLOW-1998 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1998 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks, operators >Affects Versions: 1.9.0 >Reporter: Diego Rabatone Oliveira >Assignee: Israel Knight >Priority: Major > > Implement a Operator to deal with Databricks '2.0/jobs/run-now' API Endpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2437) Add PubNub to README
[ https://issues.apache.org/jira/browse/AIRFLOW-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601611#comment-16601611 ] Apache Spark commented on AIRFLOW-2437: --- User 'jzucker2' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3332 > Add PubNub to README > > > Key: AIRFLOW-2437 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2437 > Project: Apache Airflow > Issue Type: Wish >Reporter: Jordan Zucker >Assignee: Jordan Zucker >Priority: Trivial > Fix For: 2.0.0 > > > Add PubNub to current list of Airflow users -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1423) Enhance scheduler logs to better explain DAG runs decisions
[ https://issues.apache.org/jira/browse/AIRFLOW-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1423: - Assignee: Holden Karau's magical unicorn > Enhance scheduler logs to better explain DAG runs decisions > --- > > Key: AIRFLOW-1423 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1423 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Reporter: Ultrabug >Assignee: Holden Karau's magical unicorn >Priority: Major > Attachments: add_scheduler_logs.patch > > > One of the most frustrating topics for users is understanding the scheduler's > decisions about whether or not to run a DAG. > It would be wise to add more logs around the job-creation decisions so that it > is clearer whether a DAG is run or not, and why. > This patch adds such simple and useful logs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2333) Add Segment Hook to Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16601610#comment-16601610 ] Apache Spark commented on AIRFLOW-2333: --- User 'jzucker2' has created a pull request for this issue: https://github.com/apache/incubator-airflow/pull/3237 > Add Segment Hook to Airflow > --- > > Key: AIRFLOW-2333 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2333 > Project: Apache Airflow > Issue Type: New Feature > Components: contrib, hooks >Reporter: Jordan Zucker >Assignee: Jordan Zucker >Priority: Minor > Fix For: 1.10.0, 2.0.0 > > > [Segment|https://segment.com/] is used by many to track analytics. It would > be nice to allow Airflow to interact with Segment and store the username and > password encrypted in its database. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1514) View Log goes out of memory
[ https://issues.apache.org/jira/browse/AIRFLOW-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1514: - Assignee: Holden Karau's magical unicorn (was: Niels Zeilemaker) > View Log goes out of memory > --- > > Key: AIRFLOW-1514 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1514 > Project: Apache Airflow > Issue Type: Bug >Reporter: Niels Zeilemaker >Assignee: Holden Karau's magical unicorn >Priority: Major > > If you attempt to view a logfile which is big, we get an out-of-memory > exception from jinja. > Let's only show the tail of the logfile + a link to a raw log page which > doesn't use jinja. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1342) S3 connections use host "extra" parameter, not Host parameter on connection
[ https://issues.apache.org/jira/browse/AIRFLOW-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned AIRFLOW-1342: - Assignee: Holden Karau's magical unicorn (was: Victor Duarte Diniz Monteiro) > S3 connections use host "extra" parameter, not Host parameter on connection > --- > > Key: AIRFLOW-1342 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1342 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.8, 1.8.1, 1.8.2 >Reporter: Bryan Vanderhoof >Assignee: Holden Karau's magical unicorn >Priority: Trivial > > The ability to connect to S3 using Sigv4 was added to resolve AIRFLOW-1034. > That implementation expects a "host" parameter to be added to the JSON object > in the Extra field on the connection object. > However, the S3 connection object contains a top-level Host variable, which > remains unused. This is deeply counterintuitive. The default should be to > leverage this Host variable (and optionally the "host" parameter in the Extra > object, to maintain compatibility with the existing implementation). -- This message was sent by Atlassian JIRA (v7.6.3#76005)