[jira] [Created] (AIRFLOW-2184) Create a druid_checker operator
Tao Feng created AIRFLOW-2184: - Summary: Create a druid_checker operator Key: AIRFLOW-2184 URL: https://issues.apache.org/jira/browse/AIRFLOW-2184 Project: Apache Airflow Issue Type: Improvement Reporter: Tao Feng Assignee: Tao Feng Once we agree on the extended interface provided through druid_hook in AIRFLOW-2183, we would like to create a druid_checker operator to do basic data quality checking on data in druid. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2183) Refactor DruidHook to able to issue arbitrary query to druid broker
Tao Feng created AIRFLOW-2183: - Summary: Refactor DruidHook to able to issue arbitrary query to druid broker Key: AIRFLOW-2183 URL: https://issues.apache.org/jira/browse/AIRFLOW-2183 Project: Apache Airflow Issue Type: Improvement Reporter: Tao Feng Assignee: Tao Feng Currently the DruidHook only maintains a connection to the overlord and is used solely for ingestion purposes. We would like to extend the hook so that it can also be used to issue queries to the Druid broker. There are a couple of benefits: # Allow any operator to issue a query to the Druid broker. # Allow us later on to create a druid_checker for data quality purposes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
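The broker-query extension proposed here could be sketched roughly as below. This is an illustrative assumption, not the merged Airflow API: the function name, default host, and default port are made up for the example (Druid does expose a `/druid/v2/sql` endpoint on the broker for SQL queries).

```python
import json

# Hypothetical sketch of the AIRFLOW-2183 idea: build a SQL query request
# for the Druid broker. Names and defaults here are illustrative assumptions.
BROKER_SQL_ENDPOINT = "/druid/v2/sql"

def build_broker_query(sql, broker_host="localhost", broker_port=8082):
    """Return the broker URL and JSON payload for a Druid SQL query."""
    url = "http://{}:{}{}".format(broker_host, broker_port, BROKER_SQL_ENDPOINT)
    payload = json.dumps({"query": sql})
    return url, payload

# An extended DruidHook could POST `payload` to `url` (e.g. with requests)
# and hand the parsed rows to a druid_checker operator (AIRFLOW-2184).
url, payload = build_broker_query("SELECT COUNT(*) FROM my_datasource")
```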
[jira] [Commented] (AIRFLOW-2175) Failed to upgradedb 1.8.2 -> 1.9.0
[ https://issues.apache.org/jira/browse/AIRFLOW-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387346#comment-16387346 ] Damian Momot commented on AIRFLOW-2175: --- This DAG is no longer in the dag files but still exists in the DB. I'm 99% sure it wasn't a sub-DAG; it was probably just left in this strange state. As I remember, this installation was initialized on Airflow 1.8.1, then upgraded to 1.8.2. The field is nullable in the DB; the DB is MySQL:
{code:java}
DESCRIBE dag;
Field     Type           Null  Key  Default  Extra
...
fileloc   varchar(2000)  YES        NULL
{code}
As a workaround I'll just manually alter the record in the DB, but a null check is probably a good idea, especially since the field is nullable in the DB - it might affect others
> Failed to upgradedb 1.8.2 -> 1.9.0
> --
>
> Key: AIRFLOW-2175
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2175
> Project: Apache Airflow
> Issue Type: Bug
> Components: db
> Affects Versions: 1.9.0
> Reporter: Damian Momot
> Priority: Critical
>
> We've got an airflow installation with hundreds of DAGs and thousands of tasks.
> During the upgrade (1.8.2 -> 1.9.0) we got the following error.
> After analyzing the stacktrace I found that it's most likely caused by a None
> value in the 'fileloc' column of the dag table. 
I checked database and indeed we've > got one record with such value: > > > {code:java} > SELECT COUNT(*) FROM dag WHERE fileloc IS NULL; > 1 > SELECT COUNT(*) FROM dag; > 343 > {code} > > > {code:java} > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 27, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 913, > in upgradedb > db_utils.upgradedb() > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 320, > in upgradedb > command.upgrade(config, 'heads') > File "/usr/local/lib/python2.7/dist-packages/alembic/command.py", line 174, > in upgrade > script.run_env() > File "/usr/local/lib/python2.7/dist-packages/alembic/script/base.py", line > 416, in run_env > util.load_python_file(self.dir, 'env.py') > File "/usr/local/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line > 93, in load_python_file > module = load_module_py(module_id, path) > File "/usr/local/lib/python2.7/dist-packages/alembic/util/compat.py", line > 79, in load_module_py > mod = imp.load_source(module_id, path, fp) > File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 86, in > run_migrations_online() > File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 81, in run_migrations_online > context.run_migrations() > File "", line 8, in run_migrations > File > "/usr/local/lib/python2.7/dist-packages/alembic/runtime/environment.py", line > 807, in run_migrations > self.get_context().run_migrations(**kw) > File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/migration.py", > line 321, in run_migrations > step.migration_fn(**kw) > File > "/usr/local/lib/python2.7/dist-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", > line 63, in upgrade > dag = dagbag.get_dag(ti.dag_id) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 232, > in get_dag > filepath=orm_dag.fileloc, only_if_updated=False) > 
File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 249, > in process_file > if not os.path.isfile(filepath): > File "/usr/lib/python2.7/genericpath.py", line 29, in isfile > st = os.stat(path) > TypeError: coercing to Unicode: need string or buffer, NoneType found{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
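The null check suggested in the comment above could look roughly like the sketch below; `safe_isfile` is a hypothetical helper for illustration, not the actual patch.

```python
import os

# Hypothetical guard for the failure above: a dag row whose fileloc is NULL
# reaches os.path.isfile() as None, which raises TypeError on Python 2.
def safe_isfile(filepath):
    """Return False for a missing/NULL file location instead of crashing."""
    if filepath is None:
        return False
    return os.path.isfile(filepath)

# The manual DB workaround mentioned above amounts to something like:
#   DELETE FROM dag WHERE fileloc IS NULL;
# (or an UPDATE setting fileloc to the correct path, if it is known).
```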
[jira] [Closed] (AIRFLOW-2118) get_pandas_df does always pass a list of rows to be parsed
[ https://issues.apache.org/jira/browse/AIRFLOW-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Diane Ivy closed AIRFLOW-2118. -- Resolution: Fixed Fixed with https://github.com/apache/incubator-airflow/pull/3066 > get_pandas_df does always pass a list of rows to be parsed > -- > > Key: AIRFLOW-2118 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2118 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, hooks >Affects Versions: 1.9.0 > Environment: pandas-gbq 0.3.1 >Reporter: Diane Ivy >Assignee: Diane Ivy >Priority: Minor > Labels: easyfix > Original Estimate: 1h > Remaining Estimate: 1h > > While parsing the pages in get_pandas_df, if only one page is returned > it starts popping off each row and then gbq_parse_data works incorrectly. > {{while len(pages) > 0:}} > {{ page = pages.pop()}} > {{ dataframe_list.append(gbq_parse_data(schema, page))}} > Possible solution: > {{from google.cloud import bigquery}} > {{if isinstance(pages[0], bigquery.table.Row):}} > {{ pages = [pages]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
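The proposed fix quoted above normalizes a single page (a flat list of rows) into a list of pages. A self-contained illustration of that pattern, using a stand-in `Row` class instead of `google.cloud.bigquery.table.Row`:

```python
class Row(object):
    """Stand-in for google.cloud.bigquery.table.Row (illustration only)."""
    def __init__(self, values):
        self.values = values

def normalize_pages(pages):
    """Wrap a bare page of rows so callers can always iterate page-by-page,
    mirroring the isinstance(pages[0], Row) check proposed in the issue."""
    if pages and isinstance(pages[0], Row):
        return [pages]
    return pages

single_page = [Row([1]), Row([2])]
pages = normalize_pages(single_page)  # wrapped into a one-page list
```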
[jira] [Commented] (AIRFLOW-2118) get_pandas_df does always pass a list of rows to be parsed
[ https://issues.apache.org/jira/browse/AIRFLOW-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387075#comment-16387075 ] Diane Ivy commented on AIRFLOW-2118: [~Yuyin.Yang] This seems to be fixed in the latest version since it no longer uses gbq_parse_data. > get_pandas_df does always pass a list of rows to be parsed > -- > > Key: AIRFLOW-2118 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2118 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, hooks >Affects Versions: 1.9.0 > Environment: pandas-gbq 0.3.1 >Reporter: Diane Ivy >Assignee: Diane Ivy >Priority: Minor > Labels: easyfix > Original Estimate: 1h > Remaining Estimate: 1h > > While parsing the pages in get_pandas_df, if only one page is returned > it starts popping off each row and then gbq_parse_data works incorrectly. > {{while len(pages) > 0:}} > {{ page = pages.pop()}} > {{ dataframe_list.append(gbq_parse_data(schema, page))}} > Possible solution: > {{from google.cloud import bigquery}} > {{if isinstance(pages[0], bigquery.table.Row):}} > {{ pages = [pages]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2181) Convert DOS formatted files to UNIX
[ https://issues.apache.org/jira/browse/AIRFLOW-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387025#comment-16387025 ] Dan Fowler commented on AIRFLOW-2181: - PR: https://github.com/apache/incubator-airflow/pull/3102 > Convert DOS formatted files to UNIX > --- > > Key: AIRFLOW-2181 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2181 > Project: Apache Airflow > Issue Type: Task >Reporter: Dan Fowler >Assignee: Dan Fowler >Priority: Trivial > > While looking into an issue related to the password_auth backend I noticed > the following files are in DOS format: > > tests/www/api/experimental/test_password_endpoints.py > airflow/contrib/auth/backends/password_auth.py > > I can't think of a reason why these should be DOS formatted, but if there is > let me know and I can close this out. Otherwise, I'll submit a PR for this > fix. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (AIRFLOW-226) Create separate pip packages for webserver and hooks
[ https://issues.apache.org/jira/browse/AIRFLOW-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dan Davydov reopened AIRFLOW-226: - > Create separate pip packages for webserver and hooks > > > Key: AIRFLOW-226 > URL: https://issues.apache.org/jira/browse/AIRFLOW-226 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Dan Davydov >Priority: Minor > > There are users who want only the airflow hooks, and others who may not need > the front-end. The hooks and webserver should be moved into their own > packages, with the current airflow package depending on these packages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-226) Create separate pip packages for webserver and hooks
[ https://issues.apache.org/jira/browse/AIRFLOW-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16387016#comment-16387016 ] Dan Davydov commented on AIRFLOW-226: - I feel strongly (at least for hooks) that they should be moved out. Things like storing secrets in the Airflow database, hooks, etc. are convenient, but they are equivalent to plugins and should have their own owners and maintainers. It doesn't make sense to, e.g., make the owner and expert of the HiveHook a committer in this repo, but they certainly should be the committer and maintainer of a HiveHook repo. Another point for decoupling hooks from the core is that supporting backwards-incompatible changes to all operators doesn't scale for the Airflow committers - we are effectively supporting many hooks of which we have no domain knowledge. Other systems such as Jenkins follow a similar plugin framework. > Create separate pip packages for webserver and hooks > > > Key: AIRFLOW-226 > URL: https://issues.apache.org/jira/browse/AIRFLOW-226 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Dan Davydov >Priority: Minor > > There are users who want only the airflow hooks, and others who may not need > the front-end. The hooks and webserver should be moved into their own > packages, with the current airflow package depending on these packages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-232) Web UI shows inaccurate task counts on main dashboard
[ https://issues.apache.org/jira/browse/AIRFLOW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-232. - Resolution: Not A Problem > Web UI shows inaccurate task counts on main dashboard > - > > Key: AIRFLOW-232 > URL: https://issues.apache.org/jira/browse/AIRFLOW-232 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1.2 >Reporter: Sergei Iakhnin >Priority: Major > Attachments: screenshot-1.png > > > Postgres, celery, rabbitmq, 170 worker nodes, 1 master. > select count(*), state from task_instance where dag_id = 'freebayes' group by state; > upstream_failed 2134 > up_for_retry 520 > success 141421 > running 542 > failed 1165 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-247) EMR Hook, Operators, Sensor
[ https://issues.apache.org/jira/browse/AIRFLOW-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-247. --- Resolution: Fixed Fix Version/s: 1.8.0 Fixed in https://github.com/apache/incubator-airflow/commit/9f49f12853d83dd051f0f1ed58b5df20bfcfe087 > EMR Hook, Operators, Sensor > --- > > Key: AIRFLOW-247 > URL: https://issues.apache.org/jira/browse/AIRFLOW-247 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Rob Froetscher >Assignee: Rob Froetscher >Priority: Minor > Fix For: 1.8.0 > > > Substory of https://issues.apache.org/jira/browse/AIRFLOW-115. It would be > nice to have an EMR hook and operators. > Hook to generally interact with EMR. > Operators to: > * setup and start a job flow > * add steps to an existing jobflow > A sensor to: > * monitor completion and status of EMR jobs -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-236) Support passing S3 credentials through environmental variables
[ https://issues.apache.org/jira/browse/AIRFLOW-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-236. --- Resolution: Fixed This is possible, both using the AWS standard {{AWS_ACCESS_KEY_ID}} and via specifying connections via env vars with {{AIRFLOW_CONN_S3=s3://}} > Support passing S3 credentials through environmental variables > -- > > Key: AIRFLOW-236 > URL: https://issues.apache.org/jira/browse/AIRFLOW-236 > Project: Apache Airflow > Issue Type: Improvement > Components: core >Reporter: Jakob Homan >Priority: Major > > Right now we expect S3 configs to be passed through one of a variety of > config files, or through extra parameters in the connection screen. It'd be > nice to be able to pass these through env variables and note as such through > the extra parameters. This would lessen the need to include credentials in > the webapp itself. > Alternatively, for logging (rather than as a connector), it might just be > better for Airflow to use the profile defined as AWS_DEFAULT and avoid needing > an explicit configuration at all. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
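A minimal sketch of the two mechanisms named in the resolution above; the connection id `s3_default` and all credential values are placeholders for illustration:

```python
import os

# Standard AWS credential variables, picked up by boto-based code:
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA-PLACEHOLDER"
os.environ["AWS_SECRET_ACCESS_KEY"] = "placeholder-secret"

# Airflow connection defined via environment variable: the variable name is
# AIRFLOW_CONN_ followed by the upper-cased connection id, and the value is
# a connection URI (here just the s3 scheme, as in the resolution comment).
os.environ["AIRFLOW_CONN_S3_DEFAULT"] = "s3://"
```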
[jira] [Resolved] (AIRFLOW-230) [HiveServer2Hook] adding multi statements support
[ https://issues.apache.org/jira/browse/AIRFLOW-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-230. --- Resolution: Fixed Fix Version/s: 1.8.0 Fixed as https://github.com/apache/incubator-airflow/commit/a599167c433246d96bea711d8bfd5710b2c9d3ff > [HiveServer2Hook] adding multi statements support > - > > Key: AIRFLOW-230 > URL: https://issues.apache.org/jira/browse/AIRFLOW-230 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Maxime Beauchemin >Priority: Major > Fix For: 1.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-229) new DAG runs 5 times when manually started from website
[ https://issues.apache.org/jira/browse/AIRFLOW-229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-229. - Resolution: Invalid Not an issue anymore. Feel free to re-open if anyone is still seeing this behaviour! > new DAG runs 5 times when manually started from website > --- > > Key: AIRFLOW-229 > URL: https://issues.apache.org/jira/browse/AIRFLOW-229 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.6.2 > Environment: celery, rabbitmq, mysql >Reporter: audubon >Priority: Minor > > version 1.6.2 > using celery, rabbitmq, mysql > example: > from airflow import DAG > from airflow.operators import BashOperator > from datetime import datetime, timedelta > import json > import sys > one_day_ahead = datetime.combine(datetime.today() + timedelta(1), > datetime.min.time()) > one_day_ahead = one_day_ahead.replace(hour=3, minute=31) > default_args = { > 'owner': 'airflow', > 'depends_on_past': False, > 'start_date': one_day_ahead, > 'email': ['m...@email.com'], > 'email_on_failure': True, > 'email_on_retry': False, > 'retries': 1, > 'retry_delay': timedelta(minutes=5), > } > dag = DAG('alpha', default_args=default_args , schedule_interval='15 6 * * *' > ) > task = BashOperator( > task_id='alphaV2', > bash_command='sleep 10', > dag=dag) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-226) Create separate pip packages for webserver and hooks
[ https://issues.apache.org/jira/browse/AIRFLOW-226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-226. - Resolution: Won't Fix Given that most of Airflow's dependencies are optional, installing Airflow itself is not that heavy - and the extra development overhead on an open-source project means this is not likely to happen -- especially given that the cost to the end user is a few extra packages installed. (Sorry to resurrect a really old ticket only to close it Won't Fix. If you feel strongly about this we can reopen and discuss it) > Create separate pip packages for webserver and hooks > > > Key: AIRFLOW-226 > URL: https://issues.apache.org/jira/browse/AIRFLOW-226 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Dan Davydov >Priority: Minor > > There are users who want only the airflow hooks, and others who may not need > the front-end. The hooks and webserver should be moved into their own > packages, with the current airflow package depending on these packages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-215) Airflow worker (CeleryExecutor) needs to be restarted to pick up tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-215. --- Resolution: Fixed Doesn't apply on 1.9.0 or 1.8.2. Was fixed at some point > Airflow worker (CeleryExecutor) needs to be restarted to pick up tasks > -- > > Key: AIRFLOW-215 > URL: https://issues.apache.org/jira/browse/AIRFLOW-215 > Project: Apache Airflow > Issue Type: Bug > Components: celery, subdag >Affects Versions: Airflow 1.7.1.2 >Reporter: Cyril Scetbon >Priority: Major > > We have a main dag that dynamically creates subdags containing tasks using > BashOperator. Using CeleryExecutor we see Celery tasks been created with > *STARTED* status but they are not picked up by our worker. However, if we > restart our worker, then tasks are picked up. > Here you can find code if you want to try to reproduce it > https://www.dropbox.com/s/8u7xf8jt55v8zio/dags.zip. > We also tested using LocalExecutor and everything worked fine. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-191) Database connection leak on Postgresql backend
[ https://issues.apache.org/jira/browse/AIRFLOW-191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-191. --- Resolution: Fixed Fix Version/s: Airflow 1.8 Merged in as https://github.com/apache/incubator-airflow/commit/4905a5563d47b45e38b91661ee5aa7f3765a129b > Database connection leak on Postgresql backend > -- > > Key: AIRFLOW-191 > URL: https://issues.apache.org/jira/browse/AIRFLOW-191 > Project: Apache Airflow > Issue Type: Bug > Components: executor >Affects Versions: Airflow 1.7.1.2 >Reporter: Sergei Iakhnin >Priority: Major > Fix For: Airflow 1.8 > > Attachments: Sid_anands_airflow_idle_in_transaction.png > > > I raised this issue on github several months ago and there was even a PR but > it never made it into mainline. Basically, workers tend to hang onto DB > connections in Postgres for recording heartbeats. > I'm running a cluster with 115 workers, each with 8 slots. My Postgres DB is > configured to allow 1000 simultaneous connections. I should effectively be > able to run 920 tasks at the same time, but am actually limited to only about > 450-480 because of idle transactions from workers hanging on to DB > connections. 
> If I run the following query > select count(*), state, client_hostname from pg_stat_activity group by state, client_hostname > These are the results: > count state client_hostname > 1 active (null) > 1 idle localhost > 451 idle in transaction (null) > 446 idle (null) > 1 active localhost > The idle connections are all trying to run COMMIT > The "idle in transaction" connections are all trying to run > SELECT job.id AS job_id, job.dag_id AS job_dag_id, job.state AS job_state, > job.job_type AS job_job_type, job.start_date AS job_start_date, job.end_date > AS job_end_date, job.latest_heartbeat AS job_latest_heartbeat, > job.executor_class AS job_executor_class, job.hostname AS job_hostname, > job.unixname AS job_unixname > FROM job > WHERE job.id = 213823 > LIMIT 1 > with differing job.ids of course. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-187) Make PR tool more user-friendly
[ https://issues.apache.org/jira/browse/AIRFLOW-187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-187. --- Resolution: Fixed Fixed by the merged https://github.com/apache/incubator-airflow/pull/1565 > Make PR tool more user-friendly > --- > > Key: AIRFLOW-187 > URL: https://issues.apache.org/jira/browse/AIRFLOW-187 > Project: Apache Airflow > Issue Type: Improvement > Components: PR tool >Reporter: Jeremiah Lowin >Priority: Minor > > General JIRA improvement that can be referenced for any UX improvements to > the PR tool, including better or more prompts, documentation, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-184) Add clear/mark success to CLI
[ https://issues.apache.org/jira/browse/AIRFLOW-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386953#comment-16386953 ] Ash Berlin-Taylor commented on AIRFLOW-184: --- Is this issue still relevant? > Add clear/mark success to CLI > - > > Key: AIRFLOW-184 > URL: https://issues.apache.org/jira/browse/AIRFLOW-184 > Project: Apache Airflow > Issue Type: Bug > Components: cli >Reporter: Chris Riccomini >Assignee: Joy Gao >Priority: Major > > AIRFLOW-177 pointed out that the current CLI does not allow us to clear or > mark success a task (including upstream, downstream, past, future, and > recursive) the way that the UI widget does. Given a goal of keeping parity > between the UI and CLI, it seems like we should support this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-182) CLI command `airflow backfill` fails while CLI `airflow run` succeeds
[ https://issues.apache.org/jira/browse/AIRFLOW-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-182. - Resolution: Cannot Reproduce Airflow 1.7 is now quite old. If this is still happening on the latest version please open another issue and we'd be happy to help solve it > CLI command `airflow backfill` fails while CLI `airflow run` succeeds > - > > Key: AIRFLOW-182 > URL: https://issues.apache.org/jira/browse/AIRFLOW-182 > Project: Apache Airflow > Issue Type: Bug > Components: celery >Affects Versions: Airflow 1.7.0 > Environment: Heroku Cedar 14, Heroku Redis as Celery Broker >Reporter: Hariharan Mohanraj >Priority: Minor > > When I run the backfill command, I get an error that claims there is no dag > in my dag folder with the name "unusual_prefix_dag1", although my dag is > actually named dag1. However, when I run the run command, the task is > scheduled and it works flawlessly. > {code} > $ airflow backfill -t task1 -s 2016-05-01 -e 2016-05-07 dag1 > 2016-05-26T23:22:28.816908+00:00 app[worker.1]: [2016-05-26 23:22:28,816] > {__init__.py:36} INFO - Using executor CeleryExecutor > 2016-05-26T23:22:29.214006+00:00 app[worker.1]: Traceback (most recent call > last): > 2016-05-26T23:22:29.214083+00:00 app[worker.1]: File > "/app/.heroku/python/bin/airflow", line 15, in > 2016-05-26T23:22:29.214121+00:00 app[worker.1]: args.func(args) > 2016-05-26T23:22:29.214151+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/airflow/bin/cli.py", line > 174, in run > 2016-05-26T23:22:29.214207+00:00 app[worker.1]: > DagPickle).filter(DagPickle.id == args.pickle).first() > 2016-05-26T23:22:29.214230+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/query.py", > line 2634, in first > 2016-05-26T23:22:29.214616+00:00 app[worker.1]: ret = list(self[0:1]) > 2016-05-26T23:22:29.214626+00:00 app[worker.1]: File > 
"/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/query.py", > line 2457, in __getitem__ > 2016-05-26T23:22:29.214984+00:00 app[worker.1]: return list(res) > 2016-05-26T23:22:29.214992+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 86, in instances > 2016-05-26T23:22:29.215053+00:00 app[worker.1]: util.raise_from_cause(err) > 2016-05-26T23:22:29.215074+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/util/compat.py", > line 200, in raise_from_cause > 2016-05-26T23:22:29.215121+00:00 app[worker.1]: reraise(type(exception), > exception, tb=exc_tb, cause=cause) > 2016-05-26T23:22:29.215142+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 71, in instances > 2016-05-26T23:22:29.215175+00:00 app[worker.1]: rows = [proc(row) for row > in fetch] > 2016-05-26T23:22:29.215200+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 428, in _instance > 2016-05-26T23:22:29.215274+00:00 app[worker.1]: loaded_instance, > populate_existing, populators) > 2016-05-26T23:22:29.215282+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", > line 486, in _populate_full > 2016-05-26T23:22:29.215369+00:00 app[worker.1]: dict_[key] = getter(row) > 2016-05-26T23:22:29.215406+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/sql/sqltypes.py", > line 1253, in process > 2016-05-26T23:22:29.215574+00:00 app[worker.1]: return loads(value) > 2016-05-26T23:22:29.215595+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/dill/dill.py", line 260, in > loads > 2016-05-26T23:22:29.215657+00:00 app[worker.1]: return load(file) > 2016-05-26T23:22:29.215678+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/dill/dill.py", line 250, in 
> load > 2016-05-26T23:22:29.215738+00:00 app[worker.1]: obj = pik.load() > 2016-05-26T23:22:29.215758+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/pickle.py", line 858, in load > 2016-05-26T23:22:29.215895+00:00 app[worker.1]: dispatch[key](self) > 2016-05-26T23:22:29.215902+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/pickle.py", line 1090, in load_global > 2016-05-26T23:22:29.216069+00:00 app[worker.1]: klass = > self.find_class(module, name) > 2016-05-26T23:22:29.216077+00:00 app[worker.1]: File > "/app/.heroku/python/lib/python2.7/site-packages/dill/dill.py", line 406, in > find_class > 2016-05-26T23:22:29.216181+00:00 app[worker.1]: return > StockU
[jira] [Closed] (AIRFLOW-181) Travis builds fail due to corrupt cache
[ https://issues.apache.org/jira/browse/AIRFLOW-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-181. - Resolution: Fixed Closed by https://github.com/apache/incubator-airflow/commit/afcd4fcf01696ee26911640cdeb481defd93c3aa > Travis builds fail due to corrupt cache > --- > > Key: AIRFLOW-181 > URL: https://issues.apache.org/jira/browse/AIRFLOW-181 > Project: Apache Airflow > Issue Type: Bug >Reporter: Bolke de Bruin >Assignee: Bolke de Bruin >Priority: Major > > Corrupt cache is preventing from unpacking hadoop. It needs to redownload the > distribution without checking the cache -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-160) Parse DAG files through child processes
[ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-160. - Resolution: Fixed Fix Version/s: Airflow 1.8 Fixed by https://github.com/apache/incubator-airflow/commit/fdb7e949140b735b8554ae5b22ad752e86f6ebaf > Parse DAG files through child processes > --- > > Key: AIRFLOW-160 > URL: https://issues.apache.org/jira/browse/AIRFLOW-160 > Project: Apache Airflow > Issue Type: Improvement > Components: scheduler >Reporter: Paul Yang >Assignee: Paul Yang >Priority: Major > Fix For: Airflow 1.8 > > > Currently, the Airflow scheduler parses all user DAG files in the same > process as the scheduler itself. We've seen issues in production where bad > DAG files cause scheduler to fail. A simple example is if the user script > calls `sys.exit(1)`, the scheduler will exit as well. We've also seen an > unusual case where modules loaded by the user DAG affect operation of the > scheduler. For better uptime, the scheduler should be resistant to these > problematic user DAGs. > The proposed solution is to parse and schedule user DAGs through child > processes. This way, the main scheduler process is more isolated from bad > DAGs. There's a side benefit as well - since parsing is distributed among > multiple processes, it's possible to parse the DAG files more frequently, > reducing the latency between when a DAG is modified and when the changes are > picked up. > Another issue right now is that all DAGs must be scheduled before any tasks > are sent to the executor. This means that the frequency of task scheduling is > limited by the slowest DAG to schedule. The changes needed for scheduling > DAGs through child processes will also make it easy to decouple this process > and allow tasks to be scheduled and sent to the executor in a more > independent fashion. This way, overall scheduling won't be held back by a > slow DAG. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
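The isolation argument in the issue above can be illustrated with a minimal sketch (not the actual scheduler code): if parsing runs in a child process, a DAG file that calls `sys.exit()` kills only the child, and the parent keeps running.

```python
import multiprocessing
import sys

def parse_dag_file():
    """Stand-in for parsing a user DAG file; this one is hostile."""
    sys.exit(1)  # a bad DAG file calling sys.exit() only kills the child

def parse_isolated():
    """Parse in a child process; the parent (the scheduler) survives and
    can inspect the child's exit code afterwards."""
    child = multiprocessing.Process(target=parse_dag_file)
    child.start()
    child.join()
    return child.exitcode

if __name__ == "__main__":
    exitcode = parse_isolated()  # the parent process is still alive here
```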
[jira] [Closed] (AIRFLOW-147) HiveServer2Hook.to_csv() writing one row at a time and causing excessive logging
[ https://issues.apache.org/jira/browse/AIRFLOW-147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-147. - Resolution: Fixed Fixed by https://github.com/apache/incubator-airflow/commit/a5c00b3f1581580818b585b21abd3df3fa68af64 > HiveServer2Hook.to_csv() writing one row at a time and causing excessive > logging > > > Key: AIRFLOW-147 > URL: https://issues.apache.org/jira/browse/AIRFLOW-147 > Project: Apache Airflow > Issue Type: Bug > Components: hooks >Affects Versions: Airflow 1.7.0 >Reporter: Michael Musson >Priority: Minor > > The default behavior of fetchmany() in impala dbapi (which airflow switched > to recently) is to return a single row at a time. This causes HiveServer2's > to_csv() method to output one row of logging for each row of data in the > results. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
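The DB-API issue above generalizes: `fetchmany()` falls back to the driver's default batch size (a single row here), so one remedy is to pass an explicit size. A generic sketch using sqlite3 as a stand-in driver; the helper name is hypothetical:

```python
import sqlite3

def fetch_in_batches(cursor, batch_size=1000):
    """Yield rows in explicit batches so a to_csv()-style loop can log once
    per batch rather than once per row."""
    while True:
        rows = cursor.fetchmany(batch_size)  # explicit size, not driver default
        if not rows:
            break
        yield rows

# Demo: 2500 rows fetched as batches of 1000, 1000, and 500.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(2500)])
cur = conn.execute("SELECT x FROM t")
batches = list(fetch_in_batches(cur, batch_size=1000))
```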
[jira] [Resolved] (AIRFLOW-129) Allow CELERYD_PREFETCH_MULTIPLIER to be configurable
[ https://issues.apache.org/jira/browse/AIRFLOW-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-129. --- Resolution: Fixed Fix Version/s: Airflow 1.9.0 Not the nicest interface for configuring, but it is now possible to do without patching Airflow. > Allow CELERYD_PREFETCH_MULTIPLIER to be configurable > > > Key: AIRFLOW-129 > URL: https://issues.apache.org/jira/browse/AIRFLOW-129 > Project: Apache Airflow > Issue Type: Improvement > Components: celery >Affects Versions: Airflow 1.7.0 >Reporter: Nam Ngo >Priority: Major > Fix For: Airflow 1.9.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > Airflow needs to allow everyone to customise their prefetch limit. Some might > have short-running tasks and don't want the overhead of celery latency. > More on that here: > http://docs.celeryproject.org/en/latest/userguide/optimizing.html#optimizing-prefetch-limit -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-135) Clean up git branches (remove old + implement versions)
[ https://issues.apache.org/jira/browse/AIRFLOW-135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-135. - Resolution: Fixed There are now only 6 branches. Nice and clean :) > Clean up git branches (remove old + implement versions) > --- > > Key: AIRFLOW-135 > URL: https://issues.apache.org/jira/browse/AIRFLOW-135 > Project: Apache Airflow > Issue Type: Improvement > Components: project-management >Reporter: Jeremiah Lowin >Priority: Minor > Labels: git > Fix For: Airflow 1.8 > > > We have a large number of branches in the git repo, most of which are old > features -- I would bet hardly any of them are active. I think they should be > deleted if possible. In addition, we should begin using branches (as opposed > to tags) to allow easy switching between Airflow versions. Spark > (https://github.com/apache/spark) uses the format {{branch-X.X}}; others like > Kafka (https://github.com/apache/kafka) simply use a version number. But this > is an important way to browse the history and, most importantly, can't be > overwritten like a tag (since tags point at commits and commits can be > rebased away). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-110) Point people to the appropriate process to submit PRs in the repository's CONTRIBUTING.md
[ https://issues.apache.org/jira/browse/AIRFLOW-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-110. --- Resolution: Fixed With the addition of the {{.github}} folder this is now quite obvious on GitHub. > Point people to the appropriate process to submit PRs in the repository's > CONTRIBUTING.md > > > Key: AIRFLOW-110 > URL: https://issues.apache.org/jira/browse/AIRFLOW-110 > Project: Apache Airflow > Issue Type: Task > Components: docs >Reporter: Arthur Wiedmer >Priority: Trivial > Labels: documentation, newbie > > The current process to contribute code could be made more accessible. I am > assuming that the entry point to the project is GitHub and the repository. We > could modify the CONTRIBUTING.md as well as the README to point to the > proper way to do this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2123) Install CI Dependencies from setup.py
[ https://issues.apache.org/jira/browse/AIRFLOW-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-2123. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3054 [https://github.com/apache/incubator-airflow/pull/3054] > Install CI Dependencies from setup.py > - > > Key: AIRFLOW-2123 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2123 > Project: Apache Airflow > Issue Type: Bug >Reporter: Fokko Driesprong >Priority: Major > Fix For: 2.0.0 > > > Right now we have two places where we keep our dependencies. This is setup.py > for installation and requirements.txt for the CI. These files run terribly > out of sync and therefore I think it is a good idea to install the CI's > dependencies using this setup.py so we have everything in one single place. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2123] Install CI dependencies from setup.py
Repository: incubator-airflow Updated Branches: refs/heads/master f1df3de9b -> 976fd1245 [AIRFLOW-2123] Install CI dependencies from setup.py Install the dependencies from setup.py so we keep all the dependencies in one single place Closes #3054 from Fokko/fd-fix-ci-2 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/976fd124 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/976fd124 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/976fd124 Branch: refs/heads/master Commit: 976fd1245a981b37957e4e35367b0e504d8e3d67 Parents: f1df3de Author: Fokko Driesprong Authored: Mon Mar 5 22:46:07 2018 + Committer: Ash Berlin-Taylor Committed: Mon Mar 5 22:46:45 2018 + -- scripts/ci/requirements.txt | 97 scripts/ci/travis_script.sh | 2 + setup.py| 17 +-- tox.ini | 4 +- 4 files changed, 17 insertions(+), 103 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/976fd124/scripts/ci/requirements.txt -- diff --git a/scripts/ci/requirements.txt b/scripts/ci/requirements.txt deleted file mode 100644 index 9c028d5..000 --- a/scripts/ci/requirements.txt +++ /dev/null @@ -1,97 +0,0 @@ -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -alembic -azure-storage>=0.34.0 -bcrypt -bleach -boto -boto3 -celery -cgroupspy -chartkick -cloudant<2.0 -coverage -coveralls -croniter>=0.3.17 -cryptography -datadog -dill -distributed -docker-py -filechunkio -flake8 -flask -flask-admin -flask-bcrypt -flask-cache -flask-login==0.2.11 -Flask-WTF -flower -freezegun -future -google-api-python-client>=1.5.0,<1.6.0 -gunicorn -hdfs -hive-thrift-py -impyla -ipython -jaydebeapi -jinja2<2.9.0 -jira -ldap3 -lxml -markdown -mock -moto==1.1.19 -mysqlclient -nose -nose-exclude -nose-ignore-docstring==0.2 -nose-timer -oauth2client>=2.0.2,<2.1.0 -pandas -pandas-gbq -parameterized -paramiko>=2.1.1 -pendulum>=1.3.2 -psutil>=4.2.0, <5.0.0 -psycopg2 -pygments -pyhive -pykerberos -PyOpenSSL -PySmbClient -python-daemon -python-dateutil -python-jenkins -qds-sdk>=1.9.6 -redis -rednose -requests -requests-kerberos -requests_mock -sendgrid -setproctitle -slackclient -sphinx -sphinx-argparse -Sphinx-PyPI-upload -sphinx_rtd_theme -sqlalchemy>=1.1.15, <1.2.0 -statsd -thrift -thrift_sasl -unicodecsv -zdesk -kubernetes http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/976fd124/scripts/ci/travis_script.sh -- diff --git a/scripts/ci/travis_script.sh b/scripts/ci/travis_script.sh index 86c086a..8766e94 100755 --- a/scripts/ci/travis_script.sh +++ b/scripts/ci/travis_script.sh @@ -1,3 +1,5 @@ +#!/usr/bin/env bash + # Licensed to the Apache Software Foundation (ASF) under one * # or more contributor license agreements. 
See the NOTICE file * # distributed with this work for additional information* http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/976fd124/setup.py -- diff --git a/setup.py b/setup.py index d3f48e3..254aa3e 100644 --- a/setup.py +++ b/setup.py @@ -27,6 +27,7 @@ logger = logging.getLogger(__name__) version = imp.load_source( 'airflow.version', os.path.join('airflow', 'version.py')).version +PY3 = sys.version_info[0] == 3 class Tox(TestCommand): user_options = [('tox-args=', None, "Arguments to pass to tox")] @@ -153,8 +154,7 @@ ldap = ['ldap3>=0.9.9.1'] kerberos = ['pykerberos>=1.1.13', 'requests_kerberos>=0.10.0', 'thrift_sasl>=0.2.0', -'snakebite[kerberos]>=2.7.8', -'kerberos>=1.2.5'] +'snakebite[kerberos]>=2.7.8'] password = [ 'bcrypt>=2.0.0', 'flask-bcrypt>=0.7.1', @@ -166,6 +166,8 @@ redis = ['redis>=2.10.5'] kubernetes = ['kubernetes>=3.0.0', 'cryptography>=2.0.0'] +zendesk = ['zdesk'] + all_dbs = postgres + mysql + hive + mssql + hdfs + vertica + cloudant devel = [ 'click', @@ -185,9 +187,15 @@ devel = [ ] devel_minreq = devel + kubernetes + mysql + doc + password + s3 + cgroups devel_hadoop = devel_minreq +
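The setup.py hunk above composes CI dependency sets from shared extras lists instead of a separate requirements.txt. A toy sketch of that single-source-of-truth pattern (the concrete package lists and extra names here are illustrative, not the exact final setup.py):

```python
# Toy sketch: each dependency is declared once in a shared list, and the
# extras_require entries are built by concatenating those lists, so the CI
# can install via "pip install -e .[devel_minreq]" rather than a drifting
# requirements.txt. Package lists are illustrative only.
devel = ["click", "mock", "nose"]
mysql = ["mysqlclient"]
doc = ["sphinx"]

extras = {
    "devel": devel,
    "devel_minreq": sorted(set(devel + mysql + doc)),
}
```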
[jira] [Created] (AIRFLOW-2182) Configured
Richard Ferrer created AIRFLOW-2182: --- Summary: Configured Key: AIRFLOW-2182 URL: https://issues.apache.org/jira/browse/AIRFLOW-2182 Project: Apache Airflow Issue Type: New Feature Components: authentication Reporter: Richard Ferrer Assignee: Richard Ferrer -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2181) Convert DOS formatted files to UNIX
[ https://issues.apache.org/jira/browse/AIRFLOW-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386816#comment-16386816 ] Ash Berlin-Taylor commented on AIRFLOW-2181: No reason at all - PR welcomed! > Convert DOS formatted files to UNIX > --- > > Key: AIRFLOW-2181 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2181 > Project: Apache Airflow > Issue Type: Task >Reporter: Dan Fowler >Assignee: Dan Fowler >Priority: Trivial > > While looking into an issue related to the password_auth backend I noticed > the following files are in DOS format: > > tests/www/api/experimental/test_password_endpoints.py > airflow/contrib/auth/backends/password_auth.py > > I can't think of a reason why these should be DOS formatted, but if there is > let me know and I can close this out. Otherwise, I'll submit a PR for this > fix. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-97) "airflow" "DAG" strings in file necessary to import dag
[ https://issues.apache.org/jira/browse/AIRFLOW-97?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor updated AIRFLOW-97: - Affects Version/s: Airflow 1.9.0 > "airflow" "DAG" strings in file necessary to import dag > --- > > Key: AIRFLOW-97 > URL: https://issues.apache.org/jira/browse/AIRFLOW-97 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 1.7.0, Airflow 1.9.0 >Reporter: Etiene Dalcol >Priority: Minor > > Hello airflow team! Thanks for the awesome tool! > We made a small module to automate our DAG building process and we are using > this module on our DAG definition. Our airflow version is 1.7.0. > However, airflow will not import this file because it doesn't have the words > DAG and airflow on it. (The imports etc are done inside our little module). > Apparently there's a safe_mode that skips files without these strings. > (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L197) > This safe_mode is default to True but is not passed to the process_file > function, so it is always True and there's no apparent way to disable it. > (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L177) > (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L313) > Putting this comment on the top of the file makes it work for the moment and > brought me a good laugh today 👯 > #DAG airflow —> DO NOT REMOVE. the world will explode -- This message was sent by Atlassian JIRA (v7.6.3#76005)
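The safe_mode heuristic the reporter describes can be sketched as a simple substring check — a file is only parsed as a DAG definition when both magic strings appear in its source (function name is illustrative, not Airflow's internal API):

```python
# Sketch of the safe_mode heuristic from AIRFLOW-97: a file is skipped
# unless both "airflow" and "DAG" occur somewhere in its source text,
# which is why the reporter's magic comment makes the import work.
def looks_like_dag_file(source, safe_mode=True):
    if not safe_mode:
        return True
    return "airflow" in source and "DAG" in source
```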
[jira] [Closed] (AIRFLOW-42) Adding logging.debug DagBag loading stats
[ https://issues.apache.org/jira/browse/AIRFLOW-42?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-42. Resolution: Fixed Fix Version/s: 1.8.0 Merged in May 2016 via https://github.com/apache/incubator-airflow/commit/3c3f5a67ff80f3e8942aef441f481c62baf97184 > Adding logging.debug DagBag loading stats > - > > Key: AIRFLOW-42 > URL: https://issues.apache.org/jira/browse/AIRFLOW-42 > Project: Apache Airflow > Issue Type: Bug >Reporter: Maxime Beauchemin >Assignee: Maxime Beauchemin >Priority: Major > Fix For: 1.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-19) How can I have an Operator B iterate over a list returned from upstream by Operator A?
[ https://issues.apache.org/jira/browse/AIRFLOW-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-19. Resolution: Not A Bug As discussed, the mailing list (http://mail-archives.apache.org/mod_mbox/incubator-airflow-dev/) is the best place for questions like this. > How can I have an Operator B iterate over a list returned from upstream by > Operator A? > -- > > Key: AIRFLOW-19 > URL: https://issues.apache.org/jira/browse/AIRFLOW-19 > Project: Apache Airflow > Issue Type: Bug > Components: operators >Reporter: Praveenkumar Venkatesan >Priority: Minor > Labels: support > > Here is what I am trying to do exactly: > https://gist.github.com/praveev/7b93b50746f8e965f7139ecba028490a > the python operator log just returns the following > [2016-04-28 11:56:22,296] {models.py:1041} INFO - Executing > on 2016-04-28 11:56:12 > [2016-04-28 11:56:22,350] {python_operator.py:66} INFO - Done. Returned value > was: None > it didn't even print my kwargs and to_process data > To simplify this. Lets say t1 returns 3 elements. I want to iterate over the > list and run t2 -> t3 for each element. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2181) Convert DOS formatted files to UNIX
Dan Fowler created AIRFLOW-2181: --- Summary: Convert DOS formatted files to UNIX Key: AIRFLOW-2181 URL: https://issues.apache.org/jira/browse/AIRFLOW-2181 Project: Apache Airflow Issue Type: Task Reporter: Dan Fowler Assignee: Dan Fowler While looking into an issue related to the password_auth backend I noticed the following files are in DOS format: tests/www/api/experimental/test_password_endpoints.py airflow/contrib/auth/backends/password_auth.py I can't think of a reason why these should be DOS formatted, but if there is let me know and I can close this out. Otherwise, I'll submit a PR for this fix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
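The conversion the ticket asks for is a mechanical line-ending rewrite. A minimal sketch in Python (the actual PR may simply run dos2unix; this is the same CRLF-to-LF transformation, in place):

```python
# Minimal sketch of the DOS -> UNIX conversion AIRFLOW-2181 asks for:
# rewrite a file's CRLF line endings as bare LF, in place. Reading and
# writing in binary mode avoids any platform newline translation.
def crlf_to_lf(path):
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))
```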
[jira] [Work started] (AIRFLOW-2181) Convert DOS formatted files to UNIX
[ https://issues.apache.org/jira/browse/AIRFLOW-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on AIRFLOW-2181 started by Dan Fowler. --- > Convert DOS formatted files to UNIX > --- > > Key: AIRFLOW-2181 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2181 > Project: Apache Airflow > Issue Type: Task >Reporter: Dan Fowler >Assignee: Dan Fowler >Priority: Trivial > > While looking into an issue related to the password_auth backend I noticed > the following files are in DOS format: > > tests/www/api/experimental/test_password_endpoints.py > airflow/contrib/auth/backends/password_auth.py > > I can't think of a reason why these should be DOS formatted, but if there is > let me know and I can close this out. Otherwise, I'll submit a PR for this > fix. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2180) Import Errors on Custom Logging Produce Unhelpful Messages
[ https://issues.apache.org/jira/browse/AIRFLOW-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Lawrence Pamplona updated AIRFLOW-2180: - Attachment: Screen Shot 2018-03-05 at 1.19.07 PM.png > Import Errors on Custom Logging Produce Unhelpful Messages > -- > > Key: AIRFLOW-2180 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2180 > Project: Apache Airflow > Issue Type: Bug >Reporter: Kevin Lawrence Pamplona >Priority: Minor > Attachments: Screen Shot 2018-03-05 at 1.19.07 PM.png > > > Repro Steps: > 1. Use airflow.cfg with missing [core/remote_logging] field > 2. Start airflow or run `PYTHONPATH=config/ python -c 'import log_conf'`given > that custom logging config is in config/log_conf.py > Execution will produce an irrelevant error: > 'Unable to load custom logging from {}'.format(logging_class_path) > ImportError: Unable to load custom logging from log_config.LOGGING_CONFIG > No handlers could be found for logger > "airflow.utils.log.logging_mixin.LoggingMixin" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2180) Import Errors on Custom Logging Produce Unhelpful Messages
Kevin Lawrence Pamplona created AIRFLOW-2180: Summary: Import Errors on Custom Logging Produce Unhelpful Messages Key: AIRFLOW-2180 URL: https://issues.apache.org/jira/browse/AIRFLOW-2180 Project: Apache Airflow Issue Type: Bug Reporter: Kevin Lawrence Pamplona Repro Steps: 1. Use airflow.cfg with missing [core/remote_logging] field 2. Start airflow or run `PYTHONPATH=config/ python -c 'import log_conf'`given that custom logging config is in config/log_conf.py Execution will produce an irrelevant error: 'Unable to load custom logging from {}'.format(logging_class_path) ImportError: Unable to load custom logging from log_config.LOGGING_CONFIG No handlers could be found for logger "airflow.utils.log.logging_mixin.LoggingMixin" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
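One way to make the error message helpful is to chain the underlying cause into it. A hedged sketch (this is not Airflow's actual loader, just the shape of the fix the report suggests):

```python
import importlib

# Hedged sketch of surfacing the real ImportError instead of only the
# generic "Unable to load custom logging from ..." message described above.
def load_logging_config(path):
    """Import 'package.module.ATTR' and return ATTR, with a useful error."""
    module_name, attr = path.rsplit(".", 1)
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError) as err:
        # include the underlying cause so the operator can see *why*
        # the import failed (missing module, bad attribute name, ...)
        raise ImportError(
            "Unable to load custom logging from {}: {}".format(path, err))
```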
[jira] [Commented] (AIRFLOW-2179) Make parametrable the IP on which the worker log server binds to
[ https://issues.apache.org/jira/browse/AIRFLOW-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386692#comment-16386692 ] Ash Berlin-Taylor commented on AIRFLOW-2179: Sounds like a sensible change. > Make parametrable the IP on which the worker log server binds to > > > Key: AIRFLOW-2179 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2179 > Project: Apache Airflow > Issue Type: Improvement > Components: celery, webserver >Reporter: Albin Gilles >Priority: Minor > > Hello, > I'd be glad if the tiny web server subprocess to serve the workers local log > files could be set to bind to localhost only as could be done for Gunicorn or > Flower. See > [cli.py#L865|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L865] > If you don't see any issue with that possibility, I'll be happy to propose a > PR on github. > Regards, > Albin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2163) Add HBC Digital to list of companies using Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor resolved AIRFLOW-2163. Resolution: Fixed > Add HBC Digital to list of companies using Airflow > -- > > Key: AIRFLOW-2163 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2163 > Project: Apache Airflow > Issue Type: Bug >Reporter: Terry McCartan >Assignee: Terry McCartan >Priority: Trivial > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[1/2] incubator-airflow git commit: [AIRFLOW-2163] Add HBC Digital to users of airflow
Repository: incubator-airflow Updated Branches: refs/heads/master 1ac4d07d0 -> f1df3de9b [AIRFLOW-2163] Add HBC Digital to users of airflow Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/8b6eab7a Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/8b6eab7a Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/8b6eab7a Branch: refs/heads/master Commit: 8b6eab7a269c7e74fb30cdf7efe7070c38bdc1b3 Parents: 2511c46 Author: Terry McCartan Authored: Fri Mar 2 12:47:51 2018 + Committer: Terry McCartan Committed: Fri Mar 2 12:47:51 2018 + -- README.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/8b6eab7a/README.md -- diff --git a/README.md b/README.md index fa7bb77..b3ba1b7 100644 --- a/README.md +++ b/README.md @@ -135,6 +135,7 @@ Currently **officially** using Airflow: 1. [Gusto](https://gusto.com) [[@frankhsu](https://github.com/frankhsu)] 1. [Handshake](https://joinhandshake.com/) [[@mhickman](https://github.com/mhickman)] 1. [Handy](http://www.handy.com/careers/73115?gh_jid=73115&gh_src=o5qcxn) [[@marcintustin](https://github.com/marcintustin) / [@mtustin-handy](https://github.com/mtustin-handy)] +1. [HBC Digital](http://tech.hbc.com) [[@tmccartan](https://github.com/tmccartan) & [@dmateusp](https://github.com/dmateusp)] 1. [Healthjump](http://www.healthjump.com/) [[@miscbits](https://github.com/miscbits)] 1. [HBO](http://www.hbo.com/)[[@yiwang](https://github.com/yiwang)] 1. [HelloFresh](https://www.hellofresh.com) [[@tammymendt](https://github.com/tammymendt) & [@davidsbatista](https://github.com/davidsbatista) & [@iuriinedostup](https://github.com/iuriinedostup)]
[2/2] incubator-airflow git commit: Merge pull request #3084 from tmccartan/master
Merge pull request #3084 from tmccartan/master Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/f1df3de9 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/f1df3de9 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/f1df3de9 Branch: refs/heads/master Commit: f1df3de9bb3fa5c8206ed9e7f0b089a92785b81a Parents: 1ac4d07 8b6eab7 Author: Ash Berlin-Taylor Authored: Mon Mar 5 20:17:14 2018 + Committer: Ash Berlin-Taylor Committed: Mon Mar 5 20:17:14 2018 + -- README.md | 1 + 1 file changed, 1 insertion(+) --
[jira] [Updated] (AIRFLOW-2179) Make parametrable the IP on which the worker log server binds to
[ https://issues.apache.org/jira/browse/AIRFLOW-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albin Gilles updated AIRFLOW-2179: -- Description: Hello, I'd be glad if the tiny web server subprocess to serve the workers local log files could be set to bind to localhost only as could be done for Gunicorn or Flower. See [cli.py#L865|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L865] If you don't see any issue with that possibility, I'll be happy to propose a PR on github. Regards, Albin. was: Hello, I'd be glad if the tiny web server subprocess to serve the workers local log files could be set to bind to localhost only as could be done for Gunicorn or Flower. See [cli.py#L865|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L865 If you don't see any issue with that possibility, I'll be happy to propose a PR on github. Regards, Albin. > Make parametrable the IP on which the worker log server binds to > > > Key: AIRFLOW-2179 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2179 > Project: Apache Airflow > Issue Type: Improvement > Components: celery, webserver >Reporter: Albin Gilles >Priority: Minor > > Hello, > I'd be glad if the tiny web server subprocess to serve the workers local log > files could be set to bind to localhost only as could be done for Gunicorn or > Flower. See > [cli.py#L865|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L865] > If you don't see any issue with that possibility, I'll be happy to propose a > PR on github. > Regards, > Albin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2179) Make parametrable the IP on which the worker log server binds to
Albin Gilles created AIRFLOW-2179: - Summary: Make parametrable the IP on which the worker log server binds to Key: AIRFLOW-2179 URL: https://issues.apache.org/jira/browse/AIRFLOW-2179 Project: Apache Airflow Issue Type: Improvement Components: celery, webserver Reporter: Albin Gilles Hello, I'd be glad if the tiny web server subprocess to serve the workers local log files could be set to bind to localhost only as could be done for Gunicorn or Flower. See [cli.py#L865|https://github.com/apache/incubator-airflow/blob/master/airflow/bin/cli.py#L865] If you don't see any issue with that possibility, I'll be happy to propose a PR on github. Regards, Albin. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
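The requested change amounts to reading the bind address from configuration instead of hard-coding 0.0.0.0. A hypothetical sketch — the option name "worker_log_server_ip" is invented for illustration and is not an actual Airflow setting:

```python
# Hypothetical sketch: resolve the bind address for the worker's
# log-serving sub-process from configuration, falling back to today's
# hard-coded 0.0.0.0. "worker_log_server_ip" is an invented option name.
def resolve_log_server_bind(conf, default="0.0.0.0"):
    return conf.get("celery", {}).get("worker_log_server_ip", default)
```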
[jira] [Updated] (AIRFLOW-2178) Scheduler can't get past SLA check if SMTP settings are incorrect
[ https://issues.apache.org/jira/browse/AIRFLOW-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Meickle updated AIRFLOW-2178: --- Attachment: log.txt > Scheduler can't get past SLA check if SMTP settings are incorrect > - > > Key: AIRFLOW-2178 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2178 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0 > Environment: 16.04 >Reporter: James Meickle >Priority: Major > Attachments: log.txt > > > After testing Airflow for a while in staging, I provisioned our prod cluster > and enabled the first DAG on it. The "backfill" for this DAG performed just > fine, so I assumed everything was working and left it over the weekend. > However, when the last "backfill" period completed and the scheduler > transitioned to the most recent execution date, it began failing in the > `manage_slas` method. Due to a configuration difference, SMTP was timing out > in production, preventing the SLA check from ever completing; this both > blocked SLA notifications and prevented further tasks in this DAG > from ever getting scheduled. > As an operator, I would expect Airflow to treat scheduling tasks as a > higher-priority concern, and to do so even if the SLA feature fails to work. I > would also expect Airflow to notify me in the web UI that email sending is > not currently working. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2178) Scheduler can't get past SLA check if SMTP settings are incorrect
James Meickle created AIRFLOW-2178: -- Summary: Scheduler can't get past SLA check if SMTP settings are incorrect Key: AIRFLOW-2178 URL: https://issues.apache.org/jira/browse/AIRFLOW-2178 Project: Apache Airflow Issue Type: Bug Components: scheduler Affects Versions: 1.9.0 Environment: 16.04 Reporter: James Meickle After testing Airflow for a while in staging, I provisioned our prod cluster and enabled the first DAG on it. The "backfill" for this DAG performed just fine, so I assumed everything was working and left it over the weekend. However, when the last "backfill" period completed and the scheduler transitioned to the most recent execution date, it began failing in the `manage_slas` method. Due to a configuration difference, SMTP was timing out in production, preventing the SLA check from ever completing; this both blocked SLA notifications and prevented further tasks in this DAG from ever getting scheduled. As an operator, I would expect Airflow to treat scheduling tasks as a higher-priority concern, and to do so even if the SLA feature fails to work. I would also expect Airflow to notify me in the web UI that email sending is not currently working. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
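The behaviour the reporter expects can be sketched as isolating the notification step so an SMTP failure is logged rather than fatal. This is illustrative only — the function shape is not Airflow's actual `manage_slas` signature:

```python
import logging

log = logging.getLogger(__name__)

# Sketch of the expectation in AIRFLOW-2178: a failure while sending an
# SLA-miss email (e.g. an SMTP timeout) is logged and swallowed, so the
# scheduler can keep scheduling tasks. send_email is any callable that
# performs the notification; the shape here is illustrative.
def notify_sla_misses(sla_misses, send_email):
    for miss in sla_misses:
        try:
            send_email(miss)
        except Exception:
            log.exception("Could not send SLA notification for %s", miss)
    # scheduling continues regardless of notification outcome
```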
[jira] [Commented] (AIRFLOW-2175) Failed to upgradedb 1.8.2 -> 1.9.0
[ https://issues.apache.org/jira/browse/AIRFLOW-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386502#comment-16386502 ] Joy Gao commented on AIRFLOW-2175: -- Perhaps the fileloc attribute didn't get saved to db successfully. Curious is this a subdag? Maybe add a null check prior to os.path.isfile(filepath) to avoid this TypeError. > Failed to upgradedb 1.8.2 -> 1.9.0 > -- > > Key: AIRFLOW-2175 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2175 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: 1.9.0 >Reporter: Damian Momot >Priority: Critical > > We've got airflow installation with hundreds of DAGs and thousands of tasks. > During upgrade (1.8.2 -> 1.9.0) we've got following error. > After analyzing stacktrace i've found that it's most likely caused by None > value in 'fileloc' field of Dag column. I checked database and indeed we've > got one record with such value: > > > {code:java} > SELECT COUNT(*) FROM dag WHERE fileloc IS NULL; > 1 > SELECT COUNT(*) FROM dag; > 343 > {code} > > > {code:java} > Traceback (most recent call last): > File "/usr/local/bin/airflow", line 27, in > args.func(args) > File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 913, > in upgradedb > db_utils.upgradedb() > File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 320, > in upgradedb > command.upgrade(config, 'heads') > File "/usr/local/lib/python2.7/dist-packages/alembic/command.py", line 174, > in upgrade > script.run_env() > File "/usr/local/lib/python2.7/dist-packages/alembic/script/base.py", line > 416, in run_env > util.load_python_file(self.dir, 'env.py') > File "/usr/local/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line > 93, in load_python_file > module = load_module_py(module_id, path) > File "/usr/local/lib/python2.7/dist-packages/alembic/util/compat.py", line > 79, in load_module_py > mod = imp.load_source(module_id, path, fp) > File 
"/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 86, in > run_migrations_online() > File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", > line 81, in run_migrations_online > context.run_migrations() > File "", line 8, in run_migrations > File > "/usr/local/lib/python2.7/dist-packages/alembic/runtime/environment.py", line > 807, in run_migrations > self.get_context().run_migrations(**kw) > File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/migration.py", > line 321, in run_migrations > step.migration_fn(**kw) > File > "/usr/local/lib/python2.7/dist-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", > line 63, in upgrade > dag = dagbag.get_dag(ti.dag_id) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 232, > in get_dag > filepath=orm_dag.fileloc, only_if_updated=False) > File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 249, > in process_file > if not os.path.isfile(filepath): > File "/usr/lib/python2.7/genericpath.py", line 29, in isfile > st = os.stat(path) > TypeError: coercing to Unicode: need string or buffer, NoneType found{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
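The null check suggested in the comment above can be sketched as a guard in front of the `os.path.isfile` call that raises in the stack trace:

```python
import os

# Sketch of the suggested workaround: treat a NULL fileloc as "no file"
# instead of letting os.path.isfile(None) raise
# "TypeError: coercing to Unicode: need string or buffer, NoneType found".
def dag_file_exists(fileloc):
    if fileloc is None:
        return False
    return os.path.isfile(fileloc)
```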
[jira] [Assigned] (AIRFLOW-2118) get_pandas_df does always pass a list of rows to be parsed
[ https://issues.apache.org/jira/browse/AIRFLOW-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anonymous reassigned AIRFLOW-2118: -- Assignee: Diane Ivy > get_pandas_df does always pass a list of rows to be parsed > -- > > Key: AIRFLOW-2118 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2118 > Project: Apache Airflow > Issue Type: Bug > Components: contrib, hooks >Affects Versions: 1.9.0 > Environment: pandas-gbp 0.3.1 >Reporter: Diane Ivy >Assignee: Diane Ivy >Priority: Minor > Labels: easyfix > Original Estimate: 1h > Remaining Estimate: 1h > > While trying to parse the pages in get_pandas_df if only one page is returned > it starts popping off each row and then the gbq_parse_data works incorrectly. > {{while len(pages) > 0:}} > {{ page = pages.pop()}} > {{ dataframe_list.append(gbq_parse_data(schema, page))}} > Possible solution: > {{from google.cloud import bigquery}} > {{if isinstance(pages[0], bigquery.table.Row):}} > {{ pages = [pages]}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
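The reporter's proposed fix normalizes the pages structure before the pop-and-parse loop. A sketch, with `row_type` standing in for `google.cloud.bigquery.table.Row`:

```python
# Sketch of the proposed fix for get_pandas_df: when the BigQuery client
# hands back a single page (a flat list of rows) instead of a list of
# pages, wrap it so the loop pops whole pages, not individual rows.
def normalize_pages(pages, row_type):
    if pages and isinstance(pages[0], row_type):
        return [pages]
    return pages
```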
[jira] [Created] (AIRFLOW-2177) Add test for GCS download operator
Kaxil Naik created AIRFLOW-2177: --- Summary: Add test for GCS download operator Key: AIRFLOW-2177 URL: https://issues.apache.org/jira/browse/AIRFLOW-2177 Project: Apache Airflow Issue Type: Task Components: contrib, gcp Reporter: Kaxil Naik Assignee: Kaxil Naik Add mock tests for GCS Download operator -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-2158) Airflow should not store logs as raw ISO timestamps
[ https://issues.apache.org/jira/browse/AIRFLOW-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ash Berlin-Taylor closed AIRFLOW-2158. -- Resolution: Duplicate > Airflow should not store logs as raw ISO timestamps > --- > > Key: AIRFLOW-2158 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2158 > Project: Apache Airflow > Issue Type: Improvement > Environment: 1.9.0 >Reporter: Christian D >Priority: Minor > Labels: easyfix, windows > Fix For: Airflow 2.0 > > > Problem: > When Airflow writes logs to disk, it uses a ISO-8601 timestamp as the > filename. In a Linux filesystem this works completely fine (because all > characters in a ISO-8601 timestamp is allowed). However, it doesn't work on > Windows based systems (including Azure File Storage) because {{:}} is a > disallowed character. > Solution: > Ideally, Airflow should store logs such that they're somewhat compatible > across file systems. An easy way of fixing this would therefore be to always > replace {{:}} with underscores. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
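The easy fix the issue proposes is a one-line substitution on the log filename. A sketch:

```python
# Sketch of the proposed fix in AIRFLOW-2158: ':' is not allowed in
# filenames on Windows or Azure File Storage, so replace it with '_'
# when an ISO-8601 execution date becomes a log filename.
def portable_log_filename(execution_date_iso):
    return execution_date_iso.replace(":", "_")
```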
[jira] [Created] (AIRFLOW-2176) Change the way logging is carried out in BigQuery Get Data Operator
Kaxil Naik created AIRFLOW-2176: --- Summary: Change the way logging is carried out in BigQuery Get Data Operator Key: AIRFLOW-2176 URL: https://issues.apache.org/jira/browse/AIRFLOW-2176 Project: Apache Airflow Issue Type: Task Components: contrib, gcp, logging Reporter: Kaxil Naik Assignee: Kaxil Naik Currently, the logging is done by importing logging package. This should be changed to `self.log.info`. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Máté Szabó updated AIRFLOW-2128: Description: Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ... -> 998 -> 999 Wide DAG = a DAG with many short, parallel dependencies e.g. 0 -> 1; 0 -> 2; ... 0 -> 999 Take a super simple case where both graphs are of 1000 tasks, and all the tasks are just "sleep 0.03" bash commands (see the attached files). With the default SequentialExecutor (without paralellism), I would expect my 2 example DAGs to take (approximately) the same time to run, but apparently this is not the case. For the wide DAG it was about 80 successfully executed tasks in 10 minutes, for the tall one it was 0. This anomaly also seem to affect the web UI. Opening up the graph view or the tree view for the wide DAG takes about 6 seconds on my machine, but for the tall one it takes significantly longer, in fact currently it does not load at all. was: Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ... -> 998 -> 999 Wide DAG = a DAG with many short, parallel dependencies e.g. 0 -> 1; 0 -> 2; ... 0 -> 999 Take a super simple case where both graphs are of 1000 tasks, and all the tasks are just "sleep 0.03" bash commands (see the attached files). With the default SequentialExecutor (without paralellism), I would expect my 2 example DAGs to take (approximately) the same time to run, but apprently this is not the case. For the wide DAG it was about 80 successfully executed tasks in 10 minutes, for the tall one it was 0. This anomaly also seem to affect the web UI. Opening up the graph view or the tree view for the wide DAG takes about 6 seconds on my machine, but for the tall one it takes significantly longer, in fact currently it does not load at all. 
> 'Tall' DAGs scale worse than 'wide' DAGs
>
> Key: AIRFLOW-2128
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, DagRun, scheduler
> Affects Versions: 1.9.0
> Reporter: Máté Szabó
> Priority: Major
> Labels: performance, usability
> Attachments: tall_dag.py, wide_dag.py
>
> Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ... -> 998 -> 999
> Wide DAG = a DAG with many short, parallel dependencies, e.g.: 0 -> 1; 0 -> 2; ... 0 -> 999
> Take a super simple case where both graphs are of 1000 tasks, and all the tasks are just "sleep 0.03" bash commands (see the attached files).
> With the default SequentialExecutor (without parallelism), I would expect my 2 example DAGs to take (approximately) the same time to run, but apparently this is not the case.
> For the wide DAG it was about 80 successfully executed tasks in 10 minutes, for the tall one it was 0.
> This anomaly also seems to affect the web UI. Opening up the graph view or the tree view for the wide DAG takes about 6 seconds on my machine, but for the tall one it takes significantly longer, in fact currently it does not load at all.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
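The structural difference between the two attached DAG shapes can be illustrated without Airflow at all; the helper below is purely hypothetical and just measures the longest dependency chain in each shape:

```python
def chain_depth(edges, n_tasks):
    """Length of the longest dependency chain, for tasks 0..n_tasks-1 and
    (upstream, downstream) edges listed in topological order."""
    depth = [1] * n_tasks
    for up, down in edges:
        depth[down] = max(depth[down], depth[up] + 1)
    return max(depth)

N = 1000
tall = [(i, i + 1) for i in range(N - 1)]   # 0 -> 1 -> ... -> 999
wide = [(0, i) for i in range(1, N)]        # 0 -> 1; 0 -> 2; ... 0 -> 999

print(chain_depth(tall, N), chain_depth(wide, N))  # 1000 2
```

Both graphs hold the same 1000 tasks, but the tall one forces 1000 sequential scheduling rounds while the wide one needs only 2, which is consistent with the reported slowdown being in the scheduler rather than in task failures.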
[jira] [Commented] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386148#comment-16386148 ] Máté Szabó commented on AIRFLOW-2128: - Yes, that's what I meant. But I'd like to emphasize it does not fail, it's just really slow. If I let it run for a sufficiently long time it does execute the tasks, but I haven't measured the exact time that takes.
> 'Tall' DAGs scale worse than 'wide' DAGs
>
> Key: AIRFLOW-2128
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2175) Failed to upgradedb 1.8.2 -> 1.9.0
Damian Momot created AIRFLOW-2175: - Summary: Failed to upgradedb 1.8.2 -> 1.9.0 Key: AIRFLOW-2175 URL: https://issues.apache.org/jira/browse/AIRFLOW-2175 Project: Apache Airflow Issue Type: Bug Components: db Affects Versions: 1.9.0 Reporter: Damian Momot

We've got an Airflow installation with hundreds of DAGs and thousands of tasks. During the upgrade (1.8.2 -> 1.9.0) we got the following error. After analyzing the stacktrace I found that it is most likely caused by a None value in the 'fileloc' column of the dag table. I checked the database and indeed we've got one record with such a value:

{code:java}
SELECT COUNT(*) FROM dag WHERE fileloc IS NULL;
1
SELECT COUNT(*) FROM dag;
343
{code}

{code:java}
Traceback (most recent call last):
  File "/usr/local/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 913, in upgradedb
    db_utils.upgradedb()
  File "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 320, in upgradedb
    command.upgrade(config, 'heads')
  File "/usr/local/lib/python2.7/dist-packages/alembic/command.py", line 174, in upgrade
    script.run_env()
  File "/usr/local/lib/python2.7/dist-packages/alembic/script/base.py", line 416, in run_env
    util.load_python_file(self.dir, 'env.py')
  File "/usr/local/lib/python2.7/dist-packages/alembic/util/pyfiles.py", line 93, in load_python_file
    module = load_module_py(module_id, path)
  File "/usr/local/lib/python2.7/dist-packages/alembic/util/compat.py", line 79, in load_module_py
    mod = imp.load_source(module_id, path, fp)
  File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", line 86, in <module>
    run_migrations_online()
  File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/env.py", line 81, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/environment.py", line 807, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/usr/local/lib/python2.7/dist-packages/alembic/runtime/migration.py", line 321, in run_migrations
    step.migration_fn(**kw)
  File "/usr/local/lib/python2.7/dist-packages/airflow/migrations/versions/cc1e65623dc7_add_max_tries_column_to_task_instance.py", line 63, in upgrade
    dag = dagbag.get_dag(ti.dag_id)
  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 232, in get_dag
    filepath=orm_dag.fileloc, only_if_updated=False)
  File "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 249, in process_file
    if not os.path.isfile(filepath):
  File "/usr/lib/python2.7/genericpath.py", line 29, in isfile
    st = os.stat(path)
TypeError: coercing to Unicode: need string or buffer, NoneType found
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
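The null check proposed as a workaround could be sketched as a guard before the migration touches the file path; this is a hypothetical illustration (`dags_to_refresh` and the sample rows are invented for the sketch), not the actual Airflow migration code:

```python
# Hypothetical sketch: skip dag rows whose fileloc is NULL instead of
# letting os.path.isfile(None) raise the TypeError from the traceback.

def dags_to_refresh(orm_dags):
    """Yield (dag_id, fileloc) pairs, skipping rows with a missing fileloc."""
    for dag_id, fileloc in orm_dags:
        if fileloc is None:
            # fileloc is a nullable column, so a NULL value is possible;
            # passing None to os.stat() is what crashes the upgrade.
            continue
        yield dag_id, fileloc

rows = [("etl_daily", "/dags/etl_daily.py"), ("orphan_dag", None)]
print(list(dags_to_refresh(rows)))  # the NULL-fileloc row is filtered out
```

Skipping (or logging) such rows would let `upgradedb` finish without the manual `ALTER`/`UPDATE` on the affected record.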
[jira] [Commented] (AIRFLOW-2165) XCOM values are being saved as bytestring
[ https://issues.apache.org/jira/browse/AIRFLOW-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386030#comment-16386030 ] Kaxil Naik commented on AIRFLOW-2165: - It has been mentioned here: https://github.com/apache/incubator-airflow/blob/master/UPDATING.md#deprecated-features > XCOM values are being saved as bytestring > - > > Key: AIRFLOW-2165 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2165 > Project: Apache Airflow > Issue Type: Bug > Components: xcom >Affects Versions: 1.9.0 > Environment: Ubuntu > Airflow 1.9.0 from PIP >Reporter: Cong Qin >Priority: Major > Attachments: Screen Shot 2018-03-02 at 11.09.15 AM.png > > > I noticed after upgrading to 1.9.0 that XCOM values are now being saved as > byte strings that cannot be decoded. Once I downgraded back to 1.8.2 the > "old" behavior is back. > It means that when I'm storing certain values inside I cannot pull those > values back out sometimes. I'm not sure if this was a documented change > anywhere (I looked at the changelog between 1.8.2 and 1.9.0) and I couldn't > find out if this was a config level change or something. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
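The UPDATING.md entry linked above concerns XCom pickling being deprecated in favor of JSON serialization. Assuming that is the change at play here, the round trip below only illustrates why a value stored as UTF-8-encoded JSON shows up as a byte string until it is decoded; it is a sketch with invented helper names, not Airflow's actual XCom code:

```python
import json

# Hypothetical sketch of a JSON-based XCom round trip: the stored form is
# bytes (what the reporter sees in the DB/UI), and reading it back requires
# decoding before parsing.
def xcom_serialize(value):
    return json.dumps(value).encode("utf-8")   # -> bytes

def xcom_deserialize(blob):
    return json.loads(blob.decode("utf-8"))    # decode, then parse

stored = xcom_serialize({"rows": 42})
assert isinstance(stored, bytes)
print(xcom_deserialize(stored))  # {'rows': 42}
```

If a consumer pulls the raw stored value without the decode step, it sees a byte string such as `b'{"rows": 42}'`, which matches the reported symptom.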