[jira] [Created] (AIRFLOW-1559) MySQL warnings about aborted connections, missing engine disposal

2017-09-01 Thread Daniel Huang (JIRA)
Daniel Huang created AIRFLOW-1559:
-

 Summary: MySQL warnings about aborted connections, missing engine 
disposal
 Key: AIRFLOW-1559
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1559
 Project: Apache Airflow
  Issue Type: Bug
  Components: db
Reporter: Daniel Huang
Priority: Minor


We're seeing a flood of warnings about aborted connections in our MySQL logs. 

{code}
Aborted connection 56720 to db: 'airflow' user: 'foo' host: 'x.x.x.x' (Got an 
error reading communication packets)
{code}

It appears this is because we're not performing [engine 
disposal|http://docs.sqlalchemy.org/en/latest/core/connections.html#engine-disposal].
The most common source of this warning is the scheduler, when it kicks off 
new processes to process the DAG files. Calling dispose in 
https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L403 
greatly reduced these messages. However, the worker still causes some of 
them, presumably from when we spin up processes to run tasks. We do call 
dispose in 
https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1394-L1396, 
but I think it happens a bit too early. I'm not sure where we could put this 
cleanup to ensure it's done everywhere.

Quick script to reproduce this warning message:

{code}
from airflow import settings
from airflow.models import Connection

session = settings.Session()
session.query(Connection).count()
session.close()
# not calling settings.engine.dispose()
{code}
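
For contrast, explicitly disposing the engine after closing the session should 
avoid the warning. A minimal sketch of the cleanup we'd want each short-lived 
process to run before exiting ({{settings.engine}} is the module-level engine 
Airflow creates at import time):

{code}
from airflow import settings
from airflow.models import Connection

session = settings.Session()
session.query(Connection).count()
session.close()

# Disposing the engine closes pooled connections cleanly instead of leaving
# them for the server to reap, which is what produces the "Aborted
# connection" warning in the MySQL log.
settings.engine.dispose()
{code}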

Reproduced with Airflow 1.8.1, MySQL 5.7, and SQLAlchemy 1.1.13. 





[jira] [Updated] (AIRFLOW-1558) S3FileTransformOperator fails in Python 3 due to file mode

2017-09-01 Thread Adam Wentz (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Wentz updated AIRFLOW-1558:

Description: 
When running under python3 the S3FileTransformOperator fails with the following 
error:

{noformat}
[2017-09-01 18:44:54,440] {models.py:1427} ERROR - write() argument must be 
str, not bytes
[2017-09-01 18:44:54,443] {base_task_runner.py:95} INFO - Subtask: Traceback 
(most recent call last):
[2017-09-01 18:44:54,444] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1384, in run
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask: result = 
task_copy.execute(context=context)
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/operators/s3_file_transform_operator.py",
 line 87, in execute
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask: 
source_s3_key_object.get_contents_to_file(f_source)
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1662, in get_contents_to_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
response_headers=response_headers)
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1494, in get_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
query_args=None)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1548, in _get_file_internal
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask: 
fp.write(bytes)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/tempfile.py", line 483, in func_wrapper
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: return 
func(*args, **kwargs)
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: TypeError: 
write() argument must be str, not bytes
[2017-09-01 18:44:54,450] {base_task_runner.py:95} INFO - Subtask: [2017-09-01 
18:44:54,443] {models.py:1451} INFO - Marking task as FAILED.

{noformat}

The solution is to open the `NamedTemporaryFile`s with mode `wb` rather than 
`w`. PR: https://github.com/apache/incubator-airflow/pull/2559
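
For illustration, the fix amounts to opening the temp files in binary mode. A 
minimal sketch based on the variable names in the traceback above (not the 
exact operator code):

{code}
from tempfile import NamedTemporaryFile

# boto's get_contents_to_file() writes bytes, so the temporary file must be
# opened in binary mode; mode 'w' (text) raises the TypeError shown above.
# source_s3_key_object is the boto Key the operator fetched earlier.
with NamedTemporaryFile("wb") as f_source:
    source_s3_key_object.get_contents_to_file(f_source)
    f_source.flush()
{code}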

  was:
When running under python3 the S3FileTransformOperator fails with the following 
error:

{noformat}
[2017-09-01 18:44:54,440] {models.py:1427} ERROR - write() argument must be 
str, not bytes
[2017-09-01 18:44:54,443] {base_task_runner.py:95} INFO - Subtask: Traceback 
(most recent call last):
[2017-09-01 18:44:54,444] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1384, in run
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask: result = 
task_copy.execute(context=context)
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/operators/s3_file_transform_operator.py",
 line 87, in execute
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask: 
source_s3_key_object.get_contents_to_file(f_source)
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1662, in get_contents_to_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
response_headers=response_headers)
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1494, in get_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
query_args=None)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1548, in _get_file_internal
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask: 
fp.write(bytes)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/tempfile.py", line 483, in func_wrapper
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: return 
func(*args, **kwargs)
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: TypeError: 
write() argument must be str, not bytes
[2017-09-01 18:44:54,450] {base_task_runner.py:95} INFO - Subtask: [2017-09-01 
18:44:54,443] {models.py:1451} INFO - Marking task as FAILED.

{noformat}

The solution is to open the `NamedTemporaryFile`s with mode `wb` rather than 
`w`. I have an incoming PR for this.


> S3FileTransformOperator fails in Python 3 due to file mode
> --

[jira] [Commented] (AIRFLOW-1330) Connection.parse_from_uri doesn't work for google_cloud_platform and so on

2017-09-01 Thread Chris Riccomini (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151082#comment-16151082
 ] 

Chris Riccomini commented on AIRFLOW-1330:
--

I think we have to continue supporting conn_url for backwards compatibility. I 
like adding a conn type, and I am not a fan of the strange URI (missing 
scheme) in option (1). So, I guess that would leave us with option (2). Though 
I wonder if we're overthinking this, and whether it'd be easier to just change 
the handling of underscores in the URI parsing code.
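
The underscore behaviour is easy to demonstrate, and the substitution approach 
would look something like the sketch below ({{parse_conn_uri}} here is a 
hypothetical helper, not the existing method):

{code}
from urllib.parse import urlparse

# urlparse() only treats [a-zA-Z0-9+.-] as scheme characters, so a scheme
# containing underscores is silently not split out:
urlparse('google_cloud_platform://u:p@host/schema').scheme  # -> ''

def parse_conn_uri(uri):
    # Pull the underscore scheme off manually, parse the remainder with a
    # placeholder scheme, and keep both pieces.
    conn_type, sep, rest = uri.partition('://')
    parsed = urlparse('http' + sep + rest)
    return conn_type, parsed
{code}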

> Connection.parse_from_uri doesn't work for google_cloud_platform and so on
> --
>
> Key: AIRFLOW-1330
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1330
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Yu Ishikawa
>Assignee: Shintaro Murakami
>
> h2. Overview
> {{Connection.parse_from_uri}} doesn't work for some connection types, like 
> {{google_cloud_platform}}, whose type name includes underscores, because 
> {{urllib.parse.urlparse()}}, which is used in {{Connection.parse_from_uri}}, 
> doesn't support a scheme name that includes underscores.
> So, Airflow's CLI doesn't work when a given connection URI includes 
> underscores, like {{google_cloud_platform://X}}.
> h3. Workaround
> https://medium.com/@yuu.ishikawa/apache-airflow-how-to-add-a-connection-to-google-cloud-with-cli-af2cc8df138d





[jira] [Resolved] (AIRFLOW-1556) BigQueryBaseCursor should support SQL parameters

2017-09-01 Thread Chris Riccomini (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini resolved AIRFLOW-1556.
--
   Resolution: Fixed
Fix Version/s: 1.9.0

> BigQueryBaseCursor should support SQL parameters
> 
>
> Key: AIRFLOW-1556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1556
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Rajiv Bharadwaja
>Assignee: Rajiv Bharadwaja
>Priority: Minor
>  Labels: newbie
> Fix For: 1.9.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> BigQuery supports running parameterized queries:
> https://cloud.google.com/bigquery/docs/parameterized-queries
> Airflow's bigquery_hook.py and bigquery_operator.py do not support this yet. 
> https://github.com/apache/incubator-airflow/commits/master/airflow/contrib/hooks/bigquery_hook.py
> https://github.com/apache/incubator-airflow/commits/master/airflow/contrib/operators/bigquery_operator.py
> SQL parameter support is different from the jinja templating / airflow macro 
> support for parameterizing query strings; the latter works at the query 
> string level.





[jira] [Commented] (AIRFLOW-1556) BigQueryBaseCursor should support SQL parameters

2017-09-01 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151076#comment-16151076
 ] 

ASF subversion and git services commented on AIRFLOW-1556:
--

Commit 9df0ac64c0ce1a654875197697a3851484fd57af in incubator-airflow's branch 
refs/heads/master from [~rajivpb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=9df0ac6 ]

[AIRFLOW-1556][Airflow 1556] Add support for SQL parameters in 
BigQueryBaseCursor

Closes #2557 from rajivpb/sql-parameters


> BigQueryBaseCursor should support SQL parameters
> 
>
> Key: AIRFLOW-1556
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1556
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: contrib
>Reporter: Rajiv Bharadwaja
>Assignee: Rajiv Bharadwaja
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> BigQuery supports running parameterized queries:
> https://cloud.google.com/bigquery/docs/parameterized-queries
> Airflow's bigquery_hook.py and bigquery_operator.py do not support this yet. 
> https://github.com/apache/incubator-airflow/commits/master/airflow/contrib/hooks/bigquery_hook.py
> https://github.com/apache/incubator-airflow/commits/master/airflow/contrib/operators/bigquery_operator.py
> SQL parameter support is different from the jinja templating / airflow macro 
> support for parameterizing query strings; the latter works at the query 
> string level.





incubator-airflow git commit: [AIRFLOW-1556][Airflow 1556] Add support for SQL parameters in BigQueryBaseCursor

2017-09-01 Thread criccomini
Repository: incubator-airflow
Updated Branches:
  refs/heads/master de593216d -> 9df0ac64c


[AIRFLOW-1556][Airflow 1556] Add support for SQL parameters in 
BigQueryBaseCursor

Closes #2557 from rajivpb/sql-parameters


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/9df0ac64
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/9df0ac64
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/9df0ac64

Branch: refs/heads/master
Commit: 9df0ac64c0ce1a654875197697a3851484fd57af
Parents: de59321
Author: Rajiv Bharadwaja 
Authored: Fri Sep 1 12:59:11 2017 -0700
Committer: Chris Riccomini 
Committed: Fri Sep 1 12:59:11 2017 -0700

--
 airflow/contrib/hooks/bigquery_hook.py | 6 +-
 airflow/contrib/operators/bigquery_operator.py | 8 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/9df0ac64/airflow/contrib/hooks/bigquery_hook.py
--
diff --git a/airflow/contrib/hooks/bigquery_hook.py 
b/airflow/contrib/hooks/bigquery_hook.py
index b979ed9..e60f597 100644
--- a/airflow/contrib/hooks/bigquery_hook.py
+++ b/airflow/contrib/hooks/bigquery_hook.py
@@ -196,7 +196,8 @@ class BigQueryBaseCursor(object):
 udf_config = False,
 use_legacy_sql=True,
 maximum_billing_tier=None,
-create_disposition='CREATE_IF_NEEDED'):
+create_disposition='CREATE_IF_NEEDED',
+query_params=None):
 """
 Executes a BigQuery SQL query. Optionally persists results in a 
BigQuery
 table. See here:
@@ -255,6 +256,9 @@ class BigQueryBaseCursor(object):
 'userDefinedFunctionResources': udf_config
 })
 
+if query_params:
+configuration['query']['queryParameters'] = query_params
+
 return self.run_with_configuration(configuration)
 
 def run_extract(  # noqa

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/9df0ac64/airflow/contrib/operators/bigquery_operator.py
--
diff --git a/airflow/contrib/operators/bigquery_operator.py 
b/airflow/contrib/operators/bigquery_operator.py
index aaffc2e..3b804a8 100644
--- a/airflow/contrib/operators/bigquery_operator.py
+++ b/airflow/contrib/operators/bigquery_operator.py
@@ -51,6 +51,10 @@ class BigQueryOperator(BaseOperator):
 :param maximum_billing_tier: Positive integer that serves as a multiplier 
of the basic price.
 Defaults to None, in which case it uses the value set in the project.
 :type maximum_billing_tier: integer
+:param query_params: a dictionary containing query parameter types and 
values, passed to
+BigQuery.
+:type query_params: dict
+
 """
 template_fields = ('bql', 'destination_dataset_table')
 template_ext = ('.sql',)
@@ -68,6 +72,7 @@ class BigQueryOperator(BaseOperator):
  use_legacy_sql=True,
  maximum_billing_tier=None,
  create_disposition='CREATE_IF_NEEDED',
+ query_params=None,
  *args,
  **kwargs):
 super(BigQueryOperator, self).__init__(*args, **kwargs)
@@ -81,6 +86,7 @@ class BigQueryOperator(BaseOperator):
 self.udf_config = udf_config
 self.use_legacy_sql = use_legacy_sql
 self.maximum_billing_tier = maximum_billing_tier
+self.query_params = query_params
 
 def execute(self, context):
 logging.info('Executing: %s', self.bql)
@@ -91,4 +97,4 @@ class BigQueryOperator(BaseOperator):
 cursor.run_query(self.bql, self.destination_dataset_table, 
self.write_disposition,
  self.allow_large_results, self.udf_config,
  self.use_legacy_sql, self.maximum_billing_tier,
- self.create_disposition)
+ self.create_disposition, self.query_params)
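
For reference, usage of the new argument would look roughly like this 
(connection id, query, and parameter values are illustrative; the entries 
follow BigQuery's queryParameters REST format):

{code}
from airflow.contrib.hooks.bigquery_hook import BigQueryHook

hook = BigQueryHook(bigquery_conn_id='bigquery_default')  # illustrative conn id
cursor = hook.get_conn().cursor()

# One entry per named query parameter, in BigQuery's REST format.
query_params = [{
    'name': 'ds',
    'parameterType': {'type': 'STRING'},
    'parameterValue': {'value': '2017-09-01'},
}]

cursor.run_query(
    'SELECT * FROM dataset.table WHERE ds = @ds',
    use_legacy_sql=False,  # named parameters require standard SQL
    query_params=query_params,
)
{code}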



[jira] [Updated] (AIRFLOW-1558) S3FileTransformOperator fails in Python 3 due to file mode

2017-09-01 Thread Adam Wentz (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Wentz updated AIRFLOW-1558:

Description: 
When running under python3 the S3FileTransformOperator fails with the following 
error:

{noformat}
[2017-09-01 18:44:54,440] {models.py:1427} ERROR - write() argument must be 
str, not bytes
[2017-09-01 18:44:54,443] {base_task_runner.py:95} INFO - Subtask: Traceback 
(most recent call last):
[2017-09-01 18:44:54,444] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1384, in run
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask: result = 
task_copy.execute(context=context)
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/operators/s3_file_transform_operator.py",
 line 87, in execute
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask: 
source_s3_key_object.get_contents_to_file(f_source)
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1662, in get_contents_to_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
response_headers=response_headers)
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1494, in get_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
query_args=None)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1548, in _get_file_internal
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask: 
fp.write(bytes)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/tempfile.py", line 483, in func_wrapper
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: return 
func(*args, **kwargs)
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: TypeError: 
write() argument must be str, not bytes
[2017-09-01 18:44:54,450] {base_task_runner.py:95} INFO - Subtask: [2017-09-01 
18:44:54,443] {models.py:1451} INFO - Marking task as FAILED.

{noformat}

The solution is to open the `NamedTemporaryFile`s with mode `wb` rather than 
`w`. I have an incoming PR for this.

  was:
When running under python3 the S3FileTransformOperator fails with the following 
error:

```
[2017-09-01 18:44:54,440] {models.py:1427} ERROR - write() argument must be 
str, not bytes
[2017-09-01 18:44:54,443] {base_task_runner.py:95} INFO - Subtask: Traceback 
(most recent call last):
[2017-09-01 18:44:54,444] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1384, in run
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask: result = 
task_copy.execute(context=context)
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/operators/s3_file_transform_operator.py",
 line 87, in execute
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask: 
source_s3_key_object.get_contents_to_file(f_source)
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1662, in get_contents_to_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
response_headers=response_headers)
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1494, in get_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
query_args=None)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1548, in _get_file_internal
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask: 
fp.write(bytes)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/tempfile.py", line 483, in func_wrapper
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: return 
func(*args, **kwargs)
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: TypeError: 
write() argument must be str, not bytes
[2017-09-01 18:44:54,450] {base_task_runner.py:95} INFO - Subtask: [2017-09-01 
18:44:54,443] {models.py:1451} INFO - Marking task as FAILED.
```
The solution is to open the `NamedTemporaryFile`s with mode `wb` rather than 
`w`. I have an incoming PR for this.


> S3FileTransformOperator fails in Python 3 due to file mode
> --
>
> K

[jira] [Created] (AIRFLOW-1558) S3FileTransformOperator fails in Python 3 due to file mode

2017-09-01 Thread Adam Wentz (JIRA)
Adam Wentz created AIRFLOW-1558:
---

 Summary: S3FileTransformOperator fails in Python 3 due to file mode
 Key: AIRFLOW-1558
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1558
 Project: Apache Airflow
  Issue Type: Bug
  Components: operators
Affects Versions: Airflow 1.8
 Environment: python3
Reporter: Adam Wentz
Priority: Minor


When running under python3 the S3FileTransformOperator fails with the following 
error:

```
[2017-09-01 18:44:54,440] {models.py:1427} ERROR - write() argument must be 
str, not bytes
[2017-09-01 18:44:54,443] {base_task_runner.py:95} INFO - Subtask: Traceback 
(most recent call last):
[2017-09-01 18:44:54,444] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/models.py", line 1384, in run
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask: result = 
task_copy.execute(context=context)
[2017-09-01 18:44:54,445] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/site-packages/airflow/operators/s3_file_transform_operator.py",
 line 87, in execute
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask: 
source_s3_key_object.get_contents_to_file(f_source)
[2017-09-01 18:44:54,446] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1662, in get_contents_to_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
response_headers=response_headers)
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1494, in get_file
[2017-09-01 18:44:54,447] {base_task_runner.py:95} INFO - Subtask: 
query_args=None)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/airflow/.local/lib/python3.6/site-packages/boto/s3/key.py", line 
1548, in _get_file_internal
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask: 
fp.write(bytes)
[2017-09-01 18:44:54,448] {base_task_runner.py:95} INFO - Subtask:   File 
"/usr/local/lib/python3.6/tempfile.py", line 483, in func_wrapper
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: return 
func(*args, **kwargs)
[2017-09-01 18:44:54,449] {base_task_runner.py:95} INFO - Subtask: TypeError: 
write() argument must be str, not bytes
[2017-09-01 18:44:54,450] {base_task_runner.py:95} INFO - Subtask: [2017-09-01 
18:44:54,443] {models.py:1451} INFO - Marking task as FAILED.
```
The solution is to open the `NamedTemporaryFile`s with mode `wb` rather than 
`w`. I have an incoming PR for this.





[jira] [Updated] (AIRFLOW-1330) Connection.parse_from_uri doesn't work for google_cloud_platform and so on

2017-09-01 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated AIRFLOW-1330:
-
Description: 
h2. Overview

{{Connection.parse_from_uri}} doesn't work for some connection types, like 
{{google_cloud_platform}}, whose type name includes underscores, because 
{{urllib.parse.urlparse()}}, which is used in {{Connection.parse_from_uri}}, 
doesn't support a scheme name that includes underscores.

So, Airflow's CLI doesn't work when a given connection URI includes 
underscores, like {{google_cloud_platform://X}}.

h3. Workaround
https://medium.com/@yuu.ishikawa/apache-airflow-how-to-add-a-connection-to-google-cloud-with-cli-af2cc8df138d

  was:
h2. Overview

{{Connection.parse_from_uri}} doesn't work for some connection types, like 
{{google_cloud_platform}}, whose type name includes underscores, because 
{{urllib.parse.urlparse()}}, which is used in {{Connection.parse_from_uri}}, 
doesn't support a scheme name that includes underscores.

So, Airflow's CLI doesn't work when a given connection URI includes 
underscores, like {{google_cloud_platform://X}}.


> Connection.parse_from_uri doesn't work for google_cloud_platform and so on
> --
>
> Key: AIRFLOW-1330
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1330
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Yu Ishikawa
>Assignee: Shintaro Murakami
>
> h2. Overview
> {{Connection.parse_from_uri}} doesn't work for some connection types, like 
> {{google_cloud_platform}}, whose type name includes underscores, because 
> {{urllib.parse.urlparse()}}, which is used in {{Connection.parse_from_uri}}, 
> doesn't support a scheme name that includes underscores.
> So, Airflow's CLI doesn't work when a given connection URI includes 
> underscores, like {{google_cloud_platform://X}}.
> h3. Workaround
> https://medium.com/@yuu.ishikawa/apache-airflow-how-to-add-a-connection-to-google-cloud-with-cli-af2cc8df138d





[jira] [Assigned] (AIRFLOW-1330) Connection.parse_from_uri doesn't work for google_cloud_platform and so on

2017-09-01 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/AIRFLOW-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa reassigned AIRFLOW-1330:


Assignee: Shintaro Murakami

> Connection.parse_from_uri doesn't work for google_cloud_platform and so on
> --
>
> Key: AIRFLOW-1330
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1330
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Reporter: Yu Ishikawa
>Assignee: Shintaro Murakami
>
> h2. Overview
> {{Connection.parse_from_uri}} doesn't work for some connection types, like 
> {{google_cloud_platform}}, whose type name includes underscores, because 
> {{urllib.parse.urlparse()}}, which is used in {{Connection.parse_from_uri}}, 
> doesn't support a scheme name that includes underscores.
> So, Airflow's CLI doesn't work when a given connection URI includes 
> underscores, like {{google_cloud_platform://X}}.





[jira] [Created] (AIRFLOW-1557) backfill ignores configured number of slots in a pool

2017-09-01 Thread Ash Berlin-Taylor (JIRA)
Ash Berlin-Taylor created AIRFLOW-1557:
--

 Summary: backfill ignores configured number of slots in a pool
 Key: AIRFLOW-1557
 URL: https://issues.apache.org/jira/browse/AIRFLOW-1557
 Project: Apache Airflow
  Issue Type: Bug
Affects Versions: 1.8.1
Reporter: Ash Berlin-Taylor
 Attachments: Screen Shot 2017-09-01 at 11.39.32.png

The backfill process appears to run as many tasks as possible, even when the 
pool it is running in should limit the number of concurrent tasks. I ran this 
backfill command:

{noformat}
airflow backfill \
  -t fetch_dk_unfiltered \
  --pool brand_index_api \
  -s 2017-07-31 -e 2017-08-31 \
  -x \
  brand_index_fetcher 
{noformat}

(Nothing other than the backfill is currently using this pool. I wasn't able 
to capture a screenshot of the task instances before the jobs completed.)
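
For context, the pool itself exists and has a small slot count. A sketch of 
the equivalent setup (the slot count and description here are illustrative, 
not our exact values):

{code}
from airflow import settings
from airflow.models import Pool

# Create the pool the backfill above should be constrained by: no more
# than `slots` task instances from this pool should run at once.
session = settings.Session()
session.add(Pool(pool='brand_index_api', slots=2, description='brand index API'))
session.commit()
session.close()
{code}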



