incubator-airflow git commit: [AIRFLOW-2737] Restore original license header

2018-07-19 Thread bolke
Repository: incubator-airflow
Updated Branches:
  refs/heads/master af4d9614e -> e9a09c271


[AIRFLOW-2737] Restore original license header

Closes #3591 from seelmann/AIRFLOW-2737-restore-
original-license-header


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/e9a09c27
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/e9a09c27
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/e9a09c27

Branch: refs/heads/master
Commit: e9a09c271dd8e1dc4fbe176761a1ce338f711d55
Parents: af4d961
Author: Stefan Seelmann 
Authored: Thu Jul 19 10:32:25 2018 +0200
Committer: Bolke de Bruin 
Committed: Thu Jul 19 10:32:25 2018 +0200

--
 airflow/api/auth/backend/kerberos_auth.py | 35 +++---
 1 file changed, 21 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e9a09c27/airflow/api/auth/backend/kerberos_auth.py
--
diff --git a/airflow/api/auth/backend/kerberos_auth.py b/airflow/api/auth/backend/kerberos_auth.py
index 7e560fb..50a8810 100644
--- a/airflow/api/auth/backend/kerberos_auth.py
+++ b/airflow/api/auth/backend/kerberos_auth.py
@@ -1,21 +1,28 @@
 # -*- coding: utf-8 -*-
 #
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
+# Copyright (c) 2013, Michael Komitee
+# All rights reserved.
 #
-#   http://www.apache.org/licenses/LICENSE-2.0
+# Redistribution and use in source and binary forms, with or without modification,
+# are permitted provided that the following conditions are met:
 #
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
+# 1. Redistributions of source code must retain the above copyright notice,
+# this list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
+# ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+# (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+# LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 from future.standard_library import install_aliases
 



[jira] [Commented] (AIRFLOW-2737) Restore original license header

2018-07-19 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548990#comment-16548990
 ] 

ASF subversion and git services commented on AIRFLOW-2737:
--

Commit e9a09c271dd8e1dc4fbe176761a1ce338f711d55 in incubator-airflow's branch 
refs/heads/master from [~seelmann]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=e9a09c2 ]

[AIRFLOW-2737] Restore original license header

Closes #3591 from seelmann/AIRFLOW-2737-restore-
original-license-header


> Restore original license header
> ---
>
> Key: AIRFLOW-2737
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2737
> Project: Apache Airflow
>  Issue Type: Improvement
>Affects Versions: 1.9.0
>Reporter: Stefan Seelmann
>Assignee: Stefan Seelmann
>Priority: Major
> Fix For: 2.0.0
>
>
> The original license header in airflow/api/auth/backend/kerberos_auth.py was 
> replaced with the AL. It should be restored.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)



[jira] [Updated] (AIRFLOW-2766) Replicated *Base date* in each tab on DAG detail view

2018-07-19 Thread Verdan Mahmood (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Verdan Mahmood updated AIRFLOW-2766:

Description: 
At the moment, most of the tabs on the DAG detail page have a *Base date* field 
that applies only to that particular tab/view. From the end-user's perspective 
this is confusing, as a user expects to select a date once and then see all 
views based on that date.

The idea is to move the *Base date* outside the tabs, so that each tab uses the 
*Base date* set at the global level. Users would then only need to select the 
base date once, and every tab would use that date. 

Please see the attached screenshot for reference.

Alternatively, a second approach could be to make sure the Base Date selected in 
each tab is preserved and doesn't change on page refresh, and then move the 
DAG's metadata tabs (Details, Code, Refresh and Delete buttons) to the top, 
separating the group of metadata from the graphs. 

  was:
At the moment, most of the tabs on the DAG detail page have a *Base date* field 
that applies only to that particular tab/view. From the end-user's perspective 
this is confusing, as a user expects to select a date once and then see all 
views based on that date.

The idea is to move the *Base date* outside the tabs, so that each tab uses the 
*Base date* set at the global level. Users would then only need to select the 
base date once, and every tab would use that date. 

Please see the attached screenshot for reference.


> Replicated *Base date* in each tab on DAG detail view
> -
>
> Key: AIRFLOW-2766
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2766
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Minor
> Attachments: Screen Shot 2018-07-18 at 1.57.22 PM.png
>
>
> At the moment, most of the tabs on the DAG detail page have a *Base date* 
> field that applies only to that particular tab/view. From the end-user's 
> perspective this is confusing, as a user expects to select a date once and 
> then see all views based on that date.
> The idea is to move the *Base date* outside the tabs, so that each tab uses 
> the *Base date* set at the global level. Users would then only need to select 
> the base date once, and every tab would use that date. 
> Please see the attached screenshot for reference.
> Alternatively, a second approach could be to make sure the Base Date selected 
> in each tab is preserved and doesn't change on page refresh, and then move the 
> DAG's metadata tabs (Details, Code, Refresh and Delete buttons) to the top, 
> separating the group of metadata from the graphs. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2766) Replicated *Base date* in each tab on DAG detail view

2018-07-19 Thread Verdan Mahmood (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Verdan Mahmood updated AIRFLOW-2766:

Description: 
At the moment, most of the tabs on the DAG detail page have a *Base date* field 
that applies only to that particular tab/view. From the end-user's perspective 
this is confusing, as a user expects to select a date once and then see all 
views based on that date.

The idea is to move the *Base date* outside the tabs, so that each tab uses the 
*Base date* set at the global level. Users would then only need to select the 
base date once, and every tab would use that date. 

Please see the attached screenshot for reference.

Alternatively, a second approach could be to make sure the Base Date selected in 
each tab is preserved and doesn't change on page refresh.

  was:
At the moment, most of the tabs on the DAG detail page have a *Base date* field 
that applies only to that particular tab/view. From the end-user's perspective 
this is confusing, as a user expects to select a date once and then see all 
views based on that date.

The idea is to move the *Base date* outside the tabs, so that each tab uses the 
*Base date* set at the global level. Users would then only need to select the 
base date once, and every tab would use that date. 

Please see the attached screenshot for reference.

Alternatively, a second approach could be to make sure the Base Date selected in 
each tab is preserved and doesn't change on page refresh, and then move the 
DAG's metadata tabs (Details, Code, Refresh and Delete buttons) to the top, 
separating the group of metadata from the graphs. 


> Replicated *Base date* in each tab on DAG detail view
> -
>
> Key: AIRFLOW-2766
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2766
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Verdan Mahmood
>Assignee: Verdan Mahmood
>Priority: Minor
> Attachments: Screen Shot 2018-07-18 at 1.57.22 PM.png
>
>
> At the moment, most of the tabs on the DAG detail page have a *Base date* 
> field that applies only to that particular tab/view. From the end-user's 
> perspective this is confusing, as a user expects to select a date once and 
> then see all views based on that date.
> The idea is to move the *Base date* outside the tabs, so that each tab uses 
> the *Base date* set at the global level. Users would then only need to select 
> the base date once, and every tab would use that date. 
> Please see the attached screenshot for reference.
> Alternatively, a second approach could be to make sure the Base Date selected 
> in each tab is preserved and doesn't change on page refresh.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2770) add support for dag folder in the docker image

2018-07-19 Thread Rurui Ye (JIRA)
Rurui Ye created AIRFLOW-2770:
-

 Summary: add support for dag folder in the docker image
 Key: AIRFLOW-2770
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2770
 Project: Apache Airflow
  Issue Type: Improvement
Reporter: Rurui Ye


Currently the kube executor needs a dag_volume_chain or a git repo to be provided 
in the config file, but if the user has built the DAGs into their Docker image, 
they don't need to provide these two options, and they can manage their DAG 
version by managing the Docker image version. 

So I suppose we can add a new configuration such as kube.config.dag_folder_path 
along with dag_volume_chain and git repo. With this config, we can run the 
worker just from the DAGs baked into the Docker image.
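
For illustration, a hedged sketch of the fallback order this would enable (plain 
Python; the function and default path below are hypothetical and do not exist in 
the executor today):

{code:python}
# Hypothetical precedence: volume claim, then git repo, then DAGs baked into the
# worker's Docker image. All names here are made up for illustration only.
def resolve_dag_source(dag_volume_chain=None, git_repo=None, dag_folder_path=None):
    if dag_volume_chain:
        return ("volume", dag_volume_chain)
    if git_repo:
        return ("git", git_repo)
    # Neither option configured: run the worker from the image's own DAG folder.
    return ("image", dag_folder_path or "/usr/local/airflow/dags")

print(resolve_dag_source(dag_folder_path="/opt/airflow/dags"))
# ('image', '/opt/airflow/dags')
{code}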



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2770) kubernetes: add support for dag folder in the docker image

2018-07-19 Thread Rurui Ye (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rurui Ye updated AIRFLOW-2770:
--
Summary: kubernetes: add support for dag folder in the docker image  (was: 
add support for dag folder in the docker image)

> kubernetes: add support for dag folder in the docker image
> --
>
> Key: AIRFLOW-2770
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2770
> Project: Apache Airflow
>  Issue Type: Improvement
>Reporter: Rurui Ye
>Priority: Critical
>
> Currently the kube executor needs a dag_volume_chain or a git repo to be 
> provided in the config file, but if the user has built the DAGs into their 
> Docker image, they don't need to provide these two options, and they can 
> manage their DAG version by managing the Docker image version. 
> So I suppose we can add a new configuration such as 
> kube.config.dag_folder_path along with dag_volume_chain and git repo. With 
> this config, we can run the worker just from the DAGs baked into the Docker 
> image.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)
Micheal Ascah created AIRFLOW-2771:
--

 Summary: S3Hook Broad Exception Silent Failure
 Key: AIRFLOW-2771
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
 Project: Apache Airflow
  Issue Type: Bug
  Components: hooks
Affects Versions: Airflow 2.0, 1.9.0
Reporter: Micheal Ascah
Assignee: Micheal Ascah


h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). When poking for the key, it creates an S3Hook and calls 
`check_for_key` on the hook. Currently, the call is caught by a generic except 
clause that catches all exceptions, rather than the expected 
botocore.exceptions.ClientError when an object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:python}
try:
    self.get_conn().head_object(Bucket=bucket_name, Key=key)
    return True
except:  # <- This catches credential and connection exceptions that should be raised
    return False
{code}
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Expected
h5. No credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message.

{code:python}
try:
    self.get_conn().head_object(Bucket=bucket_name, Key=key)
    return True
except ClientError as e:
    self.log.info(e.response["Error"]["Message"])
    return False
{code}
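
For context, here is a self-contained sketch of what the patched method could 
look like (illustrative only; the class below is a stand-in, not the real 
airflow.hooks.S3_hook code):

{code:python}
from botocore.exceptions import ClientError

class S3HookSketch(object):
    """Illustrative stand-in for S3Hook; only the error handling is the point."""

    def __init__(self, conn, log):
        self._conn = conn  # a boto3 S3 client
        self.log = log

    def get_conn(self):
        return self._conn

    def check_for_key(self, key, bucket_name):
        try:
            self.get_conn().head_object(Bucket=bucket_name, Key=key)
            return True
        except ClientError as e:
            # A missing key surfaces as a ClientError (404) and is logged and
            # swallowed; credential or connection problems (NoCredentialsError,
            # EndpointConnectionError, ...) are no longer hidden and propagate.
            self.log.info(e.response["Error"]["Message"])
            return False
{code}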
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micheal Ascah updated AIRFLOW-2771:
---
Description: 
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). When poking for the key, it creates an S3Hook and calls 
`check_for_key` on the hook. Currently, the call is caught by a generic except 
clause that catches all exceptions, rather than the expected 
botocore.exceptions.ClientError when an object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Expected
h5. No credentials
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  

  was:
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). When poking for the key, it creates an S3Hook and calls 
`check_for_key` on the hook. Currently, the call is caught by a generic except 
clause that catches all exceptions, rather than the expected 
botocore.exceptions.ClientError when an object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Expected
h5. No credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message.

{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
 

 


> S3Hook Broad Exception Silent Failure
> -
>
> Key: AIRFLOW-2771
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: Micheal Ascah
>Assignee: Micheal Ascah
>Priority: Minor
>  Labels: S3Hook, S3Sensor
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
> bad permissions). When poking for the key, it creates an S3Hook and calls 
> `check_for_key` on the hook. Currently, the call is caught by a generic 
> except clause that catches all exceptions, rather than the expected 
> botocore.exceptions.ClientError when an object is not found.
> h2. Problem
> This causes the sensor to return False and report no issue with the task 
> instance until it times out, rather than intuitively failing immediately if 
> the connection is incorrectly con

[jira] [Updated] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micheal Ascah updated AIRFLOW-2771:
---
Description: 
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:java}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:java}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Expected
h5. No credentials
{code:java}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:java}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code:java}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  

  was:
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). When poking for the key, it creates an S3Hook and calls 
`check_for_key` on the hook. Currently, the call is caught by a generic except 
clause that catches all exceptions, rather than the expected 
botocore.exceptions.ClientError when an object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Expected
h5. No credentials
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  


> S3Hook Broad Exception Silent Failure
> -
>
> Key: AIRFLOW-2771
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: Micheal Ascah
>Assignee: Micheal Ascah
>Priority: Minor
>  Labels: S3Hook, S3Sensor
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
> bad permissions). There are also no credentials found under 
> ~/.aws/credentials for boto to fallback on.
>  
> When poking for the key, it creates an S3Hook and calls `check_for_key` on 
> the hook. Currently, the call is caught by a generic except clause that 
> catches all exceptions, rather than the expected 
> botocore.exceptions.ClientError when a

[jira] [Updated] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micheal Ascah updated AIRFLOW-2771:
---
Description: 
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
{code}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
{code}
h4. Expected
h5. No credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  

  was:
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:java}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:java}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Expected
h5. No credentials
{code:java}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:java}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code:java}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  


> S3Hook Broad Exception Silent Failure
> -
>
> Key: AIRFLOW-2771
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: Airflow 2.0, 1.9.0
>Reporter: Micheal Ascah
>Assignee: Micheal Ascah
>Priority: Minor
>  

[jira] [Updated] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micheal Ascah updated AIRFLOW-2771:
---
Description: 
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
{code:java}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
[2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
{code}
h4. Expected
h5. No credentials
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  

  was:
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
{code}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
{code}
h4. Expected
h5. No credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and c

[jira] [Updated] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micheal Ascah updated AIRFLOW-2771:
---
Description: 
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
{code:python}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
[2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
{code}
h4. Expected
h5. No credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  

  was:
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
{code:java}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
[2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
{code}
h4. Expected
h5. No credentials
{code}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError

[jira] [Updated] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micheal Ascah updated AIRFLOW-2771:
---
Affects Version/s: (was: Airflow 2.0)

> S3Hook Broad Exception Silent Failure
> -
>
> Key: AIRFLOW-2771
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.9.0
>Reporter: Micheal Ascah
>Assignee: Micheal Ascah
>Priority: Minor
>  Labels: S3Hook, S3Sensor
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
> bad permissions). There are also no credentials found under 
> ~/.aws/credentials for boto to fallback on.
>  
> When poking for the key, it creates an S3Hook and calls `check_for_key` on 
> the hook. Currently, the call is caught by a generic except clause that 
> catches all exceptions, rather than the expected 
> botocore.exceptions.ClientError when an object is not found.
> h2. Problem
> This causes the sensor to return False and report no issue with the task 
> instance until it times out, rather than intuitively failing immediately if 
> the connection is incorrectly configured. The current logging output gives no 
> insight as to why the key is not being found.
> h4. Current code
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except:  # <- This catches credential and connection exceptions that should be raised
>     return False
> {code}
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket="test")
> False
> {code}
> {code:python}
> [2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:57:26,651] {sensors.py:537} INFO - Poking for key : 
> s3://bucket/key.txt
> [2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): bucket.s3.amazonaws.com
> [2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:58:26,767] {sensors.py:537} INFO - Poking for key : 
> s3://bucket/key.txt
> [2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): bucket.s3.amazonaws.com
> {code}
> h4. Expected
> h5. No credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket="test")
> Traceback (most recent call last):
> ...
> botocore.exceptions.NoCredentialsError: Unable to locate credentials
> {code}
> h5. Good credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_does_exist")
> hook.check_for_key(key="test", bucket="test")
> False
> {code}
> h4. Proposed Change
> Add a type to the except clause for botocore.exceptions.ClientError and log 
> the message for both check_for_key and check_for_bucket on S3Hook.
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except ClientError as e:
>     self.log.info(e.response["Error"]["Message"])
>     return False
> {code}
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2772) BigQuery hook does not allow specifying both the partition field name and table name at the same time

2018-07-19 Thread Berislav Lopac (JIRA)
Berislav Lopac created AIRFLOW-2772:
---

 Summary: BigQuery hook does not allow specifying both the 
partition field name and table name at the same time
 Key: AIRFLOW-2772
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2772
 Project: Apache Airflow
  Issue Type: Bug
  Components: hooks
Reporter: Berislav Lopac


When creating a load job for a single partition in a BigQuery partitioned table, 
it is possible to specify either the table name with the partition (e.g. 
{{dataset_name.table_name$partition_id}}), or the field used for the partition 
(e.g. {{time_partitioning=\{"field": "field_name"\}}}) -- but not both.

This is the code that raises the exception, at the very end of 
{{contrib/hooks/bigquery_hook.py}}:

{code}
assert not time_partitioning_in.get('field'), (
    "Cannot specify field partition and partition name "
    "(dataset.table$partition) at the same time"
)
{code}

My first problem is the use of {{assert}} for flow control, but more importantly 
it is not clear what the rationale is for this check and for raising an error 
when both are defined. The code works well if we provide just the partition 
field specification, but passing only the partition table name results in the 
following BQ error:

{code}Incompatible table partitioning specification. Expects partitioning 
specification interval(type:day,field:local_event_start_date), but input 
partitioning specification is interval(type:day){code}

which implies that sending both should be perfectly fine.

Can anyone provide any insight?
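
To make the conflict concrete, here is a standalone, hedged sketch of the check 
(the helper name and call shapes below are illustrative, not the hook's actual 
API):

{code:python}
# Standalone reproduction of the guard quoted above (names are illustrative).
def check_partitioning(destination_table, time_partitioning_in):
    if '$' in destination_table and time_partitioning_in.get('field'):
        # This is the combination the assert rejects today.
        raise ValueError(
            "Cannot specify field partition and partition name "
            "(dataset.table$partition) at the same time")

# Accepted: field-based specification only.
check_partitioning("dataset_name.table_name",
                   {"type": "DAY", "field": "local_event_start_date"})

# Rejected today, even though BigQuery's own error above suggests it expects the
# field to be present as well:
try:
    check_partitioning("dataset_name.table_name$20180719",
                       {"type": "DAY", "field": "local_event_start_date"})
except ValueError as err:
    print(err)  # "Cannot specify field partition and partition name ..."
{code}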




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micheal Ascah updated AIRFLOW-2771:
---
Description: 
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. If the call to HeadObject fails, the call is caught by a generic except 
clause that catches all exceptions, rather than the expected 
botocore.exceptions.ClientError when an object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
{code:python}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
[2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
{code}
h4. Expected
h5. No credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last):
...
botocore.exceptions.NoCredentialsError: Unable to locate credentials
{code}
h5. Good credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_does_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
h4. Proposed Change

Add a type to the except clause for botocore.exceptions.ClientError and log the 
message for both check_for_key and check_for_bucket on S3Hook.
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except ClientError as e:
self.log.info(e.response["Error"]["Message"]) 
return False
{code}
  

  was:
h2. Scenario

S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
bad permissions). There are also no credentials found under ~/.aws/credentials 
for boto to fallback on.

 

When poking for the key, it creates an S3Hook and calls `check_for_key` on the 
hook. Currently, the call is caught by a generic except clause that catches all 
exceptions, rather than the expected botocore.exceptions.ClientError when an 
object is not found.
h2. Problem

This causes the sensor to return False and report no issue with the task 
instance until it times out, rather than intuitively failing immediately if the 
connection is incorrectly configured. The current logging output gives no 
insight as to why the key is not being found.
h4. Current code
{code:python}
try:
self.get_conn().head_object(Bucket=bucket_name, Key=key)
return True
except:  # <- This catches credential and connection exceptions that should be 
raised
return False
{code}
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
False
{code}
{code:python}
[2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,651] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
[2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,767] {sensors.py:537} INFO - Poking for key : s3://bucket/key.txt
[2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: [2018-07-18 
18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS connection (1): 
bucket.s3.amazonaws.com
{code}
h4. Expected
h5. No credentials
{code:python}
from airflow.hooks.S3_hook import S3Hook
hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
hook.check_for_key(key="test", bucket="test")
Traceback (most recent call last

[jira] [Commented] (AIRFLOW-2768) `airflow clear --upstream --only_failed` doesn't clear tasks upstream

2018-07-19 Thread Tylar Murray (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549396#comment-16549396
 ] 

Tylar Murray commented on AIRFLOW-2768:
---

It looks like `--rerun_failed_tasks` was added in `1.10` so I do not have the 
option in `1.9`. From the description it doesn't **sound** like what I want, 
but maybe it is? 

I can "auto rerun only the failed task" with `airflow clear --only_failed`. I 
want to rerun all tasks (even non-failed) within a failed DagRun.

> `airflow clear --upstream --only_failed` doesn't clear tasks upstream
> -
>
> Key: AIRFLOW-2768
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2768
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.9.0
> Environment: airflow v1.9.0 
>Reporter: Tylar Murray
>Priority: Major
>  Labels: cli
>
> Should `airflow clear --upstream --only_failed mydagname` clear non-failed 
> tasks upstream of a failed task?
> I believe so, yes. If not, then `airflow clear --upstream --only_failed` and 
> `airflow clear --only_failed` are equivalent.
> Upstream tasks are not being cleared. The same is true for `--downstream`.
>  
> My use-case is: I want to clear all tasks in all DagRuns that contain a failed 
> task. In other words: `airflow clear mydagname`, but skip DagRuns that have 
> completed successfully.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work started] (AIRFLOW-2771) S3Hook Broad Exception Silent Failure

2018-07-19 Thread Micheal Ascah (JIRA)


 [ 
https://issues.apache.org/jira/browse/AIRFLOW-2771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on AIRFLOW-2771 started by Micheal Ascah.
--
> S3Hook Broad Exception Silent Failure
> -
>
> Key: AIRFLOW-2771
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2771
> Project: Apache Airflow
>  Issue Type: Bug
>  Components: hooks
>Affects Versions: 1.9.0
>Reporter: Micheal Ascah
>Assignee: Micheal Ascah
>Priority: Minor
>  Labels: S3Hook, S3Sensor
>
> h2. Scenario
> S3KeySensor is passed an invalid S3/AWS connection id name (doesn't exist or 
> bad permissions). There are also no credentials found under 
> ~/.aws/credentials for boto to fallback on.
>  
> When poking for the key, it creates an S3Hook and calls `check_for_key` on 
> the hook. If the call to HeadObject fails, the call is caught by a generic 
> except clause that catches all exceptions, rather than the expected 
> botocore.exceptions.ClientError when an object is not found.
> h2. Problem
> This causes the sensor to return False and report no issue with the task 
> instance until it times out, rather than intuitively failing immediately if 
> the connection is incorrectly configured. The current logging output gives no 
> insight as to why the key is not being found.
> h4. Current code
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except:  # <- This catches credential and connection exceptions that should be raised
>     return False
> {code}
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket="test")
> False
> {code}
> {code:python}
> [2018-07-18 18:57:26,652] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:57:26,651] {sensors.py:537} INFO - Poking for key : 
> s3://bucket/key.txt
> [2018-07-18 18:57:26,681] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:57:26,680] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): bucket.s3.amazonaws.com
> [2018-07-18 18:58:26,767] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:58:26,767] {sensors.py:537} INFO - Poking for key : 
> s3://bucket/key.txt
> [2018-07-18 18:58:26,809] {base_task_runner.py:98} INFO - Subtask: 
> [2018-07-18 18:58:26,808] {connectionpool.py:735} INFO - Starting new HTTPS 
> connection (1): bucket.s3.amazonaws.com
> {code}
> h4. Expected
> h5. No credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_doesnt_exist")
> hook.check_for_key(key="test", bucket="test")
> Traceback (most recent call last):
> ...
> botocore.exceptions.NoCredentialsError: Unable to locate credentials
> {code}
> h5. Good credentials
> {code:python}
> from airflow.hooks.S3_hook import S3Hook
> hook = S3Hook(aws_conn_id="conn_that_does_exist")
> hook.check_for_key(key="test", bucket="test")
> False
> {code}
> h4. Proposed Change
> Add a type to the except clause for botocore.exceptions.ClientError and log 
> the message for both check_for_key and check_for_bucket on S3Hook.
> {code:python}
> try:
>     self.get_conn().head_object(Bucket=bucket_name, Key=key)
>     return True
> except ClientError as e:
>     self.log.info(e.response["Error"]["Message"])
>     return False
> {code}
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2773) DataFlowPythonOperator does not handle correctly task_id containing underscores

2018-07-19 Thread Evgeny Podlepaev (JIRA)
Evgeny Podlepaev created AIRFLOW-2773:
-

 Summary: DataFlowPythonOperator does not handle correctly task_id 
containing underscores
 Key: AIRFLOW-2773
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2773
 Project: Apache Airflow
  Issue Type: Bug
  Components: Dataflow
Affects Versions: 1.9.0
Reporter: Evgeny Podlepaev


DataFlowPythonOperator generates a job name that does not get accepted by 
Dataflow API when task_id contains underscores. Example: 
DataFlowPythonOperator(task_id='analyze_search_results', ...)

will lead to the following error:

ValueError: Pipeline has validations errors: Invalid job_name 
(analyze_search_results-02e17268); the name must consist of only the characters 
[-a-z0-9], starting with a letter and ending with a letter or number.

The fix seems to be as simple as changing DataFlowHook.start_python_dataflow() 
to do

name = task_id.replace('_', '-')
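
For illustration, a hedged sketch of such a sanitization step (the helper below 
is hypothetical, not the actual DataFlowHook code; Dataflow requires names 
matching [-a-z0-9], starting with a letter and ending with a letter or number):

{code:python}
import re

def dataflow_job_name(task_id, unique_suffix=None):
    """Illustrative only: lower-case and map '_' (and other invalid chars) to '-'."""
    name = re.sub(r"[^-a-z0-9]", "-", task_id.lower())
    name = name.strip("-") or "job"
    if not name[0].isalpha():
        name = "a-" + name  # ensure the name starts with a letter
    return "{}-{}".format(name, unique_suffix) if unique_suffix else name

print(dataflow_job_name("analyze_search_results", "02e17268"))
# analyze-search-results-02e17268
{code}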



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AIRFLOW-2774) DataFlowPythonOperator needs to support DirectRunner to facilitate local testing

2018-07-19 Thread Evgeny Podlepaev (JIRA)
Evgeny Podlepaev created AIRFLOW-2774:
-

 Summary: DataFlowPythonOperator needs to support DirectRunner to 
facilitate local testing
 Key: AIRFLOW-2774
 URL: https://issues.apache.org/jira/browse/AIRFLOW-2774
 Project: Apache Airflow
  Issue Type: Improvement
  Components: Dataflow
Affects Versions: 1.9.0
Reporter: Evgeny Podlepaev


DataFlowPythonOperator needs to support DirectRunner as a runner option to 
facilitate local testing of the entire pipeline. Right now, if DirectRunner is 
set via job options, the DataFlowHook will wait indefinitely trying to get the 
status of a remote job that does not exist:

_DataflowJob(self.get_conn(), variables['project'], name,
             variables['region'], self.poll_sleep).wait_for_done()
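
For illustration, a hedged sketch of one possible guard (the helper and runner 
check below are assumptions, not the hook's actual code):

{code:python}
# Sketch: only poll the Dataflow service for remote runners; with DirectRunner
# the pipeline runs (and finishes) locally, so there is no remote job to wait for.
def wait_if_remote(runner, wait_for_done):
    if runner and runner.lower() == "directrunner":
        return  # local run already completed synchronously, nothing to poll
    wait_for_done()  # e.g. _DataflowJob(...).wait_for_done() for DataflowRunner
{code}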



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AIRFLOW-2243) airflow seems to load modules multiple times

2018-07-19 Thread Tim Swast (JIRA)


[ 
https://issues.apache.org/jira/browse/AIRFLOW-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550015#comment-16550015
 ] 

Tim Swast commented on AIRFLOW-2243:


Also `imp.load_source()` is deprecated (and completely undocumented) in Python 
3. See: https://bugs.python.org/issue14551

> airflow seems to load modules multiple times
> 
>
> Key: AIRFLOW-2243
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2243
> Project: Apache Airflow
>  Issue Type: Bug
>Reporter: sulphide
>Priority: Major
>
> airflow uses the builtin imp.load_source to load modules, but the 
> documentation (for python 2) says:
> [https://docs.python.org/2/library/imp.html#imp.load_source]
> {{imp.}}{{load_source}}(_name_, _pathname_[, _file_])
> Load and initialize a module implemented as a Python source file and return 
> its module object. If the module was already initialized, it will be 
> initialized _again_. The _name_ argument is used to create or access a module 
> object. The _pathname_ argument points to the source file. The _file_ argument 
> is the source file, open for reading as text, from the beginning. It must 
> currently be a real file object, not a user-defined class emulating a file. 
> Note that if a properly matching byte-compiled file (with suffix {{.pyc}} or 
> {{.pyo}}) exists, it will be used instead of parsing the given source file.
>  
>  
> this means that airflow behaves differently from a typical python program in 
> that a module may be imported multiple times, which could have unexpected 
> effects for those relying on the typical python import semantics.
> https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L300
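
A small self-contained demonstration of that re-initialization behaviour 
(sketch only; it writes a throwaway module to a temp directory):

{code:python}
import imp, os, tempfile

# Build a tiny module whose top-level code has a visible side effect.
path = os.path.join(tempfile.mkdtemp(), "side_effect_mod.py")
with open(path, "w") as f:
    f.write("print('initializing side_effect_mod')\n")

imp.load_source("side_effect_mod", path)  # prints once
imp.load_source("side_effect_mod", path)  # prints again: top-level code re-runs

import side_effect_mod  # a regular import hits sys.modules and prints nothing
{code}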



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)