[ 
https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280644#comment-16280644
 ] 

Barry Hart commented on AIRFLOW-15:
-----------------------------------

I understand. "Wait and see" (i.e., until Google clarifies the message) might 
be a reasonable strategy. The specific reason for my comment is that one of our 
DAGs transfers a very large number of files to and from Google Storage. With 
this many files, we almost always see some transient 5XX errors from the 
Google side, so we see some value in the google-cloud-python library, which has 
retry logic built in, both 
[generally|https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/api_core/google/api_core/retry.py]
 and [specifically for Google 
Storage|https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/storage/google/cloud/storage/blob.py#L84-L91].

Although Airflow has its own retry support, I see it as intended for 
coarse-grained retries (i.e., when one task does a few things). When one task is 
transferring thousands of files, it seems useful to retry inside the task as 
well, per file.
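
To make the per-file idea concrete, here is a rough sketch of what I mean. It 
is not how google-cloud-python implements its retries; {{TransientError}}, 
{{retry_call}} and {{transfer_all}} are made-up names for illustration, and 
{{transfer_one}} stands in for whatever actually uploads or downloads a single 
file.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable 5XX response."""


def retry_call(func, *args, attempts=5, base_delay=1.0, max_delay=60.0,
               sleep=time.sleep):
    """Call func(*args), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func(*args)
        except TransientError:
            if attempt == attempts - 1:
                raise  # retry budget exhausted: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter so many files do not all retry at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))


def transfer_all(paths, transfer_one, **retry_kwargs):
    """Give each file its own retry budget; return paths that still failed."""
    failed = []
    for path in paths:
        try:
            retry_call(transfer_one, path, **retry_kwargs)
        except TransientError:
            failed.append(path)
    return failed
```

If {{transfer_all}} returns a non-empty list, the task could raise and let 
Airflow's task-level retry take over, so the two retry layers complement each 
other.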

Let me know what you think. It may be worth creating a ticket about retries to 
get input from other users. For now, we can use the google-cloud-python library 
directly from our DAGs.

> Remove GCloud from Airflow
> --------------------------
>
>                 Key: AIRFLOW-15
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-15
>             Project: Apache Airflow
>          Issue Type: Task
>          Components: gcp
>            Reporter: Chris Riccomini
>            Assignee: Chris Riccomini
>              Labels: gcp
>
> After speaking with Google, there was some concern about using the 
> [gcloud-python|https://github.com/GoogleCloudPlatform/gcloud-python] library 
> for Airflow. There are several concerns:
> # It's not clear (even to people at Google) what this library is, who owns 
> it, etc.
> # It does not support all services (the way 
> [google-api-python-client|https://github.com/google/google-api-python-client] 
> does).
> # There are compatibility issues between google-api-python-client and 
> gcloud-python.
> We currently support both libraries, depending on which package you 
> install: {{airflow[gcp_api]}} or {{airflow[gcloud]}}. This ticket is to remove 
> the {{airflow[gcloud]}} package and all associated code.
> The main associated code, afaik, is the use of the {{gcloud}} library in the 
> Google cloud storage hooks/operators, specifically for Google cloud storage 
> Airflow logging.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
