[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284291#comment-16284291 ]

Feng Lu commented on AIRFLOW-15:

Sure, [~criccomini], [AIRFLOW-1894|https://issues.apache.org/jira/browse/AIRFLOW-1894] has been created to track this.

> Remove GCloud from Airflow
> --------------------------
>
> Key: AIRFLOW-15
> URL: https://issues.apache.org/jira/browse/AIRFLOW-15
> Project: Apache Airflow
> Issue Type: Task
> Components: gcp
> Reporter: Chris Riccomini
> Assignee: Chris Riccomini
> Labels: gcp
>
> After speaking with Google, there was some concern about using the
> [gcloud-python|https://github.com/GoogleCloudPlatform/gcloud-python] library
> for Airflow. There are several concerns:
> # It's not clear (even to people at Google) what this library is, who owns
> it, etc.
> # It does not support all services (the way
> [google-api-python-client|https://github.com/google/google-api-python-client]
> does).
> # There are compatibility issues between google-api-python-client and
> gcloud-python.
> We currently support both libraries, depending on which package you
> install: {{airflow[gcp_api]}} or {{airflow[gcloud]}}. This ticket is to remove
> the {{airflow[gcloud]}} package, and all associated code.
> The main associated code, afaik, is the use of the {{gcloud}} library in the
> Google cloud storage hooks/operators--specifically for Google cloud storage
> Airflow logging.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282154#comment-16282154 ]

Chris Riccomini commented on AIRFLOW-15:

[~fenglu], that sounds good to me. This JIRA is actually about removing the gcloud library, though. Can you open a new JIRA that proposes what you're saying, and assign it to yourself for process tracking? Also, we need to make sure that the compatibility issues are worked out. Last time we tried running both in parallel, we had major dependency conflict issues (see the third bullet point in the description of this JIRA).
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16282079#comment-16282079 ]

Feng Lu commented on AIRFLOW-15:

[~criccomini] [~barrywhart] Unfortunately we don't have feature parity between the two yet, but it's expected that all GCP services will be supported in google-cloud-python. I checked with our internal team, and they recommend using google-cloud-python whenever possible. Specifically for Airflow GCP connectors, I would propose:
- add google-cloud-python as a dependency of gcp_api.
- new operators should be based on google-cloud-python if possible.
- we (or anyone outside Google) can migrate existing GCP operators when the underlying GCP service is available in google-cloud-python.

WDYT? Feel free to assign this JIRA issue to me for process tracking.
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280649#comment-16280649 ]

Chris Riccomini commented on AIRFLOW-15:

[~fenglu], perhaps you can poke someone on your end to get guidance on the right library to use? What are your thoughts?
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280644#comment-16280644 ]

Barry Hart commented on AIRFLOW-15:

I understand. "Wait and see" might be a reasonable strategy (i.e. until Google clarifies the message).

The specific reason for my comment is that one of our DAGs transfers a very large number of files to and from Google Storage. With this number of files, we almost always see some transient 5XX errors from the Google side, so we see some value in the google-cloud-python library, which has retry logic built in, both [generally|https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/api_core/google/api_core/retry.py] and [specifically for Google Storage|https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/storage/google/cloud/storage/blob.py#L84-L91].

Although Airflow has its own retry support, I see it as being intended for coarse-grained retries (i.e. when one task does a few things). When one task is transferring thousands of files, it seems useful to retry internal to the task as well (per file).

Let me know what you think. It may be worth creating a ticket about retries to perhaps get input from other users. For now, we can use the google-cloud-python library directly from our DAGs.
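The per-file retry described in the comment above could be sketched roughly as follows. This is only an illustration in plain Python, not the retry logic of google-cloud-python or of Airflow: {{TransientError}}, {{retry_call}}, {{transfer_all}} and the {{transfer_one}} callable are hypothetical names, and the backoff parameters are arbitrary assumptions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable 5XX response."""


def retry_call(func, *args, attempts=5, base_delay=1.0, max_delay=30.0,
               sleep=time.sleep, **kwargs):
    """Call func, retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps many concurrent transfers from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


def transfer_all(paths, transfer_one, **retry_kwargs):
    """Transfer each path with its own retry budget; return paths that failed."""
    failed = []
    for path in paths:
        try:
            retry_call(transfer_one, path, **retry_kwargs)
        except TransientError:
            failed.append(path)
    return failed
```

With something like this inside a task, a transient error on one file costs one small retry rather than a rerun of the whole task; the task could still raise if {{transfer_all}} returns a non-empty list, so that Airflow's own coarse-grained retry applies.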
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16280614#comment-16280614 ]

Chris Riccomini commented on AIRFLOW-15:

[~barrywhart], to be honest, I am not sure what the right path forward is. I was told specifically by several Google PMs at Google Next last year not to use the idiomatic library. Since then, the message you are pointing to has appeared on the google python client.

The use case is further muddied by the fact that I'm not convinced an idiomatic Python client is actually what we want. The fact that the Google APIs all work the same way in the service binding API makes it pretty easy to abstract over a lot of the mechanics around interacting with Google in a generic way that can be leveraged by all GCP operators. I haven't looked into whether or not this overhead would increase if we went to an idiomatic library where interacting with GCS might look very different from interacting with Dataflow, etc.
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279191#comment-16279191 ]

Barry Hart commented on AIRFLOW-15:

Question: This change moved Airflow to using a library that Google no longer recommends (https://github.com/google/google-api-python-client/#google-cloud-platform-apis):

{quote}
If you're working with Google Cloud Platform APIs such as Datastore or Pub/Sub, consider using the Cloud Client Libraries for Python instead. These are the new and idiomatic Python libraries targeted specifically at Google Cloud Platform Services.
{quote}

Should this decision be revisited and possibly reversed? I am happy to open a new ticket, but wanted to raise the question here first for context.
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1527#comment-1527 ]

Chris Riccomini commented on AIRFLOW-15:

Yes, resolved.
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275706#comment-15275706 ]

Bolke de Bruin commented on AIRFLOW-15:

[~criccomini] is this issue resolved?
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263085#comment-15263085 ]

Chris Riccomini commented on AIRFLOW-15:

Got a +1 from [~jlowin]. Merged.
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262418#comment-15262418 ]

Chris Riccomini commented on AIRFLOW-15:

Pull request: https://github.com/airbnb/airflow/pull/1448

[~jlowin], could use a review.
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262362#comment-15262362 ]

Chris Riccomini commented on AIRFLOW-15:

I'm going to break this ticket in two:
# Remove old GCSHook and all usage of gcloud.
# Update the google-api-python-client code to use a [nice clean|https://github.com/airbnb/airflow/pull/1119/files#diff-948e87b4f8f644b3ad8c7950958df033R2074] form, the way that GCSHook does (one that takes fields for project, key path, etc.).
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262352#comment-15262352 ]

Chris Riccomini commented on AIRFLOW-15:

Removes these:
https://github.com/airbnb/airflow/pull/1137
https://github.com/airbnb/airflow/pull/1119
[jira] [Commented] (AIRFLOW-15) Remove GCloud from Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15262248#comment-15262248 ]

Chris Riccomini commented on AIRFLOW-15:

Note: the compatibility issue arises because we currently use {{SignedJwtAssertionCredentials}}, which was removed in the latest Google oauth library. Upgrading gcp_api breaks because of this backwards-incompatible change (we'll have to update our code), but gcloud requires the latest oauth library.
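One way the code update mentioned in the note above might look is an import shim that works on either side of the removal. This is a sketch under assumptions, not the change that was actually merged: it assumes the "Google oauth library" is {{oauth2client}}, that the replacement class is {{oauth2client.service_account.ServiceAccountCredentials}} (added, as far as I know, in the 2.0 release that dropped {{SignedJwtAssertionCredentials}}), and the helper name {{find_service_account_credentials}} is hypothetical.

```python
def find_service_account_credentials():
    """Return (class_name, credentials_class) for the installed oauth2client.

    Returns (None, None) when neither class can be imported, for example
    when oauth2client is not installed at all.
    """
    try:
        # Assumed to exist in oauth2client 2.0 and later.
        from oauth2client.service_account import ServiceAccountCredentials
        return "ServiceAccountCredentials", ServiceAccountCredentials
    except ImportError:
        pass
    try:
        # Assumed to exist only in oauth2client releases before 2.0.
        from oauth2client.client import SignedJwtAssertionCredentials
        return "SignedJwtAssertionCredentials", SignedJwtAssertionCredentials
    except ImportError:
        return None, None
```

The two classes are constructed differently, so calling code would still need to branch on the returned name; the shim only localizes the import problem.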