[ https://issues.apache.org/jira/browse/AIRFLOW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on AIRFLOW-2222 started by Berislav Lopac.
-----------------------------------------------

> GoogleCloudStorageHook.copy fails for large files between locations
> -------------------------------------------------------------------
>
>                 Key: AIRFLOW-2222
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2222
>             Project: Apache Airflow
>          Issue Type: Bug
>            Reporter: Berislav Lopac
>            Assignee: Berislav Lopac
>            Priority: Major
>
> When copying large files (confirmed for files of around 3 GB) between
> buckets in different projects, the operation fails and the Google API
> returns error [413 Payload Too Large|https://cloud.google.com/storage/docs/json_api/v1/status-codes#413_Payload_Too_Large].
> The documentation for the error says:
> {quote}The Cloud Storage JSON API supports up to 5 TB objects.
> This error may, alternatively, arise if copying objects between locations
> and/or storage classes can not complete within 30 seconds. In this case, use
> the [Rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite]
> method instead.{quote}
> The reason seems to be that {{GoogleCloudStorageHook.copy}} uses the JSON
> API's {{copy}} method, which is subject to these limits.
> h3. Proposed Solution
> There are two potential solutions:
> # Implement a {{GoogleCloudStorageHook.rewrite}} method that operators and
> other objects can call so that large cross-location copies complete
> successfully (see the sketch below). This approach is more flexible, but it
> requires changes both to the {{GoogleCloudStorageHook}} class and to every
> class that uses it for copying files, to ensure they explicitly call
> {{rewrite}} when needed.
> # Modify {{GoogleCloudStorageHook.copy}} to decide internally when to use
> {{rewrite}} instead of {{copy}}. This requires updating only the
> {{GoogleCloudStorageHook}} class, but the logic might not cover all the
> edge cases and could be difficult to implement.
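> To illustrate option 1, below is a minimal sketch of what such a method
> could look like. It assumes, as the existing {{copy}} implementation does,
> that {{self.get_conn()}} returns a {{googleapiclient}} service object for
> the storage v1 JSON API; the parameters mirror the
> [objects.rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite]
> endpoint, and the loop re-issues the request with the returned
> {{rewriteToken}} until the service reports {{done}}. The class line is only
> context for the existing hook; names and defaults are illustrative, not a
> final implementation.
> {code:python}
> class GoogleCloudStorageHook(GoogleCloudBaseHook):
>
>     # ... existing methods (copy, download, upload, ...) ...
>
>     def rewrite(self, source_bucket, source_object,
>                 destination_bucket, destination_object=None):
>         """
>         Copies an object using the JSON API rewrite method, which
>         chunks the work across multiple requests and therefore handles
>         large objects and cross-location/storage-class copies that
>         cannot finish within the 30-second limit of the copy method.
>         """
>         destination_object = destination_object or source_object
>         service = self.get_conn()
>
>         request_kwargs = dict(
>             sourceBucket=source_bucket,
>             sourceObject=source_object,
>             destinationBucket=destination_bucket,
>             destinationObject=destination_object,
>             body='')
>         response = service.objects().rewrite(**request_kwargs).execute()
>
>         # A single rewrite call may copy only part of a large object;
>         # keep re-issuing the request with the returned token until
>         # the service reports the copy as done.
>         while not response['done']:
>             response = service.objects().rewrite(
>                 rewriteToken=response['rewriteToken'],
>                 **request_kwargs).execute()
>
>         return True
> {code}
> For option 2, the existing {{copy}} method could catch the 413
> {{HttpError}} and fall back to the same loop, but as noted above,
> anticipating every condition that triggers the error (object size,
> location, storage class, timing) is harder to get right.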