[ https://issues.apache.org/jira/browse/BEAM-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Cwik updated BEAM-9078:
----------------------------
    Status: Open  (was: Triage Needed)

> Large Tarball Artifacts Should Use GCS Resumable Upload
> -------------------------------------------------------
>
>                 Key: BEAM-9078
>                 URL: https://issues.apache.org/jira/browse/BEAM-9078
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.17.0
>            Reporter: Brad West
>            Assignee: Brad West
>            Priority: Major
>             Fix For: 2.19.0
>
>   Original Estimate: 1h
>          Time Spent: 40m
>  Remaining Estimate: 20m
>
> It's possible for the tarball uploaded to GCS to be quite large. An example
> is a user vendoring multiple dependencies in their tarball to achieve a
> more stable deployable artifact.
>
> Before this change, the GCS upload API call executed a multipart upload,
> which Google's
> [documentation](https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload)
> states should be used only when the file is small enough to upload again if
> the connection fails. For large tarballs, we hit 60-second socket timeouts
> before the multipart upload completes. By passing `total_size`, apitools
> first checks whether the size exceeds the resumable upload threshold and, if
> it does, executes the more robust resumable upload rather than a multipart
> upload, avoiding socket timeouts.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
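The strategy selection the description refers to can be sketched as follows. This is a hypothetical simplification of the decision apitools makes when `total_size` is supplied, not apitools' actual code; the function name and the exact threshold value (assumed here to be roughly 5 MiB, apitools' documented default) are illustrative assumptions.

```python
# Hypothetical sketch of apitools' upload-strategy decision.
# The threshold value below is an assumption (apitools defaults to ~5 MiB);
# choose_upload_strategy is an illustrative name, not an apitools API.

RESUMABLE_UPLOAD_THRESHOLD = 5 * 1024 * 1024  # assumed ~5 MiB default


def choose_upload_strategy(total_size):
    """Return 'resumable' or 'multipart' for a given total_size in bytes.

    When total_size is None (unknown), the size cannot be compared against
    the threshold, so the upload falls back to a single multipart request --
    which is where the large-tarball socket timeouts came from.
    """
    if total_size is not None and total_size >= RESUMABLE_UPLOAD_THRESHOLD:
        return 'resumable'
    return 'multipart'
```

Passing the tarball's size up front is what lets the large-file path take the resumable branch instead of retrying an entire multipart upload after a connection failure.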