[ https://issues.apache.org/jira/browse/BEAM-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17017479#comment-17017479 ]

Brad West commented on BEAM-9078:
---------------------------------

Hmm, I assumed the GitHub/Jira integration (is there not one?) would 
automatically update this ticket. The PR was merged and the fix is included in 
the release-2.19.0 branch. Do I mark this as resolved, or wait for automation 
to take care of the ticket? First-time contributor, so please advise. Thanks

> Large Tarball Artifacts Should Use GCS Resumable Upload
> -------------------------------------------------------
>
>                 Key: BEAM-9078
>                 URL: https://issues.apache.org/jira/browse/BEAM-9078
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.17.0
>            Reporter: Brad West
>            Assignee: Brad West
>            Priority: Major
>             Fix For: 2.19.0
>
>   Original Estimate: 1h
>          Time Spent: 40m
>  Remaining Estimate: 20m
>
> It's possible for the tarball uploaded to GCS to be quite large. An example 
> is a user vendoring multiple dependencies in their tarball to achieve a more 
> stable deployable artifact.
> Before this change, the GCS upload API call executed a multipart upload, 
> which Google documentation 
> (https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload) states 
> should be used when the file is small enough to upload again if the 
> connection fails. For large tarballs, we hit 60-second socket timeouts 
> before completing the multipart upload. By passing `total_size`, apitools 
> first checks whether the size exceeds the resumable upload threshold, and 
> executes the more robust resumable upload rather than a multipart upload, 
> avoiding socket timeouts.
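
For reference, a minimal sketch of the change the description above implies, 
assuming apitools' transfer.Upload and Beam's generated GCS client; the 
upload_tarball helper and its parameters are illustrative, not the exact patch:

    import os

    from apitools.base.py import transfer
    from apache_beam.io.gcp.internal.clients import storage

    def upload_tarball(client, path, bucket, name):
        # Stat the local tarball so apitools knows the upload size up front.
        total_size = os.path.getsize(path)
        request = storage.StorageObjectsInsertRequest(bucket=bucket, name=name)
        with open(path, 'rb') as stream:
            # With total_size set, apitools compares it against its resumable
            # upload threshold and picks the resumable strategy for large
            # files instead of a single multipart request.
            upload = transfer.Upload(
                stream, 'application/octet-stream', total_size=total_size)
            return client.objects.Insert(request, upload=upload)

Without total_size, apitools falls back to its default simple/multipart 
strategy, which is what produced the socket timeouts on large tarballs.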



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
