[ 
https://issues.apache.org/jira/browse/BEAM-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pei He updated BEAM-1153:
-------------------------
    Description: 
Non-batch requests use RetryHttpRequestInitializer, which sets the read timeout to 
80 seconds and performs more retries.

The Google Cloud auto-generated JSON library doesn't set an HttpRequestInitializer 
for batch requests.

GcsUtil uses storageClient.batch(), which is defined here:
https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256

Without the HttpRequestInitializer, the default read timeout is 20 seconds.

A possible fix is: https://github.com/apache/incubator-beam/pull/1608

In addition, we can partially roll back 
https://github.com/apache/incubator-beam/pull/1359 to keep using the non-batch API 
for fileSize() on single files. This ensures that existing code keeps 
working the same way.
PR: https://github.com/apache/incubator-beam/pull/1611

  was:
Non-batch requests use RetryHttpRequestInitializer, which sets the read timeout to 
80 seconds and performs more retries.

The Google Cloud auto-generated JSON library doesn't set an HttpRequestInitializer 
for batch requests.

GcsUtil uses storageClient.batch(), which is defined here:
https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256

Without the HttpRequestInitializer, the default read timeout is 20 seconds.

A possible fix is: https://github.com/apache/incubator-beam/pull/1608

In addition, we can partially roll back 
https://github.com/apache/incubator-beam/pull/1359 to keep using the non-batch API 
for fileSize() on single files. This ensures that existing code keeps 
working the same way.


> GcsUtil needs to set timeout and retry explicitly in BatchRequest.
> ------------------------------------------------------------------
>
>                 Key: BEAM-1153
>                 URL: https://issues.apache.org/jira/browse/BEAM-1153
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Pei He
>            Assignee: Pei He
>            Priority: Blocker
>
> Non-batch requests use RetryHttpRequestInitializer, which sets the read timeout 
> to 80 seconds and performs more retries.
> The Google Cloud auto-generated JSON library doesn't set an HttpRequestInitializer 
> for batch requests.
> GcsUtil uses storageClient.batch(), which is defined here:
> https://github.com/vparfonov/google-api-java-client/blob/master/google-api-client/src/main/java/com/google/api/client/googleapis/services/AbstractGoogleClient.java#L256
> Without the HttpRequestInitializer, the default read timeout is 20 seconds.
> A possible fix is: https://github.com/apache/incubator-beam/pull/1608
> In addition, we can partially roll back 
> https://github.com/apache/incubator-beam/pull/1359 to keep using the non-batch 
> API for fileSize() on single files. This ensures that existing code keeps 
> working the same way.
> PR: https://github.com/apache/incubator-beam/pull/1611
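
For illustration only, the mechanism described above can be modeled with a small self-contained sketch. It is not the code from either PR and does not use the real google-api-client classes: Request, Initializer, Client, and retryingInitializer() are hypothetical stand-ins, and the retry count of 3 is an arbitrary assumption. Only the 20-second default and the 80-second configured read timeout come from the issue text.

```java
/** Simplified stand-in model; none of these types are the real google-api-client classes. */
final class BatchTimeoutSketch {

  /** Hypothetical request holding only the two settings discussed in the issue. */
  static final class Request {
    int readTimeoutMillis = 20_000; // default read timeout stated in the issue
    int maxRetries = 0;             // assumption: no retries unless configured
  }

  /** Hypothetical counterpart of an HttpRequestInitializer. */
  interface Initializer {
    void initialize(Request request);
  }

  /** Hypothetical client with a non-batch path and two batch paths. */
  static final class Client {
    private final Initializer clientInitializer;

    Client(Initializer clientInitializer) {
      this.clientInitializer = clientInitializer;
    }

    /** Non-batch path: the client-level initializer is always applied. */
    Request newRequest() {
      Request request = new Request();
      clientInitializer.initialize(request);
      return request;
    }

    /** Batch path without arguments: no initializer is applied, defaults remain. */
    Request newBatchRequest() {
      return newBatchRequest(null);
    }

    /** Batch path with an explicit initializer: the caller's settings are applied. */
    Request newBatchRequest(Initializer initializer) {
      Request request = new Request();
      if (initializer != null) {
        initializer.initialize(request);
      }
      return request;
    }
  }

  /** Mirrors the 80-second read timeout from the issue; the retry count is illustrative. */
  static Initializer retryingInitializer() {
    return request -> {
      request.readTimeoutMillis = 80_000;
      request.maxRetries = 3; // illustrative value, not taken from the issue
    };
  }

  public static void main(String[] args) {
    Initializer initializer = retryingInitializer();
    Client client = new Client(initializer);

    System.out.println("non-batch read timeout (ms): "
        + client.newRequest().readTimeoutMillis);
    System.out.println("batch, no initializer (ms):  "
        + client.newBatchRequest().readTimeoutMillis);
    System.out.println("batch, explicit init (ms):   "
        + client.newBatchRequest(initializer).readTimeoutMillis);
  }
}
```

In this model the no-argument batch path keeps the 20-second default, while passing the initializer explicitly gives batch requests the same settings as non-batch requests. That is the behavior the issue title asks GcsUtil to set explicitly for BatchRequest.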



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
