[ https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885693#comment-16885693 ]

Udi Meiri commented on BEAM-2264:
---------------------------------

Migrating off of apitools-based clients (such as the one used for GCS) is a 
much larger project.
Regarding threading issues, I believe that we came across them when trying to 
reuse GCS clients in multiple threads 
(https://issues.apache.org/jira/browse/BEAM-3990).
The credentials, however, should be thread-safe according to this note: 
https://github.com/googleapis/google-api-python-client/blob/master/docs/thread_safety.md

In any case, the newer credentials objects don't attempt to be thread-safe, but 
that apparently isn't a problem; at worst it may cause redundant token refreshes: 
https://github.com/googleapis/google-auth-library-python/issues/246#issuecomment-371878855

> Re-use credential instead of generating a new one on each GCS call
> ------------------------------------------------------------------
>
>                 Key: BEAM-2264
>                 URL: https://issues.apache.org/jira/browse/BEAM-2264
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Luke Cwik
>            Priority: Minor
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should cache the credential used within a Pipeline and re-use it instead 
> of generating a new one on each GCS call. When executing (against 2.0.0 RC2):
> {code}
> python -m apache_beam.examples.wordcount --input "gs://dataflow-samples/shakespeare/*" --output local_counts
> {code}
> Note that we seemingly generate a new access token each time instead of only 
> when a refresh is required.
> {code}
>   super(GcsIO, cls).__new__(cls, storage_client))
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.286200046539 seconds
> INFO:root:Running pipeline with DirectRunner.
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 43 files. Estimation took 0.205624818802 seconds
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> ... many more times ...
> {code}
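A minimal sketch of the caching this issue asks for, assuming a hypothetical credential class with an `expired` property and a `refresh()` method (loosely modeled on OAuth2-style credentials, but not any real library's API): the credential is created once per process, and the token is refreshed only when it is missing or expired rather than on every GCS call. This illustrates the idea only; it is not the change that was made in Beam.

```python
import threading
import time


class HypotheticalCredentials:
    """Hypothetical OAuth2-style credential; not a real library class."""

    def __init__(self, lifetime_secs=3600.0):
        self.lifetime_secs = lifetime_secs
        self.token = None
        self.expiry = 0.0
        self.refresh_count = 0

    @property
    def expired(self):
        # No token yet, or the current token has passed its expiry time.
        return self.token is None or time.monotonic() >= self.expiry

    def refresh(self):
        # A real credential would contact the token endpoint here;
        # this stand-in just mints a numbered fake token.
        self.refresh_count += 1
        self.token = "token-%d" % self.refresh_count
        self.expiry = time.monotonic() + self.lifetime_secs


_lock = threading.Lock()
_cached = None


def get_credentials():
    """Create the credential once per process and reuse it afterwards."""
    global _cached
    with _lock:
        if _cached is None:
            _cached = HypotheticalCredentials()
        return _cached


def get_access_token():
    """Return a token, refreshing only when it is missing or expired."""
    creds = get_credentials()
    with _lock:
        if creds.expired:
            creds.refresh()
        return creds.token
```

Holding the lock around the expiry check means concurrent callers trigger at most one refresh per expiry. Dropping the lock would also be acceptable, per the comment above, at the cost of possibly redundant refreshes.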



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)