[ https://issues.apache.org/jira/browse/BEAM-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885693#comment-16885693 ]
Udi Meiri commented on BEAM-2264:
---------------------------------

Migrating off of apitools-based clients (such as the one used for GCS) is a much larger project.

Regarding threading issues, I believe that we came across them when trying to reuse GCS clients in multiple threads (https://issues.apache.org/jira/browse/BEAM-3990).
The credentials, however, should be thread-safe according to this note: https://github.com/googleapis/google-api-python-client/blob/master/docs/thread_safety.md

In any case, the newer credentials objects don't try to be thread-safe, but that is apparently not an issue: https://github.com/googleapis/google-auth-library-python/issues/246#issuecomment-371878855 (it just potentially causes more refreshes)

> Re-use credential instead of generating a new one one each GCS call
> -------------------------------------------------------------------
>
>                 Key: BEAM-2264
>                 URL: https://issues.apache.org/jira/browse/BEAM-2264
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Luke Cwik
>            Priority: Minor
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> We should cache the credential used within a Pipeline and re-use it instead of generating a new one on each GCS call. When executing (against 2.0.0 RC2):
> {code}
> python -m apache_beam.examples.wordcount --input "gs://dataflow-samples/shakespeare/*" --output local_counts
> {code}
> Note that we seemingly generate a new access token on each call instead of only when a refresh is required.
> {code}
> super(GcsIO, cls).__new__(cls, storage_client))
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 1 files. Estimation took 0.286200046539 seconds
> INFO:root:Running pipeline with DirectRunner.
> INFO:root:Starting the size estimation of the input
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:root:Finished the size estimation of the input at 43 files. Estimation took 0.205624818802 seconds
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> INFO:oauth2client.client:Refreshing access_token
> INFO:oauth2client.transport:Attempting refresh to obtain initial access_token
> ... many more times ...
> {code}
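
For illustration only, the caching idea from the issue description could look roughly like the sketch below. This is not the Beam implementation: `CachedCredential`, `fetch_token`, and `skew_seconds` are hypothetical names, and the fetch callable stands in for whatever actually obtains an access token. The lock is one conservative way to avoid duplicate refreshes from multiple threads. Per the comment above, it may not be strictly required with the newer credentials objects.

```python
import threading
import time


class CachedCredential:
    """Hypothetical holder that refreshes a token only when it is close to expiry."""

    def __init__(self, fetch_token, clock=time.monotonic, skew_seconds=60):
        # fetch_token is a hypothetical callable returning (token, lifetime_seconds).
        self._fetch_token = fetch_token
        self._clock = clock
        self._skew = skew_seconds
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Hold the lock so concurrent callers share one refresh instead of racing.
        with self._lock:
            expired = self._clock() >= self._expires_at - self._skew
            if self._token is None or expired:
                self._token, lifetime = self._fetch_token()
                self._expires_at = self._clock() + lifetime
            return self._token


# Stand-in for a real token endpoint; counts how often a refresh happens.
refresh_count = 0


def fake_fetch():
    global refresh_count
    refresh_count += 1
    return "token-%d" % refresh_count, 3600


cred = CachedCredential(fake_fetch)
tokens = [cred.get() for _ in range(5)]
```

With a holder like this shared across GCS calls, the five `get()` calls above trigger a single refresh rather than one per call, which is the behavior the issue asks for.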