Strikerrx01 opened a new pull request, #34135:
URL: https://github.com/apache/beam/pull/34135

   # [Python] Add caching for BigQuery table definitions
   
   ## Description
   This PR addresses issue #34076 by implementing a caching mechanism for BigQuery table definitions in the `BigQueryWrapper` class. Currently, each worker calls the `get_table()` method independently, which can exhaust users' BigQuery API quotas. This implementation adds a cache that stores table definitions so that repeated lookups of the same table reuse the stored definition instead of issuing another API call.
   
   ## Changes
   1. Added a dictionary `_table_cache` to the `BigQueryWrapper` class to store table definitions.
   2. Modified the `get_table()` method to check the cache before making an API call.
   3. Added debug logging to track cache hits and misses.
   4. Added a new method, `clear_table_cache()`, to allow clearing the entire cache or specific entries.
   5. Added unit tests to verify the caching behavior.
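
   The caching flow in the list above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the code from this PR: the class name `CachedTableLookup` and the injected `fetch_table` callable (standing in for the BigQuery `tables.get` request that `get_table()` would make) are assumptions so the example runs without Beam or GCP credentials. The cache key follows the format given in the Additional Notes, and raising `ValueError` when only some IDs are passed to `clear_table_cache()` is an assumed design choice.

```python
import logging
from typing import Any, Callable, Dict, Optional

_LOGGER = logging.getLogger(__name__)


class CachedTableLookup:
    """Hypothetical stand-in for the table-definition caching in this PR."""

    def __init__(self, fetch_table: Callable[[str, str, str], Any]):
        # Assumed hook standing in for the real BigQuery tables.get API call.
        self._fetch_table = fetch_table
        # Instance attribute: every object keeps its own, unshared cache.
        self._table_cache: Dict[str, Any] = {}

    @staticmethod
    def _cache_key(project_id: str, dataset_id: str, table_id: str) -> str:
        # Key format taken from the PR description.
        return f"{project_id}:{dataset_id}.{table_id}"

    def get_table(self, project_id: str, dataset_id: str, table_id: str) -> Any:
        key = self._cache_key(project_id, dataset_id, table_id)
        if key in self._table_cache:
            _LOGGER.debug("Table cache hit for %s", key)
            return self._table_cache[key]
        _LOGGER.debug("Table cache miss for %s", key)
        # Only a miss reaches the (simulated) API.
        table = self._fetch_table(project_id, dataset_id, table_id)
        self._table_cache[key] = table
        return table

    def clear_table_cache(
        self,
        project_id: Optional[str] = None,
        dataset_id: Optional[str] = None,
        table_id: Optional[str] = None,
    ) -> None:
        """Clear the whole cache (no IDs) or one entry (all three IDs)."""
        ids = (project_id, dataset_id, table_id)
        if all(i is None for i in ids):
            self._table_cache.clear()
        elif all(i is not None for i in ids):
            # pop() with a default ignores tables that were never cached.
            self._table_cache.pop(self._cache_key(*ids), None)
        else:
            raise ValueError(
                "Pass all of project_id, dataset_id and table_id, or none.")


if __name__ == "__main__":
    calls = []

    def fake_fetch(project_id, dataset_id, table_id):
        # Records each simulated API request.
        calls.append((project_id, dataset_id, table_id))
        return {"id": f"{project_id}:{dataset_id}.{table_id}"}

    lookup = CachedTableLookup(fake_fetch)
    lookup.get_table("my-project", "my_dataset", "events")  # miss: fetches
    lookup.get_table("my-project", "my_dataset", "events")  # hit: cached
    print(len(calls))  # 1
```

Because `_table_cache` is an instance attribute in this sketch, two wrapper objects never share entries, so each worker process that constructs its own wrapper still makes its own first request for a given table.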
   
   ## Benefits
   - Reduces the number of BigQuery API calls for table definitions.
   - Mitigates quota issues for users with many workers accessing the same table.
   - Improves performance by avoiding redundant API calls.
   
   ## Testing
   Added unit tests that verify:
   - Cache is used for subsequent lookups of the same table.
   - Tables can be selectively cleared from the cache.
   - The entire cache can be cleared when needed.
   
   ## Additional Notes
   - The cache is maintained at the instance level of the `BigQueryWrapper` class.
   - Debug logs are included to help diagnose any issues with the caching mechanism.
   - The cache key is constructed using the format `"{project_id}:{dataset_id}.{table_id}"`.
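
   The key format in the last note can be made concrete with a small helper. The name `table_cache_key` is hypothetical and not taken from the PR. Because the key includes the project ID, tables with the same dataset and table names in different projects get separate cache entries.

```python
def table_cache_key(project_id: str, dataset_id: str, table_id: str) -> str:
    """Return a key in the "{project_id}:{dataset_id}.{table_id}" format."""
    # The project ID prefix keeps same-named tables in different projects apart.
    return f"{project_id}:{dataset_id}.{table_id}"


if __name__ == "__main__":
    print(table_cache_key("my-project", "my_dataset", "events"))
    # prints: my-project:my_dataset.events
```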
   
   Fixes #34076 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.