Strikerrx01 commented on code in PR #34135:
URL: https://github.com/apache/beam/pull/34135#discussion_r1976635680
##########
sdks/python/apache_beam/io/gcp/bigquery_tools.py:
##########
@@ -786,11 +788,58 @@ def get_table(self, project_id, dataset_id, table_id):
     Raises:
       HttpError: if lookup failed.
     """
+    cache_key = f"{project_id}:{dataset_id}.{table_id}"
+    if cache_key in self._table_cache:
+      _LOGGER.debug("Cache hit for table: %s", cache_key)
+      return self._table_cache[cache_key]
+
+    _LOGGER.debug("Cache miss for table: %s", cache_key)
     request = bigquery.BigqueryTablesGetRequest(
         projectId=project_id, datasetId=dataset_id, tableId=table_id)
     response = self.client.tables.Get(request)
+
+    # Store the response in cache
+    self._table_cache[cache_key] = response
     return response
+  def clear_table_cache(self, project_id=None, dataset_id=None, table_id=None):
+    """Clear the cache for tables.
Review Comment:
Hi @liferoad, thanks for the review!
The `clear_table_cache` method serves two purposes:
1. **Automatic cache clearing**: With my recent commit,
`set_table_definition_ttl` now clears the cache automatically when caching
goes from disabled (TTL=0) to enabled (TTL>0). This matters because a user who
disables caching and later re-enables it should get fresh data rather than
potentially stale entries.
2. **Manual clearing (API)**: It also provides a public API for users to
manually clear the cache in specific scenarios if needed:
   - Clear the entire cache by calling `wrapper.clear_table_cache()`
   - Clear entries for a specific project with
     `wrapper.clear_table_cache(project_id='my-project')`
   - Clear entries for a specific dataset with
     `wrapper.clear_table_cache(project_id='my-project', dataset_id='my-dataset')`
   - Clear a specific table entry with
     `wrapper.clear_table_cache(project_id='my-project', dataset_id='my-dataset', table_id='my-table')`
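
To make the scoping concrete, here is a minimal sketch of how clearing by project, dataset, or table could work against the `f"{project_id}:{dataset_id}.{table_id}"` cache key shown in the diff. This is an illustration only, not the code in the PR: the class `TableCacheSketch` and its `put` helper are hypothetical, and the cache is assumed to be a plain dict.

```python
class TableCacheSketch:
  """Hypothetical stand-in for the wrapper's table cache (not the PR's code)."""

  def __init__(self):
    # Assumption: a plain dict keyed as in the diff:
    # f"{project_id}:{dataset_id}.{table_id}"
    self._table_cache = {}

  def put(self, project_id, dataset_id, table_id, table):
    # Hypothetical helper so the sketch can be exercised without BigQuery.
    self._table_cache[f"{project_id}:{dataset_id}.{table_id}"] = table

  def clear_table_cache(self, project_id=None, dataset_id=None, table_id=None):
    # No arguments: drop every cached table definition.
    if project_id is None:
      self._table_cache.clear()
      return

    # All three given: drop exactly one entry, if it is cached.
    if dataset_id is not None and table_id is not None:
      self._table_cache.pop(f"{project_id}:{dataset_id}.{table_id}", None)
      return

    # Otherwise match on a key prefix: the whole project, or one dataset.
    # (A table_id passed without a dataset_id is ignored in this sketch.)
    if dataset_id is None:
      prefix = f"{project_id}:"
    else:
      prefix = f"{project_id}:{dataset_id}."

    # Collect keys first so the dict is not mutated while iterating over it.
    for key in [k for k in self._table_cache if k.startswith(prefix)]:
      del self._table_cache[key]
```

Prefix matching keeps the cache a flat dict, at the cost of a linear scan on scoped clears, which should be acceptable for a cache that holds a handful of table definitions.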
Although the class already clears the cache automatically in the most relevant
scenarios, exposing this method gives users fine-grained control when they need
to force a refresh of specific table definitions.
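
For reference, the automatic clearing described in point 1 can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: the class `TtlCacheSketch` and the attribute name `_table_definition_ttl` are assumptions, and only the disabled-to-enabled transition is modeled.

```python
class TtlCacheSketch:
  """Hypothetical model of the TTL setter's cache-clearing behavior."""

  def __init__(self, ttl_seconds=0):
    # Assumption: a TTL of 0 means caching is disabled.
    self._table_definition_ttl = ttl_seconds
    self._table_cache = {}

  def clear_table_cache(self):
    # Simplified: this sketch only supports clearing the entire cache.
    self._table_cache.clear()

  def set_table_definition_ttl(self, ttl_seconds):
    was_disabled = self._table_definition_ttl == 0
    self._table_definition_ttl = ttl_seconds
    # Clear only on the disabled -> enabled transition, so entries left
    # over from before caching was turned off are never served as fresh.
    if was_disabled and ttl_seconds > 0:
      self.clear_table_cache()


wrapper = TtlCacheSketch(ttl_seconds=0)
wrapper._table_cache["my-project:my-dataset.my-table"] = "stale definition"
wrapper.set_table_definition_ttl(60)  # re-enabling caching empties the cache
```

Changing the TTL from one positive value to another leaves existing entries in place in this sketch.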