gaborkaszab commented on code in PR #14398:
URL: https://github.com/apache/iceberg/pull/14398#discussion_r2687127212
##########
core/src/main/java/org/apache/iceberg/rest/RESTCatalog.java:
##########
@@ -96,6 +97,11 @@ public void initialize(String name, Map<String, String> props) {
sessionCatalog.initialize(name, props);
}
+ @VisibleForTesting
+ RESTSessionCatalog sessionCatalog() {
Review Comment:
It could work as protected too, but I didn't want to expose this to anyone deriving from this class. I thought package-private + `@VisibleForTesting` was the way to express that this is exposed only for testing.
I changed this as you suggested.
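To illustrate the visibility trade-off, here is a toy sketch (the annotation below is a local stand-in; Guava's real `@VisibleForTesting` likewise carries no runtime behavior, it only documents intent):

```java
// Local stand-in for Guava's annotation: it changes nothing at runtime,
// it only signals to reviewers and static analysis that the relaxed
// visibility exists solely for tests.
@interface VisibleForTesting {}

class RestCatalogExample {
  private final String sessionCatalog = "session-catalog";

  // Package-private: tests in the same package can call this,
  // but subclasses in other packages cannot (unlike protected).
  @VisibleForTesting
  String sessionCatalog() {
    return sessionCatalog;
  }
}
```

Package-private plus the annotation keeps the member out of the class's inheritance surface while still letting same-package tests reach it.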
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -424,8 +517,26 @@ public Table loadTable(SessionContext context, TableIdentifier identifier) {
MetadataTableType metadataType;
LoadTableResponse response;
TableIdentifier loadedIdent;
+
+ Map<String, String> responseHeaders = Maps.newHashMap();
+ TableSupplierWithETag cachedTable =
+ tableCache.getIfPresent(SessionIdTableId.of(context.sessionId(), identifier));
+
try {
- response = loadInternal(context, identifier, snapshotMode);
+ response =
+ loadInternal(
+ context,
+ identifier,
+ snapshotMode,
+ headersForLoadTable(cachedTable),
+ responseHeaders::putAll);
+
+ if (response == null) {
+ Preconditions.checkNotNull(cachedTable, "Invalid load table response: null");
Review Comment:
Done
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> unresolved) {
mergedProps,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+ this.tableCache = tableCacheBuilder(mergedProps).build();
+
super.initialize(name, mergedProps);
}
+ @VisibleForTesting
Review Comment:
Sure. It is used in `TestableRESTSessionCatalog`, which is in the test/ folder, hence I thought `@VisibleForTesting` made sense.
I changed this to protected now.
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -424,8 +517,26 @@ public Table loadTable(SessionContext context, TableIdentifier identifier) {
MetadataTableType metadataType;
LoadTableResponse response;
TableIdentifier loadedIdent;
+
+ Map<String, String> responseHeaders = Maps.newHashMap();
+ TableSupplierWithETag cachedTable =
tableCache.getIfPresent(SessionIdTableId.of(context.sessionId(), identifier));
Review Comment:
No need to have different ETags for different snapshot modes. This works now without that, and also without adding the snapshot mode to the cache key. When the catalog is in REFS mode, the cache only contains the REFS form of the table. On-demand lazy snapshot loading doesn't go through the table cache; it is done via a `TableMetadata.snapshotSupplier` calling `loadInternal` directly, bypassing the cache.
This works, but each client getting the table from the cache has to load all snapshots on demand itself. I deliberately decided not to implement that in this initial PR, to keep the code as simple as possible and to make reviewing easier.
I'd prefer to implement freshness-aware loading for on-demand snapshot loading in a follow-up PR. LMK what you think!
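The conditional-load flow behind this, sketched stdlib-only (names like `server` and the string-array "response" are hypothetical; the real code uses `loadInternal`, `TableSupplierWithETag`, and HTTP 304 semantics):

```java
import java.util.HashMap;
import java.util.Map;

class ETagLoadSketch {
  // Hypothetical server state: table name -> {etag, metadata}.
  static final Map<String, String[]> server = new HashMap<>();
  // Client-side cache: table name -> {etag, metadata}.
  static final Map<String, String[]> cache = new HashMap<>();

  // Returns null when the server reports "not modified" for the sent
  // ETag, mirroring the null response loadInternal produces in that case.
  static String[] loadInternal(String table, String cachedETag) {
    String[] current = server.get(table);
    if (current == null) {
      throw new RuntimeException("NoSuchTableException: " + table);
    }
    if (current[0].equals(cachedETag)) {
      return null; // analogous to HTTP 304: serve from the client cache
    }
    return current;
  }

  static String loadTable(String table) {
    String[] cached = cache.get(table);
    String[] response = loadInternal(table, cached == null ? null : cached[0]);
    if (response == null) {
      // A null response without a cached copy is invalid; the PR guards
      // this with Preconditions.checkNotNull(cachedTable, ...).
      if (cached == null) {
        throw new IllegalStateException("Invalid load table response: null");
      }
      return cached[1];
    }
    cache.put(table, response);
    return response[1];
  }
}
```

The key point is that a "not modified" reply is represented as a null response, and the caller is then required to have a cache hit to fall back on.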
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> unresolved) {
mergedProps,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+ this.tableCache = tableCacheBuilder(mergedProps).build();
+
super.initialize(name, mergedProps);
}
+ @VisibleForTesting
+ Caffeine<Object, Object> tableCacheBuilder(Map<String, String> props) {
+ long expireAfterWriteMS =
+ PropertyUtil.propertyAsLong(
+ props,
+ RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS,
+ RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS_DEFAULT);
+ Preconditions.checkArgument(
Review Comment:
I use the maximum number of entries for this. It accepts zero, which means the cache is turned off. The documentation for `maximumSize()` says:
"When size is zero, elements will be evicted immediately after being loaded into the cache. This can be useful in testing, or to disable caching temporarily without a code change."
The doc for `expireAfterWrite()` doesn't explicitly say this, even though it could be used for the same purpose. It does mention:
"Expired entries are cleaned up as part of the routine maintenance"
This gave me the impression that setting the expiration to zero only evicts entries eventually, while setting the max size to zero does it immediately, hence I went for the latter. Let me know if you think the expiration interval param is more suitable than the max entries param for this.
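The "capacity zero disables caching" semantics can be illustrated with a stdlib-only LRU sketch (this is not Caffeine, just the same behavior `maximumSize(0)` documents):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: removeEldestEntry evicts the least-recently-used
// entry whenever the map grows past maxEntries.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
  private final long maxEntries;

  BoundedCache(long maxEntries) {
    super(16, 0.75f, true); // access-order iteration for LRU eviction
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // With maxEntries == 0 every insert is immediately over capacity,
    // so nothing is ever retained: caching is effectively off without
    // any code change at the call sites.
    return size() > maxEntries;
  }
}
```

With capacity zero a `put` is followed by an immediate eviction, so subsequent `get` calls always miss; with a positive capacity it behaves as a normal LRU cache.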
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> unresolved) {
mergedProps,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+ this.tableCache = tableCacheBuilder(mergedProps).build();
Review Comment:
I wanted a way to override this method in `TestableRESTSessionCatalog` so that it does everything the base method does, plus adds a ticker. I could change this method to return the built cache instead of the builder, but then the override would have to duplicate all the code of the base version, and whenever we change how the cache is built here, we would probably have to make the same change in `TestableRESTSessionCatalog` too.
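The override pattern being described, sketched with a hand-rolled builder instead of Caffeine (`TestableCatalog` and the deterministic ticker are illustrative):

```java
import java.util.function.LongSupplier;

// Tiny stand-in for a cache builder: holds configuration and builds a
// string "cache" so the example stays dependency-free.
class CacheBuilderSketch {
  LongSupplier ticker = System::nanoTime;
  long maxSize = 100;

  CacheBuilderSketch ticker(LongSupplier t) { this.ticker = t; return this; }
  String build() { return "cache(maxSize=" + maxSize + ")"; }
}

class BaseCatalog {
  // Returning the builder (not the built cache) keeps the base
  // configuration in one place; subclasses reuse it and only add extras.
  protected CacheBuilderSketch tableCacheBuilder() {
    CacheBuilderSketch b = new CacheBuilderSketch();
    b.maxSize = 50;
    return b;
  }
}

class TestableCatalog extends BaseCatalog {
  @Override
  protected CacheBuilderSketch tableCacheBuilder() {
    // Everything the base sets up is inherited; the test subclass
    // only pins the ticker to make time deterministic.
    return super.tableCacheBuilder().ticker(() -> 0L);
  }
}
```

If the base method returned the built cache, the test subclass would have to copy every configuration line instead of just appending `.ticker(...)`.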
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -435,14 +546,34 @@ public Table loadTable(SessionContext context, TableIdentifier identifier) {
// attempt to load a metadata table using the identifier's namespace as the base table
TableIdentifier baseIdent = TableIdentifier.of(identifier.namespace().levels());
try {
- response = loadInternal(context, baseIdent, snapshotMode);
+ responseHeaders.clear();
+ cachedTable =
+ tableCache.getIfPresent(SessionIdTableId.of(context.sessionId(), baseIdent));
+
+ response =
+ loadInternal(
+ context,
+ baseIdent,
+ snapshotMode,
+ headersForLoadTable(cachedTable),
+ responseHeaders::putAll);
+
+ if (response == null) {
+ Preconditions.checkNotNull(cachedTable, "Invalid load table response: null");
+
+ return MetadataTableUtils.createMetadataTableInstance(
+ cachedTable.tableSupplier().get(), metadataType);
+ }
+
loadedIdent = baseIdent;
} catch (NoSuchTableException ignored) {
// the base table does not exist
+ invalidateTable(context, baseIdent);
throw original;
}
} else {
// name is not a metadata table
+ invalidateTable(context, identifier);
Review Comment:
Technically it's feasible that we get here while the table exists in the cache. See `TestRESTCatalog.testLoadTableInvalidatesCache`. Steps:
1) We load the table. This populates the cache.
2) With another catalog instance (for example through another engine) we drop the table. The table remains in the cache of this catalog.
3) Another load table fails because the table doesn't exist anymore, while the table is still in the cache. I think it makes sense to invalidate the table in this case.
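The steps above can be sketched stdlib-only (hypothetical names; the real code invalidates the Caffeine cache and throws `NoSuchTableException`):

```java
import java.util.HashMap;
import java.util.Map;

class InvalidateOnMissSketch {
  // Hypothetical shared server state and this catalog's local cache.
  static final Map<String, String> server = new HashMap<>();
  static final Map<String, String> tableCache = new HashMap<>();

  static String loadTable(String table) {
    String metadata = server.get(table);
    if (metadata == null) {
      // Step 3: the table was dropped elsewhere; remove the stale cache
      // entry before propagating the "no such table" failure.
      tableCache.remove(table);
      throw new RuntimeException("NoSuchTableException: " + table);
    }
    tableCache.put(table, metadata); // step 1: a load populates the cache
    return metadata;
  }
}
```

Without the `remove` call in the failure path, the dropped table's metadata would linger in the cache of this catalog indefinitely.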
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> unresolved) {
mergedProps,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+ this.tableCache = tableCacheBuilder(mergedProps).build();
+
super.initialize(name, mergedProps);
}
+ @VisibleForTesting
+ Caffeine<Object, Object> tableCacheBuilder(Map<String, String> props) {
+ long expireAfterWriteMS =
+ PropertyUtil.propertyAsLong(
+ props,
+ RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS,
+ RESTCatalogProperties.TABLE_CACHE_EXPIRE_AFTER_WRITE_MS_DEFAULT);
+ Preconditions.checkArgument(
+ expireAfterWriteMS > 0, "Invalid expire after write: zero or negative");
+
+ long numEntries =
+ PropertyUtil.propertyAsLong(
+ props,
+ RESTCatalogProperties.TABLE_CACHE_MAX_ENTRIES,
+ RESTCatalogProperties.TABLE_CACHE_MAX_ENTRIES_DEFAULT);
+ Preconditions.checkArgument(numEntries >= 0, "Invalid max entries: negative");
+
+ Caffeine<Object, Object> builder =
+ Caffeine.newBuilder()
+ .maximumSize(numEntries)
+ .expireAfterWrite(Duration.ofMillis(expireAfterWriteMS))
+ .removalListener(
+ (compositeKey, table, cause) ->
LOG.debug("Evicted {} from table cache ({})", compositeKey, cause))
+ .recordStats();
+
+ return builder;
+ }
+
+ @VisibleForTesting
Review Comment:
Sure, done.
I'm not sure whether I also have to change `SessionIdTableId` and `TableSupplierWithETag` to protected, since they are currently private but exposed via a protected method. It builds, but the IDE has some complaints.
##########
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java:
##########
@@ -277,9 +316,46 @@ public void initialize(String name, Map<String, String> unresolved) {
mergedProps,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED,
RESTCatalogProperties.REST_SCAN_PLANNING_ENABLED_DEFAULT);
+
+ this.tableCache = tableCacheBuilder(mergedProps).build();
+
super.initialize(name, mergedProps);
}
+ @VisibleForTesting
+ Caffeine<Object, Object> tableCacheBuilder(Map<String, String> props) {
Review Comment:
Yes, this returns a builder and not the built cache. The naming comes from Caffeine, and I also find it misleading :)
I explained above why I return a builder instead of the built cache.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]