jerqi commented on code in PR #8450:
URL: https://github.com/apache/gravitino/pull/8450#discussion_r2335716997
##########
core/src/main/java/org/apache/gravitino/stats/storage/LancePartitionStatisticStorage.java:
##########
@@ -133,7 +153,45 @@ public LancePartitionStatisticStorage(Map<String, String> properties) {
properties.getOrDefault(READ_BATCH_SIZE, String.valueOf(DEFAULT_READ_BATCH_SIZE)));
Preconditions.checkArgument(
readBatchSize > 0, "Lance partition statistics storage readBatchSize must be positive");
+ int datasetCacheSize =
+ Integer.parseInt(
+ properties.getOrDefault(
+ DATASET_CACHE_SIZE, String.valueOf(DEFAULT_DATASET_CACHE_SIZE)));
+ Preconditions.checkArgument(
+ datasetCacheSize > 0,
+ "Lance partition statistics storage datasetCacheSize must be positive");
+ this.metadataFileCacheSize =
+ Long.parseLong(
+ properties.getOrDefault(
+ METADATA_FILE_CACHE_SIZE, String.valueOf(DEFAULT_METADATA_FILE_CACHE_SIZE)));
+ Preconditions.checkArgument(
+ metadataFileCacheSize > 0,
+ "Lance partition statistics storage metadataFileCacheSizeBytes must be positive");
+ this.indexCacheSize =
+ Long.parseLong(
+ properties.getOrDefault(INDEX_CACHE_SIZE, String.valueOf(DEFAULT_INDEX_CACHE_SIZE)));
+ Preconditions.checkArgument(
+ indexCacheSize > 0,
+ "Lance partition statistics storage indexCacheSizeBytes must be positive");
+
this.properties = properties;
+
+ this.cache =
+ Caffeine.newBuilder()
+ .maximumSize(datasetCacheSize)
Review Comment:
The dataset caches the metadata file and the index file. If we add an
expiration time, we would need a complex mechanism to re-warm the cache
after entries expire in order to avoid slow reads.
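
To illustrate the point, here is a minimal sketch of a cache that is bounded only by entry count and has no time-based expiration, so a warm entry stays warm until it is pushed out by size pressure. It uses only the JDK (`LinkedHashMap` in access order) rather than Caffeine, and `SizeBoundedCache` and `FakeDataset` are hypothetical names used for illustration, not classes from this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for an opened dataset that holds its own internal caches. */
final class FakeDataset implements AutoCloseable {
  private final String name;
  private boolean closed;

  FakeDataset(String name) {
    this.name = name;
  }

  String name() {
    return name;
  }

  boolean isClosed() {
    return closed;
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** LRU cache bounded by entry count only; entries never expire by time. */
final class SizeBoundedCache<K, V extends AutoCloseable> {
  private final LinkedHashMap<K, V> entries;

  SizeBoundedCache(int maxEntries) {
    if (maxEntries <= 0) {
      throw new IllegalArgumentException("maxEntries must be positive");
    }
    // accessOrder = true keeps the least recently used entry first.
    this.entries =
        new LinkedHashMap<K, V>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            if (size() <= maxEntries) {
              return false;
            }
            // Release the evicted value's resources before dropping it.
            closeQuietly(eldest.getValue());
            return true;
          }
        };
  }

  /** Returns the cached value, loading and caching it on a miss. */
  synchronized V get(K key, Function<K, V> loader) {
    V value = entries.get(key);
    if (value == null) {
      value = loader.apply(key);
      entries.put(key, value);
    }
    return value;
  }

  synchronized int size() {
    return entries.size();
  }

  private static void closeQuietly(AutoCloseable closeable) {
    try {
      closeable.close();
    } catch (Exception e) {
      // Ignored in this sketch; real code would log the failure.
    }
  }
}
```

With a capacity of 2, loading "a" and "b", touching "a" again, and then loading "c" evicts and closes only "b". Nothing is dropped merely because time has passed, so no separate re-warming step is needed.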