eric-maynard commented on code in PR #490:
URL: https://github.com/apache/polaris/pull/490#discussion_r2006126016


##########
polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityWeigher.java:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.polaris.core.persistence.cache;
+
+import com.github.benmanes.caffeine.cache.Weigher;
+import org.checkerframework.checker.index.qual.NonNegative;
+
+/**
+ * A {@link Weigher} implementation that weighs {@link EntityCacheEntry} 
objects by the approximate
+ * size of the entity object.
+ */
+public class EntityWeigher implements Weigher<Long, EntityCacheEntry> {
+
+  /** The amount of weight that is expected to roughly equate to 1MB of memory 
usage */
+  public static final long WEIGHT_PER_MB = 1024 * 1024;
+
+  /* Represents the approximate size of an entity beyond the properties */
+  private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;
+
+  /** Singleton instance */
+  private static final EntityWeigher instance = new EntityWeigher();
+
+  private EntityWeigher() {}
+
+  /** Gets the singleton {@link EntityWeigher} */
+  public static EntityWeigher getInstance() {
+    return instance;
+  }
+
+  /**
+   * Computes the weight of a given entity
+   *
+   * @param key The entity's key; not used
+   * @param value The entity to be cached
+   * @return The weight of the entity
+   */
+  @Override
+  public @NonNegative int weigh(Long key, EntityCacheEntry value) {
+    return APPROXIMATE_ENTITY_OVERHEAD
+        + value.getEntity().getProperties().length()
+        + value.getEntity().getInternalProperties().length();

Review Comment:
   I think you're referring to something like this?
   
   <img width="500" alt="Screenshot 2025-03-20 at 10 20 00 AM" 
src="https://github.com/user-attachments/assets/dad030c7-1216-45b7-bca8-ad4a9b3d5854";
 />
   
   This shows that the original "multiplier" (1, so no multiplier) seems to 
work best for making the cache stabilize around 100MB of heap usage given the 
target of 100M "weight units". However, a few notes:
   
   
   * Please, if you feel strongly about this or have ideas about what analysis 
you'd like to see I invite you to contribute some of your own testing. Feel 
free to use the branch or spreadsheet linked above and please do reach out if I 
can support you in any way. I've done my best here but I am not a data 
scientist.
   
   
   * I still stand by my assertion about pre-GC heap size being relevant. If 
users care about memory pressure -- i.e. the impact the memory used by the 
cache has on the performance of their system -- then they don't just care about 
post-GC usage. If we look at pre-GC usage, the multiplier should be much higher 
and that's why I've gone with 3.
   
   * I do not think any of this should be necessary to fix the cache. This is a 
very old PR now, and a very old problem with the cache being unbounded. I posit 
that it would be better to ship this with a multiplier of 5 and lower it later 
on rather than to spend a great deal of time debating the ideal value.
   
   * The configurable weight target is much more important than the multiplier. 
Even if we choose the "wrong" multiplier, it's only wrong in that it doesn't 
make the cache size close to 100MB. But we don't promise anywhere that the 
cache will be 100MB. If there's a user who, for whatever reason, really wants 
the cache to be 100MB the way to accomplish that would be to mess with the 
weight target rather than this multiplier.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to