eric-maynard commented on code in PR #490:
URL: https://github.com/apache/polaris/pull/490#discussion_r2032081509


##########
polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityWeigher.java:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.polaris.core.persistence.cache;
+
+import com.github.benmanes.caffeine.cache.Weigher;
+import org.checkerframework.checker.index.qual.NonNegative;
+
+/**
+ * A {@link Weigher} implementation that weighs {@link EntityCacheEntry} 
objects by the approximate
+ * size of the entity object.
+ */
+public class EntityWeigher implements Weigher<Long, EntityCacheEntry> {
+
+  /** The amount of weight that is expected to roughly equate to 1MB of memory 
usage */
+  public static final long WEIGHT_PER_MB = 1024 * 1024;
+
+  /* Represents the approximate size of an entity beyond the properties */
+  private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000;
+
+  /** Singleton instance */
+  private static final EntityWeigher instance = new EntityWeigher();
+
+  private EntityWeigher() {}
+
+  /** Gets the singleton {@link EntityWeigher} */
+  public static EntityWeigher getInstance() {
+    return instance;
+  }
+
+  /**
+   * Computes the weight of a given entity
+   *
+   * @param key The entity's key; not used
+   * @param value The entity to be cached
+   * @return The weight of the entity
+   */
+  @Override
+  public @NonNegative int weigh(Long key, EntityCacheEntry value) {
+    return APPROXIMATE_ENTITY_OVERHEAD
+        + value.getEntity().getProperties().length()
+        + value.getEntity().getInternalProperties().length();

Review Comment:
   Thanks. Based on the above, here's what I collected. All of the heap size 
measurements below are taken with `System.gc()` being called beforehand. The 
same 
[spreadsheet](https://docs.google.com/spreadsheets/d/1Ee7kcW7sB9_lp1lYJQyJ4LxYh4QFicwyihHo63CJaCE/edit?gid=873444573#gid=873444573)
 & branch linked above were used. No multiplier was applied.
   
   This shows weight vs. heap usage for small & large objects (100 characters 
vs 10k characters):
   
   <img width="700" alt="Screenshot 2025-04-07 at 3 05 38 PM" 
src="https://github.com/user-attachments/assets/692eada9-ab0c-4111-bc00-3e413b833f3b";
 />
   
   As you can see, across all 4 trials heap usage stabilizes _somewhere_. So in 
that sense the weigher is working very well. It's notable that the exact amount 
of heap usage we stabilize at varies greatly by entity size. Each entity has a 
1000 "weight unit" cost associated with it besides whatever weight it's 
assigned based on its properties and the multiplier, so it makes sense that 
when using smaller entities we'll stabilize a bit lower than expected.
   
   The exact point at which we stabilize varies from ~200MB to ~60MB, 
suggesting a multiplier anywhere in the range of **0.5 and 2.0** could be 
effective. Indeed, when we perform the same test with a multiplier of 2 we see 
that all 4 trials stabilize below 100MB.
   
   Focusing in on the time before the heap usage stabilizes, we see that adding 
a WU of data adds between 0.55b - 1.85b. This points us to the same conclusion 
-- a multiplier between 0.55 and 1.85 is needed to ensure that a weight target 
of 100M weight units keeps the cache below 100MB of heap usage. 
   
   I chose 3 rather than 2 in this PR as an exceedingly conservative value 
intended to always keep us below 100MB, as well as to account for transient 
memory usage. However I am happy to change this value to anything you prefer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to