eric-maynard commented on code in PR #490: URL: https://github.com/apache/polaris/pull/490#discussion_r2032081509
########## polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityWeigher.java: ########## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.core.persistence.cache; + +import com.github.benmanes.caffeine.cache.Weigher; +import org.checkerframework.checker.index.qual.NonNegative; + +/** + * A {@link Weigher} implementation that weighs {@link EntityCacheEntry} objects by the approximate + * size of the entity object. + */ +public class EntityWeigher implements Weigher<Long, EntityCacheEntry> { + + /** The amount of weight that is expected to roughly equate to 1MB of memory usage */ + public static final long WEIGHT_PER_MB = 1024 * 1024; + + /* Represents the approximate size of an entity beyond the properties */ + private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000; + + /** Singleton instance */ + private static final EntityWeigher instance = new EntityWeigher(); + + private EntityWeigher() {} + + /** Gets the singleton {@link EntityWeigher} */ + public static EntityWeigher getInstance() { + return instance; + } + + /** + * Computes the weight of a given entity + * + * @param key The entity's key; not used + * @param value The entity to be cached + * @return The weight of the entity + */ + @Override + public @NonNegative int weigh(Long key, EntityCacheEntry value) { + return APPROXIMATE_ENTITY_OVERHEAD + + value.getEntity().getProperties().length() + + value.getEntity().getInternalProperties().length(); Review Comment: Thanks. Based on the above, here's what I collected. All of the heap size measurements below are taken with `System.gc()` being called beforehand. The same [spreadsheet](https://docs.google.com/spreadsheets/d/1Ee7kcW7sB9_lp1lYJQyJ4LxYh4QFicwyihHo63CJaCE/edit?gid=873444573#gid=873444573) & branch linked above were used. No multiplier was applied. This shows weight vs. heap usage for small & large objects (100 characters vs 10k characters): <img width="700" alt="Screenshot 2025-04-07 at 3 05 38 PM" src="https://github.com/user-attachments/assets/692eada9-ab0c-4111-bc00-3e413b833f3b" /> As you can see, across all 4 trials heap usage stabilizes _somewhere_. So in that sense the weigher is working very well. It's notable that the exact amount of heap usage we stabilize at varies greatly by entity size. Each entity has a 1000 "weight unit" cost associated with it besides whatever weight it's assigned based on its properties and the multiplier, so it makes sense that when using smaller entities we'll stabilize a bit lower than expected. The exact point at which we stabilize varies from ~200MB to ~60MB, suggesting a multiplier anywhere in the range of **0.5 and 2.0** could be effective. Indeed, when we perform the same test with a multiplier of 2 we see that all 4 trials stabilize below 100MB. Focusing in on the time before the heap usage stabilizes, we see that adding a WU of data adds between 0.55b - 1.85b. This points us to the same conclusion -- a multiplier between 0.55 and 1.85 is needed to ensure that a weight target of 100M weight units keeps the cache below 100MB of heap usage. I chose 3 rather than 2 in this PR as an exceedingly conservative value intended to always keep us below 100MB, as well as to account for transient memory usage. However I am happy to change this value to anything you prefer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@polaris.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org