eric-maynard commented on code in PR #490: URL: https://github.com/apache/polaris/pull/490#discussion_r2006126016
########## polaris-core/src/main/java/org/apache/polaris/core/persistence/cache/EntityWeigher.java: ########## @@ -0,0 +1,64 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.polaris.core.persistence.cache; + +import com.github.benmanes.caffeine.cache.Weigher; +import org.checkerframework.checker.index.qual.NonNegative; + +/** + * A {@link Weigher} implementation that weighs {@link EntityCacheEntry} objects by the approximate + * size of the entity object. + */ +public class EntityWeigher implements Weigher<Long, EntityCacheEntry> { + + /** The amount of weight that is expected to roughly equate to 1MB of memory usage */ + public static final long WEIGHT_PER_MB = 1024 * 1024; + + /* Represents the approximate size of an entity beyond the properties */ + private static final int APPROXIMATE_ENTITY_OVERHEAD = 1000; + + /** Singleton instance */ + private static final EntityWeigher instance = new EntityWeigher(); + + private EntityWeigher() {} + + /** Gets the singleton {@link EntityWeigher} */ + public static EntityWeigher getInstance() { + return instance; + } + + /** + * Computes the weight of a given entity + * + * @param key The entity's key; not used + * @param value The entity to be cached + * @return The weight of the entity + */ + @Override + public @NonNegative int weigh(Long key, EntityCacheEntry value) { + return APPROXIMATE_ENTITY_OVERHEAD + + value.getEntity().getProperties().length() + + value.getEntity().getInternalProperties().length(); Review Comment: I think you're referring to something like this? <img width="500" alt="Screenshot 2025-03-20 at 10 20 00 AM" src="https://github.com/user-attachments/assets/dad030c7-1216-45b7-bca8-ad4a9b3d5854" /> This shows that the original "multiplier" (1, so no multiplier) seems to work best for making the cache stabilize around 100MB of heap usage given the target of 100M "weight units". However, a few notes: * Please, if you feel strongly about this or have ideas about what analysis you'd like to see I invite you to contribute some of your own testing. Feel free to use the branch or spreadsheet linked above and please do reach out if I can support you in any way. I've done my best here but I am not a data scientist. * I still stand by my assertion about pre-GC heap size being relevant. If users care about memory pressure -- i.e. the impact the memory used by the cache has on the performance of their system -- then they don't just care about post-GC usage. If we look at pre-GC usage, the multiplier should be much higher and that's why I've gone with 3. * I do not think any of this should be necessary to fix the cache. This is a very old PR now, and a very old problem with the cache being unbounded. I posit that it would be better to ship this with a multiplier of 5 and lower it later on rather than to spend a great deal of time debating the ideal value. * The configurable weight target is much more important than the multiplier. Even if we choose the "wrong" multiplier, it's only wrong in that it doesn't make the cache size close to 100MB. But we don't promise anywhere that the cache will be 100MB. If there's a user who, for whatever reason, really wants the cache to be 100MB the way to accomplish that would be to mess with the weight target rather than this multiplier. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
