yifan-c commented on code in PR #4459:
URL: https://github.com/apache/cassandra/pull/4459#discussion_r2485587072
##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryCache.java:
##########
@@ -67,6 +70,7 @@ public CompressionDictionaryCache()
{
try
{
+
totalSize.addAndGet(-dictionary.rawDictionary().length);
Review Comment:
It is not just the rawDictionary (heap) that is kept in memory; there are
also off-heap memory usage, `ZstdDictDecompress` (one per
`ZstdCompressionDictionary`) and `ZstdDictCompress` (many per
`ZstdCompressionDictionary`). However, zstd-jni, does not expose the API to
check the size of the struct, `ZSTD_CDict` and `ZSTD_DDict`. Each keeps a copy
of the raw dictionary, and the runtime lookup tables. Therefore, it is at least
3x.
How about we update use the estimated and leave a comment to explain the
compromise?
##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryCache.java:
##########
@@ -101,7 +105,10 @@ public void add(@Nullable CompressionDictionary
compressionDictionary)
// Only update cache if not already in the cache
DictId newDictId = compressionDictionary.dictId();
- cache.get(newDictId, id -> compressionDictionary);
+ cache.get(newDictId, id -> {
+ totalSize.addAndGet(compressionDictionary.rawDictionary().length);
Review Comment:
How about adding a new interface method in `CompressionDictionary` to return
the size? It should handle the `ZstdDictDecompress` and `ZstdDictCompress`
cleanly, as they are initialized lazily.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]