Re: [PR] CASSANDRA-20940 Extend Nodetool tablestats for dictionary memory usage [cassandra]

via GitHub Sun, 02 Nov 2025 23:57:50 -0800


yifan-c commented on code in PR #4459:
URL: https://github.com/apache/cassandra/pull/4459#discussion_r2485587072



##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryCache.java:
##########
@@ -67,6 +70,7 @@ public CompressionDictionaryCache()
                                  {
                                      try
                                      {
+                                         
totalSize.addAndGet(-dictionary.rawDictionary().length);

Review Comment:
   It is not just the rawDictionary (heap) that is kept in memory; there are 
also off-heap memory usage,  `ZstdDictDecompress` (one per 
`ZstdCompressionDictionary`) and `ZstdDictCompress` (many per 
`ZstdCompressionDictionary`). However, zstd-jni, does not expose the API to 
check the size of the struct, `ZSTD_CDict` and `ZSTD_DDict`. Each keeps a copy 
of the raw dictionary, and the runtime lookup tables. Therefore, it is at least 
3x. 
   How about we update use the estimated and leave a comment to explain the 
compromise? 



##########
src/java/org/apache/cassandra/db/compression/CompressionDictionaryCache.java:
##########
@@ -101,7 +105,10 @@ public void add(@Nullable CompressionDictionary 
compressionDictionary)
 
         // Only update cache if not already in the cache
         DictId newDictId = compressionDictionary.dictId();
-        cache.get(newDictId, id -> compressionDictionary);
+        cache.get(newDictId, id -> {
+            totalSize.addAndGet(compressionDictionary.rawDictionary().length);

Review Comment:
   How about adding a new interface method in `CompressionDictionary` to return 
the size? It should handle the `ZstdDictDecompress` and `ZstdDictCompress` 
cleanly, as they are initialized lazily. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] CASSANDRA-20940 Extend Nodetool tablestats for dictionary memory usage [cassandra]

Reply via email to