[ https://issues.apache.org/jira/browse/CASSANDRA-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754387#comment-13754387 ]
Vijay commented on CASSANDRA-5939:
----------------------------------

{quote}
While java has overhead, it's not...
{quote}

Well, try the following code in CacheProviderTest:

{code}
@Test
public void testCompareSizes() throws IOException
{
    RowCacheKey key = new RowCacheKey(UUID.randomUUID(), ByteBufferUtil.bytes("test"));
    ColumnFamily cf = createCF();

    // On-heap sizes as reported by memorySize() for the key and the value.
    System.out.println("size:" + (key.memorySize() + cf.memorySize()));
    System.out.println("key size:" + key.memorySize());
    System.out.println("value size:" + cf.memorySize());

    // Serialized size of the same row.
    RowCacheSerializer serializer = new RowCacheSerializer();
    DataOutputBuffer out = new DataOutputBuffer();
    serializer.serialize(cf, out);
    System.out.println("ser size:" + out.getLength());

    // Read back only the bytes that were written; getData() returns the whole backing array.
    IRowCacheEntry cf2 = serializer.deserialize(new DataInputStream(new ByteArrayInputStream(out.getData(), 0, out.getLength())));
    Assert.assertEquals(cf, cf2);
}
{code}

Output (the value/CF figure comes from memorySize(), which uses JAMM's measureDeep()):

{code}
size:74120
key size:48
value size:74072
ser size:66
{code}

I am just trying to figure out whether there is a bug I am missing or overlooking.

I agree that we need a configuration option for the key size in the JVM heap to guard against OOMs, etc. We can use this ticket to solve that issue. I do understand that we have removed CLHM in 2.0, so we can concentrate on getting a better configuration for SC.

> Cache Providers calculate very different row sizes
> --------------------------------------------------
>
>                 Key: CASSANDRA-5939
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5939
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.8
>            Reporter: Chris Burroughs
>            Assignee: Vijay
>
> Took the same production node and bounced it 4 times comparing version and
> cache provider. ConcurrentLinkedHashCacheProvider and
> SerializingCacheProvider produce very different results resulting in an order
> of magnitude difference in rows cached. In all cases the row cache size was
> 2048 MB.
> Hit rate is provided for color, but entries & size are the important part.
>
> 1.2.8 ConcurrentLinkedHashCacheProvider:
> * entries: 23,217
> * hit rate: 43%
> * size: 2,147,398,344
>
> 1.2.8 about 20 minutes of SerializingCacheProvider:
> * entries: 221,709
> * hit rate: 68%
> * size: 18,417,254
>
> 1.2.5 ConcurrentLinkedHashCacheProvider:
> * entries: 25,967
> * hit rate: ~ 50%
> * size: 2,147,421,704
>
> 1.2.5 about 20 minutes of SerializingCacheProvider:
> * entries: 228,457
> * hit rate: ~ 70%
> * size: 19,070,315
>
> A related(?) problem is that the ConcurrentLinkedHashCacheProvider sizes seem
> to be highly variable. Digging up the values for 5 different nodes in the
> cluster using ConcurrentLinkedHashCacheProvider shows a wide variance in
> number of entries:
> * 12k
> * 444k
> * 10k
> * 25k
> * 25k
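
For a rough sense of scale, the reported figures can be reduced to an average accounted size per cached row. The following is a minimal Java sketch, not Cassandra code: the class name RowCacheMath and the method bytesPerEntry are hypothetical, and the inputs are the 1.2.5 numbers from the description plus the value and serialized sizes from the test output in the comment.

```java
public class RowCacheMath {
    // Average bytes accounted per cached row: reported cache size / entry count.
    // Hypothetical helper for illustration only.
    public static long bytesPerEntry(long sizeBytes, long entries) {
        if (entries <= 0) {
            throw new IllegalArgumentException("entries must be positive");
        }
        return sizeBytes / entries; // integer division, rounds down
    }

    public static void main(String[] args) {
        // 1.2.5 figures quoted in the issue description.
        long clhm = bytesPerEntry(2_147_421_704L, 25_967L);
        long sc = bytesPerEntry(19_070_315L, 228_457L);

        // Single-row test output from the comment:
        // deep-measured value size versus serialized size.
        long deepVsSerialized = 74_072L / 66L;

        System.out.println("CLHM bytes/entry: " + clhm);
        System.out.println("SC bytes/entry: " + sc);
        System.out.println("deep/serialized ratio: " + deepVsSerialized);
    }
}
```

With these inputs the sketch reports 82,698 bytes per entry for ConcurrentLinkedHashCacheProvider and 83 for SerializingCacheProvider, and the single-row test shows the deep-measured value at roughly 1,122 times its serialized size. Note that the SerializingCacheProvider runs had only about 20 minutes to fill, so its per-entry figure is an average over a cache that was not yet at capacity.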