[ https://issues.apache.org/jira/browse/CASSANDRA-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754387#comment-13754387 ]

Vijay commented on CASSANDRA-5939:
----------------------------------

{quote}
While java has overhead, it's not...
{quote}

Well, try the following code in CacheProviderTest:

{code}
    @Test
    public void testCompareSizes() throws IOException
    {
        RowCacheKey key = new RowCacheKey(UUID.randomUUID(), ByteBufferUtil.bytes("test"));
        ColumnFamily cf = createCF();
        System.out.println("size:" + (key.memorySize() + cf.memorySize()));
        System.out.println("key size:" + key.memorySize());
        System.out.println("value size:" + cf.memorySize());
        RowCacheSerializer serializer = new RowCacheSerializer();
        DataOutputBuffer out = new DataOutputBuffer();
        serializer.serialize(cf, out);
        System.out.println("ser size:" + out.getLength());

        IRowCacheEntry cf2 = serializer.deserialize(new DataInputStream(new ByteArrayInputStream(out.getData())));
        Assert.assertEquals(cf, cf2);
    }
{code}

Output (the value/CF memorySize is a deep measurement via JAMM's measureDeep()):

{code}
size:74120
key size:48
value size:74072
ser size:66
{code}
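Part of the gap between the deep-measured in-heap size and the serialized size is expected: every live object carries a header, references, and alignment padding that serialization does not write. A toy sketch of that effect, outside Cassandra, using plain Java serialization of many tiny values; the per-entry overhead constant is a rough assumption, not a measured JVM number:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class OverheadDemo {
    // Assumed rough per-String on-heap overhead on a 64-bit JVM:
    // object header + fields + a reference + the backing array header.
    // The exact figure varies by JVM and flags; 48 is just illustrative.
    static final long PER_ENTRY_HEAP_OVERHEAD = 48;

    // Serialize n tiny 2-char strings; writeUTF emits a 2-byte length
    // prefix plus the payload bytes, and nothing per-object.
    static byte[] serialize(int n) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (int i = 0; i < n; i++)
            out.writeUTF("v" + (i % 10));
        out.flush();
        return buf.toByteArray();
    }

    // Rough in-heap estimate for the same n strings: fixed overhead
    // per object plus 2 bytes per char for the payload.
    static long estimateHeap(int n) {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += PER_ENTRY_HEAP_OVERHEAD + 2L * 2;
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("serialized: " + serialize(1000).length + " bytes");
        System.out.println("estimated heap: " + estimateHeap(1000) + " bytes");
    }
}
```

Even with this conservative estimate the in-heap footprint is over 10x the serialized form, so SerializingCache reporting far smaller sizes than CLHM is directionally expected; the question in this ticket is whether a 1000x gap is still plausible or a bug.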

I am just trying to figure out whether there is a bug I am missing/overlooking. I 
agree that we need a configuration for the size of keys kept in the JVM heap, to 
contain OOMs and the like. 
We can use this ticket to solve that issue. I do understand that we removed 
CLHM in 2.0, so we can concentrate on getting a better configuration for SC.
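The kind of heap cap discussed above can be sketched as a cache bounded by the summed byte weight of its entries rather than by entry count. This is a minimal illustration, not Cassandra's implementation; the class, the Weigher interface, and the overhead numbers in main are all hypothetical:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: an LRU cache that evicts once the total estimated byte weight
// of its entries exceeds a configured cap, so on-heap key/value overhead
// cannot grow unbounded and cause OOMs.
class WeightBoundedCache<K, V> {
    interface Weigher<K, V> { long weigh(K key, V value); }

    private final long maxWeightBytes;
    private final Weigher<K, V> weigher;
    private final LinkedHashMap<K, V> map;
    private long currentWeight = 0;

    WeightBoundedCache(long maxWeightBytes, Weigher<K, V> weigher) {
        this.maxWeightBytes = maxWeightBytes;
        this.weigher = weigher;
        // accessOrder=true makes iteration start at the least-recently-used entry
        this.map = new LinkedHashMap<>(16, 0.75f, true);
    }

    synchronized void put(K key, V value) {
        V old = map.put(key, value);
        if (old != null)
            currentWeight -= weigher.weigh(key, old);
        currentWeight += weigher.weigh(key, value);
        // evict LRU entries until we are back under the weight cap
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (currentWeight > maxWeightBytes && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            currentWeight -= weigher.weigh(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    synchronized V get(K key) { return map.get(key); }
    synchronized int size() { return map.size(); }

    public static void main(String[] args) {
        // Assumed weights: 16 bytes of fixed overhead plus the payload length.
        WeightBoundedCache<String, byte[]> cache =
            new WeightBoundedCache<>(100, (k, v) -> 16L + v.length);
        for (int i = 0; i < 10; i++)
            cache.put("row" + i, new byte[20]); // 36 bytes each, so only 2 fit
        System.out.println("entries retained: " + cache.size());
    }
}
```

The real tradeoff, as in this ticket, is in the Weigher: an accurate deep measurement (JAMM) is expensive, while a cheap estimate can be off by orders of magnitude.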
                
> Cache Providers calculate very different row sizes
> --------------------------------------------------
>
>                 Key: CASSANDRA-5939
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5939
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.2.8
>            Reporter: Chris Burroughs
>            Assignee: Vijay
>
> Took the same production node and bounced it 4 times comparing version and 
> cache provider.  ConcurrentLinkedHashCacheProvider and 
> SerializingCacheProvider produce very different results resulting in an order 
> of magnitude difference in rows cached.  In all cases the row cache size was 
> 2048 MB.  Hit rate is provided for color, but entries & size are the 
> important part.
> 1.2.8 ConcurrentLinkedHashCacheProvider:
>  * entries: 23,217
>  * hit rate: 43%
>  * size: 2,147,398,344
> 1.2.8 about 20 minutes of SerializingCacheProvider:
>  * entries: 221,709
>  * hit rate: 68%
>  * size: 18,417,254
> 1.2.5 ConcurrentLinkedHashCacheProvider:
>  * entries: 25,967
>  * hit rate: ~ 50%
>  * size:  2,147,421,704
> 1.2.5 about 20 minutes of SerializingCacheProvider:
>  * entries: 228,457
>  * hit rate: ~ 70%
>  * size: 19,070,315
> A related(?) problem is that the ConcurrentLinkedHashCacheProvider sizes seem 
> to be highly variable.  Digging up the values for 5 different nodes in the 
> cluster using ConcurrentLinkedHashCacheProvider shows a wide variance in 
> number of entries:
>  * 12k
>  * 444k
>  * 10k
>  * 25k
>  * 25k

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
