[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274005#comment-14274005 ]
Ariel Weisberg edited comment on CASSANDRA-7438 at 1/12/15 7:28 PM:
--------------------------------------------------------------------
If you go all the way down the JMH rabbit hole, you don't need to do any of your own timing; JMH will actually do some smart things to give you accurate timing and to ameliorate the impact of non-scalable or expensive timing measurement. Metrics uses System.nanoTime() internally, so it isn't really any better as far as I can tell. System.nanoTime() on Linux is pretty scalable (http://shipilev.net/blog/2014/nanotrusting-nanotime/). When I tested it in JMH it actually seemed to be linearly scalable, but JMH will solve that for you even on platforms where nanoTime is finicky.

The C* integration looks good. I'm glad it was easy.

When it comes to exposing configuration parameters, less is more. I would prefer not to expose anything new, because once people start using options they don't like to have them taken away (or disabled). We should make an effort to set them automatically (or well enough), and if that fails we can add user-visible configuration. My preference is to make the options accessible via system properties as an escape hatch in production, and then add them to the config only if we really can't set them automatically. Can you prefix any system properties you have with a class name, package, or something else that makes it clear they are part of OHC?

The stress tool, when used without workload profiles, does some validation: it checks that values are present and that their contents are correct.

I did not know about the JNA synchronized block. That is surprising, but I am glad to hear it is getting fixed.

For access to jemalloc I recommend using Unsafe and LD_PRELOADing jemalloc. I think that would be the recommended approach and the one you should benchmark against, with JNA as a fallback. That gives you a single JNI call for allocation/deallocation.

I am trying out the JMH benchmark and looking at the new linked implementation right now.
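As a rough illustration of the timing-overhead point above, here is a minimal sketch that measures the per-call cost of System.nanoTime() itself. This is only illustrative; JMH does this far more rigorously (forking, warmup phases, dead-code-elimination guards, multi-threaded scaling), which is exactly why hand-rolled timing isn't needed.

```java
// Sketch: estimate the cost of a System.nanoTime() call.
// JMH handles all of this properly; this only illustrates the idea.
public class NanoTimeCost {
    public static void main(String[] args) {
        final int iterations = 10_000_000;
        long sink = 0;
        // Warm up so the JIT compiles the loop before we measure.
        for (int i = 0; i < iterations; i++) {
            sink += System.nanoTime();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.nanoTime();
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("approx ns per nanoTime() call: "
                + (double) elapsed / iterations);
        // Consume sink so the measured loop cannot be optimized away.
        if (sink == 42) System.out.println(sink);
    }
}
```

On Linux this typically lands in the tens of nanoseconds per call; the Shipilev post linked above covers the per-platform behavior in depth.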
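The namespaced-property suggestion could look like the sketch below. The prefix and property name here are illustrative, not actual OHC option names; the point is only that a shared prefix makes the properties' ownership obvious and grep-able.

```java
// Sketch: reading tunables from namespaced system properties.
// The prefix and property names are hypothetical examples.
public class OhcProperties {
    static final String PREFIX = "org.caffinitas.ohc.";

    // Look up PREFIX + name, falling back to a computed default.
    static long longProperty(String name, long defaultValue) {
        String v = System.getProperty(PREFIX + name);
        return v != null ? Long.parseLong(v) : defaultValue;
    }

    public static void main(String[] args) {
        // Set via e.g.:  -Dorg.caffinitas.ohc.segmentCount=64
        long segments = longProperty("segmentCount",
                2L * Runtime.getRuntime().availableProcessors());
        System.out.println("segmentCount = " + segments);
    }
}
```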
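The Unsafe-plus-LD_PRELOAD approach might look like the following sketch. Unsafe.allocateMemory/freeMemory bottom out in malloc/free through a single JNI hop, so starting the JVM with jemalloc preloaded (e.g. LD_PRELOAD=/usr/lib/libjemalloc.so java ...) routes these calls through jemalloc with no JNA locking in the path. The library path is an assumption; it varies by distribution.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch: off-heap allocation via Unsafe. With jemalloc in
// LD_PRELOAD, allocateMemory/freeMemory hit jemalloc's
// malloc/free directly, bypassing JNA entirely.
public class UnsafeAlloc {
    private static final Unsafe UNSAFE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        long address = UNSAFE.allocateMemory(1024);   // malloc(1024)
        try {
            UNSAFE.putLong(address, 0xCAFEBABEL);
            System.out.println(Long.toHexString(UNSAFE.getLong(address)));
        } finally {
            UNSAFE.freeMemory(address);               // free(address)
        }
    }
}
```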
How are you starting the JMH benchmark?
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                  Key: CASSANDRA-7438
>                  URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>              Project: Cassandra
>           Issue Type: Improvement
>           Components: Core
>          Environment: Linux
>             Reporter: Vijay
>             Assignee: Robert Stupp
>               Labels: performance
>              Fix For: 3.0
>
>          Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored in the JVM heap as ByteBuffers.
> * There is a higher GC cost for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * The memory overhead of cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead, and as few memcpys as possible.
> We might also want to make this cache configurable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)