[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282467#comment-14282467 ]

Robert Stupp edited comment on CASSANDRA-7438 at 1/19/15 12:41 PM:
-------------------------------------------------------------------

I think the best alternative for accessing malloc/free is {{Unsafe}} with 
jemalloc in LD_PRELOAD. The native code behind {{Unsafe.allocateMemory}}/
{{Unsafe.freeMemory}} is basically just a wrapper around {{malloc()}}/{{free()}}.
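
As a minimal sketch of what that approach boils down to (the class and helper 
names below are illustrative, not OHC code):

{code:java}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative only: a thin allocator on top of sun.misc.Unsafe.
// With jemalloc in LD_PRELOAD, allocateMemory()/freeMemory() end up
// calling jemalloc's malloc()/free().
public final class UnsafeAllocator
{
    private static final Unsafe UNSAFE;
    static
    {
        try
        {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        }
        catch (Exception e)
        {
            throw new AssertionError(e);
        }
    }

    static long allocate(long bytes)
    {
        return UNSAFE.allocateMemory(bytes);
    }

    static void free(long address)
    {
        UNSAFE.freeMemory(address);
    }
}
{code}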

Updated the git branch with the following changes:
* update to OHC 0.3
* benchmark: add a new command line option to specify the key length (-kl)
* free-capacity handling moved to the segments
* allow specifying the preferred memory allocator via the system property 
"org.caffinitas.ohc.allocator"
* allow specifying OHCacheBuilder defaults via system properties prefixed 
with "org.caffinitas.ohc." (see the sketch after this list)
* benchmark: make metrics local to the driver threads
* benchmark: disable the bucket histogram in stats by default
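
For illustration, both knobs could be set before the cache is created (the 
allocator property name comes from the list above; the value strings and the 
{{segmentCount}} property name are assumptions):

{code:java}
// Select the preferred native memory allocator (value string assumed):
System.setProperty("org.caffinitas.ohc.allocator", "jna");

// Override an OHCacheBuilder default via the prefix; the concrete
// property name ("segmentCount") is an assumption for illustration:
System.setProperty("org.caffinitas.ohc.segmentCount", "256");
{code}

The same settings can also be passed as {{-D}} flags on the JVM command line.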

I did not change the default number of segments (2 * number of CPUs), but I 
thought about it (since you saw some improvement with 256 segments on a 
c3.8xlarge). A naive approach like 8 * CPUs feels too heavy for small, 
single-socket systems and might be too much outside of benchmarking. If 
someone wants to get the most out of it in production and really hits the 
segment-count limit, they can always configure it explicitly. WDYT?
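
For reference, a quick sketch of how that default works out (not OHC's 
actual code):

{code:java}
// Default segment count as described above: twice the core count.
int defaultSegments = 2 * Runtime.getRuntime().availableProcessors();

// Example: a c3.8xlarge reports 32 vCPUs, so the default would be 64
// segments, while the benchmark above saw gains with an explicit 256.
{code}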

Using jemalloc on Linux via LD_PRELOAD is probably the way to go in C* (since 
off-heap memory is also used elsewhere).
I think we should keep the OS allocator on OSX.
I don't know much about allocator performance on Windows.

For now I do not plan any new features in OHC for C* - so maybe we should 
start a final review round?


> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Robert Stupp
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off-heap; keys are still stored 
> on the JVM heap as ByteBuffers.
> * There is a higher GC cost for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * The memory overhead of the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with the cache. We might want to ensure that the 
> new implementation matches the existing APIs ({{ICache}}), and the 
> implementation needs to have safe memory access, low memory overhead, and as 
> few memcpys as possible.
> We might also want to make this cache configurable.
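
A simplified sketch of the {{ICache}} contract referenced above (abbreviated 
from org.apache.cassandra.cache.ICache; the exact method set varies between 
Cassandra versions, so treat this as illustrative):

{code:java}
import java.util.Iterator;

// Abbreviated/illustrative version of org.apache.cassandra.cache.ICache -
// the contract a fully off-heap row cache implementation has to match.
public interface ICache<K, V>
{
    long capacity();
    void put(K key, V value);
    boolean putIfAbsent(K key, V value);
    V get(K key);
    void remove(K key);
    int size();
    long weightedSize();
    void clear();
    Iterator<K> keyIterator();
    Iterator<K> hotKeyIterator(int n);
    boolean containsKey(K key);
}
{code}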


