[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229284#comment-14229284 ]
Robert Stupp edited comment on CASSANDRA-7438 at 12/1/14 10:14 AM:
-------------------------------------------------------------------

Have pushed the latest changes of OHC to https://github.com/snazy/ohc. It has been nearly completely rewritten.

Architecture (in brief):
* OHC consists of multiple segments (default: 2 x #CPUs). Fewer segments lead to more contention; more segments give no measurable improvement.
* Each segment consists of an off-heap hash map (defaults: table-size=8192, load-factor=.75). The hash table requires 8 bytes per bucket.
* Hash entries in a bucket are organized in a double-linked list.
* The LRU replacement policy is built in via its own double-linked list.
* Critical sections that exclusively lock a segment are pretty short (code + CPU) - just the 'synchronized' keyword, no StampedLock/ReentrantLock.
* Capacity for the cache is configured globally and managed "locally" in each segment.
* Eviction (or "replacement" or "cleanup") is triggered when free capacity drops below a trigger value and cleans up until a target free capacity is reached (see the eviction sketch below).
* Uses murmur hash on the serialized key. The most significant bits select the segment, the least significant bits the bucket in the segment's hash map (see the hashing sketch below).

Non-production relevant stuff:
* Off-heap access can be started in a "debug" mode that checks for accesses outside the allocated region and throws exceptions instead of causing SIGSEGV or jemalloc errors (see the debug-mode sketch below).
* ohc-benchmark has been updated to reflect the changes.

About the replacement policy: Currently LRU is built in - but I'm not really sold on LRU as is. Alternatives could be:
* timestamp (not sold on this either - basically the same as LRU)
* LIRS (https://en.wikipedia.org/wiki/LIRS_caching_algorithm) - big overhead (space)
* 2Q (counts accesses, divides the counters regularly)
* LRU+random (50/50) - may give the same result as LIRS, but without LIRS' overhead (see the LRU+random sketch below)

But replacing LRU with something else is out of scope for this ticket and should be done with real workloads in C* - although the last one is "just" an additional config parameter.

IMO we should add a per-table option that configures whether the row cache receives data on reads+writes or just on reads. That might prevent garbage in the cache caused by write-heavy tables.

{{Unsafe.allocateMemory()}} gives about a 5-10% performance improvement compared to jemalloc. The reason might be the JNA library, which has some synchronized blocks in it (see the allocation sketch below).

IMO OHC is ready to be merged into the C* code base.
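To make the trigger/target eviction explicit, here is a minimal sketch - all names ({{freeCapacity}}, {{triggerFree}}, {{targetFree}}, {{removeLeastRecentlyUsedEntry()}}) are illustrative assumptions, not OHC's actual API:

{code:java}
// Minimal sketch of the trigger/target eviction idea; names are illustrative,
// not OHC's actual API.
final class SegmentEvictionSketch
{
    long freeCapacity;        // bytes currently unused in this segment
    final long triggerFree;   // eviction starts when freeCapacity drops below this
    final long targetFree;    // eviction stops once freeCapacity is back at/above this

    SegmentEvictionSketch(long capacity, long triggerFree, long targetFree)
    {
        this.freeCapacity = capacity;
        this.triggerFree = triggerFree;
        this.targetFree = targetFree;
    }

    // short critical section - just 'synchronized', no StampedLock/ReentrantLock
    synchronized void maybeEvict()
    {
        if (freeCapacity >= triggerFree)
            return;                                      // still enough free capacity
        while (freeCapacity < targetFree)
        {
            long freed = removeLeastRecentlyUsedEntry(); // unlink LRU tail, free its off-heap block
            if (freed == 0)
                break;                                   // segment is empty
            freeCapacity += freed;
        }
    }

    // placeholder: would unlink the tail of the LRU double-linked list, free the
    // entry's off-heap memory and return the number of bytes freed
    long removeLeastRecentlyUsedEntry()
    {
        return 0L;
    }
}
{code}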
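The hash splitting can be sketched like this (assuming, for illustration, that the segment count is rounded up to a power of two so plain shifts and masks work):

{code:java}
// Sketch of splitting a 64-bit murmur hash: most significant bits select the
// segment, least significant bits select the bucket. Assumes a power-of-two
// segment count; this is an illustration, not the actual OHC code.
final class HashDispatchSketch
{
    static final int SEGMENTS = nextPowerOfTwo(2 * Runtime.getRuntime().availableProcessors());
    static final int TABLE_SIZE = 8192; // default per-segment hash table size
    static final int SEGMENT_SHIFT = 64 - Integer.numberOfTrailingZeros(SEGMENTS);

    static int nextPowerOfTwo(int n)
    {
        return n <= 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    // most significant bits of the hash -> segment index (0 .. SEGMENTS-1)
    static int segmentIndex(long hash)
    {
        return (int) (hash >>> SEGMENT_SHIFT);
    }

    // least significant bits of the hash -> bucket index in the segment's table
    static int bucketIndex(long hash)
    {
        return (int) (hash & (TABLE_SIZE - 1));
    }
}
{code}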
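The "debug" mode boils down to bounds-checking every access against the allocated region; a hypothetical sketch:

{code:java}
// Hypothetical sketch of the "debug" access mode: every access is checked
// against the allocated region and throws instead of crashing the JVM with
// SIGSEGV (or corrupting jemalloc structures). Names are made up.
final class GuardedRegionSketch
{
    private final long base; // start address of the allocated off-heap block
    private final long size; // length of the block in bytes

    GuardedRegionSketch(long base, long size)
    {
        this.base = base;
        this.size = size;
    }

    // returns the absolute address for an access of 'accessLen' bytes at 'offset',
    // or throws if the access would leave the allocated region
    long checkedAddress(long offset, int accessLen)
    {
        if (offset < 0 || offset + accessLen > size)
            throw new IllegalStateException("off-heap access outside allocated region: offset=" + offset
                                            + ", len=" + accessLen + ", size=" + size);
        return base + offset;
    }
}
{code}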
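The LRU+random idea is really just a coin flip per eviction; sketched here with hypothetical helpers:

{code:java}
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of the "LRU+random (50/50)" policy; evictLruTail() and
// evictRandomEntry() are assumed helpers, not existing OHC methods.
final class LruRandomSketch
{
    void evictOne()
    {
        if (ThreadLocalRandom.current().nextBoolean())
            evictLruTail();      // 50%: classic LRU - drop the tail of the access list
        else
            evictRandomEntry();  // 50%: drop an arbitrary entry, no LIRS-style bookkeeping
    }

    void evictLruTail()     { /* unlink the LRU tail entry and free it */ }
    void evictRandomEntry() { /* pick a random bucket/entry and free it */ }
}
{code}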
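For reference, the {{Unsafe}} allocation path is the usual reflection-based pattern (illustrative sketch, not the exact OHC code) - no JNA call on the allocation path, hence no synchronized block:

{code:java}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Sketch of allocating/freeing off-heap memory via sun.misc.Unsafe instead of
// jemalloc through JNA. Obtaining the instance reflectively is the usual
// (unsupported) pattern.
final class UnsafeAllocatorSketch
{
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe()
    {
        try
        {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        }
        catch (Exception e)
        {
            throw new AssertionError(e);
        }
    }

    static long allocate(long bytes)
    {
        return UNSAFE.allocateMemory(bytes); // raw address, not tracked by the GC
    }

    static void free(long address)
    {
        UNSAFE.freeMemory(address);
    }
}
{code}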
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
> Key: CASSANDRA-7438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Linux
> Reporter: Vijay
> Assignee: Vijay
> Labels: performance
> Fix For: 3.0
> Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
> Currently SerializingCache is partially off heap; keys are still stored in the JVM heap as BB.
> * There is a higher GC cost for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better results, but this requires careful tuning.
> * The memory overhead of the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off heap and use JNI to interact with the cache. We might want to ensure that the new implementation matches the existing APIs (ICache), and the implementation needs to have safe memory access, low memory overhead and as few memcpys as possible.
> We might also want to make this cache configurable.