[ 
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231627#comment-14231627
 ] 

Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

Robert, I don't seem to be getting the latest code for your work on master. For 
instance, the key comparison code compares 8 bytes at a time and, as far as I 
can tell, doesn't handle trailing bytes.
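
To make the trailing-bytes concern concrete, here is a minimal sketch of a key comparison that works on 8-byte words and then finishes the remaining 0 to 7 bytes one at a time. It is not the code from the patch: the class name KeyCompareSketch and the method keysEqual are hypothetical, and it assumes both buffers use the same byte order (the default for ByteBuffer.wrap).

```java
import java.nio.ByteBuffer;

// Hypothetical illustration; not taken from the patch under review.
final class KeyCompareSketch {

    /**
     * Returns true if the remaining bytes of both buffers are identical.
     * Uses absolute reads, so buffer positions are left unchanged.
     * Assumes both buffers have the same byte order.
     */
    static boolean keysEqual(ByteBuffer a, ByteBuffer b) {
        int len = a.remaining();
        if (len != b.remaining()) {
            return false;
        }
        int pa = a.position();
        int pb = b.position();
        int i = 0;
        // Bulk phase: compare whole 8-byte words.
        for (; i + 8 <= len; i += 8) {
            if (a.getLong(pa + i) != b.getLong(pb + i)) {
                return false;
            }
        }
        // Tail phase: compare the trailing bytes that do not fill a word.
        for (; i < len; i++) {
            if (a.get(pa + i) != b.get(pb + i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
        byte[] y = x.clone();
        y[10] = 0; // differs only in the trailing bytes
        System.out.println(keysEqual(ByteBuffer.wrap(x), ByteBuffer.wrap(x.clone())));
        System.out.println(keysEqual(ByteBuffer.wrap(x), ByteBuffer.wrap(y)));
    }
}
```

Without the tail loop, two 11-byte keys that differ only in the last three bytes would compare as equal.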

To Vijay's point: a pseudo-random test against the map would help build 
confidence in it. It would run, say, 200 million operations against a keyspace 
of several million entries, mirror each operation on a regular hash map, and 
periodically check that both have the same contents. Size the map so the LRU 
never evicts anything. Print the seed at the beginning of the test so a failure 
can be reproduced. I think this basically duplicates the benchmark, but having 
it as a unit test is nice. We can tune the number of operations and keys down 
for running in CI. You could also look at the unit tests for Guava's cache or 
j.u.HashMap and borrow those. The nice thing about data structure APIs is that 
the tests already exist.
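
A rough sketch of such a mirrored test follows, under some assumptions: the map under test is reached through the java.util.Map interface, and a ConcurrentHashMap stands in for it in main. The class name MirrorMapTestSketch and the parameter names are hypothetical, and the operation counts are scaled far below 200 million.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a seeded, mirrored map test.
final class MirrorMapTestSketch {

    /** Applies random operations to both maps and returns the final size. */
    static int run(Map<Long, Long> underTest, long seed, int operations,
                   int keySpace, int checkEvery) {
        // Print the seed first so a failing run can be reproduced.
        System.out.println("seed = " + seed);
        Random rnd = new Random(seed);
        Map<Long, Long> reference = new HashMap<>();

        for (int op = 1; op <= operations; op++) {
            long key = rnd.nextInt(keySpace);
            switch (rnd.nextInt(3)) {
                case 0: {
                    long value = rnd.nextLong();
                    check(reference.put(key, value), underTest.put(key, value), seed, op);
                    break;
                }
                case 1:
                    check(reference.get(key), underTest.get(key), seed, op);
                    break;
                default:
                    check(reference.remove(key), underTest.remove(key), seed, op);
                    break;
            }
            // Periodic full comparison of the contents.
            if (op % checkEvery == 0) {
                check(reference, underTest, seed, op);
            }
        }
        check(reference, underTest, seed, operations);
        return reference.size();
    }

    private static void check(Object expected, Object actual, long seed, int op) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError("seed " + seed + ", op " + op
                    + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // ConcurrentHashMap is only a stand-in for the map under test.
        int size = run(new ConcurrentHashMap<>(), System.nanoTime(),
                1_000_000, 10_000, 100_000);
        System.out.println("final size = " + size);
    }
}
```

Because every failure message carries the seed and the operation index, a CI failure can be replayed locally by passing the same seed.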

bq. Yes, basically from JDK. Could not get that via inheritance.
What are the licensing and attribution requirements for that code?

bq. IMO hash code should be 64 bits because 32 bits might not be sufficient.
[~benedict] might have some opinions on how to get the best bits out of 
MurmurHash3. 32 bits covers 256-512 gigabytes of cache for 128-byte entries, 
which is not bad. I don't feel strongly either way, since I don't know whether 
callers will have the hash precomputed.
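
The arithmetic behind that range can be sketched as below, assuming one 128-byte entry per distinct hash value and reading the lower bound as 31 usable bits (a non-negative Java int) and the upper bound as the full 32 bits. That reading is an assumption, and the class name HashCapacitySketch is hypothetical.

```java
// Hypothetical illustration of the capacity estimate; not code from the patch.
final class HashCapacitySketch {

    /** Bytes covered if each distinct hash value maps to one entry. */
    static long addressableBytes(int hashBits, long entryBytes) {
        return (1L << hashBits) * entryBytes;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 2^31 * 128 bytes = 2^38 bytes
        System.out.println(addressableBytes(31, 128) / gib + " GiB");
        // 2^32 * 128 bytes = 2^39 bytes
        System.out.println(addressableBytes(32, 128) / gib + " GiB");
    }
}
```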

bq. Nope - would not be. But it's 2^27 (limited by a stupid constant used for 
both max# of segments and max# of buckets). Worth taking a look at it - it's 
weird, yes.
OffHeapMap line 222 seems to have a gate that prevents rehashing to more than 
2^24 buckets.

bq. (Hope I caught all of your comments)
I'll check them once you update.

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored 
> in the JVM heap as BB:
> * GC costs are higher for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with the cache. We might want to ensure that the 
> new implementation matches the existing APIs (ICache), and the implementation 
> needs to have safe memory access, low memory overhead, and as few memcpys as 
> possible.
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
