[ 
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14231132#comment-14231132
 ] 

Robert Stupp commented on CASSANDRA-7438:
-----------------------------------------

[~aweisberg], thanks for the review :)

Some of the changes you suggested are already in. Much of the code has been 
moved to OffHeapMap.
Batch cleanup has been completely removed - it's now handled inside OffHeapMap, 
which makes both the runtime behavior and the code much nicer.

I've deferred writing decent unit tests until later - the code changed too often.
And there will be a big change anyway when merging this into the C* code base, 
removing all the unused code and the Cache interface implementation.

All of the duplicated stuff has been removed - we don't need it; even for a 
general-purpose cache it would not have been useful.

bq. Key buffer might be worth a thread local sized to a high watermark
Hm - do you mean something like {{static final ThreadLocal<KeyBuffer> 
perThreadBuffer;}} inside SegmentedCacheImpl?
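Something along these lines, maybe - a per-thread scratch buffer that only ever grows, so its capacity converges to the high watermark of key sizes seen on that thread. This is just a sketch; the {{KeyBuffer}}/{{SegmentedCacheImpl}} names come from the discussion above, and the grow-only doubling policy is an assumption, not what the patch actually does:

```java
import java.util.Arrays;

// Hypothetical sketch, not the actual patch code.
final class KeyBuffer {
    byte[] array;
    KeyBuffer(int capacity) { array = new byte[capacity]; }
}

final class SegmentedCacheImpl {
    // One buffer per thread; it is never shrunk, so after warm-up its size
    // matches the largest key this thread has serialized so far.
    private static final ThreadLocal<KeyBuffer> perThreadBuffer =
            ThreadLocal.withInitial(() -> new KeyBuffer(64));

    static byte[] keyScratch(int requiredSize) {
        KeyBuffer buf = perThreadBuffer.get();
        if (buf.array.length < requiredSize)
            buf.array = Arrays.copyOf(buf.array,
                    Math.max(requiredSize, buf.array.length * 2));
        return buf.array;
    }
}
```

The upside would be that steady-state key serialization allocates nothing; the downside is that one outlier key pins a large array per thread for the thread's lifetime.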

Regarding the "line level code review" (it's fine the way you did it IMO):

bq. Don't catch exceptions
Done.

bq. 8 bytes at a time, how does it handle trailing bytes and alignment?
Trailing bytes: it falls back to a per-byte comparison. Alignment: key and 
value are aligned on 8-byte boundaries.
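The comparison scheme described above can be sketched like this (using on-heap {{ByteBuffer}}s for illustration instead of the patch's unsafe off-heap reads; {{MemCompare}} and {{equals}} are hypothetical names). Byte order doesn't matter here because we only test equality, never ordering:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of 8-bytes-at-a-time equality with a per-byte tail.
final class MemCompare {
    static boolean equals(ByteBuffer a, ByteBuffer b, int length) {
        int i = 0;
        // Compare one long (8 bytes) per iteration; in the off-heap code the
        // regions are 8-byte aligned, so these wide reads are safe.
        for (; i + 8 <= length; i += 8)
            if (a.getLong(i) != b.getLong(i))
                return false;
        // Trailing (length % 8) bytes: fall back to per-byte comparison.
        for (; i < length; i++)
            if (a.get(i) != b.get(i))
                return false;
        return true;
    }
}
```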

bq. Agrona has an Unsafe ByteBuffer implementation that looks like it makes a 
little better use of various intrinsics than AbstractDataOutput.
Good hint! Will definitely take a look at it!

bq. I don't see why the cache can't use CPU endianness since the key/value are 
just copied.
Ah - you mean that stuff in HashEntryInput/Output. No - you can't always copy 
it using the unsafe API.
I don't recall exactly why I removed that optimization (I had it implemented 
before), but it had something to do with how data is serialized for KeyBuffer 
and then put into off-heap memory.
But it makes sense for values (since these are always directly serialized to 
off-heap).

bq. UTF encoded string stuff ... get that via inheritance?
Yes, basically from JDK. Could not get that via inheritance.

bq. hashing the key yourself ... 32-bits
I thought about it (and had that previously). Yes - if the caller provides a 
good hash code, we can use it.
But I don't know whether the calling code has one.
IMO the hash code should be 64 bits, because 32 bits might not be sufficient.
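For illustration, a minimal 64-bit hash could look like the classic FNV-1a (this is just an example of a 64-bit hash, not the function the patch uses - a real implementation would likely pick something stronger and faster, such as Murmur3):

```java
// Example only: 64-bit FNV-1a over a byte array.
final class Hash64 {
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;       // FNV-1a 64-bit offset basis
        for (byte b : data) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;            // FNV-1a 64-bit prime
        }
        return h;
    }
}
```

With 2^27 buckets, a 32-bit hash leaves only 5 bits of headroom above the bucket index, which is part of why 64 bits feels safer.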

bq. allocation fails maybe throw OutOfMemoryError
That would shut down the C* daemon ;) Maybe. Not sure about that.
I think if you run into such a situation (out of off-heap/system memory), you 
are completely lost anyway.
Currently it just ignores that put() and removes the old entry.

bq. entry is too large maybe return an error of some sort
No. The calling code cannot do anything meaningful with it. But the calling 
code could check for that in advance (before constructing any object related 
to caching), if it has enough information.

bq. catch VirtualMachineError and not error
Done.

bq. hotN()
I _think_ it is used to persist the hot set of the cache.

bq. concerned about materializing the full list on heap
Agree. Thought about "patching" cache off-heap addresses into DirectByteBuffer 
and using that for serialization.

bq. I don't think storage for 16 million keys is enough?
Nope - it would not be. But it's actually 2^27 (limited by a stupid constant 
that is used for both the max number of segments and the max number of 
buckets). Worth taking a look at - it's weird, yes.

bq. value loading case,
Don't think we need that API.

bq. Release is not synchronized.
Yep - will do that.

(Hope I caught all of your comments)

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap, keys are still stored in 
> JVM heap as BB, 
> * There is a higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with cache. We might want to ensure that the new 
> implementation match the existing API's (ICache), and the implementation 
> needs to have safe memory access, low overhead in memory and less memcpy's 
> (As much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
