[ 
https://issues.apache.org/jira/browse/CASSANDRA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133122#comment-14133122
 ] 

Benedict commented on CASSANDRA-7282:
-------------------------------------

bq. I wouldn't even call this a hashCode to avoid the confusion, perhaps an 
"ordering key" or "ordering prefix"?

Creating an interface would be problematic, since we need to have our map key 
be a shared type for both CSLM keys and NBHOM keys. So I'm going to stick with 
the current situation.

If you meant purely from a documentation point of view: it is absolutely 
essential that the value is a _hash_, otherwise performance will be O(n^2). 
So whilst it may be worth clarifying, it is essential that we call it hashCode(). 

To elaborate on this in the documentation, I've included the following extra comment:

{quote}
This data structure essentially only works for keys that are first sorted by 
some hash value (and may then be sorted within those hashes arbitrarily), 
where a 32-bit prefix of the hash we sort by is returned by hashCode()
{quote}
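
As an illustration of the contract described in the quoted comment, here is a minimal sketch of a key that sorts first by a hash and whose hashCode() returns a 32-bit prefix of that hash. The class name, the 64-bit `token` field and the string tie-breaker are assumptions made for the example and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class; names and field types are illustrative only.
class HashOrderedKeyExample {

    static final class Key implements Comparable<Key> {
        final long token;   // assumed 64-bit hash of the key
        final String name;  // arbitrary tie-breaker within one hash value

        Key(long token, String name) {
            this.token = token;
            this.name = name;
        }

        // Sort by the hash first, then by anything else.
        @Override
        public int compareTo(Key other) {
            int byHash = Long.compare(token, other.token);
            return byHash != 0 ? byHash : name.compareTo(other.name);
        }

        // The top 32 bits of the hash. The arithmetic shift keeps the sign,
        // so signed int order of the prefix agrees with signed long order of the token.
        @Override
        public int hashCode() {
            return (int) (token >> 32);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return token == k.token && name.equals(k.name);
        }
    }

    // Returns true if hashCode() never decreases when the keys are in sorted order.
    static boolean prefixesFollowSortOrder(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        for (int i = 1; i < sorted.size(); i++) {
            if (sorted.get(i - 1).hashCode() > sorted.get(i).hashCode()) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Key> keys = new ArrayList<>();
        keys.add(new Key(Long.MAX_VALUE, "a"));
        keys.add(new Key(Long.MIN_VALUE, "b"));
        keys.add(new Key(-1L, "c"));
        keys.add(new Key(1L << 32, "d"));
        System.out.println(prefixesFollowSortOrder(keys));
    }
}
```

The point of the sketch is only that the value returned by hashCode() must be consistent with the sort order; a hashCode() unrelated to the ordering would break the property the map relies on.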

> Faster Memtable map
> -------------------
>
>                 Key: CASSANDRA-7282
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7282
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: profile.yaml, reads.svg, run1.svg, writes.svg
>
>
> Currently we maintain a ConcurrentSkipListMap of DecoratedKey -> Partition in 
> our memtables. Maintaining this is an O(lg(n)) operation; since the vast 
> majority of users use a hash partitioner, it occurs to me we could maintain a 
> hybrid ordered list / hash map. The list would impose the normal order on the 
> collection, but a hash index would live alongside as part of the same data 
> structure, simply mapping into the list and permitting O(1) lookups and 
> inserts.
> I've chosen to implement this initial version as a linked-list node per item, 
> but we can optimise this in future by storing fatter nodes that permit a 
> cache-line's worth of hashes to be checked at once, further reducing the 
> constant factor costs for lookups.
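
The hybrid ordered list / hash index described in the issue summary above can be sketched roughly as follows. This is a simplified, single-threaded illustration and not the implementation from the patch: it assumes a fixed number of buckets, uses one sentinel node per bucket as the entry point into the list, and ignores concurrency and resizing. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-threaded sketch; names are illustrative and not taken from the patch.
class HashIndexedListSketch {

    static final class Node {
        final int hash;
        final String key;
        String value;
        Node next;
        final boolean sentinel;

        Node(int hash, String key, String value, boolean sentinel) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.sentinel = sentinel;
        }
    }

    private final Node[] buckets; // hash index: one sentinel per bucket, all linked into one list
    private final int shift;

    HashIndexedListSketch(int bits) {
        if (bits < 1 || bits > 16) throw new IllegalArgumentException("bits must be in [1, 16]");
        buckets = new Node[1 << bits];
        shift = 32 - bits;
        Node prev = null;
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new Node(0, null, null, true);
            if (prev != null) prev.next = buckets[i];
            prev = buckets[i];
        }
    }

    // Flip the sign bit so that signed hash order matches bucket order, then take the top bits.
    private Node bucketFor(int hash) {
        return buckets[(hash ^ Integer.MIN_VALUE) >>> shift];
    }

    private static int compare(Node n, int hash, String key) {
        int c = Integer.compare(n.hash, hash);
        return c != 0 ? c : n.key.compareTo(key);
    }

    // Jump to the bucket via the index, then walk the short run of nodes in that bucket.
    public String put(int hash, String key, String value) {
        Node prev = bucketFor(hash);
        Node cur = prev.next;
        while (cur != null && !cur.sentinel && compare(cur, hash, key) < 0) {
            prev = cur;
            cur = cur.next;
        }
        if (cur != null && !cur.sentinel && compare(cur, hash, key) == 0) {
            String old = cur.value;
            cur.value = value;
            return old;
        }
        Node node = new Node(hash, key, value, false);
        node.next = cur;
        prev.next = node;
        return null;
    }

    public String get(int hash, String key) {
        Node cur = bucketFor(hash).next;
        while (cur != null && !cur.sentinel) {
            int c = compare(cur, hash, key);
            if (c == 0) return cur.value;
            if (c > 0) return null;
            cur = cur.next;
        }
        return null;
    }

    // Walking the list from the first sentinel yields the keys in (hash, key) order.
    public List<String> keysInOrder() {
        List<String> out = new ArrayList<>();
        for (Node n = buckets[0]; n != null; n = n.next) {
            if (!n.sentinel) out.add(n.key);
        }
        return out;
    }

    public static void main(String[] args) {
        HashIndexedListSketch map = new HashIndexedListSketch(4);
        map.put(3, "c", "v3");
        map.put(Integer.MIN_VALUE, "a", "v1");
        map.put(-5, "b", "v2");
        System.out.println(map.keysInOrder());
    }
}
```

With well-distributed hashes and a bucket count close to the number of entries, each lookup or insert touches only a few nodes, which is where the expected O(1) cost comes from, while the single linked list still provides ordered iteration.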



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
