[ 
https://issues.apache.org/jira/browse/CASSANDRA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165306#comment-13165306
 ] 

Jonathan Ellis commented on CASSANDRA-3545:
-------------------------------------------

bq. Think about something faster that MD5 for hashing 

I've suggested a MRP (MurmurRandomPartitioner) in the past...  Murmur is 
substantially faster than MD5, especially v3 (CASSANDRA-2975), and with 
CASSANDRA-1034 done we don't need to rely on tokens being unique.  Murmur gives 
quite good hash distribution, which is the main thing we care about for 
partitioning.
                
> Fix very low Secondary Index performance
> ----------------------------------------
>
>                 Key: CASSANDRA-3545
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3545
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Evgeny Ryabitskiy
>             Fix For: 1.0.6
>
>         Attachments: 0001-3545.patch, 0002-cleanup.patch, 
> IndexSearchPerformance.png
>
>
> While performing index search + value filtering over large Index Row ( ~100k 
> keys per index value) with chunks (size of 512-1024 keys) search time is 
> about 8-12 seconds, which is very very low.
> After profiling I got this picture:
> 60% of search time is calculating MD5 hash with MessageDigester (Of cause it 
> is because of RundomPartitioner).
> 33% of search time (half of all MD5 hash calculating time) is double 
> calculating of MD5 for comparing two row keys while rotating Index row to 
> startKey (when performing search query for next chunk).
> I see several performance improvements:
> 1) Use good algorithm to search startKey in sorted collection, that is faster 
> then iteration over all keys. This solution is on first place because it 
> simple, need only local code changes and should solve problem (increase 
> search in multiple times).
> 2) Don't calculate MD5 hash for startKey every time. It's optimal to compute 
> it once (so search will be twice faster).
> Also need local code changes.
> 3) Think about something faster that MD5 for hashing (like 
> TigerRandomPartitioner with Tiger/128 hash).
> Need research and maybe this research was done.
> 4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and 
> sorting keys in index rows. In index rows, keys can be stored and compared 
> with simple Byte Comparator. 
> This solution requires huge code changes.
> I'm going to start from first solution. Next improvements can be done with 
> next tickets.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to