[ https://issues.apache.org/jira/browse/CASSANDRA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Evgeny Ryabitskiy updated CASSANDRA-3545:
-----------------------------------------
    Attachment: CASSANDRA-3545_v2.patch

New version of the patch, with bug fixes and cleaner code.

> Fix very low Secondary Index performance
> ----------------------------------------
>
>                 Key: CASSANDRA-3545
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3545
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Evgeny Ryabitskiy
>             Fix For: 1.0.6
>
>         Attachments: CASSANDRA-3545.patch, CASSANDRA-3545_v2.patch,
>                      IndexSearchPerformance.png
>
>
> While performing an index search plus value filtering over a large index row
> (~100k keys per index value) in chunks of 512-1024 keys, search time is
> about 8-12 seconds, which is very slow.
> After profiling I got this picture:
> 60% of search time is spent calculating MD5 hashes with MessageDigest (of
> course, this is because of RandomPartitioner).
> 33% of search time (half of all MD5 hash calculation time) is spent on the
> double calculation of MD5 when comparing two row keys while rotating the
> index row to startKey (when performing the search query for the next chunk).
> I see several possible performance improvements:
> 1) Use a good algorithm to search for startKey in the sorted collection, one
> that is faster than iterating over all keys. This solution comes first
> because it is simple, needs only local code changes, and should solve the
> problem (making search several times faster).
> 2) Don't calculate the MD5 hash for startKey every time. It is better to
> compute it once (so search will be twice as fast).
> This also needs only local code changes.
> 3) Think about something faster than MD5 for hashing (like a
> TigerRandomPartitioner with the Tiger/128 hash).
> This needs research, and maybe that research has already been done.
> 4) Don't use Tokens (with an MD5 hash for RandomPartitioner) for comparing
> and sorting keys in index rows. In index rows, keys can be stored and
> compared with a simple byte comparator.
> This solution requires huge code changes.
> I'm going to start with the first solution. The other improvements can be
> done in later tickets.
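
Improvements 1) and 2) from the description can be illustrated with a minimal Java sketch. It is not taken from the attached patches. It assumes the keys of an index row are held in a list already sorted by a token derived from the MD5 digest (loosely modelled on RandomPartitioner), and the names StartKeySearch, md5Token and lowerBound are hypothetical. The token of startKey is computed once by the caller, and a binary search replaces the linear scan, so each lookup hashes roughly log2(n) keys instead of all of them.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Cassandra.
final class StartKeySearch {

    // Assumed stand-in for a partitioner token: the MD5 digest of the key,
    // interpreted as a non-negative BigInteger.
    static BigInteger md5Token(byte[] key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(md.digest(key)).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Binary search for the first key whose token is >= startToken.
    // sortedKeys must already be ordered by md5Token. startToken is hashed
    // once by the caller, so only the probed keys are hashed here.
    static int lowerBound(List<byte[]> sortedKeys, BigInteger startToken) {
        int lo = 0;
        int hi = sortedKeys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (md5Token(sortedKeys.get(mid)).compareTo(startToken) < 0) {
                lo = mid + 1;   // startKey lies to the right of mid
            } else {
                hi = mid;       // mid is a candidate, keep searching left
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Build a toy "index row" of 100k keys sorted by token.
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add(("key" + i).getBytes(StandardCharsets.UTF_8));
        }
        keys.sort(Comparator.comparing(StartKeySearch::md5Token));

        // Hash startKey once, then locate the start of the next chunk.
        byte[] startKey = keys.get(51_200);
        BigInteger startToken = md5Token(startKey);
        System.out.println("chunk starts at index " + lowerBound(keys, startToken));
    }
}
```

If the tokens of the stored keys were cached alongside the keys, the remaining hashing inside the search loop would disappear as well. Whether that fits the actual index row data structures is not something this sketch establishes.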