[ https://issues.apache.org/jira/browse/CASSANDRA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165407#comment-13165407 ]
Hudson commented on CASSANDRA-3545:
-----------------------------------

Integrated in Cassandra #1248 (See [https://builds.apache.org/job/Cassandra/1248/])
    Improve memtable slice iteration performance
patch by slebresne; reviewed by jbellis for CASSANDRA-3545

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211999
Files :
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ArrayBackedSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CollationController.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ISortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Memtable.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowIteratorFactory.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ThreadSafeSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/IFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/NamesQueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/index/keys/KeysSearcher.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/RowRepairResolver.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ArrayBackedSortedColumnsTest.java


> Fix very low Secondary Index performance
> ----------------------------------------
>
>                 Key: CASSANDRA-3545
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3545
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Evgeny Ryabitskiy
>            Assignee: Sylvain Lebresne
>             Fix For: 1.0.6
>
>         Attachments:
>                      0001-3545.patch, 0002-cleanup.patch
>
>
> While performing index search + value filtering over a large index row
> (~100k keys per index value) in chunks of 512-1024 keys, search time is
> about 8-12 seconds, which is very slow.
> After profiling I got this picture:
> 60% of search time is spent calculating MD5 hashes with MessageDigest (of
> course, this is because of RandomPartitioner).
> 33% of search time (half of all MD5 calculation time) is spent calculating
> MD5 twice to compare two row keys while scanning the index row forward to
> startKey (when performing the search query for the next chunk).
> I see several possible performance improvements:
> 1) Use a good algorithm to find startKey in the sorted collection, one that
> is faster than iterating over all keys. This solution comes first because it
> is simple, needs only local code changes, and should solve the problem
> (speeding up search several times over).
> 2) Don't calculate the MD5 hash for startKey every time. It is better to
> compute it once (so search will be twice as fast).
> This also needs only local code changes.
> 3) Consider something faster than MD5 for hashing (like a
> TigerRandomPartitioner with a Tiger/128 hash).
> This needs research, and maybe that research has already been done.
> 4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and
> sorting keys in index rows. In index rows, keys can be stored and compared
> with a simple byte comparator.
> This solution requires huge code changes.
> I'm going to start with the first solution. Further improvements can be done
> in follow-up tickets.
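
Improvements 1) and 2) from the description can be illustrated with a small standalone sketch. This is not the committed patch and does not use Cassandra's classes: `StartKeySearch`, `TokenizedKey`, `md5Token`, and `indexOfStart` are hypothetical names, and the token is only an MD5-derived stand-in for what RandomPartitioner computes. The idea shown is to hash each key once, keep the token next to the key, and locate startKey with a binary search instead of a linear scan that re-hashes on every comparison.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Cassandra.
class StartKeySearch {

    /** A row key paired with its token, which is computed exactly once. */
    static final class TokenizedKey implements Comparable<TokenizedKey> {
        final String key;
        final BigInteger token;

        TokenizedKey(String key) {
            this.key = key;
            this.token = md5Token(key); // improvement 2: hash once, reuse afterwards
        }

        @Override
        public int compareTo(TokenizedKey other) {
            // Order by token first; break ties on the raw key.
            int c = token.compareTo(other.token);
            return c != 0 ? c : key.compareTo(other.key);
        }
    }

    /** Simplified stand-in for a RandomPartitioner-style token (assumption). */
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // interpret digest as non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Improvement 1: index of the first entry that is >= start, found in
     * O(log n) comparisons. The list must be sorted by TokenizedKey order
     * and should support random access (e.g. ArrayList).
     */
    static int indexOfStart(List<TokenizedKey> sorted, TokenizedKey start) {
        int pos = Collections.binarySearch(sorted, start);
        // A negative result encodes the insertion point as -(point) - 1.
        return pos >= 0 ? pos : -(pos + 1);
    }
}
```

With roughly 100k keys per index row, a binary search needs about 17 comparisons to find the start of the next chunk, and none of them recomputes a hash because each `TokenizedKey` already carries its token.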