[jira] [Commented] (CASSANDRA-3545) Fix very low Secondary Index performance

Hudson (Commented) (JIRA) Thu, 08 Dec 2011 10:28:09 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165407#comment-13165407 ]


Hudson commented on CASSANDRA-3545:
-----------------------------------

Integrated in Cassandra #1248 (See 
[https://builds.apache.org/job/Cassandra/1248/])
    Improve memtable slice iteration performance
patch by slebresne; reviewed by jbellis for CASSANDRA-3545

slebresne : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1211999
Files : 
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/db/AbstractColumnContainer.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ArrayBackedSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CollationController.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ISortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/Memtable.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/RowIteratorFactory.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ThreadSafeSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/TreeMapBackedSortedColumns.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/IFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/NamesQueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/QueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/filter/SliceQueryFilter.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/index/keys/KeysSearcher.java
* /cassandra/trunk/src/java/org/apache/cassandra/service/RowRepairResolver.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ArrayBackedSortedColumnsTest.java

                
> Fix very low Secondary Index performance
> ----------------------------------------
>
>                 Key: CASSANDRA-3545
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3545
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.0
>            Reporter: Evgeny Ryabitskiy
>            Assignee: Sylvain Lebresne
>             Fix For: 1.0.6
>
>         Attachments: 0001-3545.patch, 0002-cleanup.patch
>
>
> While performing index search + value filtering over a large index row (~100k 
> keys per index value) in chunks of 512-1024 keys, search time is 
> about 8-12 seconds, which is very slow.
> After profiling I got this picture:
> 60% of search time is spent calculating MD5 hashes with MessageDigest (of 
> course, this is because of RandomPartitioner).
> 33% of search time (half of all MD5 hash calculation time) is spent 
> recalculating MD5 to compare two row keys while advancing through the index 
> row to startKey (when performing the search query for the next chunk).
> I see several possible performance improvements:
> 1) Use a good algorithm to search for startKey in the sorted collection, one 
> that is faster than iterating over all keys. This solution comes first 
> because it is simple, needs only local code changes, and should solve the 
> problem (speeding up search several times over).
> 2) Don't calculate the MD5 hash for startKey every time. It is best to 
> compute it once (so search will be twice as fast).
> This also needs only local code changes.
> 3) Consider something faster than MD5 for hashing (like a 
> TigerRandomPartitioner with a Tiger/128 hash).
> This needs research, which may already have been done.
> 4) Don't use Tokens (with MD5 hash for RandomPartitioner) for comparing and 
> sorting keys in index rows. In index rows, keys can be stored and compared 
> with simple Byte Comparator. 
> This solution requires huge code changes.
> I'm going to start with the first solution. Further improvements can be 
> made in follow-up tickets.
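
Improvements 1) and 2) above can be illustrated with a minimal sketch. This is
not the code from the attached patches: the class StartKeySearchSketch, the
HashedKey type, and the findStart helper are hypothetical names introduced
here, and the token is simply the MD5 digest read as a non-negative integer,
which only approximates what RandomPartitioner does. The idea shown is to hash
each key once, keep the token next to the key, and locate startKey with a
binary search instead of walking the row from the beginning.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class StartKeySearchSketch {

    /** Hypothetical stand-in for a row key paired with its MD5 token, hashed once at construction. */
    static final class HashedKey implements Comparable<HashedKey> {
        final String key;
        final BigInteger token;

        HashedKey(String key) {
            this.key = key;
            this.token = md5Token(key); // improvement 2: the hash is computed once and cached
        }

        @Override
        public int compareTo(HashedKey other) {
            // Compare cached tokens; fall back to the raw key only to break ties.
            int c = token.compareTo(other.token);
            return c != 0 ? c : key.compareTo(other.key);
        }
    }

    /** MD5 digest of the key interpreted as a non-negative integer (assumed token form). */
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Improvement 1: index of the first entry >= startKey in a list sorted by
     * HashedKey order, found in O(log n) comparisons rather than a linear scan.
     */
    static int findStart(List<HashedKey> sorted, String startKey) {
        HashedKey probe = new HashedKey(startKey); // startKey is hashed exactly once
        int pos = Collections.binarySearch(sorted, probe);
        return pos >= 0 ? pos : -(pos + 1); // negative result encodes the insertion point
    }

    public static void main(String[] args) {
        // Build a sorted "index row" of 100k keys, as in the report.
        List<HashedKey> row = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            row.add(new HashedKey("key" + i));
        }
        Collections.sort(row);

        // Fetch the next chunk of 512 keys starting at a known startKey.
        String startKey = row.get(51_200).key;
        int start = findStart(row, startKey);
        List<HashedKey> chunk = row.subList(start, Math.min(start + 512, row.size()));
        System.out.println("chunk starts at " + start + ", size " + chunk.size());
    }
}
```

With the tokens cached, each chunk lookup costs one MD5 computation for
startKey plus about 17 token comparisons for 100k keys, instead of hashing
every key passed on the way to startKey.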

