[ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated CASSANDRA-4324:
----------------------------------------

    Attachment: lucene-core-4.0-SNAPSHOT.jar
                CASSANDRA-4324.patch

FSTMemUsage compares the memory usage of the FST vs. IndexSummary.  

On 1 million keys these are the results:

FST: 39,032,383 bytes
IndexSummary: 43,996,068 bytes

A difference of 4,963,685 bytes (about 4.7 MiB, or 11%).  The FST would be far 
smaller if the MD5 hash were not being applied to the key; e.g., it does best 
with keys that are sequential, so that prefix compression can be applied.
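To illustrate why hashing hurts prefix compression, here is a minimal sketch 
that is not part of the patch and does not use the Lucene FST at all.  It only 
measures the average prefix shared by neighbouring keys in sorted order, as a 
rough proxy for how much prefix sharing is available.  The class name, the 
"key%07d" key format, and the key count are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not from CASSANDRA-4324.patch.
class PrefixSharing {

    // Hex-encoded MD5 digest of the key's UTF-8 bytes.
    static String md5Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Number of leading characters the two strings have in common.
    static int sharedPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Average shared prefix length between neighbours after sorting a copy.
    static double avgSharedPrefix(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        if (sorted.size() < 2) {
            return 0.0;
        }
        long total = 0;
        for (int i = 1; i < sorted.size(); i++) {
            total += sharedPrefix(sorted.get(i - 1), sorted.get(i));
        }
        return (double) total / (sorted.size() - 1);
    }

    // Zero-padded sequential keys (assumed format, for illustration only).
    static List<String> sequentialKeys(int count) {
        List<String> keys = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            keys.add(String.format("key%07d", i));
        }
        return keys;
    }

    // MD5-hash every key in the list.
    static List<String> hashAll(List<String> keys) {
        List<String> hashed = new ArrayList<>(keys.size());
        for (String k : keys) {
            hashed.add(md5Hex(k));
        }
        return hashed;
    }

    public static void main(String[] args) {
        List<String> raw = sequentialKeys(10000);
        List<String> hashed = hashAll(raw);
        System.out.println("avg shared prefix, sequential keys: " + avgSharedPrefix(raw));
        System.out.println("avg shared prefix, MD5-hashed keys: " + avgSharedPrefix(hashed));
    }
}
```

Sequential keys share nearly their whole length with their neighbours, while 
MD5-hashed keys share only a few leading characters, which is consistent with 
the FST saving relatively little space on hashed keys.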

To run FSTMemUsage, the lucene-core-4.0-SNAPSHOT.jar needs to be added to the 
lib/ directory.  

The patch was generated using 'git diff HEAD~1..HEAD' because 'git diff' after 
'git add' did not work.
                
> Implement Lucene FST for key index
> ----------------------------------
>
>                 Key: CASSANDRA-4324
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jason Rutherglen
>            Assignee: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, 
> lucene-core-4.0-SNAPSHOT.jar
>
>
> The Lucene FST data structure offers a compact and fast system for indexing 
> Cassandra keys.  More keys may be loaded, which in turn should make seeks 
> faster.
> * Update the IndexSummary class to make use of the Lucene FST, overriding the 
> serialization mechanism.
> * Alter SSTableReader to make use of the FST seek mechanism.


        
