[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated CASSANDRA-4324:
----------------------------------------

    Attachment: lucene-core-4.0-SNAPSHOT.jar
                CASSANDRA-4324.patch

FSTMemUsage compares the memory usage of the FST with that of IndexSummary. On 1 million keys, these are the results:

FST:          39,032,383 bytes
IndexSummary: 43,996,068 bytes

That is a difference of 4,963,685 bytes, just under 5 megabytes (about 11%). The FST would be far smaller if the MD5 hash were not being applied to the key; it does best with keys that are sequential, so that prefix compression may be applied.

To run FSTMemUsage, lucene-core-4.0-SNAPSHOT.jar needs to be added to the lib/ directory.

The patch was generated using 'git diff HEAD~1..HEAD' because 'git diff' after 'git add' did not work.

> Implement Lucene FST in for key index
> -------------------------------------
>
>                 Key: CASSANDRA-4324
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jason Rutherglen
>            Assignee: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar
>
>
> The Lucene FST data structure offers a compact and fast system for indexing Cassandra keys. More keys may be loaded, which in turn should make seeks faster.
> * Update the IndexSummary class to make use of the Lucene FST, overriding the serialization mechanism.
> * Alter SSTableReader to make use of the FST seek mechanism
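
The remark about MD5 and prefix compression can be illustrated with a small stand-alone sketch. It is not part of the attached patch and does not use the Lucene FST classes. It only measures how many leading bytes adjacent keys share once sorted, first for sequential keys and then for their MD5 digests. The class name PrefixSharingSketch, the "key%07d" key format, and the key count are assumptions made for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: not taken from CASSANDRA-4324.patch.
final class PrefixSharingSketch {

    // Number of leading bytes two keys have in common.
    static int sharedPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Lexicographic comparison that treats each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    static byte[] md5(byte[] key) {
        try {
            return MessageDigest.getInstance("MD5").digest(key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Average shared-prefix length between neighbors in sorted order.
    // Assumed key format: "key0000000", "key0000001", ... (count must be >= 2).
    static double averageSharedPrefix(int count, boolean hashed) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] raw = String.format(Locale.ROOT, "key%07d", i)
                    .getBytes(StandardCharsets.UTF_8);
            keys.add(hashed ? md5(raw) : raw);
        }
        keys.sort(PrefixSharingSketch::compareUnsigned);
        long total = 0;
        for (int i = 1; i < keys.size(); i++) {
            total += sharedPrefix(keys.get(i - 1), keys.get(i));
        }
        return (double) total / (keys.size() - 1);
    }

    public static void main(String[] args) {
        int count = 100_000;
        System.out.printf(Locale.ROOT, "sequential keys: %.2f shared bytes on average%n",
                averageSharedPrefix(count, false));
        System.out.printf(Locale.ROOT, "MD5-hashed keys: %.2f shared bytes on average%n",
                averageSharedPrefix(count, true));
    }
}
```

Sequential keys share most of their bytes with their sorted neighbors, while their MD5 digests share very few, which is consistent with the FST gaining little from prefix sharing on hashed keys. The sketch says nothing about the actual byte counts reported by FSTMemUsage.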