[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index
[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4324:
--------------------------------------
    Reviewer: (was: yukim)

Does CASSANDRA-5506 make this obsolete? ISTM that the savings here can only come from the Token, since the keys themselves will not be ordered appropriately. (That is: CASSANDRA-5506 orders the keys by token, but only stores the underlying key byte[], and regenerates the token when necessary.)

Implement Lucene FST in for key index
-------------------------------------
    Key: CASSANDRA-4324
    URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
    Project: Cassandra
    Issue Type: Improvement
    Reporter: Jason Rutherglen
    Assignee: Jason Rutherglen
    Priority: Minor
    Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar

The Lucene FST data structure offers a compact and fast system for indexing Cassandra keys. More keys may be loaded, which in turn should make seeks faster.

* Update the IndexSummary class to make use of the Lucene FST, overriding the serialization mechanism.
* Alter SSTableReader to make use of the FST seek mechanism.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index
[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated CASSANDRA-4324:
    Attachment: lucene-core-4.0-SNAPSHOT.jar
                CASSANDRA-4324.patch

FSTMemUsage compares the memory usage of the FST vs. IndexSummary. On 1 million keys these are the results:

FST: 39,032,383 bytes
IndexSummary: 43,996,068 bytes

A difference of about 5 megabytes (4,963,685 bytes). The FST would be far smaller if the MD5 hash were not applied to the key; it does best with keys that are sequential, so that prefix compression can be applied.

To run FSTMemUsage, the lucene-core-4.0-SNAPSHOT.jar needs to be added to the lib/ directory.

The patch was generated using 'git diff HEAD~1..HEAD' because 'git diff' after 'git add' did not work.

Implement Lucene FST in for key index
-------------------------------------
    Key: CASSANDRA-4324
    URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
    Project: Cassandra
    Issue Type: Improvement
    Reporter: Jason Rutherglen
    Assignee: Jason Rutherglen
    Priority: Minor
    Fix For: 1.2
    Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar
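
A rough way to see why hashing the keys hurts prefix compression, as described in the comment above: the sketch below does not use the Lucene FST at all. It only sums the leading bytes that each key shares with its predecessor in sorted order, which is a crude proxy for what a prefix-sharing structure can save. The key format (`key` plus a zero-padded counter), the class name and the helper names are all assumptions made up for illustration; they are not taken from the patch or from FSTMemUsage.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class PrefixSharing {
    // Number of leading bytes that a and b have in common.
    static int sharedPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Lexicographic comparison that treats each byte as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Sorts a copy of the keys and sums the prefix bytes shared by neighbours.
    static long totalShared(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(PrefixSharing::compareUnsigned);
        long total = 0;
        for (int i = 1; i < sorted.size(); i++) {
            total += sharedPrefix(sorted.get(i - 1), sorted.get(i));
        }
        return total;
    }

    // Hypothetical sequential keys: "key0000000000", "key0000000001", ...
    static List<byte[]> sequentialKeys(int n) {
        List<byte[]> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add(String.format("key%010d", i).getBytes(StandardCharsets.UTF_8));
        }
        return keys;
    }

    // Replaces every key with its 16-byte MD5 digest.
    static List<byte[]> md5All(List<byte[]> keys) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            List<byte[]> out = new ArrayList<>();
            for (byte[] k : keys) {
                out.add(md.digest(k)); // digest() also resets the instance
            }
            return out;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> raw = sequentialKeys(100_000);
        List<byte[]> hashed = md5All(raw);
        System.out.println("sequential keys, shared prefix bytes: " + totalShared(raw));
        System.out.println("MD5-hashed keys, shared prefix bytes: " + totalShared(hashed));
    }
}
```

Sequential keys share almost their whole length with their neighbours, while MD5 digests of the same keys share only a byte or two on average. A real FST also shares suffixes, so this comparison is only indicative.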
[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index
[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated CASSANDRA-4324:
    Attachment: CASSANDRA-4324.patch

Reran with the keys sorted:

FST: 3,564,246
IndexSummary: 4,399,624

The FST is 19% smaller, if the IndexSummary memory calculation is correct.

Implement Lucene FST in for key index
-------------------------------------
    Key: CASSANDRA-4324
    URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
    Project: Cassandra
    Issue Type: Improvement
    Reporter: Jason Rutherglen
    Assignee: Jason Rutherglen
    Priority: Minor
    Fix For: 1.2
    Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar
[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index
[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated CASSANDRA-4324:
    Attachment: CASSANDRA-4324.patch

Attached is a rough first cut of this functionality. It is still in a rough state, but I figured now is a good point to get some feedback before proceeding further.

In the dev environment I had to copy the Lucene library into the project, and I am not sure why that is necessary, as Lucene is included as a dependency in the pom.xml file.

I find the sample key range code confusing, as I am unsure of its purpose.

The patch was created using 'git diff'.

Implement Lucene FST in for key index
-------------------------------------
    Key: CASSANDRA-4324
    URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
    Project: Cassandra
    Issue Type: Improvement
    Reporter: Jason Rutherglen
    Assignee: Jason Rutherglen
    Priority: Minor
    Attachments: CASSANDRA-4324.patch
[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index
[ https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4324:
--------------------------------------
    Affects Version/s: (was: 1.1.1)
    Fix Version/s: (was: 1.1.1)

Implement Lucene FST in for key index
-------------------------------------
    Key: CASSANDRA-4324
    URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
    Project: Cassandra
    Issue Type: Improvement
    Reporter: Jason Rutherglen
    Assignee: Jason Rutherglen
    Priority: Minor