[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index

2013-04-27 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4324:
--

Reviewer:   (was: yukim)

Does CASSANDRA-5506 make this obsolete?  ISTM that the savings here can only 
come from the Token, since the keys themselves will not be ordered 
appropriately.  (That is: CASSANDRA-5506 orders the keys by token, but only 
stores the underlying key byte[], and regenerates the token when necessary.)
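
To illustrate the ordering point above, here is a minimal sketch, not Cassandra code. It assumes an MD5-based token computed as the absolute value of the digest read as a BigInteger; the class and method names are hypothetical. It sorts sequential keys by that token and checks whether the raw keys are still in lexicographic order.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TokenOrderSketch {

    // Assumption: the token is the absolute value of the MD5 digest as a BigInteger.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns a copy of the keys ordered by token rather than by key bytes.
    static List<String> orderByToken(List<String> keys) {
        List<String> ordered = new ArrayList<>(keys);
        ordered.sort(Comparator.comparing(TokenOrderSketch::token));
        return ordered;
    }

    // True if the list is in ascending lexicographic order of the raw keys.
    static boolean isSortedByKey(List<String> keys) {
        for (int i = 1; i < keys.size(); i++)
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) return false;
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) keys.add(String.format("key%04d", i));
        List<String> byToken = orderByToken(keys);
        System.out.println("first keys in token order: " + byToken.subList(0, 5));
        System.out.println("still sorted by raw key?   " + isSortedByKey(byToken));
    }
}
```

With keys stored in token order, neighbouring keys rarely share a prefix, which is why any saving would have to come from the token itself.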

 Implement Lucene FST in for key index
 -

 Key: CASSANDRA-4324
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Jason Rutherglen
Priority: Minor
 Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, 
 CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar


 The Lucene FST data structure offers a compact and fast system for indexing 
 Cassandra keys.  More keys may be loaded, which in turn should make seeks faster.
 * Update the IndexSummary class to make use of the Lucene FST, overriding the 
 serialization mechanism.
 * Alter SSTableReader to make use of the FST seek mechanism
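
 As a rough illustration of the seek that the two items above would replace, here is a minimal sketch of a sampled key index. It is not the IndexSummary or Lucene FST API: a java.util.TreeMap stands in for the compact structure, keys are plain strings in natural order, and the class name, offsets, and sampling interval are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class SampledIndexSketch {
    // Sampled key -> offset in the (hypothetical) on-disk index file.
    private final TreeMap<String, Long> samples = new TreeMap<>();

    // Keep every interval-th key of an already sorted key list.
    SampledIndexSketch(String[] sortedKeys, long[] offsets, int interval) {
        for (int i = 0; i < sortedKeys.length; i += interval)
            samples.put(sortedKeys[i], offsets[i]);
    }

    // Offset where a scan for `key` should start: the offset of the greatest
    // sampled key <= key, or -1 if the key sorts before every sample.
    long seekStart(String key) {
        Map.Entry<String, Long> floor = samples.floorEntry(key);
        return floor == null ? -1L : floor.getValue();
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "c", "d", "e", "f"};
        long[] offsets = {0, 10, 20, 30, 40, 50};
        // With interval 2 the samples are a, c, e.
        SampledIndexSketch index = new SampledIndexSketch(keys, offsets, 2);
        System.out.println(index.seekStart("d")); // floor of "d" is "c", offset 20
    }
}
```

 A smaller per-key footprint would allow a smaller sampling interval in the same memory, so each lookup would start closer to its target.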

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-14 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated CASSANDRA-4324:


Attachment: lucene-core-4.0-SNAPSHOT.jar
CASSANDRA-4324.patch

FSTMemUsage compares the memory usage of the FST vs. IndexSummary.  

On 1 million keys these are the results:

FST: 39,032,383 bytes
IndexSummary: 43,996,068 bytes

A difference of about 5 megabytes (4,963,685 bytes).  The FST would be far 
smaller if the MD5 hash were not being applied to the key; e.g., it does best 
with keys that are sequential, so that prefix compression may be applied.
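
The effect of hashing on prefix compression can be estimated without Lucene. The sketch below is an illustration only, with hypothetical class and method names: it sorts a set of keys by unsigned byte order and counts the bytes each key shares with its predecessor, once for sequential keys and once for their MD5 digests.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

class PrefixSharingSketch {

    // Number of leading bytes the two arrays have in common.
    static int commonPrefix(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        int i = 0;
        while (i < n && a[i] == b[i]) i++;
        return i;
    }

    // Lexicographic comparison treating bytes as unsigned values.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    // Total bytes shared between each key and its predecessor in sorted order.
    static long sharedBytes(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(PrefixSharingSketch::compareUnsigned);
        long shared = 0;
        for (int i = 1; i < sorted.size(); i++)
            shared += commonPrefix(sorted.get(i - 1), sorted.get(i));
        return shared;
    }

    static byte[] md5(byte[] in) {
        try {
            return MessageDigest.getInstance("MD5").digest(in);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> sequential = new ArrayList<>();
        List<byte[]> hashed = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            byte[] key = String.format("key%08d", i).getBytes(StandardCharsets.UTF_8);
            sequential.add(key);
            hashed.add(md5(key));
        }
        System.out.println("sequential shared prefix bytes: " + sharedBytes(sequential));
        System.out.println("MD5-hashed shared prefix bytes: " + sharedBytes(hashed));
    }
}
```

The shared-byte count is only a proxy for what a prefix-sharing structure can save, not a measurement of FST size.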

To run FSTMemUsage, the lucene-core-4.0-SNAPSHOT.jar needs to be added to the 
lib/ directory.  

The patch was generated using 'git diff HEAD~1..HEAD' because 'git diff' after 
'git add' did not work.

 Implement Lucene FST in for key index
 -

 Key: CASSANDRA-4324
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Jason Rutherglen
Priority: Minor
 Fix For: 1.2

 Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, 
 lucene-core-4.0-SNAPSHOT.jar


 The Lucene FST data structure offers a compact and fast system for indexing 
 Cassandra keys.  More keys may be loaded, which in turn should make seeks faster.
 * Update the IndexSummary class to make use of the Lucene FST, overriding the 
 serialization mechanism.
 * Alter SSTableReader to make use of the FST seek mechanism





[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-07-14 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated CASSANDRA-4324:


Attachment: CASSANDRA-4324.patch

Reran with the keys sorted:

FST: 3,564,246
IndexSummary: 4,399,624

The FST is 19% smaller, if the IndexSummary mem calculation is correct.
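
The percentage can be checked from the reported figures. The small sketch below (hypothetical class name) computes the relative saving for both reported runs, taking the IndexSummary figure as the baseline; it gives roughly 11% for the first run and roughly 19% for the sorted run.

```java
class FstSavings {
    // Relative saving of `smaller` versus `baseline`, as a percentage.
    static double percentSmaller(long smaller, long baseline) {
        return 100.0 * (baseline - smaller) / baseline;
    }

    public static void main(String[] args) {
        // Figures from the first run on 1 million keys.
        System.out.printf("first run:  %.1f%%%n", percentSmaller(39_032_383L, 43_996_068L));
        // Figures from the rerun with sorted keys.
        System.out.printf("sorted run: %.1f%%%n", percentSmaller(3_564_246L, 4_399_624L));
    }
}
```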

 Implement Lucene FST in for key index
 -

 Key: CASSANDRA-4324
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Jason Rutherglen
Priority: Minor
 Fix For: 1.2

 Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, 
 CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar


 The Lucene FST data structure offers a compact and fast system for indexing 
 Cassandra keys.  More keys may be loaded, which in turn should make seeks faster.
 * Update the IndexSummary class to make use of the Lucene FST, overriding the 
 serialization mechanism.
 * Alter SSTableReader to make use of the FST seek mechanism





[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-06-27 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated CASSANDRA-4324:


Attachment: CASSANDRA-4324.patch

Attached is a rough first cut of this functionality.  It is still in a rough 
state; however, I figured now is a good point to get some feedback before 
proceeding further.

In the dev environment I had to copy the Lucene library into the project and am 
not sure why that is necessary, as Lucene is included as a dependency in the 
pom.xml file.  

The sample key range code is confusing; I am unsure of its purpose.

The patch was created using 'git diff'.

 Implement Lucene FST in for key index
 -

 Key: CASSANDRA-4324
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Jason Rutherglen
Priority: Minor
 Attachments: CASSANDRA-4324.patch


 The Lucene FST data structure offers a compact and fast system for indexing 
 Cassandra keys.  More keys may be loaded, which in turn should make seeks faster.
 * Update the IndexSummary class to make use of the Lucene FST, overriding the 
 serialization mechanism.
 * Alter SSTableReader to make use of the FST seek mechanism





[jira] [Updated] (CASSANDRA-4324) Implement Lucene FST in for key index

2012-06-20 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-4324:
--

Affects Version/s: (was: 1.1.1)
Fix Version/s: (was: 1.1.1)

 Implement Lucene FST in for key index
 -

 Key: CASSANDRA-4324
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jason Rutherglen
Assignee: Jason Rutherglen
Priority: Minor

 The Lucene FST data structure offers a compact and fast system for indexing 
 Cassandra keys.  More keys may be loaded, which in turn should make seeks faster.
 * Update the IndexSummary class to make use of the Lucene FST, overriding the 
 serialization mechanism.
 * Alter SSTableReader to make use of the FST seek mechanism
