[ 
https://issues.apache.org/jira/browse/CASSANDRA-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416372#comment-13416372
 ] 

Yuki Morishita commented on CASSANDRA-4324:
-------------------------------------------

Jason,

I used YourKit to profile memory usage for your test (slightly modified to 
call IndexSummary#complete), and it shows

IndexSummary: 21,597,040 (~20MB)
FST: 3,576,248 (~3.4MB)

for storing 10,000 keys in each, so it's pretty impressive. If we can deliver 
this, it will be a huge win.
(Note that on disk, IndexSummary writes only the key portion of DecoratedKey, 
so it may be smaller than the FST.)
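
To put the two figures side by side, here is a small sketch that only redoes the arithmetic on the numbers reported above. The class and method names are made up for illustration and are not part of Cassandra or the patch. It works out to roughly 2,160 bytes per key for IndexSummary versus roughly 358 bytes per key for the FST, about a 6x difference in this particular test.

```java
public class SummaryFootprint {
    // Sizes in bytes as reported by the profiler in this comment.
    static final long INDEX_SUMMARY_BYTES = 21_597_040L;
    static final long FST_BYTES = 3_576_248L;
    // Number of keys stored in each structure by the test.
    static final int KEY_COUNT = 10_000;

    // Average in-memory cost of one key (hypothetical helper).
    static double bytesPerKey(long totalBytes, int keys) {
        return (double) totalBytes / keys;
    }

    // How many times larger the first structure is than the second.
    static double sizeRatio(long largerBytes, long smallerBytes) {
        return (double) largerBytes / smallerBytes;
    }

    public static void main(String[] args) {
        System.out.println("IndexSummary bytes/key: "
                + bytesPerKey(INDEX_SUMMARY_BYTES, KEY_COUNT));
        System.out.println("FST bytes/key: "
                + bytesPerKey(FST_BYTES, KEY_COUNT));
        System.out.println("IndexSummary / FST: "
                + sizeRatio(INDEX_SUMMARY_BYTES, FST_BYTES));
    }
}
```

These are averages over one test's key set, so they should be read as an indication rather than a guarantee for other key distributions.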

My remaining concerns are as follows:

* The planned 1.2 release saves IndexSummary to disk (CASSANDRA-2392), so I 
think it is better to keep the current implementation and add an FST version 
of IndexSummary, so that you can read and write both.
* The DecoratedKeys stored inside the current IndexSummary are accessed from 
various places, and the FST version will lack that information, so you may 
need to find an alternative way to preserve the current functionality.
* If you want to use Lucene 4.0, we should release this feature after the 4.0 
release.

bq. Also the last results are for 100,000 keys rather than 1 mil.

IndexSummary holds only one key for every index_interval keys (default 128), 
so I think you don't need to test with 1 mil.
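
As a rough illustration of the sampling arithmetic, the sketch below assumes the summary keeps every 128th key starting from the first one. The class and method names are hypothetical and this is not the actual IndexSummary code. Under that assumption, 1 mil row keys produce about 7,813 summary entries, and the 10,000 entries used in the test above correspond to about 1.28 mil row keys.

```java
public class SummarySampling {
    // Default index_interval mentioned in this comment.
    static final int INDEX_INTERVAL = 128;

    // Number of sampled entries for a given number of row keys,
    // assuming one entry per started interval (ceiling division).
    static long summaryEntries(long rowKeys, int interval) {
        return (rowKeys + interval - 1) / interval;
    }

    // Approximate number of row keys represented by a given
    // number of summary entries.
    static long rowKeysCovered(long entries, int interval) {
        return entries * interval;
    }

    public static void main(String[] args) {
        System.out.println("entries for 1,000,000 keys: "
                + summaryEntries(1_000_000L, INDEX_INTERVAL));
        System.out.println("entries for 100,000 keys: "
                + summaryEntries(100_000L, INDEX_INTERVAL));
        System.out.println("row keys covered by 10,000 entries: "
                + rowKeysCovered(10_000L, INDEX_INTERVAL));
    }
}
```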
                
> Implement Lucene FST in for key index
> -------------------------------------
>
>                 Key: CASSANDRA-4324
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4324
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jason Rutherglen
>            Assignee: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.2
>
>         Attachments: CASSANDRA-4324.patch, CASSANDRA-4324.patch, 
> CASSANDRA-4324.patch, lucene-core-4.0-SNAPSHOT.jar
>
>
> The Lucene FST data structure offers a compact and fast system for indexing 
> Cassandra keys.  More keys may be loaded which in turn should make seeks faster.
> * Update the IndexSummary class to make use of the Lucene FST, overriding the 
> serialization mechanism.
> * Alter SSTableReader to make use of the FST seek mechanism

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
