[ 
https://issues.apache.org/jira/browse/CASSANDRA-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082273#comment-13082273
 ] 

Pavel Yaskevich commented on CASSANDRA-2988:
--------------------------------------------

bq. This patch is about changing the most part of the load() method. I am not 
clear how we could only change the initialization of RandomAcessReader.

Why can't we just leave SST.load() as is but instead make a separate class 
where that special buffering will be done? If index read is sequential then 
just extend DataInputStream and override read(...) methods that should do the 
trick.

bq. FYI, these two patches are completely independent with each other.

I know that and I already took a brief look at it but lets go with review 
sequentially one-by-one.

> Improve SSTableReader.load() when loading index files
> -----------------------------------------------------
>
>                 Key: CASSANDRA-2988
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2988
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Melvin Wang
>            Assignee: Melvin Wang
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: c2988-modified-buffer.patch, 
> c2988-parallel-load-sstables.patch
>
>
> * when we create BufferredRandomAccessFile, we pass skipCache=true. This 
> hurts the read performance because we always process the index files 
> sequentially. Simple fix would be set it to false.
> * multiple index files of a single column family can be loaded in parallel. 
> This buys a lot when you have multiple super large index files.
> * we may also change how we buffer. By using BufferredRandomAccessFile, for 
> every read, we need bunch of checking like
>   - do we need to rebuffer?
>   - isEOF()?
>   - assertions
>   These can be simplified to some extent.  We can blindly buffer the index 
> file by chunks and process the buffer until a key lies across boundary of a 
> chunk. Then we rebuffer and start from the beginning of the partially read 
> key. Conceptually, this is same as what BRAF does but w/o the overhead in the 
> read**() methods in BRAF.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to