[jira] [Commented] (CASSANDRA-2498) Improve read performance in update-intensive workload

Hudson (JIRA) Mon, 22 Aug 2011 04:49:59 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088587#comment-13088587 ]


Hudson commented on CASSANDRA-2498:
-----------------------------------

Integrated in Cassandra #1039 (See [https://builds.apache.org/job/Cassandra/1039/])
    Stop reading from sstables once we know we have the most recent columns
patch by Daniel Lundin and jbellis for CASSANDRA-2498

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1159942
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTable.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CollationController.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/db/Column.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IColumn.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/DataTracker.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/columniterator/SSTableSliceIterator.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java


> Improve read performance in update-intensive workload
> -----------------------------------------------------
>
>                 Key: CASSANDRA-2498
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2498
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: ponies
>             Fix For: 1.0
>
>         Attachments: 2498-v2.txt, 2498-v3.txt, 2498-v4.txt, 
> supersede-name-filter-collations.patch
>
>
> Read performance in an update-heavy environment relies heavily on compaction 
> to maintain good throughput. (This is not the case for workloads where rows 
> are only inserted once, because the bloom filter keeps us from having to 
> check sstables unnecessarily.)
> Very early versions of Cassandra attempted to mitigate this by checking 
> sstables in descending generation order (mostly equivalent to descending 
> mtime): once all the requested columns were found, it would not check any 
> older sstables.
> This was incorrect, because data timestamp will not correspond to sstable 
> timestamp, both because compaction has the side effect of "refreshing" data 
> to a newer sstable, and because hinted handoff may send us data older than 
> what we already have.
> Instead, we could create a per-sstable piece of metadata containing the most 
> recent (client-specified) timestamp for any column in the sstable.  We could 
> then sort sstables by this timestamp instead, and perform a similar 
> optimization (if the remaining sstable client-timestamps are older than the 
> oldest column found in the desired result set so far, we don't need to look 
> further). Since under almost every workload, client timestamps of data in a 
> given sstable will tend to be similar, we expect this to cut the number of 
> sstables down proportionally to how frequently each column in the row is 
> updated. (If each column is updated with each write, we only have to check a 
> single sstable.)
> This may also be useful information when deciding which SSTables to compact.
> (Note that this optimization is only appropriate for named-column queries, 
> not slice queries, since we don't know what non-overlapping columns may exist 
> in older sstables.)
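
The timestamp-ordered short-circuit proposed in the description can be
illustrated with a minimal Python sketch. This is not the code committed to
CollationController.java. The SSTable class, the read_named_columns function,
and the in-memory dict layout are all hypothetical names invented for
illustration, and tombstones, bloom filters, and memtables are ignored. It
assumes each sstable exposes the largest client timestamp of any column it
holds.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# (client timestamp, value) pair for one column; hypothetical layout.
Cell = Tuple[int, str]


@dataclass
class SSTable:
    """Toy stand-in for an sstable holding the columns of a single row."""
    columns: Dict[str, Cell]

    @property
    def max_timestamp(self) -> int:
        # Per-sstable metadata: most recent client timestamp of any column.
        return max((ts for ts, _ in self.columns.values()), default=-1)


def read_named_columns(sstables: Iterable[SSTable],
                       names: List[str]) -> Tuple[Dict[str, Cell], int]:
    """Return the newest cell for each requested name and the number of
    sstables that actually had to be checked."""
    wanted = set(names)
    result: Dict[str, Cell] = {}
    checked = 0
    # Visit sstables from newest to oldest max client timestamp.
    ordered = sorted(sstables, key=lambda s: s.max_timestamp, reverse=True)
    for sstable in ordered:
        # Stop once every requested column has been found and the oldest
        # found column is still newer than anything this sstable (and thus
        # every remaining sstable) can contain.
        if wanted.issubset(result) and all(
                result[n][0] > sstable.max_timestamp for n in wanted):
            break
        checked += 1
        for name in wanted:
            candidate = sstable.columns.get(name)
            if candidate is None:
                continue
            # Keep whichever version carries the higher client timestamp.
            if name not in result or candidate[0] > result[name][0]:
                result[name] = candidate
    return result, checked


# Every write updates both columns, so only the newest sstable is read.
tables = [
    SSTable({"a": (1, "a1"), "b": (1, "b1")}),
    SSTable({"a": (5, "a5"), "b": (5, "b5")}),
    SSTable({"a": (3, "a3")}),
]
cells, checked = read_named_columns(tables, ["a", "b"])
```

In this example both requested columns are found in the sstable whose
maximum timestamp is 5, so the two older sstables are skipped. The check
uses a strict comparison so that columns with equal timestamps are still
compared rather than skipped. If a requested column is missing from the
newer sstables, the loop keeps going, which matches the note that the
optimization applies only to named-column queries.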

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
