[ https://issues.apache.org/jira/browse/CASSANDRA-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088587#comment-13088587 ]
Hudson commented on CASSANDRA-2498:
-----------------------------------

Integrated in Cassandra #1039 (See [https://builds.apache.org/job/Cassandra/1039/])
    Stop reading from sstables once we know we have the most recent columns
patch by Daniel Lundin and jbellis for CASSANDRA-2498

jbellis : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1159942
Files : 
* /cassandra/trunk/src/java/org/apache/cassandra/io/sstable/SSTable.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/CollationController.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/SuperColumn.java
* /cassandra/trunk/CHANGES.txt
* /cassandra/trunk/src/java/org/apache/cassandra/db/Column.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/IColumn.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/DataTracker.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/columniterator/SSTableSliceIterator.java
* /cassandra/trunk/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
* /cassandra/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java


> Improve read performance in update-intensive workload
> -----------------------------------------------------
>
>         Key: CASSANDRA-2498
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-2498
>     Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>    Reporter: Jonathan Ellis
>    Assignee: Jonathan Ellis
>    Priority: Minor
>      Labels: ponies
>     Fix For: 1.0
>
> Attachments: 2498-v2.txt, 2498-v3.txt, 2498-v4.txt, supersede-name-filter-collations.patch
>
>
> Read performance in an update-heavy environment relies heavily on compaction to maintain good throughput. (This is not the case for workloads where rows are only inserted once, because the bloom filter keeps us from having to check sstables unnecessarily.)
>
> Very early versions of Cassandra attempted to mitigate this by checking sstables in descending generation order (mostly equivalent to descending mtime): once all the requested columns were found, it would not check any older sstables.
>
> This was incorrect, because data timestamps do not necessarily correspond to sstable timestamps: compaction has the side effect of "refreshing" data to a newer sstable, and hinted handoff may send us data older than what we already have.
>
> Instead, we could create a per-sstable piece of metadata containing the most recent (client-specified) timestamp for any column in the sstable. We could then sort sstables by this timestamp instead, and perform a similar optimization: once every requested column has been found, if the most recent client timestamps of the remaining sstables are older than the oldest column timestamp found so far in the result set, we don't need to look further. Since under almost every workload the client timestamps of data in a given sstable tend to be similar, we expect this to reduce the number of sstables checked in proportion to how frequently each column in the row is updated. (If each column is updated with each write, we only have to check a single sstable.)
>
> This may also be useful information when deciding which SSTables to compact.
>
> (Note that this optimization is only appropriate for named-column queries, not slice queries, since we don't know what non-overlapping columns may exist in older sstables.)
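The stopping rule proposed in the issue can be sketched in Java for a single named-column read. This is a minimal illustration, not the code from the committed patch: `NamedColumnRead`, `SSTable`, `Column`, `ReadResult`, and the `maxTimestamp` field are hypothetical stand-ins, and the sketch ignores tombstones, super columns, and row-level deletions. It sorts sstables by their maximum client timestamp (newest first) rather than by generation, and stops only once every requested column is present and the next sstable's maximum timestamp is strictly older than the oldest column already held.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NamedColumnRead {
    // Hypothetical column: name, value, and the client-supplied timestamp.
    record Column(String name, String value, long timestamp) {}

    // Hypothetical sstable view for a single row, plus the proposed per-sstable
    // metadata: the most recent client timestamp of any column it contains.
    record SSTable(String id, Map<String, Column> columns, long maxTimestamp) {}

    // Merged columns plus the number of sstables that were actually consulted.
    record ReadResult(Map<String, Column> columns, int sstablesRead) {}

    // Named-column reads only: a slice query cannot stop early, because older
    // sstables may hold columns in the requested range that were never seen.
    static ReadResult read(List<SSTable> sstables, Set<String> names) {
        List<SSTable> sorted = new ArrayList<>(sstables);
        // Visit sstables by max client timestamp, newest first (not by generation).
        sorted.sort(Comparator.comparingLong(SSTable::maxTimestamp).reversed());

        Map<String, Column> result = new HashMap<>();
        int sstablesRead = 0;
        for (SSTable sstable : sorted) {
            if (result.keySet().containsAll(names)) {
                long oldestFound = result.values().stream()
                        .mapToLong(Column::timestamp)
                        .min()
                        .orElse(Long.MAX_VALUE);
                // Remaining sstables have maxTimestamp <= this one's, so if this one is
                // strictly older than every column already held, none can supersede them.
                if (sstable.maxTimestamp() < oldestFound) {
                    break;
                }
            }
            sstablesRead++;
            for (String name : names) {
                Column candidate = sstable.columns().get(name);
                if (candidate == null) {
                    continue;
                }
                Column current = result.get(name);
                // Newer client timestamp wins; on a tie this sketch keeps the version already held.
                if (current == null || candidate.timestamp() > current.timestamp()) {
                    result.put(name, candidate);
                }
            }
        }
        return new ReadResult(result, sstablesRead);
    }

    public static void main(String[] args) {
        SSTable oldest = new SSTable("gen-1", Map.of(
                "a", new Column("a", "a1", 1), "b", new Column("b", "b1", 1)), 1);
        SSTable middle = new SSTable("gen-2", Map.of("a", new Column("a", "a5", 5)), 5);
        SSTable newest = new SSTable("gen-3", Map.of(
                "a", new Column("a", "a10", 10), "b", new Column("b", "b9", 9)), 10);

        ReadResult r = read(List.of(oldest, middle, newest), Set.of("a", "b"));
        // Both columns are found in gen-3; gen-2's max timestamp (5) is older than 9, so we stop.
        System.out.println(r.sstablesRead() + " " + r.columns().get("a").value()
                + " " + r.columns().get("b").value());
    }
}
```

Running `main` prints `1 a10 b9`: only one sstable is read. The strict `<` comparison is deliberate; when timestamps are equal, the older sstable is still read, so a tie is never silently skipped. Because ordering uses the timestamp metadata instead of generation, a newer sstable holding older hinted data is simply visited later, which avoids the flaw described above for the generation-ordered approach.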