[ https://issues.apache.org/jira/browse/CASSANDRA-2864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256832#comment-13256832 ]
Jonathan Ellis commented on CASSANDRA-2864:
-------------------------------------------

If so, how do you avoid scanning the sstables? Does this only work on named-column queries? That is, if I ask for a slice from X to Y and your cache holds data for X1 and X2, how do you know there is not also an X3 on disk somewhere?

> Alternative Row Cache Implementation
> ------------------------------------
>
>                 Key: CASSANDRA-2864
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2864
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Daniel Doubleday
>            Assignee: Daniel Doubleday
>            Priority: Minor
>
> We have been working on an alternative implementation to the existing row
> cache(s).
> We have 2 main goals:
> - Decrease memory -> get more rows in the cache without suffering a huge
> performance penalty
> - Reduce gc pressure
> This sounds a lot like we should be using the new serializing cache in 0.8.
> Unfortunately our workload consists of loads of updates which would
> invalidate the cache all the time.
> The second unfortunate thing is that the idea we came up with doesn't fit the
> new cache provider api...
> It looks like this:
> Like the serializing cache we basically only cache the serialized byte
> buffer. We don't serialize the bloom filter and try to do some other minor
> compression tricks (var ints etc., not done yet). The main difference is that
> we don't deserialize but use the normal sstable iterators and filters as in
> the regular uncached case.
> So the read path looks like this:
> return filter.collectCollatedColumns(memtable iter, cached row iter)
> The write path is not affected. It does not update the cache.
> During flush we merge all memtable updates with the cached rows.
> The attached patch is based on 0.8 branch r1143352.
> It does not replace the existing row cache but sits alongside it. There's an
> environment switch to choose the implementation. This way it is easy to
> benchmark performance differences.
> -DuseSSTableCache=true enables the alternative cache. It shares its
> configuration with the standard row cache, so the cache capacity is shared.
> We have duplicated a fair amount of code. First we actually refactored the
> existing sstable filter / reader but then decided to minimize dependencies.
> Also this way it is easy to customize serialization for in-memory sstable
> rows.
> We have also experimented a little with compression but since this task at
> this stage is mainly to kick off discussion we wanted to keep things simple.
> But there is certainly room for optimizations.
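
For illustration only, the read path in the quoted description (collating a memtable iterator with a cached-row iterator and then applying a filter) can be sketched as a merge of two name-sorted column streams. This is a Python sketch and not the patch's Java code. The function names `collate_columns` and `slice_columns`, the `(name, timestamp, value)` tuple layout, and the "newest timestamp wins" reconciliation are assumptions made for the example. It also assumes the cached buffer holds the complete row as of the last flush.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def collate_columns(memtable_iter, cached_row_iter):
    """Merge two name-sorted column iterators into one name-sorted stream.

    Columns are hypothetical (name, timestamp, value) tuples. When both
    inputs contain the same name, the column with the newer timestamp wins.
    """
    merged = heapq.merge(memtable_iter, cached_row_iter, key=itemgetter(0))
    for _name, versions in groupby(merged, key=itemgetter(0)):
        yield max(versions, key=itemgetter(1))


def slice_columns(columns, start, finish):
    """Yield columns whose name falls in the inclusive range [start, finish]."""
    for column in columns:
        name = column[0]
        if name > finish:
            break  # input is sorted, so no later column can match
        if name >= start:
            yield column


# Hypothetical data: the memtable holds recent writes, the cached row holds
# the row as it looked at the last flush.
memtable = [("X2", 20, "new-x2"), ("X3", 21, "x3")]
cached_row = [("X1", 10, "x1"), ("X2", 11, "old-x2"), ("Z9", 12, "z9")]

result = list(
    slice_columns(collate_columns(iter(memtable), iter(cached_row)), "X", "Y")
)
```

Under those assumptions, a slice from X to Y over the merged stream returns X1, the newer X2, and X3, whether X3 sits in the memtable or in the cached row. Whether the patch actually guarantees that the cached buffer is the complete row is exactly what the comment above asks.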