[ https://issues.apache.org/jira/browse/CASSANDRA-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205335#comment-13205335 ]
Dominic Williams commented on CASSANDRA-3748:
---------------------------------------------

Weird. I had just started digging into this with sstable2json when I ran repair and compact on the problem sstable again. This time compaction actually deleted the ghosts, so I'm guessing unreported errors were occurring during the earlier compactions. Consider the issue closed. I'll upgrade to 1.0.7 and hopefully this will not happen again.

> Range ghosts don't disappear as expected and accumulate
> -------------------------------------------------------
>
>                 Key: CASSANDRA-3748
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3748
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.3
>         Environment: Cassandra on Debian
>            Reporter: Dominic Williams
>              Labels: compaction, ghost-row, range, remove
>             Fix For: 1.0.8
>
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> I have a problem where range ghosts are accumulating and cannot be removed by
> reducing GCGraceSeconds and compacting.
> In our system, we have some cfs that represent "markets" where each row
> represents an item. Once an item is sold, it is removed from the market by
> passing its key to remove().
> The problem, which was hidden for some time by caching, is appearing on read.
> Every few seconds our system collates a random sample from each cf/market by
> choosing a random starting point:
>     String startKey = RNG.nextUUID();
> and then loading a page range of rows, specifying the key range as:
>     KeyRange keyRange = new KeyRange(pageSize);
>     keyRange.setStart_key(startKey);
>     keyRange.setEnd_key(maxKey);
> The returned rows are iterated over, and ghosts are ignored. If insufficient
> rows are obtained, the process is repeated using the key of the last row as
> the starting key (wrapping around if necessary).
> When performance was lagging, we did a test and found that constructing a
> random sample of 40 items (rows) involved iterating over hundreds of
> thousands of ghost rows.
> Our first attempt to deal with this was to halve our GCGraceSeconds and then
> perform major compactions. However, this had no effect on the number of ghost
> rows being returned. Furthermore, on examination it seems clear that the
> number of ghost rows being created within the GCGraceSeconds window must be
> smaller than the number being returned. This looks like a bug.
> We are using Cassandra 1.0.3 with Sylvain's patch from CASSANDRA-3510
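
For reference, the paging loop described in the report can be approximated as follows. This is only a sketch under stated assumptions: it does not call Cassandra or the Thrift API at all, it uses an in-memory TreeMap as a stand-in for a column family, it treats a row with no columns as a range ghost, and it omits the wrap-around step. The names GhostSkippingSampler, rangeSlice, sample and demoData are hypothetical and do not come from the reporter's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: an in-memory model of the sampling loop, not Cassandra code.
class GhostSkippingSampler {

    // Outcome of one sampling pass.
    static final class Sample {
        final List<String> liveKeys = new ArrayList<>();
        int ghostsSkipped = 0;
        int pagesRead = 0;
    }

    // Stand-in for a range query: up to pageSize rows with key >= startKey.
    // Assumption: rows are ordered by key here purely for simplicity.
    static List<Map.Entry<String, Map<String, String>>> rangeSlice(
            NavigableMap<String, Map<String, String>> cf, String startKey, int pageSize) {
        List<Map.Entry<String, Map<String, String>>> page = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : cf.tailMap(startKey, true).entrySet()) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(row);
        }
        return page;
    }

    // Collects up to `wanted` live rows, paging forward and ignoring ghosts.
    static Sample sample(NavigableMap<String, Map<String, String>> cf,
                         String startKey, int wanted, int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        Sample result = new Sample();
        String cursor = startKey;
        boolean resumed = false;
        while (result.liveKeys.size() < wanted) {
            List<Map.Entry<String, Map<String, String>>> page = rangeSlice(cf, cursor, pageSize);
            result.pagesRead++;
            for (Map.Entry<String, Map<String, String>> row : page) {
                // The start key is inclusive, so a resumed page repeats the previous last row.
                if (resumed && row.getKey().equals(cursor)) {
                    continue;
                }
                if (row.getValue().isEmpty()) {
                    result.ghostsSkipped++; // key with no columns: modelled as a range ghost
                } else {
                    result.liveKeys.add(row.getKey());
                    if (result.liveKeys.size() == wanted) {
                        break;
                    }
                }
            }
            if (page.size() < pageSize) {
                break; // short page: end of the key range (wrap-around omitted)
            }
            cursor = page.get(page.size() - 1).getKey();
            resumed = true;
        }
        return result;
    }

    // Demo data: keys k000..k099, where only every tenth row still has columns.
    static TreeMap<String, Map<String, String>> demoData() {
        TreeMap<String, Map<String, String>> cf = new TreeMap<>();
        for (int i = 0; i < 100; i++) {
            Map<String, String> columns = (i % 10 == 0)
                    ? Collections.singletonMap("price", "1")
                    : Collections.<String, String>emptyMap();
            cf.put(String.format("k%03d", i), columns);
        }
        return cf;
    }

    public static void main(String[] args) {
        Sample s = sample(demoData(), "k005", 3, 10);
        System.out.println("live=" + s.liveKeys
                + " ghostsSkipped=" + s.ghostsSkipped
                + " pagesRead=" + s.pagesRead);
    }
}
```

With this demo data, collecting 3 live rows starting from "k005" skips 23 ghosts across 3 pages. That is the same effect as in the report on a much smaller scale: the cost of a sample is driven by the number of ghosts in the scanned range, not by the number of live rows requested.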