[ https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078171#comment-14078171 ]
Vishal Mehta commented on CASSANDRA-6666:
-----------------------------------------

Hello everyone,

Please pardon my ignorance; this is my first time commenting on an open-source bug report.

I think I recently hit this bug, because I saw similar symptoms in my 3-node Cassandra setup. I am running a test at around 12K qps (inserts into 3 different tables) with the TTL set to 1 hour, and the keyspace has gc_grace_seconds set to 14400 (4 hours). The test eventually reaches a point where Cassandra sees more than 100K tombstones and crashes with the following exception in /var/log/cassandra/cassandra.log.

{noformat}
ERROR 13:23:56,747 Scanned over 100000 tombstones in system.hints; query aborted (see tombstone_fail_threshold)
ERROR 13:23:56,962 Exception in thread Thread[HintedHandoff:1,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
    at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
    at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
    at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:373)
    at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330)
    at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:91)
    at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:547)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
 INFO 13:24:00,987 No gossip backlog; proceeding
{noformat}

*Note:* Is it reasonable to set gc_grace_seconds closer to the TTL? Also, I could see that one of the nodes deleted all of its records from disk and freed up the space, whereas the other two nodes never deleted their tombstones.

> Avoid accumulating tombstones after partial hint replay
> -------------------------------------------------------
>
>                 Key: CASSANDRA-6666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: hintedhandoff
>             Fix For: 2.0.10
>
>         Attachments: 6666.txt, cassandra_system.log.debug.gz
>
>

--
This message was sent by Atlassian JIRA
(v6.2#6252)