[ https://issues.apache.org/jira/browse/CASSANDRA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078171#comment-14078171 ]

Vishal Mehta edited comment on CASSANDRA-6666 at 7/29/14 7:01 PM:
------------------------------------------------------------------

Hello Everyone,

Please pardon my ignorance; this is the first time I am writing on an open-source 
bug report.

I think I recently hit this bug, because I saw similar symptoms in my 3-node 
Cassandra setup. I am running a test at around 12K qps (inserts into 3 different 
tables), with the TTL set to 1 hour and gc_grace_seconds set to 14400 (4 hours) 
on the tables in the keyspace.
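
For context, the setup looks roughly like the following. This is only an 
illustrative sketch: the keyspace, table, and column names are made up, since the 
actual schema is not included in this report.

{noformat}
-- Illustrative only: names and schema are hypothetical.
CREATE KEYSPACE IF NOT EXISTS test_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Tombstone grace period of 4 hours (14400 seconds) on the table:
CREATE TABLE test_ks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH gc_grace_seconds = 14400;

-- Every insert carries a 1-hour TTL:
INSERT INTO test_ks.events (id, payload)
VALUES (123e4567-e89b-12d3-a456-426655440000, 'some data')
USING TTL 3600;
{noformat}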

The test eventually reaches a point where Cassandra scans more than 100K 
tombstones, and the hinted-handoff thread fails with the following exception in 
/var/log/cassandra/cassandra.log.

{noformat}
ERROR 13:23:56,747 Scanned over 100000 tombstones in system.hints; query aborted (see tombstone_fail_threshold)
ERROR 13:23:56,962 Exception in thread Thread[HintedHandoff:1,1,main]
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:202)
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1547)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1376)
        at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:373)
        at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:330)
        at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:91)
        at org.apache.cassandra.db.HintedHandOffManager$5.run(HintedHandOffManager.java:547)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)
 INFO 13:24:00,987 No gossip backlog; proceeding
{noformat}

*Note:* Would it be reasonable to keep gc_grace_seconds closer to the TTL? Also, I 
could see that one of the nodes deleted all the records from disk and freed up the 
space, whereas the other two nodes never deleted their tombstones.
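
To make the first question concrete, by keeping gc_grace_seconds closer to the TTL 
I mean something like the following (the table name is again just a placeholder):

{noformat}
-- Example only: bring gc_grace_seconds down to the same order as the 1-hour TTL,
-- so that expired data can be purged sooner after it turns into a tombstone.
ALTER TABLE test_ks.events WITH gc_grace_seconds = 3600;
{noformat}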

Please advise.
Regards,
Vishal




> Avoid accumulating tombstones after partial hint replay
> -------------------------------------------------------
>
>                 Key: CASSANDRA-6666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6666
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>            Priority: Minor
>              Labels: hintedhandoff
>             Fix For: 2.0.10
>
>         Attachments: 6666.txt, cassandra_system.log.debug.gz
>
>




--
This message was sent by Atlassian JIRA
(v6.2#6252)
