Re: Crash with TombstoneOverwhelmingException

2014-01-17 Thread Shao-Chuan Wang
Agree with Robert about the dogfood.

http://www.datastax.com/docs/datastax_enterprise3.2/dse_release_notes#rn-3-2-4
It may be a good indicator when DSE starts using Cassandra 2.x.y in
production.


> From: Robert Coli 
> Date: Mon, Dec 30, 2013 at 2:58 PM
> Subject: Re: Crash with TombstoneOverwhelmingException
> To: "user@cassandra.apache.org" 
>
>
>  On Wed, Dec 25, 2013 at 10:01 AM, Edward Capriolo 
> wrote:
>
>> I have to hijack this thread. There seem to be many problems with the
>> 2.0.3 release.
>>
>
> +1. There is no 2.0.x release I consider production ready, even after
> today's 2.0.4.
>
>> Outside of passing all unit tests, what factors into the release voting
>> process? What other type of extended real-world testing should be done to
>> find bugs like this one that unit testing won't?
>>
>
> I also +1 these questions. Voting seems of limited use given the outputs
> of the process.
>
>>
>> Here is a wacky idea that I am half serious about: make a CMS for
>> http://cassandra.apache.org that backs its data and reporting onto
>> cassandra. No release unless the Cassandra db that serves the site is
>> upgraded first. :)
>>
>
> I agree wholeheartedly that eating one's own dogfood is informative.
>
> =Rob
>
>
>


Re: Crash with TombstoneOverwhelmingException

2014-01-16 Thread Cyril Scetbon
With Cassandra an update is equivalent to an insert (an upsert); updates by
themselves do not create tombstones. Tombstones come from deletes, null
writes, expired TTLs, and collection overwrites.
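As a rough illustration (a simplified Python model of the write path, not
Cassandra internals; the cell representation here is made up), an update is
just another timestamped cell, and only a null write stores a tombstone cell:

```python
# Toy model of Cassandra cell semantics: every write (insert or update) is
# a new timestamped cell; reads keep the newest cell per column. Writing
# None (null) stores a *tombstone* cell instead of erasing anything, which
# is how "updates" can still generate tombstones.

def write(row, column, value, timestamp):
    """Upsert: append a cell; value None means a delete (tombstone)."""
    cell = {"value": value, "ts": timestamp, "tombstone": value is None}
    row.setdefault(column, []).append(cell)

def read(row, column):
    """Last-write-wins: the newest cell decides; a tombstone reads as null."""
    cells = row.get(column, [])
    if not cells:
        return None
    newest = max(cells, key=lambda c: c["ts"])
    return None if newest["tombstone"] else newest["value"]

row = {}
write(row, "name", "alice", 1)   # insert
write(row, "name", "bob", 2)     # "update" is just another cell, no tombstone
write(row, "email", None, 3)     # null write -> tombstone cell
```

In this model `read(row, "name")` returns the latest value and the only
tombstone in the row is the null write to `email`.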

Cyril Scetbon

> On 14 Jan 2014 at 08:38, David Tinker wrote:
> 
> We never delete rows but we do a lot of updates. Is that where the
> tombstones are coming from?


Re: Crash with TombstoneOverwhelmingException

2014-01-13 Thread David Tinker
We are seeing the exact same exception in our logs. Is there any workaround?

We never delete rows but we do a lot of updates. Is that where the
tombstones are coming from?

On Wed, Dec 25, 2013 at 5:24 PM, Sanjeeth Kumar  wrote:
> Hi all,
>   One of my cassandra nodes crashes with the following exception
> periodically -
> ERROR [HintedHandoff:33] 2013-12-25 20:29:22,276 SliceQueryFilter.java (line
> 200) Scanned over 100000 tombstones; query aborted (see
> tombstone_fail_threshold)
> ERROR [HintedHandoff:33] 2013-12-25 20:29:22,278 CassandraDaemon.java (line
> 187) Exception in thread Thread[HintedHandoff:33,1,main]
> org.apache.cassandra.db.filter.TombstoneOverwhelmingException
> at
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
> at
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
> at
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
> at
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
> at
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
> at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
> at
> org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
> at
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
> at
> org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
> at
> org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> Why does this happen? Does this relate to any incorrect config value?
>
> The Cassandra Version I'm running is
> ReleaseVersion: 2.0.3
>
> - Sanjeeth
>



-- 
http://qdb.io/ Persistent Message Queues With Replay and #RabbitMQ Integration


Re: Crash with TombstoneOverwhelmingException

2013-12-30 Thread Robert Coli
 On Wed, Dec 25, 2013 at 10:01 AM, Edward Capriolo wrote:

> I have to hijack this thread. There seem to be many problems with the
> 2.0.3 release.
>

+1. There is no 2.0.x release I consider production ready, even after
today's 2.0.4.

> Outside of passing all unit tests, what factors into the release voting
> process? What other type of extended real-world testing should be done to
> find bugs like this one that unit testing won't?
>

I also +1 these questions. Voting seems of limited use given the outputs of
the process.

>
> Here is a wacky idea that I am half serious about: make a CMS for
> http://cassandra.apache.org that backs its data and reporting onto
> cassandra. No release unless the Cassandra db that serves the site is
> upgraded first. :)
>

I agree wholeheartedly that eating one's own dogfood is informative.

=Rob


Re: Crash with TombstoneOverwhelmingException

2013-12-27 Thread Kais Ahmed
You can read the comments about this new feature here :

https://issues.apache.org/jira/browse/CASSANDRA-6117


2013/12/27 Kais Ahmed 

> This threshold is to prevent bad performance; you can increase the value.
>
> [remainder of the quoted thread elided; the messages appear in full below]


Re: Crash with TombstoneOverwhelmingException

2013-12-27 Thread Kais Ahmed
This threshold is to prevent bad performance; you can increase the value.


2013/12/27 Sanjeeth Kumar 

> Thanks for the replies.
>
> [remainder of the message and nested quotes elided; they appear in full below]


Re: Crash with TombstoneOverwhelmingException

2013-12-27 Thread Sanjeeth Kumar
Thanks for the replies.
I don't think this is just a warning, incorrectly logged as an error.
Every time there is a crash, this is the exact traceback I see in the logs.
I just browsed through the code: it throws a TombstoneOverwhelmingException
in these situations, and I did not see the exception being caught and
handled anywhere. I might be wrong though.

But I would also like to understand why this threshold value is important,
so that I can set the right threshold.

- Sanjeeth

On Fri, Dec 27, 2013 at 11:33 AM, Edward Capriolo wrote:

> I do not think the feature is supposed to crash the server. It could be
> that the message is in the logs and the crash is not related to this
> message. WARN might be a better logging level for this message, even though
> the first threshold is WARN and the second is FAIL. ERROR is usually
> something more dramatic.
>
>
> [nested quotes of earlier messages elided; they appear in full below]


Re: Crash with TombstoneOverwhelmingException

2013-12-26 Thread Edward Capriolo
I do not think the feature is supposed to crash the server. It could be
that the message is in the logs and the crash is not related to this message.
WARN might be a better logging level for this message, even though the first
threshold is WARN and the second is FAIL. ERROR is usually something more
dramatic.


On Wed, Dec 25, 2013 at 1:02 PM, Laing, Michael
wrote:

> It's a feature:
>
> [remainder of Michael Laing's message and nested quotes elided; the message
> appears in full below]


Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Laing, Michael
It's a feature:

In the stock cassandra.yaml file for 2.0.3 see:

> # When executing a scan, within or across a partition, we need to keep the
> # tombstones seen in memory so we can return them to the coordinator, which
> # will use them to make sure other replicas also know about the deleted
> # rows.
> # With workloads that generate a lot of tombstones, this can cause
> # performance problems and even exhaust the server heap.
> # (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
> # Adjust the thresholds here if you understand the dangers and want to
> # scan more tombstones anyway.  These thresholds may also be adjusted at
> # runtime using the StorageService mbean.
> tombstone_warn_threshold: 1000
> tombstone_failure_threshold: 100000


You are hitting the failure threshold.
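The threshold mechanism can be sketched as follows (a simplified Python model,
not Cassandra's actual Java implementation; the names and the cell
representation are made up for illustration): while collecting cells for a
slice query, the server counts tombstones it reads; past the warn threshold it
logs a warning, and past the failure threshold it aborts the read.

```python
# Simplified sketch of the tombstone threshold check: a scan counts the
# tombstones it reads past, warns above one threshold, and aborts the
# query above the other.

TOMBSTONE_WARN_THRESHOLD = 1000
TOMBSTONE_FAILURE_THRESHOLD = 100000

class TombstoneOverwhelmingException(Exception):
    """Raised when a scan reads more tombstones than the failure threshold."""

def collect_columns(cells, warn=TOMBSTONE_WARN_THRESHOLD,
                    fail=TOMBSTONE_FAILURE_THRESHOLD):
    """Return live cell values, enforcing the tombstone thresholds.

    `cells` is an iterable of (value, is_tombstone) pairs standing in for
    the merged cell stream of a partition scan.
    """
    live, tombstones = [], 0
    for value, is_tombstone in cells:
        if is_tombstone:
            tombstones += 1
            if tombstones > fail:
                # In 2.0.x this is logged at ERROR and the read is aborted.
                raise TombstoneOverwhelmingException(
                    "Scanned over %d tombstones; query aborted" % tombstones)
        else:
            live.append(value)
    if tombstones > warn:
        print("WARN: read %d tombstones" % tombstones)
    return live
```

In the log line in this thread, a hint-delivery scan crossed the failure
threshold, so the read aborted with the exception rather than just warning.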

ml


On Wed, Dec 25, 2013 at 12:17 PM, Rahul Menon  wrote:

> Sanjeeth,
>
> Looks like the error is being populated from the hintedhandoff, what is
> the size of your hints cf?
>
> Thanks
> Rahul
>
>
> On Wed, Dec 25, 2013 at 8:54 PM, Sanjeeth Kumar wrote:
>
>> Hi all,
>>   One of my cassandra nodes crashes with the following exception
>> periodically -
>> [stack trace elided; quoted in full earlier in the thread]
>>
>> Why does this happen? Does this relate to any incorrect config value?
>>
>> The Cassandra Version I'm running is
>> ReleaseVersion: 2.0.3
>>
>> - Sanjeeth
>>
>>
>


Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Edward Capriolo
I have to hijack this thread. There seem to be many problems with the 2.0.3
release. If this exception is being generated by hinted handoff, I can
understand where it is coming from: with many hints and many tombstones,
this new feature interacts with the hint delivery process in a bad way.

If I understand the feature correctly, it should always be off for the
hints, because regardless of how many tombstones are in the hints, this
rule should not apply.
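For a rough picture of why hint delivery keeps running into tombstones (a toy
Python model under my own assumptions, not Cassandra's actual hints
implementation): a delivered hint is deleted from the hints column family, and
each deletion leaves a tombstone that subsequent delivery scans must read past
until compaction purges them.

```python
# Toy model of a hints column family: delivering a hint deletes it, leaving
# a tombstone, so a node with a long hint backlog accumulates tombstones
# that every later delivery scan has to read past.

class HintsCF:
    def __init__(self):
        self.cells = {}  # hint_id -> payload, or None for a tombstone

    def store_hint(self, hint_id, mutation):
        self.cells[hint_id] = mutation

    def deliver_all(self):
        """Scan the row, deliver live hints, tombstone each one delivered.

        Returns (delivered, tombstones_scanned)."""
        delivered, tombstones = 0, 0
        for hint_id in sorted(self.cells):
            if self.cells[hint_id] is None:
                tombstones += 1            # a previously delivered (deleted) hint
            else:
                delivered += 1
                self.cells[hint_id] = None  # delete-on-delivery -> tombstone
        return delivered, tombstones

cf = HintsCF()
for i in range(5):
    cf.store_hint(i, "mutation-%d" % i)
print(cf.deliver_all())   # first pass: (5, 0) - all delivered, none scanned
print(cf.deliver_all())   # second pass: (0, 5) - only tombstones scanned
```

With a large enough backlog, the second-pass scan in this model is exactly the
kind of read that would cross the tombstone failure threshold.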

I want to bring up these questions:

Outside of passing all unit tests, what factors into the release voting
process? What other type of extended real-world testing should be done to
find bugs like this one that unit testing won't?

I am not trying to call anyone out over this feature/bug. I totally
understand why you would want a warning, or want to opt out of a read
scanning over a massive number of tombstones, and I think it is a smart
feature. But what I want more is to trust that every release is battle
tested.

Here is a wacky idea that I am half serious about: make a CMS for
http://cassandra.apache.org that backs its data and reporting onto
cassandra. No release unless the Cassandra db that serves the site is
upgraded first. :)


On Wed, Dec 25, 2013 at 12:17 PM, Rahul Menon  wrote:

> Sanjeeth,
>
> Looks like the error is being populated from the hintedhandoff, what is
> the size of your hints cf?
>
> Thanks
> Rahul
>
>
> On Wed, Dec 25, 2013 at 8:54 PM, Sanjeeth Kumar wrote:
>
>> Hi all,
>>   One of my cassandra nodes crashes with the following exception
>> periodically -
>> [stack trace elided; quoted in full earlier in the thread]
>>
>> Why does this happen? Does this relate to any incorrect config value?
>>
>> The Cassandra Version I'm running is
>> ReleaseVersion: 2.0.3
>>
>> - Sanjeeth
>>
>>
>


Re: Crash with TombstoneOverwhelmingException

2013-12-25 Thread Rahul Menon
Sanjeeth,

Looks like the error is being populated from the hintedhandoff, what is the
size of your hints cf?

Thanks
Rahul


On Wed, Dec 25, 2013 at 8:54 PM, Sanjeeth Kumar  wrote:

> Hi all,
>   One of my cassandra nodes crashes with the following exception
> periodically -
> [stack trace elided; quoted in full earlier in the thread]
>
> Why does this happen? Does this relate to any incorrect config value?
>
> The Cassandra Version I'm running is
> ReleaseVersion: 2.0.3
>
> - Sanjeeth
>
>