[ 
https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133974#comment-14133974
 ] 

Christian Spriegel edited comment on CASSANDRA-7886 at 9/15/14 3:07 PM:
------------------------------------------------------------------------

[~kohlisankalp]: Hi again! Sorry for all the mails, but I just had a look at 
your 2.1 patch:

I think removing the try-catch in ReadVerbHandler should do the trick, right? 
Then TOEs would be handled by your code in the MessageDeliveryTask?

ReadVerbHandler:
{code}
        Row row;
-        try
-        {
            row = command.getRow(keyspace);
-        }
-        catch (TombstoneOverwhelmingException e)
-        {
-            // error already logged.  Drop the request
-            return;
-        }
{code}

Edit: Looking a bit closer, I think it's missing a few more pieces. But in my 
naive mind it does not look like a big protocol change. I would like to hear 
your opinion.
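
To illustrate the difference between dropping the request and replying with an error, here is a minimal standalone Java sketch. It is not Cassandra code: the class FailFastReadSketch, the Outcome enum, the simplified TombstoneOverwhelmingException and the methods getRow, handleRead and coordinate are all hypothetical stand-ins, and a CompletableFuture plays the role of the reply channel between data node and coordinator.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class FailFastReadSketch {
    /** Simplified stand-in for Cassandra's exception of the same name (assumption). */
    static class TombstoneOverwhelmingException extends RuntimeException {}

    /** What the coordinator ends up observing for a single read. */
    enum Outcome { ROW, FAILURE, TIMEOUT }

    /** Hypothetical replica-side read that fails when too many tombstones are scanned. */
    static String getRow(boolean tooManyTombstones) {
        if (tooManyTombstones) {
            throw new TombstoneOverwhelmingException();
        }
        return "row";
    }

    /** Hypothetical data-node handler; 'reply' models the response sent to the coordinator. */
    static void handleRead(boolean tooManyTombstones, boolean replyOnFailure,
                           CompletableFuture<String> reply) {
        try {
            reply.complete(getRow(tooManyTombstones));
        } catch (TombstoneOverwhelmingException e) {
            if (replyOnFailure) {
                // Proposed behaviour: tell the coordinator that the read failed.
                reply.completeExceptionally(e);
            }
            // Otherwise the request is silently dropped, as described in the issue.
        }
    }

    /** Hypothetical coordinator: waits for the reply, but at most timeoutMs milliseconds. */
    static Outcome coordinate(boolean tooManyTombstones, boolean replyOnFailure, long timeoutMs) {
        CompletableFuture<String> reply = new CompletableFuture<>();
        handleRead(tooManyTombstones, replyOnFailure, reply);
        try {
            reply.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.ROW;
        } catch (TimeoutException e) {
            return Outcome.TIMEOUT;   // no reply arrived: the full timeout was spent waiting
        } catch (ExecutionException e) {
            return Outcome.FAILURE;   // error reply arrived: no need to wait for the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(coordinate(true, false, 200)); // dropped request
        System.out.println(coordinate(true, true, 200));  // error reply
        System.out.println(coordinate(false, true, 200)); // normal read
    }
}
```

In this toy model the dropped request costs the coordinator the whole timeout, while the error reply is seen immediately. How the failure would actually be encoded in the inter-node messages is exactly the open question above.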



> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Priority: Minor
>             Fix For: 3.0
>
>
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will 
> cause the query to be simply dropped on every data-node, but no response is 
> sent back to the coordinator. Instead the coordinator waits for the specified 
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application 
> is waiting for the timeout interval for every request. Therefore, if our 
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later) 
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when 
> they run into a TombstoneOverwhelmingException. Then the coordinator does not 
> have to wait for the timeout-interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
