[ 
https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143623#comment-14143623
 ] 

Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------

[~jbellis]: Don't get me wrong: some client-side limiting is definitely 
necessary in the application. But it is really not a nice situation when all 
queries just sit there waiting for the timeout.

Just to clarify: the patch is not only about TOEs 
(TombstoneOverwhelmingExceptions). It will report back any Exception.

Another reason why I'd like this functionality is that it makes 
understanding TOEs easier. Think of a developer running a query in cqlsh: 
with this patch the user will get a clear message that something is wrong, 
instead of a timeout. I know I found the timeout confusing in the beginning, 
and I probably still do. We could even show the IP address of the host causing 
the error in the message. Then the user could see which host is responsible 
for the failure.

Is there anything about the patch itself you don't like? IMHO it's not adding 
much complexity. Most of the patch consists of the new Exception classes and 
logging. The actual code handling the failure is just a few lines.
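
To illustrate the difference in behaviour, here is a minimal standalone 
sketch. It is not code from the attached patch and does not use any Cassandra 
classes: FailFastReadSketch, ReplicaFailureException, awaitReplica, the host 
10.0.0.1 and the 200 ms timeout are all made up for the example. It only shows 
that a coordinator waiting on a response can return immediately when the 
replica reports a failure, instead of sitting there until the timeout expires.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FailFastReadSketch {

    /** Hypothetical stand-in for the failure a replica would report back. */
    static class ReplicaFailureException extends RuntimeException {
        final String host;

        ReplicaFailureException(String host, String reason) {
            super("Replica " + host + " failed: " + reason);
            this.host = host;
        }
    }

    /**
     * Coordinator side (hypothetical): wait for the replica's answer, but
     * return as soon as the replica reports a failure.
     */
    static String awaitReplica(CompletableFuture<String> response, long timeoutMs) {
        try {
            return response.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The replica told us what went wrong, so there is no need to wait.
            return "FAILED: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            // The replica stayed silent; all we know is that time ran out.
            return "TIMEOUT after " + timeoutMs + " ms";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        // Current behaviour: the replica drops the query and never answers.
        CompletableFuture<String> silent = new CompletableFuture<>();
        System.out.println(awaitReplica(silent, 200));
        // prints "TIMEOUT after 200 ms"

        // Proposed behaviour: the replica reports its failure right away.
        CompletableFuture<String> reported = new CompletableFuture<>();
        reported.completeExceptionally(
                new ReplicaFailureException("10.0.0.1", "too many tombstones"));
        System.out.println(awaitReplica(reported, 200));
        // prints "FAILED: Replica 10.0.0.1 failed: too many tombstones"
    }
}
```

In the second case the caller gets both the reason and the responsible host 
without waiting, which is the effect described above.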

> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>             Fix For: 2.1.1
>
>         Attachments: 7886_v1.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply 
> dropped on every data node, but no response is sent back to the 
> coordinator. Instead the coordinator waits for the specified 
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application 
> is waiting for the timeout interval for every request. Therefore, if our 
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later) 
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when 
> they run into a TombstoneOverwhelmingException. Then the coordinator does not 
> have to wait for the timeout interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
