[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143623#comment-14143623 ]
Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------

[~jbellis]: Don't get me wrong: some client-side limiting is definitely necessary in the application. But it is really not a nice situation that all queries are just sitting there and waiting.

Just to clarify: the patch is not only about TOEs (TombstoneOverwhelmingExceptions). It will report back any exception.

Another reason why I'd like this functionality is that it makes TOEs easier to understand. Think of a developer running a query in CQLSH: with this patch the user gets a clear message that something is wrong, instead of a timeout. I know I found this confusing in the beginning, and I probably still do.

We could even show the IP address of the host causing the error in the message. Then the user could see which host is responsible for the failure.

Is there anything about the patch itself you don't like? IMHO it's not adding much complexity. Most of the patch is the new exception classes and logging. The actual code handling the failure is just a few lines.

> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>             Fix For: 2.1.1
>
>         Attachments: 7886_v1.txt
>
>
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will
> cause the query to be simply dropped on every data-node, but no response is
> sent back to the coordinator. Instead the coordinator waits for the specified
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application
> is waiting for the timeout interval for every request. Therefore, if our
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later)
> our entire application cluster goes down :-(
>
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when
> they run into a TombstoneOverwhelmingException. Then the coordinator does not
> have to wait for the timeout interval.
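
To illustrate the difference between the two behaviours described in the issue, here is a minimal, self-contained Java sketch. It is not the attached patch and does not use Cassandra's internal classes: `ReadFailureSketch`, `TombstoneOverwhelmedError`, `silentDrop`, `explicitFailure`, and `awaitReplica` are hypothetical names, and the replica address is made up. A `CompletableFuture` stands in for the replica's response. When the replica drops the query silently, the coordinator can only wait out the timeout; when the replica reports the failure, the coordinator returns at once with a message naming the host.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReadFailureSketch {

    /** Hypothetical stand-in for the replica-side error; not Cassandra's real class. */
    static class TombstoneOverwhelmedError extends RuntimeException {
        TombstoneOverwhelmedError(String replica) {
            super("tombstone threshold exceeded on replica " + replica);
        }
    }

    /** Current behaviour: the replica drops the query and never answers. */
    static CompletableFuture<String> silentDrop() {
        return new CompletableFuture<>(); // never completed
    }

    /** Proposed behaviour: the replica sends the failure back to the coordinator. */
    static CompletableFuture<String> explicitFailure(String replica) {
        CompletableFuture<String> response = new CompletableFuture<>();
        response.completeExceptionally(new TombstoneOverwhelmedError(replica));
        return response;
    }

    /** Coordinator side: wait for the replica, but no longer than timeoutMs. */
    static String awaitReplica(CompletableFuture<String> response, long timeoutMs) {
        try {
            return "OK: " + response.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Nothing arrived: the coordinator cannot tell why the read failed.
            return "TIMEOUT";
        } catch (ExecutionException e) {
            // The replica reported an error: fail fast with its message.
            return "FAILURE: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        long timeoutMs = 200; // plays the role of read_request_timeout_in_ms
        System.out.println(awaitReplica(silentDrop(), timeoutMs));
        System.out.println(awaitReplica(explicitFailure("10.0.0.1"), timeoutMs));
    }
}
```

In the first call the coordinator blocks for the full timeout before giving up; in the second it returns as soon as the failure is visible, which is the effect the proposed solution is after.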