[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14122809#comment-14122809 ]
Sylvain Lebresne commented on CASSANDRA-7886:
---------------------------------------------

bq. So if cassandra suddenly decides to make every request wait for 15 sec

I meant that if *every* request hits TombstoneOverwhelmingException, you have either set the threshold way too low, or you have a problem with your data model that generates too many tombstones. And if every request hits TombstoneOverwhelmingException, then even if C* were to return more quickly from it, you still won't make much progress. And if you just have a couple of specific requests that occasionally hit the tombstone threshold and still break your application cluster, that also sounds like something you should mitigate client side (after all, timeouts can happen, TombstoneOverwhelmingException or not).

bq. Can we set fixversion to 3.0 already so that this ticket wont be forgotten?

Sure, but setting a fixversion is never a promise.

> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Priority: Minor
>             Fix For: 3.0
>
>
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will
> cause the query to be simply dropped on every data-node, but no response is
> sent back to the coordinator. Instead the coordinator waits for the specified
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application
> is waiting for the timeout interval for every request. Therefore, if our
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later)
> our entire application cluster goes down :-(
>
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when
> they run into a TombstoneOverwhelmingException. Then the coordinator does not
> have to wait for the timeout interval.
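
The difference between the current behaviour and the proposed one can be sketched with a toy simulation. This is not Cassandra code: `coordinator_read`, `ReplicaFailure`, and the three replica functions are hypothetical names invented for illustration, and threads stand in for data nodes. A replica that silently drops the query forces the coordinator to wait out the full timeout, while a replica that reports its error lets the coordinator fail immediately.

```python
import queue
import threading
import time


class ReplicaFailure(Exception):
    """Hypothetical error a data node reports instead of dropping the read."""


def coordinator_read(replicas, timeout_s):
    """Collect one response per replica; fail fast if any replica reports an error."""
    responses = queue.Queue()

    def run(replica):
        try:
            responses.put(("ok", replica()))
        except ReplicaFailure as exc:
            # Proposed behaviour: the failure is sent back to the coordinator.
            responses.put(("error", exc))

    for replica in replicas:
        threading.Thread(target=run, args=(replica,), daemon=True).start()

    deadline = time.monotonic() + timeout_s
    results = []
    while len(results) < len(replicas):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("read timed out")
        try:
            kind, payload = responses.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError("read timed out") from None
        if kind == "error":
            raise payload  # no need to wait for the rest of the timeout
        results.append(payload)
    return results


def healthy():
    return "row"


def overwhelmed():
    # Models a data node that hits the tombstone threshold and says so.
    raise ReplicaFailure("tombstone threshold exceeded")


def silent():
    # Models the behaviour described in the ticket: the query is dropped
    # and nothing is ever sent back.
    time.sleep(60)
```

With these definitions, `coordinator_read([healthy, overwhelmed], timeout_s=5.0)` raises `ReplicaFailure` almost immediately, whereas `coordinator_read([healthy, silent], timeout_s=5.0)` only raises `TimeoutError` once the full five seconds have elapsed.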