[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256948#comment-14256948 ]
Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------

Hi @thobbs! I have a Christmas present for you, in the form of a patch file ;-)

I attached a v5 patch that contains the fixes.

Regarding TOE: Currently I throw TOEs as exceptions, and they get logged just like any other exception. I am not sure if this is desirable and would like to hear your feedback. I think we have the following options:
- Leave it as it is in v5, meaning TOEs get logged with stack traces.
- Add catch blocks where necessary and log TOEs in a user-friendly way. But this might be needed in many places. Also, in this case I would prefer making TOE a checked exception. IMHO TOE should not be unchecked.
- Add TOE logging to the C* default exception handler. (I have not investigated this yet, but I assume there is an exception handler.)
- Leave it as it was before.

Here are a few examples of how TOEs now look to the user:

TOE using a 3.0 cqlsh (still on CQL protocol 3):
{code}
cqlsh:test> select * from test;
code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
cqlsh:test>
{code}

TOE using a 2.0 cqlsh:
{code}
cqlsh:test> select * from test;
Request did not complete within rpc_timeout.
{code}

TOE with cassandra-cli:
{code}
[default@unknown] use test;
Authenticated to keyspace: test
[default@test] list test;
Using default limit of 100
Using default cell limit of 100
null
TimedOutException()
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result$get_range_slices_resultStandardScheme.read(Cassandra.java:17448)
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result$get_range_slices_resultStandardScheme.read(Cassandra.java:17397)
    at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:17323)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:802)
    at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:786)
    at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1520)
    at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:285)
    at org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:201)
    at org.apache.cassandra.cli.CliMain.main(CliMain.java:331)
[default@test]
{code}

> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 7886_v4_trunk.txt, 7886_v5_trunk.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every data node, and no response is sent back to the coordinator. Instead, the coordinator waits for the full read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application waits for the full timeout interval on every such request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
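The proposed solution can be illustrated with a coordinator-side callback that counts replica failure messages alongside successful responses and gives up as soon as the consistency level can no longer be met, instead of always sitting out read_request_timeout_in_ms. This is only a minimal sketch under stated assumptions: `FailFastReadCallback`, `onResponse`, `onFailure` and `ReplicaFailureException` are hypothetical names, not Cassandra's actual `ReadCallback` API, and the replica-to-coordinator failure message itself (what the v5 patch and protocol v4 would carry) is not modelled.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical coordinator-side read callback (NOT Cassandra's real ReadCallback).
 * It completes as soon as enough replicas answered, OR as soon as so many replicas
 * reported a failure (e.g. a TombstoneOverwhelmingException) that the consistency
 * level can no longer be met, instead of always waiting for the read timeout.
 */
public class FailFastReadCallback {

    /** Hypothetical error surfaced to the client when replicas report failures. */
    public static class ReplicaFailureException extends Exception {
        private static final long serialVersionUID = 1L;
        public final int received;
        public final int failures;
        public final int blockFor;

        ReplicaFailureException(int received, int failures, int blockFor) {
            super("Operation failed - received " + received + " responses and "
                    + failures + " failures (" + blockFor + " required)");
            this.received = received;
            this.failures = failures;
            this.blockFor = blockFor;
        }
    }

    private final int blockFor;      // responses required by the consistency level
    private final int totalReplicas; // replicas the read was sent to
    private int received = 0;        // successful replica responses so far
    private int failures = 0;        // replica failure messages so far
    private final CompletableFuture<Integer> result = new CompletableFuture<>();

    public FailFastReadCallback(int blockFor, int totalReplicas) {
        this.blockFor = blockFor;
        this.totalReplicas = totalReplicas;
    }

    /** Called when a replica returns data. */
    public synchronized void onResponse() {
        received++;
        if (received >= blockFor) {
            result.complete(received); // no-op if the future is already completed
        }
    }

    /** Called when a replica reports that it failed, instead of staying silent. */
    public synchronized void onFailure() {
        failures++;
        // At best totalReplicas - failures replicas can still succeed; if that is
        // below blockFor, the consistency level is unreachable, so fail right away.
        if (totalReplicas - failures < blockFor) {
            result.completeExceptionally(new ReplicaFailureException(received, failures, blockFor));
        }
    }

    /** Waits for the outcome; the timeout only matters if replicas stay silent. */
    public int await(long timeoutMillis)
            throws ReplicaFailureException, TimeoutException, InterruptedException {
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Only ReplicaFailureException is ever used to complete the future exceptionally.
            throw (ReplicaFailureException) e.getCause();
        }
    }

    public static void main(String[] args) throws Exception {
        // CL=ONE with three replicas, all of which hit a TombstoneOverwhelmingException.
        FailFastReadCallback callback = new FailFastReadCallback(1, 3);
        callback.onFailure();
        callback.onFailure();
        callback.onFailure();
        try {
            callback.await(10_000);
            System.out.println("unexpected success");
        } catch (ReplicaFailureException e) {
            // Returns immediately rather than after the 10 s timeout.
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```

With `blockFor = 1` and three replicas, the sketch gives up only after the third failure, since any single remaining replica could still satisfy CL=ONE. The timeout is kept as a fallback for replicas that never answer at all (e.g. a crashed node), which is the case a timeout is actually meant for.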