[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262943#comment-14262943 ]
Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------

Hi [~thobbs],

uploaded new patch: V6

Here is what I did:
- Fixed logging of TOEs...
-- ... in StorageProxy for local reads
-- ... in MessageDeliveryTask for remote reads
- Added partitionKey (as DecoratedKey) and lastCellName logging to TOE.
- Changed SliceQueryFilter not to throw TOEs for the system keyspace. Cassandra does not seem to like TOEs in system queries. These TOEs will always be logged as warnings instead.

This is what TOEs look like in system.log:
{quote}
ERROR [SharedPool-Worker-1] 2015-01-02 15:07:24,878 MessageDeliveryTask.java:81 - Scanned over 201 tombstones in test.test; 100 columns were requested; query aborted (see tombstone_failure_threshold); partitionKey=DecoratedKey(78703492656118554854272571946195123045, 31); lastCell=188; delInfo={deletedAt=-9223372036854775808, localDeletion=2147483647}; slices=[-]
{quote}

kind regards,
Christian

> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
> Key: CASSANDRA-7886
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Tested with Cassandra 2.0.8
> Reporter: Christian Spriegel
> Assignee: Christian Spriegel
> Priority: Minor
> Labels: protocolv4
> Fix For: 3.0
>
> Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt,
> 7886_v4_trunk.txt, 7886_v5_trunk.txt, 7886_v6_trunk.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is
> simply dropped on every data node, and no response is sent back to the
> coordinator. Instead the coordinator waits for the specified
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application
> is waiting for the timeout interval for every request. Therefore, if our
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later)
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when
> they run into a TombstoneOverwhelmingException. Then the coordinator does not
> have to wait for the timeout interval.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
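
The difference between the current behavior and the proposed solution in the issue description can be illustrated with a small, self-contained simulation. This is only a sketch and not Cassandra code: `ReplicaFailure`, `replica_read`, `coordinator_read`, and the timeout value are hypothetical names and numbers chosen for illustration, with a queue standing in for the messaging layer between a data node and the coordinator.

```python
import queue
import threading
import time

# Stands in for read_request_timeout_in_ms; the value is an arbitrary assumption.
READ_TIMEOUT_S = 0.5


class ReplicaFailure(Exception):
    """Hypothetical error a replica reports back instead of staying silent."""


def replica_read(responses, report_failures):
    # Simulate a replica whose read aborts on the tombstone threshold.
    try:
        raise ReplicaFailure("tombstone_failure_threshold exceeded")
    except ReplicaFailure as exc:
        if report_failures:
            # Proposed behavior: tell the coordinator about the failure.
            responses.put(exc)
        # Behavior described in the issue: drop the query and send nothing.


def coordinator_read(report_failures, timeout=READ_TIMEOUT_S):
    """Return (outcome, seconds waited) for one simulated read."""
    responses = queue.Queue()
    worker = threading.Thread(
        target=replica_read, args=(responses, report_failures), daemon=True
    )
    start = time.monotonic()
    worker.start()
    try:
        reply = responses.get(timeout=timeout)
        outcome = "failure" if isinstance(reply, ReplicaFailure) else "ok"
    except queue.Empty:
        # No reply at all: the coordinator has waited for the full timeout.
        outcome = "timeout"
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    print(coordinator_read(report_failures=False))  # waits for the whole timeout
    print(coordinator_read(report_failures=True))   # returns almost immediately
```

In this simulation the silent replica makes the coordinator block for the full timeout, while the reporting replica lets it fail the request right away, which is the effect the proposed solution is after.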