[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262943#comment-14262943 ]

Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------

Hi [~thobbs],

I uploaded a new patch: V6

Here is what I did:
- Fixed logging of TOEs...
-- ... in StorageProxy for local reads
-- ... in MessageDeliveryTask for remote reads
- Added partitionKey (as DecoratedKey) and lastCellName logging to TOEs.
- Changed SliceQueryFilter not to throw TOEs for the system keyspace, since 
Cassandra does not seem to tolerate TOEs in system queries. These TOEs are 
always logged as warnings instead.
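The system-keyspace special case above could be sketched roughly like this 
(a hypothetical illustration only, not the actual SliceQueryFilter code; the 
class and method names here are made up for the example):

{code:java}
// Hypothetical sketch: decide whether a read that scanned too many
// tombstones should abort with a TOE, or merely log a warning.
public class TombstoneCheck {
    // Stand-in for the tombstone_failure_threshold setting.
    static final int FAILURE_THRESHOLD = 100;

    /** Returns true if the query should abort with a TOE. */
    static boolean shouldAbort(String keyspace, int tombstonesScanned) {
        if (tombstonesScanned <= FAILURE_THRESHOLD)
            return false;
        // System-keyspace queries are never aborted; the overflow is
        // only logged as a warning instead (per the change described above).
        return !"system".equals(keyspace);
    }
}
{code}

With this shape, a user-keyspace read that scans past the threshold aborts, 
while the same scan against the system keyspace proceeds with a warning.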


This is how a TOE looks in system.log:
{quote}
ERROR [SharedPool-Worker-1] 2015-01-02 15:07:24,878 MessageDeliveryTask.java:81 
- Scanned over 201 tombstones in test.test; 100 columns were requested; query 
aborted (see tombstone_failure_threshold); 
partitionKey=DecoratedKey(78703492656118554854272571946195123045, 31); 
lastCell=188; delInfo={deletedAt=-9223372036854775808, 
localDeletion=2147483647}; slices=[-]
{quote}

kind regards,
Christian


> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 
> 7886_v4_trunk.txt, 7886_v5_trunk.txt, 7886_v6_trunk.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur during queries, each data node 
> simply drops the query without sending any response back to the coordinator. 
> The coordinator then waits out the full read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application 
> waits for the timeout interval on every request. Therefore, if our 
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later) 
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when 
> they run into a TombstoneOverwhelmingException. Then the coordinator does not 
> have to wait for the timeout interval.
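The proposed replica-side behavior could be sketched as follows (a minimal 
hypothetical illustration, not Cassandra's actual messaging code; the 
READ_FAILURE verb name is an assumption for this sketch):

{code:java}
// Hypothetical sketch: instead of silently dropping the read on a
// TombstoneOverwhelmingException, the replica replies with an explicit
// failure so the coordinator can fail fast rather than waiting out
// read_request_timeout_in_ms.
public class ReplicaRead {
    static class TombstoneOverwhelmingException extends RuntimeException {}

    // Possible reply kinds from the replica (READ_FAILURE is assumed).
    enum Reply { READ_RESPONSE, READ_FAILURE }

    static Reply handleRead(int tombstonesScanned, int failureThreshold) {
        try {
            if (tombstonesScanned > failureThreshold)
                throw new TombstoneOverwhelmingException();
            return Reply.READ_RESPONSE;  // normal successful read
        } catch (TombstoneOverwhelmingException e) {
            // Proposed change: reply with an error instead of dropping
            // the request, so the coordinator is notified immediately.
            return Reply.READ_FAILURE;
        }
    }
}
{code}

The key point is only that every code path produces *some* reply; the 
coordinator then surfaces the failure to the client right away.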



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
