[ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249056#comment-14249056 ]
Tyler Hobbs commented on CASSANDRA-7886:
----------------------------------------

bq. Hi Tyler Hobbs, sorry I kept you waiting for so long.

No worries, I know you're busy :)

bq. The commented code was meant as a preparation for WriteFailureExceptions. Does it perhaps make sense to fully add WriteFailureException? As a follow up ticket, we could implement it then for the different writes. Or do you want me to get rid of it?

I do think it's a good idea to implement something similar for writes, and splitting that into a second ticket would be good. So go ahead and delete the comments for this patch.

{quote}
Just to make sure that we don't touch anything new here: TOEs are logged inside SliceQueryFilter.collectReducedColumns already. I simply took this catch block from the ReadVerbHandler/RangeSliceVerbHandler and put it into StorageProxy/MessageDeliveryTask. I don't like that either, but I did not want to touch it. Do you still want me to change it?
{quote}

Yes, go ahead and remove those other try/catch blocks as well. I can't see a reason why they should be suppressed once the logging statement is removed.

bq. I merged ReadTimeoutException|ReadFailureException into a single catch block.

Cool. The way you did it there looks perfect. Further up in StorageProxy there's an almost identical chunk of code. Can you condense that one as well?

bq. I also added the last cell-name to the TOE, so that an administrator can get an estimate where to look for the tombstones. This doesn't really match the ticket's new name, but is related to my original issue.

The many implementations of CellName don't implement {{toString()}}, so I think you want {{container.getComparator().getString(cell.name())}} instead.
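For readers unfamiliar with the catch-block merge discussed above, here is a minimal sketch of a Java multi-catch. It is not the code from the patch: the exception classes are simplified stand-ins that only borrow the names mentioned in the comment, and {{Query}}, {{read}} and {{failedReads}} are hypothetical names made up for illustration.

```java
public class MultiCatchSketch {
    // Hypothetical stand-ins; the real Cassandra exception classes are richer.
    static class RequestExecutionException extends Exception {
        RequestExecutionException(String message) { super(message); }
    }
    static class ReadTimeoutException extends RequestExecutionException {
        ReadTimeoutException() { super("read timed out"); }
    }
    static class ReadFailureException extends RequestExecutionException {
        ReadFailureException() { super("replica reported a failure"); }
    }

    // Hypothetical query abstraction that can fail in either way.
    interface Query {
        String run() throws ReadTimeoutException, ReadFailureException;
    }

    // Illustrative counter standing in for whatever bookkeeping both paths share.
    static int failedReads = 0;

    static String read(Query query) throws RequestExecutionException {
        try {
            return query.run();
        } catch (ReadTimeoutException | ReadFailureException e) {
            // One handler replaces two identical catch blocks.
            failedReads++;
            throw e; // rethrown with its original, more specific type
        }
    }

    public static void main(String[] args) {
        Query[] queries = {
            () -> "row",
            () -> { throw new ReadTimeoutException(); },
            () -> { throw new ReadFailureException(); }
        };
        for (Query q : queries) {
            try {
                System.out.println("result: " + read(q));
            } catch (RequestExecutionException e) {
                System.out.println("failed: " + e.getMessage());
            }
        }
        System.out.println("failed reads: " + failedReads);
    }
}
```

A multi-catch only compiles when the listed types are not subclasses of one another, which holds here because both stand-ins are siblings under a common parent.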

> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt,
> 7886_v4_trunk.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply
> dropped on every data node, but no response is sent back to the coordinator.
> Instead, the coordinator waits for the specified read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application
> waits for the full timeout interval on every request. Therefore, if our
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later)
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when
> they run into a TombstoneOverwhelmingException. Then the coordinator does not
> have to wait for the timeout interval.
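The proposed solution in the quoted issue can be illustrated with a small, self-contained sketch of a coordinator-side callback that stops waiting as soon as a replica reports an error. This is not Cassandra's implementation: {{FailFastCallbackSketch}}, {{onResponse}}, {{onFailure}}, {{await}} and {{Outcome}} are hypothetical names, and the sketch assumes, for simplicity, that a single failure aborts the read, whereas a real coordinator would more likely check whether the remaining replicas can still satisfy the consistency level.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class FailFastCallbackSketch {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT }

    private final int blockFor; // number of successful responses required
    private final AtomicInteger received = new AtomicInteger();
    private final AtomicReference<Outcome> outcome = new AtomicReference<>();
    private final CountDownLatch done = new CountDownLatch(1);

    FailFastCallbackSketch(int blockFor) {
        this.blockFor = blockFor;
    }

    // Called when a replica answers successfully.
    void onResponse() {
        if (received.incrementAndGet() >= blockFor
                && outcome.compareAndSet(null, Outcome.SUCCESS)) {
            done.countDown();
        }
    }

    // Called when a replica reports an error instead of staying silent.
    // Assumption of this sketch: the first failure ends the wait.
    void onFailure() {
        if (outcome.compareAndSet(null, Outcome.FAILURE)) {
            done.countDown();
        }
    }

    // Blocks until enough responses, a failure, or the timeout, whichever is first.
    Outcome await(long timeoutMillis) throws InterruptedException {
        return done.await(timeoutMillis, TimeUnit.MILLISECONDS)
                ? outcome.get()
                : Outcome.TIMEOUT;
    }

    public static void main(String[] args) throws InterruptedException {
        FailFastCallbackSketch callback = new FailFastCallbackSketch(2);
        // Simulate a replica that reports a failure shortly after the request.
        new Thread(callback::onFailure).start();
        // Returns well before the 10 s timeout because the failure was signalled.
        System.out.println(callback.await(10_000));
    }
}
```

Without the {{onFailure}} path, the only ways out of {{await}} would be enough successful responses or the full timeout, which is the behaviour the issue describes.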