[jira] [Commented] (CASSANDRA-7886) Coordinator should not wait for read timeouts when replicas hit Exceptions

Tyler Hobbs (JIRA) Tue, 30 Dec 2014 12:31:32 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261460#comment-14261460
 ]


Tyler Hobbs commented on CASSANDRA-7886:
----------------------------------------

bq. Regarding TOE: Currently I throw TOEs as exceptions and they get logged 
just like any other exception. I am not sure if this is desirable and would 
like to hear your feedback. I think we have the following options:

bq. Leave as it is in v5, meaning TOEs get logged with stacktraces.

Hmm, I forgot that with the previous setup, we wouldn't have stacktraces logged 
for TOEs under normal circumstances.

bq. Add catch blocks where necessary and log it in a user-friendly way. But it 
might be in many places. Also in this case I would prefer making TOE a checked 
exception. Imho TOE should not be unchecked.

I believe TOEs should remain unchecked.  They are closer in nature to an 
IOError than something that calling methods should explicitly account for.  
They would also add a lot of noise to the entire read path.

bq. Add TOE logging to C* default exception handler. (I did not investigate 
yet, but I assume there is an exception handler)

We do have an unhandled exception handler (in {{CassandraDaemon}}), but I'm not 
sure that's the best solution either.  It might be okay to suppress stacktraces 
for TOEs on the normal read path, but in unexpected cases (like, say, dealing 
with hints or other system tables internally) we would want to see the 
stacktrace.  Unfortunately we can't reliably distinguish the two at this level.
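
To make the handler-level limitation concrete, here is a minimal sketch. It is not the actual {{CassandraDaemon}} code: the exception class, the in-memory log list, and the thread names are hypothetical stand-ins. The handler only receives the thread and the throwable, so it can branch on the exception type but has no reliable view of why the read was issued.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TombstoneOverwhelmingException (unchecked).
class SketchTombstoneException extends RuntimeException {
    SketchTombstoneException(String message) { super(message); }
}

class HandlerSketch {
    // Captured log lines; a real handler would go through the logger instead.
    static final List<String> LOG = new ArrayList<>();

    // Suppresses the stacktrace for TOEs and keeps it for everything else.
    static final Thread.UncaughtExceptionHandler HANDLER = (thread, error) -> {
        if (error instanceof SketchTombstoneException) {
            LOG.add("WARN " + thread.getName() + ": " + error.getMessage());
        } else {
            LOG.add("ERROR " + thread.getName() + ": " + error + " (stacktrace follows)");
        }
    };

    // Runs the task on a named thread with the handler installed and waits for it.
    static void runOn(String threadName, Runnable task) {
        Thread t = new Thread(task, threadName);
        t.setUncaughtExceptionHandler(HANDLER);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch a TOE thrown from an internal task (for example one run via {{HandlerSketch.runOn("HintedHandoff-1", ...)}}) gets the same one-line treatment as a TOE from a normal read, which is exactly the case where the stacktrace would be wanted.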

bq. Leave it as it was before

I think it's a toss-up between this (catching TOEs in a few places and 
suppressing) and always allowing stacktraces to be logged for 
TombstoneOverwhelmingExceptions.
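
For comparison, the "catch in a few places and suppress" option could look roughly like the sketch below. All names here are hypothetical and none of this is taken from the patch; it only illustrates catching the unchecked exception at a known read-path entry point and logging a single line there, while any other exception still propagates with its stacktrace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for TombstoneOverwhelmingException (unchecked).
class SketchToe extends RuntimeException {
    SketchToe(String message) { super(message); }
}

class ReadPathSketch {
    // Captured log lines; real code would use the logger.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical read-path entry point: a TOE is logged as one line
    // (no stacktrace) and turned into an empty result; everything else
    // is left to propagate to the caller.
    static <T> Optional<T> readSuppressingToe(Supplier<T> read) {
        try {
            return Optional.ofNullable(read.get());
        } catch (SketchToe e) {
            LOG.add("Query aborted: " + e.getMessage());
            return Optional.empty();
        }
    }
}
```

Because the exception is unchecked, the {{Supplier}} passed in needs no {{throws}} clause, and only the call sites that want the quiet logging have to know about it.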

> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 
> 7886_v4_trunk.txt, 7886_v5_trunk.txt
>
>
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will 
> cause the query to be simply dropped on every data-node, but no response is 
> sent back to the coordinator. Instead the coordinator waits for the specified 
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application 
> is waiting for the timeout interval for every request. Therefore, if our 
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later) 
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when 
> they run into a TombstoneOverwhelmingException. Then the coordinator does not 
> have to wait for the timeout-interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
