[ 
https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246592#comment-14246592
 ] 

Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------

Hi [~thobbs], sorry I kept you waiting for so long.


{quote}Instead of using Unavailable when the protocol version is less than 4, 
use ReadTimeout. Unavailable signals that some of the replicas are considered 
to be down, which is not the case here. Plus, ReadTimeout is the error that is 
currently returned in these circumstances.{quote}
Makes sense. I changed Unavailable to ReadTimeout for CQL3 and Thrift.
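
To make the fallback concrete, here is a minimal sketch of the rule, not the code from the patch. The class, enum, and method names are hypothetical. Only the rule itself comes from the review comment: a read failure for protocol v4 and later, and ReadTimeout for older clients.

```java
// Hypothetical sketch: these names are illustrative, not Cassandra's classes.
final class ReadErrorMapping {
    enum ErrorKind { READ_TIMEOUT, READ_FAILURE }

    // Clients on a protocol version below 4 keep getting ReadTimeout, the
    // error that is currently returned in these circumstances.
    static ErrorKind forClient(int protocolVersion) {
        return protocolVersion < 4 ? ErrorKind.READ_TIMEOUT : ErrorKind.READ_FAILURE;
    }
}
```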


{quote}In ErrorMessage.encodedSize(), there's some commented out code for 
READ_FAILURE handling.{quote}
The commented-out code was meant as preparation for WriteFailureExceptions. 
Would it make sense to add WriteFailureException fully? In a follow-up ticket 
we could then implement it for the different write paths. Or do you want me 
to get rid of it?


{quote}Instead of catching and ignoring TombstoneOverwhelmingException in 
multiple places, I suggest you move the logged error message into the TOE 
message and let it propagate (and be logged) like any other exception.{quote}
Just to make sure that we don't touch anything new here: TOEs are already 
logged inside SliceQueryFilter.collectReducedColumns. I simply took this catch 
block from the ReadVerbHandler/RangeSliceVerbHandler and put it into 
StorageProxy/MessageDeliveryTask.
I don't like that either, but I did not want to touch it. Do you still want me 
to change it?


{quote}Can you update docs/native_protocol_v4.spec with these changes? You can 
look at the previous specs to see examples of the "changes from the previous 
version" section{quote}
Ok. Should we also add WriteFailures?


{quote}In StorageProxy, the unavailables counter should not be incremented for 
read failures. I suggest creating a new, separate failure counter.{quote}
Done.
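
As an illustration only, the idea of a separate counter could look roughly like the following. The class, field, and enum names are assumptions made for this sketch and are not the metrics classes used in StorageProxy.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one counter per outcome, so that read failures
// no longer inflate the unavailables count.
final class ReadMetricsSketch {
    enum Outcome { UNAVAILABLE, TIMEOUT, FAILURE }

    final AtomicLong unavailables = new AtomicLong();
    final AtomicLong timeouts = new AtomicLong();
    final AtomicLong failures = new AtomicLong();

    // Increment only the counter that matches the outcome.
    void record(Outcome outcome) {
        switch (outcome) {
            case UNAVAILABLE: unavailables.incrementAndGet(); break;
            case TIMEOUT:     timeouts.incrementAndGet();     break;
            case FAILURE:     failures.incrementAndGet();     break;
        }
    }
}
```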

{quote}Also in StorageProxy, there's now quite a bit of code duplication around 
building error messages for ReadTimeoutExceptions and ReadFailureExceptions. 
Can you condense those somewhat?{quote}
I merged ReadTimeoutException|ReadFailureException into a single catch block.
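
For readers unfamiliar with the construct, a single catch block for both exceptions uses Java's multi-catch syntax. The sketch below is a simplified illustration, not the patch itself. The exception classes are hypothetical stand-ins with a shared base class, and the describe helper and its message format are invented for the example.

```java
// Hypothetical stand-ins; the real Cassandra exceptions have more fields.
final class ReadErrorHandling {
    static class ReadException extends Exception {
        final int received;
        final int blockFor;
        ReadException(int received, int blockFor) {
            this.received = received;
            this.blockFor = blockFor;
        }
    }
    static class ReadTimeoutException extends ReadException {
        ReadTimeoutException(int received, int blockFor) { super(received, blockFor); }
    }
    static class ReadFailureException extends ReadException {
        ReadFailureException(int received, int blockFor) { super(received, blockFor); }
    }

    interface Read {
        void run() throws ReadTimeoutException, ReadFailureException;
    }

    static String describe(Read read) {
        try {
            read.run();
            return "ok";
        } catch (ReadTimeoutException | ReadFailureException e) {
            // In a multi-catch, e has the common supertype (ReadException here),
            // so the message-building code is written only once.
            return e.getClass().getSimpleName()
                    + ": received " + e.received + " of " + e.blockFor + " responses";
        }
    }
}
```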


I also added the last cell-name to the TOE, so that an administrator can get an 
estimate of where to look for the tombstones. This doesn't really match the 
ticket's new name, but is related to my original issue :-)


Overall, one question remains from my side: Should I also prepare 
WriteFailureExceptions? I could then add them to the write code path in a 
follow-up ticket.



> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt
>
>
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will 
> cause the query to be simply dropped on every data-node, but no response is 
> sent back to the coordinator. Instead the coordinator waits for the specified 
> read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application 
> is waiting for the timeout interval for every request. Therefore, if our 
> application runs into TombstoneOverwhelmingExceptions, then (sooner or later) 
> our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when 
> they run into a TombstoneOverwhelmingException. Then the coordinator does not 
> have to wait for the timeout-interval.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
