[ https://issues.apache.org/jira/browse/CASSANDRA-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074850#comment-17074850 ]
Benedict Elliott Smith commented on CASSANDRA-15642:
----------------------------------------------------

bq. there is no indication externally to the user/client that the information returned is not complete/reliable

What is your definition of complete/reliable?

bq. I am not sure how that would be possible without waiting all the responses come back or timeout, but happy to be explained if I am missing something

At the point of failure you know whether you are failing because of a failure or a timeout. The problem is only that we produce a nonsensical error message that is inconsistent. We are of course able to produce an error message whose information is internally consistent with the situation.

For instance, we tend to have a pattern of:

{code}
if (isFailed())
    reportFailure()
{code}

However, the state can change between testing and reporting. We should instead have:

{code}
state = state()
if (isFailed(state))
    reportFailure(state)
{code}

We also now have a situation where state is a tuple of {{(triggerPrimitive, detailMap)}}, and we report a combination of {{triggerPrimitive}} and {{detailMap}}, despite them not being consistent.
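
As an illustration of the snapshot-then-report pattern, here is a minimal Java sketch. It is not Cassandra's code: the class {{ConsistentFailureReport}}, its {{State}} type and all method names are hypothetical, and two plain counters stand in for {{detailMap}}. The only point it tries to show is that the failure test and the message are both derived from one immutable snapshot, so they cannot disagree.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Cassandra's ReadCallback or write handler.
final class ConsistentFailureReport {

    // Immutable snapshot of everything the decision and the message depend on.
    static final class State {
        final int successes;
        final int failures;

        State(int successes, int failures) {
            this.successes = successes;
            this.failures = failures;
        }

        State withSuccess() { return new State(successes + 1, failures); }
        State withFailure() { return new State(successes, failures + 1); }
        boolean isFailed()  { return failures > 0; }
    }

    // Responses replace the whole snapshot atomically; it is never mutated in place.
    private final AtomicReference<State> state =
            new AtomicReference<>(new State(0, 0));

    public void onSuccess() { state.updateAndGet(State::withSuccess); }
    public void onFailure() { state.updateAndGet(State::withFailure); }

    // Reads the state exactly once, then tests and reports on that same snapshot.
    // Returns null when the snapshot contains no failure.
    public String failureMessageOrNull() {
        State s = state.get();
        if (!s.isFailed()) {
            return null;
        }
        return "Failure: Received " + s.successes
                + " responses, and " + s.failures + " failures";
    }

    public static void main(String[] args) {
        ConsistentFailureReport report = new ConsistentFailureReport();
        report.onFailure();
        report.onSuccess();
        report.onFailure();
        System.out.println(report.failureMessageOrNull());
    }
}
```

Responses that arrive after {{state.get()}} do not alter the message in this sketch; it describes the moment the decision was taken, which may still be an incomplete view of the request.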
Our decision and reporting should rest solely on {{detailMap}}, with {{triggerPrimitive}} serving only for scheduling purposes (waking up the waiting thread).

> Inconsistent failure messages on distributed queries
> ----------------------------------------------------
>
>                 Key: CASSANDRA-15642
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15642
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Coordination
>            Reporter: Kevin Gallardo
>            Priority: Normal
>
> As a follow up to some exploration I have done for CASSANDRA-15543, I
> realized the following behavior in both {{ReadCallback}} and
> {{AbstractWriteResponseHandler}}:
> - await responses
> - when the required number of responses has come back: unblock the wait
> - when a single failure happens: unblock the wait
> - when unblocked, look to see if the counter of failures is > 1 and if so
> return an error message based on the {{failures}} map that's been filled
> Error messages that can result from this behavior can be a ReadTimeout, a
> ReadFailure, a WriteTimeout or a WriteFailure.
> In case of a Write/ReadFailure, the user will get back an error looking like
> the following:
> "Failure: Received X responses, and Y failures"
> (if this behavior I describe is incorrect, please correct me)
> This causes a usability problem. Since the handler will fail and throw an
> exception as soon as 1 failure happens, the error message that is returned to
> the user may not be accurate.
> (note: I am not entirely sure of the behavior in case of timeouts for now)
> For example, say a request at CL = QUORUM = 3, a failed request may complete
> first, then a successful one completes, and another fails. If the exception
> is thrown fast enough, the error message could say
> "Failure: Received 0 response, and 1 failure at CL = 3"
> Which:
> 1. doesn't make a lot of sense because the CL doesn't match the number of
> results in the message, so you end up thinking "what happened with the rest
> of the required CL?"
> 2. the information is incorrect. We did receive a successful response, only
> it came after the initial failure.
> From that logic, I think it is safe to assume that the information returned
> in the error message cannot be trusted in case of a failure. The only
> information users should extract out of it is that at least 1 node has failed.
> For a big improvement in usability, the {{ReadCallback}} and
> {{AbstractWriteResponseHandler}} could instead wait for all responses to come
> back before unblocking the wait, or let it time out. This way, users
> will be able to have some trust in the information returned to them.
> Additionally, an error that happens first prevents a timeout from happening
> because it fails immediately, and so potentially it hides problems with other
> replicas. If we were to wait for all responses, we might get a timeout, in
> that case we'd also be able to tell whether failures have happened *before*
> that timeout, and have a more complete diagnostic where you can't detect both
> errors at the same time.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)