[ 
https://issues.apache.org/jira/browse/CASSANDRA-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164802#comment-13164802
 ] 

Sylvain Lebresne commented on CASSANDRA-3569:
---------------------------------------------

bq. it uses the same failure detection algorithm which is as far as I can tell 
about as orthogonal as you can get from the concerns of streaming.

The only concern with streaming is to not break the connection until we have a 
very strong conviction that the remote end has indeed failed. Both the FD and a 
socket timeout (or I/O op timeout) are based on the same idea: they track 
messages from the remote end and, if they don't get any message for some period 
of time, they decide the remote end must be dead. And for both, you gain better 
assurance that the detection is correct by increasing the time you wait, though 
for the FD this is less direct and achieved by raising the phi threshold. But 
provided you raise it enough, the FD will wait 2 hours with no message before 
convicting a node, and be equivalent to a 2h timeout. So I am still missing how 
the FD algorithm is so orthogonal to 'streaming's concerns', or at least any 
more so than a socket/operation timeout.
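The equivalence argued above can be sketched numerically. This is a minimal, 
hypothetical model of the phi accrual idea (assuming exponentially distributed 
heartbeat intervals, which makes phi linear in silence time), not Cassandra's 
actual FailureDetector code; the function names are made up for illustration:

```python
import math

def phi(silence_ms: float, mean_interval_ms: float) -> float:
    """Phi accrual estimate under an exponential inter-arrival model:
    phi = -log10(P(still alive given this much silence))."""
    return (silence_ms / mean_interval_ms) * math.log10(math.e)

def wait_for_phi(threshold: float, mean_interval_ms: float) -> float:
    """Silence (ms) needed before phi crosses the threshold, i.e. the
    effective 'timeout' a given phi threshold implies."""
    return threshold * mean_interval_ms / math.log10(math.e)

# With 1s heartbeats, a threshold of 8 (Cassandra's historical default)
# convicts after roughly 18.4s of silence; convicting only after 2 hours
# of silence corresponds to a threshold of roughly 3127.
print(round(wait_for_phi(8, 1000) / 1000, 1))
print(round(phi(2 * 3600 * 1000, 1000)))
```

So raising the phi threshold and raising a socket timeout are, under this 
model, two knobs on the same quantity: how long a silence you tolerate.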

Now it is true that a socket timeout would be based on the actual socket the 
streaming is on (rather than on the node heartbeat) and be checked by the OS 
(rather than our FD), so a socket timeout could be more trustworthy. If that is 
why you think the FD is ill-suited, fine, that's a fair point we can discuss. 
But let's not make decisions on a vague (with all due respect) notion that 
'our way of doing failure detection is fundamentally broken in many ways'. And 
I'm personally fine having a 'different and wider discussion' if that helps 
improve the code (btw, if our way of doing failure detection is really 
*fundamentally* broken in *many* ways, it shouldn't be too hard to show how).

bq. Note also that in the normal case of a process crashing or whatnot, the TCP 
connection will die immediately. This is a problem when there is either a 
network/firewalling glitch causing a silent death of the connection, or e.g. 
the machine panicing and getting restarted

Yes, and as said above, for the case where the connection doesn't die 
immediately, I'm not convinced a TCP timeout would be fundamentally better (nor 
worse, btw) than using the FD. When the connection does die, I'm fine with 
detecting that and feeding it back to repair.

bq. In what way specifically do you claim that my proposed solution would cause 
repairs not to fail?

I'm not claiming that at all. I was really only reminding us that the main goal 
of detecting failure in the first place was to address the frustration and 
confusion of users whose repairs hang, to be sure we were on the same page. 
More generally, I am *not* claiming that what you propose wouldn't work in any 
way. However, I do not share your view that the current use of the FD is 
utterly broken (I could agree that the current specific threshold for repair is 
possibly still too low), nor that using a timeout would be much better. And 
since so far that seems to be the only argument for changing, I'd prefer we 
don't change for the sake of changing, and that if we do change, we fully 
understand why.
                
> Failure detector downs should not break streams
> -----------------------------------------------
>
>                 Key: CASSANDRA-3569
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3569
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>
> CASSANDRA-2433 introduced this behavior just to keep repairs from sitting 
> there waiting forever. In my opinion the correct fix to that problem is to 
> use TCP keep-alive. Unfortunately the TCP keep-alive period is insanely high 
> by default on a modern Linux, so just doing that is not entirely good either.
> But using the failure detector seems nonsensical to me. We have a 
> communication method, the TCP transport, that we know is used for 
> long-running processes that you don't want incorrectly killed for no good 
> reason, and we are using a failure detector tuned to deciding when not to 
> send real-time-sensitive requests to nodes in order to actively kill a 
> working connection.
> So, rather than add complexity with protocol-based ping/pongs and such, I 
> propose that we simply use TCP keep-alive for streaming connections and 
> instruct operators of production clusters to tweak 
> net.ipv4.tcp_keepalive_{probes,intvl} as appropriate (or whatever equivalent 
> on their OS).
> I can submit the patch. Awaiting opinions.
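The per-socket alternative to the sysctl tweak quoted above can be sketched as 
follows. This is an illustrative Python sketch of the Linux-specific socket 
options; Cassandra itself would use Java's Socket#setKeepAlive (which only 
enables keep-alive, hence the sysctl suggestion), and the values here are 
assumptions for illustration, not recommendations:

```python
import socket

# Enable keep-alive on a streaming socket and, on Linux, override the
# insanely high system defaults per socket rather than via the
# net.ipv4.tcp_keepalive_* sysctls.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-only options
    # Idle seconds before the first probe (hypothetical value).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    # Seconds between probes (hypothetical value).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    # Failed probes before the connection is declared dead.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
```

With settings like these, a silently dead peer is detected in roughly 
TCP_KEEPIDLE + TCP_KEEPCNT * TCP_KEEPINTVL seconds, by the OS itself.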

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
