[ https://issues.apache.org/jira/browse/CASSANDRA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407105#comment-13407105 ]
Sylvain Lebresne commented on CASSANDRA-4285:
---------------------------------------------

If I understand correctly, only the coordinator of a given batch would be able to replay its batches. The problem I can see with that is that if the node dies and you never "replace it" (i.e. bring a node with the same IP back up), then some batches might never be replayed, which puts a strong burden on the operator not to screw up. Besides, the batches won't be replayed until a replacement node is brought up, so even if we replay them eventually, it can take an unbounded time to do so. So I would also add a mechanism allowing other nodes to replay batches. For instance, when a node A detects that another node B is down, it could check whether it holds some batches for B locally and replay them (node B will replay them too when it's back up, but that doesn't matter).

bq. we need to retry the read indefinitely in case another replica recovered

For that too we can use the failure detector to track which nodes we've successfully checked since restart (which avoids the "indefinitely" part).

bq. default RF will be 1; operators can increase if desired

I'll admit I find 1 just a bit too low for a default (especially given it'll be global) and I would prefer at least 2. My reasoning is that:
# RF=1 is a tad unsafe as far as durability is concerned.
# RF=1 has the problem that the one replica you've picked might time out. Even if you automatically retry on another shard (which I'm not in favor of, see below), it will hurt the latency. RF > 1 (with CL.ONE) largely mitigates that issue.
# A higher RF won't be slower during writes (it will actually be faster because of my preceding point), and that is really what we care about. If replay is a bit slower because of it, it's not a big deal (especially given that there will never be much to replay).

bq. Part of the goal here is to avoid forcing the client to retry on TimedOutException.
bq. So if we attempt a batchlog write that times out, we should also retry to another shard instead of propagating TOE to the client.

I think what this ticket will provide is an extension of the atomicity that exists for batches to the same key to all batches, and I don't think it gives us much more than that. So I fully expect the retry policy for clients to be unchanged (most of the time, client applications want to retry because what they care about is achieving a given consistency level, or because they care that the data is replicated to at least X nodes). In other words, I see a timeout as saying "I haven't been able to achieve the requested consistency level in time". This ticket doesn't change that; it only makes a stronger guarantee on the state of the DB in that case (which is good). But I don't see why that would make us start doing retries server-side.

bq. we shouldn't have to make the client retry for timeouts writing to the replicas either; we can do the retry server-side

Same as above, I disagree :).

bq. Instead, we should introduce a new exception (InProgressException?) to indicate that the data isn't available to read yet

As said above, I think this should still be a TimeoutException. However, I do see a point in giving more info on what that timeout means, and I've opened CASSANDRA-4414 for that (which I had meant to do for some time anyway). Having successfully written to the DCL could just be one of the pieces of info we would add to the TimeoutException.

> Atomic, eventually-consistent batches
> -------------------------------------
>
>          Key: CASSANDRA-4285
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-4285
>      Project: Cassandra
>   Issue Type: New Feature
>   Components: API, Core
>     Reporter: Jonathan Ellis
>     Assignee: Jonathan Ellis
>
> I discussed this in the context of triggers (CASSANDRA-1311) but it's useful
> as a standalone feature as well.

--
This message is automatically generated by JIRA.
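The replay-on-failure-detection mechanism proposed in the comment (node A notices node B is down and replays any batches it holds for B) could be sketched roughly as below. This is a minimal illustration, not Cassandra's actual implementation; the names `BatchlogStore` and `on_node_down` are hypothetical, and mutations are assumed to be idempotent so that a double replay (by A and later by a recovered B) is harmless.

```python
class BatchlogStore:
    """Hypothetical local store of batches, keyed by the coordinator
    that wrote them. Not actual Cassandra internals."""

    def __init__(self):
        # batch_id -> (coordinator address, list of mutations)
        self.batches = {}

    def add(self, batch_id, coordinator, mutations):
        self.batches[batch_id] = (coordinator, mutations)

    def replay_for(self, coordinator, apply_mutation):
        """Replay (and remove) every local batch coordinated by `coordinator`.
        Mutations are idempotent, so replaying twice is safe."""
        replayed = []
        for batch_id, (coord, mutations) in list(self.batches.items()):
            if coord == coordinator:
                for m in mutations:
                    apply_mutation(m)
                del self.batches[batch_id]
                replayed.append(batch_id)
        return replayed


def on_node_down(store, down_node, apply_mutation):
    # Failure-detector hook: when a peer is reported down, replay any
    # batches this node holds on that peer's behalf.
    return store.replay_for(down_node, apply_mutation)
```

Under this scheme no batch waits for a same-IP replacement node: any peer that stored a copy can drive the replay as soon as the failure detector fires.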