[ https://issues.apache.org/jira/browse/CASSANDRA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406777#comment-13406777 ]

Jonathan Ellis commented on CASSANDRA-4285:
-------------------------------------------

We can break up implementation as follows:

# Add an atomic_batch_mutate method (with the same parameters as batch_mutate) 
using batchlog/SimpleStrategy in CassandraServer + StorageProxy
# Implement batchlog replay
# Implement a custom BatchlogStrategy that ensures redundancy at RF=1
# Add CQL3 support

3 and 4 appear straightforward.  1 and 2 have some hairier corners:

For the batchlog write:
- We don't want to make the write path more fragile (in the sense that atomic
writes would fail where non-atomic ones would succeed).  But the batchlog
shard will probably be on a different machine than the replicas.  If that
machine is down, we could raise UnavailableException, but it would be better to
try different shards until we find one whose owner is up.
- Part of the goal here is to avoid forcing the client to retry on
TimedOutException.  So if a batchlog write times out, we should also retry
on another shard instead of propagating the TimedOutException to the client.
- Corollary: we don't need to worry about batchlog hints.
- Finally, once the batchlog write succeeds, we shouldn't have to make the
client retry for timeouts writing to the replicas either; we can do the retry
server-side.  But we can't just return success, since that would imply that
we'd achieved the requested ConsistencyLevel and the data is available to be
read.  Instead, we should introduce a new exception (InProgressException?) to
indicate that the data isn't available to read yet, but the client does not
need to retry.  (We could also use this exception for normal writes where at
least one replica acknowledges the update in time but the requested
ConsistencyLevel is not reached.)
- What about RF and CL for the batchlog?  If it's convenient, we can allow users
to customize batchlog RF, but we should always use CL=1 for reads and writes.
(If we need to go lower-level instead of reusing the normal write path, though,
I'm fine with hardcoding RF=1.)  We don't care about "latest versions": the
batchlog is append-only, so if we don't see an entry on one replay attempt we'll
see it on the next.  And we really don't need more durability than one replica,
since the batchlog is only a "staging area" until the batch is sent out to the
replicas.  The main alternative would be to use the batch's own CL for the
batchlog write.  I don't like that, though, because it would add batchlog-write
latency that you don't want 99% of the time.
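A minimal Java sketch of the batchlog write path discussed above, under stated assumptions: `AtomicBatchWriter`, `Transport`, and `InProgressException` are hypothetical names, the exception classes are simplified stand-ins for Cassandra's Thrift ones, and liveness is modeled as a plain predicate (in practice it might come from the failure detector). It tries batchlog shard owners in order, skips owners believed down, moves on after a timeout, writes a single batchlog copy (CL=1), and reports "in progress" rather than a timeout once the batch is safely logged.

```java
import java.util.List;
import java.util.function.Predicate;

// Simplified stand-ins for Cassandra's Thrift exceptions (not the real classes).
class UnavailableException extends Exception {
    UnavailableException(String msg) { super(msg); }
}

class TimedOutException extends Exception {
    TimedOutException(String msg) { super(msg); }
}

// Hypothetical new exception from the proposal: the batch is in the batchlog,
// so the client need not retry, but the data may not be readable yet.
class InProgressException extends Exception {
    InProgressException(String msg) { super(msg); }
}

// Hypothetical transport interface; not a real Cassandra API.
interface Transport {
    // Write one batchlog entry to a single shard owner (CL=1, one copy).
    void writeBatchlog(String shardOwner, byte[] batch) throws TimedOutException;
    // Apply the batch to its data replicas at the batch's own ConsistencyLevel.
    void writeReplicas(byte[] batch) throws TimedOutException;
}

class AtomicBatchWriter {
    private final Transport transport;
    private final Predicate<String> isAlive;

    AtomicBatchWriter(Transport transport, Predicate<String> isAlive) {
        this.transport = transport;
        this.isAlive = isAlive;
    }

    // Try candidate shard owners in order until one accepts the entry.
    // Owners believed down are skipped, and a timeout moves on to the next
    // shard, so the client only sees UnavailableException if every candidate fails.
    String writeBatchlog(List<String> candidateOwners, byte[] batch) throws UnavailableException {
        for (String owner : candidateOwners) {
            if (!isAlive.test(owner)) {
                continue;
            }
            try {
                transport.writeBatchlog(owner, batch);
                return owner;
            } catch (TimedOutException e) {
                // Retry on the next shard instead of propagating the timeout.
            }
        }
        throw new UnavailableException("no batchlog shard owner accepted the write");
    }

    void atomicBatchMutate(List<String> candidateOwners, byte[] batch)
            throws UnavailableException, InProgressException {
        writeBatchlog(candidateOwners, batch);
        try {
            transport.writeReplicas(batch);
        } catch (TimedOutException e) {
            // The logged batch will be retried server-side (via replay), so report
            // "in progress" instead of asking the client to retry.
            throw new InProgressException("batch logged; replica writes still pending");
        }
    }
}
```

Raising UnavailableException only after every candidate shard has failed is a simplification here; a real implementation would also have to decide how many shards to try and how long to wait on each.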

For replay:
- The main difficulty is that the batchlog shard owners can't be assumed to be
alive when we restart.  So we'll need to track replay status for each shard:
check on startup and retry periodically until we're successful.  (One advantage
of restricting the batchlog to RF=1 is that when a read succeeds, we know we're
done replaying.  With RF>1, we'd need to retry the read indefinitely in case
another replica with additional entries recovers.)
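The replay bookkeeping could look roughly like the following Java sketch. `BatchlogReplayTracker` and `ShardReplayer` are hypothetical, illustrative names, not Cassandra classes. Every shard starts out pending at startup; each periodic pass retries the pending ones. With RF=1 a successful read retires the shard, while with RF>1 the shard stays pending indefinitely, as described above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical hook: read and replay every entry held by one batchlog shard.
// Throws if the shard's owner cannot be reached.
interface ShardReplayer {
    void replayShard(String shard) throws Exception;
}

// Hypothetical per-shard replay-status tracker; not Cassandra's implementation.
class BatchlogReplayTracker {
    private final Set<String> pending = new HashSet<>();
    private final ShardReplayer replayer;
    private final boolean singleReplica;   // true when the batchlog uses RF=1

    BatchlogReplayTracker(Set<String> shards, ShardReplayer replayer, boolean singleReplica) {
        this.pending.addAll(shards);        // on startup, every shard still needs replay
        this.replayer = replayer;
        this.singleReplica = singleReplica;
    }

    // Call once at startup, then periodically (e.g. from a ScheduledExecutorService).
    void replayPending() {
        for (String shard : new HashSet<>(pending)) {   // iterate a copy: we remove while looping
            try {
                replayer.replayShard(shard);
                if (singleReplica) {
                    // RF=1: one successful read has seen every entry, so this shard is done.
                    pending.remove(shard);
                }
                // RF>1: another replica holding extra entries could still recover,
                // so the shard stays pending and is re-read on every pass.
            } catch (Exception e) {
                // Owner still unreachable: keep the shard pending and retry next pass.
            }
        }
    }

    boolean isDone() {
        return pending.isEmpty();
    }

    Set<String> pendingShards() {
        return Set.copyOf(pending);
    }
}
```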
                
> Atomic, eventually-consistent batches
> -------------------------------------
>
>                 Key: CASSANDRA-4285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4285
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>
> I discussed this in the context of triggers (CASSANDRA-1311) but it's useful 
> as a standalone feature as well.
