[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16205395#comment-16205395 ]

Kurt Greaves commented on CASSANDRA-13442:
------------------------------------------

It's still not clear to me what you mean by 10-20x. Are you saying you could 
need 1-100x more hardware if you don't use a transient replica?

Regarding vnodes, yes, having fewer _is_ better, but having >1 token per node 
immediately makes things much more difficult. Once you have multiple tokens it 
is extremely unlikely you will be able to run multiple repairs on the same 
table across the cluster, because the replicas for any given node will likely 
include other nodes in the cluster (or at least in other racks) that would 
also be involved if you were repairing multiple nodes/ranges simultaneously. 
There's also the per-token overhead of repair, which is significant. Reducing 
the vnode count helps here, but you can only really reduce it to 16, maybe 8 
in some cases, and the overhead is still quite large at that point. I haven't 
seen any clusters of >20 nodes repair successfully within a reasonable GCGS 
using vnodes and a reasonable amount of data per node. RangeAwareCompaction 
could improve this, but not perfectly. CASSANDRA-9143 may also help for 
incremental repairs, although I'm not sure it solves the problem of running 
multiple repairs on the same table, and as far as I'm aware it doesn't really 
reduce the overhead of vnodes.
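To make the overlap argument concrete, here is a rough Python sketch (not Cassandra code; the ring model and function names are simplified assumptions) that builds a ring with a given number of random tokens per node, computes the replica set touched by repairing each node, and greedily counts how many nodes could be repaired in parallel without their replica sets intersecting:

```python
import random

# Hypothetical simulation, not Cassandra internals: each node owns `vnodes`
# random tokens; the replicas of a range are its owner plus the next RF-1
# distinct nodes clockwise on the ring.

def build_ring(num_nodes, vnodes, seed=42):
    random.seed(seed)
    ring = []  # (token, node) pairs, sorted by token
    for node in range(num_nodes):
        for _ in range(vnodes):
            ring.append((random.getrandbits(63), node))
    ring.sort()
    return ring

def replicas_for_node(ring, node, rf=3):
    """All nodes holding a replica of any range owned by `node`."""
    out, n = set(), len(ring)
    for i, (_, owner) in enumerate(ring):
        if owner != node:
            continue
        # walk clockwise collecting RF distinct nodes for this range
        seen, j = [], i
        while len(seen) < rf:
            cand = ring[j % n][1]
            if cand not in seen:
                seen.append(cand)
            j += 1
        out.update(seen)
    return out

def max_parallel_repairs(ring, num_nodes, rf=3):
    """Greedily schedule repairs whose replica sets don't overlap."""
    busy, count = set(), 0
    for node in range(num_nodes):
        reps = replicas_for_node(ring, node, rf)
        if busy.isdisjoint(reps):
            busy |= reps
            count += 1
    return count

for vnodes in (1, 8, 16, 256):
    ring = build_ring(num_nodes=12, vnodes=vnodes)
    print(vnodes, max_parallel_repairs(ring, 12))
```

With a single token per node, several repairs can run side by side; as the vnode count grows, each node's replica set tends to span most of the cluster, so concurrent repairs of the same table collide and effectively serialize.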

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>            Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range and we should be able to achieve a covering data set and high 
> availability without having three full copies. 
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
> One way to exploit this would be to have the last N replicas in the 
> preference list (where N varies with RF) delete all repaired data after a 
> repair completes. Subsequent quorum reads would then be able to retrieve the 
> repaired data from either of the two full replicas and the unrepaired data 
> from a quorum read of any replicas, including the "transient" ones.
> Configuration for something like this in NTS might look similar to { 
> DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second value is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.
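The proposed configuration can be sketched in a few lines of Python (the names here are illustrative only, not Cassandra's actual API or syntax): parse an "RF-transient" spec like "3-1", derive how many full copies are stored after repair, and how many replicas a quorum read must contact for repaired versus unrepaired data.

```python
# Illustrative sketch of the proposal, not Cassandra code.

def parse_rf(spec):
    """'3-1' -> (rf=3, transient=1); plain '3' -> (3, 0), today's behavior."""
    if "-" in spec:
        rf, transient = (int(x) for x in spec.split("-"))
    else:
        rf, transient = int(spec), 0
    if transient >= rf:
        raise ValueError("need at least one full replica")
    return rf, transient

def storage_copies(spec):
    """Full copies kept once repaired data is dropped from transient
    replicas: rf - transient (e.g. '3-1' -> 2 full copies)."""
    rf, transient = parse_rf(spec)
    return rf - transient

def replicas_to_contact(spec, repaired):
    """Repaired data is fully replicated on the full replicas, so one read
    suffices; unrepaired data still needs a plain quorum of all rf."""
    rf, _ = parse_rf(spec)
    return 1 if repaired else rf // 2 + 1

print(parse_rf("3-1"))                             # (3, 1)
print(storage_copies("3-1"))                       # 2 full copies, not 3
print(replicas_to_contact("3-1", repaired=True))   # 1
print(replicas_to_contact("3-1", repaired=False))  # 2
```

The storage saving is the point of the ticket: "3-1" keeps quorum availability for unrepaired data while storing only two full copies of repaired data instead of three.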



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
