[ https://issues.apache.org/jira/browse/CASSANDRA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197294#comment-16197294 ]

Ariel Weisberg commented on CASSANDRA-13442:
--------------------------------------------

bq. Considering this seems to be mostly about reducing storage costs so write bound workloads can run "dense" nodes,
It's not either/or; the two approaches compose, further reducing costs.

Dense nodes don't reduce costs 10-20x. With dense nodes you still need to pay 
for the additional RAM and disk, and datacenter power and cooling go up 
slightly as you pack more power draw into each box. Dense nodes also still 
need to process each request, so you also need to scale up read and write 
throughput, which is not a dimension we always claim to improve with dense 
nodes.

In most cases dense nodes won't let you increase replication on hardware that 
can't fit an entire replica of your data set, such as racks or DCs in a region 
with limited capacity, and by limited I mean many times less capacity. What do 
we expect from dense nodes? 2x? 4x? Are all use cases going to behave well 
with the various strategies we'd use to get to dense nodes?

bq. While this idea does seem interesting, it seems very complex and you are still trading off replicas for additional storage.
The target is 10x to 20x less storage. So additional storage, yes, but not on 
the same order of magnitude. In other words we pay something (complexity) and 
we get something (some replicas require 10-20x less hardware).

I also think 10-20x storage savings is a conservative estimate, since it 
assumes worst-case utilization during an outage, where transient data must be 
stored at the transient replicas. With vnodes that data would be spread over 
several nodes, so the additional utilization at each node could be 
substantially less.
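
To put rough numbers on it, here's a back-of-the-envelope sketch (Python; the 
1/10 and 1/20 transient fractions are just the target stated above, and the 
totals are in units of one full replica):

{code:python}
# Back-of-the-envelope storage for one DC, in units of one full replica.
# Illustrative numbers only: the transient fractions are the 10-20x target.
full_replica = 1.0

for transient_fraction in (1 / 10, 1 / 20):
    rf3_today = 3 * full_replica                                 # three full copies
    rf3_with_transient = 2 * full_replica + transient_fraction  # two full + one transient
    print(f"transient at {transient_fraction:.2f} of a replica: "
          f"{rf3_today:.2f} -> {rf3_with_transient:.2f}")
{code}

Even at worst-case transient utilization that takes an RF=3 footprint from 3.0 
to roughly 2.05-2.1 replicas worth of storage per DC.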

bq. Seems that the primary use case would be multiple datacenters with transient replicas, which granted would be nice,
Multiple data centers aren't required to benefit. Many people will be able to 
go from RF=3 in a DC today to RF=5 and tolerate losing two nodes, instead of 
just one, with no loss of availability or data.
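
For reference, the quorum arithmetic behind that claim (a minimal sketch; 
quorum is the usual majority of the replication factor):

{code:python}
def quorum(rf):
    # Majority of replicas: quorum(2) == 2, quorum(3) == 2, quorum(5) == 3.
    return rf // 2 + 1

for rf in (2, 3, 5):
    tolerated = rf - quorum(rf)  # node losses that still leave a quorum
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {tolerated} down node(s)")
{code}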

There are other permutations where being able to inexpensively add a transient 
replica can increase availability, like RF=3 with one replica at each DC. 
Write at CL.ALL, read at LOCAL_ONE, and fall back to reading from a remote DC 
if LOCAL_ONE fails. You get strong consistency, but not write availability. 
Add transient replicas at each DC and write at EACH_QUORUM, and you get write 
availability after a single node fails.
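
A sketch of that comparison, assuming (purely for illustration) that enough 
transient replicas are added to bring each DC to 3 replicas; EACH_QUORUM here 
means a majority in every DC:

{code:python}
def quorum(rf):
    return rf // 2 + 1

def writes_survive_one_failure(per_dc_rf, cl):
    # Worst case: the failed node is a replica in a DC we must reach.
    if cl == "ALL":
        return False  # every replica everywhere must ack; any failure blocks writes
    if cl == "EACH_QUORUM":
        return per_dc_rf - quorum(per_dc_rf) >= 1  # a majority must survive per DC
    raise ValueError(f"unhandled consistency level: {cl}")

print(writes_survive_one_failure(1, "ALL"))          # False: one replica per DC, CL.ALL
print(writes_survive_one_failure(3, "EACH_QUORUM"))  # True: full + transient replicas per DC
{code}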

bq. you're probably able to just store less replicas in each datacenter anyway, at least if we had more flexible consistency levels.
I'm not sure what you mean by flexibility.

Not without losing either availability or consistency under failure scenarios. 
If you run RF=3 today with strong consistency, you can't drop to RF=2 without 
losing availability when a node fails.

> Support a means of strongly consistent highly available replication with 
> tunable storage requirements
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13442
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13442
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction, Coordination, Distributed Metadata, Local 
> Write-Read Paths
>            Reporter: Ariel Weisberg
>
> Replication factors like RF=2 can't provide strong consistency and 
> availability because if a single node is lost it's impossible to reach a 
> quorum of replicas. Stepping up to RF=3 will allow you to lose a node and 
> still achieve quorum for reads and writes, but requires committing additional 
> storage.
>
> The requirement of a quorum for writes/reads doesn't seem to be something 
> that can be relaxed without additional constraints on queries, but it seems 
> like it should be possible to relax the requirement that 3 full copies of the 
> entire data set are kept. What is actually required is a covering data set 
> for the range, and we should be able to achieve a covering data set and high 
> availability without keeping three full copies.
>
> After a repair we know that some subset of the data set is fully replicated. 
> At that point we don't have to read from a quorum of nodes for the repaired 
> data. It is sufficient to read from a single node for the repaired data and a 
> quorum of nodes for the unrepaired data.
>
> One way to exploit this would be to have N replicas, say the last N in the 
> preference list (where N varies with RF), delete all repaired data after a 
> repair completes. Subsequent quorum reads will be able to retrieve the 
> repaired data from either of the two full replicas and the unrepaired data 
> from a quorum read of any replicas, including the "transient" replicas.
>
> Configuration for something like this in NTS might be something similar to 
> { DC1="3-1", DC2="3-2" } where the first value is the replication factor used 
> for consistency and the second value is the number of transient replicas. If 
> you specify { DC1=3, DC2=3 } then the number of transient replicas defaults 
> to 0 and you get the same behavior you have today.
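
To make the proposed notation concrete, here is a small sketch (Python; 
parse_rf is a hypothetical helper, not Cassandra code, and the "3-1" reading 
follows the description above):

{code:python}
def parse_rf(spec):
    """Parse the proposed "<rf>-<transient>" spec, e.g. "3-1" (hypothetical).

    A plain integer like "3" means zero transient replicas, i.e. today's
    behavior.
    """
    spec = str(spec)
    if "-" in spec:
        rf, transient = (int(part) for part in spec.split("-"))
    else:
        rf, transient = int(spec), 0
    return {
        "rf": rf,                # replication factor used for consistency
        "full": rf - transient,  # replicas keeping repaired and unrepaired data
        "transient": transient,  # replicas keeping only unrepaired data
        "quorum": rf // 2 + 1,   # quorum is still computed against rf
    }

for dc, spec in {"DC1": "3-1", "DC2": "3-2"}.items():
    print(dc, parse_rf(spec))
# DC1 {'rf': 3, 'full': 2, 'transient': 1, 'quorum': 2}
# DC2 {'rf': 3, 'full': 1, 'transient': 2, 'quorum': 2}
{code}

The DC2="3-2" case also shows why repaired data has to be readable from a 
single full replica: a quorum of its three replicas could consist entirely of 
transient ones.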


