[jira] [Commented] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
    [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17773888#comment-17773888 ]

sean commented on CASSANDRA-14404:
--

In Cassandra 4.0.X, we have a table with two materialized views. When a Cassandra node is down, some write operations may fail. The same setup works fine in Cassandra 3.11.x. This may be related to this change.

Client exception:

com.datastax.oss.driver.api.core.servererrors.WriteFailureException: Cassandra failure during write query at consistency LOCAL_QUORUM (2 responses were required but only 1 replica responded, 2 failed)
    at com.datastax.oss.driver.api.core.servererrors.WriteFailureException.copy(WriteFailureException.java:142)
    at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
    at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
    at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
    at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
    at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)

Cassandra log:

[MutationStage-11] [org.apache.cassandra.db.Keyspace] Unknown exception caught while attempting to update MaterializedView! cycling.cycling_task
org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level ONE
    at org.apache.cassandra.exceptions.UnavailableException.create(UnavailableException.java:37)
    at org.apache.cassandra.locator.ReplicaPlans.assureSufficientLiveReplicas(ReplicaPlans.java:170)
    at org.apache.cassandra.locator.ReplicaPlans.assureSufficientLiveReplicasForWrite(ReplicaPlans.java:113)
    at org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:354)
    at org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:345)
    at org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:339)
    at org.apache.cassandra.service.StorageProxy.wrapViewBatchResponseHandler(StorageProxy.java:1312)
    at org.apache.cassandra.service.StorageProxy.mutateMV(StorageProxy.java:1004)
    at org.apache.cassandra.db.view.TableViews.pushViewReplicaUpdates(TableViews.java:167)
    at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:647)
    at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:513)
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:215)
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:220)
    at org.apache.cassandra.db.Mutation.apply(Mutation.java:229)
    at org.apache.cassandra.service.StorageProxy$4.runMayThrow(StorageProxy.java:1537)
    at org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2326)

> Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
> ---
>
>                 Key: CASSANDRA-14404
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14404
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Consistency/Hints, Consistency/Repair, Feature/2i Index, Feature/Materialized Views, Legacy/Coordination, Legacy/Core, Legacy/CQL, Legacy/Distributed Metadata, Legacy/Local Write-Read Paths, Legacy/Testing, Legacy/Tools
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Normal
>             Fix For: 5.x
>
> Transient Replication is an implementation of [Witness Replicas|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.3429=rep1=pdf] that leverages incremental repair to make full replicas consistent with transient replicas that don't store the entire data set. Witness replicas are used in real-world systems such as Megastore and Spanner to increase availability inexpensively without having to commit to more full copies of the database. Transient replicas implement functionality similar to upgradable and temporary replicas from the paper.
>
> With transient replication the replication factor is increased beyond the desired level of data redundancy by adding replicas that only store data when sufficient full replicas are unavailable to store the data. These replicas are called transient replicas. When incremental repair runs, transient replicas stream any data they have received to full replicas, and once the data is fully replicated it is dropped at the transient replicas.
>
> Cheap quorums are a further set of optimizations on the write path to avoid writing to transient replicas
[jira] [Commented] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
    [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599457#comment-16599457 ]

Ariel Weisberg commented on CASSANDRA-14404:

The initial implementation of Transient Replication and Cheap Quorums was committed as [f7431b432875e334170ccdb19934d05545d2cebd|https://github.com/apache/cassandra/commit/f7431b432875e334170ccdb19934d05545d2cebd].

> Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
> ---
>
>                 Key: CASSANDRA-14404
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14404
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Coordination, Core, CQL, Distributed Metadata, Hints, Local Write-Read Paths, Materialized Views, Repair, Secondary Indexes, Testing, Tools
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Major
>             Fix For: 4.0
>
> Transient Replication is an implementation of [Witness Replicas|http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.3429=rep1=pdf] that leverages incremental repair to make full replicas consistent with transient replicas that don't store the entire data set. Witness replicas are used in real-world systems such as Megastore and Spanner to increase availability inexpensively without having to commit to more full copies of the database. Transient replicas implement functionality similar to upgradable and temporary replicas from the paper.
>
> With transient replication the replication factor is increased beyond the desired level of data redundancy by adding replicas that only store data when sufficient full replicas are unavailable to store the data. These replicas are called transient replicas. When incremental repair runs, transient replicas stream any data they have received to full replicas, and once the data is fully replicated it is dropped at the transient replicas.
>
> Cheap quorums are a further set of optimizations on the write path to avoid writing to transient replicas unless sufficient full replicas are unavailable, as well as optimizations on the read path to prefer reading from transient replicas. When writing at quorum to a table configured to use transient replication, the quorum will always prefer available full replicas over transient replicas so that transient replicas don't have to process writes. Rapid write protection (similar to rapid read protection) reduces tail latency when full replicas are slow or unavailable to respond by sending writes to additional replicas if necessary.
>
> Transient replicas can generally service reads faster because they don't have to do anything beyond bloom filter checks if they have no data. With vnodes and larger size clusters they will not have a large quantity of data even in failure cases where transient replicas start to serve a steady amount of write traffic for some of their transiently replicated ranges.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
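
The write-path preference in the issue text (full replicas first, transient replicas only when too few full replicas are available) can be sketched roughly as below. This is an illustration only, not Cassandra's code: `Replica`, `select_write_targets`, and the `required` parameter are hypothetical names, and rapid write protection is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Replica:
    # Hypothetical model of one replica of a token range.
    name: str
    is_full: bool          # False means the replica is transient for this range
    is_alive: bool = True


def select_write_targets(replicas: List[Replica], required: int) -> List[Replica]:
    """Pick write targets: every live full replica, plus transient
    replicas only when the live full replicas cannot satisfy `required`."""
    live_full = [r for r in replicas if r.is_alive and r.is_full]
    live_transient = [r for r in replicas if r.is_alive and not r.is_full]

    if len(live_full) + len(live_transient) < required:
        raise RuntimeError("not enough live replicas for the requested consistency")

    targets = list(live_full)
    shortfall = required - len(targets)
    if shortfall > 0:
        # Transient replicas only fill the gap left by missing full replicas.
        targets.extend(live_transient[:shortfall])
    return targets
```

With two full replicas and one transient replica, a quorum of 2 is served by the two full replicas alone; the transient replica receives the write only when one of the full replicas is down.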
[jira] [Commented] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
    [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599146#comment-16599146 ]

Ariel Weisberg commented on CASSANDRA-14404:

There are no transient nodes. All nodes are the same. If you have transient replication enabled each node will transiently replicate some ranges instead of fully replicating them. Capacity requirements are reduced evenly across all nodes in the cluster.

Nodes are not temporarily transient replicas during expansion. They need to stream data like a full replica for the transient range before they can serve reads. There is a pending state similar to how there is a pending state for full replicas. Transient replicas also always receive writes when they are pending. There may be some room to relax how that is handled, but for now we opt to send pending transient ranges a bit more data and avoid reading from them when maybe we could.

This doesn't change how expansion works with vnodes. The same restrictions still apply. We won't officially support vnodes until we have done more testing and really thought through the corner cases. It's quite possible we will relax the restriction on creating transient keyspaces with vnodes in 4.0.x.
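
To illustrate the point that every node is the same and only ranges are transient, here is a small sketch. It assumes a simplified single-token ring in which each range goes to the next `total` nodes clockwise and the last `transient` of them hold it transiently. That placement rule and the function names are assumptions for illustration, not how Cassandra's replication strategies are implemented.

```python
from typing import Dict, List, Tuple


def replicas_for_range(nodes: List[str], range_index: int,
                       total: int, transient: int) -> Tuple[List[str], List[str]]:
    """Return (full, transient) replicas for the range owned by nodes[range_index].

    Assumed placement: walk the ring clockwise from the owner; the first
    total - transient nodes are full replicas, the rest are transient."""
    chosen = [nodes[(range_index + i) % len(nodes)] for i in range(total)]
    split = total - transient
    return chosen[:split], chosen[split:]


def roles_by_node(nodes: List[str], total: int, transient: int) -> Dict[str, Dict[str, List[int]]]:
    """For each node, list the ranges it replicates fully and transiently."""
    roles = {n: {"full": [], "transient": []} for n in nodes}
    for i in range(len(nodes)):
        full, trans = replicas_for_range(nodes, i, total, transient)
        for n in full:
            roles[n]["full"].append(i)
        for n in trans:
            roles[n]["transient"].append(i)
    return roles


if __name__ == "__main__":
    for node, held in roles_by_node(["n0", "n1", "n2", "n3"], total=3, transient=1).items():
        print(node, held)
```

In this four-node example with three replicas per range, one of them transient, every node ends up a full replica for two ranges and a transient replica for one, so the storage saving is spread evenly rather than concentrated on dedicated nodes.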
[jira] [Commented] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
    [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599094#comment-16599094 ]

Constance Eustace commented on CASSANDRA-14404:
---

So are these transient nodes basically serving as centralized hinted handoff caches, rather than having the hinted handoffs clutter up full replicas, especially nodes that have no concern for the token range involved? I understand that hinted handoffs aren't being replaced by this, but is that kind of the idea?

Are the transient nodes sitting around? Will the transient nodes have cheaper/lower hardware requirements?

During cluster expansion, does the newly streaming node acquiring data function as a temporary transient node until it becomes a full replica? Likewise, while shrinking, does a previously full replica function as a transient replica while it streams off data?

Can this help vnode expansion with multiple concurrent nodes? Admittedly I'm not familiar with how much work has gone into fixing cluster expansion with vnodes; it is my understanding that you typically expand only one node at a time or in multiples of the datacenter size.
[jira] [Commented] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
    [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448626#comment-16448626 ]

Duarte Nunes commented on CASSANDRA-14404:
--

Ah, CL is now a function of RF + count(Witnesses).
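
As a worked illustration of that remark: the quorum is computed over the whole consensus group, full copies plus witnesses. The sketch below assumes a `total/transient` notation such as `3/1` (three replicas, one of them transient); the notation and the helper names are assumptions for this example and do not come from the thread.

```python
from typing import Dict, Tuple


def parse_replication_factor(spec: str) -> Tuple[int, int]:
    """Parse 'total/transient', e.g. '3/1' -> (3, 1); a plain '3' -> (3, 0)."""
    total, _, transient = spec.partition("/")
    return int(total), int(transient or 0)


def quorum_size(total_replicas: int) -> int:
    """Majority of the whole consensus group, transient replicas included."""
    return total_replicas // 2 + 1


def describe(spec: str) -> Dict[str, int]:
    total, transient = parse_replication_factor(spec)
    return {
        "full": total - transient,      # replicas storing the entire data set
        "transient": transient,         # witnesses that usually store nothing
        "quorum": quorum_size(total),   # sized by full + transient
    }


if __name__ == "__main__":
    print(describe("3/1"))
    print(describe("5/2"))
```

Under these assumptions `3/1` keeps two full copies but still needs a quorum of 2 out of 3, and `5/2` keeps three full copies with a quorum of 3 out of 5.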
[jira] [Commented] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
    [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448355#comment-16448355 ]

Ariel Weisberg commented on CASSANDRA-14404:

This is not sloppy quorums. Sloppy quorums don't provide strong consistency. We still enforce strict quorum membership.

From the Dynamo paper:
{quote}To remedy this it does not enforce strict quorum membership and instead it uses a “sloppy quorum”; all read and write operations are performed on the first N healthy nodes from the preference list, which may not always be the first N nodes encountered while walking the consistent hashing ring.
{quote}

We aren't going to allow you to use transient replication with 2i or MV in version 1.
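
The distinction drawn in this comment can be shown with a small sketch. Both functions take a preference list and a set of live nodes; the names and the list-based model are hypothetical, chosen only to contrast the two membership rules.

```python
from typing import List, Set


def strict_quorum_targets(preference_list: List[str], alive: Set[str], n: int) -> List[str]:
    """Strict membership: only the first n nodes on the preference list are
    replicas. A dead member is not substituted by a node further down."""
    members = preference_list[:n]
    return [node for node in members if node in alive]


def sloppy_quorum_targets(preference_list: List[str], alive: Set[str], n: int) -> List[str]:
    """Dynamo-style sloppy quorum: the first n healthy nodes, skipping dead
    ones, so membership can drift beyond the first n nodes."""
    return [node for node in preference_list if node in alive][:n]


if __name__ == "__main__":
    ring = ["a", "b", "c", "d", "e"]
    alive = {"a", "c", "d", "e"}   # "b" is down
    print(strict_quorum_targets(ring, alive, 3))
    print(sloppy_quorum_targets(ring, alive, 3))
```

With node `b` down, the strict variant can still reach a quorum of 2 from the fixed members `a` and `c`, while the sloppy variant pulls in `d`, which is outside the original replica set. Transient replicas stay on the strict side of this distinction because they belong to the fixed membership.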
[jira] [Commented] (CASSANDRA-14404) Transient Replication & Cheap Quorums: Decouple storage requirements from consensus group size using incremental repair
    [ https://issues.apache.org/jira/browse/CASSANDRA-14404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447871#comment-16447871 ]

Duarte Nunes commented on CASSANDRA-14404:
--

Still haven't read the linked paper, but this is pretty much sloppy quorums, no?

Also, out of curiosity, how will this intersect with materialized views? Will a transient replica have a paired transient view replica, will it use the paired view replica of the base replica on whose behalf it is accepting a write, or will it simply not call into the view write path?