Re: Understand eventually consistent
First of all, in your example is W the same as CL? If so, the success of any read or write operation is determined by whether the required CL can be satisfied at that moment. If you write with CL ONE to a CF with RF 3 while one of the replica nodes is down, the operation will succeed and HintedHandoff will take care of propagating the operation to the failed node when it comes back up. If you instead execute the same operation with CL QUORUM, which means RF / 2 + 1, it will try to write on the coordinator node and one replica. Since only 1 replica is down, the operation will succeed too. Now consider the same operation with CL ALL: it will fail, since it can't ensure that the coordinator and both replicas are updated.

I hope this helps you understand the relation between CL and RF.

Sent from my iPhone

On 23/02/2011, at 21:43, mcasandra mohitanch...@gmail.com wrote:

I am reading this again http://wiki.apache.org/cassandra/HintedHandoff and got a little confused. This is my understanding of how HH should work, based on what I read in the Dynamo paper:
1) Say nodes A, B, C, D, E are in the cluster in a ring (in that order).
2) For a given key K, RF=3.
3) Node B holds the hash of that key K, which means that when K is written it will be written to B (owner of the hash) + C + D, since RF = 3.
4) If node D goes down and there is another write to key K, then this time the row for key K will be written with W=1 to B (owner) + C + E (HH), since RF=3 needs to be satisfied. Is this correct?
5) In the above scenario, where node D is down, if we are writing and reading at W=2 and R=2, would it fail even though the original nodes B + C are up? Here I am thinking that W=2 and R=2 means that 2 nodes that hold key K are up, so the CL is satisfied and thus writes and reads will not fail.

-- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Understand-eventually-consistent-tp6038330p6058576.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Re: Understand eventually consistent
Javier Canillas wrote:
Instead, when you execute the same OP using CL QUORUM, then it means RF /2+1, it will try to write on the coordinator node and replica. Considering only 1 replica is down, the OP will succeed too.

I am assuming even a read will succeed when CL is QUORUM, RF=3 and 1 node is down.

Javier Canillas wrote:
Now consider same OP but with CL ALL, it will fail since it can't assure that coordinator and both replicas are updated.

Can you please explain this a little more? I thought CL ALL will fail because it needs all the nodes to be up. http://wiki.apache.org/cassandra/API
Re: Understand eventually consistent
Well, it needs all the nodes required by the operation to be up and to respond in a timely fashion; even an RPC timeout on 1 replica will get you a failure response. CL is calculated based on the RF configured for the ColumnFamily. The ConsistencyLevel is an enum that controls both read and write behavior based on the ReplicationFactor in your storage-conf.xml.

QUORUM = RF / 2 + 1 (integer division)
ALL = RF
ONE = 1
ANY = 0

So, on a column family configured with RF = 6, QUORUM means "be sure to write to at least 4 nodes before responding", but on a column family configured with RF = 3, QUORUM means "be sure to write to at least 2". When RF is 1 or 2, QUORUM is the same as ALL (be sure to write to all nodes involved).
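The mapping above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the message; the function name `required_acks` is made up here and is not part of Cassandra or of any client library.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements implied by a consistency level.

    Hypothetical helper: it only encodes the list from the message above.
    """
    levels = {
        "ANY": 0,
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # integer division
        "ALL": replication_factor,
    }
    return levels[consistency_level]


# Print the required acknowledgements for a few replication factors.
for rf in (1, 2, 3, 6):
    print(rf, {cl: required_acks(cl, rf) for cl in ("ANY", "ONE", "QUORUM", "ALL")})
```

With RF = 1 or RF = 2 the QUORUM and ALL columns come out equal, which is the special case mentioned at the end of the message.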
Re: Understand eventually consistent
Does HH count towards QUORUM? Say RF=1 and CL of W=QUORUM and one node that owns the key dies. Would subsequent write operations for that key be successful? I am guessing it will not succeed.
Re: Understand eventually consistent
On Thu, Feb 24, 2011 at 1:26 PM, mcasandra mohitanch...@gmail.com wrote:
Does HH count towards QUORUM? Say RF=1 and CL of W=QUORUM and one node that owns the key dies. Would subsequent write operations for that key be successful? I am guessing it will not succeed.

No, it would not succeed. It would only succeed at CL.ANY.

-- Tyler Hobbs
Software Engineer, DataStax http://datastax.com/
Maintainer of the pycassa http://github.com/pycassa/pycassa Cassandra Python client library
Re: Understand eventually consistent
HH is a kind of write repair, so it has nothing to do with CL, which is a requirement of the operation itself; and it is not used for reads. In your example QUORUM is the same as ALL, since you have RF = 1 (only the data holder, which is also the coordinator). If that node fails, all reads and writes will fail.

Now, another scenario, with RF = 3 and 1 node down:
CL = QUORUM. Will work, but the coordinator will record an HH for the write and try for some time to deliver it to the failed node. Despite this, the operation succeeds for the client.
CL = ALL. Will fail.
CL = ONE. Will work. The update still goes to the other live replica, and an HH is kept for the replica that is down.

*Think of CL as the client's minimum requirement for an operation to succeed*. If the cluster can guarantee that value, the operation succeeds and the result is returned to the client (even though some HH work still needs to be done afterwards); if not, an error response is returned.
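The three cases above can be checked with a small model. This is a rough sketch under simplifying assumptions: the function `write_outcome` is hypothetical, it counts only live replicas towards the CL (hints never count, except that ANY needs no replica at all), and it assumes a hint is recorded for each down replica only when the write succeeds.

```python
def write_outcome(consistency_level, replication_factor, live_replicas):
    """Return (succeeded, hints_recorded) for one write.

    Simplified model of the rules described in this thread, not Cassandra code.
    """
    required = {
        "ANY": 0,
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[consistency_level]
    succeeded = live_replicas >= required
    # Assumption: hints are only kept for down replicas of a successful write.
    hints = replication_factor - live_replicas if succeeded else 0
    return succeeded, hints


# RF = 3 with one replica down, as in the message above.
for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, write_outcome(cl, 3, 2))

# RF = 1 with the only replica down: QUORUM fails, ANY is accepted via a hint.
print("QUORUM", write_outcome("QUORUM", 1, 0))
print("ANY", write_outcome("ANY", 1, 0))
```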
Re: Understand eventually consistent
Thanks! In the above scenario, what happens if 2 nodes die with RF=3 and CL of W=QUORUM? Would a write succeed, since one write can be made to the coordinator node with HH and the other to the replica node that is up? And similarly, would a read succeed in that scenario? Would HH count towards CL in this case?
Re: Understand eventually consistent
No, since you are explicitly asking that at least a QUORUM of the replicas (RF) be written. In your scenario only 1 node of 3 is up, and the QUORUM value is 2, so the operation will fail and no HH is made. A read won't succeed either, since you are asking that the returned data be validated by at least 2 nodes.

HH only takes place on write operations, and only when the operation succeeded because the CL could be satisfied while other replicas were down. The coordinator then uses HH to apply the updates to the failed replicas (as soon as they come back up).
Re: Understand eventually consistent
Thanks. This helps a lot!
Re: Understand eventually consistent
I am reading this again http://wiki.apache.org/cassandra/HintedHandoff and got a little confused. This is my understanding of how HH should work, based on what I read in the Dynamo paper:
1) Say nodes A, B, C, D, E are in the cluster in a ring (in that order).
2) For a given key K, RF=3.
3) Node B holds the hash of that key K, which means that when K is written it will be written to B (owner of the hash) + C + D, since RF = 3.
4) If node D goes down and there is another write to key K, then this time the row for key K will be written with W=1 to B (owner) + C + E (HH), since RF=3 needs to be satisfied. Is this correct?
5) In the above scenario, where node D is down, if we are writing and reading at W=2 and R=2, would it fail even though the original nodes B + C are up? Here I am thinking that W=2 and R=2 means that 2 nodes that hold key K are up, so the CL is satisfied and thus writes and reads will not fail.
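Steps 1) to 4) describe a walk around the ring, which can be sketched as below. This models the poster's reading of the Dynamo paper only; whether Cassandra places the hinted write on E in this way is exactly what the message is asking, so treat `dynamo_style_targets` as an illustration of the question, not as Cassandra's behavior. Both function names are made up for this sketch.

```python
RING = ["A", "B", "C", "D", "E"]  # nodes in ring order, as in step 1)


def natural_replicas(ring, owner, replication_factor):
    """The owner of the key's hash plus the next RF - 1 nodes on the ring."""
    start = ring.index(owner)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


def dynamo_style_targets(ring, owner, replication_factor, down):
    """Walk the ring from the owner, skipping down nodes, until RF nodes are found.

    Assumption: this is the "sloppy" placement described in step 4).
    """
    start = ring.index(owner)
    targets = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)]
        if node not in down:
            targets.append(node)
        if len(targets) == replication_factor:
            break
    return targets


print(natural_replicas(RING, "B", 3))             # step 3)
print(dynamo_style_targets(RING, "B", 3, {"D"}))  # step 4), with D down
```

In this model the natural replicas for K are B, C and D, and with D down the walk continues to E, which would hold the write on D's behalf.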
Re: Understand eventually consistent
David Strauss-2 wrote:
On Fri, 2011-02-18 at 12:01 -0600, Anthony John wrote:
Writes will go thru w/hinted handoff, read will fail

That is not correct. Hinted handoffs do not count toward reaching QUORUM counts.[1]

[1] http://wiki.apache.org/cassandra/HintedHandoff
-- David Strauss | da...@davidstrauss.net | +1 512 577 5827 [mobile]

I read the logic of why writes are not allowed. But another alternative is to allow the write and just fail the reads until it's in sync again. Is there some other problem with this logic?
Re: Understand eventually consistent
I read the logic of why writes are not allowed. But other alternative is to allow write and just fail the reads until it's in sync again. Is there some other problem with this logic?

The problem lies in "until it's in sync again". A given node cannot easily know, for a given read, whether everything is in sync with respect to the data participating in that read. I didn't think about it very carefully, but off the top of my head, in the most general case this would seem to require strong co-ordination that is antithetical to the design of Cassandra.

(Consider that for a read of a set of columns, for each column the node would have to know whether the nodes participating in the read have any hints pending in the cluster. Since the co-ordinating node cannot know the context in which the call is made (maybe the client or some other client *just* wrote at quorum with nodes down), this essentially implies co-ordination on every read, at all times.)

-- / Peter Schuller
Re: Understand eventually consistent
On Fri, 2011-02-18 at 12:01 -0600, Anthony John wrote:
Writes will go thru w/hinted handoff, read will fail

That is not correct. Hinted handoffs do not count toward reaching QUORUM counts.[1]

[1] http://wiki.apache.org/cassandra/HintedHandoff
-- David Strauss | da...@davidstrauss.net | +1 512 577 5827 [mobile]
Re: Understand eventually consistent
At Quorum, if 2 of 3 nodes are down, a read should not be returned, right? But yes, if single-node READs are opted for, it will go through.

The original question was: why is Cassandra called an eventually consistent data store? Because at write time there is no guarantee that all replicas are consistent. But they eventually will be!

With Quorum writes and reads you will not get inconsistent results, and your read will force consistency if that state has not yet been reached for the particular piece of data. But you have the option of writing and reading at a lower standard, which could result in inconsistencies.

HTH,
-JA

On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood stuh...@gmail.com wrote:
But, the reason that it isn't safe to say that we are a strongly consistent store is that if 2 of your 3 replicas were to die and come back with no data, QUORUM might return the wrong result. A requirement of a strongly consistent store is that replicas cannot begin answering queries until they are consistent: this is not a requirement in Cassandra, although arguably it should be an option at some point in the distant future.

On Thu, Feb 17, 2011 at 5:26 PM, Aaron Morton aa...@thelastpickle.com wrote:
For background...
http://wiki.apache.org/cassandra/ArchitectureOverview (There is a section on consistency in there)

For deep background...
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

In short, yes (for all your questions) if you read and write at Quorum you have consistency behavior for your operations. Even though some nodes may have an inconsistent view of the data, e.g. one node is partitioned by a broken network or is overloaded and does not respond.

Aaron

On 18 Feb, 2011, at 02:11 PM, mcasandra mohitanch...@gmail.com wrote:
Why is Cassandra called an eventually consistent data store? Wouldn't it be consistent if QUORUM is used?

Another question: when I specify a replication factor of 3, write with a factor of 2 and read with a factor of 2, what happens?
1. When a write occurs, will Cassandra return to the client only when the writes go to the commit log on 2 nodes successfully?
2. When a read happens, will Cassandra return only when it is able to read from 2 nodes and determine that it has a consistent copy?
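For the RF = 3, W = 2, R = 2 case in the question, one way to see why a quorum read observes a quorum write is to enumerate every possible pair of write set and read set and check that they share a replica. A minimal sketch, where the replica names and the function `quorums_always_overlap` are invented for the illustration:

```python
from itertools import combinations


def quorums_always_overlap(replicas, w, r):
    """True if every possible write set of size w shares at least one
    replica with every possible read set of size r."""
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


replicas = ["n1", "n2", "n3"]                   # RF = 3
print(quorums_always_overlap(replicas, 2, 2))   # W = 2, R = 2
print(quorums_always_overlap(replicas, 1, 1))   # W = 1, R = 1
```

With W = 2 and R = 2 every pair overlaps, so at least one replica contacted by the read has the latest acknowledged write. With W = 1 and R = 1 a read can land on a replica the write never reached, which is where the "eventually" comes from.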
Re: Understand eventually consistent
Related question: Is it a good idea to specify ConsistencyLevels on a per-operation basis? For example: Read ONE Write ALL would deliver consistent read results, just like Read ALL Write ONE. However, if you specify Read ONE Write QUORUM you cannot give such guarantees anymore. Should there be (is there) a programming abstraction on top of ConsistencyLevel that takes care of these things and makes them explicit to the application developer?
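The combinations in this question follow the usual R + W > RF rule of thumb: a read is guaranteed to touch at least one replica that acknowledged the latest write when the two counts add up to more than the replication factor. A hypothetical helper (not part of any Cassandra client, names invented here) that makes the rule explicit might look like this:

```python
def acks(level, replication_factor):
    """Replicas that must respond for a given consistency level."""
    return {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[level]


def read_sees_latest_write(read_level, write_level, replication_factor):
    """Rule of thumb: read and write sets must overlap, i.e. R + W > RF."""
    r = acks(read_level, replication_factor)
    w = acks(write_level, replication_factor)
    return r + w > replication_factor


# The combinations from the question, for RF = 3.
for read_level, write_level in [("ONE", "ALL"), ("ALL", "ONE"),
                                ("ONE", "QUORUM"), ("QUORUM", "QUORUM")]:
    print(read_level, write_level, read_sees_latest_write(read_level, write_level, 3))
```

An application-side wrapper could refuse or warn about level pairs for which this check is false, which is one way to make the trade-off explicit to the developer.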
Re: Understand eventually consistent
On Fri, Feb 18, 2011 at 12:00 AM, Stu Hood stuh...@gmail.com wrote:
But, the reason that it isn't safe to say that we are a strongly consistent store is that if 2 of your 3 replicas were to die and come back with no data, QUORUM might return the wrong result.

Not so. If you allow vaporizing arbitrary numbers of machines without a trace then only systems that block for all replicas on each update could be considered strongly consistent, and I don't know of any systems in the wild that actually do that. Certainly other systems commonly considered strongly consistent like HBase do not.

A requirement of a strongly consistent store is that replicas cannot begin answering queries until they are consistent

The system as a whole can be consistent even if an individual replica is not; that is the point of CL ONE.

-- Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Understand eventually consistent
Again, my understanding!
1. Writes will go thru w/hinted handoff, read will fail
2. Yes - but Oracle and others have no partition tolerance and lower levels of availability. To build in partition tolerance and high availability and still be shared nothing to avoid SPOF (to cover the RAC implementation), you have to write on to multiple nodes and then read off multiple nodes to ensure consistency. You could always run RF=1 to be like most of the traditional DBMSs. The issues you would face are the ones that Cassandra is trying to prevent!

HTH,
-JA

On Fri, Feb 18, 2011 at 11:53 AM, mcasandra mohitanch...@gmail.com wrote:
I have a couple more questions:
1. What happens when RF = 3, R = 2 and W = 2 and 2 machines go down? Would reads and writes fail, or get the results from the one machine that is up?
2. Someone in this thread mentioned that a write is eventually consistent. Is it because the response is returned to the client as soon as the data is written to the commit log? But isn't this the same as other RDBMSs? Oracle does the same thing: it writes to the REDO log and at some point later does a checkpoint and flushes data to disk. But an RDBMS is not called eventually consistent.
Re: Understand eventually consistent
For background...
http://wiki.apache.org/cassandra/ArchitectureOverview (There is a section on consistency in there)

For deep background...
http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

In short, yes (for all your questions) if you read and write at Quorum you have consistency behavior for your operations. Even though some nodes may have an inconsistent view of the data, e.g. one node is partitioned by a broken network or is overloaded and does not respond.

Aaron