Re: Write everywhere, read anywhere
On Thu, Aug 4, 2011 at 10:25 AM, Jeremiah Jordan <jeremiah.jor...@morningstar.com> wrote:

> If you have RF=3, quorum won't fail with one node down, so R/W quorum
> will be consistent in the case of one node down. If two nodes go down at
> the same time, then you can get inconsistent data from a quorum write/read
> if the write fails with a TimeOut, the nodes come back up, and then one
> read asks the two nodes that were down what the value is while another
> read asks the node that was up and a node that was down. Those two reads
> will get different answers.
> So the short answer is: yeah, the same thing can happen with quorum...

It's true that the failure scenarios are slightly different, but it's not entirely true that two nodes need to fail to trigger inconsistencies with quorum. A single node could be partitioned and produce the same result.

If a network event occurs on a single host, then any writes that come in before the event, and that are processed before the phi failure detector kicks in and marks the rest of the cluster unavailable, will be written locally. From the rest of the cluster's perspective only one node "failed," but from that node's perspective the entire rest of the cluster failed.

Obviously, similar things could happen with DC_QUORUM if a datacenter went offline.

Mike
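The "phi evict" mechanism Mike refers to is Cassandra's accrual failure detector. A rough sketch of the idea in Python (function names are made up; as far as I know Cassandra approximates the same formula assuming exponentially distributed heartbeat intervals, but treat the details as illustrative):

```python
import math

def phi(now, last_heartbeat, mean_interval):
    """Phi accrual suspicion level: -log10 of the probability that a
    heartbeat this late would still arrive, assuming exponentially
    distributed inter-arrival times."""
    elapsed = now - last_heartbeat
    p_later = math.exp(-elapsed / mean_interval)  # P(next heartbeat arrives after `now`)
    return -math.log10(p_later)

# With 1s heartbeats, a node silent for ~18.4s crosses a convict
# threshold of 8 (Cassandra's default phi_convict_threshold) and is
# marked down. Until that happens, the partitioned node keeps
# accepting writes locally.
print(phi(now=18.5, last_heartbeat=0.0, mean_interval=1.0))
```

The point for this thread: there is a window between the partition starting and phi crossing the threshold, and writes accepted in that window land on only one side.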
RE: Write everywhere, read anywhere
If you have RF=3, quorum won't fail with one node down, so R/W quorum will be consistent in the case of one node down. If two nodes go down at the same time, then you can get inconsistent data from a quorum write/read if the write fails with a TimeOut, the nodes come back up, and then one read asks the two nodes that were down what the value is while another read asks the node that was up and a node that was down. Those two reads will get different answers.

From: Mike Malone [mailto:m...@simplegeo.com]
Sent: Thursday, August 04, 2011 12:16 PM
To: user@cassandra.apache.org
Subject: Re: Write everywhere, read anywhere

> 2011/8/3 Patricio Echagüe:
>
>> On Wed, Aug 3, 2011 at 4:00 PM, Philippe wrote:
>>
>>> Hello,
>>> I have a 3-node, RF=3 cluster configured to write at CL.ALL and read at
>>> CL.ONE. When I take one of the nodes down, writes fail, which is what I
>>> expect.
>>> When I run a repair, I see data being streamed from those column
>>> families... that I didn't expect. How can the nodes diverge? Does this
>>> mean that reading at CL.ONE may return inconsistent data?
>>
>> We abort the mutation beforehand when there are enough replicas alive. If
>> a mutation went through and in the middle of it a replica goes down, you
>> can write to some nodes and the request will time out.
>> In that case CL.ONE may return inconsistent data.
>
> Doesn't CL.QUORUM suffer from the same problem? There's no isolation or
> rollback with CL.QUORUM either. So if I do a quorum write with RF=3 and it
> fails after hitting a single node, a subsequent quorum read could return
> the old data (if it hits the two nodes that didn't receive the write) or
> the new data that failed mid-write (if it hits the node that did receive
> the write). Basically, the scenarios where CL.ALL + CL.ONE result in a
> read of inconsistent data could also cause a CL.QUORUM write followed by a
> CL.QUORUM read to return inconsistent data. Right?
> The problem (if there is one) is that even in the quorum case, columns
> with the most recent timestamp win during read-repair resolution, not
> columns that have quorum consensus.
>
> Mike
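Mike's point about timestamp resolution can be sketched in a few lines of Python (a hypothetical minimal model, not Cassandra's actual reconciliation code):

```python
# Read-repair style reconciliation: the column with the highest timestamp
# wins, even if only a minority of replicas hold it.
def reconcile(versions):
    """versions: list of (value, timestamp) pairs, one per replica."""
    return max(versions, key=lambda vt: vt[1])

# Two of three replicas agree on "old", but the lone "new" copy has the
# newer timestamp, so it wins -- there is no majority vote on values.
versions = [("old", 100), ("old", 100), ("new", 200)]
print(reconcile(versions))  # ('new', 200)
```

In other words, resolution is last-write-wins by timestamp, so a value that reached only one replica (a failed quorum write, say) can still "win" once replicas compare notes.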
Re: Write everywhere, read anywhere
2011/8/3 Patricio Echagüe:

> On Wed, Aug 3, 2011 at 4:00 PM, Philippe wrote:
>
>> Hello,
>> I have a 3-node, RF=3 cluster configured to write at CL.ALL and read at
>> CL.ONE. When I take one of the nodes down, writes fail, which is what I
>> expect.
>> When I run a repair, I see data being streamed from those column
>> families... that I didn't expect. How can the nodes diverge? Does this
>> mean that reading at CL.ONE may return inconsistent data?
>
> We abort the mutation beforehand when there are enough replicas alive. If
> a mutation went through and in the middle of it a replica goes down, you
> can write to some nodes and the request will time out.
> In that case CL.ONE may return inconsistent data.

Doesn't CL.QUORUM suffer from the same problem? There's no isolation or rollback with CL.QUORUM either. So if I do a quorum write with RF=3 and it fails after hitting a single node, a subsequent quorum read could return the old data (if it hits the two nodes that didn't receive the write) or the new data that failed mid-write (if it hits the node that did receive the write).

Basically, the scenarios where CL.ALL + CL.ONE result in a read of inconsistent data could also cause a CL.QUORUM write followed by a CL.QUORUM read to return inconsistent data. Right?

The problem (if there is one) is that even in the quorum case, columns with the most recent timestamp win during read-repair resolution, not columns that have quorum consensus.

Mike
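A minimal Python simulation of the failed-quorum-write scenario Mike describes (node names, values, and timestamps are made up for illustration):

```python
import itertools

RF = 3
QUORUM = RF // 2 + 1  # 2 when RF=3

# Replica state after a quorum write that timed out having reached only
# one node: one replica holds the new (value, timestamp), two hold the old.
replicas = {
    "node1": ("new", 200),  # the failed write landed here
    "node2": ("old", 100),
    "node3": ("old", 100),
}

def quorum_read(contacted):
    """Resolve a read across the contacted replicas: highest timestamp wins."""
    assert len(contacted) >= QUORUM
    return max((replicas[n] for n in contacted), key=lambda vt: vt[1])[0]

# Depending on which two replicas answer, the same quorum read returns
# different values -- the failed write is neither fully in nor fully out.
results = {quorum_read(c) for c in itertools.combinations(replicas, QUORUM)}
print(results)  # {'new', 'old'}
```

So a quorum read after a timed-out quorum write is nondeterministic until read repair or anti-entropy repair propagates the newest timestamp everywhere.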
Re: Write everywhere, read anywhere
On Wed, Aug 3, 2011 at 4:00 PM, Philippe wrote:

> Hello,
> I have a 3-node, RF=3 cluster configured to write at CL.ALL and read at
> CL.ONE. When I take one of the nodes down, writes fail, which is what I
> expect.
> When I run a repair, I see data being streamed from those column
> families... that I didn't expect. How can the nodes diverge? Does this
> mean that reading at CL.ONE may return inconsistent data?

We abort the mutation beforehand when there are enough replicas alive. If a mutation went through and in the middle of it a replica goes down, you can write to some nodes and the request will time out. In that case CL.ONE may return inconsistent data.

> Question 2: I've been doing this rather than CL.QUORUM because I've been
> expecting CL.ONE to return data faster than CL.QUORUM. Is that a good
> assumption? Yes, it's ok for writes to be down for a while.

When you hit a node that owns the piece of data, CL.ONE will be faster, as you don't have to wait for a read across the network to reach another node. For CL.QUORUM we fire reads in parallel to all the replicas and wait until a quorum has responded. If I'm not wrong, in some cases the difference may be negligible between CL.ONE and CL.QUORUM when you hit a coordinator that doesn't own the data, since you are going over the network anyway (assuming all nodes take the same time to reply).

> Thanks
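Patricio's latency point can be modeled as order statistics. A sketch, assuming the coordinator fires reads at all replicas in parallel and does not own the data itself (the latency numbers are invented):

```python
# CL.ONE completes at the fastest reply; CL.QUORUM with RF=3 completes
# at the 2nd-fastest, since it needs two responses.
def read_latency(replica_latencies, responses_needed):
    return sorted(replica_latencies)[responses_needed - 1]

latencies_ms = [2.0, 3.0, 45.0]  # one replica is slow (GC pause, say)
print("CL.ONE   :", read_latency(latencies_ms, 1))  # 2.0
print("CL.QUORUM:", read_latency(latencies_ms, 2))  # 3.0
```

Under this model the gap between CL.ONE and CL.QUORUM is only the difference between the fastest and second-fastest replica, which is often small when replica latencies are similar, and quorum is even shielded from a single slow outlier.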
Write everywhere, read anywhere
Hello,
I have a 3-node, RF=3 cluster configured to write at CL.ALL and read at CL.ONE. When I take one of the nodes down, writes fail, which is what I expect.

When I run a repair, I see data being streamed from those column families... that I didn't expect. How can the nodes diverge? Does this mean that reading at CL.ONE may return inconsistent data?

Question 2: I've been doing this rather than CL.QUORUM because I've been expecting CL.ONE to return data faster than CL.QUORUM. Is that a good assumption? Yes, it's ok for writes to be down for a while.

Thanks