Re: Behavior on inconsistent reads

2012-05-14 Thread aaron morton
What sort of corruption are you thinking about ?

Whenever the first CL (consistency level) nodes involved in a read do not 
agree on the "current" value, a process is run to resolve their differences. 
This can result in a node that is out of sync getting repaired.
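
As a rough illustration of that resolution step, here is a minimal Python sketch. It assumes each replica response carries a value and a write timestamp; the names Versioned and resolve are hypothetical and are not Cassandra's internal API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # write timestamp attached to the value (assumed integer)


def resolve(responses):
    """Pick the newest response and list the nodes that are behind it.

    responses: dict mapping node name -> Versioned (hypothetical shape).
    Returns (winner, stale_nodes).
    """
    winner = max(responses.values(), key=lambda v: v.timestamp)
    stale = sorted(
        node for node, v in responses.items() if v.timestamp < winner.timestamp
    )
    return winner, stale


responses = {
    "node1": Versioned("new", 200),
    "node2": Versioned("old", 100),
}
winner, stale = resolve(responses)
print(winner.value, stale)  # node2 is the one that would get repaired
```

In this sketch the newest timestamp wins regardless of what the value contains; the comparison only decides which copy is most recent.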

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/05/2012, at 11:17 PM, Carpenter, Curt wrote:

> I (now) understand that the point of this is to get the most recent copy (at 
> least of the nodes checked) if all replicas simply haven’t been updated to 
> the latest changes. But what about dealing with corruption? What if the most 
> recent copy is corrupt? With a Zookeeper-based transaction system on top, 
> corruption is all I’m worried about.
>  
> From: Dave Brosius [mailto:dbros...@mebigfatguy.com] 
> Sent: Thursday, May 10, 2012 10:03 PM
> 
> If you read at Consistency of at least quorum, you are guaranteed that at 
> least one of the nodes has the latest data, and so you get the right data. If 
> you read with less than quorum it would be possible for all the nodes that 
> respond to have stale data.
> 
> On 05/10/2012 09:46 PM, Carpenter, Curt wrote:
> Hi all, newbie here. Be gentle.
>  
> From 
> http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests:
> “Thus, the coordinator first contacts the replicas specified by the 
> consistency level. The coordinator will send these requests to the replicas 
> that are currently responding most promptly. The nodes contacted will respond 
> with the requested data; if multiple nodes are contacted, the rows from each 
> replica are compared in memory to see if they are consistent. If they are 
> not, then the replica that has the most recent data (based on the timestamp) 
> is used by the coordinator to forward the result back to the client.
> 
> To ensure that all replicas have the most recent version of frequently-read 
> data, the coordinator also contacts and compares the data from all the 
> remaining replicas that own the row in the background, and if they are 
> inconsistent, issues writes to the out-of-date replicas to update the row to 
> reflect the most recently written values. This process is known as read 
> repair. Read repair can be configured per column family 
> (using read_repair_chance), and is enabled by default.
> 
> For example, in a cluster with a replication factor of 3, and a read 
> consistency level of QUORUM, 2 of the 3 replicas for the given row are 
> contacted to fulfill the read request. Supposing the contacted replicas had 
> different versions of the row, the replica with the most recent version would 
> return the requested data. In the background, the third replica is checked 
> for consistency with the first two, and if needed, the most recent replica 
> issues a write to the out-of-date replicas.”
> 
>  
> Always returns the most recent? What if the most recent write is corrupt? I 
> thought the whole point of a quorum was that consistency is verified before 
> the data is returned to the client. No?
>  
> Thanks,
>  
> Curt



RE: Behavior on inconsistent reads

2012-05-11 Thread Carpenter, Curt
I (now) understand that the point of this is to get the most recent copy
(at least of the nodes checked) if all replicas simply haven't been
updated to the latest changes. But what about dealing with corruption?
What if the most recent copy is corrupt? With a Zookeeper-based
transaction system on top, corruption is all I'm worried about. 

 

From: Dave Brosius [mailto:dbros...@mebigfatguy.com] 
Sent: Thursday, May 10, 2012 10:03 PM



If you read at Consistency of at least quorum, you are guaranteed that
at least one of the nodes has the latest data, and so you get the right
data. If you read with less than quorum it would be possible for all the
nodes that respond to have stale data.

On 05/10/2012 09:46 PM, Carpenter, Curt wrote: 

Hi all, newbie here. Be gentle.

 

From
http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests:

"Thus, the coordinator first contacts the replicas specified by the
consistency level. The coordinator will send these requests to the
replicas that are currently responding most promptly. The nodes
contacted will respond with the requested data; if multiple nodes are
contacted, the rows from each replica are compared in memory to see if
they are consistent. If they are not, then the replica that has the most
recent data (based on the timestamp) is used by the coordinator to
forward the result back to the client.

To ensure that all replicas have the most recent version of
frequently-read data, the coordinator also contacts and compares the
data from all the remaining replicas that own the row in the background,
and if they are inconsistent, issues writes to the out-of-date replicas
to update the row to reflect the most recently written values. This
process is known as read repair. Read repair can be configured per
column family (using read_repair_chance), and is enabled by default.

For example, in a cluster with a replication factor of 3, and a read
consistency level of QUORUM, 2 of the 3 replicas for the given row are
contacted to fulfill the read request. Supposing the contacted replicas
had different versions of the row, the replica with the most recent
version would return the requested data. In the background, the third
replica is checked for consistency with the first two, and if needed,
the most recent replica issues a write to the out-of-date replicas."

 

Always returns the most recent? What if the most recent write is
corrupt? I thought the whole point of a quorum was that consistency is
verified before the data is returned to the client. No?

 

Thanks,

Curt

 



Re: Behavior on inconsistent reads

2012-05-10 Thread Dave Brosius
If you read at Consistency of at least quorum, you are guaranteed that 
at least one of the nodes has the latest data, and so you get the right 
data. If you read with less than quorum it would be possible for all the 
nodes that respond to have stale data.
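
The guarantee above can be checked with a little arithmetic. The sketch below assumes the latest write was itself acknowledged by a quorum of replicas, which the statement above leaves implicit; the helper names quorum and always_overlap are made up for illustration.

```python
from itertools import combinations


def quorum(rf):
    """Majority of rf replicas."""
    return rf // 2 + 1


def always_overlap(rf, w, r):
    """True if every set of w written replicas shares at least one node
    with every set of r read replicas, out of rf replicas in total."""
    nodes = range(rf)
    return all(
        set(written) & set(read)
        for written in combinations(nodes, w)
        for read in combinations(nodes, r)
    )


rf = 3
q = quorum(rf)
print(q)                          # replicas needed for a quorum
print(always_overlap(rf, q, q))   # quorum write + quorum read
print(always_overlap(rf, q, 1))   # quorum write + single-replica read
```

With a replication factor of 3, a quorum is 2 replicas. Any 2 replicas read must include at least one of the 2 written, while a read of a single replica can land entirely on the one that missed the write.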




On 05/10/2012 09:46 PM, Carpenter, Curt wrote:


Hi all, newbie here. Be gentle.

From 
http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests:


"Thus, the coordinator first contacts the replicas specified by the 
consistency level. The coordinator will send these requests to the 
replicas that are currently responding most promptly. The nodes 
contacted will respond with the requested data; if multiple nodes are 
contacted, the rows from each replica are compared in memory to see if 
they are consistent. If they are not, then the replica that has the 
most recent data (based on the timestamp) is used by the coordinator 
to forward the result back to the client.


To ensure that all replicas have the most recent version of 
frequently-read data, the coordinator also contacts and compares the 
data from all the remaining replicas that own the row in the 
background, and if they are inconsistent, issues writes to the 
out-of-date replicas to update the row to reflect the most recently 
written values. This process is known as /read repair/. Read repair can 
be configured per column family (using /read_repair_chance/), and is 
enabled by default.


For example, in a cluster with a replication factor of 3, and a read 
consistency level of QUORUM, 2 of the 3 replicas for the given row are 
contacted to fulfill the read request. Supposing the contacted 
replicas had different versions of the row, the replica with the most 
recent version would return the requested data. In the background, the 
third replica is checked for consistency with the first two, and if 
needed, the most recent replica issues a write to the out-of-date 
replicas."


Always returns the most recent? What if the most recent write is 
corrupt? I thought the whole point of a quorum was that consistency is 
verified /before/ the data is returned to the client. No?


Thanks,

Curt





Behavior on inconsistent reads

2012-05-10 Thread Carpenter, Curt
Hi all, newbie here. Be gentle.

 

From
http://www.datastax.com/docs/1.0/cluster_architecture/about_client_requests:

"Thus, the coordinator first contacts the replicas specified by the
consistency level. The coordinator will send these requests to the
replicas that are currently responding most promptly. The nodes
contacted will respond with the requested data; if multiple nodes are
contacted, the rows from each replica are compared in memory to see if
they are consistent. If they are not, then the replica that has the most
recent data (based on the timestamp) is used by the coordinator to
forward the result back to the client.

To ensure that all replicas have the most recent version of
frequently-read data, the coordinator also contacts and compares the
data from all the remaining replicas that own the row in the background,
and if they are inconsistent, issues writes to the out-of-date replicas
to update the row to reflect the most recently written values. This
process is known as read repair. Read repair can be configured per
column family (using read_repair_chance), and is enabled by default.

For example, in a cluster with a replication factor of 3, and a read
consistency level of QUORUM, 2 of the 3 replicas for the given row are
contacted to fulfill the read request. Supposing the contacted replicas
had different versions of the row, the replica with the most recent
version would return the requested data. In the background, the third
replica is checked for consistency with the first two, and if needed,
the most recent replica issues a write to the out-of-date replicas."
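
To make the quoted example concrete, here is a toy Python model of a QUORUM read with replication factor 3, followed by the read repair step. It is only a sketch of the behavior as the documentation describes it: quorum_read is a hypothetical function, the "most promptly responding" replicas are simply the first ones in the dict, the background repair runs inline, and read_repair_chance is modeled as the probability that the repair step runs at all.

```python
import random


def quorum_read(replicas, read_repair_chance=1.0, rng=random.random):
    """replicas: dict node -> (value, timestamp), modified in place by repair.

    Returns the value that would be sent back to the client.
    """
    nodes = list(replicas)
    needed = len(nodes) // 2 + 1
    contacted = nodes[:needed]  # stand-in for the fastest-responding replicas

    # Foreground: compare the contacted replicas, newest timestamp wins.
    newest = max((replicas[n] for n in contacted), key=lambda vt: vt[1])
    result = newest[0]

    # Read repair (done in the background in the description above):
    # compare all replicas and overwrite any out-of-date copy.
    if rng() < read_repair_chance:
        newest_all = max(replicas.values(), key=lambda vt: vt[1])
        for n in nodes:
            if replicas[n][1] < newest_all[1]:
                replicas[n] = newest_all
    return result


replicas = {"a": ("v2", 20), "b": ("v1", 10), "c": ("v1", 10)}
result = quorum_read(replicas)
print(result, replicas)
```

Here replicas a and b are contacted, the newer value from a is returned, and b and c are then brought up to date.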

 

Always returns the most recent? What if the most recent write is
corrupt? I thought the whole point of a quorum was that consistency is
verified before the data is returned to the client. No?

 

Thanks,

Curt