[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions

2014-09-20 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142227#comment-14142227
 ] 

Peter Bailis commented on CASSANDRA-7056:
-

bq. Let's assume we query from partitions A and B, and we see the results' timestamps don't match. We would pull the latest batchlog assuming they are from the same batch, but let's say they in fact are not. In this case we wasted a lot of time, so my question is: should we only do this if the user supplies a new CL type?

If you set the same, unique (e.g., UUID) write timestamp for all writes in a batch, then you know that any results with different timestamps are part of different batches. So, given mismatched timestamps, should you check the batchlog for pending writes? One solution is to always check (as in RAMP-Small). This doesn't require any extra metadata but, as you point out, also requires 2 RTTs. To cut down on these RTTs, you could instead attach a Bloom filter of the items in each batch and only check for possibly missing writes (as in RAMP-Hybrid). (I can go into more detail if you want.) However, I agree that you might not want to pay these costs *all* of the time for reads. Would a BATCH_READ or other modifier to CQL SELECT statements make sense?
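
To make the two-round structure concrete, here is a rough sketch of the read-side logic (illustrative only; the types, the {{Store}} interface, and the plain {{Set}} standing in for a real Bloom filter are hypothetical, not existing Cassandra APIs):

{code:title=Two-round batch read sketch (illustrative only)|borderStyle=solid}
import java.util.*;

class RampReadSketch {
    // A versioned cell: value plus the batch-wide timestamp and a summary
    // (a Bloom filter in RAMP-Hybrid; here a plain Set) of the other keys
    // written by the same batch.
    static class Version {
        final String value;
        final long batchTimestamp;
        final Set<String> batchKeys;
        Version(String value, long batchTimestamp, Set<String> batchKeys) {
            this.value = value;
            this.batchTimestamp = batchTimestamp;
            this.batchKeys = batchKeys;
        }
    }

    interface Store {
        Version latest(String key);               // first-round read
        Version byTimestamp(String key, long ts); // second-round read (e.g., batchlog / pending writes)
    }

    static Map<String, Version> batchRead(Store store, Collection<String> keys) {
        // Round 1: read the latest committed version of every requested key.
        Map<String, Version> firstRound = new HashMap<>();
        for (String key : keys) {
            firstRound.put(key, store.latest(key));
        }
        // For each key, find the newest batch timestamp it should reflect:
        // another result's batch summary may claim a newer write to this key.
        Map<String, Long> required = new HashMap<>();
        for (String key : keys) {
            long newest = firstRound.get(key).batchTimestamp;
            for (Version other : firstRound.values()) {
                if (other.batchKeys.contains(key)) {
                    newest = Math.max(newest, other.batchTimestamp);
                }
            }
            required.put(key, newest);
        }
        // Round 2: fetch matching versions only where round 1 came back too old.
        // Always paying this round corresponds to the RAMP-Small approach above;
        // the per-batch summary is what lets RAMP-Hybrid skip it when no missing
        // write is possible.
        Map<String, Version> result = new HashMap<>(firstRound);
        for (String key : keys) {
            if (required.get(key) > firstRound.get(key).batchTimestamp) {
                result.put(key, store.byTimestamp(key, required.get(key)));
            }
        }
        return result;
    }
}
{code}

A BATCH_READ (or similar) modifier would essentially gate whether this second-round logic runs at all for a given SELECT.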

bq. In the case of a global index we plan on reading the data after reading the index. The data query might reveal the indexed value is stale. We would need to apply the batchlog and fix the index; would we then restart the entire query, or maybe overquery assuming some index values will be stale? Either way this query looks different than the above scenario.

I think there are a few options. The easiest is to simply filter out the out-of-date rows, in which case you are guaranteed to see a subset of the index entries. Alternatively, you could provide a snapshot index read, where you read the older, overwritten values from the data node. If you want both a read-latest and a read-snapshot mode, there are some options I can describe, but they generally entail either more metadata or, otherwise, locks/blocking coordination, which I don't think you want.
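
As a sketch of the first (filtering) option, with hypothetical types rather than real Cassandra classes:

{code:title=Stale index entry filtering sketch (illustrative only)|borderStyle=solid}
import java.util.*;
import java.util.stream.Collectors;

class IndexFilterSketch {
    // An index entry claims a row had the indexed value at some point.
    static class IndexEntry {
        final String rowKey;
        final String indexedValue;
        IndexEntry(String rowKey, String indexedValue) {
            this.rowKey = rowKey;
            this.indexedValue = indexedValue;
        }
    }

    interface DataStore {
        // Returns the row's current indexed value, or null if the row is gone.
        String currentIndexedValue(String rowKey);
    }

    // Read through the index, then drop entries whose indexed value is stale.
    // The result is a subset of the true matches; it never includes a row that
    // no longer satisfies the predicate.
    static List<String> lookup(DataStore data, List<IndexEntry> entries, String wanted) {
        return entries.stream()
                .filter(e -> wanted.equals(data.currentIndexedValue(e.rowKey)))
                .map(e -> e.rowKey)
                .collect(Collectors.toList());
    }
}
{code}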


 Add RAMP transactions
 -

 Key: CASSANDRA-7056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-7056
 Project: Cassandra
  Issue Type: Wish
  Components: Core
Reporter: Tupshin Harper
Priority: Minor

 We should take a look at 
 [RAMP|http://www.bailis.org/blog/scalable-atomic-visibility-with-ramp-transactions/]
  transactions, and figure out if they can be used to provide more efficient 
 LWT (or LWT-like) operations.





[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions

2014-06-30 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14048119#comment-14048119
 ] 

Peter Bailis commented on CASSANDRA-7056:
-

bq.  I doubt this will be dramatically more complex, but the approach to 
implementation is fundamentally different. It seems to me supporting 
transactions of arbitrary size is an equally powerful win to consistent 
transactions.

I agree streaming batches could be really useful. In effect, you're turning an operation you'd have to perform client-side (e.g., you can simulate streaming by simply buffering your write sets and then calling one big BATCH) into a server-assisted one (where your proposed read-buffer/memtable stores the pending inserts while you're still deciding what goes into the transaction). From the RAMP perspective, this doesn't change things substantially: you just have to make sure to propagate the appropriate txn metadata after you've determined which writes made it into the batch.

[~benedict]: regarding your point on getting QUORUM-like guarantees without QUORUM reads, I agree there are some cool tricks to play. There is some additional complexity in these optimizations, but the basic observation is a good one: if I already have a transaction ID I want to read from, plus the metadata associated with it, all I have to do is find the matching versions, which doesn't necessarily require QUORUM reads for consistency w.r.t. that ID.
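
A minimal sketch of that observation (the {{Replica}} interface and types are hypothetical; this is purely illustrative, not Cassandra's read path):

{code:title=Read by known transaction ID (illustrative only)|borderStyle=solid}
import java.util.*;

class TxnIdReadSketch {
    static class Version {
        final String value;
        final UUID txnId;
        Version(String value, UUID txnId) {
            this.value = value;
            this.txnId = txnId;
        }
    }

    interface Replica {
        // Versions this replica currently holds for the key; may be stale.
        List<Version> versions(String key);
    }

    // Because the reader already knows exactly which transaction ID it wants,
    // any single replica holding that version can answer; we only keep asking
    // further replicas (or fall back to the batchlog) when the earlier ones
    // have not seen the transaction yet. No quorum is needed to match the ID.
    static Optional<Version> readAtTxn(List<Replica> replicas, String key, UUID txnId) {
        for (Replica replica : replicas) {
            for (Version v : replica.versions(key)) {
                if (txnId.equals(v.txnId)) {
                    return Optional.of(v);
                }
            }
        }
        return Optional.empty(); // not yet replicated anywhere we asked
    }
}
{code}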



[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions

2014-06-25 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043771#comment-14043771
 ] 

Peter Bailis commented on CASSANDRA-7056:
-

bq. RAMP has a requirement that anything being read/written that way is always written in the same groupings. If you update B,C and then update A,B, you can't read B,C successfully anymore, as the times on B and C will never match.

This isn't entirely correct. Let's say I do an atomic batch B1 that writes B = 1 and C = 1 with timestamp 1, then you do an atomic batch B2 that writes A = 2 and B = 2 at timestamp 2. Under RAMP, subsequent batch reads from B and C will return B = 2, C = 1. The timestamps on B and C will indeed (as you point out) not match, but simply returning matching timestamps is *not* the goal: the goal is that if you read any write in a given batch, you will be able to read the rest of the writes in the batch (i.e., if you also attempt to read any other items that were written in the batch, you will see the corresponding writes).
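
For concreteness, a toy model of the example above (it only illustrates the resulting values; this is not Cassandra code):

{code:title=Toy model of the B1/B2 example (illustrative only)|borderStyle=solid}
import java.util.*;

class RampExample {
    public static void main(String[] args) {
        // B1 at timestamp 1 writes B=1, C=1; B2 at timestamp 2 writes A=2, B=2.
        Map<String, NavigableMap<Long, Integer>> store = new HashMap<>();
        write(store, 1L, Map.of("B", 1, "C", 1)); // batch B1
        write(store, 2L, Map.of("A", 2, "B", 2)); // batch B2

        // A batch read of {B, C} after both batches complete resolves B at
        // timestamp 2 and C at timestamp 1. The timestamps differ, but every
        // batch whose writes we observed is fully visible for the items we
        // requested, which is the RAMP guarantee.
        System.out.println("B = " + latest(store, "B")); // 2
        System.out.println("C = " + latest(store, "C")); // 1
    }

    static void write(Map<String, NavigableMap<Long, Integer>> store, long ts,
                      Map<String, Integer> batch) {
        batch.forEach((key, value) ->
                store.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value));
    }

    static int latest(Map<String, NavigableMap<Long, Integer>> store, String key) {
        return store.get(key).lastEntry().getValue();
    }
}
{code}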



[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions

2014-06-25 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14043851#comment-14043851
 ] 

Peter Bailis commented on CASSANDRA-7056:
-

[~jjordan] Good question. The short answer is that this behavior (reading A@2 and C@1) is well-defined under RAMP. Just like in Cassandra today, the fact that I read a write at time 500 doesn't mean I'm going to see the effects of all writes that occur before time 500! Rather, the guarantee that RAMP adds is that, once you see the effects of one write in the batch, you'll see all of the writes in the batch.

So, in your scenario, you have three batches: B1 {A=1, B=1} at time 1, B1.5 {B=1.5, C=1.5} at time 1.5, and B2 {A=2, B=2} at time 2. You could get the behavior you describe above if B1 executes and completes, B2 executes and completes, and we subsequently read sometime before B1.5 completes. So, I guess I disagree that the real C you should be getting is the one from [the batch at time 1.5], because you haven't yet seen the effect of any writes from B1.5. However, once B1.5 completes, you *will* be guaranteed to read C at time 1.5.

It may be easier to think of RAMP as providing the ability to take each of your normal reads and writes under LWW and turn them into multi-column, multi-table writes that are all going to be visible/reflected in the table state (once completed). There are no special ordering guarantees beyond what Cassandra already provides; if you need strong ordering guarantees (e.g., enforcing sequential assignment of timestamps), that's a case for CAS.



[jira] [Commented] (CASSANDRA-7056) Add RAMP transactions

2014-06-24 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042848#comment-14042848
 ] 

Peter Bailis commented on CASSANDRA-7056:
-

As I mentioned at the Next-Generation Cassandra Conference, I'm happy to get 
the ball rolling on an implementation of RAMP in Cassandra.

To reiterate a few points from the NGCC, I think RAMP could provide some useful isolation guarantees for Cassandra's Atomic Batch operations (either none of the updates are visible, or all are) as well as provide the basis for consistent global secondary index updates in CASSANDRA-6477. I've posted my slides from the NGCC on SpeakerDeck; the Cassandra-specific implementation details start at transition number 287:
https://speakerdeck.com/pbailis/scalable-atomic-visibility-with-ramp-transactions

I have some time to hack on this and am willing to work on a patch and/or 
hammer out the Cassandra-specific design with you all over JIRA or otherwise!





[jira] [Commented] (CASSANDRA-6477) Global indexes

2014-06-24 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042865#comment-14042865
 ] 

Peter Bailis commented on CASSANDRA-6477:
-

Just to follow up regarding our conversations at the NGCC: is there any 
interest at this point in delivering any form of consistent index updates 
(e.g., beyond eventually consistent index entries), or is the primary goal 
right now simply to get basic global index functionality working?

Also, FWIW, though I have yet to think through [~benedict]'s second proposal, 
the local CAS approach makes a lot of sense from my perspective insofar as 
you're willing to tolerate RF-1 redundant (but, importantly, idempotent!) index 
invalidations. I think this will work especially well when, per [~tjake]'s 
proposal, you start partitioning the secondary index servers.

 Global indexes
 --

 Key: CASSANDRA-6477
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
 Fix For: 3.0


 Local indexes are suitable for low-cardinality data, where spreading the 
 index across the cluster is a Good Thing.  However, for high-cardinality 
 data, local indexes require querying most nodes in the cluster even if only a 
 handful of rows is returned.





[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor

2013-05-27 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13668054#comment-13668054
 ] 

Peter Bailis commented on CASSANDRA-5455:
-

bq. Do we need any core changes at all, then? (Under the #3 for now plan.)

Nope; the predictor I linked uses the per-CF latency metrics. The downside is 
accuracy.


 Remove PBSPredictor
 ---

 Key: CASSANDRA-5455
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5455
 Project: Cassandra
  Issue Type: Bug
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
 Fix For: 2.0

 Attachments: 5455.txt


 It was a fun experiment, but it's unmaintained and the bar to understanding 
 what is going on is high.  Case in point: PBSTest has been failing 
 intermittently for some time now, possibly even since it was created.  Or 
 possibly not and it was a regression from a refactoring we did.  Who knows?



[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor

2013-05-24 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13666888#comment-13666888
 ] 

Peter Bailis commented on CASSANDRA-5455:
-

Okay. #1 will likely require more extensive code changes: basically, it'll 
require EstimatedHistograms for each of the servers acting as replicas for a 
given ColumnFamily and will require EstimatedHistogram tracing in the 
StorageProxy (to separate network-based latency from disk-based latency). Are 
these changes feasible?

re: a window of individual latency times, looking at the Metrics implementation of EstimatedHistogram, EstimatedHistogram.values() should provide a reasonable enough sample (especially since, as you mention, it has other uses as well).

Perhaps the simplest strategy is to go with #3 for now but implement #1 in the 
future if there's interest. #3 is easy; I've already written an example 
external module to do RTT/2 predictions: 
https://github.com/pbailis/pbs-predictor/blob/9d31acd1667b08affa609278689b540d8e0380f5/pbspredictor/src/main/java/edu/berkeley/pbs/cassandra/CassandraLatencyTrace.java
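
For reference, option #3 boils down to something like the following (a simplified sketch of the idea, not the linked module's actual code; it halves sampled round-trip latencies to approximate one-way delays and uses a Monte Carlo estimate of the probability of a consistent read):

{code:title=RTT/2-based prediction sketch (illustrative only)|borderStyle=solid}
import java.util.*;

class Rtt2PredictorSketch {
    private final Random rng = new Random();
    private final double[] writeRtts; // sampled write round-trip latencies, ms (e.g., from per-CF metrics)
    private final double[] readRtts;  // sampled read round-trip latencies, ms

    Rtt2PredictorSketch(double[] writeRtts, double[] readRtts) {
        this.writeRtts = writeRtts;
        this.readRtts = readRtts;
    }

    // Monte Carlo estimate of the probability that a read issued t ms after a
    // write returns that write, for N replicas, write CL=W, read CL=R (k=1).
    double consistentReadProbability(int n, int w, int r, double tMillis, int trials) {
        int consistent = 0;
        for (int trial = 0; trial < trials; trial++) {
            double[] writeArrives = new double[n]; // when each replica applies the write
            double[] ackArrives = new double[n];   // when each ack reaches the coordinator
            for (int j = 0; j < n; j++) {
                // RTT/2 approximation: split a sampled round trip evenly each way.
                writeArrives[j] = sample(writeRtts) / 2.0;
                ackArrives[j] = writeArrives[j] + sample(writeRtts) / 2.0;
            }
            double writeReturns = kthSmallest(ackArrives, w); // write "commits" at the W-th ack

            // The read starts t ms later, contacts all replicas, and uses the R fastest responses.
            double readStart = writeReturns + tMillis;
            double[] responseAt = new double[n];
            boolean[] fresh = new boolean[n];
            for (int j = 0; j < n; j++) {
                double toReplica = sample(readRtts) / 2.0;
                responseAt[j] = toReplica + sample(readRtts) / 2.0;
                fresh[j] = writeArrives[j] <= readStart + toReplica; // replica already had the write
            }
            double cutoff = kthSmallest(responseAt, r);
            for (int j = 0; j < n; j++) {
                if (responseAt[j] <= cutoff && fresh[j]) {
                    consistent++;
                    break;
                }
            }
        }
        return (double) consistent / trials;
    }

    private double sample(double[] samples) {
        return samples[rng.nextInt(samples.length)];
    }

    private static double kthSmallest(double[] values, int k) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[k - 1];
    }
}
{code}

The inaccuracy mentioned above comes precisely from the RTT/2 split: the existing metrics only record round trips, so the one-way and processing components have to be guessed.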




[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor

2013-05-14 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657705#comment-13657705
 ] 

Peter Bailis commented on CASSANDRA-5455:
-

I've thought some more about different options for enabling metrics that are 
useful to both PBS (in an external module, if committers prefer) and anyone 
else who would be interested in finer-grained tracing.

To start, I *do* think that there is interest in a PBS module: if an eventually 
consistent store is returning stale data, how stale *is* it? Especially given 
that many (most?) Cassandra client libraries (including the Datastax 
java-driver) choose CL=ONE by default, I'd expect most users would prefer to 
understand how their choice of N, R, and W affects their latency and consistency.

I've been contacted by several Cassandra users who are interested in and/or 
using this functionality and understand that several developers are interested 
in PBS for Riak (notably, Andy Gross highlighted PBS in his 2013 RICON East 
keynote as a useful feature Basho would like). We originally chose Cassandra 
based on our familiarity with the code base and on early discussions with 
Jonathan but we plan to integrate PBS functionality into Riak with the help of 
their committers in the near-term future. So I do think there is interest, and, 
if you're curious about *use cases* for this functionality, Shivaram and I will 
be demoing PBS in Cassandra 1.2 at the upcoming SIGMOD 2013 conference. Our 
demo proposal sketches three application vignettes, including the obvious 
integration with monitoring tools but also automatically tuning N, R, and W and providing consistency and latency SLAs:
http://www.bailis.org/papers/pbs-demo-sigmod2013.pdf

So, on the more technical side, there are two statistics that aren't currently 
measured (in trunk) that are required for accurate PBS predictions. First, PBS 
requires per-server statistics. Currently, the ColumnFamily RTT read/write 
latency metrics are aggregated across all servers. Second, PBS requires a measure of how long a read/write request takes before it is processed (i.e., how long it took from a client sending each read/write request to when it was
performed). This requires knowledge of one-way request latencies as well as 
read/write request-specific logic.

The 1.2 PBS patch provided both of these, aggregating by server and measuring 
the delay until processing. As Jonathan notes above, the latter measurement was 
conservative; the remote replica recorded the time that it enqueued its response rather than the exact moment a read or write was performed, mainly for simplicity of code. The coordinating server could then closely approximate the
return time as RTT-(remote timestamp).

Given these requirements and the current state of trunk, there are a few ways 
forward to support an external PBS prediction module:

1a.) Modify Cassandra to store latency statistics on a per-server and 
per-ColumnFamily granularity. As Rick Branson has pointed out, this is actually 
useful for monitoring other than PBS and can be used to detect slower replicas.

1b.) Modify Cassandra to store local processing times for requests (i.e., 
expand StorageMetrics, which currently does not track the time required to, 
say, fulfill a local read stage). This also has the benefit of understanding 
whether a Cassandra node is slow due to network or disk.

2.) Use the newly developed tracing functionality to reconstruct latencies for selected requests. Performing any sort of profiling will require tracing to be enabled (this appears to be somewhat heavyweight given the amount of data that is logged for each request), and reconstructing latencies from the trace table may be expensive (i.e., amount to a many-way self-join).

3.) Use RTT/2 based on ColumnFamily LatencyMetrics as an inaccurate but already 
supported external predictor.

4.) Leave the PBS latency sampling as in 1.2 but remove the PBS predictor code. 
Expose the latency samples via an MBean for users like Rick who would benefit from them.

Proposal #1 has benefits for many users and seems a natural extension to the 
existing metrics but requires changes to the existing code. Proposal #2 puts a substantial burden on the end user and, without a fixed schema for the trace
table, may amount to a fair bit of code munging. Proposal #3 is inaccurate but 
works on trunk. Proposal #4 is essentially 1.2.0 without the requirement to 
maintain any PBS-specific code and is a reasonable stop-gap before proposal #1. 
All of these proposals are amenable to sampling.

I'd welcome your feedback on these proposals and next steps.
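
If it helps to see the shape of proposal #1a, here is a rough sketch (the class below is a hypothetical stand-in for wiring real per-replica EstimatedHistograms into the existing metrics; it is not existing Cassandra code):

{code:title=Per-replica, per-CF latency tracking sketch (illustrative only)|borderStyle=solid}
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of proposal #1a: keep a latency histogram per (ColumnFamily, replica)
// rather than one aggregate per ColumnFamily, so an external PBS module (or a
// monitoring tool hunting for slow replicas) can inspect individual replicas.
class PerReplicaLatencySketch {
    // Tiny ring-buffer stand-in for Cassandra's EstimatedHistogram.
    static class LatencySamples {
        private final long[] samples = new long[1024];
        private int cursor = 0;
        synchronized void add(long micros) {
            samples[cursor] = micros;
            cursor = (cursor + 1) % samples.length;
        }
        synchronized long[] snapshot() {
            return samples.clone();
        }
    }

    private final Map<String, Map<InetAddress, LatencySamples>> readLatencies =
            new ConcurrentHashMap<>();

    void recordRead(String columnFamily, InetAddress replica, long latencyMicros) {
        readLatencies
                .computeIfAbsent(columnFamily, cf -> new ConcurrentHashMap<>())
                .computeIfAbsent(replica, rep -> new LatencySamples())
                .add(latencyMicros);
    }

    long[] readSamples(String columnFamily, InetAddress replica) {
        Map<InetAddress, LatencySamples> perReplica = readLatencies.get(columnFamily);
        LatencySamples samples = perReplica == null ? null : perReplica.get(replica);
        return samples == null ? new long[0] : samples.snapshot();
    }
}
{code}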


[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor

2013-05-10 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13654669#comment-13654669
 ] 

Peter Bailis commented on CASSANDRA-5455:
-

I am one of the original authors of CASSANDRA-4261 and was previously unaware 
of this change. I'm happy to make any changes to the tests, perform necessary 
code refactoring, or write additional documentation (but was unable to do so 
given the window between ticket creation and commit). That is, I will maintain 
this functionality given the opportunity to do so.

Could you please elaborate on what you'd like to see fixed? I suspect it'll be fairly straightforward, and, if anyone knows how to make the changes, I (and Shivaram) probably do.

If the answer is that we don't want this functionality, then that's a different case. But that's not what I'm getting from this ticket or CASSANDRA-4261, nor what I'm hearing from users.



[jira] [Commented] (CASSANDRA-5455) Remove PBSPredictor

2013-05-10 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13654761#comment-13654761
 ] 

Peter Bailis commented on CASSANDRA-5455:
-

I don't believe that the StorageProxy tracks latencies at the same granularity. For example, the PBS latency tracking records both how long it takes for a request to reach a remote replica and be processed, as well as how long the return trip takes.

That said, it shouldn't be too difficult to either 1.) simply expose the recorded latencies via an optional module providing a finer-granularity tracing interface over JMX [thereby removing all actual prediction code but keeping the logging in place for folks who might want it] or 2.) modify StorageProxy to log these latencies in addition to the coarser-granularity measurements it already takes.

I can provide assistance with either.
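
A rough sketch of option 1.) as an external module (the MBean name and interface below are hypothetical, not part of Cassandra):

{code:title=Exposing latency samples over JMX (illustrative only)|borderStyle=solid}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class LatencySampleExposer {

    public interface LatencySamplesMBean {
        long[] getWriteLatencySamplesMicros();
        long[] getReadLatencySamplesMicros();
    }

    // Standard MBean: the implementation class name matches the interface name minus "MBean".
    public static class LatencySamples implements LatencySamplesMBean {
        private volatile long[] writeSamples = new long[0];
        private volatile long[] readSamples = new long[0];

        public void update(long[] writes, long[] reads) {
            writeSamples = writes.clone();
            readSamples = reads.clone();
        }
        public long[] getWriteLatencySamplesMicros() { return writeSamples.clone(); }
        public long[] getReadLatencySamplesMicros() { return readSamples.clone(); }
    }

    // An external predictor (PBS or anything else) can then pull the samples
    // over JMX without any prediction code living inside Cassandra itself.
    public static void register(LatencySamples samples) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(samples, new ObjectName("org.example.pbs:type=LatencySamples"));
    }
}
{code}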



[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-09-17 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457525#comment-13457525
 ] 

Peter Bailis commented on CASSANDRA-4261:
-

Jonathan,

Thanks for the rebase! Looking at the updated code, we can still log the start 
of the operation in MessagingService.sendRR() but move the reply timestamp 
logging from the ResponseVerbHandler to MessagingService.receive(). This won't 
be too bad, and we can filter the MessageIn instances passed to PBSPredictor by verb type and/or by id. Does that make sense?

Also, re: CASSANDRA-4009, it should be possible to use this code, but there are 
two issues:
1.) We need finer-granularity tracing than what is currently implemented. We 
need to know how long it takes to hit a given node and not just the end-to-end 
round-trip latencies.
2.) Using a histogram instead of keeping around the actual latencies will 
reduce the fidelity of the predictions. The impact of this depends on the 
bucket size and distribution.

Let us know what you think!
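
To illustrate the bookkeeping being proposed (the hook points are MessagingService.sendRR() and MessagingService.receive() as discussed above; the class below is an illustrative stand-in, not Cassandra's actual code):

{code:title=Send/receive timestamp tracking sketch (illustrative only)|borderStyle=solid}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Record the send time when a request goes out (the hook in sendRR()) and
// compute the elapsed time when the matching reply is consumed (the hook
// moved to receive()).
class RequestLatencyTracker {
    private final Map<String, Long> sendTimesNanos = new ConcurrentHashMap<>();

    // Called where sendRR() assigns the callback/message id.
    void onSend(String messageId) {
        sendTimesNanos.put(messageId, System.nanoTime());
    }

    // Called when the reply for messageId is received; returns microseconds,
    // or -1 if we never saw (or already consumed) the corresponding send.
    long onReply(String messageId) {
        Long sent = sendTimesNanos.remove(messageId);
        return sent == null ? -1 : (System.nanoTime() - sent) / 1_000;
    }
}
{code}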


[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-08-18 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13437434#comment-13437434
 ] 

Peter Bailis commented on CASSANDRA-4261:
-

Is there anything else you'd like to have us do for the patch?


[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-07-12 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
{{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
{{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users 
can accurately predict Cassandra's behavior in their production environments 
without interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? This patch provides answers via 
{{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
\\ \\
{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
Performing consistency prediction
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.678900
Average read latency: 5.377900ms (99.900th %ile 40ms)
Average write latency: 36.971298ms (99.900th %ile 294ms)

N=3, R=1, W=2
Probability of consistent reads: 0.791600
Average read latency: 5.372500ms (99.900th %ile 39ms)
Average write latency: 303.630890ms (99.900th %ile 357ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 5.426600ms (99.900th %ile 42ms)
Average write latency: 1382.650879ms (99.900th %ile 629ms)

N=3, R=2, W=1
Probability of consistent reads: 0.915800
Average read latency: 11.091000ms (99.900th %ile 348ms)
Average write latency: 42.663101ms (99.900th %ile 284ms)

N=3, R=2, W=2
Probability of consistent reads: 1.00
Average read latency: 10.606800ms (99.900th %ile 263ms)
Average write latency: 310.117615ms (99.900th %ile 335ms)

N=3, R=3, W=1
Probability of consistent reads: 1.00
Average read latency: 52.657501ms (99.900th %ile 565ms)
Average write latency: 39.949799ms (99.900th %ile 237ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd <cassandra-source-dir>  # with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than profiling and 
manually testing each possible replication configuration (especially in 
production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). 
Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime 
environment and, short of profiling in production, there's no existing way to 
answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, 
meaning users don't know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by calling 
{{enableConsistencyPredictionLogging()}} in the {{PBSPredictorMBean}}.

Cassandra logs a variable number of latencies, controllable via JMX 
({{setMaxLoggedLatenciesForConsistencyPrediction(int maxLogged)}}, default: 
1). Each latency is 8 bytes, and there are 4 distributions we require, so 
the space overhead is {{32*logged_latencies}} bytes of memory for the 
predicting node.
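
As an illustration of driving these operations, here is a minimal JMX client 
sketch. It is not part of the patch: the {{ObjectName}} is a guess and the cap 
of 10,000 is only an example value, so check the MBean the patch actually 
registers before relying on it.

{code:title=Illustrative JMX sketch (hypothetical ObjectName)|borderStyle=solid}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class EnablePredictionLogging {
    public static void main(String[] args) throws Exception {
        // 7100 is the JMX port that node 1 uses in the ccm demo above.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7100/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Hypothetical name; substitute whatever the patch registers.
            ObjectName predictor = new ObjectName(
                    "org.apache.cassandra.service:type=PBSPredictor");
            // Turn on latency tracing for consistency prediction.
            mbs.invoke(predictor, "enableConsistencyPredictionLogging",
                       new Object[0], new String[0]);
            // Example cap on the number of logged latencies.
            mbs.invoke(predictor, "setMaxLoggedLatenciesForConsistencyPrediction",
                       new Object[]{10000}, new String[]{"int"});
        }
    }
}
{code}

With the example cap of 10,000 logged latencies, the formula above works out 
to roughly 32 * 10,000 = 320 KB of memory on the predicting node.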

{{nodetool predictconsistency}} predicts the latency 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-07-12 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
{{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
{{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users 
can accurately predict Cassandra's behavior in their production environments 
without interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? This patch provides answers via 
{{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
\\ \\
{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
Performing consistency prediction
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.678900
Average read latency: 5.377900ms (99.900th %ile 40ms)
Average write latency: 36.971298ms (99.900th %ile 294ms)

N=3, R=1, W=2
Probability of consistent reads: 0.791600
Average read latency: 5.372500ms (99.900th %ile 39ms)
Average write latency: 303.630890ms (99.900th %ile 357ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 5.426600ms (99.900th %ile 42ms)
Average write latency: 1382.650879ms (99.900th %ile 629ms)

N=3, R=2, W=1
Probability of consistent reads: 0.915800
Average read latency: 11.091000ms (99.900th %ile 348ms)
Average write latency: 42.663101ms (99.900th %ile 284ms)

N=3, R=2, W=2
Probability of consistent reads: 1.00
Average read latency: 10.606800ms (99.900th %ile 263ms)
Average write latency: 310.117615ms (99.900th %ile 335ms)

N=3, R=3, W=1
Probability of consistent reads: 1.00
Average read latency: 52.657501ms (99.900th %ile 565ms)
Average write latency: 39.949799ms (99.900th %ile 237ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd <cassandra-source-dir>  # with patch applied
ant

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than profiling and 
manually testing each possible replication configuration (especially in 
production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). 
Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime 
environment and, short of profiling in production, there's no existing way to 
answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, 
meaning users don't know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by calling 
{{enableConsistencyPredictionLogging()}} in the {{PBSPredictorMBean}}.

Cassandra logs a variable number of latencies, controllable via JMX 
({{setMaxLoggedLatenciesForConsistencyPrediction(int maxLogged)}}, default: 
1). Each latency is 8 bytes, and there are 4 distributions we require, so 
the space overhead is {{32*logged_latencies}} bytes of memory for the 
predicting node.

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and writes) by running 
{{setNumberTrialsForConsistencyPrediction(int numTrials)}} Monte Carlo trials.
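
To make the trial structure concrete, here is a self-contained sketch of one 
way such a Monte Carlo trial can be organized, loosely following the 
write/ack/read/response ("WARS") model from the paper linked above. Everything 
in it is illustrative: the class name and the exponential stand-in 
distribution are made up, and the patch itself samples from the logged 
latencies rather than a synthetic distribution.

{code:title=Illustrative Monte Carlo trial sketch|borderStyle=solid}
import java.util.Arrays;
import java.util.Random;

public class PbsTrialSketch {
    public static void main(String[] args) {
        int n = 3, r = 1, w = 1, trials = 10000;
        double tAfterWriteMs = 100.0;
        Random rng = new Random(42);
        int consistent = 0;
        for (int trial = 0; trial < trials; trial++) {
            // Per-replica latencies; the real predictor replays logged latencies.
            double[] writeArrive = sample(n, rng); // W: write reaches replica
            double[] ackReturn = sample(n, rng);   // A: ack returns to coordinator
            double[] readArrive = sample(n, rng);  // R: read request reaches replica
            double[] respReturn = sample(n, rng);  // S: response returns to coordinator

            // The write "commits" once the w-th fastest ack has come back.
            double[] ackTimes = new double[n];
            for (int i = 0; i < n; i++) ackTimes[i] = writeArrive[i] + ackReturn[i];
            double commitTime = kthSmallest(ackTimes, w);

            // The read starts t ms after commit and returns with the r-th fastest response.
            double readStart = commitTime + tAfterWriteMs;
            double[] respTimes = new double[n];
            for (int i = 0; i < n; i++) respTimes[i] = readArrive[i] + respReturn[i];
            double readReturn = kthSmallest(respTimes, r);

            // Consistent if some replica among the first r responders already
            // had the write when the read request reached it.
            boolean ok = false;
            for (int i = 0; i < n; i++) {
                boolean answered = respTimes[i] <= readReturn;
                boolean hadWrite = writeArrive[i] <= readStart + readArrive[i];
                if (answered && hadWrite) { ok = true; break; }
            }
            if (ok) consistent++;
        }
        System.out.printf("P(consistent reads) ~ %.4f%n", (double) consistent / trials);
    }

    // Stand-in distribution (exponential, mean 5ms); purely illustrative.
    static double[] sample(int n, Random rng) {
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = -Math.log(1.0 - rng.nextDouble()) * 5.0;
        return v;
    }

    static double kthSmallest(double[] values, int k) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy[k - 1];
    }
}
{code}

Run with N=3, R=1, W=1 and t=100ms, this prints an estimate analogous to the 
"Probability of consistent reads" line in the example output above; the 
absolute number depends entirely on the stand-in distribution.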

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-07-10 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: (was: pbs-nodetool-v2.patch)

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs-v3.sh, pbs-nodetool-v3.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 Performing consistency prediction
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.678900
 Average read latency: 5.377900ms (99.900th %ile 40ms)
 Average write latency: 36.971298ms (99.900th %ile 294ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.791600
 Average read latency: 5.372500ms (99.900th %ile 39ms)
 Average write latency: 303.630890ms (99.900th %ile 357ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 5.426600ms (99.900th %ile 42ms)
 Average write latency: 1382.650879ms (99.900th %ile 629ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.915800
 Average read latency: 11.091000ms (99.900th %ile 348ms)
 Average write latency: 42.663101ms (99.900th %ile 284ms)
 N=3, R=2, W=2
 Probability of consistent reads: 1.00
 Average read latency: 10.606800ms (99.900th %ile 263ms)
 Average write latency: 310.117615ms (99.900th %ile 335ms)
 N=3, R=3, W=1
 Probability of consistent reads: 1.00
 Average read latency: 52.657501ms (99.900th %ile 565ms)
 Average write latency: 39.949799ms (99.900th %ile 237ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline limitations of the current approach after describing how it's 
 done. We believe that this is a useful feature that can provide guidance and 
 fairly accurate estimation for most users.
 h3. Interface
 This patch allows users to 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-07-10 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: (was: demo-pbs-v2.sh)

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs-v3.sh, pbs-nodetool-v3.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 Performing consistency prediction
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.678900
 Average read latency: 5.377900ms (99.900th %ile 40ms)
 Average write latency: 36.971298ms (99.900th %ile 294ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.791600
 Average read latency: 5.372500ms (99.900th %ile 39ms)
 Average write latency: 303.630890ms (99.900th %ile 357ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 5.426600ms (99.900th %ile 42ms)
 Average write latency: 1382.650879ms (99.900th %ile 629ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.915800
 Average read latency: 11.091000ms (99.900th %ile 348ms)
 Average write latency: 42.663101ms (99.900th %ile 284ms)
 N=3, R=2, W=2
 Probability of consistent reads: 1.00
 Average read latency: 10.606800ms (99.900th %ile 263ms)
 Average write latency: 310.117615ms (99.900th %ile 335ms)
 N=3, R=3, W=1
 Probability of consistent reads: 1.00
 Average read latency: 52.657501ms (99.900th %ile 565ms)
 Average write latency: 39.949799ms (99.900th %ile 237ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline limitations of the current approach after describing how it's 
 done. We believe that this is a useful feature that can provide guidance and 
 fairly accurate estimation for most users.
 h3. Interface
 This patch allows users to perform 

[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-07 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291541#comment-13291541
 ] 

Peter Bailis commented on CASSANDRA-4261:
-

I agree that JMX would work better. I'll work on changing this configuration and 
will post performance numbers shortly. I should be able to have this done in a 
week or so (latency due to my schedule, not due to task difficulty).

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs-v2.sh, pbs-nodetool-v2.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 Performing consistency prediction
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.678900
 Average read latency: 5.377900ms (99.900th %ile 40ms)
 Average write latency: 36.971298ms (99.900th %ile 294ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.791600
 Average read latency: 5.372500ms (99.900th %ile 39ms)
 Average write latency: 303.630890ms (99.900th %ile 357ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 5.426600ms (99.900th %ile 42ms)
 Average write latency: 1382.650879ms (99.900th %ile 629ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.915800
 Average read latency: 11.091000ms (99.900th %ile 348ms)
 Average write latency: 42.663101ms (99.900th %ile 284ms)
 N=3, R=2, W=2
 Probability of consistent reads: 1.00
 Average read latency: 10.606800ms (99.900th %ile 263ms)
 Average write latency: 310.117615ms (99.900th %ile 335ms)
 N=3, R=3, W=1
 Probability of consistent reads: 1.00
 Average read latency: 52.657501ms (99.900th %ile 565ms)
 Average write latency: 39.949799ms (99.900th %ile 237ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 

[jira] [Comment Edited] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-07 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291541#comment-13291541
 ] 

Peter Bailis edited comment on CASSANDRA-4261 at 6/8/12 4:07 AM:
-

I agree that JMX would work better. I'll work on changing this configuration 
and will post performance numbers shortly. I should be able to have this done 
in a week or so (latency due to my schedule, not due to task difficulty).

  was (Author: pbailis):
I agree that JMX would better. I'll work on changing this configuration and 
will post performance numbers shortly. I should be able to have this done in a 
week or so (latency due to my schedule, not due to task difficulty).
  
 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs-v2.sh, pbs-nodetool-v2.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 Performing consistency prediction
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.678900
 Average read latency: 5.377900ms (99.900th %ile 40ms)
 Average write latency: 36.971298ms (99.900th %ile 294ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.791600
 Average read latency: 5.372500ms (99.900th %ile 39ms)
 Average write latency: 303.630890ms (99.900th %ile 357ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 5.426600ms (99.900th %ile 42ms)
 Average write latency: 1382.650879ms (99.900th %ile 629ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.915800
 Average read latency: 11.091000ms (99.900th %ile 348ms)
 Average write latency: 42.663101ms (99.900th %ile 284ms)
 N=3, R=2, W=2
 Probability of consistent reads: 1.00
 Average read latency: 10.606800ms (99.900th %ile 263ms)
 Average write latency: 310.117615ms (99.900th %ile 335ms)
 N=3, R=3, W=1
 Probability of consistent reads: 1.00
 Average read latency: 52.657501ms (99.900th %ile 565ms)
 Average write latency: 39.949799ms (99.900th %ile 237ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation 

[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-06 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290801#comment-13290801
 ] 

Peter Bailis commented on CASSANDRA-4261:
-

re: performance, we haven't noticed anything, but we also haven't done much 
serious load testing. I agree that there shouldn't be much overhead, and the 
only thing I can think of possibly being a problem would be contention in the 
ConcurrentHashMap that maps requestIDs to lists of latencies. However, this 
*really* shouldn't be a problem. To quantify this, I can run and report numbers 
for something like stress on an EC2 cluster. Would that work? Are there 
existing performance regression tests? If you have a preference for a different 
workload or configuration, let me know.

re: the other config file settings, 
max_logged_latencies_for_consistency_prediction is possibly useful. Because we 
use an LRU policy for the latency logging, the number of latencies logged 
indirectly determines the window of time for sampling. If you want to capture a 
longer trace of network behavior, you'd increase the window, and if you wanted 
to do some on-the-fly tuning, you might shorten it. However, we could easily 
set this as a runtime configuration via nodetool instead.

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs-v2.sh, pbs-nodetool-v2.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 Performing consistency prediction
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.678900
 Average read latency: 5.377900ms (99.900th %ile 40ms)
 Average write latency: 36.971298ms (99.900th %ile 294ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.791600
 Average read latency: 5.372500ms (99.900th %ile 39ms)
 Average write latency: 303.630890ms (99.900th %ile 357ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 5.426600ms (99.900th %ile 42ms)
 Average write latency: 1382.650879ms (99.900th %ile 629ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.915800
 Average read latency: 11.091000ms (99.900th %ile 348ms)
 Average write latency: 42.663101ms (99.900th %ile 284ms)
 N=3, R=2, W=2
 Probability of consistent reads: 1.00
 Average read latency: 10.606800ms (99.900th %ile 263ms)
 Average write latency: 310.117615ms (99.900th %ile 335ms)
 N=3, R=3, W=1
 Probability of consistent reads: 1.00
 Average read latency: 52.657501ms (99.900th %ile 565ms)
 Average write latency: 39.949799ms (99.900th %ile 237ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-05 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: pbs-nodetool-v2.patch

Updated patch. Fixed a bug where two reads with the same latency were not 
treated as separate samples. The fix is a two-line change that effectively 
excludes reads we've already considered within a given trial. Also added a 
check in the test for this case.
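
For intuition only, here is a sketch of what "excluding reads we've already 
considered" can look like when drawing logged latencies for a trial; the class 
and method names are made up and this is not the patch's code:

{code:title=Illustrative sampling sketch|borderStyle=solid}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class LoggedLatencySamplerSketch {
    private final List<Double> loggedLatencies = new ArrayList<>();
    private final Random rng = new Random();

    void record(double latencyMs) {
        loggedLatencies.add(latencyMs);
    }

    // Draw `count` distinct logged entries for one trial. Tracking indices
    // (not values) means two reads that happened to have identical latencies
    // are still treated as two separate samples.
    double[] sampleForTrial(int count) {
        if (count > loggedLatencies.size()) {
            throw new IllegalArgumentException("not enough logged latencies");
        }
        Set<Integer> usedIndices = new HashSet<>();
        double[] sampled = new double[count];
        for (int i = 0; i < count; i++) {
            int idx;
            do {
                idx = rng.nextInt(loggedLatencies.size());
            } while (!usedIndices.add(idx)); // skip entries this trial already used
            sampled[i] = loggedLatencies.get(idx);
        }
        return sampled;
    }
}
{code}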

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs.sh, pbs-nodetool-v2.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-05 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: (was: pbs-nodetool-v1.patch)

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs.sh, pbs-nodetool-v2.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline limitations of the current approach after describing how it's 
 done. We believe that this is a useful feature that can provide guidance and 
 fairly accurate estimation for most users.
 h3. Interface
 This patch allows users to perform this prediction in production 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-05 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: demo-pbs-v2.sh

Updated hyperlink in demo script.

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs-v2.sh, demo-pbs.sh, pbs-nodetool-v2.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline limitations of the current approach after describing how it's 
 done. We believe that this is a useful feature that can provide guidance and 
 fairly accurate estimation for most users.
 h3. Interface
 This patch allows 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-05 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
{{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
{{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users 
can accurately predict Cassandra's behavior in their production environments 
without interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? This patch provides answers via 
{{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
\\ \\
{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
Performing consistency prediction
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.678900
Average read latency: 5.377900ms (99.900th %ile 40ms)
Average write latency: 36.971298ms (99.900th %ile 294ms)

N=3, R=1, W=2
Probability of consistent reads: 0.791600
Average read latency: 5.372500ms (99.900th %ile 39ms)
Average write latency: 303.630890ms (99.900th %ile 357ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 5.426600ms (99.900th %ile 42ms)
Average write latency: 1382.650879ms (99.900th %ile 629ms)

N=3, R=2, W=1
Probability of consistent reads: 0.915800
Average read latency: 11.091000ms (99.900th %ile 348ms)
Average write latency: 42.663101ms (99.900th %ile 284ms)

N=3, R=2, W=2
Probability of consistent reads: 1.00
Average read latency: 10.606800ms (99.900th %ile 263ms)
Average write latency: 310.117615ms (99.900th %ile 335ms)

N=3, R=3, W=1
Probability of consistent reads: 1.00
Average read latency: 52.657501ms (99.900th %ile 565ms)
Average write latency: 39.949799ms (99.900th %ile 237ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd <cassandra-source-dir>  # with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than profiling and 
manually testing each possible replication configuration (especially in 
production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). 
Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime 
environment and, short of profiling in production, there's no existing way to 
answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, 
meaning users don't know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}.

Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. 
Each latency is 8 bytes, and there are 4 distributions we require, so the space 
overhead is {{32*logged_latencies}} bytes of memory for the predicting node.
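
For example, the relevant {{cassandra.yaml}} fragment with the patch applied 
looks roughly like the following; the cap value is only an example, not the 
patch's default:

{code:title=cassandra.yaml fragment (example values)|borderStyle=solid}
# log read/write latencies so nodetool predictconsistency has data to sample
log_latencies_for_consistency_prediction: true
# example cap; memory overhead is roughly 32 bytes per logged latency
max_logged_latencies_for_consistency_prediction: 10000
{code}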

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-06-05 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: (was: demo-pbs.sh)

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs-v2.sh, pbs-nodetool-v2.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 Performing consistency prediction
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.678900
 Average read latency: 5.377900ms (99.900th %ile 40ms)
 Average write latency: 36.971298ms (99.900th %ile 294ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.791600
 Average read latency: 5.372500ms (99.900th %ile 39ms)
 Average write latency: 303.630890ms (99.900th %ile 357ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 5.426600ms (99.900th %ile 42ms)
 Average write latency: 1382.650879ms (99.900th %ile 629ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.915800
 Average read latency: 11.091000ms (99.900th %ile 348ms)
 Average write latency: 42.663101ms (99.900th %ile 284ms)
 N=3, R=2, W=2
 Probability of consistent reads: 1.00
 Average read latency: 10.606800ms (99.900th %ile 263ms)
 Average write latency: 310.117615ms (99.900th %ile 335ms)
 N=3, R=3, W=1
 Probability of consistent reads: 1.00
 Average read latency: 52.657501ms (99.900th %ile 565ms)
 Average write latency: 39.949799ms (99.900th %ile 237ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd <cassandra-source-dir>  # with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline limitations of the current approach after describing how it's 
 done. We believe that this is a useful feature that can provide guidance and 
 fairly accurate estimation for most users.
 h3. Interface
 This patch allows users to perform this prediction in production using {{nodetool}}.

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-05-22 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: demo-pbs.sh

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs.sh, pbs-nodetool-v1.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd cassandra-source-dir with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline limitations of the current approach after describing how it's 
 done. We believe that this is a useful feature that can provide guidance and 
 fairly accurate estimation for most users.
 h3. Interface
 This patch allows users to perform this prediction in production using 
 {{nodetool}}.
 

[jira] [Commented] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-05-22 Thread Peter Bailis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281391#comment-13281391
 ] 

Peter Bailis commented on CASSANDRA-4261:
-

I've provided a bash script that performs a full end-to-end demonstration of 
this patch, in case you don't want to pull a clean source tree, patch it, and 
then copy and paste the commands above. The script clones Cassandra trunk, 
applies the patch, then spins up and profiles a local 5-node cluster using ccm 
as above. The script isn't robust, but it should be easy enough to debug. Enjoy!

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: demo-pbs.sh, pbs-nodetool-v1.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd cassandra-source-dir with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions.

[jira] [Created] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)
Peter Bailis created CASSANDRA-4261:
---

 Summary: [Patch] Support consistency-latency prediction in nodetool
 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis




h1. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but 
QUORUM is often slower than ONE, TWO, or THREE. What should users choose?

This patch provides a latency-consistency analysis within nodetool. Users can 
accurately predict Cassandra behavior in their production environments without 
interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? nodetool predictconsistency 
provides this:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h1. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h1. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than trying out different 
configurations (especially in production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should 
they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, 
short of profiling in production, there's no existing way to answer these 
questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't 
know how stale their data will be.)

This patch allows users to perform this prediction in production using 
{{nodetool}}. Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. 
Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies 
(each latency is 8 bytes, and there are 4 distributions we require, so the 
space overhead is {{32*logged_latencies}} bytes of memory for the predicting 
node) and then predicts the latency and consistency for each possible 
ConsistencyLevel setting (reads and writes) by running 
{{number_trials_for_consistency_prediction}} Monte Carlo trials per 
configuration. Users shouldn't have to touch these parameters, and the defaults 
work well. 
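
To make the Monte Carlo step concrete, here is a minimal, self-contained sketch of one trial in the style of the PBS W/A/R/S message-delay model from our paper. This is not the patch's code: the class and method names are invented for illustration, and each delay is drawn from a placeholder exponential distribution, whereas the patch resamples the latencies Cassandra actually logged.

{code:title=Illustrative Monte Carlo trial (not the patch's code)|borderStyle=solid}
import java.util.Arrays;
import java.util.Random;

public class PbsTrialSketch {
    static final Random RNG = new Random(42);

    // Placeholder latency model (ms); the real predictor resamples logged data.
    static double sampleLatency(double meanMs) {
        return -meanMs * Math.log(1.0 - RNG.nextDouble());
    }

    // One trial: does a read issued tAfterWriteMs after the write completes
    // (k = 1) observe the written value, given N/R/W?
    static boolean consistentTrial(int n, int r, int w, double tAfterWriteMs) {
        double[] writeArrive = new double[n]; // W: coordinator -> replica
        double[] writeAck    = new double[n]; // A: replica -> coordinator
        double[] readArrive  = new double[n]; // R: coordinator -> replica
        double[] readReply   = new double[n]; // S: replica -> coordinator
        for (int i = 0; i < n; i++) {
            writeArrive[i] = sampleLatency(5);
            writeAck[i]    = sampleLatency(5);
            readArrive[i]  = sampleLatency(5);
            readReply[i]   = sampleLatency(5);
        }

        // The write "completes" once the w-th fastest (W + A) ack returns.
        double[] writeRtt = new double[n];
        for (int i = 0; i < n; i++) writeRtt[i] = writeArrive[i] + writeAck[i];
        double[] sortedRtt = writeRtt.clone();
        Arrays.sort(sortedRtt);
        double readStart = sortedRtt[w - 1] + tAfterWriteMs;

        // The read coordinator waits for the r fastest (R + S) responses.
        Integer[] byReadRtt = new Integer[n];
        for (int i = 0; i < n; i++) byReadRtt[i] = i;
        Arrays.sort(byReadRtt, (a, b) -> Double.compare(
                readArrive[a] + readReply[a], readArrive[b] + readReply[b]));

        // Consistent if any responding replica already held the write when
        // the read request reached it.
        for (int j = 0; j < r; j++) {
            int i = byReadRtt[j];
            if (writeArrive[i] <= readStart + readArrive[i]) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int trials = 10000, consistent = 0;
        for (int t = 0; t < trials; t++)
            if (consistentTrial(3, 1, 1, 100)) consistent++;
        System.out.printf("Probability of consistent reads: %f%n",
                          (double) consistent / trials);
    }
}
{code}

Swapping the placeholder sampler for draws from the four logged distributions, running {{number_trials_for_consistency_prediction}} trials for every (R, W) pair, and tracking the last k versions rather than only the latest would produce numbers of the same shape as the {{predictconsistency}} output above.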

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Attachment: pbs-nodetool-v1.patch

Last commit to Cassandra fork for this patch is at 
https://github.com/pbailis/cassandra-pbs/commit/6e0ac68b43a7e6692423abf760edf88d633dd04d

 [Patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: pbs-nodetool-v1.patch


 h1. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, 
 but QUORUM is often slower than ONE, TWO, or THREE. What should users choose?
 This patch provides a latency-consistency analysis within nodetool. Users can 
 accurately predict Cassandra behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? nodetool predictconsistency 
 provides this:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h1. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd cassandra-source-dir with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h1. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than trying out 
 different configurations (especially in production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). 
 Should they use CL=ONE? CL=TWO? It likely depends on their runtime 
 environment and, short of profiling in production, there's no existing way to 
 answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning 
 users don't know how stale their data will be.)
 This patch allows users to perform this prediction in production using 
 {{nodetool}}. Users enable tracing of latency data by setting 
 {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. 
 Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Comment: was deleted

(was: Last commit to Cassandra fork for this patch is at 
https://github.com/pbailis/cassandra-pbs/commit/6e0ac68b43a7e6692423abf760edf88d633dd04d)

 [Patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: pbs-nodetool-v1.patch


 h2. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, 
 but QUORUM is often slower than ONE, TWO, or THREE. What should users choose?
 This patch provides a latency-consistency analysis within nodetool. Users can 
 accurately predict Cassandra behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? nodetool predictconsistency 
 provides this:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h2. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd cassandra-source-dir with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h2. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than trying out 
 different configurations (especially in production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). 
 Should they use CL=ONE? CL=TWO? It likely depends on their runtime 
 environment and, short of profiling in production, there's no existing way to 
 answer these questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning 
 users don't know how stale their data will be.)
 This patch allows users to perform this prediction in production using 
 {{nodetool}}. Users enable tracing of latency data by setting 
 {{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. 
 Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 

h1. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but 
QUORUM is often slower than ONE, TWO, or THREE. What should users choose?

This patch provides a latency-consistency analysis within nodetool. Users can 
accurately predict Cassandra behavior in their production environments without 
interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? nodetool predictconsistency 
provides this:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h1. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h1. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than trying out different 
configurations (especially in production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should 
they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, 
short of profiling in production, there's no existing way to answer these 
questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't 
know how stale their data will be.)

This patch allows users to perform this prediction in production using 
{{nodetool}}. Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. 
Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies 
(each latency is 8 bytes, and there are 4 distributions we require, so the 
space overhead is {{32*logged_latencies}} bytes of memory for the predicting 
node) and then predicts the latency and consistency for each possible 
ConsistencyLevel setting (reads and writes) by running 
{{number_trials_for_consistency_prediction}} Monte Carlo trials per 
configuration. Users shouldn't have to touch these parameters, and the defaults 
work well. The more latencies they log, the better the predictions will be.

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h2. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but 
QUORUM is often slower than ONE, TWO, or THREE. What should users choose?

This patch provides a latency-consistency analysis within nodetool. Users can 
accurately predict Cassandra behavior in their production environments without 
interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? nodetool predictconsistency 
provides this:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h2. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h2. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than trying out different 
configurations (especially in production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should 
they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, 
short of profiling in production, there's no existing way to answer these 
questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't 
know how stale their data will be.)

This patch allows users to perform this prediction in production using 
{{nodetool}}. Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. 
Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies 
(each latency is 8 bytes, and there are 4 distributions we require, so the 
space overhead is {{32*logged_latencies}} bytes of memory for the predicting 
node) and then predicts the latency and consistency for each possible 
ConsistencyLevel setting (reads and writes) by running 
{{number_trials_for_consistency_prediction}} Monte Carlo trials per 
configuration. Users shouldn't have to touch these parameters, and the defaults 
work well. The more latencies they log, the better the predictions will be.

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but 
QUORUM is often slower than ONE, TWO, or THREE. What should users choose?

This patch provides a latency-consistency analysis within nodetool. Users can 
accurately predict Cassandra behavior in their production environments without 
interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? nodetool predictconsistency 
provides this:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than trying out different 
configurations (especially in production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should 
they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, 
short of profiling in production, there's no existing way to answer these 
questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't 
know how stale their data will be.)

This patch allows users to perform this prediction in production using 
{{nodetool}}. Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}. 
Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies 
(each latency is 8 bytes, and there are 4 distributions we require, so the 
space overhead is {{32*logged_latencies}} bytes of memory for the predicting 
node) and then predicts the latency and consistency for each possible 
ConsistencyLevel setting (reads and writes) by running 
{{number_trials_for_consistency_prediction}} Monte Carlo trials per 
configuration. Users shouldn't have to touch these parameters, and the defaults 
work well. The more latencies they log, the better the predictions will be.

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but 
QUORUM is often slower than ONE, TWO, or THREE. What should users choose?

This patch provides a latency-consistency analysis within nodetool. Users can 
accurately predict Cassandra behavior in their production environments without 
interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? nodetool predictconsistency 
provides this:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than trying out different 
configurations (especially in production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should 
they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, 
short of profiling in production, there's no existing way to answer these 
questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't 
know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}.
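
For reference, the relevant {{cassandra.yaml}} entries might look like the following. The three option names are the ones this patch introduces as described here; the numeric values are illustrative placeholders, not the patch's defaults:

{code:borderStyle=solid}
# enables latency tracing for consistency prediction
log_latencies_for_consistency_prediction: true

# illustrative placeholder values, not the patch's defaults
max_logged_latencies_for_consistency_prediction: 10000
number_trials_for_consistency_prediction: 10000
{code}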

Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. 
Each latency is 8 bytes, and there are 4 distributions we require, so the space 
overhead is {{32*logged_latencies}} bytes of memory for the predicting node.
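
For example, logging 10,000 latencies per distribution (an illustrative figure, not the default) would cost 32 * 10,000 = 320,000 bytes, i.e. roughly 313 KB, on the predicting node.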

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and writes) by running 
{{number_trials_for_consistency_prediction}} Monte Carlo trials per 
configuration.

Users shouldn't have to touch these parameters, and the defaults work well.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but 
QUORUM is often slower than ONE, TWO, or THREE. What should users choose?

This patch provides a latency-consistency analysis within nodetool. Users can 
accurately predict Cassandra behavior in their production environments without 
interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? nodetool predictconsistency 
provides this:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
\\
{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than trying out different 
configurations (especially in production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should 
they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, 
short of profiling in production, there's no existing way to answer these 
questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't 
know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}.

Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. 
Each latency is 8 bytes, and there are 4 distributions we require, so the space 
overhead is {{32*logged_latencies}} bytes of memory for the predicting node.

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and writes) by running 
{{number_trials_for_consistency_prediction}} Monte Carlo trials per 
configuration.

Users shouldn't have to touch these parameters, and the defaults work well.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
ConsistencyLevel to QUORUM for reads and writes ensures strong consistency, but 
QUORUM is often slower than ONE, TWO, or THREE. What should users choose?

This patch provides a latency-consistency analysis within nodetool. Users can 
accurately predict Cassandra behavior in their production environments without 
interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? nodetool predictconsistency 
provides this:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}

{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than trying out different 
configurations (especially in production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-QUORUM ConsistencyLevels). Should 
they use CL=ONE? CL=TWO? It likely depends on their runtime environment and, 
short of profiling in production, there's no existing way to answer these 
questions. (Keep in mind, Cassandra defaults to CL=ONE, meaning users don't 
know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}.

Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. 
Each latency is 8 bytes, and there are 4 distributions we require, so the space 
overhead is {{32*logged_latencies}} bytes of memory for the predicting node.

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and writes) by running 
{{number_trials_for_consistency_prediction}} Monte Carlo trials per 
configuration.

Users shouldn't have to touch these parameters, and the defaults work well.

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
{{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
{{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users 
can accurately predict Cassandra's behavior in their production environments 
without interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? This patch provides answers via 
{{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
\\ \\
{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: 
false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than profiling and 
manually testing each possible replication configuration (especially in 
production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-{{QUORUM}} 
{{ConsistencyLevel}}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
depends on their runtime environment and, short of profiling in production, 
there's no existing way to answer these questions. (Keep in mind, Cassandra 
defaults to CL={{ONE}}, meaning users don't know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}.

Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. 
Each latency is 8 bytes, and there are 4 distributions we require, so the space 
overhead is {{32*logged_latencies}} bytes of memory for the predicting node.

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and writes) by running 

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
{{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
{{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users 
can accurately predict Cassandra's behavior in their production environments 
without interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? This patch provides answers via 
{{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
\\ \\
{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: 
false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than profiling and 
manually testing each possible replication configuration (especially in 
production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). 
Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime 
environment and, short of profiling in production, there's no existing way to 
answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, 
meaning users don't know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}.

Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. 
Each latency is 8 bytes, and there are 4 distributions we require, so the space 
overhead is {{32*logged_latencies}} bytes of memory for the predicting node.

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and writes) by running 

[jira] [Updated] (CASSANDRA-4261) [Patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Description: 
h3. Introduction

Cassandra supports a variety of replication configurations: ReplicationFactor 
is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
{{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
{{THREE}}. What should users choose?

This patch provides a latency-consistency analysis within {{nodetool}}. Users 
can accurately predict Cassandra's behavior in their production environments 
without interfering with performance.

What's the probability that we'll read a write t seconds after it completes? 
What about reading one of the last k writes? This patch provides answers via 
{{nodetool predictconsistency}}:

{{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
\\ \\
{code:title=Example output|borderStyle=solid}

//N == ReplicationFactor
//R == read ConsistencyLevel
//W == write ConsistencyLevel

user@test:$ nodetool predictconsistency 3 100 1
100ms after a given write, with maximum version staleness of k=1
N=3, R=1, W=1
Probability of consistent reads: 0.811700
Average read latency: 6.896300ms (99.900th %ile 174ms)
Average write latency: 8.788000ms (99.900th %ile 252ms)

N=3, R=1, W=2
Probability of consistent reads: 0.867200
Average read latency: 6.818200ms (99.900th %ile 152ms)
Average write latency: 33.226101ms (99.900th %ile 420ms)

N=3, R=1, W=3
Probability of consistent reads: 1.00
Average read latency: 6.766800ms (99.900th %ile 111ms)
Average write latency: 153.764999ms (99.900th %ile 969ms)

N=3, R=2, W=1
Probability of consistent reads: 0.951500
Average read latency: 18.065800ms (99.900th %ile 414ms)
Average write latency: 8.322600ms (99.900th %ile 232ms)

N=3, R=2, W=2
Probability of consistent reads: 0.983000
Average read latency: 18.009001ms (99.900th %ile 387ms)
Average write latency: 35.797100ms (99.900th %ile 478ms)

N=3, R=3, W=1
Probability of consistent reads: 0.993900
Average read latency: 101.959702ms (99.900th %ile 1094ms)
Average write latency: 8.518600ms (99.900th %ile 236ms)
{code}

h3. Demo

Here's an example scenario you can run using 
[ccm|https://github.com/pcmanus/ccm]. The prediction is fast:

{code:borderStyle=solid}
cd cassandra-source-dir with patch applied
ant

# turn on consistency logging
sed -i .bak 's/log_latencies_for_consistency_prediction: 
false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml

ccm create consistencytest --cassandra-dir=. 
ccm populate -n 5
ccm start

# if start fails, you might need to initialize more loopback interfaces
# e.g., sudo ifconfig lo0 alias 127.0.0.2

# use stress to get some sample latency data
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read

bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
{code}

h3. What and Why

We've implemented [Probabilistically Bounded 
Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
consistency-latency trade-offs within Cassandra. Our 
[paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
addition to profiling our own Cassandra deployments. In our experience, 
prediction is both accurate and much more lightweight than profiling and 
manually testing each possible replication configuration (especially in 
production!).

This analysis is important for the many users we've talked to and heard about 
who use partial quorum operation (e.g., non-{{QUORUM}} {{ConsistencyLevel}}). 
Should they use CL={{ONE}}? CL={{TWO}}? It likely depends on their runtime 
environment and, short of profiling in production, there's no existing way to 
answer these questions. (Keep in mind, Cassandra defaults to CL={{ONE}}, 
meaning users don't know how stale their data will be.)

We outline limitations of the current approach after describing how it's done. 
We believe that this is a useful feature that can provide guidance and fairly 
accurate estimation for most users.

h3. Interface

This patch allows users to perform this prediction in production using 
{{nodetool}}.

Users enable tracing of latency data by setting 
{{log_latencies_for_consistency_prediction: true}} in {{cassandra.yaml}}.

Cassandra logs {{max_logged_latencies_for_consistency_prediction}} latencies. 
Each latency is 8 bytes, and there are 4 distributions we require, so the space 
overhead is {{32*logged_latencies}} bytes of memory for the predicting node.

{{nodetool predictconsistency}} predicts the latency and consistency for each 
possible {{ConsistencyLevel}} setting (reads and writes) by running 

[jira] [Updated] (CASSANDRA-4261) [patch] Support consistency-latency prediction in nodetool

2012-05-19 Thread Peter Bailis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Bailis updated CASSANDRA-4261:


Summary: [patch] Support consistency-latency prediction in nodetool  (was: 
[Patch] Support consistency-latency prediction in nodetool)

 [patch] Support consistency-latency prediction in nodetool
 --

 Key: CASSANDRA-4261
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4261
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Affects Versions: 1.2
Reporter: Peter Bailis
 Attachments: pbs-nodetool-v1.patch


 h3. Introduction
 Cassandra supports a variety of replication configurations: ReplicationFactor 
 is set per-ColumnFamily and ConsistencyLevel is set per-request. Setting 
 {{ConsistencyLevel}} to {{QUORUM}} for reads and writes ensures strong 
 consistency, but {{QUORUM}} is often slower than {{ONE}}, {{TWO}}, or 
 {{THREE}}. What should users choose?
 This patch provides a latency-consistency analysis within {{nodetool}}. Users 
 can accurately predict Cassandra's behavior in their production environments 
 without interfering with performance.
 What's the probability that we'll read a write t seconds after it completes? 
 What about reading one of the last k writes? This patch provides answers via 
 {{nodetool predictconsistency}}:
 {{nodetool predictconsistency ReplicationFactor TimeAfterWrite Versions}}
 \\ \\
 {code:title=Example output|borderStyle=solid}
 //N == ReplicationFactor
 //R == read ConsistencyLevel
 //W == write ConsistencyLevel
 user@test:$ nodetool predictconsistency 3 100 1
 100ms after a given write, with maximum version staleness of k=1
 N=3, R=1, W=1
 Probability of consistent reads: 0.811700
 Average read latency: 6.896300ms (99.900th %ile 174ms)
 Average write latency: 8.788000ms (99.900th %ile 252ms)
 N=3, R=1, W=2
 Probability of consistent reads: 0.867200
 Average read latency: 6.818200ms (99.900th %ile 152ms)
 Average write latency: 33.226101ms (99.900th %ile 420ms)
 N=3, R=1, W=3
 Probability of consistent reads: 1.00
 Average read latency: 6.766800ms (99.900th %ile 111ms)
 Average write latency: 153.764999ms (99.900th %ile 969ms)
 N=3, R=2, W=1
 Probability of consistent reads: 0.951500
 Average read latency: 18.065800ms (99.900th %ile 414ms)
 Average write latency: 8.322600ms (99.900th %ile 232ms)
 N=3, R=2, W=2
 Probability of consistent reads: 0.983000
 Average read latency: 18.009001ms (99.900th %ile 387ms)
 Average write latency: 35.797100ms (99.900th %ile 478ms)
 N=3, R=3, W=1
 Probability of consistent reads: 0.993900
 Average read latency: 101.959702ms (99.900th %ile 1094ms)
 Average write latency: 8.518600ms (99.900th %ile 236ms)
 {code}
 h3. Demo
 Here's an example scenario you can run using 
 [ccm|https://github.com/pcmanus/ccm]. The prediction is fast:
 {code:borderStyle=solid}
 cd cassandra-source-dir with patch applied
 ant
 # turn on consistency logging
 sed -i .bak 's/log_latencies_for_consistency_prediction: 
 false/log_latencies_for_consistency_prediction: true/' conf/cassandra.yaml
 ccm create consistencytest --cassandra-dir=. 
 ccm populate -n 5
 ccm start
 # if start fails, you might need to initialize more loopback interfaces
 # e.g., sudo ifconfig lo0 alias 127.0.0.2
 # use stress to get some sample latency data
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o insert
 tools/bin/stress -d 127.0.0.1 -l 3 -n 1 -o read
 bin/nodetool -h 127.0.0.1 -p 7100 predictconsistency 3 100 1
 {code}
 h3. What and Why
 We've implemented [Probabilistically Bounded 
 Staleness|http://pbs.cs.berkeley.edu/#demo], a new technique for predicting 
 consistency-latency trade-offs within Cassandra. Our 
 [paper|http://arxiv.org/pdf/1204.6082.pdf] will appear in [VLDB 
 2012|http://www.vldb2012.org/], and, in it, we've used PBS to profile a range 
 of Dynamo-style data store deployments at places like LinkedIn and Yammer in 
 addition to profiling our own Cassandra deployments. In our experience, 
 prediction is both accurate and much more lightweight than profiling and 
 manually testing each possible replication configuration (especially in 
 production!).
 This analysis is important for the many users we've talked to and heard about 
 who use partial quorum operation (e.g., non-{{QUORUM}} 
 {{ConsistencyLevel}}). Should they use CL={{ONE}}? CL={{TWO}}? It likely 
 depends on their runtime environment and, short of profiling in production, 
 there's no existing way to answer these questions. (Keep in mind, Cassandra 
 defaults to CL={{ONE}}, meaning users don't know how stale their data will 
 be.)
 We outline limitations of the current approach after describing how it's 
 done. We believe that this is a useful feature that can provide guidance and 
 fairly accurate estimation for most users.
 h3. 
